Jim Rohn once said, “you are the average of the five people you spend the most time with”. Jim said this from the perspective of an American entrepreneur, but many software engineers would agree with him, because he is also describing one of the best-known algorithms in machine learning: k-nearest neighbors (abbreviated as kNN).
Let’s see a very simple example.
In the figure above, we are the green point. Our friends are either blue or red, and we choose to be blue or red by following our closest friends. Although Jim said we are the average of the five people closest to us, 5 is not a fixed number in the kNN algorithm: we can be the average of any number of friends, though an odd number is usually picked so the vote cannot end in a tie.
Let’s say we are the average of our 3 closest friends. Among those 3, we have 2 red friends and 1 blue friend, so by voting we choose to be red.
How about choosing to be the average of our 5 closest friends instead? We expand our friend circle and pick up 2 more blue friends. Now we have 3 blue friends and 2 red friends, and by voting we decide to be blue this time.
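Here is a tiny sketch of that vote in Python. The list of friend labels is made up to match the story (sorted from closest friend to farthest), and the voting is just a majority count:

```python
from collections import Counter

# Hypothetical labels of our friends, sorted from closest to farthest,
# chosen to match the story above.
friend_labels = ["red", "red", "blue", "blue", "blue"]

for k in (3, 5):
    votes = Counter(friend_labels[:k])      # count labels among the k closest friends
    winner = votes.most_common(1)[0][0]     # label with the most votes
    print(f"k={k}: {dict(votes)} -> we choose to be {winner}")

# k=3: {'red': 2, 'blue': 1} -> we choose to be red
# k=5: {'red': 2, 'blue': 3} -> we choose to be blue
```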
Now the idea is quite clear. A classifier usually has two steps, so let’s explain kNN in two steps as well:
train
This step simply stores the training data in the classifier without doing any preprocessing.
predict
This step is the juicy part: it finds the k closest neighbors to the input and predicts the result based on the labels of those k closest neighbors.
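To make both steps concrete, here is a minimal sketch of a kNN classifier. It assumes Euclidean distance and a plain majority vote; the class name `SimpleKNN` and the toy points are made up for illustration, not taken from any particular library:

```python
import numpy as np
from collections import Counter

class SimpleKNN:
    """A minimal k-nearest-neighbors classifier:
    'train' just memorizes the data, 'predict' votes among the k closest points."""

    def __init__(self, k=3):
        self.k = k

    def train(self, X, y):
        # Training is only storing the data; no preprocessing is done.
        self.X = np.asarray(X, dtype=float)
        self.y = np.asarray(y)

    def predict(self, x):
        # Euclidean distance from the query point to every stored point.
        distances = np.linalg.norm(self.X - np.asarray(x, dtype=float), axis=1)
        # Indices of the k closest neighbors.
        nearest = np.argsort(distances)[: self.k]
        # Majority vote among their labels.
        return Counter(self.y[nearest]).most_common(1)[0][0]


# Toy data mirroring the story: red and blue friends around the green point (0, 0).
X = [[1, 0], [0, 1], [-1, 0], [2, 2], [-2, 1]]
y = ["red", "red", "blue", "blue", "blue"]

knn = SimpleKNN(k=3)
knn.train(X, y)
print(knn.predict([0, 0]))   # -> "red" with k=3; switching to k=5 gives "blue"
```

As in the friend-circle example, changing k from 3 to 5 flips the prediction from red to blue, which is exactly why the choice of k matters.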