KNN Machine Learning Algorithm
Goal: To classify a query point (with 2 features) into one of 2 classes using KNN, given labeled training data from those classes.
K-Nearest Neighbor (KNN)
KNN is a basic machine learning algorithm that can be used for both classification and regression problems, although it is rarely used for regression. So, we will discuss classification problems only.
It involves computing the distance between the query point and every point in the training dataset, sorting those distances, and picking the k points with the smallest distance. We then check which class each of these k points belongs to; the class that appears most often is the predicted class.
Red and green are the two classes here, and we have to predict the class of the star point. From the image, it is clear that the red points are much closer to the star than the green points, so the predicted class for this point will be red.
We will generally work with matrices and use the "numpy" library to compute the Euclidean distance.
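For instance, the Euclidean distance between two 2-feature points is the square root of the sum of the squared differences of their coordinates; with numpy this is a one-liner (a minimal sketch with illustrative values):

import numpy as np

p = np.array([1.0, 2.0])                 # query point (illustrative)
q = np.array([4.0, 6.0])                 # a training point (illustrative)
dist = np.sqrt(((p - q) ** 2).sum())     # sqrt((1-4)^2 + (2-6)^2) = 5.0
print(dist)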
Algorithm:
STEP 1: Compute the distance of the query point from every training point in the training dataset.
STEP 2: Sort the distances in increasing order and pick the k points with the smallest distance.
STEP 3: Count how many of these k points belong to each class.
STEP 4: The class with the most votes among these k points is the predicted class of the query point (see the small voting sketch after this list).
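As a small sketch of steps 3 and 4, suppose the labels of the k = 5 nearest points are already known; the majority class can be picked with numpy (the labels below are made up for illustration):

import numpy as np

nearest_labels = np.array([0, 1, 1, 0, 1])       # labels of the 5 nearest points (illustrative)
classes, counts = np.unique(nearest_labels, return_counts=True)
predicted_class = classes[np.argmax(counts)]     # the class with the most votes
print(predicted_class)                           # prints 1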
Note: In the code, we have used only two features for a clearer explanation, but the code works for N features as well; you just have to generate training data and a query point with n features. Further, I have used numpy to generate the two-feature data.
Python Code
import numpy as np

def distance(v1, v2):
    # Euclidean distance between two feature vectors
    return np.sqrt(((v1 - v2) ** 2).sum())

def knn(train, test, k=5):
    dist = []
    for i in range(train.shape[0]):
        # Get the feature vector and label
        ix = train[i, :-1]
        iy = train[i, -1]
        # Compute the distance from the test point
        d = distance(test, ix)
        dist.append([d, iy])
    # Sort based on distance and get the top k
    dk = sorted(dist, key=lambda x: x[0])[:k]
    # Retrieve only the labels
    labels = np.array(dk)[:, -1]
    # Get frequencies of each label
    output = np.unique(labels, return_counts=True)
    # Find the max frequency and the corresponding label
    index = np.argmax(output[1])
    return output[0][index]

# monkey_data and chimp_data
# Data has 2 features
monkey_data = np.random.multivariate_normal([1.0, 2.0], [[1.5, 0.5], [0.5, 1]], 1000)
chimp_data = np.random.multivariate_normal([4.0, 4.0], [[1, 0], [0, 1.8]], 1000)

data = np.zeros((2000, 3))
data[:1000, :-1] = monkey_data
data[1000:, :-1] = chimp_data
data[1000:, -1] = 1

label_to_class = {1: 'chimp', 0: 'monkey'}

# Query point for the check
print("Enter the 1st feature")
x = input()
print("Enter the 2nd feature")
y = input()
x = float(x)
y = float(y)
query = np.array([x, y])

ans = knn(data, query)
print("The predicted class for the point is {}".format(label_to_class[ans]))
Output
Enter the 1st feature
3
Enter the 2nd feature
2
The predicted class for the point is chimp
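For comparison, the same prediction can be made with scikit-learn's KNeighborsClassifier (a minimal sketch, assuming scikit-learn is installed and reusing data, query, and label_to_class from the code above; the result may differ slightly because the training data is randomly generated):

from sklearn.neighbors import KNeighborsClassifier

clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(data[:, :-1], data[:, -1])            # features, labels
pred = clf.predict(query.reshape(1, -1))[0]   # predict the class of the query point
print(label_to_class[pred])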
Translated from: https://www.includehelp.com/ml-ai/k-nearest-neighbors-knn-algorithm.aspx