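Workflow
- Download the iris dataset
- Import the required packages
- Read the data and check its size and length
- Split into training and test sets
- Use the model
- Evaluate the algorithm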
Download the iris dataset
Link: https://pan.baidu.com/s/1RzZyXsaiJB3e611itF466Q?pwd=j484
Extraction code: j484
Import the required packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')
from sklearn import metrics
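Check the data size and length
Read five columns: compared with the sklearn iris dataset, the CSV file has an extra id column (column 0), which is not needed
iris_data = pd.read_csv('iris.csv', usecols=[1, 2, 3, 4, 5])
Check the dataset size (150 rows, 5 columns)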
iris_data.shape
# (150, 5)
Preview the data
iris_data.head()
Split into training and test sets
Load the features and the labels
X = iris_data[['sepal_length', 'sepal_width', 'petal_length', 'petal_width']]
y = iris_data['species']
Split X and y; the label parts are named r_train and r_test (r is short for "result", which here means the class)
from sklearn.model_selection import train_test_split
X_train, X_test, r_train, r_test = train_test_split(X, y, random_state=0)
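The training part: the 150 rows are split roughly 3:1, so X_train holds 112 rows of four feature columns and r_train holds the classes of those same 112 rows; fit(X_train, r_train) is therefore all that is needed to train the model
print("X_train shape: {}".format(X_train.shape)) # X_train shape: (112, 4)
print("r_train shape: {}".format(r_train.shape)) # r_train shape: (112,)
The test part: X_test holds the remaining 38 rows of four feature columns and r_test holds the classes of those 38 rows; these are passed to predict and score to check the model, not to fit
print("X_test shape: {}".format(X_test.shape)) # X_test shape: (38, 4)
print("r_test shape: {}".format(r_test.shape)) # r_test shape: (38,)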
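The 112/38 split follows from the default test fraction of 0.25 that train_test_split uses when neither test_size nor train_size is given: the test count is rounded up and the remaining rows go to training. Below is a minimal pure-Python sketch of that arithmetic together with a seeded shuffle. The helper names `split_sizes` and `shuffled_split` are hypothetical, and the shuffle does not reproduce the row order scikit-learn would pick.

```python
import math
import random


def split_sizes(n_samples, test_fraction=0.25):
    """Return (n_train, n_test), rounding the test count up."""
    n_test = math.ceil(test_fraction * n_samples)
    return n_samples - n_test, n_test


def shuffled_split(rows, test_fraction=0.25, seed=0):
    """Shuffle row indices with a fixed seed, then cut into train/test parts."""
    n_train, _ = split_sizes(len(rows), test_fraction)
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    train = [rows[i] for i in indices[:n_train]]
    test = [rows[i] for i in indices[n_train:]]
    return train, test


n_train, n_test = split_sizes(150)
print(n_train, n_test)  # 112 38

# Stand-in for the 150 data rows: just their row numbers
train_rows, test_rows = shuffled_split(list(range(150)))
print(len(train_rows), len(test_rows))  # 112 38
```

Fixing the seed plays the same role as random_state=0 in the original call: the split is shuffled but repeatable from run to run.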
Call the model
- The k-nearest neighbors (KNN) algorithm is used here
Import the KNN classifier
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=1)
Start training
knn.fit(X_train, r_train)
Run the test data through the trained model
r_pred = knn.predict(X_test)
print("Test set predictions: \n {}".format(r_pred))
Result
Test set predictions:
 ['virginica' 'versicolor' 'setosa' 'virginica' 'setosa' 'virginica'
 'setosa' 'versicolor' 'versicolor' 'versicolor' 'virginica' 'versicolor'
 'versicolor' 'versicolor' 'versicolor' 'setosa' 'versicolor' 'versicolor'
 'setosa' 'setosa' 'virginica' 'versicolor' 'setosa' 'setosa' 'virginica'
 'setosa' 'setosa' 'versicolor' 'versicolor' 'setosa' 'virginica'
 'versicolor' 'setosa' 'virginica' 'virginica' 'versicolor' 'setosa'
 'virginica']
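With n_neighbors=1, each test flower simply receives the class of its single closest training flower. Below is a minimal pure-Python sketch of that idea, assuming Euclidean distance (the default metric of KNeighborsClassifier). The function name `predict_1nn` is hypothetical and the six training measurements are made up for illustration; they are not rows taken from iris.csv.

```python
import math


def predict_1nn(train_features, train_labels, sample):
    """Return the label of the training row closest to `sample`."""
    best_label, best_dist = None, math.inf
    for features, label in zip(train_features, train_labels):
        dist = math.dist(features, sample)  # Euclidean distance (Python 3.8+)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


# Made-up measurements: sepal length, sepal width, petal length, petal width
train_features = [
    (5.0, 3.4, 1.5, 0.2),
    (4.8, 3.0, 1.4, 0.2),
    (6.0, 2.8, 4.5, 1.4),
    (5.7, 2.9, 4.2, 1.3),
    (6.5, 3.0, 5.6, 2.0),
    (7.0, 3.1, 5.9, 2.2),
]
train_labels = ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"]

print(predict_1nn(train_features, train_labels, (5.1, 3.5, 1.4, 0.3)))  # setosa
print(predict_1nn(train_features, train_labels, (6.7, 3.0, 5.7, 2.1)))  # virginica
```

This brute-force loop compares the sample with every training row, which is fine for 112 rows; scikit-learn can use faster index structures for larger datasets.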
Model evaluation
Method 1
print("Test set score: {:.2f}".format(np.mean(r_pred == r_test)))
# Test set score: 0.97
Method 2
print('Test set score: {:.2f}'.format(metrics.accuracy_score(r_test, r_pred))) # Test set score: 0.97
Method 3
print("Test set score: {:.2f}".format(knn.score(X_test, r_test))) # Test set score: 0.97
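All three methods compute the same quantity: the fraction of test rows whose predicted class equals the true class. Below is a minimal pure-Python sketch using a hypothetical `accuracy` helper and made-up labels. It also shows that, with 38 test rows, a score that prints as 0.97 corresponds to 37 correct predictions.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of positions where the prediction matches the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)


# Made-up labels: 3 of 4 predictions are correct
true_labels = ["setosa", "versicolor", "virginica", "setosa"]
predicted = ["setosa", "versicolor", "versicolor", "setosa"]
print("{:.2f}".format(accuracy(true_labels, predicted)))  # 0.75

# Scores for 36, 37 and 38 correct answers out of 38 test rows
for correct in (36, 37, 38):
    print(correct, "{:.2f}".format(correct / 38))
# 36 0.95
# 37 0.97
# 38 1.00
```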