Learning the Basic Machine Learning Libraries

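Preface

I spent nearly a year writing Python crawlers, so I have a reasonable grounding in the language. Machine learning is at its peak right now, and one of my projects uses it to classify the content we scrape, so I am using my spare time to learn the relevant libraries, in the hope of money++.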

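Overview

numpy
The foundational library for scientific computing in Python
matplotlib
Matplotlib is a Python 2D plotting library that produces publication-quality figures in a variety of hardcopy formats and in interactive environments across platforms
pandas
A Python data analysis library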

NumPy basics

Array print options

set_printoptions(precision=None, threshold=None, edgeitems=None, linewidth=None, suppress=None, nanstr=None, infstr=None, formatter=None)

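precision: number of digits of precision for floating-point output; the default is 8
threshold: total number of array elements above which output is summarized instead of printed in full; the default is 1000, and it can be set to np.inf to print every element (the NumPy documentation suggests sys.maxsize for the same purpose)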
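
A minimal sketch of the two options above, assuming a reasonably recent NumPy. The variable names are illustrative only, and `sys.maxsize` is used for "print everything" because that is the value the NumPy documentation recommends.

```python
import sys
import numpy as np

values = np.array([3.14159265, 2.71828183])

# precision: limit floating-point output to 3 digits after the decimal point
np.set_printoptions(precision=3)
short_repr = repr(values)
print(short_repr)  # array([3.142, 2.718])

# threshold: arrays with more elements than this are summarized with "..."
np.set_printoptions(threshold=5)
summarized = repr(np.arange(10))
print(summarized)

# A very large threshold switches summarization off
np.set_printoptions(threshold=sys.maxsize)
full = repr(np.arange(10))
print(full)

# Put the documented defaults back so later output is unaffected
np.set_printoptions(precision=8, threshold=1000)
```

The settings are global, so restoring the defaults at the end keeps later printing in the same session unchanged.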

Where slicing comes from

A slice of a Python sequence can be written as [start:stop:step], and any of start, stop, and step may be omitted

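list(range(10)) => [0,1,2,3,4,5,6,7,8,9] (in Python 3, range is lazy, so it is wrapped in list() here to show the values)
When start is omitted, the slice begins at index 0: list(range(10))[:10:2] => [0,2,4,6,8]
When stop is omitted, the slice runs to the end of the sequence: list(range(10))[1::2] => [1,3,5,7,9]
When start and stop are given but step is omitted, step defaults to 1: list(range(10))[2:6:] => [2,3,4,5]
step=n means: beginning at start (start itself is included), take every n-th element until stop is reached: list(range(20))[::3] => [0,3,6,9,12,15,18]
When step is negative, elements are taken from right to left: list(range(10))[::-1] => [9,8,7,6,5,4,3,2,1,0]; list(range(10))[::-2] => [9,7,5,3,1]
a[start:end]: selects the half-open range [start,end)
a[:end]: selects the range [0,end)
a[start:]: selects from start through the last element (inclusive)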
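
The rules above can be checked with a short sketch. The variable names are illustrative. `list(range(...))` is used because in Python 3 slicing a bare `range` returns another `range` rather than a list.

```python
digits = list(range(10))  # materialize the lazy range so the values are visible

first_evens = digits[:10:2]           # start omitted: begins at index 0
odds = digits[1::2]                   # stop omitted: runs to the end
middle = digits[2:6:]                 # step omitted: defaults to 1
reversed_digits = digits[::-1]        # negative step: right to left
every_other_backwards = digits[::-2]

# slice() builds the same start/stop/step triple as a reusable object
every_third = slice(None, None, 3)
thirds = list(range(20))[every_third]

# Slicing a range object gives back a range, not a list
lazy_slice = range(10)[::2]

print(first_evens, odds, middle)
print(reversed_digits, every_other_backwards)
print(thirds, lazy_slice)
```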

Multidimensional slicing

In [1]: import numpy as np

In [2]: a = np.arange(25).reshape((5,5))

In [3]: a
Out[3]: 
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])

# Each dimension is indexed separately; a bare colon selects everything along that dimension
In [4]: a[:,2:5]
Out[4]: 
array([[ 2,  3,  4],
       [ 7,  8,  9],
       [12, 13, 14],
       [17, 18, 19],
       [22, 23, 24]])

In [5]: a[:, None].shape
Out[5]: (5, 1, 5)

# None adds a new dimension; it has an alias, newaxis. The new dimension appears at whichever position None is placed
In [6]: a[:, None]
Out[6]: 
array([[[ 0,  1,  2,  3,  4]],

       [[ 5,  6,  7,  8,  9]],

       [[10, 11, 12, 13, 14]],

       [[15, 16, 17, 18, 19]],

       [[20, 21, 22, 23, 24]]])

In [7]: a[:,:, None].shape
Out[7]: (5, 5, 1)

In [8]: a[..., None].shape
Out[8]: (5, 5, 1)
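
The session above can also be written as a small script. This is a sketch using the same 5x5 array. `np.expand_dims` is shown as an alternative spelling for inserting an axis; it does not appear in the original notes.

```python
import numpy as np

a = np.arange(25).reshape((5, 5))

# np.newaxis is just another name for None
alias_is_none = np.newaxis is None

# The new axis appears at the position where None / np.newaxis is written
middle_axis = a[:, np.newaxis]      # new axis in position 1
last_axis = a[:, :, np.newaxis]     # new axis in position 2
last_axis_ellipsis = a[..., None]   # same result, using Ellipsis

# np.expand_dims inserts the axis by position instead of by indexing
expanded = np.expand_dims(a, axis=1)

print(alias_is_none)
print(middle_axis.shape, last_axis.shape, last_axis_ellipsis.shape, expanded.shape)
```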

The dots (...) represent as many colons as needed to produce a complete indexing tuple. For example, if x is a rank 5 array (i.e., it has 5 axes), then

x[1,2,...] is equivalent to x[1,2,:,:,:],
x[...,3] to x[:,:,:,:,3], and
x[4,...,5,:] to x[4,:,:,5,:].

>>> c = np.array( [[[  0,  1,  2],               # a 3D array (two stacked 2D arrays)
...                 [ 10, 12, 13]],
...                [[100,101,102],
...                 [110,112,113]]])
>>> c.shape
(2, 2, 3)
>>> c[1,...]                                   # same as c[1,:,:] or c[1]
array([[100, 101, 102],
       [110, 112, 113]])
>>> c[...,2]                                   # same as c[:,:,2]
array([[  2,  13],
       [102, 113]])
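
The three equivalences stated for a rank 5 array can be verified directly. This is a sketch: the shape (5, 3, 2, 6, 4) is an arbitrary choice, picked only so that every index used in the text is in range.

```python
import numpy as np

# A rank 5 array; the shape is arbitrary but large enough for indices 4, 2, 5 and 3
x = np.arange(5 * 3 * 2 * 6 * 4).reshape(5, 3, 2, 6, 4)

# Each ... expands to however many colons are needed to reach 5 indices
same_trailing = np.array_equal(x[1, 2, ...], x[1, 2, :, :, :])
same_leading = np.array_equal(x[..., 3], x[:, :, :, :, 3])
same_middle = np.array_equal(x[4, ..., 5, :], x[4, :, :, 5, :])

print(same_trailing, same_leading, same_middle)
print(x[1, 2, ...].shape, x[..., 3].shape, x[4, ..., 5, :].shape)
```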

Testing the installation

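import matplotlib.pyplot as plt
import numpy as np

# This uses both Matplotlib and NumPy. linspace returns 256 evenly spaced
# values over the closed interval [-π, π] (endpoint=True includes π itself),
# and they are assigned to X.
X = np.linspace(-np.pi, np.pi, 256, endpoint=True)
# C and S are the cosine and sine values (X, C and S are all NumPy arrays)
C, S = np.cos(X), np.sin(X)
plt.plot(X, C)
plt.plot(X, S)
# Display the figure
plt.show()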
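
The plot above needs a display to show anything. On a machine without one, a lighter check may be enough. The following is only a sketch using the standard library (Python 3.8+); `installed_version` is a hypothetical helper name, not part of any of the three libraries.

```python
from importlib import metadata, util


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    # find_spec returns None when the top-level module cannot be imported
    if util.find_spec(package) is None:
        return None
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        # Importable, but no distribution metadata was found under this name
        return None


versions = {name: installed_version(name) for name in ("numpy", "matplotlib", "pandas")}
for name, version in versions.items():
    print(f"{name}: {version or 'not installed'}")
```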