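In machine learning, a dataset often needs to be normalized before use. The following formula (min-max normalization) rescales each value into the range 0 to 1: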
newvalue = (oldvalue - min)/(max - min)
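As a quick sanity check on the formula, take a hypothetical column whose values are 1, 3 and 5 (made-up numbers, not taken from the example in this post). Here min = 1 and max = 5, so the middle value maps to:

```latex
\text{newvalue} = \frac{3 - 1}{5 - 1} = 0.5
```

The minimum itself maps to 0 and the maximum maps to 1, which is why every result falls within the 0 to 1 range.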
A Python implementation looks like this:
import numpy as np

def autoNorm(dataSet):
    minVals = dataSet.min(0)  # minimum of each column
    maxVals = dataSet.max(0)  # maximum of each column
    ranges = maxVals - minVals  # per-column range (max - min)
    m = dataSet.shape[0]  # number of rows
    # Repeat the per-column values m times so the shapes match dataSet
    normDataSet = dataSet - np.tile(minVals, (m, 1))
    normDataSet = normDataSet / np.tile(ranges, (m, 1))
    return normDataSet, ranges, minVals
Example:
import numpy as np

group = np.array([[1, 2], [1, 3], [2, 2], [2, 3]])
newgroup, _, _ = autoNorm(group)
print(newgroup)
# Output:
# [[0. 0.]
#  [0. 1.]
#  [1. 0.]
#  [1. 1.]]
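One case the formula does not cover is a column whose values are all identical: max - min is then 0 and the division is undefined. The sketch below is one possible variant, not the original author's code. The name auto_norm_safe is hypothetical, and mapping constant columns to 0 is an assumption made here for illustration. It also relies on NumPy broadcasting instead of np.tile, which should give the same result for 2-D input.

```python
import numpy as np

def auto_norm_safe(data_set):
    """Min-max scale each column of a 2-D array into [0, 1]."""
    data_set = np.asarray(data_set, dtype=float)
    min_vals = data_set.min(axis=0)            # per-column minimum
    ranges = data_set.max(axis=0) - min_vals   # per-column max - min
    # Assumption: for constant columns (range 0), divide by 1 instead,
    # so every entry in that column becomes 0 rather than NaN.
    safe_ranges = np.where(ranges == 0, 1.0, ranges)
    # Broadcasting applies the per-column vectors to every row.
    norm = (data_set - min_vals) / safe_ranges
    return norm, ranges, min_vals

# Second column is constant, so its range is 0.
data = np.array([[1, 5], [2, 5], [3, 5]])
norm, ranges, min_vals = auto_norm_safe(data)
print(norm)
```

With this input, the first column is scaled to 0, 0.5 and 1, while the constant second column comes out as all zeros. Whether zeros are the right choice for a constant feature depends on the application, so treat that part as a placeholder.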