1. Classification
Like sklearn, Orange provides machine learning algorithms for classification, regression, and other tasks. Basic usage looks like this:
import Orange

data = Orange.data.Table("voting")
lr = Orange.classification.LogisticRegressionLearner()
rf = Orange.classification.RandomForestLearner(n_estimators=100)
res = Orange.evaluation.CrossValidation(data, [lr, rf], k=5)
print("Accuracy:", Orange.evaluation.scoring.CA(res))
print("AUC:", Orange.evaluation.scoring.AUC(res))
Learners and Classifiers
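Classification involves two kinds of objects: learners and classifiers. A learner takes class-labeled data and returns a classifier. Given the first three data instances, the classifier returns their predicted classes: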
import numpy as np
import Orange

data = Orange.data.Table("voting")
# the learner
learner = Orange.classification.LogisticRegressionLearner()
# calling the learner on data returns a classifier
classifier = learner(data)
# predicted classes for the first three instances
classifier(data[:3])
# compare predictions with the original classes
c_values = data.domain.class_var.values
for d in data[5:8]:
    c = classifier(d)
    print("{}, originally {}".format(c_values[int(c[0])], d.get_class()))
# count the misclassified instances
x = np.sum(data.Y != classifier(data))
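The learner/classifier split described above can be illustrated without Orange. The following is a minimal sketch in plain Python, not Orange's implementation: `MajorityLearner` and `MajorityClassifier` are hypothetical names, and the toy rows and labels are made up for illustration. It only shows the calling pattern, where a learner called on labeled data returns a classifier, and the classifier called on rows returns predictions.

```python
from collections import Counter


class MajorityLearner:
    """Hypothetical learner: calling it on labeled data returns a classifier."""

    def __call__(self, rows, labels):
        # "Training" here is just finding the most frequent class label.
        most_common_label = Counter(labels).most_common(1)[0][0]
        return MajorityClassifier(most_common_label)


class MajorityClassifier:
    """Hypothetical classifier: calling it on rows returns predicted labels."""

    def __init__(self, label):
        self.label = label

    def __call__(self, rows):
        # Predict the stored majority label for every row.
        return [self.label for _ in rows]


# Made-up toy data: four rows with two features each.
rows = [[1, 0], [0, 1], [1, 1], [0, 0]]
labels = ["democrat", "republican", "democrat", "democrat"]

learner = MajorityLearner()
classifier = learner(rows, labels)   # learner -> classifier
predictions = classifier(rows[:3])   # classifier -> predicted classes

# Count misclassified rows, mirroring the error count in the text.
errors = sum(p != y for p, y in zip(classifier(rows), labels))
print(predictions, errors)
```

Keeping the two roles separate means the same learner object can be retrained on different data (for example on each cross-validation fold) while every resulting classifier stays immutable.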
Probabilistic Classification
To find the probability that the classifier assigns to each class:
data = Orange.data.Table("voting")
learner = Orange.classification.LogisticRegressionLearner()
classifier = learner(data)
target_class = 1
print("Probabilities for %s:" % data.domain.class_var.values[target_class])
probabilities = classifier(data, 1)
for p, d in zip(probabilities[5:8], data[5:8]):
    print(p[target_class], d.get_class())
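To make the idea of per-class probabilities concrete, here is a small sketch of how a binary logistic model turns a linear score into two class probabilities through the sigmoid function. It is an illustration only, not Orange's internals: `predict_proba` and the example scores are hypothetical.

```python
import math


def predict_proba(score):
    """Return [P(class 0), P(class 1)] for a linear score, via the sigmoid."""
    p1 = 1.0 / (1.0 + math.exp(-score))
    return [1.0 - p1, p1]


scores = [-2.0, 0.0, 3.0]  # hypothetical linear scores for three instances
target_class = 1

for s in scores:
    p = predict_proba(s)
    # The predicted class is the index of the larger probability.
    predicted = max(range(len(p)), key=p.__getitem__)
    print("%.3f" % p[target_class], predicted)
```

The two probabilities always sum to one, so reporting the probability of a single target class, as the snippet above does with `target_class`, loses no information in the binary case.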
Cross-Validation
data = Orange.data.Table("titanic")
lr = Orange.classification.LogisticRegressionLearner()
res = Orange.evaluation.CrossValidation(data, [lr], k=5)
print("Accuracy: %.3f" % Orange.evaluation.scoring.CA(res)[0])
print("AUC: %.3f" % Orange.evaluation.scoring.AUC(res)[0])
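For readers unfamiliar with what `k=5` means, the sketch below walks through k-fold cross-validation in plain Python. It is a simplified illustration under stated assumptions, not how Orange implements it: `k_fold_indices` and `cross_val_accuracy` are hypothetical helpers, the "model" is a trivial majority-class predictor, and the labels are made up.

```python
import random
from collections import Counter


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_val_accuracy(labels, k=5, seed=0):
    """Classification accuracy of a majority-class predictor under k-fold CV."""
    correct = 0
    for test_idx in k_fold_indices(len(labels), k, seed):
        held_out = set(test_idx)
        # Train only on the instances outside the current fold.
        train_labels = [y for i, y in enumerate(labels) if i not in held_out]
        majority = Counter(train_labels).most_common(1)[0][0]
        # Score on the held-out fold.
        correct += sum(labels[i] == majority for i in test_idx)
    return correct / len(labels)


labels = ["yes"] * 14 + ["no"] * 6  # made-up class labels
accuracy = cross_val_accuracy(labels, k=5)
print("Accuracy: %.3f" % accuracy)
```

Every instance is used for testing exactly once and never by a model that saw it during training, which is why cross-validated scores are a fairer estimate than scores computed on the training data.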
Handful of Classifiers
Orange includes many classification algorithms, most of them wrapped from sklearn. For example:
import Orange
import random

random.seed(42)
data = Orange.data.Table("voting")
test = Orange.data.Table(data.domain, random.sample(data, 5))
train = Orange.data.Table(data.domain, [d for d in data if d not in test])

tree = Orange.classification.tree.TreeLearner(max_depth=3)
knn = Orange.classification.knn.KNNLearner(n_neighbors=3)
lr = Orange.classification.LogisticRegressionLearner(C=0.1)

learners = [tree, knn, lr]
classifiers = [learner(train) for learner in learners]

target = 0
print("Probabilities for %s:" % data.domain.class_var.values[target])
print("original class ", " ".join("%-5s" % l.name for l in classifiers))

c_values = data.domain.class_var.values
for d in test:
    print(("{:<15}" + " {:.3f}" * len(classifiers)).format(
        c_values[int(d.get_class())],
        *(c(d, 1)[0][target] for c in classifiers)))
2. Regression
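Regression mirrors classification: there is a learner and a regressor (the regression model). A regression learner takes data and returns a regressor, and the regressor predicts the value of a continuous class.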
import Orange

data = Orange.data.Table("housing")
learner = Orange.regression.LinearRegressionLearner()
model = learner(data)

print("predicted, observed:")
for d in data[:3]:
    print("%.1f, %.1f" % (model(d)[0], d.get_class()))
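The same learner-returns-model pattern can be sketched for regression in plain Python. The example below fits a one-variable least-squares line; it is an illustration only, not Orange's `LinearRegressionLearner`, and `linear_learner` plus the toy data are hypothetical.

```python
def linear_learner(xs, ys):
    """Hypothetical regression learner: fit y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Ordinary least squares for a single feature.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    def model(x):
        # The returned regressor predicts a continuous value.
        return intercept + slope * x

    return model


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # made-up data lying exactly on y = 2x + 1

model = linear_learner(xs, ys)
print("predicted, observed:")
for x, y in zip(xs[:3], ys[:3]):
    print("%.1f, %.1f" % (model(x), y))
```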
Handful of Regressors
Build a regression tree model:
import random

data = Orange.data.Table("housing")
tree_learner = Orange.regression.SimpleTreeLearner(max_depth=2)
tree = tree_learner(data)
# print the tree structure
print(tree.to_string())

random.seed(42)
test = Orange.data.Table(data.domain, random.sample(data, 5))
train = Orange.data.Table(data.domain, [d for d in data if d not in test])

lin = Orange.regression.linear.LinearRegressionLearner()
rf = Orange.regression.random_forest.RandomForestRegressionLearner()
rf.name = "rf"
ridge = Orange.regression.RidgeRegressionLearner()

learners = [lin, rf, ridge]
regressors = [learner(train) for learner in learners]

print("y ", " ".join("%5s" % l.name for l in regressors))
for d in test:
    print(("{:<5}" + " {:5.1f}" * len(regressors)).format(
        d.get_class(),
        *(r(d)[0] for r in regressors)))
Cross-Validation
data = Orange.data.Table("housing.tab")

lin = Orange.regression.linear.LinearRegressionLearner()
rf = Orange.regression.random_forest.RandomForestRegressionLearner()
rf.name = "rf"
ridge = Orange.regression.RidgeRegressionLearner()
mean = Orange.regression.MeanLearner()

learners = [lin, rf, ridge, mean]
res = Orange.evaluation.CrossValidation(data, learners, k=5)
rmse = Orange.evaluation.RMSE(res)
r2 = Orange.evaluation.R2(res)

print("Learner RMSE R2")
for i in range(len(learners)):
    print("{:8s} {:.2f} {:5.2f}".format(learners[i].name, rmse[i], r2[i]))
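The two scores reported above follow standard definitions, which the sketch below spells out in plain Python. The helper names `rmse_score` and `r2_score` and the sample values are hypothetical; this is the textbook formula, not a copy of Orange's scoring code.

```python
import math


def rmse_score(actual, predicted):
    """Root mean squared error: square root of the average squared residual."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(actual))


def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


actual = [3.0, 5.0, 7.0, 9.0]  # made-up observed values
perfect = list(actual)         # a model that predicts every value exactly
mean_only = [sum(actual) / len(actual)] * len(actual)  # always predicts the mean

print("perfect   %.2f %5.2f" % (rmse_score(actual, perfect), r2_score(actual, perfect)))
print("mean_only %.2f %5.2f" % (rmse_score(actual, mean_only), r2_score(actual, mean_only)))
```

A predictor that always outputs the mean of the values it is scored on gets an R2 of exactly zero, which is why a mean learner makes a useful baseline row in a comparison table: any learner worth keeping should show a lower RMSE and an R2 clearly above it.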