This post follows the official tutorial at:
https://mlr3book.mlr-org.com/chapters/chapter4/hyperparameter_optimization.html
Model tuning

When you are not satisfied with how your model performs, you will probably want to improve it, either by tuning its hyperparameters or by switching to a model that suits the data better. This post walks through those steps. The chapter has three parts:

Hyperparameter tuning: every machine learning model ships with default hyperparameters, but the defaults do not adapt to your data and often do not give the best performance. Tuning by hand rarely gets to the best result either. mlr3 includes automated tuning strategies; to use them you need to specify a search space (search_space), an optimization algorithm (the tuning method), an evaluation method (the resampling strategy), and a performance measure.
Feature selection: done mainly with the mlr3filters and mlr3fselect packages.
Nested resampling.

Hyperparameter tuning

People joke that tuning is like "alchemy"! That is not far off, and quite often the tuned result is no better than the defaults! It is like a video game: "a flurry of moves fierce as a tiger, then you check the scoreboard and you are 0 and 5"! Tuning has to be grounded in an understanding of the algorithm and the data; it is not random knob-turning.

We use the well-known Pima diabetes dataset for the demonstration. First, create the task:

    library(mlr3verse)
    ## Loading required package: mlr3
    task <- tsk("pima")
    print(task)
    ## <TaskClassif:pima> (768 x 9)
    ## * Target: diabetes
    ## * Properties: twoclass
    ## * Features (8):
    ##   - dbl (8): age, glucose, insulin, mass, pedigree, pregnant, pressure,
    ##     triceps

Choose a learner and look at the hyperparameters it supports:

    learner <- lrn("classif.rpart")
    learner$param_set
    ## <ParamSet>
    ##                 id    class lower upper nlevels        default value
    ##  1:             cp ParamDbl     0     1     Inf           0.01
    ##  2:     keep_model ParamLgl    NA    NA       2          FALSE
    ##  3:     maxcompete ParamInt     0   Inf     Inf              4
    ##  4:       maxdepth ParamInt     1    30      30             30
    ##  5:   maxsurrogate ParamInt     0   Inf     Inf              5
    ##  6:      minbucket ParamInt     1   Inf     Inf <NoDefault[3]>
    ##  7:       minsplit ParamInt     1   Inf     Inf             20
    ##  8: surrogatestyle ParamInt     0     1       2              0
    ##  9:   usesurrogate ParamInt     0     2       3              2
    ## 10:           xval ParamInt     0   Inf     Inf             10     0

Here we tune the complexity parameter cp and the minimum split size minsplit, and set the range for each:

    search_space <- ps(
      cp = p_dbl(lower = 0.001, upper = 0.1),
      minsplit = p_int(lower = 1, upper = 10)
    )
    search_space
    ## <ParamSet>
    ##          id    class lower upper nlevels        default value
    ## 1:       cp ParamDbl 0.001   0.1     Inf <NoDefault[3]>
    ## 2: minsplit ParamInt 1.000  10.0      10 <NoDefault[3]>

Then choose the resampling method and the performance measure:

    hout <- rsmp("holdout", ratio = 0.7)
    measure <- msr("classif.ce")

There are two ways to run the tuning.

Method 1: tuning with TuningInstanceSingleCrit and a tuner

    library(mlr3tuning)
    ## Loading required package: paradox

    evals20 <- trm("evals", n_evals = 20) # when to stop tuning

    # bundle everything into one instance
    instance <- TuningInstanceSingleCrit$new(
      task = task,
      learner = learner,
      resampling = hout,
      measure = measure,
      terminator = evals20,
      search_space = search_space
    )
    instance
    ## <TuningInstanceSingleCrit>
    ## * State:  Not optimized
    ## * Objective: <ObjectiveTuning:classif.rpart_on_pima>
    ## * Search Space:
    ## <ParamSet>
    ##          id    class lower upper nlevels        default value
    ## 1:       cp ParamDbl 0.001   0.1     Inf <NoDefault[3]>
    ## 2: minsplit ParamInt 1.000  10.0      10 <NoDefault[3]>
    ## * Terminator: <TerminatorEvals>
    ## * Terminated: FALSE
    ## * Archive:
    ## <ArchiveTuning>
    ## Null data.table (0 rows and 0 cols)

mlr3 offers five ways to decide when tuning stops:

Terminate after a given time
Terminate after a given number of iterations
Terminate after a specific performance has been reached
Terminate when tuning does not find a better configuration for a given number of iterations
A combination of the above in ALL or ANY fashion

You also need to choose a search method. mlr3tuning currently supports:

Grid search
Random search
Generalized simulated annealing
Non-linear optimization

    # grid search is used here
    tuner <- tnr("grid_search", resolution = 5)

Now run the tuning. The grid resolution is 5 and we are tuning 2 hyperparameters, so in principle there are 5 * 5 = 25 combinations. But the terminator was created with n_evals = 20, so the search actually stops after evaluating 20 of them!

    #lgr::get_logger("mlr3")$set_threshold("warn")
    #lgr::get_logger("bbotk")$set_threshold("warn") # reduce console output

    tuner$optimize(instance)
    ## INFO  [20:51:28.312] [bbotk] Starting to optimize 2 parameter(s) with '<TunerGridSearch>' and '<TerminatorEvals> [n_evals=20, k=0]'
    ## INFO  [20:51:28.331] [bbotk] Evaluating 1 configuration(s)
    ## (output omitted)
    ## INFO  [20:51:29.306] [bbotk]                                 uhash
    ## INFO  [20:51:29.306] [bbotk]  58eb421d-f0ed-4246-8430-3c1832ae615c
    ## INFO  [20:51:29.309] [bbotk] Finished optimizing after 20 evaluation(s)
    ## INFO  [20:51:29.310] [bbotk] Result:
    ## INFO  [20:51:29.310] [bbotk]       cp minsplit learner_param_vals  x_domain classif.ce
    ## INFO  [20:51:29.310] [bbotk]  0.02575        3          <list[3]> <list[2]>  0.2130435
    ##         cp minsplit learner_param_vals  x_domain classif.ce
    ## 1: 0.02575        3          <list[3]> <list[2]>  0.2130435

View the tuned hyperparameters:

    instance$result_learner_param_vals
    ## $xval
    ## [1] 0
    ##
    ## $cp
    ## [1] 0.02575
    ##
    ## $minsplit
    ## [1] 3

View the performance:

    instance$result_y
    ## classif.ce
    ##  0.2130435

View the result of every evaluation; there are only 20:

    instance$archive
    ## <ArchiveTuning>
    ##        cp minsplit classif.ce runtime_learners           timestamp batch_nr warnings errors      resample_result
    ##  1: 0.026        3       0.21             0.02 2022-02-27 20:51:28        1        0      0 <ResampleResult[22]>
    ##  2: 0.075        8       0.21             0.00 2022-02-27 20:51:28        2        0      0 <ResampleResult[22]>
    ##  3: 0.050        5       0.21             0.00 2022-02-27 20:51:28        3        0      0 <ResampleResult[22]>
    ##  4: 0.001        1       0.30             0.00 2022-02-27 20:51:28        4        0      0 <ResampleResult[22]>
    ##  5: 0.100        3       0.21             0.02 2022-02-27 20:51:28        5        0      0 <ResampleResult[22]>
    ##  6: 0.026        5       0.21             0.02 2022-02-27 20:51:28        6        0      0 <ResampleResult[22]>
    ##  7: 0.100        8       0.21             0.01 2022-02-27 20:51:28        7        0      0 <ResampleResult[22]>
    ##  8: 0.001        8       0.27             0.00 2022-02-27 20:51:28        8        0      0 <ResampleResult[22]>
    ##  9: 0.001        5       0.28             0.00 2022-02-27 20:51:28        9        0      0 <ResampleResult[22]>
    ## 10: 0.100        5       0.21             0.02 2022-02-27 20:51:28       10        0      0 <ResampleResult[22]>
    ## 11: 0.075       10       0.21             0.00 2022-02-27 20:51:28       11        0      0 <ResampleResult[22]>
    ## 12: 0.050       10       0.21             0.01 2022-02-27 20:51:28       12        0      0 <ResampleResult[22]>
    ## 13: 0.075        5       0.21             0.00 2022-02-27 20:51:28       13        0      0 <ResampleResult[22]>
    ## 14: 0.050        8       0.21             0.01 2022-02-27 20:51:29       14        0      0 <ResampleResult[22]>
    ## 15: 0.001       10       0.26             0.00 2022-02-27 20:51:29       15        0      0 <ResampleResult[22]>
    ## 16: 0.050        3       0.21             0.00 2022-02-27 20:51:29       16        0      0 <ResampleResult[22]>
    ## 17: 0.050        1       0.21             0.02 2022-02-27 20:51:29       17        0      0 <ResampleResult[22]>
    ## 18: 0.100       10       0.21             0.00 2022-02-27 20:51:29       18        0      0 <ResampleResult[22]>
    ## 19: 0.075        1       0.21             0.01 2022-02-27 20:51:29       19        0      0 <ResampleResult[22]>
    ## 20: 0.026        1       0.21             0.00 2022-02-27 20:51:29       20        0      0 <ResampleResult[22]>

Next, apply the tuned hyperparameters to the learner and retrain it on the data:

    learner$param_set$values <- instance$result_learner_param_vals
    learner$train(task)

The trained model is now ready for prediction; just call learner$predict()!

The steps above are somewhat verbose, less concise and less intuitive than tidymodels. When I first started learning I kept forgetting them. A later release finally added a shorthand:

    instance <- tune(
      task = task,
      learner = learner,
      resampling = hout,
      measure = measure,
      search_space = search_space,
      method = "grid_search",
      resolution = 5,
      term_evals = 25
    )
    ## INFO  [20:51:29.402] [bbotk] Starting to optimize 2 parameter(s) with '<TunerGridSearch>' and '<TerminatorEvals> [n_evals=25, k=0]'
    ## INFO  [20:51:29.403] [bbotk] Evaluating 1 configuration(s)
    ## INFO  [20:51:29.411] [mlr3] Running benchmark with 1 resampling iterations
    ## (omitted...)
    ## INFO  [20:51:30.535] [bbotk]  0.02575       10          <list[3]> <list[2]>  0.2347826

    instance$result_learner_param_vals
    ## $xval
    ## [1] 0
    ##
    ## $cp
    ## [1] 0.02575
    ##
    ## $minsplit
    ## [1] 10

    instance$result_y
    ## classif.ce
    ##  0.2347826

    learner$param_set$values <- instance$result_learner_param_vals
    learner$train(task)

mlr3 also supports tuning against several performance measures at once:

    measures <- msrs(c("classif.ce", "time_train")) # several measures

    evals20 <- trm("evals", n_evals = 20)

    instance <- TuningInstanceMultiCrit$new(
      task = task,
      learner = learner,
      resampling = hout,
      measures = measures,
      search_space = search_space,
      terminator = evals20
    )

    tuner$optimize(instance)
    ## INFO  [20:51:30.595] [bbotk] Starting to optimize 2 parameter(s) with '<TunerGridSearch>' and '<TerminatorEvals> [n_evals=20, k=0]'
    ## INFO  [20:51:30.597] [bbotk] Evaluating 1 configuration(s)
    ## (output omitted...)

View the result:

    instance$result_learner_param_vals

With several measures there is no single best configuration, so the result is a list, here with 15 parameter sets. Every entry has xval = 0; the cp and minsplit values are:

    entry       cp  minsplit
    [[1]]   0.0505         1
    [[2]]   0.07525        1
    [[3]]   0.07525       10
    [[4]]   0.1            8
    [[5]]   0.02575        3
    [[6]]   0.07525        8
    [[7]]   0.1            3
    [[8]]   0.1            5
    [[9]]   0.02575        5
    [[10]]  0.07525        5
    [[11]]  0.0505         8
    [[12]]  0.0505         3
    [[13]]  0.07525        3
    [[14]]  0.0505         5
    [[15]]  0.02575        1

The matching measure values are stored in result_y (note the spelling: a mistyped field name such as rusult_y silently returns NULL):

    instance$result_y

That is the first method. Now for the second.

Method 2: tuning with AutoTuner

This approach bundles the tuning and the step of applying the tuned parameters to the model, but you still have to set up all the required pieces beforehand.

    task <- tsk("pima") # create the task

    learner <- lrn("classif.rpart") # choose the learner

    search_space <- ps(
      cp = p_dbl(0.001, 0.1),
      minsplit = p_int(1, 10)
    ) # define the search space

    terminator <- trm("evals", n_evals = 10) # stopping criterion

    tuner <- tnr("random_search") # search method

    resampling <- rsmp("holdout") # resampling method

    measure <- msr("classif.acc") # performance measure

    # build the AutoTuner
    at <- AutoTuner$new(
      learner = learner,
      resampling = resampling,
      search_space = search_space,
      measure = measure,
      tuner = tuner,
      terminator = terminator
    )

Automatically pick the best parameters and fit on the data:

    at$train(task)
    ## INFO  [20:51:31.873] [bbotk] Starting to optimize 2 parameter(s) with '<OptimizerRandomSearch>' and '<TerminatorEvals> [n_evals=10, k=0]'
    ## INFO  [20:51:31.882] [bbotk] Evaluating 1 configuration(s)
    ## (a lot of output omitted)
    ## INFO  [20:51:32.332] [bbotk]  0.02278977        3          <list[3]> <list[2]>  0.7695312

    at$predict(task)
    ## <PredictionClassif> for 768 observations:
    ##     row_ids truth response
    ##           1   pos      pos
    ##           2   neg      neg
    ##           3   pos      neg
    ## ---
    ##         766   neg      neg
    ##         767   pos      neg
    ##         768   neg      neg

This method has a shorthand too:

    auto_learner <- auto_tuner(
      learner = learner,
      resampling = resampling,
      measure = measure,
      search_space = search_space,
      method = "random_search",
      term_evals = 10
    )

    auto_learner$train(task)
    ## INFO  [20:51:32.407] [bbotk] Starting to optimize 2 parameter(s) with '<OptimizerRandomSearch>' and '<TerminatorEvals> [n_evals=10, k=0]'
    ## INFO  [20:51:32.414] [bbotk] Evaluating 1 configuration(s)
    ## INFO  [20:51:32.421] [mlr3] Running benchmark with 1 resampling iterations
    ## INFO  [20:51:32.425] [mlr3] Applying learner 'classif.rpart' on task 'pima' (iter 1/1)
    ## (a lot of output omitted)

    auto_learner$predict(task)
    ## <PredictionClassif> for 768 observations:
    ##     row_ids truth response
    ##           1   pos      pos
    ##           2   neg      neg
    ##           3   pos      neg
    ## ---
    ##         766   neg      neg
    ##         767   pos      neg
    ##         768   neg      neg

Ways to specify hyperparameters

Defining the hyperparameter ranges separately every time can feel clunky and tedious. mlr3 also lets you declare them while configuring the learner.

    # set the tuning range when configuring the learner
    learner <- lrn("classif.svm")
    learner$param_set$values$kernel <- "polynomial"
    learner$param_set$values$degree <- to_tune(lower = 1, upper = 3)

    print(learner$param_set$search_space())
    ## <ParamSet>
    ##        id    class lower upper nlevels        default value
    ## 1: degree ParamInt     1     3       3 <NoDefault[3]>

This has its own problem, though: it requires you to know the algorithm well and to remember every hyperparameter and how it is spelled in mlr3! That is clearly hard, so I still recommend the first way. Define the search space separately, and if you forget a name you can always look up the learner's hyperparameters.

Parameter dependencies

Some hyperparameters only take effect under certain conditions. For a support vector machine (SVM), for example, degree only matters when kernel is polynomial. mlr3 lets you encode this as well.

    library(data.table)
    search_space = ps(
      cost = p_dbl(-1, 1, trafo = function(x) 10^x), # values can be transformed
      kernel = p_fct(c("polynomial", "radial")),
      degree = p_int(1, 3, depends = kernel == "polynomial") # parameter dependency
    )
    rbindlist(generate_design_grid(search_space, 3)$transpose(), fill = TRUE)
    ##     cost     kernel degree
    ##  1:  0.1 polynomial      1
    ##  2:  0.1 polynomial      2
    ##  3:  0.1 polynomial      3
    ##  4:  0.1     radial     NA
    ##  5:  1.0 polynomial      1
    ##  6:  1.0 polynomial      2
    ##  7:  1.0 polynomial      3
    ##  8:  1.0     radial     NA
    ##  9: 10.0 polynomial      1
    ## 10: 10.0 polynomial      2
    ## 11: 10.0 polynomial      3
    ## 12: 10.0     radial     NA
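To see where the 12 rows of that design grid come from, here is a rough base-R sketch that rebuilds the same table by hand. It does not call paradox at all; the variable names (cost_raw, grid) are made up for illustration. It only mimics the two ideas: the trafo 10^x applied to three evenly spaced points in [-1, 1], and degree being dropped whenever kernel is not polynomial.

```r
# Hand-made re-creation of the design grid (illustrative only, no paradox calls).
cost_raw <- seq(-1, 1, length.out = 3)  # grid points on the untransformed scale
cost <- 10^cost_raw                     # the trafo maps them to 0.1, 1, 10

# Full cross product: 3 degrees x 2 kernels x 3 costs = 18 rows.
grid <- expand.grid(
  degree = 1:3,
  kernel = c("polynomial", "radial"),
  cost = cost,
  stringsAsFactors = FALSE
)

# Dependency: degree only exists when kernel is "polynomial".
grid$degree[grid$kernel == "radial"] <- NA

# The radial rows are now duplicates of each other; keep one per cost value.
grid <- unique(grid[, c("cost", "kernel", "degree")])
rownames(grid) <- NULL
print(grid)
```

Nine polynomial rows plus three radial rows give the 12 configurations, which is why the radial kernel appears only once per cost value.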
Defining hyperparameters
Hyperparameters are defined with the paradox package.
Reference-based objects
paradox is a rewrite of ParamHelpers and is built entirely on R6 objects.
    library("paradox")

    ps = ParamSet$new()
    ps2 = ps                     # a second name for the same object
    ps3 = ps$clone(deep = TRUE)  # an independent copy
    print(ps) # ps2 and ps3 print the same at this point
    ## <ParamSet>
    ## Empty.

    ps$add(ParamLgl$new("a"))
    print(ps)
    ## <ParamSet>
    ##    id    class lower upper nlevels        default value
    ## 1:  a ParamLgl    NA    NA       2 <NoDefault[3]>
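R6 objects are environments under the hood, which is why plain assignment shares the object while clone() produces a separate one. A minimal base-R sketch of that behaviour follows. It uses a bare environment as a stand-in for a ParamSet; the ids field and the copy made with list2env() are illustrative assumptions, not paradox internals.

```r
# A bare environment standing in for an R6 object such as a ParamSet.
ps <- new.env()
ps$ids <- character(0)

ps2 <- ps                     # plain assignment: a second name for the same object
ps3 <- list2env(as.list(ps))  # explicit copy: a new, independent environment

ps$ids <- c(ps$ids, "a")      # modify the original only

identical(ps, ps2)  # TRUE: both names point at one object
ps2$ids             # the change is visible through ps2
identical(ps, ps3)  # FALSE: ps3 is a different object
ps3$ids             # still empty
```

The practical consequence is that after ps$add(...) the change also shows up in ps2, while ps3 stays empty.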
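Defining the parameter space
paradox provides the following parameter types:
ParamInt: integer
ParamDbl: floating point (decimal) number
ParamFct: factor (categorical)
ParamLgl: logical, TRUE / FALSE
ParamUty: untyped parameter that can take any value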
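The full syntax for defining parameters (earlier posts used the shorthand):

    library("paradox")
    # logical parameter
    parA = ParamLgl$new(id = "A")
    # integer in [0, 10], with two tags attached
    parB = ParamInt$new(id = "B", lower = 0, upper = 10, tags = c("tag1", "tag2"))
    # double in [0, 4]; NULL is accepted as a special value
    parC = ParamDbl$new(id = "C", lower = 0, upper = 4, special_vals = list(NULL))
    # categorical with three levels and default "y"
    parD = ParamFct$new(id = "D", levels = c("x", "y", "z"), default = "y")
    # untyped; the custom check requires the value to be a function
    parE = ParamUty$new(id = "E", custom_check = function(x) checkmate::checkFunction(x))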
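For comparison, the same five parameters can most likely be written with the ps() shorthand used in the earlier posts. This is a sketch, assuming an installed paradox version that exports ps(), p_lgl(), p_int(), p_dbl(), p_fct() and p_uty(); the variable name params is mine.

```r
library(paradox)

# Shorthand equivalent of parA ... parE, collected in one ParamSet.
params <- ps(
  A = p_lgl(),
  B = p_int(lower = 0, upper = 10, tags = c("tag1", "tag2")),
  C = p_dbl(lower = 0, upper = 4, special_vals = list(NULL)),
  D = p_fct(levels = c("x", "y", "z"), default = "y"),
  E = p_uty(custom_check = function(x) checkmate::checkFunction(x))
)

params$ids()                    # ids are taken from the argument names
params$check(list(B = 5L))      # within [0, 10]
params$check(list(B = 11L))     # above the upper bound: an error message, not TRUE
```

Here the argument names replace the id argument of the long form, and the bounds, levels and tags are passed the same way.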