Data Mining 2 Review Notes 6 - Optimization & Hyperparameter Tuning

6. Optimization & Hyperparameter Tuning

Why Hyperparameter Tuning?
There are many learning algorithms for classification, regression, … Many of them have hyperparameters: k and the distance function for k-nearest neighbors, splitting and pruning options in decision tree learning, …

But what is their effect?
Hard to tell in general and rules of thumb are rare.

Parameters vs. Hyperparameters
Parameters are learned during training
Typical examples: Coefficients in (linear) regression, Weights in neural networks, …
Training: find a set of parameters so that the objective function is minimized/maximized (on a holdout set)

Hyperparameters are fixed before training
Typical examples: Network layout and learning rate in neural networks, k in kNN, …
Training: find a set of parameters so that the objective function is minimized/maximized (on a holdout set), given a previously fixed set of hyperparameters

Hyperparameter Tuning – a Naive Approach

  1. run classification/regression algorithm
  2. look at the results (e.g., accuracy, RMSE, …)
  3. choose different parameter settings, go to 1

Questions: when to stop? how to select the next parameter setting to test?

Hyperparameter Tuning – Avoid Overfitting!
Recap overfitting: classifiers may over-adapt to the training data. The same holds for hyperparameter settings.
Possible danger: finding parameters that work well on the training set but not on the test set
Remedy: train / test / validation split
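A minimal sketch of such a three-way split with scikit-learn (the wine dataset, the split ratios, and tuning k for a kNN classifier are placeholder choices for illustration):

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_wine(return_X_y=True)

# Split off a test set first; it is only touched once, for the final estimate.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# Split the remainder into training and validation data.
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

best_k, best_acc = None, 0.0
for k in [1, 3, 5, 7, 9]:                                  # candidate hyperparameter values
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    acc = model.score(X_val, y_val)                        # compare settings on the validation set
    if acc > best_acc:
        best_k, best_acc = k, acc

final = KNeighborsClassifier(n_neighbors=best_k).fit(X_train, y_train)
print("chosen k:", best_k, "test accuracy:", final.score(X_test, y_test))
```

The validation set is used to compare hyperparameter settings; the untouched test set then gives an honest estimate for the finally chosen setting.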


6.1 Hyperparameter Tuning: Brute Force

Brute force: try all parameter combinations that exist. That quickly becomes infeasible → we need a better strategy than brute force!

Hyperparameter tuning is an optimization problem
Finding optimal values for N variables
Properties of the problem:

  • the underlying model is unknown, i.e., we do not know how changing a variable will influence the results
  • we can tell how good a solution is when we see it, i.e., by running a classifier with the given parameter set
  • but looking at each solution is costly

Related problem: feature subset selection
Given n features, brute force requires 2^n evaluations
e.g., for 20 features, that is already about one million evaluations → ten million model trainings with (10-fold) cross-validation

Knapsack problem
Given a maximum weight you can carry and a set of items, each with a weight and a monetary value, pack those items that maximize the total monetary value.

The problem is NP-hard – i.e., all known deterministic algorithms require an exponential amount of time
Note: feature subset selection for n features requires 2^n evaluations

Many optimization problems are NP-hard, e.g.:
  • routing problems (Traveling Salesman Problem)
  • integer factorization: hard enough to be used for cryptography
  • resource use optimization, e.g., minimizing cut-off waste
  • chip design: minimizing chip sizes

Properties of Brute Force search
guaranteed to find the best parameter setting, but too slow in most practical cases

6.1.1 Grid Search

Grid search performs a brute-force search with equal-width steps on non-discrete numerical hyperparameters
(e.g., 10, 20, 30, …, 100)
Problem: hyperparameters with a wide range (e.g., 0.0001 to 1,000,000)
with ten equal-width steps, the first step would already lie at about 100,000
but what if the optimum is around 0.1?
logarithmic steps (e.g., 0.0001, 0.001, 0.01, …) may perform better for such parameters
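A minimal sketch of a logarithmic grid with scikit-learn's GridSearchCV (the wine dataset, the SVM learner, and the value range for C are placeholder choices):

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)

# Ten logarithmically spaced candidates between 1e-4 and 1e6 instead of equal-width steps.
param_grid = {"C": np.logspace(-4, 6, num=10)}

search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)   # brute force over the grid
search.fit(X, y)
print("best C:", search.best_params_, "cross-validated accuracy:", search.best_score_)
```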

Needed:
solutions that take less time/computation and often find the best parameter setting or find a near-optimal parameter setting

6.2 Hyperparameter Tuning: One After Another

Given n parameters with m degrees of freedom (i.e., m possible values each) – brute force takes m^n runs of the base classifier

Simple tweak:

  1. start with default settings
  2. try all options for the first parameter
    2a. fix best setting for first parameter
  3. try all options for the second parameter
    3a. fix best setting for second parameter

This reduces the runtime to n*m
i.e., no longer exponential – but we may miss the best solution
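A minimal sketch of the one-after-another strategy (the decision tree learner, the parameter names, and the candidate values are placeholder choices; cross-validation stands in for the holdout evaluation):

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)

# Candidate values per hyperparameter, tuned one parameter after another.
candidates = {
    "max_depth": [2, 4, 6, 8, None],
    "min_samples_leaf": [1, 2, 5, 10],
    "criterion": ["gini", "entropy"],
}
settings = {"max_depth": None, "min_samples_leaf": 1, "criterion": "gini"}   # default settings

for name, values in candidates.items():                     # n parameters ...
    best_value, best_score = settings[name], -1.0
    for value in values:                                     # ... with m options each
        trial = dict(settings, **{name: value})
        score = cross_val_score(DecisionTreeClassifier(**trial), X, y, cv=5).mean()
        if score > best_score:
            best_value, best_score = value, score
    settings[name] = best_value                              # fix the best value, move on

print("chosen settings:", settings)
```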

6.2.1 Interaction Effects

Interaction effects make parameter tuning hard. i.e., changing one parameter may change the optimal settings for another one
Example: two parameters p and q, each with values 0, 1, and 2; a table (not reproduced here) lists the classification accuracy for each combination. If we optimize one parameter after the other (first p, then q), we end up at p=0, q=0 in six out of nine cases; on average, we investigate 2.3 solutions.
(0.5 = local optimum, 0.7 = global optimum)

6.3 Hill climbing with variations

6.3.1 Hill-Climbing Search (greedy local search)

“Like climbing Everest in thick fog with amnesia”: always move in the direction of the steepest ascent.

Problem: hill climbing can get stuck in local optima – where it ends up depends on the starting point (a code sketch follows the list of variations below).

6.3.2 Variations of Hill Climbing Search

  • Stochastic hill climbing
    random selection among the uphill moves
    the selection probability can vary with the steepness of the uphill move
  • First-choice hill climbing
    generating successors randomly until a better one is found, then pick
    that one
  • Random-restart hill climbing
    run hill climbing with different seeds
    tries to avoid getting stuck in local maxima
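A minimal sketch of random-restart hill climbing over a discrete grid of hyperparameter values (the decision tree learner, the grid, and the number of restarts are placeholder choices):

```python
import random
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
grid = {"max_depth": [2, 4, 6, 8, None], "min_samples_leaf": [1, 2, 5, 10]}

def score(setting):
    return cross_val_score(DecisionTreeClassifier(**setting), X, y, cv=5).mean()

def neighbors(setting):
    """All settings that differ from the current one in exactly one parameter by one grid step."""
    for name, values in grid.items():
        i = values.index(setting[name])
        for j in (i - 1, i + 1):
            if 0 <= j < len(values):
                yield dict(setting, **{name: values[j]})

best_setting, best_score = None, -1.0
for _ in range(5):                                           # random restarts
    current = {name: random.choice(values) for name, values in grid.items()}
    current_score = score(current)
    while True:                                              # greedy local search (steepest ascent)
        scored = [(score(n), n) for n in neighbors(current)]
        top_score, top = max(scored, key=lambda t: t[0])
        if top_score <= current_score:
            break                                            # local maximum reached
        current, current_score = top, top_score
    if current_score > best_score:
        best_setting, best_score = current, current_score

print("best setting:", best_setting, "accuracy:", best_score)
```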

6.4 Beam search

Local Beam Search
Keep track of k states rather than just one
Start with k randomly generated states
At each iteration, all the successors of all k states are generated
Select the k best successors from the complete list and repeat
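A minimal sketch of local beam search; it assumes the same kind of score and neighbors helpers as in the hill-climbing sketch above:

```python
import random

def beam_search(grid, score, neighbors, k=3, iterations=10):
    """Keep the k best hyperparameter settings instead of a single one."""
    # Start with k randomly generated settings.
    states = [{name: random.choice(values) for name, values in grid.items()} for _ in range(k)]
    beam = [(score(s), s) for s in states]
    for _ in range(iterations):
        # Generate the successors of all k states ...
        candidates = list(beam)
        for _, state in beam:
            candidates += [(score(n), n) for n in neighbors(state)]
        # ... and keep only the k best from the complete list.
        candidates.sort(key=lambda t: t[0], reverse=True)
        beam = candidates[:k]
    return beam[0]
```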

6.5 Random search

Grid Search vs. Random Search
All the approaches discussed so far use fixed grids.
Challenges: some hyperparameters are pretty sensitive (e.g., 0.02 is a good value, but 0 and 0.05 are not), while others have little influence – and it is hard to know upfront which is which.
Grid search may therefore easily miss the best parameter values; random search, which samples hyperparameter values at random instead of from a fixed grid, often yields better results.
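A minimal random-search sketch with scikit-learn's RandomizedSearchCV (the wine dataset, the SVM learner, and the log-uniform distribution for C are placeholder choices):

```python
from scipy.stats import loguniform
from sklearn.datasets import load_wine
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)

# Sample C from a log-uniform distribution instead of pinning it to a fixed grid.
distributions = {"C": loguniform(1e-4, 1e6)}

search = RandomizedSearchCV(SVC(kernel="rbf"), distributions, n_iter=20, cv=5, random_state=0)
search.fit(X, y)
print("best C:", search.best_params_, "cross-validated accuracy:", search.best_score_)
```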

6.6 Genetic Programming

Genetic algorithms are inspired by evolution:
  • use a population of individuals (solutions)
  • create new individuals by crossover
  • introduce random mutations
  • from each generation, keep only the best solutions (“survival of the fittest”)
Standard Genetic Algorithm (SGA)

6.6.1 SGA

Basic ingredients:

  • individuals: the solutions
    hyperparameter tuning: a hyperparameter setting
  • a fitness function
    hyperparameter tuning: performance of a hyperparameter setting (i.e., run learner with those parameters)
  • a crossover method (see the sketch below)
    hyperparameter tuning: create a new setting from two others
  • a mutation method
    hyperparameter tuning: change one parameter
  • survivor selection
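As an illustration of the crossover and mutation ingredients, here is a minimal sketch for hyperparameter settings represented as dictionaries (the parameter names and value ranges are hypothetical); the fitness of a setting would simply be the validation performance of the learner run with it:

```python
import random

grid = {"max_depth": [2, 4, 6, 8, None],
        "min_samples_leaf": [1, 2, 5, 10],
        "criterion": ["gini", "entropy"]}

def crossover(parent_a, parent_b):
    """Create a new setting by taking each hyperparameter value from one of the two parents."""
    return {name: random.choice([parent_a[name], parent_b[name]]) for name in grid}

def mutate(setting, rate=0.2):
    """Randomly re-draw some hyperparameter values from their range."""
    return {name: (random.choice(values) if random.random() < rate else setting[name])
            for name, values in grid.items()}
```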


Crossover OR Mutation?
Decades-long debate: which one is better / necessary …
Answer (at least, rather wide agreement): it depends on the problem, but
in general, it is good to have both – each plays a different role
a mutation-only EA is possible, a crossover-only EA would not work

Exploration: Discovering promising areas in the search space, i.e. gaining information on the problem
Exploitation: Optimising within a promising area, i.e. using information

There is co-operation AND competition between them
Crossover is explorative: it makes a big jump to an area somewhere “in between” two (parent) areas.
Mutation is exploitative: it creates small random diversions, thereby staying near (in the area of) the parent.

Crossover OR Mutation?

Only crossover can combine information from two parents
Remember: sample from entire value range
Only mutation can introduce new information (alleles)
To hit the optimum you often need a ‘lucky’ mutation

6.6.2 Genetic Feature Subset Selection

Feature Subset Selection can also be solved by Genetic Programming
Individuals: feature subsets
Representation: binary – 1 = feature is included, 0 = feature is not included
Fitness: classification performance
Crossover: combine selections of two subsets
Mutation: flip bits
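A minimal sketch of this binary-encoded genetic search (the wine dataset, the kNN learner used as fitness, and the population size, generation count, and mutation rate are placeholder choices):

```python
import random
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_wine(return_X_y=True)
n_features = X.shape[1]

def fitness(mask):
    """Classification performance using only the selected features."""
    if not any(mask):
        return 0.0
    return cross_val_score(KNeighborsClassifier(), X[:, np.array(mask, dtype=bool)], y, cv=5).mean()

def crossover(a, b):
    """Single-point crossover: combine the selections of two subsets."""
    point = random.randrange(1, n_features)
    return a[:point] + b[point:]

def mutate(mask, rate=0.05):
    """Flip each bit with a small probability."""
    return [bit ^ 1 if random.random() < rate else bit for bit in mask]

population = [[random.randint(0, 1) for _ in range(n_features)] for _ in range(20)]
for generation in range(10):
    ranked = sorted(population, key=fitness, reverse=True)
    survivors = ranked[:10]                                   # survival of the fittest
    children = [mutate(crossover(*random.sample(survivors, 2))) for _ in range(10)]
    population = survivors + children

best = max(population, key=fitness)
print("selected features:", [i for i, bit in enumerate(best) if bit], "accuracy:", fitness(best))
```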

6.6.3 Selecting a Learner by Meta Learning

So far, we have looked at finding good parameters for a learner – the learner was always fixed
A similar problem is selecting a learner for the task at hand
Again, we could go with search. Another approach is meta learning

Meta learning, i.e., learning about learning.
Goal: learn how well a learner will perform on a given dataset. Features: dataset characteristics and the learning algorithm;
prediction target: accuracy, RMSE, …

Also known as AutoML
Basic idea: train a regression model

  • data points: individual datasets plus ML approach
  • target: expected accuracy/RMSE etc.

Examples for features: number of instances/attributes, fraction of nominal/numerical attributes, min/max/average entropy of attributes, skewness of classes, …
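A minimal sketch of the regression idea, assuming a small, purely hypothetical meta-dataset of (dataset characteristics, learner, observed accuracy) rows; the meta-features follow the examples above, and the random forest meta-model is a placeholder choice:

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical meta-dataset: one row per (dataset, learner) combination, accuracy observed earlier.
meta = pd.DataFrame([
    [150,    4, 0.00, 0.00, "knn",  0.95],
    [1000,  20, 0.50, 0.30, "knn",  0.71],
    [1000,  20, 0.50, 0.30, "tree", 0.78],
    [5000,  60, 0.10, 0.60, "tree", 0.83],
], columns=["n_instances", "n_attributes", "frac_nominal", "class_skew", "learner", "accuracy"])

X_meta = pd.get_dummies(meta.drop(columns="accuracy"), columns=["learner"])
y_meta = meta["accuracy"]
model = RandomForestRegressor(random_state=0).fit(X_meta, y_meta)

# Predict the expected accuracy of each learner on a new, unseen dataset.
new = pd.DataFrame([[2000, 30, 0.2, 0.4, "knn"],
                    [2000, 30, 0.2, 0.4, "tree"]],
                   columns=["n_instances", "n_attributes", "frac_nominal", "class_skew", "learner"])
X_new = pd.get_dummies(new, columns=["learner"]).reindex(columns=X_meta.columns, fill_value=0)
print(model.predict(X_new))
```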


Recap: search heuristics are good for problems where finding an optimal solution is difficult, evaluating a solution candidate is easy, and the search space of possible solutions is large.
Possible solution: genetic programming

We have encountered such problems quite frequently
Example: learning an optimal decision tree from data

6.6.4 Genetic Decision Tree Learning

Population: candidate decision trees (initialization: e.g., trained on small subsets of data)
Create new decision trees by means of crossover & mutation
Fitness function: e.g., accuracy
Crossover swaps subtrees between two candidate trees; the swap can happen at any level of the trees, chosen randomly.
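A minimal sketch of subtree crossover on hand-built tree structures (the Node class and the random choice of crossover points are illustrative assumptions, not the lecture's implementation):

```python
import copy
import random

class Node:
    """A decision-tree node: either an internal test (feature, threshold) or a leaf (label)."""
    def __init__(self, feature=None, threshold=None, left=None, right=None, label=None):
        self.feature, self.threshold = feature, threshold
        self.left, self.right = left, right
        self.label = label

def subtrees(node):
    """All nodes of a tree – each one is a possible crossover point."""
    yield node
    if node.label is None:                                    # internal node: descend into children
        yield from subtrees(node.left)
        yield from subtrees(node.right)

def crossover(tree_a, tree_b):
    """Replace a random subtree of a copy of tree_a with a random subtree of tree_b."""
    child = copy.deepcopy(tree_a)
    target = random.choice(list(subtrees(child)))             # swap point can be at any level
    donor = copy.deepcopy(random.choice(list(subtrees(tree_b))))
    target.__dict__.update(donor.__dict__)                    # overwrite the node in place
    return child

# Two tiny hand-built parent trees.
a = Node(feature=0, threshold=2.5, left=Node(label="yes"), right=Node(label="no"))
b = Node(feature=1, threshold=7.0, left=Node(label="no"),
         right=Node(feature=2, threshold=1.0, left=Node(label="yes"), right=Node(label="no")))
child = crossover(a, b)                                       # a's structure with one subtree from b
```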

6.6.5 Combination of GP with other Learning Methods

Rule Learning (“Learning Classifier Systems”)
Population: set of rule sets (!)
Crossover: combining rules from two sets
Mutation: changing a rule

Artificial Neural Networks
Easiest solution: fixed network layout
The network is then represented as an ordered set (vector) of weights
e.g., [0.8, 0.2, 0.5, 0.1, 0.1, 0.2]
Crossover and mutation are then straightforward
Variant: AutoMLP - Searches for best combination of hidden layers and learning rate
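A minimal sketch of crossover and mutation on such weight vectors (one-point crossover and Gaussian mutation are assumed here for illustration):

```python
import random

def crossover(weights_a, weights_b):
    """One-point crossover of two weight vectors of equal length."""
    point = random.randrange(1, len(weights_a))
    return weights_a[:point] + weights_b[point:]

def mutate(weights, rate=0.1, scale=0.05):
    """Add small Gaussian noise to a few randomly chosen weights."""
    return [w + random.gauss(0, scale) if random.random() < rate else w for w in weights]

child = mutate(crossover([0.8, 0.2, 0.5, 0.1, 0.1, 0.2], [0.3, 0.9, 0.4, 0.6, 0.2, 0.7]))
```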


6.7 Hyperparameter learning

Hyperparameter tuning as a learning problem: given a set of hyperparameters H, predict the performance p of the model. The prediction model is referred to as a surrogate model or oracle.
Rationale:
Training and evaluating an actual model is costly
Learning and predicting with the surrogate model is fast


Note:
we want to use only few runs of the actual model, i.e., the surrogate model will have few training points → use a simple model.
Most well-known approach: Bayesian optimization
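A minimal sketch of such a surrogate-model loop, assuming a Gaussian process as the surrogate and a simple optimistic (upper-confidence-bound) rule for choosing the next hyperparameter value – a simplified stand-in for full Bayesian optimization:

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)

def evaluate(log_c):
    """Costly objective: actual model performance for hyperparameter C = 10**log_c."""
    return cross_val_score(SVC(C=10.0 ** log_c), X, y, cv=5).mean()

candidates = np.linspace(-4, 6, 200).reshape(-1, 1)          # search space for log10(C)
tried = [[-4.0], [1.0], [6.0]]                               # a few initial runs of the real model
scores = [evaluate(c[0]) for c in tried]

for _ in range(10):
    surrogate = GaussianProcessRegressor().fit(tried, scores)        # cheap surrogate model
    mean, std = surrogate.predict(candidates, return_std=True)
    nxt = candidates[np.argmax(mean + std)]                           # most promising candidate
    tried.append(list(nxt))
    scores.append(evaluate(nxt[0]))                                   # one more costly run

best = tried[int(np.argmax(scores))]
print("best C:", 10.0 ** best[0], "accuracy:", max(scores))
```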

Summary: Grid Search, Random Search, Learning hyperparameters / Bayesian optimization

Grid search
Inefficient
A fixed grid may miss good parameter values (and smaller step sizes would make the search even less efficient!)

Random search
Often finds good solutions in less time

Learning hyperparameters / Bayesian optimization
Successively tests hyperparameters close to local optima
Similar to hill climbing
Difference: explicit surrogate model

6.8 Summary

Hyperparameter tuning is itself an optimization problem: we do not know upfront how a hyperparameter influences the result, evaluating one setting means running the learner, and the space of possible settings is large. The strategies covered in this section are brute force / grid search, tuning one parameter after another, hill climbing and beam search, random search, genetic algorithms (also applicable to feature subset selection and to learning model structures such as decision trees and neural networks), meta learning, and surrogate-based hyperparameter learning (Bayesian optimization).
