The Aha! Moments in 4 Popular Machine Learning Algorithms


Most people fall into one of two camps:

  • I don’t understand these machine learning algorithms.
  • I understand how the algorithms work, but not why they work.

This article seeks to explain not only how these algorithms work, but also to give an intuitive understanding of why they work, to deliver that lightbulb aha! moment.

Decision Trees

Decision Trees divide the feature space using horizontal and vertical lines. For example, consider a very simplistic Decision Tree below, which has one conditional node and two class nodes, indicating a condition and under which category a training point that satisfies it will fall into.



Note that there is a lot of overlap between the regions marked as each color and the data points within them that actually are that color; this mismatch is (roughly) the entropy. The decision tree is constructed to minimize the entropy. In this scenario, we can add an additional layer of complexity: if we add another condition, x less than 6 and y greater than 6, we can designate points in that area as red. This move lowers the entropy.
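The two-condition tree described above can be sketched as nested conditionals. This is a minimal illustration only: the first condition and the labels of the remaining regions are assumptions, since the article specifies just the x < 6, y > 6 region as red.

```python
def classify(x, y):
    """A hypothetical two-condition decision tree: points with x < 6 and
    y > 6 fall into the 'red' region; other regions are assumed 'blue'."""
    if x < 6:
        if y > 6:
            return 'red'
        return 'blue'  # assumed label for x < 6, y <= 6
    return 'blue'      # assumed label for x >= 6

print(classify(3, 8))  # 'red'
print(classify(3, 2))  # 'blue'
```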


At each step, the Decision Tree algorithm attempts to build the tree such that the entropy is minimized. Think of entropy more formally as the amount of ‘disorder’ or ‘confusion’ a certain divider (the conditions) leaves behind, and its opposite as ‘information gain’: how much a divider adds information and insight to the model. Feature splits with the highest information gain (and the lowest entropy) are placed at the top.
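Entropy and information gain can be computed in a few lines. This is a minimal sketch using Shannon entropy over class labels, the standard formulation in tree algorithms like ID3/C4.5; the toy labels are made up for illustration:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a collection of class labels."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())

def information_gain(parent, left, right):
    """Entropy reduction achieved by splitting `parent` into two children."""
    n = len(parent)
    children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - children

parent = ['red'] * 4 + ['blue'] * 4

# A clean split separates the classes perfectly: maximum information gain.
clean = information_gain(parent, ['red'] * 4, ['blue'] * 4)   # 1.0

# A split that leaves both children half-and-half gains nothing.
mixed = information_gain(parent, ['red', 'red', 'blue', 'blue'],
                         ['red', 'red', 'blue', 'blue'])      # 0.0
```

The tree-building algorithm evaluates candidate splits like these and keeps the one with the highest gain.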


The conditions may split their one-dimensional features somewhat like this:



Note that condition 1 has clean separation, and therefore low entropy and high information gain. The same cannot be said for condition 3, which is why it is placed near the bottom of the Decision Tree. This construction of the tree ensures that it can remain as lightweight as possible.


You can read more about entropy and its use in Decision Trees as well as neural networks (cross-entropy as a loss function) here.


Random Forest

Random Forest is a bagged (bootstrap aggregated) version of the Decision Tree. The primary idea is that several Decision Trees are each trained on a subset of data. Then, an input is passed through each model, and their outputs are aggregated through a function like a mean to produce a final output. Bagging is a form of ensemble learning.

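The bootstrap-and-aggregate loop can be sketched in a few lines. The ‘models’ below are deliberately trivial placeholders (each just memorises the majority class of its bootstrap sample); a real Random Forest would train a full decision tree on each sample instead.

```python
import random

def bootstrap_sample(data, rng):
    """Draw a sample of the same size as `data`, with replacement."""
    return [rng.choice(data) for _ in data]

def majority_vote(predictions):
    """Aggregate outputs by taking the most common one."""
    return max(set(predictions), key=predictions.count)

rng = random.Random(0)
data = ['red'] * 7 + ['blue'] * 3

# Each trivial 'tree' votes with the majority class of its own bootstrap sample.
votes = [majority_vote(bootstrap_sample(data, rng)) for _ in range(25)]

# The bagged prediction aggregates the individual votes.
prediction = majority_vote(votes)
```

For regression, the aggregation function would be a mean over the trees’ outputs rather than a vote.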


There are many analogies for why Random Forest works well. Here is a common one:

You need to decide which restaurant to go to next. To ask someone for their recommendation, you must answer a variety of yes/no questions, which will lead them to make their decision for which restaurant you should go to.


Would you rather only ask one friend or ask several friends, then find the mode or general consensus?


Unless you only have one friend, most people would choose the second. The insight this analogy provides is that each tree has some sort of ‘diversity of thought’, because each was trained on different data and hence has different ‘experiences’.

This analogy, clean and simple as it is, never really stood out to me. In the real world, the single friend has less experience than all the friends combined, but in machine learning, the decision tree and random forest models are trained on the same data, and hence have the same experiences. The ensemble model is not actually receiving any new information. If I could ask one all-knowing friend for a recommendation, I would see no objection to that.

How can a model trained on the same data that randomly pulls subsets of the data to simulate artificial ‘diversity’ perform better than one trained on the data as a whole?


Take a sine wave with heavy normally distributed noise. This is your single Decision Tree classifier, which is naturally a very high-variance model.



Suppose 100 ‘approximators’ are chosen. These approximators randomly select points along the noisy sine wave and generate a sinusoidal fit, much like decision trees being trained on subsets of the data. These fits are then averaged to form a bagged curve. The result is a much smoother curve.
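The smoothing effect can be checked numerically. As a simplified stand-in for the article's curve-fitting experiment, each ‘approximator’ below returns the true sine value plus heavy Gaussian noise, and averaging 100 of them at each point recovers a far closer estimate.

```python
import math
import random

rng = random.Random(42)

def noisy_model(x):
    """One high-variance 'approximator': the true sine plus heavy noise."""
    return math.sin(x) + rng.gauss(0, 0.5)

def bagged_model(x, n_models=100):
    """Average the predictions of many independent noisy approximators."""
    return sum(noisy_model(x) for _ in range(n_models)) / n_models

# Averaging shrinks the noise: the standard deviation of the mean of
# 100 draws is 0.5 / sqrt(100) = 0.05, a tenth of a single model's.
xs = [i * 0.1 for i in range(60)]
avg_error = sum(abs(bagged_model(x) - math.sin(x)) for x in xs) / len(xs)
```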


The reason bagging works is that it reduces the variance of models and improves their ability to generalize by artificially making the model more ‘confident’. This is also why bagging does not work as well on already low-variance models like logistic regression.

You can read more about the intuition and more rigorous proof of the success of bagging here.


Support Vector Machines

Support Vector Machines attempt to find a hyperplane that can divide the data best, relying on the concept of ‘support vectors’ to maximize the divide between the two classes.



Unfortunately, most datasets are not so easily separable, and if they were, SVM would likely not be the best algorithm to handle them. Consider this one-dimensional separation task: there is no good divider, since any single split will lump points from two separate classes together.

One proposal for a split.

SVM solves these kinds of problems using a so-called ‘kernel trick’, which projects data into new dimensions to make the separation task easier. For instance, let’s create a new dimension, simply defined as x² (where x is the original dimension):


Now that the data has been projected onto the new dimension (each data point represented in two dimensions as (x, x²)), it is cleanly separable.
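The projection can be verified directly. The two classes below are hypothetical points chosen to match the picture: inseparable on the number line, but separable by a horizontal line once each point x is mapped to (x, x²).

```python
# Class A sits on the outside, class B in the middle: no single 1-D
# threshold can separate them.
class_a = [-3, -2, 2, 3]
class_b = [-1, 0, 1]

def project(x):
    """Map a 1-D point into 2-D feature space as (x, x**2)."""
    return (x, x ** 2)

# In the projected space, the horizontal line x**2 = 2 separates the
# classes perfectly.
separable = (all(project(x)[1] > 2 for x in class_a) and
             all(project(x)[1] < 2 for x in class_b))
print(separable)  # True
```

A kernel lets SVM work in such a space implicitly, without ever materialising the projected coordinates.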

Using a variety of kernels (most popularly, polynomial, sigmoid, and RBF kernels), the kernel trick does the heavy lifting of creating a transformed space in which the separation task is simple.

Neural Networks

Neural Networks are the pinnacle of machine learning. Their discovery, and the unlimited variations and improvements that can be made upon them, have warranted them a field of their own: deep learning. Admittedly, our understanding of why neural networks succeed is still incomplete (“neural networks are matrix multiplications that no one understands”), but the easiest way to explain them is through the Universal Approximation Theorem (UAT).

At their core, all supervised algorithms seek to model some underlying function of the data; usually this is either a regression plane or the feature boundary. Consider a function y = f(x), which can be modelled to arbitrary accuracy with several horizontal steps.


This is essentially what a neural network can do. Perhaps it can be a little more complex and model relationships beyond horizontal steps (like the quadratic and linear segments below), but at its core, the neural network is a piecewise function approximator.
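A minimal sketch of this idea, using y = x² as an assumed example function: approximate it with horizontal steps, where adding more steps, much like adding more neurons, shrinks the worst-case error.

```python
def step_approximation(f, x, n_steps):
    """Approximate f on [0, 1) with n_steps horizontal steps, each holding
    the value of f at the left edge of its interval."""
    width = 1.0 / n_steps
    step_index = min(int(x / width), n_steps - 1)
    return f(step_index * width)

def f(x):
    return x ** 2

# Worst-case error over a fine grid, for a coarse and a fine step count.
xs = [i / 1000 for i in range(1000)]
coarse_error = max(abs(step_approximation(f, x, 5) - f(x)) for x in xs)
fine_error = max(abs(step_approximation(f, x, 100) - f(x)) for x in xs)
# More steps give a tighter fit, mirroring how more hidden units
# refine the pieces of the approximation.
```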


Each node is delegated to one part of the piecewise function, and the purpose of the network is to activate certain neurons responsible for parts of the feature space. For instance, if one were to classify images of men with or without beards, several nodes should be delegated specifically to pixel locations where beards often appear. Somewhere in multi-dimensional space, these nodes represent a numerical range.

Note, again, that the question “why do neural networks work” is still unanswered. The UAT doesn’t answer this question, but states that neural networks, under certain human interpretations, can model any function. The field of Explainable/Interpretable AI is emerging to answer these questions with methods like activation maximization and sensitivity analysis.


You can read a more in-depth explanation and view visualizations of the Universal Approximation Theorem here.


All four algorithms, and many others, look very simplistic in low dimensions. A key realization in machine learning is that a lot of the ‘magic’ and ‘intelligence’ we purport to see in AI is really a simple algorithm hidden under the guise of high dimensionality.

Decision trees splitting regions into squares is simple, but decision trees splitting high-dimensional space into hypercubes is less so. SVM performing a kernel trick to improve separability from one to two dimensions is understandable, but SVM doing the same thing on a dataset with hundreds of dimensions is almost magic.

Our admiration and confusion of machine learning is predicated on our lack of understanding of high-dimensional spaces. Learning how to get around high dimensionality and to understand algorithms in their native space is instrumental to an intuitive understanding.

All images created by author.


Translated from: https://towardsdatascience.com/the-aha-moments-in-4-popular-machine-learning-algorithms-f7e75ef5b317
