Regression Analysis

Machine learning algorithms differ from the algorithms most of us are used to, because they are usually described with a combination of fairly complex statistics and mathematics. Since it is important to understand the background of any algorithm you want to implement, this can be a challenge for people without a mathematical background: the maths slows you down and can sap your motivation.

In this article, we discuss linear and logistic regression and mention some related regression techniques, assuming that we have all heard of, or even learnt about, the linear model in high-school mathematics. Hopefully, by the end of the article, the concepts will be clearer.

Regression analysis is a statistical process for estimating the relationship between a dependent variable (say Y) and one or more independent variables, or predictors (X). It explains how the dependent variable changes in response to changes in selected predictors. Its major uses include determining the strength of predictors, forecasting an effect, and forecasting trends. It identifies significant relationships between variables and the impact of the predictors on the dependent variable. In regression, we fit a curve or line (the regression line, or line of best fit) to the data points so that the overall distance between the data points and the curve or line is minimized.

Linear Regression

Linear regression is the simplest and most widely known regression technique. It establishes a relationship between the dependent variable (Y) and one or more independent variables (X) using a regression line. The line is usually fitted with the Ordinary Least Squares (OLS) method: OLS calculates the best-fit line for the observed data by minimizing the sum of the squares of the vertical deviations from each data point to the line. Because the deviations are squared before they are added, positive and negative values do not cancel out. The model is represented by the equation:

Y = a + b*X + e, where a is the intercept, b is the slope of the line, and e is the error term.
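
To make the least-squares idea concrete, here is a minimal sketch for the single-predictor case. It uses the standard closed-form solution (slope = sum of cross-deviations divided by the sum of squared X-deviations). The function name `fit_simple_ols` and the data points are made up for illustration and are not from any library.

```python
def fit_simple_ols(xs, ys):
    """Return (a, b) for Y = a + b*X that minimise the sum of squared vertical deviations."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of cross-deviations and sum of squared X-deviations.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b = sxy / sxx              # slope
    a = y_mean - b * x_mean    # intercept: the line passes through the point of means
    return a, b


# Made-up data lying close to Y = 1 + 2*X.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

a, b = fit_simple_ols(xs, ys)
# The residuals play the role of the error term e in the equation.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

print(f"intercept a = {a:.3f}, slope b = {b:.3f}")
print("residuals:", [round(r, 3) for r in residuals])
```

With an intercept in the model, the residuals of an OLS fit sum to zero (up to rounding), which is a quick sanity check on the result.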

OLS makes several assumptions:

  1. Linearity: The relationship between X and the mean of Y is linear.

  2. Normality: The errors (residuals) follow a normal distribution.

  3. Homoscedasticity: The variance of the residuals is the same for any value of X (constant variance of errors).

  4. No endogeneity of regressors: The independent variables must not be correlated with the errors.

  5. No autocorrelation: The errors are assumed to be uncorrelated with one another and randomly scattered around the regression line.

  6. Independence/No multicollinearity: The independent variables should not be highly correlated with one another. Multicollinearity is observed when two or more of them have a high correlation.
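
One simple way to look for multicollinearity is to compute the correlation between pairs of predictors. The sketch below computes the Pearson correlation coefficient by hand. The helper `pearson_r` and the three predictors are hypothetical, and pairwise correlation is only a first check, not a complete diagnostic.

```python
import math


def pearson_r(u, v):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(u)
    u_mean = sum(u) / n
    v_mean = sum(v) / n
    cov = sum((a - u_mean) * (b - v_mean) for a, b in zip(u, v))
    var_u = sum((a - u_mean) ** 2 for a in u)
    var_v = sum((b - v_mean) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


# Hypothetical predictors: x2 is almost a multiple of x1, x3 is nearly unrelated to x1.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
x3 = [5.0, 1.0, 4.0, 2.0, 6.0, 3.0]

r12 = pearson_r(x1, x2)
r13 = pearson_r(x1, x3)
print(f"corr(x1, x2) = {r12:.3f}")  # close to 1: these two carry almost the same information
print(f"corr(x1, x3) = {r13:.3f}")  # close to 0: no sign of collinearity
```

A correlation close to +1 or -1 between two predictors suggests that one of them adds little information and may make the estimated coefficients unstable.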

Linear regression can be simple or multiple: simple linear regression has exactly one independent variable, whereas multiple linear regression has more than one.

We can evaluate the performance of the model using the R-squared metric, which is the proportion of the variance in Y that the model explains.
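
As a rough illustration, R-squared can be computed as 1 minus the ratio of the residual sum of squares to the total sum of squares. In the sketch below, the function name `r_squared`, the data, and the coefficients 1.09 and 1.97 (assumed to come from a least-squares fit of this same made-up data) are all illustrative.

```python
def r_squared(ys, predictions):
    """R-squared = 1 - SS_res / SS_tot."""
    y_mean = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))  # unexplained variation
    ss_tot = sum((y - y_mean) ** 2 for y in ys)                  # total variation
    return 1.0 - ss_res / ss_tot


# Made-up observations.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

# Predictions from an assumed fitted line Y = 1.09 + 1.97*X.
line_predictions = [1.09 + 1.97 * x for x in xs]
# Baseline that ignores X and always predicts the mean of Y.
mean_predictions = [sum(ys) / len(ys)] * len(ys)

r2_line = r_squared(ys, line_predictions)
r2_mean = r_squared(ys, mean_predictions)
print(f"R-squared of the fitted line:   {r2_line:.4f}")
print(f"R-squared of the mean baseline: {r2_mean:.4f}")
```

A model that always predicts the mean scores 0, and a perfect fit scores 1, so values close to 1 indicate that the line explains most of the variation in Y.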

Logistic Regression

Using linear regression, we can predict the price a customer will pay if they buy. With logistic regression, we can make a more fundamental decision: “will the customer buy at all?”

Here, the target shifts from numerical to categorical. Logistic regression is used for classification problems, that is, for prediction where the targets are categorical variables. It can handle various types of relationships between the independent variables and Y because it applies a non-linear log transformation to the predicted odds.

odds = p / (1 - p)

ln(odds) = ln(p / (1 - p))

logit(p) = ln(p / (1 - p)) = b0 + b1X1 + b2X2 + b3X3 + ... + bkXk

where p is the probability of event success and (1-p) is the probability of event failure.
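
The following sketch evaluates these quantities for a few probabilities and then inverts the logit to recover a probability from a linear predictor. The function names and the coefficients `b0` and `b1` are hypothetical, chosen only for illustration.

```python
import math


def odds(p):
    """Odds of success for a probability p strictly between 0 and 1."""
    return p / (1.0 - p)


def logit(p):
    """Natural log of the odds."""
    return math.log(odds(p))


def inverse_logit(z):
    """Inverse of the logit (the logistic function): turns log-odds back into a probability."""
    return 1.0 / (1.0 + math.exp(-z))


for p in (0.1, 0.5, 0.9):
    print(f"p = {p}: odds = {odds(p):.3f}, logit = {logit(p):.3f}")

# Hypothetical coefficients for a model with a single predictor X1.
b0, b1 = -3.0, 1.2
x1 = 4.0
log_odds = b0 + b1 * x1             # right-hand side of the logit equation
probability = inverse_logit(log_odds)
print(f"log-odds = {log_odds:.2f}, predicted probability = {probability:.3f}")
```

Even odds (p = 0.5) correspond to a logit of exactly 0, and probabilities above 0.5 give positive log-odds.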

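The logit function maps a probability between 0 and 1 to any real value; its inverse, the logistic (sigmoid) function, maps any real value back to a probability between 0 and 1. The parameters in the equation above are chosen to maximize the likelihood of observing the sample values, rather than to minimize the sum of squared errors.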
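
To show what maximizing the likelihood can look like in practice, here is a minimal sketch that fits a one-predictor logistic model by gradient ascent on the log-likelihood. The data, function names, learning rate, and step count are all assumptions made for illustration; real libraries typically use more sophisticated optimizers.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(b0, b1, xs, ys):
    """Log-likelihood of binary labels ys under the model p = sigmoid(b0 + b1*x)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def fit_logistic(xs, ys, learning_rate=0.1, steps=5000):
    """Approximate the maximum-likelihood coefficients by gradient ascent."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # The gradient of the log-likelihood is the sum of (y - p) times each input.
        errors = [y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys)]
        b0 += learning_rate * sum(errors) / n
        b1 += learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
    return b0, b1


# Made-up data: x might be hours spent browsing, y = 1 if the customer bought.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
ys = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(xs, ys)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
print(f"log-likelihood at start: {log_likelihood(0.0, 0.0, xs, ys):.3f}")
print(f"log-likelihood after fit: {log_likelihood(b0, b1, xs, ys):.3f}")
print(f"P(buy | x = 5.0) = {sigmoid(b0 + b1 * 5.0):.3f}")
```

The classes in this data overlap on purpose: if they were perfectly separable, the likelihood would keep increasing as the coefficients grew without bound.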

Conclusion

I would encourage you to read further to get a more solid understanding. Several techniques are used to fit regression models and make them more robust, including regularization/penalization methods (Lasso, Ridge and ElasticNet), gradient descent, and stepwise regression.
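
As a small taste of regularization, the sketch below shows Ridge for a single predictor: a penalty proportional to the squared slope is added to the least-squares objective, which shrinks the slope toward zero. The function name `fit_ridge_1d`, the data, and the penalty values are made up, and the intercept is left unpenalized, which is a common convention assumed here.

```python
def fit_ridge_1d(xs, ys, penalty):
    """Minimise sum((y - a - b*x)^2) + penalty * b^2 for a single predictor."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b = sxy / (sxx + penalty)   # the penalty inflates the denominator and shrinks the slope
    a = y_mean - b * x_mean     # the intercept is not penalised
    return a, b


# Made-up data lying close to Y = 1 + 2*X.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

slopes = {}
for penalty in (0.0, 1.0, 10.0):
    a, b = fit_ridge_1d(xs, ys, penalty)
    slopes[penalty] = b
    print(f"penalty = {penalty:>4}: intercept = {a:.3f}, slope = {b:.3f}")
```

A penalty of 0 reproduces ordinary least squares, and larger penalties trade a little bias for smaller, more stable coefficients.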

Note that these are techniques, not types of regression, even though many online articles present them as types. Below are links to articles that I found helpful in explaining some of the concepts, which you can use for further reading. Happy learning!

https://medium.com/datadriveninvestor/regression-in-machine-learning-296caae933ec

https://machinelearningmastery.com/linear-regression-for-machine-learning/

https://www.geeksforgeeks.org/ml-linear-regression/

https://www.geeksforgeeks.org/types-of-regression-techniques/

https://www.vebuso.com/2020/02/linear-to-logistic-regression-explained-step-by-step/

https://www.statisticssolutions.com/what-is-logistic-regression/

https://www.listendata.com/2014/11/difference-between-linear-regression.html#:~:text=Purpose%20%3A%20Linear%20regression%20is%20used,the%20probability%20of%20an%20event.

https://www.kaggle.com/residentmario/l1-norms-versus-l2-norms

Translated from: https://medium.com/analytics-vidhya/regression-15cfaffe805a
