Regression Analysis
Machine learning algorithms are not the regular algorithms we may be used to, because they are often described by a combination of complex statistics and mathematics. Since it is very important to understand the background of any algorithm you want to implement, this can pose a challenge to people from a non-mathematical background: the maths can sap your motivation by slowing you down.
In this article, we will discuss linear and logistic regression and some regression techniques, assuming we have all heard of, or even learnt about, the linear model in a high-school mathematics class. Hopefully, by the end of the article, the concepts will be clearer.
Regression Analysis is a statistical process for estimating the relationships between a dependent variable (say Y) and one or more independent variables, or predictors (X). It explains the changes in the dependent variable with respect to changes in selected predictors. Major uses of regression analysis include determining the strength of predictors, forecasting an effect, and trend forecasting. It finds the significant relationships between variables and the impact of the predictors on the dependent variable. In regression, we fit a curve/line (the regression or best-fit line) to the data points such that the distances of the data points from the curve/line are minimized.
Linear Regression
It is the simplest and most widely known regression technique. Linear regression establishes a relationship between the dependent variable (Y) and one or more independent variables (X) using a regression line. This is done by the Ordinary Least-Squares (OLS) method: OLS calculates the best-fit line for the observed data by minimizing the sum of the squares of the vertical deviations from each data point to the line. Because the deviations are squared before they are added, positive and negative values do not cancel out. The line is represented by the equation:
Y = a + b*X + e, where a is the intercept, b is the slope of the line, and e is the error term.
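As a sketch, the equation above can be fitted by OLS in a few lines of NumPy; the toy data here (a hypothetical true intercept of 2 and slope of 3) is invented for illustration:

```python
import numpy as np

# Invented toy data following Y = a + b*X + e,
# with a hypothetical true intercept of 2.0 and slope of 3.0
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=50)
e = rng.normal(0, 1, size=50)              # error term
Y = 2.0 + 3.0 * X + e

# OLS closed form: stack a column of ones for the intercept,
# then solve for [a, b] by least squares
A = np.column_stack([np.ones_like(X), X])
(a_hat, b_hat), *_ = np.linalg.lstsq(A, Y, rcond=None)

print(f"intercept a ~ {a_hat:.2f}, slope b ~ {b_hat:.2f}")
```

With 50 noisy points, the estimates land close to the assumed true values.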
OLS rests on several assumptions:
Linearity: the relationship between X and the mean of Y is linear.
Normality: the errors (residuals) follow a normal distribution.
Homoscedasticity: the variance of the residuals is the same for any value of X (constant error variance).
No endogeneity of regressors: the independent variables must not be correlated with the errors.
No autocorrelation: the errors are assumed to be uncorrelated and randomly spread across the regression line.
Independence/no multicollinearity: the independent variables must not be highly correlated with one another (multicollinearity is observed when two or more of them are).
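As a quick numeric illustration (on simulated data that satisfies the assumptions by construction), an OLS fit that includes an intercept always produces in-sample residuals that average to zero and are uncorrelated with the regressor:

```python
import numpy as np

# Simulated data that satisfies the OLS assumptions by construction
rng = np.random.default_rng(4)
X = rng.uniform(0, 5, size=200)
Y = 1.0 + 0.5 * X + rng.normal(0, 0.3, size=200)

# Fit a simple linear regression (degree-1 polynomial)
b, a = np.polyfit(X, Y, 1)                  # slope, intercept
residuals = Y - (a + b * X)

# With an intercept in the model, OLS makes the residuals average to zero
# and leaves them uncorrelated with the regressor in-sample
print(f"mean residual:      {residuals.mean():+.6f}")
print(f"corr(X, residuals): {np.corrcoef(X, residuals)[0, 1]:+.6f}")
```

These two properties are algebraic identities of the least-squares fit; checks for normality, homoscedasticity, and autocorrelation need dedicated diagnostics.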
We have simple and multiple linear regression; the difference is that multiple linear regression has more than one independent variable, whereas simple linear regression has only one.
We can evaluate the performance of this model using the R-squared metric.
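R-squared can be computed by hand as one minus the ratio of the residual sum of squares to the total sum of squares; a minimal sketch on invented data:

```python
import numpy as np

# Invented data: Y depends linearly on X plus noise
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=100)
Y = 1.0 + 2.0 * X + rng.normal(0, 1.0, size=100)

b, a = np.polyfit(X, Y, 1)                  # slope, intercept
Y_pred = a + b * X

# R-squared = 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((Y - Y_pred) ** 2)
ss_tot = np.sum((Y - Y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot
print(f"R-squared ~ {r_squared:.3f}")
```

Here the linear signal dominates the noise, so R-squared comes out close to 1; on data with weaker structure it falls toward 0.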
Logistic Regression
Using linear regression, we can predict the price a customer will pay if he/she buys. With logistic regression we can make a more fundamental decision, “will the customer buy at all?”
Here, the target shifts from numerical to categorical. Logistic regression is used to solve classification problems and to make predictions where the targets are categorical variables. It can handle various types of relationships between the independent variables and Y because it applies a non-linear log transformation to the predicted odds ratio.
odds = p / (1 - p)
ln(odds) = ln(p / (1 - p))
logit(p) = ln(p / (1 - p)) = b0 + b1X1 + b2X2 + b3X3 + … + bkXk
where p is the probability of event success and (1-p) is the probability of event failure.
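A small numeric sketch of the odds and logit for p = 0.8, together with the sigmoid (the logit's inverse):

```python
import numpy as np

def logit(p):
    """ln(p / (1 - p)): maps a probability in (0, 1) to a real value."""
    return np.log(p / (1 - p))

def sigmoid(z):
    """Inverse of the logit: maps any real value to a probability in (0, 1)."""
    return 1 / (1 + np.exp(-z))

p = 0.8                        # probability of event success
odds = p / (1 - p)             # ~4: success is four times as likely as failure
z = logit(p)                   # ln(4)
print(odds, z, sigmoid(z))     # sigmoid(logit(p)) recovers p
```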
The logit function maps a probability in (0, 1) to any real value; its inverse, the logistic (sigmoid) function, maps any real value back to a probability between 0 and 1. The parameters in the equation above are chosen to maximize the likelihood of observing the sample values rather than to minimize the sum of squared errors.
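The customer-purchase question above can be sketched with scikit-learn's LogisticRegression, which fits these parameters by maximum likelihood; the "income" feature and the income-to-purchase relationship below are invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Invented data: whether a customer buys, as a function of a hypothetical
# "income" feature (this data-generating process is assumed, not real)
rng = np.random.default_rng(2)
income = rng.uniform(20, 100, size=200)
p_buy = 1 / (1 + np.exp(-(income - 60) / 10))    # assumed purchase probability
bought = rng.random(200) < p_buy                  # categorical target: buy or not

# Maximum-likelihood fit of the logit model
model = LogisticRegression()
model.fit(income.reshape(-1, 1), bought)

# Predicted probability that a customer with income 80 buys at all
print(model.predict_proba([[80.0]])[0, 1])
```

The model outputs a probability, which can then be thresholded (commonly at 0.5) to answer "will the customer buy at all?".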
Conclusion
I would encourage you to read further to get a more solid understanding. There are several techniques employed to increase the robustness of regression. They include regularization/penalisation methods (Lasso, Ridge, and ElasticNet), gradient descent, stepwise regression, and so on.
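As a minimal sketch (not code from this article), the penalised variants can be compared with plain OLS on deliberately collinear data using scikit-learn; the data-generating process below is invented:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

# Invented, deliberately near-collinear data: x2 is almost a copy of x1
rng = np.random.default_rng(3)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.01, size=100)
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.1, size=100)

# Ridge (L2) and Lasso (L1) add a penalty on coefficient size, which
# stabilises the fit when regressors are highly correlated
coefs = {}
for name, model in [("OLS", LinearRegression()),
                    ("Ridge", Ridge(alpha=1.0)),
                    ("Lasso", Lasso(alpha=0.1))]:
    coefs[name] = model.fit(X, y).coef_
    print(f"{name:5s} coefficients: {np.round(coefs[name], 2)}")
```

Under strong collinearity the individual OLS coefficients become unstable even though their sum is well determined; the penalised fits shrink them toward stable values.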
Kindly note that these are not types of regression, contrary to what many articles online suggest. Below, you will find links to articles I found helpful in explaining some concepts, and for your further reading. Happy learning!
https://medium.com/datadriveninvestor/regression-in-machine-learning-296caae933ec
https://machinelearningmastery.com/linear-regression-for-machine-learning/
https://www.geeksforgeeks.org/ml-linear-regression/
https://www.geeksforgeeks.org/types-of-regression-techniques/
https://www.vebuso.com/2020/02/linear-to-logistic-regression-explained-step-by-step/
https://www.statisticssolutions.com/what-is-logistic-regression/
https://www.listendata.com/2014/11/difference-between-linear-regression.html#:~:text=Purpose%20%3A%20Linear%20regression%20is%20used,the%20probability%20of%20an%20event.
https://www.kaggle.com/residentmario/l1-norms-versus-l2-norms
Translated from: https://medium.com/analytics-vidhya/regression-15cfaffe805a