Data Science and Statistics: Statistics in Data Science

Statistics

Statistics is used to work through complex real-world problems so that data scientists and analysts can look for meaningful patterns and changes in data. Put simply, statistics lets us extract useful insights from data by performing mathematical computations on it. Various statistical functions, principles, and algorithms are applied to analyze raw data, build a statistical model, and infer or predict an outcome. The purpose of this article is to give a broad overview of the statistics fundamentals you'll need to start your data science journey.

Data Types

  1. Numerical:

    Data expressed as numbers; it is quantifiable. It can be either discrete (a finite number of values) or continuous (an infinite number of values).

  2. Categorical:

    Qualitative data grouped into categories. It can be nominal (no order) or ordinal (ordered data).

Measures of Central Tendency

  • Mean: the average of a dataset.

  • Median: the middle value of a sorted dataset; less susceptible to outliers than the mean.

  • Mode: the most frequent value in a dataset; mainly meaningful for discrete or categorical data.
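
A minimal sketch of the three measures using Python's standard-library statistics module. The sample values are made up for illustration; the single large value (95) shows why the median is less sensitive to outliers than the mean.

```python
from statistics import mean, median, mode

# Hypothetical sample: a small discrete dataset with one large outlier (95).
data = [2, 3, 3, 5, 7, 95]

avg = mean(data)     # sum / count; pulled upward by the outlier
mid = median(data)   # average of the two middle values of the sorted data
most = mode(data)    # most frequent value

print(avg, mid, most)  # mean is about 19.17, median 4.0, mode 3
```

Dropping the outlier would move the mean a lot but leave the median almost unchanged.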

Measures of Variability

  • Range: the difference between the highest and lowest value in a dataset.

  • Variance (σ²): measures how spread out a set of data is relative to the mean.

  • Standard Deviation (σ): another measure of how spread out the numbers in a dataset are; it is the square root of the variance.

  • Z-score: determines how many standard deviations a data point is from the mean.

  • R-Squared: a statistical measure of fit that indicates how much of the variation in a dependent variable is explained by the independent variable(s); on its own it is best suited to simple linear regression, because it never decreases when more predictors are added.

  • Adjusted R-squared: a modified version of R-squared that has been adjusted for the number of predictors in the model; it increases if a new term improves the model more than would be expected by chance, and decreases otherwise.
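
A minimal sketch of the variability measures above, using only the standard library. It assumes the population formulas (pvariance and pstdev divide by n; the module's variance and stdev are the sample versions that divide by n - 1). The R-squared example fits a simple linear regression by ordinary least squares on made-up data; the helper names r_squared and adjusted_r_squared are illustrative, not from a library.

```python
from statistics import mean, pstdev, pvariance

# Hypothetical sample data.
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)  # highest value minus lowest value
variance = pvariance(data)          # population variance (sigma squared)
std_dev = pstdev(data)              # sigma, the square root of the variance
mu = mean(data)
z_scores = [(x - mu) / std_dev for x in data]  # distance from the mean in SDs


def r_squared(y, y_pred):
    """Share of the variation in y that the predictions explain."""
    y_bar = mean(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, y_pred))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Penalize R^2 for p predictors fitted on n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Simple linear regression fitted by ordinary least squares (hypothetical data).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
x_bar, y_bar = mean(x), mean(y)
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
intercept = y_bar - slope * x_bar
y_pred = [intercept + slope * xi for xi in x]

r2 = r_squared(y, y_pred)
adj_r2 = adjusted_r_squared(r2, n=len(x), p=1)
print(data_range, variance, std_dev, r2, adj_r2)
```

Here the range is 7, the standard deviation is 2, the last point (9) has a z-score of 2.0, and R-squared is about 0.6, which the adjustment lowers to about 0.47 because only five observations support the fit.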

Measurement of Relationships between Variables

  • Covariance: measures the joint variability of two variables. If it is positive, the variables tend to move in the same direction; if it is negative, they tend to move in opposite directions; and if it is zero, they have no linear relationship with one another.

  • Correlation: measures the strength of the linear relationship between two variables and ranges from -1 to 1; it is the normalized version of covariance. Generally, a correlation of +/-0.7 or beyond indicates a strong relationship between two variables. On the other hand, correlations between -0.3 and 0.3 indicate that there is little or no linear relationship between the variables.
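
A minimal sketch of sample covariance and Pearson correlation written out by hand, so the normalization step is visible. The data is hypothetical. (Python 3.10 and later also ship statistics.covariance and statistics.correlation.)

```python
import math


def covariance(x, y):
    """Sample covariance of two equal-length sequences (divides by n - 1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)


def correlation(x, y):
    """Pearson correlation: covariance divided by both standard deviations."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))


# Hypothetical data: y rises with x, z falls as x rises.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
z = [10, 8, 6, 4, 2]

print(covariance(x, y), correlation(x, y))  # positive covariance, r = 1.0
print(covariance(x, z), correlation(x, z))  # negative covariance, r = -1.0
```

The covariance values (5.0 and -5.0) depend on the units of the data, while the correlation is always scaled into the range -1 to 1.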

Probability Distribution Functions

  • Probability Density Function (PDF): a function for continuous data where the value at any point can be interpreted as providing a relative likelihood that the value of the random variable would equal that sample.

  • Probability Mass Function (PMF): a function for discrete data that gives the probability of a given value occurring.

  • Cumulative Distribution Function (CDF): a function that gives the probability that a random variable is less than or equal to a given value; for continuous data it is the integral of the PDF.
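
A minimal sketch relating the three functions: it numerically integrates the standard normal PDF with the trapezoidal rule and compares the result with the closed-form CDF (via math.erf), then builds a CDF for a discrete variable by summing a PMF. The fair-die example, the integration bounds, and the step count are illustrative assumptions.

```python
import math
from fractions import Fraction


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution at x."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for a normal variable, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def integrate_pdf(pdf, upper, lower=-10.0, steps=100_000):
    """Trapezoidal-rule approximation of the integral of pdf from lower to upper."""
    h = (upper - lower) / steps
    total = 0.5 * (pdf(lower) + pdf(upper))
    for i in range(1, steps):
        total += pdf(lower + i * h)
    return total * h


# PMF of a fair six-sided die: every face has probability 1/6.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}


def die_cdf(x):
    """P(X <= x) for the die, obtained by summing the PMF."""
    return sum(p for face, p in die_pmf.items() if face <= x)


print(integrate_pdf(normal_pdf, 1.0), normal_cdf(1.0))  # both about 0.8413
print(die_cdf(2))  # 1/3
```

The lower bound of -10 stands in for minus infinity; the standard normal density there is negligible.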

Continuous Data Distributions

  • Uniform Distribution: a probability distribution in which all outcomes are equally likely.

  • Normal/Gaussian Distribution: commonly referred to as the bell curve and related to the central limit theorem; the standard normal distribution has a mean of 0 and a standard deviation of 1.

  • T-Distribution: a probability distribution used to estimate population parameters when the sample size is small and/or the population variance is unknown.

  • Chi-Square Distribution: the distribution of the chi-square statistic.
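
A minimal sketch of the densities of these continuous distributions, written from their standard formulas. It uses math.lgamma (the log of the gamma function) so that large degrees of freedom do not overflow. The comparison at the end illustrates that the t-distribution has heavier tails than the normal and approaches it as the degrees of freedom grow.

```python
import math


def uniform_pdf(x, a, b):
    """Constant density 1/(b - a) on [a, b], zero elsewhere."""
    return 1.0 / (b - a) if a <= x <= b else 0.0


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal (Gaussian) density."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def t_pdf(x, nu):
    """Student's t density with nu degrees of freedom."""
    log_coeff = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
    coeff = math.exp(log_coeff) / math.sqrt(nu * math.pi)
    return coeff * (1 + x * x / nu) ** (-(nu + 1) / 2)


def chi2_pdf(x, k):
    """Chi-square density with k degrees of freedom, defined for x > 0."""
    if x <= 0:
        return 0.0
    log_density = (
        (k / 2 - 1) * math.log(x) - x / 2 - (k / 2) * math.log(2) - math.lgamma(k / 2)
    )
    return math.exp(log_density)


# Heavier tails: at x = 3 the t density (nu = 2) is well above the normal density.
print(t_pdf(3.0, 2), normal_pdf(3.0))
# With many degrees of freedom the t density is close to the normal density.
print(t_pdf(0.0, 1000), normal_pdf(0.0))
```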

Discrete Data Distributions

  • Poisson Distribution: a probability distribution that expresses the probability of a given number of events occurring within a fixed interval of time.

  • Binomial Distribution: the probability distribution of the number of successes in a sequence of n independent trials, each with a Boolean-valued outcome (success with probability p, failure with probability 1-p).
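
A minimal sketch of the two PMFs from their standard formulas, assuming Python 3.8+ for math.comb. The help-desk rate and the coin-flip numbers are hypothetical examples.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) events in a fixed interval when events occur at average rate lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def binomial_pmf(k, n, p):
    """P(X = k) successes in n independent trials with success probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


# Hypothetical examples: a help desk averaging 3 tickets per hour, and 10 fair coin flips.
print(poisson_pmf(0, 3))         # chance of an hour with no tickets, about 0.05
print(binomial_pmf(5, 10, 0.5))  # chance of exactly 5 heads: 0.24609375
```

Summing either PMF over all possible values of k gives 1, as any PMF must.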

Moments

Moments describe different aspects of the nature and shape of a distribution. The first moment is the mean, the second moment is the variance, the third moment is the skewness, and the fourth moment is the kurtosis.
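
A minimal sketch computing the four moments of a dataset, under the usual convention: the mean is the first raw moment, the variance is the second central moment, and skewness and kurtosis are the third and fourth standardized moments. It uses population (divide-by-n) versions; the kurtosis here is not "excess" kurtosis, so a normal distribution would score 3. The sample lists are hypothetical.

```python
def moments(data):
    """Return mean, variance, skewness and kurtosis (population versions)."""
    n = len(data)
    mu = sum(data) / n

    def central(k):
        # k-th central moment: the average of (x - mu) ** k
        return sum((x - mu) ** k for x in data) / n

    var = central(2)
    skew = central(3) / var ** 1.5  # third standardized moment
    kurt = central(4) / var ** 2    # fourth standardized moment
    return mu, var, skew, kurt


print(moments([1, 2, 3, 4, 5]))  # symmetric data: skewness 0
print(moments([1, 1, 1, 10]))    # long right tail: positive skewness
```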

Probability

Conditional Probability [P(A|B)] is the probability of an event occurring, given that another event has already occurred.

Independent events are events whose outcome does not affect the probability of the other's outcome; P(A|B) = P(A).

Mutually exclusive events are events that cannot happen at the same time; P(A|B) = 0 (for P(B) > 0).

Bayes' Theorem: a mathematical formula for determining conditional probability. "The probability of A given B is equal to the probability of B given A times the probability of A, divided by the probability of B", that is, P(A|B) = P(B|A) × P(A) / P(B).
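
A minimal sketch of Bayes' Theorem applied to a hypothetical screening test. P(B) is not given directly here, so the sketch computes it with the law of total probability, P(B) = P(B|A)P(A) + P(B|not A)P(not A), a standard identity that the text above does not state. All probabilities are made-up illustration values.

```python
def total_probability(p_b_given_a, p_a, p_b_given_not_a):
    """P(B) = P(B|A)P(A) + P(B|not A)P(not A)."""
    return p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)


def bayes(p_b_given_a, p_a, p_b):
    """P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b


# Hypothetical screening test: A = "has the condition", B = "tests positive".
p_a = 0.01              # prevalence
p_b_given_a = 0.99      # sensitivity
p_b_given_not_a = 0.05  # false-positive rate

p_b = total_probability(p_b_given_a, p_a, p_b_given_not_a)
posterior = bayes(p_b_given_a, p_a, p_b)
print(posterior)  # about 0.167
```

Even with a 99% sensitive test, only about 1 in 6 positive results is a true positive in this example, because the condition is rare.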

Accuracy

  • True positive: detects the condition when the condition is present.

  • True negative: does not detect the condition when the condition is absent.

  • False positive: detects the condition when the condition is absent.

  • False negative: does not detect the condition when the condition is present.

  • Sensitivity: also called recall; measures the ability of a test to detect the condition when the condition is present; Sensitivity = TP/(TP+FN)

  • Specificity: measures the ability of a test to correctly reject the condition when the condition is absent; Specificity = TN/(TN+FP)

  • Positive predictive value: also called precision; the proportion of positive results that correspond to the presence of the condition; PPV = TP/(TP+FP)

  • Negative predictive value: the proportion of negative results that correspond to the absence of the condition; NPV = TN/(TN+FN)
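
A minimal sketch that counts TP, TN, FP and FN from hypothetical true and predicted labels, then applies the formulas above. It also reports overall accuracy, (TP+TN)/(TP+TN+FP+FN), which is the standard definition although the list above does not spell it out. The function names are illustrative.

```python
def confusion_counts(actual, predicted):
    """Count TP, TN, FP, FN from parallel lists of booleans (True = condition present)."""
    pairs = list(zip(actual, predicted))
    tp = sum(a and p for a, p in pairs)
    tn = sum(not a and not p for a, p in pairs)
    fp = sum(not a and p for a, p in pairs)
    fn = sum(a and not p for a, p in pairs)
    return tp, tn, fp, fn


def metrics(tp, tn, fp, fn):
    """Diagnostic metrics built from the four confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # recall
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),          # precision
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


# Hypothetical labels for 10 cases: 4 with the condition, 6 without.
actual = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, True, True, False, True, False, False, False, False, False]

counts = confusion_counts(actual, predicted)
results = metrics(*counts)
print(counts)   # (3, 5, 1, 1)
print(results)
```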

Hypothesis Testing and Statistical Significance

  • Null Hypothesis: the hypothesis that sample observations result purely from chance.

  • Alternative Hypothesis: the hypothesis that sample observations are influenced by some non-random cause.

  • P-value: the probability of obtaining the observed results of a test (or more extreme ones), assuming that the null hypothesis is correct; a smaller p-value means that there is stronger evidence for the alternative hypothesis.

  • Alpha: the significance level; the probability of rejecting the null hypothesis when it is actually true, also known as a Type I error.

  • Beta: the probability of a Type II error, that is, failing to reject a null hypothesis that is false.

Steps to Hypothesis Testing

  1. State the null and alternative hypotheses

  2. Determine the test size (significance level); is it a one- or two-tailed test?

  3. Compute the test statistic and the probability value (p-value)

  4. Analyze the results and either reject or do not reject the null hypothesis
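
A minimal sketch of the four steps as a one-sample z-test on hypothetical numbers. It assumes the population standard deviation is known; with a small sample or an unknown variance, a t-test based on the T-distribution would be used instead. The function z_test and its tail parameter are illustrative, not a library API.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def z_test(sample_mean, mu0, sigma, n, alpha=0.05, tail="two"):
    """One-sample z-test of H0: mu == mu0 with known population sigma.

    tail: "two" (H1: mu != mu0), "greater" (H1: mu > mu0) or "less" (H1: mu < mu0).
    Returns (z statistic, p-value, whether H0 is rejected).
    """
    # Step 3: compute the test statistic and its p-value.
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    if tail == "two":
        p = 2.0 * (1.0 - normal_cdf(abs(z)))
    elif tail == "greater":
        p = 1.0 - normal_cdf(z)
    elif tail == "less":
        p = normal_cdf(z)
    else:
        raise ValueError(f"unknown tail: {tail!r}")
    # Step 4: reject H0 when the p-value falls below the significance level.
    return z, p, p < alpha


# Step 1 (hypothetical): H0: mu = 100 vs H1: mu != 100, population sigma = 15 known.
# Step 2: alpha = 0.05, two-tailed test.
z, p, reject = z_test(sample_mean=106, mu0=100, sigma=15, n=36, alpha=0.05, tail="two")
print(z, p, reject)  # z = 2.4, p about 0.016, so H0 is rejected
```

With a sample mean of 101 instead, z would be 0.4 and the p-value about 0.69, so the null hypothesis would not be rejected.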

Translated from: https://www.includehelp.com/data-science/statistics.aspx
