Getting to the Bottom of P-value: The Intuitive Explanation

Introduction

Welcome to this lesson on calculating p-values.

Before we jump into how to calculate a p-value, it’s important to think about what the p-value is really for.

Hypothesis Testing Refresher

Without going into too much detail for this post: when you set up a hypothesis test, you define a null hypothesis. The null hypothesis represents the world in which the two variables you’re assessing have no relationship. Conversely, the alternative hypothesis represents the world in which they are related. If the evidence against the null is strong enough, we call the relationship statistically significant and reject the null hypothesis in favor of the alternative.

Diving Deeper

Before we move on from the idea of hypothesis testing, think about what we just said. You effectively need to show, with little room for error, that what we’re seeing in the real world would be very unlikely to occur in a world where these variables are unrelated, that is, where they are independent.

When learning concepts in statistics, it’s easy to hear a definition but take little time to conceptualize it. There is often a lot of memorization of rule sets. I find that understanding the intuitive foundation of these principles serves you far better when it comes to applying them in practice.

Continuing in this vein: if you want to compare your real-world statistic with the one from the “fake” world, that’s exactly what you should do.

As you’d guess, we can calculate our observed statistic by fitting a linear regression model that explains our response variable as a function of our explanatory variable. Once we’ve done this, we can quantify the relationship between the two variables using the slope (coefficient) estimated by our OLS regression.

But now we need to come up with this idea of the null world, the world where these variables are independent. This is something we don’t have, so we’ll need to simulate it. For convenience, we’re going to use the infer package.

Let’s Calculate Our Observed Statistic

First things first, let’s get our observed statistic!

The dataset we’re working with is a Seattle home prices dataset. I’ve used this dataset many times before and find it particularly flexible for demonstration. Each record is a home, with its price, square footage, number of bedrooms, number of bathrooms, and so forth.

Throughout this post, we’ll try to explain price as a function of square footage (both on the log scale, as the variable names price_log and sqft_living_log suggest).

Let’s create our regression model:

# Regress log price on log living area; `housing` is the Seattle home prices data frame
fit <- lm(price_log ~ sqft_living_log,
          data = housing)
summary(fit)

As you can see in the output above, the statistic we’re after is the Estimate for our explanatory variable, sqft_living_log.

A very clean way to extract it is to tidy our results so that, rather than a linear model object, we get a tibble. Tibbles, tables, and data frames are a lot easier to work with systematically.

We’ll then want to filter down to the sqft_living_log term and we'll wrap it up by using the pull function to return the estimate itself. This will return the slope as a number, which will make things easier to compare with our null distribution later on.

Take a look!

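library(dplyr)  # %>%, filter(), pull()
library(broom)  # tidy()
# Store the observed slope so we can compare it with the null distribution later
obs_slope <- lm(price_log ~ sqft_living_log,
                data = housing) %>%
  tidy() %>%
  filter(term == 'sqft_living_log') %>%
  pull(estimate)
obs_slope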
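
As a cross-check, the same slope can be read straight off a fitted model with base R’s coef(). The sketch below is a minimal example on simulated data: the `toy` data frame and its true slope of 0.8 are made up for illustration and only borrow the column names from the housing data.

```r
# Simulated stand-in for the housing data (hypothetical values, for illustration only)
set.seed(42)
n <- 500
toy <- data.frame(sqft_living_log = rnorm(n, mean = 7.5, sd = 0.4))
toy$price_log <- 7 + 0.8 * toy$sqft_living_log + rnorm(n, sd = 0.3)

# Fit the same kind of model as in the post
toy_fit <- lm(price_log ~ sqft_living_log, data = toy)

# coef() returns a named numeric vector; index it by the term name
toy_slope <- unname(coef(toy_fit)["sqft_living_log"])

# For a simple regression, the OLS slope equals cov(x, y) / var(x)
manual_slope <- cov(toy$sqft_living_log, toy$price_log) / var(toy$sqft_living_log)

print(toy_slope)
print(all.equal(toy_slope, manual_slope))
```

Either route gives a plain number, which is all we need for the comparison against the null distribution.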

Time to Simulate!

To kick things off, you should know there are various types of simulation. The one we’ll be using here is what’s called permutation.

Permutation is particularly helpful when it comes to showing a world where variables are independent of one another.

While we won’t go into the specifics of how a permutation sample is created under the hood, it’s worth noting that the statistic calculated across many permuted samples will be approximately normally distributed and centered around 0.

In this case, the permuted slopes center around 0 because we’re operating under the premise that there is no relationship between our explanatory and response variables.
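
To make the idea concrete, here is a minimal base-R sketch of a single permutation. It is not how infer works internally, just an illustration: shuffling the response with sample() keeps every value but breaks the pairing between the two columns. The `toy` data and its true slope of 0.8 are simulated assumptions.

```r
# Simulated stand-in for the housing data (hypothetical values)
set.seed(42)
n <- 500
toy <- data.frame(sqft_living_log = rnorm(n, mean = 7.5, sd = 0.4))
toy$price_log <- 7 + 0.8 * toy$sqft_living_log + rnorm(n, sd = 0.3)

# Slope on the data as observed
obs <- unname(coef(lm(price_log ~ sqft_living_log, data = toy))[2])

# One permutation: reorder the response, leave the explanatory variable alone
shuffled <- toy
shuffled$price_log <- sample(toy$price_log)

# Slope in this single "null world" replicate
null_slope <- unname(coef(lm(price_log ~ sqft_living_log, data = shuffled))[2])

# The shuffle preserves the values themselves; only the pairing changes
print(identical(sort(shuffled$price_log), sort(toy$price_log)))
print(c(observed = obs, permuted = null_slope))
```

Repeating the shuffle many times and collecting the slopes is what produces the null distribution.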

Infer Fundamentals

A few things for you to know:

  • specify is how we declare the relationship we’re modeling: price_log ~ sqft_living_log
  • hypothesize is where we designate the null hypothesis, here independence
  • generate is how we set the number of replicates of our dataset we want to make. Note that if you generate a single replicate and skip calculate, you get back a permuted dataset of the same size as the original.
  • calculate lets you choose the statistic to compute for each replicate (slope, mean, median, diff in means, etc.)

library(infer)
set.seed(1)  # make the permutations reproducible
perm <- housing %>%
  specify(price_log ~ sqft_living_log) %>%
  hypothesize(null = 'independence') %>%
  generate(reps = 100, type = 'permute') %>%
  calculate('slope')
perm
# Histogram of the permuted slopes
hist(perm$stat)

Rerunning the same pipeline with reps = 1000 gives the same distribution, now built from 1,000 permuted slopes instead of 100.
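
For intuition about what the pipeline above is doing, the steps can be approximated by hand in base R. This is a rough sketch under stated assumptions, not infer’s actual implementation: `toy` is simulated data and `perm_slope()` is a hypothetical helper introduced only for this example.

```r
# Simulated stand-in for the housing data (hypothetical values)
set.seed(1)
n <- 300
toy <- data.frame(sqft_living_log = rnorm(n, mean = 7.5, sd = 0.4))
toy$price_log <- 7 + 0.8 * toy$sqft_living_log + rnorm(n, sd = 0.3)

# Hypothetical helper: shuffle the response once and return the refitted slope
perm_slope <- function(data) {
  shuffled_y <- sample(data$price_log)
  unname(coef(lm(shuffled_y ~ data$sqft_living_log))[2])
}

# Roughly generate(reps = 1000, type = 'permute') followed by calculate('slope')
null_slopes <- replicate(1000, perm_slope(toy))

# The null distribution should sit around 0
print(summary(null_slopes))
print(quantile(null_slopes, c(0.025, 0.975)))
```

The vector `null_slopes` plays the same role here as the stat column of perm does in the post.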

Null Sampling Distribution

OK, we’ve done it! We’ve created what is known as the null sampling distribution. What we’re seeing above is the distribution of 1,000 slopes, one fitted to each of 1,000 permuted (independent) versions of the data.

This gives us just what we needed: a simulated world against which we can compare reality.

Taking the visual we just made, let’s use a density plot and add a vertical line for our observed slope, marked in red.

library(ggplot2)
# Density of the permuted slopes, with the observed slope marked in red
ggplot(perm, aes(stat)) +
  geom_density() +
  geom_vline(xintercept = obs_slope, color = 'red')

Visually, you can see that the observed slope lies far beyond anything the permuted data produced by random chance.

As you can guess from the plot, the p-value here is going to be 0. That is to say, 0% of the slopes in the null sampling distribution are greater than or equal to our observed statistic. (With a finite number of replicates, an estimate of 0 really means the p-value is smaller than the simulation can resolve, not that it is exactly zero.)

If, on the other hand, a sizable share of our permuted slopes were greater than or equal to our observed statistic, we could not rule out that the observed slope arose from random chance alone.

To reiterate the message here, the purpose of the p-value is to give you an idea of how plausible it is that we saw such a slope through random chance rather than because of a real relationship.

Calculating P-value

While we know what our p-value will be here, let’s get you set up with the calculation for p-value.

To restate the idea: the p-value is the proportion of replicates that were (randomly) greater than or equal to our observed slope.

In our summarise call, we check whether each stat (slope) is greater than or equal to the observed slope. Each record is assigned TRUE or FALSE accordingly. When you wrap that comparison in mean(), TRUE counts as 1 and FALSE as 0, so the result is the proportion of replicates in which stat was greater than or equal to our observed slope. The code below then multiplies that proportion by 2, the usual way to turn a one-sided proportion into a two-sided p-value.

# Proportion of permuted slopes >= the observed slope, doubled for a two-sided test
perm %>%
  summarise(p_val = round(2 * mean(stat >= obs_slope), 2))
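
The TRUE/FALSE averaging trick is easy to check on a tiny example. The sketch below uses made-up numbers (`null_stats` and `obs` are hypothetical) and also shows one common two-sided alternative, comparing absolute values; treating that as the intent behind the doubling is an assumption.

```r
# Hypothetical null statistics and observed statistic, for illustration only
null_stats <- c(-0.30, -0.10, 0.00, 0.10, 0.20, 0.25, 0.40, -0.45, 0.05, -0.20)
obs <- 0.25

# Logical vector: TRUE where a null statistic is at least as large as obs
at_least_as_large <- null_stats >= obs

# mean() of a logical vector treats TRUE as 1 and FALSE as 0, giving a proportion
p_one_sided <- mean(at_least_as_large)

# Two-sided version: count statistics at least as extreme in either direction
p_two_sided <- mean(abs(null_stats) >= abs(obs))

print(p_one_sided)
print(p_two_sided)
```

Doubling the one-sided proportion and comparing absolute values agree when the null distribution is symmetric around 0, which is roughly the case for permuted slopes.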

To illustrate a weaker relationship, one where we would not have sufficient evidence to reject the null hypothesis, let’s look at price explained as a function of the year the home was built.

Using the same calculation as above, this results in a p-value of 12%. At the conventional 5% significance level (equivalently, a 95% confidence level), that is not sufficient evidence to reject the null hypothesis.
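
The decision rule itself is a single comparison. A minimal sketch, using the 12% figure from the text and assuming the conventional 5% significance level (the variable names are illustrative):

```r
p_val <- 0.12  # p-value reported above for price vs. year built
alpha <- 0.05  # significance level matching a 95% confidence level

# Reject the null hypothesis only when the p-value falls below alpha
reject_null <- p_val < alpha
print(reject_null)
```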

Final Notes on P-value Interpretation

There’s one final thing I want to highlight one more time.

Consider what 12% means. When we randomly generated independent samples, a full 12% of the time the randomly generated slope was as extreme as, or more extreme than, the one we observed.

You might see such a result as much as 12% of the time just due to random chance.

Conclusion

That’s it! You’re now a master of calculating and understanding p-values.

In a few short minutes we have learned a lot:

  • hypothesis testing
  • linear regression refresher
  • sampling explanation
  • the infer package
  • building a sampling distribution
  • visualizing the p-value
  • calculating the p-value

It’s easy to get lost when dissecting statistics concepts like the p-value. My hope is that a strong foundational understanding of why we need it, and of how it is calculated, lets you understand and correctly apply it to a wide variety of problems.

If this was helpful, feel free to check out my other posts at https://medium.com/@datasciencelessons. Happy Data Science-ing!

Translated from: https://towardsdatascience.com/getting-to-the-bottom-of-p-value-the-intuitive-explanation-calculation-fec46bb15a92
