A/B Testing in R

What is A/B Testing?

A/B testing is a method for testing whether the response rate differs between two variants of the same feature. For instance, you may want to test whether a specific change to your website, such as moving the shopping cart button from the right-hand panel to the top-right corner of the page, changes the number of people who click on the shopping cart and buy a product.

A/B testing is also called split testing: two variants of the same web page are shown at the same time to different samples drawn from your population of website visitors. The number of conversions is then compared for the two variants. Generally, the variant that gives the higher proportion of conversions is the winning variant.

However, as this is a data science blog, we want to ensure that the difference in the proportion of conversions between the two variants is statistically significant. We may also want to understand which attributes of the visitors are driving those conversions. So, let’s move on to your data problem.

The Data Problem

  • An A/B test was recently run and the Product Manager of your company wants to know whether the new variant of the web page resulted in more conversions. Make a recommendation to your Product Manager based on your analysis.
  • The CRM Manager is interested in knowing how accurately we can predict whether users are likely to engage with our emails based on the attributes we collected about the users when they first visited the website. Report back to the CRM Manager on your findings.

The Dataset

Four datasets are provided.

  • Visits contains data from 10,000 unique users and has the following columns:
    - user_id: unique identifier for the user
    - visit_time: timestamp indicating the date and time of the visit to the website
    - channel: marketing channel that prompted the user to visit the website
    - age: user’s age at the time of visiting the website
    - gender: user’s gender
  • Email engagement contains data on those users that engaged with a recent email campaign. The file contains the following columns:
    - user_id: unique identifier for the user
    - clicked_on_email: flag to indicate that the user engaged with the email, where 1 indicates that the user clicked on the email
  • Variations contains data indicating which variation of the A/B test each user saw. The file has the following columns:
    - user_id: unique identifier for the user
    - variation: variation (control or treatment) that the user saw
  • Test conversions contains data on those users that converted as a result of the A/B test. The file contains the following columns:
    - user_id: unique identifier for the user
    - converted: flag to indicate that the user converted (1 for converted)

Importing the dataset and cleaning

I always start by combining the files using a primary key or unique identifier, and only then decide what to do with the data. I find this approach useful because I can discard what I don’t need later. It also helps me view the dataset as a whole.

In this instance, our unique identifier is user_id. After merging the files using the following code,

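# Inner join: keep users present in both the variations and visits files
merge_1 <- merge(variations_df, visits_df, by = "user_id")
# Left joins: keep every user; users absent from the right-hand file get NA
merge_2 <- merge(merge_1, test_conv_df, by = "user_id", all.x = TRUE)
merge_3 <- merge(merge_2, eng_df, by = "user_id", all.x = TRUE)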

I discovered that I had to create my own binary variables for whether or not a user converted and whether or not they clicked on an email. The test_conversions.csv and email_engagement.csv files list only the users who converted or clicked, so any user whose ID is not found in one of those files ends up with NA in the corresponding column after the merge. I handled this by replacing every NA with 0.

library(dplyr)  # provides if_else()
# NA means the user was absent from the file, so code it as 0; otherwise 1
merge_3$converted <- as.factor(if_else(is.na(merge_3$converted), 0, 1))
merge_3$clicked_on_email <- as.factor(if_else(is.na(merge_3$clicked_on_email), 0, 1))
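To see why the left join followed by the NA replacement produces a usable 0/1 flag, here is a minimal sketch on toy data. The two small data frames are invented for illustration; only the column names follow the dataset description, and base R's ifelse() stands in for dplyr's if_else().

```r
# Hypothetical toy data: four users, two of whom appear in the conversions file.
variations_toy <- data.frame(
  user_id = 1:4,
  variation = c("control", "treatment", "control", "treatment")
)
conversions_toy <- data.frame(user_id = c(2, 3), converted = 1)

# all.x = TRUE keeps every user from the left table (a left join);
# users with no matching conversion row get NA in `converted`.
joined <- merge(variations_toy, conversions_toy, by = "user_id", all.x = TRUE)

# Turn "no row in the file" (NA) into an explicit 0 and keep 1 for converters.
joined$converted <- ifelse(is.na(joined$converted), 0, 1)
print(joined)
```

Users 1 and 4 have no row in the toy conversions file, so they are the ones that end up with a 0.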

The next task was to convert raw variables such as the visit time into features that say something meaningful about the users, for example the time of day at which they visited.

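library(lubridate)  # provides hour()
# Bucket the visit hour (0-23): 0-4 night, 5-10 morning, 11-15 afternoon, 16-23 night
merge_3$timeofday <- plyr::mapvalues(hour(merge_3$visit_time), from = 0:23,
  to = c(rep("night", times = 5), rep("morning", times = 6), rep("afternoon", times = 5), rep("night", times = 8)))
merge_3$timeofday <- as.factor(merge_3$timeofday)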
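The same hour-to-bucket mapping can be checked on a few values without any packages. The sketch below uses a plain lookup vector in base R instead of mapvalues(); the timestamps are invented for illustration and are assumed to be in UTC.

```r
# Hypothetical timestamps, for illustration only.
visit_time <- as.POSIXct(
  c("2020-01-01 03:15:00", "2020-01-01 08:30:00",
    "2020-01-01 13:45:00", "2020-01-01 21:00:00"),
  tz = "UTC"
)

# Extract the hour of day (0-23) as an integer.
hr <- as.integer(format(visit_time, "%H"))

# Lookup vector with one label per hour, in the same order as the mapping
# above: 0-4 night, 5-10 morning, 11-15 afternoon, 16-23 night.
lookup <- c(rep("night", 5), rep("morning", 6), rep("afternoon", 5), rep("night", 8))

# Hours start at 0 but R vectors are indexed from 1, hence hr + 1.
timeofday <- factor(lookup[hr + 1])
print(data.frame(visit_time, hr, timeofday))
```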

Now that the data had been cleaned, it was time to explore it to understand whether there was an association between user conversion and the variation each user saw on the website.

Data Exploration and Visualization

The simplest thing to check is whether there is indeed a difference in the proportion of users that converted depending on the variation they viewed. Running the code provided at the end of the blog post gives the following graph and proportions:

control: 0.20
treatment: 0.24
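As a rough sketch of how such proportions can be computed, the snippet below takes the mean of a 0/1 conversion flag within each variation. The ten-row data frame is made up for illustration and does not reproduce the figures above.

```r
# Hypothetical toy data: five users per variation with a 0/1 conversion flag.
ab_toy <- data.frame(
  variation = rep(c("control", "treatment"), each = 5),
  converted = c(1, 0, 0, 0, 0,  1, 1, 0, 0, 0)
)

# The mean of a 0/1 flag within each group is that group's conversion rate.
conv_rate <- tapply(ab_toy$converted, ab_toy$variation, mean)
print(conv_rate)

# The same information as row-wise proportions of a contingency table.
print(prop.table(table(ab_toy$variation, ab_toy$converted), margin = 1))
```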

Statistical testing for significance of A/B Testing

To test whether the difference in proportions is statistically significant, we can carry out either a difference-in-proportions test or a chi-squared test of independence, where the null hypothesis is that there is no association between whether or not a user converted and the variation they saw.

For both tests, a p-value < 0.05 was observed, indicating a statistically significant difference in proportions.
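Both tests are available in base R as prop.test() and chisq.test(). The sketch below runs them on hypothetical counts chosen only to match the 0.20 and 0.24 rates; the group size of 1,000 users each is an assumption, not the real data.

```r
# Hypothetical counts (assumed group sizes): 200/1000 = 0.20 and 240/1000 = 0.24.
conversions <- c(control = 200, treatment = 240)
visitors    <- c(control = 1000, treatment = 1000)

# Two-sample test for equality of proportions (continuity correction by default).
prop_result <- prop.test(x = conversions, n = visitors)

# Chi-squared test of independence on the 2x2 table of converted vs not converted
# (for 2x2 tables the continuity correction is also applied by default).
tab <- cbind(converted = conversions, not_converted = visitors - conversions)
chisq_result <- chisq.test(tab)

print(prop_result$p.value)
print(chisq_result$p.value)
```

With two groups, the two tests are different views of the same calculation, so they return the same p-value.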

I went a step further and ran a logistic regression to understand how the other attributes of the users contributed to the difference in proportions. Only the type of variation and income (p-values less than 0.05) appeared to contribute to the difference in conversion proportions. McFadden’s pseudo R-squared is only 12.94%, which suggests that the variation type and the user attributes in our dataset explain just a small part of the variation in conversions. Hence, my response to the Product Manager would be as follows:
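For readers who have not met McFadden’s R-squared, it is defined as 1 minus the ratio of the fitted model’s log-likelihood to that of an intercept-only model. The sketch below computes it on simulated data; the variable names, effect sizes and sample size are all invented for illustration and are not the author’s model.

```r
set.seed(42)

# Simulated data (assumption): a variation flag and an age column.
n <- 2000
sim <- data.frame(
  variation = factor(sample(c("control", "treatment"), n, replace = TRUE)),
  age = round(runif(n, 18, 70))
)

# Made-up true effects on the log-odds scale, used only to generate outcomes.
log_odds <- -1.8 + 0.3 * (sim$variation == "treatment") + 0.01 * sim$age
sim$converted <- rbinom(n, 1, plogis(log_odds))

# Logistic regression with predictors, and an intercept-only (null) model.
full_model <- glm(converted ~ variation + age, data = sim, family = binomial)
null_model <- glm(converted ~ 1, data = sim, family = binomial)

# McFadden's pseudo R-squared: 1 - logLik(full) / logLik(null).
mcfadden_r2 <- 1 - as.numeric(logLik(full_model)) / as.numeric(logLik(null_model))

print(summary(full_model)$coefficients)
print(mcfadden_r2)
```

Values close to 0 mean the predictors add little over simply predicting the overall conversion rate.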

There is a statistically significant difference in conversion rates between users who saw the treatment variation and users who saw the control variation. However, it is difficult to understand why this is the case. It would be best to repeat this test 2–3 more times to confirm the results.

Exploratory Data Analysis to understand drivers of user engagement with emails

Barplots were produced to check visually for a relationship between user attributes and whether or not users clicked on an email.

While running the exploratory data analysis, I noticed that the age was missing for 1,243 users. These users were omitted from the analysis because I had no information from which to impute their ages. Boxplots and numerical summaries were produced to understand any difference in the average age of users that clicked on emails.

It was found that those who clicked on emails (“1”) had, on average, a higher income than those who didn’t. However, both groups have very high standard deviations, so income does not appear to be a useful indicator.

Using statistical modelling for significance testing

The dataset was randomly split into training (70%) and test (30%) sets for modelling. A logistic regression was run to determine which attributes made a statistically significant contribution to explaining whether or not users clicked on an email.
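A random 70/30 split can be sketched in base R with sample(). The data frame below is a stand-in with invented columns, and the seed is arbitrary; it is only there to make the split repeatable.

```r
set.seed(123)  # arbitrary seed so the split can be reproduced

# Hypothetical stand-in for the merged dataset.
n <- 1000
users <- data.frame(user_id = seq_len(n), age = round(runif(n, 18, 70)))

# Draw 70% of the row indices at random, without replacement, for training.
train_idx <- sample(seq_len(n), size = round(0.7 * n))

train_set <- users[train_idx, ]   # 70% of rows
test_set  <- users[-train_idx, ]  # the remaining 30%

print(c(train = nrow(train_set), test = nrow(test_set)))
```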

The model was trained on the training set, and its predictions on the test set were used to assess accuracy. A ROC curve plots the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings, and the AUC is the area under that curve. As a rule of thumb, a model with good predictive ability should have an AUC closer to 1 (the ideal) than to 0.5 (no better than random guessing). In our example, the AUC is 0.84, which indicates fairly good discriminative ability.
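The AUC also has a useful interpretation: it is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The sketch below computes it from ranks in base R. The helper name auc_from_scores is hypothetical, the four labels and scores are invented, and this is not the code the author used to obtain 0.84.

```r
# Hypothetical helper: AUC via the rank-sum (Mann-Whitney) identity.
# labels: 0/1 vector of true outcomes; scores: predicted probabilities.
auc_from_scores <- function(labels, scores) {
  r  <- rank(scores)        # average ranks are used for tied scores
  n1 <- sum(labels == 1)    # number of positives
  n0 <- sum(labels == 0)    # number of negatives
  (sum(r[labels == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Toy example: one of the four positive/negative pairs is ordered wrongly
# (the positive scored 0.35 ranks below the negative scored 0.4).
labels <- c(0, 0, 1, 1)
scores <- c(0.1, 0.4, 0.35, 0.8)
auc <- auc_from_scores(labels, scores)
print(auc)  # 0.75 for this toy input
```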

Though the score is good, it would be worth carrying out some form of cross-validation to validate the results further and ensure reproducibility.
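One common form is k-fold cross-validation, in which the model is refitted k times, each time holding out a different fold for evaluation. The sketch below is a minimal base R version on simulated data; the single age predictor, the coefficients, k = 5 and the 0.5 classification threshold are all assumptions made for illustration.

```r
set.seed(1)

# Simulated data (assumption): click probability increases with age.
n <- 500
sim <- data.frame(age = runif(n, 18, 70))
sim$clicked <- rbinom(n, 1, plogis(-3 + 0.06 * sim$age))

# Randomly assign each row to one of k folds of (nearly) equal size.
k <- 5
fold <- sample(rep(seq_len(k), length.out = n))

# For each fold: fit on the other k - 1 folds, then score the held-out fold.
fold_accuracy <- sapply(seq_len(k), function(i) {
  fit  <- glm(clicked ~ age, data = sim[fold != i, ], family = binomial)
  prob <- predict(fit, newdata = sim[fold == i, ], type = "response")
  # Share of held-out users whose predicted class (threshold 0.5) is correct.
  mean((prob > 0.5) == (sim$clicked[fold == i] == 1))
})

print(fold_accuracy)
print(mean(fold_accuracy))
```

If the per-fold scores are close to each other, the result is less likely to be an artefact of one particular train/test split.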

A summary of the logistic regression model confirms what we saw visually: the top predictors of the likelihood of a user clicking on an email are:

- channel
- age
- gender

My response to the CRM Manager would be that the top predictors of email engagement are age (older users are more likely to click), channel (PPC being popular amongst users that click) and gender (males are more likely to click than females). However, I would like to validate these results on a larger sample to allow for cross-validation.

Final Thoughts

Hopefully, this blog post has demystified A/B testing to some extent, given you some ways to test for statistical significance, and shown you how exploratory data analysis and statistical testing work together to validate results.

Please note that a very small sample size was used in this example (around 4,000 users), so it did not make sense to train and run a complex machine learning algorithm.

I would love your feedback and suggestions. All useful code is provided below and on GitHub for download. :)

https://gist.github.com/shedoesdatascience/de3c5d3c2c88132339347c7da838a126

Translated from: https://towardsdatascience.com/a-b-testing-in-r-ae819ce30656
