A/B Testing in R

What is A/B Testing?

A/B testing is a method used to test whether the response rate is different for two variants of the same feature. For instance, you may want to test whether a specific change to your website like moving the shopping cart button to the top right hand corner of your web page instead of on the right hand panel changes the number of people that click on the shopping cart and buy a product.

A/B testing is also called split testing: two variants of the same web page are shown at the same time to different samples drawn from your population of website visitors. Then, the number of conversions is compared for the two variants. Generally, the variant that gives a higher proportion of conversions is the winning variant.

However, as this is a data science blog, we want to ensure that the difference in proportion of conversions for the two variants is statistically significant. We may also want to understand what attributes of the visitors are driving those conversions. So, let’s move on to your data problem.

The Data Problem

- An A/B test was recently run and the Product Manager of your company wants to know whether the new variant of the web page resulted in more conversions. Make a recommendation to your Product Manager based on your analysis.
- The CRM Manager is interested in knowing how accurately we can predict whether users are likely to engage with our emails based on the attributes we collected about the users when they first visit the website. Report back to the CRM Manager on your findings.

The Dataset

Four datasets are provided.

Visits contains data from 10,000 unique users and has the following columns:
- user_id: unique identifier for the user
- visit_time: timestamp indicating date and time of visit to website
- channel: marketing channel that prompted the user to visit the website
- age: user’s age at time of visiting website
- gender: user’s gender

Email engagement contains data on those users that engaged with a recent email campaign. The file contains the following columns:
- user_id: unique identifier for the user
- clicked_on_email: flag to indicate that the user engaged with the email, where 1 indicates that the user clicked on the email

Variations contains data indicating which variation of the A/B test each user saw. The file has the following columns:
- user_id: unique identifier for the user
- variation: variation (control or treatment) that the user saw

Test conversions contains data on those users that converted as a result of the A/B test. The file contains the following columns:
- user_id: unique identifier for the user
- converted: flag to indicate that the user converted (1 for converted)

Importing the dataset and cleaning

I always start by first combining the files using a primary key or a unique identifier. I then decide what to do with the data. I find this approach useful as I can get rid of what I don’t need later. It also helps me view the dataset on a holistic level.

In this instance, our unique identifier is user_id. After merging the files using the following code,

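# Inner join: keep users present in both the variations and visits files
merge_1 <- merge(variations_df, visits_df, by = "user_id")
# Left joins: users missing from the conversions/engagement files get NA
merge_2 <- merge(merge_1, test_conv_df, by = "user_id", all.x = TRUE)
merge_3 <- merge(merge_2, eng_df, by = "user_id", all.x = TRUE)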

I discovered that I had to create my own binary variables for whether or not a user converted and whether or not they had clicked on an email. Users whose IDs are not found in the test_conversions.csv and email_engagement.csv files come out of the merge with NA in those columns, so I replaced all NAs with 0s.

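library(dplyr)  # provides if_else()
# NA means the user was absent from the conversions/engagement file: code NA as 0, anything else as 1
merge_3$converted <- if_else(is.na(merge_3$converted), 0, 1)
merge_3$clicked_on_email <- if_else(is.na(merge_3$clicked_on_email), 0, 1)
# Store both flags as factors so they are treated as categories, not numbers
merge_3$converted <- as.factor(merge_3$converted)
merge_3$clicked_on_email <- as.factor(merge_3$clicked_on_email)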
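
To see why the left joins leave NA for some users, here is a minimal sketch on made-up data. The data frames `variations_toy` and `conv_toy` and their values are hypothetical; only the column names come from the dataset description. It uses base R `ifelse()` instead of dplyr's `if_else()` so that it runs without extra packages.

```r
# Hypothetical toy data: three users, only user 2 appears in the conversions file
variations_toy <- data.frame(user_id = c(1, 2, 3),
                             variation = c("control", "treatment", "control"))
conv_toy <- data.frame(user_id = 2, converted = 1)

# Left join keeps all three users; users without a conversion row get NA
merged_toy <- merge(variations_toy, conv_toy, by = "user_id", all.x = TRUE)

# Recode NA as 0 (did not convert) and anything else as 1 (converted)
merged_toy$converted <- factor(ifelse(is.na(merged_toy$converted), 0, 1))
print(merged_toy)
```

Users 1 and 3 end up with a 0 and user 2 with a 1, which is the same recoding applied to merge_3 above.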

The next task was to convert variables like visit time into features that describe the users more meaningfully, such as the time of day of the visit.

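# Bucket the hour of each visit: 0-4 night, 5-10 morning, 11-15 afternoon, 16-23 night
# mapvalues() comes from plyr and hour() from lubridate; both packages must be installed
merge_3$timeofday <- plyr::mapvalues(lubridate::hour(merge_3$visit_time), from = 0:23,
               to = c(rep("night", times = 5), rep("morning", times = 6), rep("afternoon", times = 5), rep("night", times = 8)))
merge_3$timeofday <- as.factor(merge_3$timeofday)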
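
The same hour-to-bucket mapping can be sketched in base R with a lookup vector, which avoids the plyr and lubridate dependencies. The timestamps below are made up for illustration, and the UTC time zone is an assumption; the bucket boundaries are the ones used in the code above.

```r
# Hypothetical visit timestamps (UTC is an assumption made for this example)
visit_time <- as.POSIXct(c("2019-01-01 03:15:00", "2019-01-01 09:00:00",
                           "2019-01-01 13:30:00", "2019-01-01 21:45:00"), tz = "UTC")

# Extract the hour (0-23) from each timestamp
hour_of_visit <- as.POSIXlt(visit_time)$hour

# Lookup vector with one label per hour: position 1 is hour 0, position 24 is hour 23
bucket_for_hour <- c(rep("night", 5), rep("morning", 6),
                     rep("afternoon", 5), rep("night", 8))

# Index by hour + 1 because R vectors are 1-based
timeofday <- factor(bucket_for_hour[hour_of_visit + 1])
print(timeofday)
```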

Now that the data had been cleaned, it was time to explore it to understand whether there was an association between user conversion and the variation they saw on the website.

Data Exploration and Visualization

The simplest aspect of the data to check for is to determine whether there is indeed a difference in the proportion of users that converted based on the type of variation they viewed. Running the code provided at the end of the blog post gives the following graph and proportions:

control : 0.20 treatment : 0.24
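
The author's own code for this step lives in the gist linked at the end of the post. As a rough illustration only, conversion proportions per variation can be computed with `table()` and `prop.table()`. The ten rows below are hypothetical, so the resulting rates do not match the 0.20 and 0.24 reported above.

```r
# Hypothetical toy data: five users per variation
toy <- data.frame(
  variation = rep(c("control", "treatment"), each = 5),
  converted = c(1, 0, 0, 0, 0,  1, 1, 0, 0, 0)
)

# Cross-tabulate variation against the conversion flag
counts <- table(toy$variation, toy$converted)

# Row-wise proportions; the column named "1" is the conversion rate per variation
conv_rate <- prop.table(counts, margin = 1)[, "1"]
print(conv_rate)
```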

Statistical testing for significance of A/B Testing

To test whether the difference in proportions is statistically significant, we can either carry out a difference in proportions test or a chi-squared test of independence where the null hypothesis is that there is no association between whether or not a user converted and the type of variation they visited.

For both tests, a p-value < 0.05 was observed indicating a statistically significant difference in proportions.
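
Both tests are available in base R as `prop.test()` and `chisq.test()`. The sketch below uses hypothetical counts (2,000 users per variation) chosen only so that the rates match the 0.20 and 0.24 reported earlier; the real group sizes are not given in this post, so the p-value here is illustrative rather than the author's result.

```r
# Hypothetical counts: 400/2000 = 0.20 for control, 480/2000 = 0.24 for treatment
conversions <- c(control = 400, treatment = 480)
visitors    <- c(control = 2000, treatment = 2000)

# Two-sample test for equality of proportions (continuity correction on by default)
prop_result <- prop.test(x = conversions, n = visitors)

# Chi-squared test of independence on the 2x2 table of converted vs not converted
conv_table   <- rbind(converted = conversions,
                      not_converted = visitors - conversions)
chisq_result <- chisq.test(conv_table)

print(prop_result$p.value)
print(chisq_result$p.value)
```

For a 2x2 table the two tests are equivalent, so they report the same p-value; running both is a consistency check rather than two independent pieces of evidence.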

I went a step further and ran logistic regression to understand how the other attributes of the users contributed to the difference in proportions. Only the type of variation and income (p-values less than 0.05) appeared to contribute to the difference in conversion proportions. McFadden’s pseudo R-squared comes out at only 12.94%, which tells us that the variation type and user attributes provided within our dataset explain only a small part of the variation in conversion. Hence, my response to the Product Manager would be as follows:

There is a statistically significant difference in conversion rates for those that visited the treatment variation vs the control variation. However, it is difficult to understand why this is the case. It would be best to repeat this test 2–3 more times to cross-validate results.
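
The logistic regression and McFadden’s R-squared mentioned above can be sketched as follows. This is a minimal illustration on simulated data, not the author's model: the data frame `sim`, the seed, and the effect sizes are all assumptions, and only two predictors are included.

```r
set.seed(42)  # arbitrary seed so the simulated data is reproducible
n <- 1000

# Simulated users (hypothetical): random variation assignment and age
sim <- data.frame(
  variation = factor(sample(c("control", "treatment"), n, replace = TRUE)),
  age = round(runif(n, 18, 70))
)

# Simulated outcome: treatment raises the log-odds of converting, age has no effect
sim$converted <- rbinom(n, size = 1,
                        prob = plogis(-1.4 + 0.25 * (sim$variation == "treatment")))

# Full model and intercept-only (null) model
fit      <- glm(converted ~ variation + age, data = sim, family = binomial)
null_fit <- glm(converted ~ 1, data = sim, family = binomial)

# McFadden's pseudo R-squared: 1 - logLik(full) / logLik(null)
mcfadden <- 1 - as.numeric(logLik(fit)) / as.numeric(logLik(null_fit))

print(summary(fit)$coefficients)  # estimates, standard errors and p-values
print(mcfadden)
```

A value near 0 means the predictors add little over simply predicting the overall conversion rate for everyone.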

Exploratory Data Analysis to understand drivers of user engagement with emails

Barplots were produced to check for a visual relationship between user attributes and whether or not they clicked on an email.

While running the exploratory data analysis, I noticed that the age was missing for 1,243 users. These users were omitted from analysis as I cannot impute their ages without any knowledge. Boxplots and numerical summaries were produced to understand any difference in average age of users that clicked on emails.
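
As a small illustration of this step, the sketch below counts missing ages, drops those rows, and compares the mean age of the two groups. The six rows are hypothetical, and the name `age_known` is introduced here only for the example.

```r
# Hypothetical toy data with two missing ages
toy <- data.frame(
  age = c(25, NA, 41, 33, NA, 58),
  clicked_on_email = factor(c(0, 0, 1, 0, 1, 1))
)

# How many users have no recorded age?
n_missing <- sum(is.na(toy$age))

# Keep only users whose age is known
age_known <- toy[!is.na(toy$age), ]

# Mean age for users who did not click (0) and users who clicked (1)
mean_age_by_click <- tapply(age_known$age, age_known$clicked_on_email, mean)

print(n_missing)
print(mean_age_by_click)
```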

It was found that those that clicked on emails (“1”) on average had higher income than those that didn’t. However, both groups have very high standard deviations, thus income does not appear to be a useful indicator.

Using statistical modelling for significance testing

The dataset was randomly split into training (70%) and test (30%) sets for modelling. Logistic regression was run to determine which attributes had a statistically significant contribution in explaining whether users clicked or did not click on an email.
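
A random 70/30 split can be sketched with `sample()` in base R. The toy data frame and the seed below are assumptions made for the example; the author's actual split is in the linked gist.

```r
set.seed(123)  # arbitrary seed so the split is reproducible

# Hypothetical toy data: 100 users with a simulated click flag
toy <- data.frame(user_id = 1:100,
                  clicked_on_email = rbinom(100, 1, 0.3))

# Draw 70% of the row indices at random, without replacement
train_idx <- sample(seq_len(nrow(toy)), size = round(0.7 * nrow(toy)))

train_set <- toy[train_idx, ]   # rows used to fit the model
test_set  <- toy[-train_idx, ]  # held-out rows used to evaluate it

print(c(train = nrow(train_set), test = nrow(test_set)))
```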

The model was trained on the training set and predictions were made on the test set to assess accuracy. An ROC curve was generated by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings. The AUC is the area under the ROC curve. As a rule of thumb, a model with good predictive ability should have an AUC closer to 1 (the ideal) than to 0.5 (no better than random guessing). In our example, we have an AUC of 0.84, which indicates good discriminative ability.
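
To make the AUC concrete, here is a minimal base R sketch that computes it from predicted probabilities using the rank-sum (Mann-Whitney) formulation. This is not necessarily how the author computed it, and the four predictions and labels are hypothetical.

```r
# Hypothetical predicted probabilities and true labels for four test-set users
predicted <- c(0.1, 0.4, 0.35, 0.8)
actual    <- c(0, 0, 1, 1)

n_pos <- sum(actual == 1)  # number of users who clicked
n_neg <- sum(actual == 0)  # number of users who did not click

# AUC is the probability that a randomly chosen positive is scored higher than
# a randomly chosen negative, obtained here from the rank sum of the positives
auc <- (sum(rank(predicted)[actual == 1]) - n_pos * (n_pos + 1) / 2) /
  (n_pos * n_neg)
print(auc)
```

In this toy case three of the four positive-negative pairs are ordered correctly, so the AUC is 0.75.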

Though the score is good, it would be good to carry out some form of cross-validation to validate the results further and ensure reproducibility.

A summary of the logistic regression model confirms what we saw visually: the top predictors of the likelihood of a user clicking on an email are:

- channel
- age
- gender

My response to the CRM Manager would be that the top predictors of email conversion are age (older users are more likely to click), channel (PPC being popular amongst users that click) and gender (males are more likely to click than females). However, I would like to validate these results via a larger sample to allow for cross-validation.

Final Thoughts

Hopefully, this blog post has demystified A/B testing to some extent, given you some ways to test for statistical significance and shown you how exploratory data analysis and statistical testing work together to validate results.

Please note that a very small sample size was used in this example (around 4000 users) and as such it did not make sense to run and train a complex machine learning algorithm.

I would love your feedback and suggestions. All useful code is provided below and on GitHub for download. :)

https://gist.github.com/shedoesdatascience/de3c5d3c2c88132339347c7da838a126

Translated from: https://towardsdatascience.com/a-b-testing-in-r-ae819ce30656
