A/B Testing in R
What is A/B Testing?
A/B testing is a method for testing whether the response rate differs between two variants of the same feature. For instance, you may want to test whether a specific change to your website, such as moving the shopping cart button from the right-hand panel to the top right-hand corner of the page, changes the number of people who click on the shopping cart and buy a product.
A/B testing is also called split testing: two variants of the same web page are shown, at the same time, to different samples drawn from your population of website visitors. The number of conversions is then compared for the two variants. Generally, the variant that gives the higher proportion of conversions is the winning variant.
However, as this is a data science blog, we want to ensure that the difference in the proportion of conversions between the two variants is statistically significant. We may also want to understand which attributes of the visitors are driving those conversions. So, let’s move on to your data problem.
The Data Problem
- An A/B test was recently run and the Product Manager of your company wants to know whether the new variant of the web page resulted in more conversions. Make a recommendation to your Product Manager based on your analysis.
- The CRM Manager is interested in knowing how accurately we can predict whether users are likely to engage with our emails, based on the attributes we collected about the users when they first visited the website. Report back to the CRM Manager on your findings.
The Dataset
Four datasets are provided.
- Visits contains data from 10,000 unique users and has the following columns:
  - user_id: unique identifier for the user
  - visit_time: timestamp indicating the date and time of the visit to the website
  - channel: marketing channel that prompted the user to visit the website
  - age: user’s age at the time of visiting the website
  - gender: user’s gender
- Email engagement contains data on those users that engaged with a recent email campaign. The file contains the following columns:
  - user_id: unique identifier for the user
  - clicked_on_email: flag indicating that the user engaged with the email, where 1 indicates that the user clicked on the email
- Variations contains data indicating which variation of the A/B test each user saw. The file has the following columns:
  - user_id: unique identifier for the user
  - variation: variation (control or treatment) that the user saw
- Test conversions contains data on those users that converted as a result of the A/B test. The file contains the following columns:
  - user_id: unique identifier for the user
  - converted: flag indicating that the user converted (1 for converted)
Importing the dataset and cleaning
I always start by first combining the files using a primary key or a unique identifier. I then decide what to do with the data. I find this approach useful as I can get rid of what I don’t need later. It also helps me view the dataset on a holistic level.
In this instance, our unique identifier is user_id. I merged the files using the following code:

    # Keep only users present in both the variations and visits files
    merge_1 <- merge(variations_df, visits_df, by = "user_id")
    # Left joins: users with no conversion or email record get NA in those columns
    merge_2 <- merge(merge_1, test_conv_df, by = "user_id", all.x = TRUE)
    merge_3 <- merge(merge_2, eng_df, by = "user_id", all.x = TRUE)

After merging, I discovered that I had to create my own binary variables for whether or not a user converted and whether or not they had clicked on an email. Users who did not convert or click simply do not appear in the test_conversions.csv and email_engagement.csv files, so their values come through the merge as NA. I handled this by replacing all NAs with 0s.
    library(dplyr)  # provides if_else()

    # NA means the user was absent from the source file: recode NA as 0, anything else as 1
    merge_3$converted <- if_else(is.na(merge_3$converted), 0, 1)
    merge_3$clicked_on_email <- if_else(is.na(merge_3$clicked_on_email), 0, 1)
    # Treat both flags as categorical outcomes
    merge_3$converted <- as.factor(merge_3$converted)
    merge_3$clicked_on_email <- as.factor(merge_3$clicked_on_email)

The next task was to convert variables such as visit time into features that say something meaningful about the users.
    # Bucket the hour of each visit (0-23) into a time-of-day label.
    # mapvalues() comes from plyr and hour() from lubridate.
    merge_3$timeofday <- plyr::mapvalues(lubridate::hour(merge_3$visit_time), from = 0:23,
                   to = c(rep("night", 5), rep("morning", 6), rep("afternoon", 5), rep("night", 8)))
    merge_3$timeofday <- as.factor(merge_3$timeofday)

Now that the data had been cleaned, it was time to explore it to understand whether there was an association between user conversion and the variation users saw on the website.
Data Exploration and Visualization
The simplest thing to check is whether the proportion of users who converted actually differs by the variation they viewed. Running the code provided at the end of the blog post gives the following graph and proportions:
control : 0.20 treatment : 0.24
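As a rough illustration of how such proportions can be computed, here is a minimal sketch in base R. The data frame `toy` and its values are made up for the example and merely stand in for the merged dataset; only the column names `variation` and `converted` come from the post.

```r
# Hypothetical stand-in for merge_3: one row per user, with the variation seen
# and a 0/1 conversion flag (values are illustrative only).
toy <- data.frame(
  variation = c(rep("control", 5), rep("treatment", 5)),
  converted = c(1, 0, 0, 0, 0, 1, 1, 0, 0, 0)
)

# Cross-tabulate variation against conversion, then take row-wise proportions
counts <- table(toy$variation, toy$converted)
props  <- prop.table(counts, margin = 1)

# The conversion rate per variation is the column for converted == 1
conversion_rate <- props[, "1"]
print(round(conversion_rate, 2))
```

Applied to the real data, the same two calls on `merge_3$variation` and `merge_3$converted` should yield the control and treatment rates quoted above.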
Statistical testing for significance of A/B Testing
To test whether the difference in proportions is statistically significant, we can either carry out a difference in proportions test or a chi-squared test of independence where the null hypothesis is that there is no association between whether or not a user converted and the type of variation they visited.
For both tests, a p-value < 0.05 was observed indicating a statistically significant difference in proportions.
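Both tests are available in base R as `prop.test()` and `chisq.test()`. The sketch below shows how they might be called. The group sizes are not reported in the post, so the counts here are hypothetical, chosen only so that the rates match the observed 0.20 and 0.24.

```r
# Hypothetical counts chosen to match the observed rates (0.20 vs 0.24);
# the real group sizes are not given in the post.
conversions <- c(control = 400, treatment = 480)
visitors    <- c(control = 2000, treatment = 2000)

# Two-sample test for equality of proportions
prop_res <- prop.test(x = conversions, n = visitors)

# Chi-squared test of independence on the 2x2 table of converted / not converted
tab <- rbind(converted = conversions, not_converted = visitors - conversions)
chi_res <- chisq.test(tab)

print(prop_res$p.value)
print(chi_res$p.value)
```

With real data, the 2x2 table can be built directly with `table(merge_3$variation, merge_3$converted)` and passed to `chisq.test()`.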
I went a step further and ran logistic regression to understand how the other attributes of the users contributed to the difference in proportions. Only the type of variation and income (p-values less than 0.05) appeared to contribute to the difference in conversion proportions. A calculation of McFadden’s R-squared tells us that only 12.94% of the variation in proportions can be explained by the variation type and user attributes provided within our dataset. Hence, my response to the Product Manager would be as follows:
There is a statistically significant difference in conversion rates for those that visited the treatment variation vs the control variation. However, it is difficult to understand why this is the case. It would be best to repeat this test 2–3 more times to cross-validate results.
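For readers who want to reproduce the logistic regression step, the following is a minimal sketch on simulated data. The data frame `sim`, its values, and the choice of predictors are assumptions for illustration, not the author's actual model. McFadden's R-squared is computed as one minus the ratio of the fitted model's log-likelihood to that of an intercept-only model.

```r
set.seed(42)

# Simulated stand-in for merge_3 (illustrative values only)
n <- 2000
sim <- data.frame(
  variation = factor(sample(c("control", "treatment"), n, replace = TRUE)),
  age       = round(runif(n, 18, 70))
)
# Conversion probability depends mildly on the variation seen
p <- ifelse(sim$variation == "treatment", 0.24, 0.20)
sim$converted <- rbinom(n, size = 1, prob = p)

# Logistic regression of conversion on variation and a user attribute
fit  <- glm(converted ~ variation + age, data = sim, family = binomial)
null <- glm(converted ~ 1, data = sim, family = binomial)

# McFadden's pseudo R-squared: 1 - logLik(full) / logLik(null)
mcfadden_r2 <- 1 - as.numeric(logLik(fit)) / as.numeric(logLik(null))

print(summary(fit)$coefficients)
print(mcfadden_r2)
```

The coefficient table gives a p-value per predictor, which is how one would judge which attributes contribute to the difference in conversion.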
Exploratory Data Analysis to understand drivers of user engagement with emails
Barplots were produced to check for a visual relationship between user attributes and whether or not they clicked on an email.



While running the exploratory data analysis, I noticed that age was missing for 1,243 users. These users were omitted from the analysis because I cannot impute their ages without further information. Boxplots and numerical summaries were produced to understand any difference in the average age of users that clicked on emails.
It was found that those that clicked on emails (“1”) on average had higher income than those that didn’t. However, both groups have very high standard deviations, thus income does not appear to be a useful indicator.
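The omission of users with missing age and the per-group summaries could be sketched as follows. The data frame `toy` and its values are invented for the example; the same pattern would apply to any numeric attribute in the merged dataset.

```r
# Hypothetical stand-in for merge_3 with some missing ages (illustrative values)
toy <- data.frame(
  age              = c(25, NA, 40, 35, NA, 52, 31, 47),
  clicked_on_email = factor(c(0, 0, 1, 0, 1, 1, 0, 1))
)

# Drop users whose age is missing rather than imputing it
complete <- toy[!is.na(toy$age), ]

# Mean and standard deviation of age by click status
age_summary <- aggregate(age ~ clicked_on_email, data = complete,
                         FUN = function(x) c(mean = mean(x), sd = sd(x)))
print(age_summary)

# Five-number summaries behind a boxplot; set plot = TRUE to draw it
bp <- boxplot(age ~ clicked_on_email, data = complete, plot = FALSE)
print(bp$stats)
```

Comparing each group's mean against its standard deviation is what supports a conclusion like the one above: a gap in means says little when the spread within each group is large.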
Using statistical modelling for significance testing
The dataset was randomly split into training (70%) and test (30%) sets for modelling. Logistic regression was run to determine which attributes had a statistically significant contribution in explaining whether users clicked or did not click on an email.
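A random 70/30 split of this kind might look like the sketch below. The simulated data frame `sim`, the seed, and the predictors are assumptions made for the example.

```r
set.seed(123)  # fix the seed so the split is reproducible

# Simulated stand-in for the cleaned dataset (illustrative values only)
n <- 1000
sim <- data.frame(
  age              = round(runif(n, 18, 70)),
  gender           = factor(sample(c("F", "M"), n, replace = TRUE)),
  clicked_on_email = rbinom(n, 1, 0.3)
)

# Randomly pick 70% of the row indices for training; the rest form the test set
train_idx <- sample(seq_len(n), size = round(0.7 * n))
train <- sim[train_idx, ]
test  <- sim[-train_idx, ]

# Fit the logistic regression on the training rows only
fit <- glm(clicked_on_email ~ age + gender, data = train, family = binomial)
print(summary(fit)$coefficients)
```

Keeping the test rows out of the fit is what makes the later accuracy and AUC figures an honest estimate of performance on unseen users.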
The model was trained on the training set, and predictions were made on the test set to assess accuracy. An ROC curve was generated by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings. The AUC is the area under the ROC curve. As a rule of thumb, a model with good predictive ability should have an AUC closer to 1 (the ideal) than to 0.5 (no better than random guessing). In our example, the AUC is 0.84, which indicates that the model separates users who clicked from those who did not reasonably well.
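To make the TPR, FPR, and AUC definitions concrete, here is a small base R sketch on toy labels and scores (all values invented). Instead of integrating the ROC curve, it computes the AUC through the equivalent rank-sum (Mann-Whitney) identity. The helper `auc_rank` is written for this example and is not taken from the post or from any package.

```r
# AUC via the rank-sum (Mann-Whitney) identity: the probability that a randomly
# chosen positive is scored above a randomly chosen negative.
auc_rank <- function(labels, scores) {
  r     <- rank(scores)              # average ranks handle ties
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy labels and predicted probabilities (illustrative values only)
labels <- c(0, 0, 1, 0, 1, 1)
scores <- c(0.10, 0.30, 0.35, 0.60, 0.70, 0.90)

# TPR and FPR at a single threshold, i.e. one point on the ROC curve
threshold <- 0.5
pred <- as.integer(scores >= threshold)
tpr  <- sum(pred == 1 & labels == 1) / sum(labels == 1)
fpr  <- sum(pred == 1 & labels == 0) / sum(labels == 0)

print(c(tpr = tpr, fpr = fpr))
print(auc_rank(labels, scores))
```

Sweeping `threshold` from 0 to 1 and plotting the resulting (FPR, TPR) pairs traces out the ROC curve itself.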

Although the score is good, some form of cross-validation would help to validate the results further and ensure reproducibility.
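One common form of cross-validation is k-fold, sketched below in base R on simulated data. The data, the single `age` predictor, the choice of five folds, and the 0.5 classification threshold are all assumptions for the example.

```r
set.seed(2024)

# Simulated stand-in for the modelling dataset (illustrative values only)
n <- 500
sim <- data.frame(age = round(runif(n, 18, 70)))
sim$clicked_on_email <- rbinom(n, 1, plogis(-3 + 0.05 * sim$age))

# Assign each row to one of k folds at random
k <- 5
fold <- sample(rep(seq_len(k), length.out = n))

# For each fold: train on the other k - 1 folds, score the held-out fold
accuracy <- sapply(seq_len(k), function(i) {
  fit  <- glm(clicked_on_email ~ age, data = sim[fold != i, ], family = binomial)
  prob <- predict(fit, newdata = sim[fold == i, ], type = "response")
  mean((prob >= 0.5) == (sim$clicked_on_email[fold == i] == 1))
})

print(round(accuracy, 3))
print(mean(accuracy))
```

If the per-fold scores are close to each other, that is some evidence the single train/test result was not a lucky split.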
A summary of the logistic regression model confirms what we saw visually: the top predictors of the likelihood of a user clicking on an email are:
- channel
- age
- gender
My response to the CRM Manager would be that the top predictors of email conversion are age (older users are more likely to click), channel (PPC being popular amongst users that click) and gender (males are more likely to click than females). However, I would like to validate these results via a larger sample to allow for cross-validation.
Final Thoughts
Hopefully, this blog post has demystified A/B testing to some extent, given you some ways to test for statistical significance and shown you how exploratory data analysis and statistical testing work together to validate results.
Please note that a very small sample was used in this example (around 4,000 users), so it did not make sense to train a complex machine learning algorithm.
I would love your feedback and suggestions. All useful code is provided below and on GitHub for download. :)
https://gist.github.com/shedoesdatascience/de3c5d3c2c88132339347c7da838a126
Source: https://towardsdatascience.com/a-b-testing-in-r-ae819ce30656