Big Data A/B Testing
Hello everyone!
I am back with another article about data science. In this article, I explain what A/B testing is and how to use it on a real-life data set to compare two advertising methods.
What is A/B Testing and where do we use it?
A/B testing is a method for comparing two versions of something. It is very popular and is used by big companies such as Facebook, Google, Amazon and AliExpress as well as by many smaller ones. With this method, we can decide on, for example, the following:
- Button shape, size or color
- Which advertisement strategy is better
- Which email format is better
- Which website design is better
- Which headline is better, etc.
So let us apply this testing method to a real-life situation.
Case Study
A big social platform recently introduced an advertisement method with a new bidding type, “average bidding”, as an alternative to its existing bidding type, “maximum bidding”. One of its clients has decided to test this new feature and wants to conduct an A/B test to understand whether average bidding brings more conversions than maximum bidding.
In this A/B test, the client randomly splits its audience into two equally sized groups: the test group and the control group. The existing ad campaign with “maximum bidding” is served to the control group, and the new campaign with “average bidding” is served to the test group. The A/B test has run for 1 month, and now the client wants to analyze and present its results.
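The random, equally sized split described above can be sketched in a few lines. This is only an illustration: the function name `split_audience`, the integer user IDs and the seed are assumptions, not details of how the platform actually assigns its audience.

```python
import random


def split_audience(user_ids, seed=0):
    """Randomly split user IDs into two equally sized groups (hypothetical helper)."""
    shuffled = list(user_ids)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    half = len(shuffled) // 2
    control = shuffled[:half]
    test = shuffled[half:2 * half]  # one leftover ID is dropped if the count is odd
    return control, test


# Made-up audience of 1000 user IDs
control_group, test_group = split_audience(range(1000), seed=42)
print(len(control_group), len(test_group))
```

Shuffling once and cutting the list in half keeps the groups disjoint and the same size, which is what the comparison later relies on.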
To understand the data variables better, we need to understand the customer journey for the campaign. The steps of the journey, in order, are:
1. User sees an ad (Impression)
2. User clicks on the website link in the ad (Website Click)
3. User makes a search on the website (Search)
4. User views details of a product (View Content)
5. User adds the product to the cart (Add to Cart)
6. User purchases the product (Purchase)
Data Understanding
The variables are the same in both groups. There are 10 variables (columns) in total: “Campaign Name”, “Date”, “Spend [USD]”, “# of Impressions” (the number of times an ad is displayed), “Reach” (the number of unique people who saw an ad), “# of Website Clicks” (the number of clicks on ad links directed to the advertiser’s website), “# of Searches”, “# of View Content”, “# of Add to Cart” and “# of Purchase”. Both groups have the same number of observations (30 rows each).
The most important success metric for the client is “# of Purchase”.
Other widely used metrics for comparison are CTR (click-through rate), CPA (cost per action) and CR (conversion rate):
Click-Through Rate: Number of Website Clicks / Number of Impressions
Cost per Action: Spend / Number of Actions
Conversion Rate: Number of Actions / Number of Website Clicks
Action: any conversion event, such as Search, View Content, Add to Cart or Purchase.
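As a small illustration of the three formulas above, here is a sketch in Python. The totals are made-up numbers chosen for easy arithmetic, not values from the case-study data, and purchases are used as the action.

```python
def click_through_rate(clicks, impressions):
    """CTR = number of website clicks / number of impressions."""
    return clicks / impressions


def cost_per_action(spend, actions):
    """CPA = spend / number of actions."""
    return spend / actions


def conversion_rate(actions, clicks):
    """CR = number of actions / number of website clicks."""
    return actions / clicks


# Hypothetical campaign totals (assumed for illustration only)
spend = 2500.0
impressions = 100_000
clicks = 5_000
purchases = 500

ctr = click_through_rate(clicks, impressions)
cpa = cost_per_action(spend, purchases)
cr = conversion_rate(purchases, clicks)
print(f"CTR={ctr:.3f}  CPA={cpa:.2f} USD  CR={cr:.3f}")
```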
Data Pre-processing
Analyzing the data set shows the following.
The control group has one row with missing (NA) values. This row has only Date and Spend values; all other values are missing.
Each missing value is replaced with the mean of its own variable.
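Mean imputation of this kind could look like the sketch below. The small `control` DataFrame is invented for illustration (it is not the real data), and pandas is an assumption about tooling; only the idea of filling each column's NA with that column's mean comes from the text.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the control group; the last row has only a Spend value
control = pd.DataFrame({
    "Spend [USD]": [2280.0, 1757.0, 2343.0, 1835.0],
    "# of Impressions": [82529.0, 121040.0, 107797.0, np.nan],
    "# of Purchase": [618.0, 511.0, 372.0, np.nan],
})

# Fill every missing numeric value with the mean of its own column
numeric_cols = control.select_dtypes(include="number").columns
control[numeric_cols] = control[numeric_cols].fillna(control[numeric_cols].mean())

print(control)
```

`DataFrame.mean()` skips NA values by default, so the column means are computed from the complete rows only.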
The “# of Purchase” variable of the control group has no outliers. The test group has no missing values, but it has one outlier in “# of Purchase”.
This value is capped by assigning the upper bound to it. The upper bound is found with the box-plot method.
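A minimal sketch of capping an outlier at the box-plot upper bound follows. It assumes the conventional whisker rule, upper bound = Q3 + 1.5 × IQR, which the article does not spell out; the purchase counts are made up.

```python
import numpy as np


def cap_upper_outliers(values):
    """Cap values above Q3 + 1.5 * IQR at that bound (assumed box-plot rule)."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    upper_bound = q3 + 1.5 * (q3 - q1)
    return np.minimum(values, upper_bound), upper_bound


# Made-up purchase counts with one obvious outlier
purchases = [480, 510, 495, 470, 505, 490, 1200]
capped, upper = cap_upper_outliers(purchases)
print(upper)
print(capped)
```

Capping keeps the row in the data set instead of dropping it, which matters here because each group has only 30 observations.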
Now the test group has no outliers either. The data set is ready for hypothesis testing.
· Group A: Existing method: Maximum Bidding (Control Group)
· Group B: New method: Average Bidding (Test Group)
Hypothesis Testing
The control group has 507 purchases on average and the test group has 481, so the control group has more purchases on average. However, we need to check whether this difference is statistically significant, which calls for a hypothesis test. As we have two independent sample groups, we can use a t-test. A t-test is a statistical method used to determine whether there is a significant difference between the means of two groups based on a sample of data. The common assumptions made when doing a t-test are:
1. Normality of data distribution
2. Equality of variances
1) Checking the 1st Assumption:
The Shapiro-Wilk test is applied to check the first assumption. The hypotheses are constructed as follows:
H0: The data is normally distributed
H1: The data is not normally distributed
The results of the test are as follows:
· The p-value of group A is 0.979, which is > 0.05, so we fail to reject H0; there is no evidence that group A deviates from a normal distribution.
· The p-value of group B is 0.776, which is > 0.05, so we fail to reject H0 for group B as well.
Based on the Shapiro-Wilk test, the normality assumption is treated as satisfied.
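The normality check can be sketched with `scipy.stats.shapiro`. The samples below are synthetic draws that only borrow the group means from the text (the spread of 80 is an assumption), so the printed p-values will not match the 0.979 and 0.776 reported above; `check_normality` is a hypothetical helper name.

```python
import numpy as np
from scipy import stats


def check_normality(sample, alpha=0.05):
    """Return the Shapiro-Wilk p-value and whether H0 (normality) is kept."""
    result = stats.shapiro(sample)
    p_value = float(result.pvalue)
    return p_value, p_value > alpha


# Synthetic stand-ins for the 30 observations per group (not the real data)
rng = np.random.default_rng(42)
group_a = rng.normal(loc=507, scale=80, size=30)
group_b = rng.normal(loc=481, scale=80, size=30)

for name, sample in [("A", group_a), ("B", group_b)]:
    p_value, keep_h0 = check_normality(sample)
    decision = "fail to reject H0" if keep_h0 else "reject H0"
    print(f"Group {name}: p = {p_value:.3f} -> {decision}")
```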
2) Checking the 2nd Assumption:
Levene’s test is applied to check the second assumption. The hypotheses are constructed as follows:
H0: The variances are equal (homogeneous)
H1: The variances are unequal (non-homogeneous)
The p-value is 0.11, which is > 0.05, so we fail to reject H0 and treat the variances of the two groups as equal.
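A sketch of the variance check with `scipy.stats.levene` is shown below. The two tiny samples are invented so the outcome is easy to reason about: the second sample is the first one shifted by 10, so the spreads are identical. The helper name `check_equal_variances` is an assumption.

```python
from scipy import stats


def check_equal_variances(sample_a, sample_b, alpha=0.05):
    """Return Levene's p-value and whether H0 (equal variances) is kept."""
    result = stats.levene(sample_a, sample_b)
    p_value = float(result.pvalue)
    return p_value, p_value > alpha


# Made-up samples with the same spread but different means
same_spread_a = [1.0, 2.0, 3.0, 4.0, 5.0]
same_spread_b = [11.0, 12.0, 13.0, 14.0, 15.0]

p_value, keep_h0 = check_equal_variances(same_spread_a, same_spread_b)
print(f"p = {p_value:.3f}, equal variances assumed: {keep_h0}")
```

Levene's test compares absolute deviations from each group's center, so a pure shift in the mean does not affect it.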
As the normality and equal-variance assumptions are satisfied, we can use the independent two-sample t-test to test the hypothesis.
For the two-sample t-test, the hypotheses are as follows:
H0: There is no statistically significant difference between the control and test groups with respect to the average Number of Purchases (μ1 = μ2)
H1: There is a statistically significant difference between the control and test groups with respect to the average Number of Purchases (μ1 ≠ μ2)
After the test is applied, the p-value is greater than the 0.05 significance level. Therefore, we fail to reject H0: there is no statistically significant difference between the control and test groups with respect to the average Number of Purchases.
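The independent two-sample t-test is available as `scipy.stats.ttest_ind`, which is also the function listed in the references. The sketch below runs it on synthetic samples (means taken from the text, spread of 80 assumed), so its p-value is illustrative only; `compare_means` is a hypothetical wrapper.

```python
import numpy as np
from scipy import stats


def compare_means(sample_a, sample_b, alpha=0.05):
    """Two-sided independent t-test with equal variances assumed."""
    result = stats.ttest_ind(sample_a, sample_b, equal_var=True)
    p_value = float(result.pvalue)
    return p_value, p_value < alpha  # True means "reject H0"


# Synthetic stand-ins for the purchase counts of each group (not the real data)
rng = np.random.default_rng(0)
control_purchases = rng.normal(loc=507, scale=80, size=30)
test_purchases = rng.normal(loc=481, scale=80, size=30)

p_value, reject_h0 = compare_means(control_purchases, test_purchases)
print(f"p = {p_value:.3f}, reject H0: {reject_h0}")
```

`equal_var=True` matches the outcome of Levene's test above; with unequal variances, `equal_var=False` would run Welch's t-test instead.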
Analyzing Other Metrics by Hypothesis Testing
More metrics have been tested to see whether the new method is better. The metrics analyzed are click-through rate (CTR), cost per action (CPA) and conversion rate (CR). When comparing rates in two independent groups, the independent two-sample proportion test is used.
1) CTR:
CTR = Number of Website Clicks / Number of Impressions
The proportions are calculated for both groups. The control group’s CTR is 0.045 and the test group’s CTR is 0.044. There is a slight difference between the two values, and the control group has the higher rate. Let’s see if this difference is statistically significant. The hypotheses are constructed as follows:
H0: There is no statistically significant difference between the control and test group CTRs.
H1: There is a statistically significant difference between the control and test group CTRs.
The p-value is smaller than 0.05, so we reject the null hypothesis: there is a statistically significant difference between the CTRs of the two bidding methods, and it is in favor of the control group (the existing “max bidding” method).
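A two-sample proportion test can be written by hand as a pooled z-test, as in the sketch below. It uses only the standard library, the function name `two_proportion_ztest` is an assumption, and the counts are hypothetical totals that merely reproduce CTRs of 0.045 and 0.044. The article's real impression counts are not given here, so this p-value is not the one reported above.

```python
import math


def two_proportion_ztest(success_a, total_a, success_b, total_b):
    """Two-sided z-test for the difference of two proportions (pooled variance)."""
    p_a = success_a / total_a
    p_b = success_b / total_b
    pooled = (success_a + success_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_a - p_b) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical click and impression totals per group
z_stat, p_value = two_proportion_ztest(4_500, 100_000, 4_400, 100_000)
print(f"z = {z_stat:.3f}, p = {p_value:.3f}")
```

With a fixed difference in rates, the p-value shrinks as the number of impressions grows, which is why a gap as small as 0.001 can still be significant on a large audience.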
2) CPA:
CPA = Spend / Number of Actions = Spend / Number of Purchases
The number of actions is taken as the number of purchases in this calculation.
The values are calculated for both groups:
The control group’s CPA is 5, meaning that one purchase cost us 5 USD on average. The test group’s CPA is 5.23, meaning that one purchase cost us 5.23 USD on average.
The hypotheses are constructed as follows:
H0: There is no statistically significant difference between the control and test group CPA values.
H1: There is a statistically significant difference between the control and test group CPA values.
The p-value is smaller than 0.05. We reject H0, meaning that there is a statistically significant difference between the two groups’ CPA values.
We can say that the existing method, max bidding, is better than the new one because its cost per purchase is lower. So we still cannot say that the new method is better than the existing one. Lastly, CR has been tested.
3) CR:
Conversion Rate: Number of Actions / Number of Website Clicks
Three different ratios have been analyzed, each with “# of Purchase” as the numerator.
Firstly, the proportions (“# of Purchase” / “# of Website Clicks”) are found as follows:
Control CR = 0.098
Test CR = 0.099
The hypotheses are constructed as follows:
H0: There is no statistically significant difference between the control and test group CR values.
H1: There is a statistically significant difference between the control and test group CR values.
As a result, we fail to reject the null hypothesis, meaning that there is no statistically significant difference between the control and test groups with respect to the first CR.
Secondly, the proportions (“# of Purchase” / “# of Add to Cart”) are found as follows:
Control CR2 = 0.512
Test CR2 = 0.588
This result shows that items added to the cart are purchased more often in the test group. Let’s see if this difference is significant.
The hypotheses are constructed as follows:
H0: There is no statistically significant difference between the control and test group CR2 values.
H1: There is a statistically significant difference between the control and test group CR2 values.
The p-value is lower than 0.05. We reject H0, meaning that there is a statistically significant difference between the two groups’ CR2 values, and the difference is in favor of the new method.
This is the first result that favors the new method.
Let’s check the final CR:
Lastly, the proportions (“# of Purchase” / “# of View Content”) are found as: Control CR3 = 0.334 and Test CR3 = 0.345
The test group has the higher rate, but let’s see if the difference is significant by applying hypothesis testing:
H0: There is no statistically significant difference between the control and test group CR3 values.
H1: There is a statistically significant difference between the control and test group CR3 values.
The p-value (0.567) is greater than 0.05. We fail to reject H0, meaning that there is no statistically significant difference between the two groups. We cannot conclude anything here.
Results
Based on my analysis, I would recommend that the client collect data for a few more months before drawing any conclusions. If that is not possible (no time, no budget, etc.), I do not recommend using the new bidding type (average bidding), because:
- The average spend of the new method is higher than that of the existing one. There is no need to spend more when there is no significant difference in purchases between the two methods.
- The average # of purchases is lower with the new method (481 vs. 507), although this difference is not statistically significant.
- The # of clicks is lower with the new method.
As a result, the new bidding type does not bring more conversions than the existing one (max bidding). I recommend that the client continue with the existing bidding type if a decision must be made now.
In this article, I gave a brief introduction to A/B testing and where it is used in real life. I also applied it to a real data set to compare two advertising methods. I hope this article helps you understand A/B testing and its applications. The full code is available on my GitHub account:
https://github.com/bsrymn/AB-Test/blob/master/CaseStudyAB-Test_BusraYaman.ipynb
See you in my next articles!
REFERENCES
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html
- Veri Bilimi Okulu (Data Science School) Class Notes
https://hbr.org/2017/06/a-refresher-on-ab-testing
https://medium.com/@ng.dasci/ger%C3%A7ek-verilerle-ab-testi-uygulamas%C4%B1-yeni-reklam-teklif-y%C3%B6ntemi-sat%C4%B1n-alma-say%C4%B1s%C4%B1n%C4%B1-artt%C4%B1rd%C4%B1-m%C4%B1-f9cdd45cdb21
Translated from: https://medium.com/swlh/a-b-test-application-on-real-data-8ec58f8280f9