Facebook Advertising Analysis


Is our company’s Facebook advertising even worth the effort?


QUESTION:

A company would like to know if their advertising is effective. Before you ask: yes, Facebook does have analytics for users who actually utilize its advertising platform. Our customer does not. Their “advertisements” are posts on their feed and are not promoted by Facebook.


DATA:

Data is from the client’s POS system and their Facebook feed.


MODEL:

KISS. A simple linear model will suffice.


First, we need to obtain our data. We can use a nice Facebook scraper to pull the most recent posts in a usable format.


#install & load scraper !pip install facebook_scraper from facebook_scraper import get_posts import pandas as pd

Let's first scrape the posts from their Facebook page (the scraper walks up to 200 result pages).


# scrape
post_list = []
for post in get_posts('clients_facebook_page', pages=200):
    post_list.append(post)

# view the data
print(post_list[0].keys())
print("Number of Posts: {}".format(len(post_list)))

## dict_keys(['post_id', 'text', 'post_text', 'shared_text', 'time', 'image', 'video', 'video_thumbnail', 'likes', 'comments', 'shares', 'post_url', 'link'])
## Number of Posts: 38

Let's clean up the list, keeping only Time, Image, Likes, Comments, and Shares.


post_list_cleaned = []
for post in post_list:
    # keep only the fields we need
    temp = []
    indexes_to_keep = ['time', 'image', 'likes', 'comments', 'shares']
    for key in indexes_to_keep:
        temp.append(post[key])
    post_list_cleaned.append(temp)

# replace the image hyperlink with a 0/1 flag & recast datetime to date
for post in post_list_cleaned:
    if post[1] is None:
        post[1] = 0
    else:
        post[1] = 1
    post[0] = post[0].date()

We now need to combine the Facebook data with data from the company’s POS system.


# turn into a DataFrame
fb_posts_df = pd.DataFrame(post_list_cleaned)
fb_posts_df.columns = ['Date', 'image', 'likes', 'comments', 'shares']

# import our POS data
daily_sales_df = pd.read_csv('daily_sales.csv')

# merge both sets of data
combined_df = pd.merge(daily_sales_df, fb_posts_df, on='Date', how='outer')

Finally, let's export the data to a CSV. We'll do our modeling in R.


combined_df.to_csv('data.csv')

R Analysis

First, let's import our data from Python. We then need to ensure the variables are cast appropriately (i.e., dates are actual date fields and not just strings). Finally, we are only concerned with data since the start of 2019.


library(readr)
library(ggplot2)
library(gvlma)

data <- read.table("data.csv", header = TRUE, sep = ",")
data <- as.data.frame(data)

# rename 'i..Date' to 'Date'
names(data)[1] <- c("Date")

# set data types
data$Sales <- as.numeric(data$Sales)
data$Date <- as.Date(data$Date, "%m/%d/%Y")
data$Image <- as.factor(data$Image)
data$Post <- as.factor(data$Post)

# create a set of only 2019+ data
data_PY = data[data$Date >= '2019-01-01',]

head(data)


head(data_PY)


summary(data_PY)

##       Date                Sales      Post    Image        Likes
##  Min.   :2019-01-02   Min.   :3181   0:281   0:287   Min.   :  0.000
##  1st Qu.:2019-04-12   1st Qu.:3370   1: 64   1: 58   1st Qu.:  0.000
##  Median :2019-07-24   Median :3456                   Median :  0.000
##  Mean   :2019-07-24   Mean   :3495                   Mean   :  3.983
##  3rd Qu.:2019-11-02   3rd Qu.:3606                   3rd Qu.:  0.000
##  Max.   :2020-02-15   Max.   :4432                   Max.   :115.000
##     Comments          Shares
##  Min.   : 0.0000   Min.   :0
##  1st Qu.: 0.0000   1st Qu.:0
##  Median : 0.0000   Median :0
##  Mean   : 0.3101   Mean   :0
##  3rd Qu.: 0.0000   3rd Qu.:0
##  Max.   :19.0000   Max.   :0

Now that our data’s in, let’s review our summary. We can see our data starts on Jan. 2, 2019 (as we hoped), but we do see one slight problem. When we look at the Post variable, we see it’s highly imbalanced. We have 281 days with no posts and only 64 days with posts. We should re-balance our dataset before doing more analysis to ensure our results aren’t skewed. I’ll rebalance our data by sampling from the larger group (days with no posts), known as undersampling. I’ll also set a random seed so that our numbers are reproducible.


set.seed(15)
zeros = data_PY[data_PY$Post == 0,]
samples = sample(281, size = (345-281), replace = FALSE)
zeros = zeros[samples, ]
balanced = rbind(zeros, data_PY[data_PY$Post == 1,])
summary(balanced$Post)

##  0  1 
## 64 64
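The hard-coded counts work here (281 no-post days; 345 - 281 = 64 post days), but a version that derives the sizes from the data itself is less brittle. A quick sketch of the same undersampling step, my variation rather than the original code:

set.seed(15)
zeros <- data_PY[data_PY$Post == 0, ]   # 281 no-post days
ones  <- data_PY[data_PY$Post == 1, ]   #  64 post days
# draw as many no-post days as there are post days
zeros_sampled <- zeros[sample(nrow(zeros), size = nrow(ones), replace = FALSE), ]
balanced <- rbind(zeros_sampled, ones)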

Perfect, now our data is balanced. We should also do some EDA on our dependent variable (daily sales). It's a good idea to know what our distribution looks like and whether we have outliers to address.


hist(balanced$Sales)
boxplot(balanced$Sales)

We can see our data is slightly skewed. Sadly, real-world data is never a perfect normal distribution… Luckily, though, we appear to have no outliers in our boxplot. Now we can begin modeling. Since we're interested in understanding the dynamics of the system, not actually classifying or predicting, we'll use a standard regression model.


model1 <- lm(data=balanced, Sales ~ Post) summary(model1)## ## Call: ## lm(formula = Sales ~ Post, data = balanced) ## ## Residuals: ## Min 1Q Median 3Q Max ## -316.22 -114.73 -29.78 111.17 476.49 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 3467.1 20.5 169.095 < 2e-16 *** ## Post1 77.9 29.0 2.687 0.00819 ** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 164 on 126 degrees of freedom ## Multiple R-squared: 0.05418, Adjusted R-squared: 0.04667 ## F-statistic: 7.218 on 1 and 126 DF, p-value: 0.008193gvlma(model1)## ## Call: ## lm(formula = Sales ~ Post, data = balanced) ## ## Coefficients: ## (Intercept) Post1 ## 3467.1 77.9 ## ## ## ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS ## USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM: ## Level of Significance = 0.05 ## ## Call: ## gvlma(x = model1) ## ## Value p-value Decision ## Global Stat 8.351e+00 0.07952 Assumptions acceptable. ## Skewness 6.187e+00 0.01287 Assumptions NOT satisfied! ## Kurtosis 7.499e-01 0.38651 Assumptions acceptable. ## Link Function -1.198e-13 1.00000 Assumptions acceptable. ## Heteroscedasticity 1.414e+00 0.23435 Assumptions acceptable.

Using a standard linear model, we obtain a result that says, on average, a Facebook post increases daily sales by $77.90. Based on the t-statistic and p-value, this result is highly statistically significant. We can use the gvlma package to check that our model passes the underlying OLS assumptions. Here, we see we pass on all levels except skewness. We already identified earlier that skewness may be a problem with our data. A common correction for skewness is a log transformation. Let's transform our dependent variable and see if it helps. Note that this model (a log-lin model) will produce coefficients with different interpretations than our last model.


model2 <- lm(data=balanced, log(Sales) ~ Post) summary(model2)## ## Call: ## lm(formula = log(Sales) ~ Post, data = balanced) ## ## Residuals: ## Min 1Q Median 3Q Max ## -0.092228 -0.032271 -0.007508 0.032085 0.129686 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 8.150154 0.005777 1410.673 < 2e-16 *** ## Post1 0.021925 0.008171 2.683 0.00827 ** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 0.04622 on 126 degrees of freedom ## Multiple R-squared: 0.05406, Adjusted R-squared: 0.04655 ## F-statistic: 7.201 on 1 and 126 DF, p-value: 0.008266gvlma(model2)## ## Call: ## lm(formula = log(Sales) ~ Post, data = balanced) ## ## Coefficients: ## (Intercept) Post1 ## 8.15015 0.02193 ## ## ## ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS ## USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM: ## Level of Significance = 0.05 ## ## Call: ## gvlma(x = model2) ## ## Value p-value Decision ## Global Stat 7.101e+00 0.13063 Assumptions acceptable. ## Skewness 4.541e+00 0.03309 Assumptions NOT satisfied! ## Kurtosis 1.215e+00 0.27030 Assumptions acceptable. ## Link Function -6.240e-14 1.00000 Assumptions acceptable. ## Heteroscedasticity 1.345e+00 0.24614 Assumptions acceptable.plot(model2)
hist(log(balanced$Sales))

Our second model produces another highly significant coefficient for the Facebook post variable. Here we see that each post is associated with an average 2.19% increase in daily sales. Unfortunately, even our log transformation was unable to correct the skewness in our data. We'll have to note this when presenting our findings later. Let's now determine how much that 2.19% is actually worth (since we saw in model 1 that a post was worth $77.90).

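One aside on interpreting that 2.19% (my note, not part of the original analysis): in a log-lin model, a dummy coefficient b only approximates the percentage change; the exact effect is exp(b) - 1. For a coefficient this small the two are nearly identical:

# Post1 coefficient from model2
b <- 0.021925
b            # quick approximation: ~2.19% more daily sales on post days
exp(b) - 1   # exact effect: ~2.22%; essentially the same for small b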

mean_sales_no_post <- mean(balanced$Sales[balanced$Post == 0])
mean_sales_with_post <- mean(balanced$Sales[balanced$Post == 1])
mean_sales_no_post * model2$coefficients['Post1']

##    Post1 
##    76.02

Very close. Model 2's coefficient equates to $76.02, which is very similar to our $77.90. Let's now run another test to see if we get similar results. In analytics, it's always helpful to arrive at the same conclusion via different means, if possible. This helps solidify our results. Here we can run a standard t-test. Yes, yes, for the other analysts reading: with a binary regressor, a t-test relies on the same statistic as the OLS (hence the t-statistic it produces); a quick sketch of that equivalence follows below. Here, however, let's run it on the unbalanced dataset to ensure we didn't miss anything in sampling our data (perhaps we sampled unusually good or bad data that skewed our results).

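To make that equivalence concrete, here is a small demonstration (my addition, using the balanced data from above): a pooled two-sample t-test reproduces the Post1 t-statistic from model1, while R's default Welch test, used below, drops the equal-variance assumption.

# pooled t-test (var.equal = TRUE) assumes equal group variances, as OLS does
pooled <- t.test(Sales ~ Post, data = balanced, var.equal = TRUE)
pooled$statistic   # ~ -2.687: the Post1 t value from summary(model1), up to sign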

t_test <- t.test(data_PY$Sales[data_PY$Post == 1],data_PY$Sales[data_PY$Post == 0] ) t_test## ## Welch Two Sample t-test ## ## data: data_PY$Sales[data_PY$Post == 1] and data_PY$Sales[data_PY$Post == 0] ## t = 2.5407, df = 89.593, p-value = 0.01278 ## alternative hypothesis: true difference in means is not equal to 0 ## 95 percent confidence interval: ## 13.3264 108.9259 ## sample estimates: ## mean of x mean of y ## 3544.970 3483.844summary(t_test)## Length Class Mode ## statistic 1 -none- numeric ## parameter 1 -none- numeric ## p.value 1 -none- numeric ## conf.int 2 -none- numeric ## estimate 2 -none- numeric ## null.value 1 -none- numeric ## stderr 1 -none- numeric ## alternative 1 -none- character ## method 1 -none- character ## data.name 1 -none- characterggplot(data = data_PY, aes(Post, Sales, color = Post)) + geom_boxplot() + geom_jitter()

Again, we receive a promising result. On all of the data, our t-statistic was 2.54, meaning we reject the null hypothesis that the difference in means between the two groups is 0. Our t-test produces a confidence interval of [13.33, 108.93]. This interval includes our previous estimates, once again giving us some added confidence. So now we know our Facebook posts are actually benefiting the business by generating additional daily sales. However, we also saw earlier that our company hasn't been posting regularly (which is why we had to rebalance the data).

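As a one-line consistency check (my addition), both earlier effect estimates do fall inside that Welch interval:

ci <- c(13.3264, 108.9259)                          # 95% CI from the Welch test above
c(77.90, 76.02) > ci[1] & c(77.90, 76.02) < ci[2]   # TRUE TRUE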

length(data_PY$Sales[data_PY$Post == 0])

## [1] 281

ggplot(data = data.frame(post = as.factor(c('No Post', 'Post')), m = c(280, 54)),
       aes(x = post, y = m)) +
  geom_bar(stat = 'identity', fill = 'dodgerblue3')

Let's create a function that takes two inputs: 1) the percentage of additional days advertised, and 2) the percentage of those advertisements that were effective. The reason for the second argument is that it's likely unrealistic to assume all of our ads are effective. Indeed, we'd likely see diminishing returns with more posts (people probably tire of seeing them, or block them in their feed if they become too frequent). Limiting effectiveness gives us a more reasonable estimate of lost revenue. Another benefit of creating a custom function is that we can quickly re-run the calculations if management desires.


# construct a 90% confidence interval around the Post1 coefficient
conf_interval = confint(model2, "Post1", .90)

missed_revenue <- function(pct_addlt_adv, pct_effective){
  min = pct_addlt_adv * pct_effective * 280 * mean_sales_no_post * conf_interval[1]
  mean = pct_addlt_adv * pct_effective * 280 * mean_sales_no_post * model2$coefficients['Post1']
  max = pct_addlt_adv * pct_effective * 280 * mean_sales_no_post * conf_interval[2]
  print(paste(pct_addlt_adv * 280, "additional days of advertising"))
  sprintf("$%.2f -- $%.2f -- $%.2f", max(min, 0), mean, max)
}

# missed_revenue(% of additional days advertised, % of advertisements that were effective)
missed_revenue(.5, .7)

## [1] "140 additional days of advertising"
## [1] "$2849.38 -- $7449.57 -- $12049.75"

So if our company had advertised on half of the days they didn't, and only 70% of those ads were effective, we'd have missed out on an average of $7,449.57.

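A back-of-the-envelope check of that point estimate (my arithmetic, using the model-2 effect of roughly $76 per posting day from above):

# 50% of 280 missed days = 140 extra posts; 70% effective = 98 effective days
0.5 * 0.7 * 280 * 76.02   # ~ $7,450, in line with missed_revenue(.5, .7)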

Originally published at http://lowhangingfruitanalytics.com on August 21, 2020.


Translated from: https://medium.com/the-innovation/facebook-advertising-analysis-3bedca07d7fe
