Facebook Advertising Analysis


Is our company’s Facebook advertising even worth the effort?


QUESTION:

A company would like to know if their advertising is effective. Before you ask: yes, Facebook does have analytics for users who actually utilize its advertising platform. Our customer does not. Their “advertisements” are posts on their feed and are not promoted by Facebook.


DATA:

Data is from the client’s POS system and their Facebook feed.


MODEL:

KISS. A simple linear model will suffice.


First, we need to obtain our data. We can use a nice Facebook scraper to pull the most recent posts in a usable format.


#install & load scraper !pip install facebook_scraper from facebook_scraper import get_posts import pandas as pd

Let's first scrape the posts from their Facebook page (the scraper walks up to 200 result pages).


# scrape
post_list = []
for post in get_posts('clients_facebook_page', pages=200):
    post_list.append(post)

# view the data
print(post_list[0].keys())
print("Number of Posts: {}".format(len(post_list)))

## dict_keys(['post_id', 'text', 'post_text', 'shared_text', 'time', 'image', 'video', 'video_thumbnail', 'likes', 'comments', 'shares', 'post_url', 'link'])
## Number of Posts: 38

Let's clean up the list, keeping only Time, Image, Likes, Comments, and Shares.


post_list_cleaned = []
for post in post_list:
    # keep only the fields we need
    temp = []
    indexes_to_keep = ['time', 'image', 'likes', 'comments', 'shares']
    for key in indexes_to_keep:
        temp.append(post[key])
    post_list_cleaned.append(temp)

# replace the image hyperlink with a 0/1 flag & recast datetime to date
for post in post_list_cleaned:
    if post[1] is None:
        post[1] = 0
    else:
        post[1] = 1
    post[0] = post[0].date()

We now need to combine the Facebook data with data from the company’s POS system.


# turn into a DataFrame
fb_posts_df = pd.DataFrame(post_list_cleaned)
fb_posts_df.columns = ['Date', 'image', 'likes', 'comments', 'shares']

# import our POS data
daily_sales_df = pd.read_csv('daily_sales.csv')

# merge both sets of data
combined_df = pd.merge(daily_sales_df, fb_posts_df, on='Date', how='outer')

Finally, let's export the data to a CSV. We'll do our modeling in R.


combined_df.to_csv('data.csv')

R Analysis

First, let's import our data from Python. We then need to ensure the variables are cast appropriately (i.e., dates are actual date fields and not just strings). Finally, we are only concerned with data since the start of 2019.


library(readr)
library(ggplot2)
library(gvlma)

data <- read.table("data.csv", header = TRUE, sep = ",")
data <- as.data.frame(data)

# rename 'i..Date' to 'Date'
names(data)[1] <- c("Date")

# set data types
data$Sales <- as.numeric(data$Sales)
data$Date <- as.Date(data$Date, "%m/%d/%Y")
data$Image <- as.factor(data$Image)
data$Post <- as.factor(data$Post)

# create a set of only 2019+ data
data_PY = data[data$Date >= '2019-01-01',]

head(data)


head(data_PY)


summary(data_PY)

##       Date                Sales      Post    Image        Likes
##  Min.   :2019-01-02   Min.   :3181   0:281   0:287   Min.   :  0.000
##  1st Qu.:2019-04-12   1st Qu.:3370   1: 64   1: 58   1st Qu.:  0.000
##  Median :2019-07-24   Median :3456                   Median :  0.000
##  Mean   :2019-07-24   Mean   :3495                   Mean   :  3.983
##  3rd Qu.:2019-11-02   3rd Qu.:3606                   3rd Qu.:  0.000
##  Max.   :2020-02-15   Max.   :4432                   Max.   :115.000
##     Comments          Shares
##  Min.   : 0.0000   Min.   :0
##  1st Qu.: 0.0000   1st Qu.:0
##  Median : 0.0000   Median :0
##  Mean   : 0.3101   Mean   :0
##  3rd Qu.: 0.0000   3rd Qu.:0
##  Max.   :19.0000   Max.   :0

Now that our data’s in, let’s review our summary. We can see our data starts on Jan. 2, 2019 (as we hoped), but we do see one slight problem. When we look at the Post variable, we see it’s highly imbalanced. We have 281 days with no posts and only 64 days with posts. We should re-balance our dataset before doing more analysis to ensure our results aren’t skewed. I’ll rebalance our data by sampling from the larger group (days with no posts), known as undersampling. I’ll also set a random seed so that our numbers are reproducible.


set.seed(15)
zeros = data_PY[data_PY$Post == 0,]
samples = sample(281, size = (345-281), replace = FALSE)
zeros = zeros[samples, ]
balanced = rbind(zeros, data_PY[data_PY$Post == 1,])
summary(balanced$Post)

##  0  1 
## 64 64
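The hard-coded counts work here (281 no-post days; 345 - 281 = 64 post days), but a version that derives the sizes from the data itself is less brittle. A quick sketch of the same undersampling step, my variation rather than the original code:

set.seed(15)
zeros <- data_PY[data_PY$Post == 0, ]   # 281 no-post days
ones  <- data_PY[data_PY$Post == 1, ]   #  64 post days
# draw as many no-post days as there are post days
zeros_sampled <- zeros[sample(nrow(zeros), size = nrow(ones), replace = FALSE), ]
balanced <- rbind(zeros_sampled, ones)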

Perfect, now our data is balanced. We should also do some EDA on our dependent variable (daily sales). It's a good idea to know what our distribution looks like and whether we have outliers to address.


hist(balanced$Sales)
boxplot(balanced$Sales)

We can see our data is slightly skewed. Sadly, real-world data is never a perfect normal distribution… Luckily, though, we appear to have no outliers in our boxplot. Now we can begin modeling. Since we're interested in understanding the dynamics of the system, not actually classifying or predicting, we'll use a standard regression model.


model1 <- lm(data=balanced, Sales ~ Post) summary(model1)## ## Call: ## lm(formula = Sales ~ Post, data = balanced) ## ## Residuals: ## Min 1Q Median 3Q Max ## -316.22 -114.73 -29.78 111.17 476.49 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 3467.1 20.5 169.095 < 2e-16 *** ## Post1 77.9 29.0 2.687 0.00819 ** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 164 on 126 degrees of freedom ## Multiple R-squared: 0.05418, Adjusted R-squared: 0.04667 ## F-statistic: 7.218 on 1 and 126 DF, p-value: 0.008193gvlma(model1)## ## Call: ## lm(formula = Sales ~ Post, data = balanced) ## ## Coefficients: ## (Intercept) Post1 ## 3467.1 77.9 ## ## ## ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS ## USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM: ## Level of Significance = 0.05 ## ## Call: ## gvlma(x = model1) ## ## Value p-value Decision ## Global Stat 8.351e+00 0.07952 Assumptions acceptable. ## Skewness 6.187e+00 0.01287 Assumptions NOT satisfied! ## Kurtosis 7.499e-01 0.38651 Assumptions acceptable. ## Link Function -1.198e-13 1.00000 Assumptions acceptable. ## Heteroscedasticity 1.414e+00 0.23435 Assumptions acceptable.

Using a standard linear model, we obtain a result that says, on average, a Facebook post increases daily sales by $77.90. Based on the t-statistic and p-value, this result is highly statistically significant. We can use the gvlma package to check that our model passes the underlying OLS assumptions. Here, we see we pass on all levels except skewness. We already identified earlier that skewness may be a problem with our data. A common correction for skewness is a log transformation. Let's transform our dependent variable and see if it helps. Note that this model (a log-lin model) will produce coefficients with different interpretations than our last model.


model2 <- lm(data=balanced, log(Sales) ~ Post) summary(model2)## ## Call: ## lm(formula = log(Sales) ~ Post, data = balanced) ## ## Residuals: ## Min 1Q Median 3Q Max ## -0.092228 -0.032271 -0.007508 0.032085 0.129686 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 8.150154 0.005777 1410.673 < 2e-16 *** ## Post1 0.021925 0.008171 2.683 0.00827 ** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 0.04622 on 126 degrees of freedom ## Multiple R-squared: 0.05406, Adjusted R-squared: 0.04655 ## F-statistic: 7.201 on 1 and 126 DF, p-value: 0.008266gvlma(model2)## ## Call: ## lm(formula = log(Sales) ~ Post, data = balanced) ## ## Coefficients: ## (Intercept) Post1 ## 8.15015 0.02193 ## ## ## ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS ## USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM: ## Level of Significance = 0.05 ## ## Call: ## gvlma(x = model2) ## ## Value p-value Decision ## Global Stat 7.101e+00 0.13063 Assumptions acceptable. ## Skewness 4.541e+00 0.03309 Assumptions NOT satisfied! ## Kurtosis 1.215e+00 0.27030 Assumptions acceptable. ## Link Function -6.240e-14 1.00000 Assumptions acceptable. ## Heteroscedasticity 1.345e+00 0.24614 Assumptions acceptable.plot(model2)
hist(log(balanced$Sales))

Our second model produces another highly significant coefficient for the Facebook post variable. Here we see that each post is associated with an average 2.19% increase in daily sales. Unfortunately, even our log transformation was unable to correct the skewness in our data. We'll have to note this when presenting our findings later. Let's now determine how much that 2.19% is actually worth (since we saw in model 1 that a post was worth $77.90).

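One aside on interpreting that 2.19% (my note, not part of the original analysis): in a log-lin model, a dummy coefficient b only approximates the percentage change; the exact effect is exp(b) - 1. For a coefficient this small the two are nearly identical:

# Post1 coefficient from model2
b <- 0.021925
b            # quick approximation: ~2.19% more daily sales on post days
exp(b) - 1   # exact effect: ~2.22%; essentially the same for small b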

mean_sales_no_post <- mean(balanced$Sales[balanced$Post == 0])
mean_sales_with_post <- mean(balanced$Sales[balanced$Post == 1])
mean_sales_no_post * model2$coefficients['Post1']

##    Post1 
##    76.02

Very close. Model 2's coefficient equates to $76.02, which is very similar to our $77.90. Let's now run another test to see if we get similar results. In analytics, it's always helpful to arrive at the same conclusion via different means, if possible. This helps solidify our results. Here we can run a standard t-test. Yes, yes, for the other analysts reading: with a binary regressor, a t-test relies on the same statistic as the OLS (hence the t-statistic it produces); a quick sketch of that equivalence follows below. Here, however, let's run it on the unbalanced dataset to ensure we didn't miss anything in sampling our data (perhaps we sampled unusually good or bad data that skewed our results).

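To make that equivalence concrete, here is a small demonstration (my addition, using the balanced data from above): a pooled two-sample t-test reproduces the Post1 t-statistic from model1, while R's default Welch test, used below, drops the equal-variance assumption.

# pooled t-test (var.equal = TRUE) assumes equal group variances, as OLS does
pooled <- t.test(Sales ~ Post, data = balanced, var.equal = TRUE)
pooled$statistic   # ~ -2.687: the Post1 t value from summary(model1), up to sign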

t_test <- t.test(data_PY$Sales[data_PY$Post == 1],data_PY$Sales[data_PY$Post == 0] ) t_test## ## Welch Two Sample t-test ## ## data: data_PY$Sales[data_PY$Post == 1] and data_PY$Sales[data_PY$Post == 0] ## t = 2.5407, df = 89.593, p-value = 0.01278 ## alternative hypothesis: true difference in means is not equal to 0 ## 95 percent confidence interval: ## 13.3264 108.9259 ## sample estimates: ## mean of x mean of y ## 3544.970 3483.844summary(t_test)## Length Class Mode ## statistic 1 -none- numeric ## parameter 1 -none- numeric ## p.value 1 -none- numeric ## conf.int 2 -none- numeric ## estimate 2 -none- numeric ## null.value 1 -none- numeric ## stderr 1 -none- numeric ## alternative 1 -none- character ## method 1 -none- character ## data.name 1 -none- characterggplot(data = data_PY, aes(Post, Sales, color = Post)) + geom_boxplot() + geom_jitter()

Again, we receive a promising result. On all of the data, our t-statistic was 2.54, meaning we reject the null hypothesis that the difference in means between the two groups is 0. Our t-test produces a confidence interval of [13.33, 108.93]. This interval includes our previous estimates, once again giving us some added confidence. So now we know our Facebook posts are actually benefiting the business by generating additional daily sales. However, we also saw earlier that our company hasn't been posting regularly (which is why we had to rebalance the data).

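As a one-line consistency check (my addition), both earlier effect estimates do fall inside that Welch interval:

ci <- c(13.3264, 108.9259)                          # 95% CI from the Welch test above
c(77.90, 76.02) > ci[1] & c(77.90, 76.02) < ci[2]   # TRUE TRUE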

length(data_PY$Sales[data_PY$Post == 0])

## [1] 281

ggplot(data = data.frame(post = as.factor(c('No Post', 'Post')), m = c(280, 54)),
       aes(x = post, y = m)) +
  geom_bar(stat = 'identity', fill = 'dodgerblue3')

Let's create a function that takes two inputs: 1) the percentage of additional days advertised, and 2) the percentage of those advertisements that were effective. The reason for the second argument is that it's likely unrealistic to assume all of our ads are effective. Indeed, we'd likely see diminishing returns with more posts (people probably tire of seeing them, or block them in their feed if they become too frequent). Limiting effectiveness gives us a more reasonable estimate of lost revenue. Another benefit of creating a custom function is that we can quickly re-run the calculations if management desires.


# construct a 90% confidence interval around the Post1 coefficient
conf_interval = confint(model2, "Post1", .90)

missed_revenue <- function(pct_addlt_adv, pct_effective){
  min = pct_addlt_adv * pct_effective * 280 * mean_sales_no_post * conf_interval[1]
  mean = pct_addlt_adv * pct_effective * 280 * mean_sales_no_post * model2$coefficients['Post1']
  max = pct_addlt_adv * pct_effective * 280 * mean_sales_no_post * conf_interval[2]
  print(paste(pct_addlt_adv * 280, "additional days of advertising"))
  sprintf("$%.2f -- $%.2f -- $%.2f", max(min, 0), mean, max)
}

# missed_revenue(% of additional days advertised, % of advertisements that were effective)
missed_revenue(.5, .7)

## [1] "140 additional days of advertising"
## [1] "$2849.38 -- $7449.57 -- $12049.75"

So if our company had advertised on half of the days they didn't, and only 70% of those ads were effective, we'd have missed out on an average of $7,449.57.

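A back-of-the-envelope check of that point estimate (my arithmetic, using the model-2 effect of roughly $76 per posting day from above):

# 50% of 280 missed days = 140 extra posts; 70% effective = 98 effective days
0.5 * 0.7 * 280 * 76.02   # ~ $7,450, in line with missed_revenue(.5, .7)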

Originally published at http://lowhangingfruitanalytics.com on August 21, 2020.


Translated from: https://medium.com/the-innovation/facebook-advertising-analysis-3bedca07d7fe
