Predict Churn with Machine Learning

Introduction

This article is part of a project for the Udacity “Become a Data Scientist” Nanodegree. The Jupyter Notebook with the code for this project can be downloaded from GitHub.

I will write a series of articles about this project that follows the CRISP-DM process. This part covers the data and business understanding steps.

Business Understanding

Let’s imagine for a moment that we are freshly hired data scientists working for a startup called “Sparkify”, which offers a music streaming service through its website and app.

Our first job is to prepare a presentation for the management meeting on business strategy. The meeting starts in a few hours, and we will have about 10 minutes to present.

Clearly we want to impress our managers with our machine learning skills, but there is simply no time to clean all the data, let alone run machine learning on the huge 12 GB log covering the last two months of user activity.

We decide to take about 1% of the users from the log and prepare some statistical analysis and visualisations to answer the questions we expect our managers to be most interested in, such as:

  1. Usage patterns
  2. Business development
  3. Threats to the business
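Before turning to these three topics: the 1% user sample mentioned above could be drawn along the following lines. This is only a minimal sketch, not the code from the notebook. The record layout (a user ID plus a page name) and the helper `in_sample` are assumptions made for illustration.

```python
import hashlib

def in_sample(user_id: str, percent: int = 1) -> bool:
    """Return True if this user falls into the `percent`% sample.

    Hashing the user ID makes the decision deterministic, so every
    event of a sampled user is kept and the user's history stays whole.
    """
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100 < percent

# Hypothetical log: two events for each of 10,000 made-up users.
events = [(str(uid), "NextSong") for uid in range(10_000) for _ in range(2)]

# Keep only the events that belong to sampled users.
sample = [event for event in events if in_sample(event[0])]
sampled_users = {user_id for user_id, _ in sample}
```

Sampling by user rather than by log line matters here, because the later questions (upgrades, downgrades, cancellations) need the complete history of each user in the sample.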

1. Usage patterns

As a streaming service, we would of course like to know how many songs are played every day:

[Figure: songs played per day]
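A daily song count of this kind could be computed roughly as follows. The sketch assumes that each log event carries a timestamp `ts` in epoch milliseconds and a `page` field, and that song plays are logged under the page name "NextSong". These names and the sample events are assumptions, not something stated in this article.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical events; the field names and values are made up for illustration.
events = [
    {"ts": 1538352000000, "page": "NextSong"},  # 2018-10-01 00:00 UTC
    {"ts": 1538355600000, "page": "NextSong"},  # 2018-10-01 01:00 UTC
    {"ts": 1538359200000, "page": "Home"},      # not a song play
    {"ts": 1538438400000, "page": "NextSong"},  # 2018-10-02 00:00 UTC
]

def songs_per_day(events, song_page="NextSong"):
    """Count song plays per calendar day (UTC), sorted by date."""
    counts = Counter()
    for event in events:
        if event["page"] == song_page:
            day = datetime.fromtimestamp(event["ts"] / 1000, tz=timezone.utc).date()
            counts[day] += 1
    return dict(sorted(counts.items()))

daily_counts = songs_per_day(events)
```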

We can see that only about half as many songs are played around weekends and, unsurprisingly, there is a large spike around Halloween. To get a better feeling for the usage frequency, let’s look at the average number of unique users per weekday:

[Figure: average number of unique users per weekday]
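One way to arrive at such a weekday average is to count the distinct users on every calendar day first and then average those daily counts per weekday. The sketch below does this under the assumption that events have `ts` (epoch milliseconds) and `userId` fields. Both names and the sample data are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import mean

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical events; field names are assumptions for illustration.
events = [
    {"ts": 1538352000000, "userId": "a"},  # Monday
    {"ts": 1538355600000, "userId": "b"},  # same Monday
    {"ts": 1538355600000, "userId": "a"},  # same Monday, user a again
    {"ts": 1538784000000, "userId": "c"},  # Saturday
    {"ts": 1538956800000, "userId": "a"},  # the following Monday
]

def avg_users_per_weekday(events):
    """Average number of distinct users per weekday (UTC)."""
    # Step 1: distinct users for every calendar day.
    users_by_day = defaultdict(set)
    for event in events:
        day = datetime.fromtimestamp(event["ts"] / 1000, tz=timezone.utc).date()
        users_by_day[day].add(event["userId"])
    # Step 2: average the daily counts within each weekday.
    counts_by_weekday = defaultdict(list)
    for day, users in users_by_day.items():
        counts_by_weekday[WEEKDAYS[day.weekday()]].append(len(users))
    return {weekday: mean(counts) for weekday, counts in counts_by_weekday.items()}
```

Counting distinct users per day before averaging avoids counting a very active user several times on the same day.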

Another interesting question is how user activity is distributed throughout the day. Let’s have a look at the average number of songs played per hour:

[Figure: average number of songs played per hour of the day]

And at the user activity over the same hours:

[Figure: user activity per hour of the day]

Summary usage statistics

Let’s formulate the key insights from our analysis:

  • Usage follows a weekly pattern, with fewer users on Sparkify at weekends.
  • Unsurprisingly, there is a spike in streams around Halloween.
  • Throughout the day the number of users remains almost constant, with a slight increase between 1 and 7 p.m.
  • The number of songs played per user follows daily routines: getting up, commuting, starting work, the lunch break and so on.

More important is knowing what we can do with these insights:

  • Knowing how many songs will be played, we can optimise licence costs.
  • We can match the number of servers running throughout the day and week to user activity, saving electricity and networking costs.
  • We can time our user communication for the periods when users are most likely to use our service.

2. Business development

The main revenue source for Sparkify is the periodic subscription fee from paying users. We would like to know how many users have actually used the “paid” option and how many the “free” option:

[Figure: number of users on the “paid” and “free” options]
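Counting users per option comes down to collecting the distinct user IDs seen at each account level. The following sketch assumes every event carries a `userId` and a `level` field with the values "free" or "paid". The field names and the sample data are assumptions.

```python
from collections import defaultdict

# Hypothetical events; field names are assumptions for illustration.
events = [
    {"userId": "a", "level": "free"},
    {"userId": "a", "level": "paid"},  # user a upgraded at some point
    {"userId": "b", "level": "free"},
    {"userId": "c", "level": "paid"},
    {"userId": "c", "level": "paid"},
]

def users_per_level(events):
    """Number of distinct users that were ever seen at each account level."""
    users = defaultdict(set)
    for event in events:
        users[event["level"]].add(event["userId"])
    return {level: len(user_ids) for level, user_ids in users.items()}

level_counts = users_per_level(events)
```

Note that a user who switched levels during the observation period is counted under both options, so the two numbers can add up to more than the total number of users.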

Another source of revenue is playing advertising clips to free users. How many clips are played every week?

[Figure: advertising clips played per week]

Let’s also see how many ads on average are shown to each user:

[Figure: average number of ads per user]
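Both advert figures, the weekly total and the average per user, can be derived in a single pass over the log. The sketch below assumes adverts are logged as events with the page name "Roll Advert" and that events carry `ts` (epoch milliseconds) and `userId`. These names are assumptions, and the average is taken over users who saw at least one advert in that week, which may differ from the definition used for the plots.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical events; field names and page names are assumptions.
events = [
    {"ts": 1538352000000, "userId": "a", "page": "Roll Advert"},  # ISO week 40
    {"ts": 1538355600000, "userId": "a", "page": "Roll Advert"},  # ISO week 40
    {"ts": 1538359200000, "userId": "b", "page": "Roll Advert"},  # ISO week 40
    {"ts": 1538359200000, "userId": "b", "page": "NextSong"},     # not an advert
    {"ts": 1538956800000, "userId": "a", "page": "Roll Advert"},  # ISO week 41
]

def adverts_per_week(events, advert_page="Roll Advert"):
    """Total adverts and adverts per advert-seeing user, keyed by ISO week."""
    totals = defaultdict(int)
    users = defaultdict(set)
    for event in events:
        if event["page"] != advert_page:
            continue
        moment = datetime.fromtimestamp(event["ts"] / 1000, tz=timezone.utc)
        week = moment.isocalendar()[1]  # ISO week number
        totals[week] += 1
        users[week].add(event["userId"])
    return {
        week: {"adverts": totals[week], "per_user": totals[week] / len(users[week])}
        for week in sorted(totals)
    }
```

Keeping the total and the per-user figure side by side makes it easy to spot the case where one falls while the other rises.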

Summary business development

Let’s formulate the key insights and takeaways for our business.

Key insights

  • The number of paying customers increases over the observation period.
  • The number of adverts played decreases.
  • The number of free customers decreases.

Takeaways for business

  • The number of paying customers does not change much after the first week. We probably need to motivate people to switch to a paid account with a limited-time offer or a free trial.
  • The number of free customers is decreasing at quite a high rate. It seems that the free account is not very attractive. We have to look at the reasons more closely. Are the adverts too frequent? Do free users have limited access to the music titles?
  • Although the total number of adverts is falling, the number of adverts per user is increasing. Perhaps we have taken the wrong road here, given that free users are probably choosing to leave the service rather than upgrade their account.

3. Threats to the business

Finally, let’s look at the account-level upgrades, downgrades and cancellations:

[Figure: account upgrades, downgrades and cancellations by week]

To get a clearer picture, let’s see which account level the users who cancel their accounts have:

[Figure: cancellations by account level]
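A breakdown of cancellations by account level could look like the sketch below. It assumes that a cancellation shows up in the log as an event with the page name "Cancellation Confirmation" and that the event's `level` field holds the account level at that moment. Both are assumptions about the log format.

```python
from collections import Counter

# Hypothetical events; field names and page names are assumptions.
events = [
    {"userId": "a", "level": "paid", "page": "NextSong"},
    {"userId": "a", "level": "paid", "page": "Cancellation Confirmation"},
    {"userId": "b", "level": "free", "page": "Cancellation Confirmation"},
    {"userId": "c", "level": "paid", "page": "Cancellation Confirmation"},
    {"userId": "d", "level": "free", "page": "NextSong"},
]

def cancellations_by_level(events, cancel_page="Cancellation Confirmation"):
    """Count cancellation events per account level at the time of cancelling."""
    counts = Counter(
        event["level"] for event in events if event["page"] == cancel_page
    )
    return dict(counts)

cancelled = cancellations_by_level(events)
```

For a fair comparison, these counts should be read against the number of users at each level, since a larger group naturally produces more cancellations.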

Summary business threats

Let’s formulate the key insights and takeaways for our business.

Key insights

  • The number of upgrades spiked in the first week of observation.
  • The number of upgrades declines over the period of observation.
  • The number of downgrades has a small spike in week 41 and is otherwise almost steady, with a decline near the end.
  • The number of cancellations is almost steady, with a small spike around week 42 and a decline near the end.
  • Paying users cancel their accounts more often than free users.

Takeaways for business

  • Whatever we did in week 40, we must keep doing it!
  • We need to understand why fewer and fewer customers choose to upgrade their accounts.
  • Although the downgrade and cancellation rates are falling, we need to pay more attention to them.
  • The fact that paying users choose to cancel their accounts rather than downgrade them is alarming. What have we done wrong to make them angry?

Conclusion: can we identify reasons for churn?

The presentation went well. Most of the people in the room did not have a technical background. They were impressed by the comprehensive visualisations and the clearly formulated statements about the current situation.

As a consequence, the management is now worried about churn. They ask us to find out why customers, especially paying ones, are cancelling their accounts.

We will have to run machine learning on our data. It will take some days to find the right techniques on the small subset of data, and then maybe some weeks to run the algorithms on the full dataset.

Using our intuition, we can try to find a quick fix that may help our company at short notice. Let’s look at the statistics of the adverts played:

[Figure: adverts played to free and paying users]

It turns out that paying customers may still see or hear an advert. Could this be the reason why they choose to quit? Perhaps our web developers should look into that issue.

In my next article I will focus on machine learning techniques and how they can be applied to predict churn based on usage statistics.

Translated from: https://medium.com/@viovioviovioviovio/predict-churn-with-machine-learning-ea00b8a42011
