Identify Profitable Advertising Locations Using Washington DC Metro Data

In-Depth Analysis

Having lived in Washington, DC for the past year, I have come to realize that the WMATA Metro is the lifeline of this vibrant city. The network is extensive and well connected throughout the DMV area. When I first moved to the capital without a car, I often hopped on the Metro to get around. I have always loved train journeys, so unsurprisingly the Metro became my favorite way to explore this beautiful city. On my travels, I often notice the product placements and advertisements on Metro platforms, near escalators and elevators, inside the trains, and so on. A good analysis of Metro ridership data would help advertisers identify which stops are busiest, and at what times, so they can increase ad exposure. I chanced upon this free dataset and decided to dig into it. In this article, I'll walk you through my analysis.

Step 1: Importing necessary libraries

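import warnings

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from nltk.corpus import stopwords  # not used in the analysis below
from wordcloud import WordCloud, STOPWORDS  # not used in the analysis below

# Silence library warnings to keep the output readable
warnings.filterwarnings("ignore")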

Step 2: Reading the data

We will call our pandas dataframe ‘df_metro’; it holds the original data.

df_metro = pd.read_csv("DC MetroData.csv")

Step 3: Eyeballing the data and the length of the dataframe

df_metro.head()     # first five rows
df_metro.columns    # column names
len(df_metro)       # number of records
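Before going further, it can help to confirm that the file loaded with the expected columns and types. The sketch below uses a few made-up rows in place of "DC MetroData.csv". The column names are the ones used later in this article, while the values are invented purely for illustration.

```python
import io

import pandas as pd

# Made-up rows standing in for "DC MetroData.csv" (values are illustrative only).
sample_csv = io.StringIO(
    "Entrance,Exit,Day,Time,Riders,Merge\n"
    "Vienna,Farragut West,Weekday,AM Peak,900,Vienna - Farragut West\n"
    "Farragut West,Vienna,Weekday,PM Peak,850,Farragut West - Vienna\n"
    "Pentagon City,Smithsonian,Saturday,Midday,300,Pentagon City - Smithsonian\n"
)
sample = pd.read_csv(sample_csv)

# Columns the rest of the analysis relies on.
expected = {"Entrance", "Exit", "Day", "Time", "Riders", "Merge"}
missing_cols = expected - set(sample.columns)

print(sample.shape)           # (rows, columns)
print(sample.dtypes)          # Riders should be numeric
print(sample.isna().sum())    # missing values per column
print("Missing columns:", missing_cols)
```

Running the same checks on df_metro would flag a renamed column or a non-numeric ‘Riders’ column before it causes confusing results later.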

Step 4: Checking distinct values under different columns

Let us check the unique values in the ‘Time’ column and how many records each one has.

df_metro['Time'].value_counts().sort_values()

The unique values in the ‘Day’ column are as follows:

df_metro['Day'].value_counts().sort_values()

The next step is to answer a few questions.

Q1. What are the popular entrances and exits?

Counting the records for each metro stop and sorting the counts in descending order shows which entrances and exits are popular.

df_metro['Entrance'].value_counts().sort_values(ascending=False).head()
df_metro['Exit'].value_counts().sort_values(ascending=False).head()

Popular locations seem to be:

  1. Gallery Place-Chinatown: Major attractions are Capital One Arena (which draws big crowds for sporting events and concerts), restaurants, bars, etc.

  2. Foggy Bottom: Government offices in the area make it a popular commute destination.

  3. Pentagon City: Its location just 2 miles from the National Mall in downtown Washington makes the area a popular site for hotels and businesses.

  4. Dupont Circle: International embassies are located in the area.

  5. Union Station: An important location for long-distance travelers.

  6. Metro Center: A popular downtown location.

  7. Fort Totten: Its Metro station serves as a popular transfer point for the Green, Yellow and Red lines.

Takeaway: Advertisers should target the popular metro stations above, which have high rider footfall, to capture the most buyer attention.
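One caveat worth checking: value_counts() ranks stations by how many records mention them, not by how many people passed through. If each record carries its own ‘Riders’ figure, summing that column per station may give a different ranking. The sketch below contrasts the two on made-up rows; the numbers are invented, and the rider-weighted ranking is an alternative view rather than the method used above.

```python
import pandas as pd

# Made-up rows; real values would come from df_metro.
toy = pd.DataFrame({
    "Entrance": ["Vienna", "Vienna", "Union Station",
                 "Dupont Circle", "Dupont Circle", "Dupont Circle"],
    "Riders": [900, 700, 400, 50, 60, 40],
})

# Ranking by number of records (what value_counts measures).
by_records = toy["Entrance"].value_counts()

# Ranking by total riders (rider-weighted alternative).
by_riders = (
    toy.groupby("Entrance")["Riders"]
    .sum()
    .sort_values(ascending=False)
)

print(by_records)
print(by_riders)
```

In this toy data, Dupont Circle has the most records but the fewest riders, which is exactly the situation where the two rankings disagree.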

Q2. What does the ridership look like during different days/times of the week?

This can be answered by simply plotting the rider data across different days and times. We will use the seaborn library to create this visualization.

sns.set_style("whitegrid")
ax = sns.barplot(x="Day", y="Riders", hue="Time",
                 data=df_metro,
                 palette="inferno_r")
ax.set(xlabel='Day', ylabel='# Riders')
plt.title("Rider Footfall on different Days/Times")
plt.show()

Takeaway: The Metro is a popular choice for commuting to work in the city, so as expected, rider footfall is highest on weekdays, particularly during AM Peak and PM Peak. Companies planning to roll out new products should target these slots to attract attention and generate consumer interest. For advertising on weekends, the most attractive time slot seems to be Midday, closely followed by PM Peak.
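Note that sns.barplot draws the mean of the y variable for each category by default, so the bar heights are average riders per record rather than totals. A minimal sketch of the same aggregation in pandas, on made-up rows, makes the numbers behind the bars explicit:

```python
import pandas as pd

# Made-up rows; real values would come from df_metro.
toy = pd.DataFrame({
    "Day": ["Weekday", "Weekday", "Weekday", "Weekday", "Saturday", "Saturday"],
    "Time": ["AM Peak", "AM Peak", "PM Peak", "PM Peak", "Midday", "Midday"],
    "Riders": [100, 300, 200, 400, 50, 150],
})

# Mean riders per (Day, Time) cell, matching barplot's default estimator.
mean_riders = toy.pivot_table(
    index="Day", columns="Time", values="Riders", aggfunc="mean"
)

print(mean_riders)
```

Swapping aggfunc="mean" for aggfunc="sum" would give total riders per slot instead, which may be the more relevant figure when estimating ad exposure.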

Q3. What are the busy routes during a typical weekday?

To answer this question, we will only consider routes with a footfall of more than 500 riders. First, we create a dataframe ‘busy_routes’ that contains the routes with more than 500 riders. Second, we filter this dataframe down to ‘AM Peak’. Third, we sort the filtered output.

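# Keep only weekday records (assumes the 'Day' column labels them 'Weekday')
weekday = df_metro[df_metro['Day']=='Weekday']
busy_routes = weekday[weekday['Riders']>500][['Merge', 'Time', 'Riders']]
peak_am = busy_routes.query('Time=="AM Peak"')
peak_am.sort_values('Riders').tail()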

We repeat the same steps for ‘PM Peak’.

peak_pm = busy_routes.query('Time=="PM Peak"')
len(peak_pm)
peak_pm.sort_values('Riders').tail()

Takeaway: The routes with high footfall during AM Peak also have high footfall during PM Peak, for example West Falls Church to Farragut West, Vienna to Farragut West, and Shady Grove to Farragut North. This suggests that these are popular commute routes: people who travel to work around Farragut during AM Peak return to their homes in Vienna, Falls Church, or Shady Grove during PM Peak. Advertisers should target these high-traffic commute routes to get the most out of their advertisements and product placements.
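The commute pattern can also be checked directly by pairing each AM Peak route with the PM Peak route that runs in the opposite direction. The sketch below does this on made-up rows. It assumes each record describes one direction of travel, from ‘Entrance’ to ‘Exit’, and the rider numbers are invented.

```python
import pandas as pd

# Made-up rows; real values would come from the weekday subset of df_metro.
toy = pd.DataFrame({
    "Entrance": ["Vienna", "Shady Grove", "Farragut West",
                 "Farragut North", "Union Station"],
    "Exit": ["Farragut West", "Farragut North", "Vienna",
             "Shady Grove", "Metro Center"],
    "Time": ["AM Peak", "AM Peak", "PM Peak", "PM Peak", "PM Peak"],
    "Riders": [900, 800, 850, 780, 600],
})

am = toy[toy["Time"] == "AM Peak"]
pm = toy[toy["Time"] == "PM Peak"]

# Swap Entrance and Exit on the PM rows so that a PM trip B -> A
# lines up with the AM trip A -> B.
pm_reversed = pm.rename(
    columns={"Entrance": "Exit", "Exit": "Entrance", "Riders": "Riders_pm"}
)[["Entrance", "Exit", "Riders_pm"]]

# Inner join keeps only AM routes that have a matching PM return trip.
round_trips = am.merge(pm_reversed, on=["Entrance", "Exit"])

print(round_trips[["Entrance", "Exit", "Riders", "Riders_pm"]])
```

Routes that survive the join carry riders out in the morning and back in the evening, which is the signature of a work commute.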

Q4. What are the popular metro routes during the weekends?

Let us perform an analysis similar to the one for weekdays. Because weekend ridership is lower, we will consider routes with a footfall of more than 200 riders.

saturday = df_metro[df_metro['Day']=='Saturday']
busy_routes_sat = saturday[saturday['Riders']>200][['Merge', 'Time', 'Riders']]
busy_routes_sat.sort_values('Riders').tail()
sunday = df_metro[df_metro['Day']=='Sunday']
busy_routes_sun = sunday[sunday['Riders']>200][['Merge', 'Time', 'Riders']]
busy_routes_sun.sort_values('Riders').tail()

Takeaway: Smithsonian is an extremely popular destination for tourists and city dwellers alike because of its many museums and its proximity to the White House, the Capitol, national monuments, war memorials, etc. Our analysis shows that crowds head out from Crystal City, Pentagon City, Vienna, and Franconia to Smithsonian during Midday and return during PM Peak. Most of these crowds are young families with kids, an ideal audience for companies launching products meant for younger populations, including children.
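The weekday, Saturday, and Sunday analyses all follow the same filter, select, and sort pattern, so it can be wrapped in one helper. The function below is a hypothetical convenience, not part of the original analysis. The name top_routes and its parameters are made up for illustration, and the rows are invented.

```python
import pandas as pd


def top_routes(df, min_riders, day=None, time=None, n=5):
    """Return the n busiest routes above min_riders, optionally filtered by Day and Time."""
    subset = df[df["Riders"] > min_riders]
    if day is not None:
        subset = subset[subset["Day"] == day]
    if time is not None:
        subset = subset[subset["Time"] == time]
    ranked = subset.sort_values("Riders", ascending=False)
    return ranked[["Merge", "Time", "Riders"]].head(n)


# Made-up rows; real values would come from df_metro.
toy = pd.DataFrame({
    "Merge": ["Crystal City - Smithsonian", "Smithsonian - Crystal City",
              "Vienna - Smithsonian", "Pentagon City - Smithsonian",
              "Gallery Place - Clarendon"],
    "Day": ["Saturday", "Saturday", "Saturday", "Sunday", "Saturday"],
    "Time": ["Midday", "PM Peak", "Midday", "Midday", "Late Night"],
    "Riders": [320, 280, 150, 260, 70],
})

print(top_routes(toy, min_riders=200, day="Saturday"))
print(top_routes(toy, min_riders=50, time="Late Night"))
```

Unlike sort_values('Riders').tail(), which lists the busiest route last, this helper sorts in descending order so the busiest route appears first.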

Q5. As an advertiser, which locations should I target during Late Night?

We will run a similar analysis to identify which metro stations are ideal for advertising late at night. For ‘Late Night’, we will consider routes with a footfall of more than 50 riders.

# 'Late Night' is a value of the 'Time' column, not 'Day'
late_night = df_metro[df_metro['Time']=='Late Night']
busy_routes_latenight = late_night[late_night['Riders']>50][['Merge', 'Time', 'Riders']]
busy_routes_latenight.sort_values('Riders').tail()

Takeaway: Late at night, riders take the Metro from popular locations with a buzzing nightlife, such as Gallery Place, Clarendon, Dupont Circle, and U Street. Advertisers who want to appeal to this segment of the population (which would normally be younger) should therefore consider targeting these metro stations.

Closing remarks: This dataset was fairly straightforward, so we did not spend much time cleaning and wrangling the data. With the given data, we were able to find sweet spots that should give advertisers the best return on their money. Thanks for reading!

Original article: https://medium.com/@tanmayee92/identify-profitable-advertising-locations-using-washington-dc-metro-data-a03c5c4fc18f
