Identify Profitable Advertising Locations Using Washington DC Metro Data

In-Depth Analysis

Having lived in Washington DC for the past year, I have come to realize that the WMATA metro is the lifeline of this vibrant city. The network is extensive and well connected throughout the DMV area. When I first moved to the capital without a car, I often hopped on the metro to get around. I have always loved train journeys, so unsurprisingly the metro became my favorite way to explore this beautiful city. On my travels, I often notice product placements and advertisements on metro platforms, near escalators and elevators, inside the trains, and so on. A good analysis of metro ridership data would help advertisers identify which stops are busiest, and at what times, so that they can increase ad exposure. I chanced upon this free dataset and decided to dive deep into it. In this article, I’ll walk you through my analysis.

Step 1: Importing the necessary libraries

import warnings

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from nltk.corpus import stopwords
from wordcloud import WordCloud, STOPWORDS

# Silence library warnings to keep the notebook output readable
warnings.filterwarnings("ignore")

Step 2: Reading the data

Let us call our pandas dataframe ‘df_metro’; it will contain the original data.

df_metro = pd.read_csv("DC MetroData.csv")
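As a minimal sketch of a slightly more defensive load, the helper below reads the CSV and fails early if a column used later is missing. The function name `load_metro`, the `EXPECTED_COLUMNS` set, and the sample rows are assumptions made for illustration; the column names are inferred from the code in this article, not from the dataset's documentation.

```python
import io

import pandas as pd

# Toy stand-in for "DC MetroData.csv"; these rows are made up for illustration.
SAMPLE_CSV = """Entrance,Exit,Day,Time,Riders
Vienna,Farragut West,Weekday,AM Peak,900
Farragut West,Vienna,Weekday,PM Peak,850
Crystal City,Smithsonian,Saturday,Midday,240
"""

# Columns the later steps rely on (assumed from the code in this article).
EXPECTED_COLUMNS = {"Entrance", "Exit", "Day", "Time", "Riders"}


def load_metro(source):
    """Read the CSV and raise early if an expected column is missing."""
    df = pd.read_csv(source)
    missing = EXPECTED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return df


df_sample = load_metro(io.StringIO(SAMPLE_CSV))
print(df_sample.dtypes)
```

With the real file, the call would be `load_metro("DC MetroData.csv")`.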

Step 3: Eyeballing the data and the length of the dataframe

df_metro.head()     # first five rows
df_metro.columns    # column names
len(df_metro)       # number of rows

Step 4: Checking distinct values under different columns

Let us check the unique values in the ‘Time’ column.

df_metro['Time'].value_counts().sort_values()

The unique values in the ‘Day’ column are as follows:

df_metro['Day'].value_counts().sort_values()
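The two checks above can be folded into one small helper. This is only a sketch: `category_summary` is a hypothetical name and the toy rows are made up. It also counts missing values, which `value_counts()` drops by default.

```python
import pandas as pd

# Toy rows, made up for illustration; column names follow the article's code.
toy = pd.DataFrame({
    "Day": ["Weekday", "Weekday", "Saturday", None],
    "Time": ["AM Peak", "PM Peak", "Midday", "Midday"],
})


def category_summary(df, columns):
    """Map each column name to a {value: count} dict, counting missing values too."""
    return {col: df[col].value_counts(dropna=False).to_dict() for col in columns}


summary = category_summary(toy, ["Day", "Time"])
print(summary)
```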

The next step is to analyze a few questions.

Q1. What are the popular entrances and exits?

Counting the records for each metro stop and sorting the counts in descending order shows which entrances and exits appear most often in the data.

df_metro['Entrance'].value_counts().sort_values(ascending=False).head()
df_metro['Exit'].value_counts().sort_values(ascending=False).head()

The popular locations seem to be:

  1. Gallery Place-Chinatown: Major attractions are Capital One Arena (which draws big crowds for sporting events and music concerts), restaurants, bars, etc.

  2. Foggy Bottom: Government offices in the area make it a popular commute destination.

  3. Pentagon City: Its location just 2 miles from the National Mall in downtown Washington makes the area a popular site for hotels and businesses.

  4. Dupont Circle: International embassies are located in the area.

  5. Union Station: An important location for long-distance travelers.

  6. Metro Center: A popular downtown location.

  7. Fort Totten: Its metro station serves as a popular transfer point for the Green, Yellow and Red lines.

Takeaway: Advertisers should target the popular metro stations listed above, which have high rider footfall, to capture maximum buyer attention.
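Note that `value_counts()` counts rows, not riders. As a complementary sketch (not part of the original analysis), summing the ‘Riders’ column per station gives a rider-weighted ranking. The helper name `top_stations` and the toy figures are assumptions for illustration.

```python
import pandas as pd

# Toy rows, made up for illustration; column names follow the article's code.
toy = pd.DataFrame({
    "Entrance": ["Vienna", "Vienna", "Dupont Circle", "Union Station"],
    "Exit": ["Farragut West", "Metro Center", "Metro Center", "Farragut West"],
    "Riders": [900, 300, 150, 400],
})


def top_stations(df, column, n=5):
    """Total riders per station in `column`, largest first."""
    totals = df.groupby(column)["Riders"].sum()
    return totals.sort_values(ascending=False).head(n)


print(top_stations(toy, "Entrance"))
print(top_stations(toy, "Exit"))
```

Comparing this ranking with the record counts shows whether a station is popular because it appears on many routes or because it carries many riders.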

Q2. What does the ridership look like during different days/times of the week?

This can be answered by simply plotting the rider data across different days and times. We will use the seaborn library to create this visualization.

sns.set_style("whitegrid")
# barplot draws the mean of 'Riders' for each Day/Time group by default
ax = sns.barplot(x="Day", y="Riders", hue="Time",
                 data=df_metro,
                 palette="inferno_r")
ax.set(xlabel='Day', ylabel='# Riders')
plt.title("Rider Footfall on different Days/Times")
plt.show()

Takeaway: The metro is a popular choice for the work commute in the city, so, as expected, rider footfall is highest on weekdays, particularly during AM Peak and PM Peak. Companies planning to roll out new products should target these slots to attract attention and generate consumer interest. For weekend advertising, the most attractive time slot seems to be Midday, closely followed by PM Peak.
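The numbers behind such a bar chart can also be tabulated directly. The sketch below uses made-up toy rows and builds two Day-by-Time tables with `pivot_table`: the mean riders per record (what a seaborn bar plot shows by default) and the total riders.

```python
import pandas as pd

# Toy rows, made up for illustration; column names follow the article's code.
toy = pd.DataFrame({
    "Day": ["Weekday", "Weekday", "Weekday", "Saturday", "Saturday"],
    "Time": ["AM Peak", "AM Peak", "PM Peak", "Midday", "PM Peak"],
    "Riders": [900, 700, 850, 240, 200],
})

# Mean riders per record for each Day/Time combination.
mean_riders = toy.pivot_table(index="Day", columns="Time",
                              values="Riders", aggfunc="mean")

# Total riders for each combination; absent combinations are filled with 0.
total_riders = toy.pivot_table(index="Day", columns="Time",
                               values="Riders", aggfunc="sum", fill_value=0)

print(mean_riders)
print(total_riders)
```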

Q3. What are the busy routes during a typical weekday?

To analyze this question, we will consider routes with more than 500 riders. First, we create a dataframe ‘busy_routes’ containing the weekday routes with >500 riders. Second, we filter this dataframe down to ‘AM Peak’. Third, we sort the filtered output.

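# Assumes weekday rows are labelled 'Weekday' in the 'Day' column
weekday = df_metro[df_metro['Day']=='Weekday']
# 'Merge' appears to hold the route label (entrance and exit pair)
busy_routes = weekday[weekday['Riders']>500][['Merge', 'Time', 'Riders']]
peak_am = busy_routes.query('Time=="AM Peak"')
peak_am.sort_values('Riders').tail()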

Repeating the same steps for ‘PM Peak’:

peak_pm = busy_routes.query('Time=="PM Peak"')
len(peak_pm)
peak_pm.sort_values('Riders').tail()

Takeaway: The routes with high footfall during AM Peak match those with high footfall during PM Peak, such as West Falls Church to Farragut West, Vienna to Farragut West, and Shady Grove to Farragut North. This suggests that these are popular work commute routes: people travel to work around Farragut during AM Peak and return to their homes in Vienna, Falls Church, or Shady Grove during PM Peak. Advertisers should target these high-traffic commute routes to get the most out of their advertisements and product placements.
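The commute pattern described in the takeaway can be checked programmatically rather than by eye. The sketch below, on made-up toy rows, pairs each AM Peak route with the PM Peak route running in the opposite direction. It assumes separate ‘Entrance’ and ‘Exit’ columns; the names `pm_reversed` and `round_trips` are illustrative.

```python
import pandas as pd

# Toy rows, made up for illustration; column names follow the article's code.
toy = pd.DataFrame({
    "Entrance": ["Vienna", "Farragut West", "Shady Grove",
                 "Farragut North", "Dupont Circle"],
    "Exit": ["Farragut West", "Vienna", "Farragut North",
             "Shady Grove", "Union Station"],
    "Time": ["AM Peak", "PM Peak", "AM Peak", "PM Peak", "AM Peak"],
    "Riders": [900, 850, 700, 680, 520],
})

am = toy[toy["Time"] == "AM Peak"]
pm = toy[toy["Time"] == "PM Peak"]

# Swap entrance and exit on the PM rows so a return trip lines up with
# the matching morning trip.
pm_reversed = pm.rename(
    columns={"Entrance": "Exit", "Exit": "Entrance", "Riders": "Riders_pm"}
)[["Entrance", "Exit", "Riders_pm"]]

# Inner join keeps only AM routes that have a reverse PM route.
round_trips = am.merge(pm_reversed, on=["Entrance", "Exit"])
print(round_trips)
```

Routes that survive the join are candidates for commute corridors; one-way routes such as the toy Dupont Circle row drop out.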

Q4. What are the popular metro routes during the weekends?

Let us perform a similar analysis to the one we did for weekdays. Since we are dealing with weekend data here, we lower the threshold and consider routes with more than 200 riders.

saturday = df_metro[df_metro['Day']=='Saturday']
busy_routes_sat = saturday[saturday['Riders']>200][['Merge', 'Time', 'Riders']]
busy_routes_sat.sort_values('Riders').tail()
sunday = df_metro[df_metro['Day']=='Sunday']
busy_routes_sun = sunday[sunday['Riders']>200][['Merge', 'Time', 'Riders']]
busy_routes_sun.sort_values('Riders').tail()

Takeaway: The Smithsonian is an extremely popular destination for tourists and city-dwellers alike because of its many museums and its proximity to the White House, the Capitol, national monuments, war memorials, and more. Our analysis shows that crowds head out from Crystal City, Pentagon City, Vienna, and Franconia to the Smithsonian during Midday and return during PM Peak. Many of these riders are likely young families with kids, an ideal audience for companies launching products aimed at younger populations, including children.
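The filter, select and sort steps repeat for weekdays, Saturdays and Sundays, so they could be wrapped in one function. The sketch below is one possible shape; the helper name `busiest_routes`, its parameters, and the toy rows are assumptions for illustration.

```python
import pandas as pd

# Toy rows, made up for illustration; column names follow the article's code.
toy = pd.DataFrame({
    "Merge": ["Crystal City-Smithsonian", "Vienna-Smithsonian",
              "Smithsonian-Vienna", "Vienna-Farragut West"],
    "Day": ["Saturday", "Saturday", "Sunday", "Weekday"],
    "Time": ["Midday", "Midday", "PM Peak", "AM Peak"],
    "Riders": [240, 310, 220, 900],
})


def busiest_routes(df, min_riders, day=None, time=None, n=5):
    """Routes above `min_riders`, optionally limited to a day and/or time slot."""
    mask = df["Riders"] > min_riders
    if day is not None:
        mask = mask & (df["Day"] == day)
    if time is not None:
        mask = mask & (df["Time"] == time)
    columns = ["Merge", "Time", "Riders"]
    return df.loc[mask, columns].sort_values("Riders", ascending=False).head(n)


print(busiest_routes(toy, 200, day="Saturday"))
print(busiest_routes(toy, 500, day="Weekday", time="AM Peak"))
```

Unlike `sort_values('Riders').tail()`, this returns the busiest route first, which is usually easier to read.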

Q5. As an advertiser, which locations should I target during Late Night?

We will do a similar analysis to identify which metro stations are ideal for advertising late at night. For ‘Late Night’, we will consider routes with more than 50 riders.

late_night = df_metro[df_metro['Day']=='Late Night']
busy_routes_latenight = late_night[late_night['Riders']>50][['Merge', 'Time', 'Riders']]
busy_routes_latenight.sort_values('Riders').tail()

Takeaway: Late at night, riders board the metro at popular locations with a buzzing nightlife, such as Gallery Place, Clarendon, Dupont Circle, and U Street. Advertisers who want to appeal to this section of the population (normally a younger crowd) should therefore consider targeting these metro stations to grab maximum attention.
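Since the takeaway is about stations rather than individual routes, the late-night rows can also be aggregated per entrance station. This is a sketch on made-up toy rows. It follows the article's filter on the ‘Day’ column and assumes an ‘Entrance’ column is available.

```python
import pandas as pd

# Toy rows, made up for illustration; column names follow the article's code.
toy = pd.DataFrame({
    "Entrance": ["Gallery Place", "Gallery Place", "U Street",
                 "Vienna", "Dupont Circle"],
    "Day": ["Late Night", "Late Night", "Late Night",
            "Weekday", "Late Night"],
    "Riders": [80, 60, 70, 900, 55],
})

# Keep only late-night rows, then total the riders boarding at each station.
late = toy[toy["Day"] == "Late Night"]
late_by_station = (
    late.groupby("Entrance")["Riders"].sum().sort_values(ascending=False)
)
print(late_by_station)
```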

Closing remarks: This dataset was fairly straightforward, so we did not spend much time cleaning and wrangling the data. With the given data, we were able to find sweet spots that should give advertisers the most value for their money. Thanks for reading!

Source: https://medium.com/@tanmayee92/identify-profitable-advertising-locations-using-washington-dc-metro-data-a03c5c4fc18f
