How to Scrape a Dashboard with Python

Dashboard scraping is a useful skill to have when the only way to interact with the data you need is through a dashboard. We’re going to learn how to scrape data from a dashboard using the Selenium and Beautiful Soup packages in Python. The Selenium package allows you to write Python code to automate web browser interaction, and the Beautiful Soup package allows you to easily pull data from the HTML code that produces the webpage you want to scrape.

Our goal is to scrape the Fort Bend County Community Impact Dashboard that visualizes the COVID-19 situation in Fort Bend County in Texas. We will extract the history of total tests performed and the daily case counts reported so that we can estimate the percent of positive cases in Fort Bend County.

Note that all of the code in this tutorial is written in Python version 3.6.2.

Step 1: Import Python Packages, Modules, and Methods

The first step is to import the Python packages, modules, and methods needed for dashboard scraping. The versions of the packages used in this tutorial are listed below.
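
As a minimal sketch, the helper below reports which versions of the packages are installed locally. It assumes the later steps rely on the `selenium`, `beautifulsoup4`, and `pandas` distributions, and it uses `importlib.metadata`, which requires Python 3.8 or newer (later than the 3.6.2 interpreter mentioned above). The function name `installed_versions` is hypothetical.

```python
from importlib import metadata  # standard library, Python 3.8+

# Distribution names assumed for this tutorial (not taken from the original listing)
PACKAGES = ["selenium", "beautifulsoup4", "pandas"]


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(name, version or "not installed")
```

Running it before the other steps makes it easy to record the environment alongside the scraped data.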

Step 2: Scrape HTML Source Code

The next step is to write Python code to automate our interaction with the dashboard. Before writing any code, we must look at the dashboard and inspect its source code to identify the HTML elements that contain the data we need. The dashboard source code refers to the HTML code that tells your browser how to render the dashboard web page. To view the dashboard source code, navigate to the dashboard in Chrome and use the keyboard shortcut Ctrl+Shift+I (Cmd+Option+I on macOS). An interactive panel containing the dashboard source code will appear.

Notice that the history of total tests performed and the daily case counts reported are only visible after clicking the “History” tab in the “Total Numbers of Tests Performed at County Sites” panel and the “Daily Case Count” tab in the “Confirmed Cases” panel, respectively. This means that we need to write Python code that automatically clicks on the “History” and “Daily Case Count” tabs so that the history of total tests performed and the daily case counts reported will be visible to Beautiful Soup.

[Figure: Fort Bend County Community Impact Dashboard on July 10th, 2020]

To find the HTML element that contains the “History” tab, use the shortcut Ctrl+Shift+C and then click on the "History" tab. You will see in the source code panel that the "History" tab is in a div element with ID "ember208".

[Figure: History Tab Source Code]

Following the same steps for the “Daily Case Count” tab, you will see that the “Daily Case Count” tab is in a div element with ID “ember238”.

[Figure: Source Code of Daily Case Count Tab]

Now that we have identified the elements we need, we can write code that:

  1. Launches the dashboard in Chrome
  2. Clicks on the “History” tab once the “History” tab finishes loading
  3. Clicks on the “Daily Case Count” tab once the “Daily Case Count” tab finishes loading
  4. Extracts the dashboard HTML source code
  5. Exits Chrome
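
The five steps above can be sketched roughly as follows. This is an illustration under assumptions rather than the tutorial's original code: it assumes Selenium 4 with a Chrome driver available, the function name `scrape_dashboard_html` is hypothetical, and the dashboard URL must be supplied by the caller because it is not given here. The element IDs are the ones found during inspection; IDs of this kind are generated by the page and may differ on another load.

```python
# Element IDs identified during inspection: "History" and "Daily Case Count" tabs
TAB_IDS = ("ember208", "ember238")


def scrape_dashboard_html(url, tab_ids=TAB_IDS, timeout=30):
    """Open the dashboard, click each tab once it is clickable, return the HTML."""
    # Imported here so the module still loads where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # 1. launch Chrome
    try:
        driver.get(url)
        wait = WebDriverWait(driver, timeout)
        for tab_id in tab_ids:
            # 2-3. wait until the tab is clickable, then click it
            wait.until(EC.element_to_be_clickable((By.ID, tab_id))).click()
        return driver.page_source  # 4. extract the HTML source
    finally:
        driver.quit()  # 5. exit Chrome even if a step above fails
```

A call would look like `html = scrape_dashboard_html(dashboard_url)`, where `dashboard_url` is a placeholder for the dashboard address. If the plots render slowly after a click, an additional wait on the plot elements may be needed before reading `page_source`.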

Step 3: Parse Data from HTML

Now, we need to parse the HTML source code to extract the history of total tests performed and the daily case counts reported. We will begin by looking at the dashboard source code to identify the HTML elements that contain the data.

To find the div element that contains the history of total tests performed, use the Ctrl+Shift+C shortcut and then click in the general area of the "Testing Sites" plot. You will see in the source code that the entire plot is in the div element with ID "ember96".

[Figure: Source Code of Testing Sites Plot]

If you hover over a specific data point, a label containing the date and number of tests performed will appear. Use the Ctrl+Shift+C shortcut and click on a specific data point. You will see that the label text is stored as the aria-label attribute of a g element.
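
To illustrate how Beautiful Soup can read these labels, here is a small sketch that runs on a hand-written HTML fragment instead of the live dashboard. The fragment and the label text format (a date followed by a count) are assumptions made for the example, and `extract_aria_labels` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment shaped like the markup described above; the label
# text format is an assumption, not copied from the dashboard.
SAMPLE_HTML = """
<div id="ember96">
  <svg>
    <g aria-label="Jul 8, 2020 1,200"></g>
    <g aria-label="Jul 9, 2020 1,350"></g>
    <g></g>
  </svg>
</div>
"""


def extract_aria_labels(html, div_id):
    """Return the aria-label text of every g element inside the given div."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", id=div_id)
    if container is None:
        return []
    # attrs={"aria-label": True} keeps only g elements that carry the attribute
    labeled = container.find_all("g", attrs={"aria-label": True})
    return [g["aria-label"] for g in labeled]


labels = extract_aria_labels(SAMPLE_HTML, "ember96")
print(labels)
```

Restricting the search to one div keeps labels from other plots on the page out of the result.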

[Figure: Source Code of Testing Sites Data Labels]

Following the same steps for the daily case counts reported, you will see that the plot of daily case counts is in the div element with ID “ember143”.

[Figure: Source Code of Daily Cases based on Report Date Plot]

If you hover over a specific data point, a label containing the date and the number of positive cases reported will appear. Using the Ctrl+Shift+C shortcut, you will notice that the data are also stored in the aria-label attribute of g elements.

[Figure: Source Code of Daily Cases based on Report Date Data Labels]

Once we have the elements that contain the data, we can write code that:

  1. Finds the div element that contains the plot of the total tests performed and pulls the total tests performed data
  2. Finds the div element that contains the plot of the daily case counts and pulls the daily case count data
  3. Combines the data in a pandas dataframe and exports it to a CSV
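
The combining and exporting step might look like the sketch below. It starts from label strings that have already been pulled out of the two plots, and it assumes each label ends with the count after a space (for example "Jul 8, 2020 1,200"); the real label format may differ. The sample values, the column names, and the helper `labels_to_frame` are all made up for illustration.

```python
import pandas as pd


def labels_to_frame(labels, value_name):
    """Turn 'date count' label strings into a two-column dataframe."""
    rows = []
    for label in labels:
        # Assumed format: everything before the last space is the date
        date_text, _, value_text = label.rpartition(" ")
        rows.append({
            "date": pd.to_datetime(date_text),
            value_name: int(value_text.replace(",", "")),
        })
    return pd.DataFrame(rows, columns=["date", value_name])


# Made-up labels standing in for the scraped aria-label values
tests = labels_to_frame(["Jul 8, 2020 1,200", "Jul 9, 2020 1,350"], "tests_performed")
cases = labels_to_frame(["Jul 8, 2020 90", "Jul 9, 2020 120"], "cases_reported")

# An outer merge keeps dates that appear in only one of the two plots
combined = tests.merge(cases, on="date", how="outer")
combined = combined.sort_values("date").reset_index(drop=True)

# With no path, to_csv returns the CSV text; pass a file path to write to disk
csv_text = combined.to_csv(index=False)
print(csv_text)
```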

Step 4: Calculate Positivity Rate

Now, we can finally estimate the COVID-19 positivity rate in Fort Bend County. We will divide the cases reported by the tests performed and calculate the 7-day moving averages. It is unclear from the dashboard whether the reported positive cases include cases that were determined through tests not conducted by the county (e.g., tests conducted at a hospital or clinic). It is also unclear when the tests for the positive cases were conducted, since the dashboard only displays the date each case was reported. For these reasons, the positivity rates derived from these data should be treated only as a rough estimate of the true positivity rate.
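
The calculation can be sketched with pandas as follows. The numbers are invented for the example, the column names are hypothetical, and the sketch assumes both columns hold daily counts; if the dashboard reports cumulative test totals, they would first need to be differenced.

```python
import pandas as pd

# Invented daily counts, used only to demonstrate the calculation
data = pd.DataFrame({
    "date": pd.date_range("2020-07-01", periods=8, freq="D"),
    "tests_performed": [100, 200, 100, 200, 100, 200, 100, 200],
    "cases_reported": [10, 20, 10, 20, 10, 20, 10, 40],
})

# Daily positivity rate: cases reported divided by tests performed
data["positivity_rate"] = data["cases_reported"] / data["tests_performed"]

# 7-day moving average; the first six rows are NaN because the window is incomplete
data["positivity_7day_avg"] = data["positivity_rate"].rolling(window=7).mean()

print(data[["date", "positivity_rate", "positivity_7day_avg"]])
```

Averaging the daily rates follows the order described above; an alternative is to sum cases and tests over each 7-day window and divide the sums, which weights busy testing days more heavily.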

Translated from: https://towardsdatascience.com/how-to-scrape-a-dashboard-with-python-8b088f6cecf3
