How to Scrape a Dashboard with Python

Dashboard scraping is a useful skill to have when the only way to interact with the data you need is through a dashboard. We’re going to learn how to scrape data from a dashboard using the Selenium and Beautiful Soup packages in Python. The Selenium package allows you to write Python code to automate web browser interaction, and the Beautiful Soup package allows you to easily pull data from the HTML code that produces the webpage you want to scrape.

Our goal is to scrape the Fort Bend County Community Impact Dashboard that visualizes the COVID-19 situation in Fort Bend County in Texas. We will extract the history of total tests performed and the daily case counts reported so that we can estimate the percent of positive cases in Fort Bend County.

Note that all of the code in this tutorial is written in Python version 3.6.2.

Step 1: Import Python Packages, Modules, and Methods

The first step is to import the Python packages, modules, and methods needed for dashboard scraping. The versions of the packages used in this tutorial are listed below.
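
The package list itself is not shown in this text, so here is a hedged stand-in: a small sketch that reports which versions of the tutorial's packages are installed on your machine. The distribution names `selenium`, `beautifulsoup4`, and `pandas` are the usual PyPI names for the packages this tutorial mentions, and `installed_versions` is a hypothetical helper name. Note that `importlib.metadata` needs Python 3.8 or later, which is newer than the Python 3.6.2 used for the tutorial.

```python
from importlib import metadata  # standard library, Python 3.8+

# Assumed PyPI distribution names for the packages this tutorial mentions.
PACKAGES = ["selenium", "beautifulsoup4", "pandas"]


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(name, version or "not installed")
```

Recording these versions alongside your scraper makes it easier to reproduce a run later, since Selenium in particular has changed its API between major releases.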

Step 2: Scrape HTML Source Code

The next step is to write Python code to automate our interaction with the dashboard. Before writing any code, we must look at the dashboard and inspect its source code to identify the HTML elements that contain the data we need. The dashboard source code refers to the HTML code that tells your browser how to render the dashboard web page. To view the dashboard source code, navigate to the dashboard and use the keyboard shortcut Ctrl+Shift+I. An interactive panel containing the dashboard source code will appear.

Notice that the history of total tests performed and the daily case counts reported are only visible after clicking the “History” tab in the “Total Numbers of Tests Performed at County Sites” panel and the “Daily Case Count” tab in the “Confirmed Cases” panel, respectively. This means that we need to write Python code that automatically clicks on the “History” and “Daily Case Count” tabs so that the history of total tests performed and the daily case counts reported will be visible to Beautiful Soup.

Figure: Fort Bend County Community Impact Dashboard on July 10th, 2020

To find the HTML element that contains the “History” tab, use the shortcut Ctrl+Shift+C and then click on the "History" tab. You will see in the source code panel that the "History" tab is in a div element with ID "ember208".

Figure: History tab source code

Following the same steps for the “Daily Case Count” tab, you will see that the “Daily Case Count” tab is in a div element with ID “ember238”.

Figure: Source code of the Daily Case Count tab

Now that we have identified the elements we need, we can write code that:

  1. Launches the dashboard in Chrome
  2. Clicks on the “History” tab once it finishes loading
  3. Clicks on the “Daily Case Count” tab once it finishes loading
  4. Extracts the dashboard HTML source code
  5. Exits Chrome
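
The five steps above can be sketched roughly as follows. This is a minimal sketch, not the original listing. The function name `scrape_dashboard_html` and the 30-second timeout are assumptions, the dashboard URL is left as a parameter because it is not given in this text, and the element IDs are the ones found by inspecting the page above (they may differ when you load the dashboard yourself). It also assumes Selenium and a matching Chrome driver are installed.

```python
# Element IDs of the "History" and "Daily Case Count" tabs, as found by
# inspecting the dashboard; re-check them, since they may change.
TAB_IDS = ("ember208", "ember238")


def scrape_dashboard_html(url, tab_ids=TAB_IDS, timeout=30):
    """Open the dashboard in Chrome, click each tab, and return the page HTML."""
    # Imported here so this module still loads where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # 1. launch Chrome (needs a usable Chrome driver)
    try:
        driver.get(url)
        wait = WebDriverWait(driver, timeout)
        for tab_id in tab_ids:
            # 2-3. wait until the tab is visible and enabled, then click it
            tab = wait.until(EC.element_to_be_clickable((By.ID, tab_id)))
            tab.click()
        return driver.page_source  # 4. HTML as currently rendered
    finally:
        driver.quit()  # 5. always close the browser, even on errors
```

Calling `scrape_dashboard_html(url)` with the dashboard's address returns one HTML string for the next step. If the plots render slowly after a click, an extra wait before reading `page_source` may be needed.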

Step 3: Parse Data from HTML

Now, we need to parse the HTML source code to extract the history of total tests performed and the daily case counts reported. We will begin by looking at the dashboard source code to identify the HTML elements that contain the data.

To find the div element that contains the history of total tests performed, use the Ctrl+Shift+C shortcut and then click in the general area of the "Testing Sites" plot. You will see in the source code that the entire plot is in the div element with ID "ember96".

Figure: Source code of the Testing Sites plot

If you hover over a specific data point, a label containing the date and number of tests performed will appear. Use the Ctrl+Shift+C shortcut and click on a specific data point. You will see that the label text is stored as the aria-label attribute of a g element.

Figure: Source code of the Testing Sites data labels

Following the same steps for the daily case counts reported, you will see that the plot of daily case counts is in the div element with ID “ember143”.

Figure: Source code of the “Daily Cases based on Report Date” plot

If you hover over a specific data point, a label containing the date and the number of positive cases reported will appear. Using the Ctrl+Shift+C shortcut, you will notice that the data are also stored in the aria-label attribute of g elements.

Figure: Source code of the “Daily Cases based on Report Date” data labels
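
Since both plots keep their data in the `aria-label` attribute of `g` elements inside a `div` with a known ID, one way to collect the labels with Beautiful Soup might look like the sketch below. The helper name `extract_aria_labels` is hypothetical, and the sketch assumes the `beautifulsoup4` package is installed.

```python
def extract_aria_labels(html, div_id):
    """Return the aria-label text of every labelled <g> inside the given div."""
    # Imported here so this module still loads where Beautiful Soup is missing.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    plot = soup.find("div", id=div_id)
    if plot is None:
        return []  # the div is absent, e.g. because its tab was never clicked
    # attrs={"aria-label": True} keeps only <g> tags that carry the attribute
    return [g["aria-label"] for g in plot.find_all("g", attrs={"aria-label": True})]
```

With the IDs identified above, `extract_aria_labels(html, "ember96")` would give the testing labels and `extract_aria_labels(html, "ember143")` the daily case labels.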

Once we have the elements that contain the data, we can write code that:

  1. Finds the div element that contains the plot of the total tests performed and pulls the total tests performed data
  2. Finds the div element that contains the plot of the daily case counts and pulls the daily case count data
  3. Combines the data in a pandas dataframe and exports it to a CSV
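
A rough sketch of the last step, turning label strings into a table, is shown below. The exact label format is not given in this text, so the format `"Jul 10, 2020 1,234"` (date first, count as the last token) is purely an assumption for illustration, as are the sample values, the helper names `parse_label` and `build_table`, the column names, and the output file name. Adjust the parsing to whatever the labels on the live dashboard actually look like.

```python
from datetime import datetime


def parse_label(label):
    """Split an assumed 'Mon DD, YYYY count' label into (date, count)."""
    date_text, count_text = label.rsplit(" ", 1)  # count is the last token
    day = datetime.strptime(date_text.strip(), "%b %d, %Y").date()
    return day, int(count_text.replace(",", ""))  # drop thousands separators


def build_table(test_labels, case_labels):
    """Combine both label lists into one pandas DataFrame keyed by date."""
    import pandas as pd  # imported lazily; assumes pandas is installed

    tests = pd.DataFrame([parse_label(x) for x in test_labels], columns=["date", "tests"])
    cases = pd.DataFrame([parse_label(x) for x in case_labels], columns=["date", "cases"])
    # An outer merge keeps dates that appear in only one of the two plots.
    return tests.merge(cases, on="date", how="outer").sort_values("date")


if __name__ == "__main__":
    # Made-up sample labels, only to show the call shape.
    table = build_table(["Jul 9, 2020 1,200", "Jul 10, 2020 1,234"], ["Jul 10, 2020 87"])
    table.to_csv("fort_bend_covid.csv", index=False)
```

The outer merge is a deliberate choice here: the two plots need not cover the same dates, and dropping unmatched days silently would hide gaps in the data.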

Step 4: Calculate Positivity Rate

Now we can finally estimate the COVID-19 positivity rate in Fort Bend County. We will divide the cases reported by the tests performed and calculate the 7-day moving averages. It is unclear from the dashboard whether the reported positive cases include cases that were determined through tests not conducted by the county (e.g., tests conducted at a hospital or clinic). It is also unclear when the tests for the positive cases were conducted, since the dashboard only displays the date on which each case was reported. For these reasons, the positivity rates derived from these data should be treated only as rough estimates of the true positivity rate.
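
As a worked illustration of that calculation, here is a minimal plain-Python sketch. It follows one reading of the description: compute the daily ratio of cases to tests first, then take a trailing 7-day average of those ratios. It assumes both inputs are daily counts aligned by date (if the testing history is cumulative, it would have to be differenced first), and the function names are hypothetical.

```python
def positivity_rates(cases, tests):
    """Daily cases divided by daily tests; None on days with no tests."""
    return [c / t if t else None for c, t in zip(cases, tests)]


def moving_average(values, window=7):
    """Trailing moving average; None until a full window of valid values exists."""
    averages = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        if len(chunk) < window or any(v is None for v in chunk):
            averages.append(None)  # not enough data, or a gap inside the window
        else:
            averages.append(sum(chunk) / window)
    return averages


if __name__ == "__main__":
    # Made-up numbers, only to show the call shape.
    daily_cases = [10, 12, 9, 15, 20, 18, 14, 22]
    daily_tests = [100, 110, 95, 120, 130, 125, 115, 140]
    print(moving_average(positivity_rates(daily_cases, daily_tests)))
```

Averaging over seven days smooths out weekday reporting patterns, but it does not remove the uncertainties described above about which tests the case counts correspond to.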

Original article: https://towardsdatascience.com/how-to-scrape-a-dashboard-with-python-8b088f6cecf3
