Interactive and Non-Interactive
Visual EDA in Python
I like to learn about different tools and technologies that are available to accomplish a task. When I decided to explore data regarding COVID-19 (Coronavirus), I knew that I would want the ability to present visualizations interactively. After all, the Coronavirus pandemic is tracked, monitored, and reported daily, from all over the world. Data science and analysis projects that involve temporal data lend themselves well to interactive plotting and timeline animation.
To support the desired interactive capabilities, notebooks for this project were composed in Deepnote, an online, Jupyter-style environment that enables the publishing of complete Python notebooks that retain interactive outputs. The Plotly Express library was used to produce interactive plot objects. Finally, the embedding of those individual visualizations in this article is made possible by the Datapane library for Python.
This article presents a brief overview of the project, including the following.
- Motivations for the project
- Methods of investigation
- Summary highlights and representative, interactive plots
Note: While this article includes interactive examples of cell outputs from project notebooks, we will not be demonstrating any code. You can, however, find the related GitHub repository linked below.
Overview and Motivation
Effective July 1, 2020, the state of Virginia entered the third phase of the “Forward Virginia” plan to gradually ease restrictions in place for COVID-19. On July 28, additional restrictions were imposed on restaurants and bars in the Hampton Roads area of Southeastern Virginia (Schneider, Gregory S., “Virginia governor adds restrictions in Hampton Roads region after surge in coronavirus cases,” The Washington Post, July 28, 2020).
This project is inspired in part by a subsequent interest in comparing the severity of the later outbreaks in the Hampton Roads region with the number and proportion of cases in other areas of the state. In other words, in areas where cases, hospitalizations, or deaths were decreasing, were the numbers higher or lower than in the recently restricted areas?
Of course, the goal of the project was not to perform a full medical study. Along with comparing aggregated case data for various localities, the project was strongly motivated by an interest in exploring the options available for publishing relatively simple but informative animated plots.
The Datasets
Coronavirus data for this exploration is sourced from the Virginia Department of Health (VDH). The particular copy of the Virginia public COVID-19 cases dataset used in this repository was last updated on July 30, 2020. VDH is itself a robust source of data and visualizations related to this health crisis. Their dataset continues to be updated regularly.
Each row in the dataset gives the overall count of COVID-19 cases, hospitalizations, and deaths for one Virginia locality as of a given report date, going back to when reporting began.
As we progress through the project, we bring in population data for additional context and insight.
Population estimates were sourced from the Demographics Research Group at the University of Virginia’s Weldon Cooper Center for Public Service, which published them on January 27, 2020. The group notes that the estimates are population approximations “based on a variety of observed administrative record data, such as births, deaths, school enrollment, and residential housing construction.” The group’s site also includes a handy, interactive map that highlights the matching row of population data as the cursor moves over a locality.
Methods
To gauge how the Hampton Roads numbers compare to other areas of Virginia, such as the state’s capital city of Richmond, this study primarily investigates data using interactive plotting. This approach enables visualization of data for multiple localities on a single figure, with the option to hover a cursor over the plot for detail.
The covered time period spans between two and four months. We include a few static plots for the ten localities with the highest reported numbers in each statistical area, but expecting readers to take in multiple measures for multiple areas over 60–120 days using only static plots seemed like an unrealistic ask. Interactive plots help viewers quickly understand how the data changes over time, or easily isolate features of the dataset at a particular point within the context of a broader time frame.
The project is not a predictive analysis. Instead, it serves a comparative purpose for a limited subset of relevant data. Of course, it is topical, as we move into the 2020–2021 school year and take into account the precautions required for a safe and effective educational environment.
Observations
Let’s review some of our project discoveries:
- Cases in some Northern Virginia localities exceeded those in Southeastern Virginia localities many times over. The Fairfax locality, to the west of Washington, D.C., exceeded Southeastern Virginia localities in total cases, hospitalizations, and deaths throughout our timeframe. Total hospitalizations in Fairfax between the middle of March and the end of July 2020 number 138,320; Chesapeake’s total for the same period is 11,378.
- For a more balanced comparison, we narrow our broad, preliminary view to focus on the state capital of Richmond and how it compares to select independent cities and counties of the Hampton Roads region.
Note: Each of the following plot animations may be played by selecting the triangle at the start of the timeline.
- Among the localities of interest, Richmond led in total cases from March through July, when it was surpassed by Norfolk and Virginia Beach.
- An animated plot highlights that Richmond presented a greater number of hospitalizations due to Coronavirus, even as Virginia Beach eventually surpassed it in related cases and deaths.
- Similarly, by the end of our timeline, Richmond reports higher rates of hospitalization and mortality per 1,000 residents than each of the other localities.
A Step Back Before the Walkthrough
We will break here.
This article previewed our process for working with Pandas datasets in Deepnote’s online, interactive notebook environment. We also explored using Plotly and Datapane to create interactive plots that we were then able to embed in this article.
In addition to interactive deployment, the full project benefits from the following:
- The merging of multiple data sources into Pandas dataframes
- Transformation of raw data, for comparison as a proportion of the population
- The ability to adjust the time scale and to narrow or widen the geographic scope
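As a rough illustration of the first two items (not the project's actual code, which is in the repository), the sketch below merges a case table with a population table and derives a per-1,000 rate. The column names, the placeholder locality names, and all figures are assumptions made up for the example.

```python
import pandas as pd

# Made-up case counts for two placeholder localities (not VDH data).
cases = pd.DataFrame({
    "locality": ["Locality A", "Locality B"],
    "report_date": ["2020-07-30", "2020-07-30"],
    "hospitalizations": [50, 30],
})

# Made-up population estimates for the same localities.
population = pd.DataFrame({
    "locality": ["Locality A", "Locality B"],
    "population": [100_000, 20_000],
})

# Left-merge so every case row keeps its place; validate guards against
# duplicate localities in the population table.
merged = cases.merge(population, on="locality", how="left", validate="many_to_one")

# Express hospitalizations as a rate per 1,000 residents.
merged["hosp_per_1000"] = merged["hospitalizations"] * 1000 / merged["population"]
```

In this toy data the smaller locality has fewer hospitalizations in absolute terms but a higher rate per 1,000 residents, which is the kind of reversal a population-adjusted comparison is meant to expose.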
Though we avoided the use of interactive choropleth maps in this project, Plotly offers significant potential for additional geospatial analysis using state- or county-level maps, latitude/longitude coordinates, and/or GeoJSON data.
You can follow me here to be notified when I publish new articles. In the meantime, you can find code and links to interactive notebooks in my GitHub repository.
Translated from: https://medium.com/the-innovation/publishing-interactive-plots-86a637c9fb74