What Do COVID-19 and World Happiness Report Data Tell Us?

For many people, the idea of staying home actually sounded good at first. It certainly worked out well for Netflix and Amazon. But sad truths awaited us: the numbers of dead and intubated patients kept rising, one after the other. We all know the aftermath well.

In this article, we examine COVID-19, a virus that can affect every country in the world, and its relationship with the country-level factors described in the World Happiness Report.

Before we start, let's get to know our datasets:

  • 'covid19_Confirmed_dataset.csv' (the data cover 96 days from the first case)
  • 'worldwide_happiness_report.csv'

And, of course, the libraries we will use:

import pandas as pd 
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

First, our data needs a little cleaning. We load the confirmed-cases file and remove the 'Lat' and 'Long' columns from the data frame:

corona_dataset_csv = pd.read_csv("covid19_Confirmed_dataset.csv")
corona_dataset_csv.drop(["Lat", "Long"],axis=1,inplace=True)

Next, we aggregate the rows by country, so that only the country names and their day-by-day case counts remain:

corona_dataset_aggregated = corona_dataset_csv.groupby("Country/Region").sum()
[Image: our first aggregated data frame]
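To make the cleaning and aggregation steps concrete, here is a minimal sketch on a tiny, made-up table. The layout (a 'Province/State' column plus one column per date) is an assumption about the CSV rather than something taken from the real file, and all of the numbers are invented.

```python
import pandas as pd

# Toy stand-in for the confirmed-cases file; layout and values are assumptions.
toy_csv = pd.DataFrame({
    "Province/State": ["Hubei", "Beijing", None],
    "Country/Region": ["China", "China", "Italy"],
    "Lat": [30.9, 40.2, 43.0],
    "Long": [112.3, 116.4, 12.0],
    "1/22/20": [444, 14, 0],
    "1/23/20": [444, 22, 2],
})

# Drop the coordinates, then add up the provinces of each country.
toy_csv = toy_csv.drop(["Lat", "Long"], axis=1)
toy_aggregated = toy_csv.groupby("Country/Region").sum(numeric_only=True)
print(toy_aggregated)
```

Passing numeric_only=True keeps text columns, such as the province name in this toy table, out of the sum.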

To show three countries in the same plot:

corona_dataset_aggregated.loc["China"].plot()
corona_dataset_aggregated.loc["Italy"].plot()
corona_dataset_aggregated.loc["Spain"].plot()
plt.legend()

To see more clearly which periods stand out and where the trend in infections changes, we take the first derivative of the curve. On daily data this is the day-to-day difference, which is what diff() computes:

corona_dataset_aggregated.loc["China"].diff().plot()
[Image: the maximum for China, marked on the diff() plot]
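As a small illustration of what diff() does to a cumulative series, the sketch below uses invented counts and placeholder day labels (d1 to d5). It is not taken from the real data set.

```python
import pandas as pd

# Invented cumulative case counts for one country, indexed by a day label.
cumulative = pd.Series([10, 15, 40, 45, 45], index=["d1", "d2", "d3", "d4", "d5"])

daily_new = cumulative.diff()   # first entry is NaN: there is no previous day
peak = daily_new.max()          # largest single-day increase (NaN is skipped)
peak_day = daily_new.idxmax()   # label of the day on which it happened
print(daily_new.tolist(), peak, peak_day)
```

Here the cumulative curve rises fastest between d2 and d3, so the maximum of the differenced series sits at d3.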

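We compute each country's maximum infection rate, that is, the largest single-day increase in confirmed cases, add it as a new 'max_infection_rate' column, and refresh our data frame.

countries = list(corona_dataset_aggregated.index)
max_infection_rates = []
for c in countries:
    # Largest day-over-day increase for this country
    max_infection_rates.append(corona_dataset_aggregated.loc[c].diff().max())
corona_dataset_aggregated["max_infection_rate"] = max_infection_rates
# Assumed definition: corona_data (joined further down) holds just this column
corona_data = pd.DataFrame(corona_dataset_aggregated["max_infection_rate"])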
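The per-country loop can also be written without an explicit loop. The following is only a sketch on a made-up frame with placeholder column names (d1 to d3); it compares the loop with a vectorized form that differences along the date axis.

```python
import pandas as pd

# Toy aggregated frame: one row per country, one column per day (made-up numbers).
toy = pd.DataFrame(
    {"d1": [1, 0], "d2": [4, 2], "d3": [9, 3]},
    index=["China", "Italy"],
)

# Loop version, as in the article.
loop_rates = [toy.loc[c].diff().max() for c in toy.index]

# Vectorized version: difference along the date axis, then the row-wise maximum.
vector_rates = toy.diff(axis=1).max(axis=1)
print(vector_rates)
```

Both versions give the same numbers; the vectorized one returns a Series indexed by country, which can be assigned to a new column directly.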

Meanwhile, we start processing the data from the happiness report. To import it:

happiness_report_csv = pd.read_csv("worldwide_happiness_report.csv")

We drop the columns we do not need: "Overall rank", "Score", "Generosity", and "Perceptions of corruption".

useless_cols = ["Overall rank","Score","Generosity","Perceptions of corruption"]
# Remove the unused columns in place
happiness_report_csv.drop(useless_cols, axis=1, inplace=True)

Now we bring "max_infection_rate" into this frame, matching the rows by country. join() aligns the two frames on their index, so both must be indexed by country name, and how="inner" keeps only the countries that appear in both:

data = corona_data.join(happiness_report_csv,how="inner")
data.head()
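The sketch below shows how an inner join on the index behaves, using two tiny invented frames. The "Country or region" column name is an assumption about the happiness report file, and "Atlantis" is a deliberately fictional entry that has no match.

```python
import pandas as pd

# Made-up infection rates, indexed by country (as corona_data is).
rates = pd.DataFrame(
    {"max_infection_rate": [100.0, 50.0, 5.0]},
    index=["China", "Italy", "Atlantis"],
)

# Made-up happiness rows; the "Country or region" column name is an assumption.
happiness = pd.DataFrame({
    "Country or region": ["Italy", "China", "Finland"],
    "GDP per capita": [1.29, 1.03, 1.34],
})
happiness = happiness.set_index("Country or region")

# Inner join on the index keeps only countries present in both frames.
joined = rates.join(happiness, how="inner")
print(joined)
```

Countries that appear in only one of the two frames (here "Atlantis" and "Finland") are dropped, which is why the joined frame can be shorter than either input.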

We will use the corr() function to compute the correlation matrix:

data.corr()

As you can see, this matrix consists of the correlation coefficients between every pair of columns in our data set.

Take "max_infection_rate" and "GDP per capita": the entry where their row and column meet is the correlation coefficient between these two variables. The closer this value is to 1, the stronger the positive relationship between them.
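To show where a single entry of the matrix comes from, the following sketch reads one coefficient out of corr() and recomputes it from the definition of the Pearson correlation. The values are invented and only serve to illustrate the calculation.

```python
import numpy as np
import pandas as pd

# Invented values, only to show how one entry of the matrix is obtained.
toy = pd.DataFrame({
    "GDP per capita": [0.3, 0.8, 1.0, 1.3],
    "max_infection_rate": [10.0, 40.0, 35.0, 90.0],
})

matrix = toy.corr()  # Pearson correlation by default
r_from_matrix = matrix.loc["max_infection_rate", "GDP per capita"]

# The same coefficient from the definition: the sum of products of deviations
# divided by the square root of the product of the summed squared deviations.
x = toy["GDP per capita"]
y = toy["max_infection_rate"]
r_by_hand = ((x - x.mean()) * (y - y.mean())).sum() / np.sqrt(
    ((x - x.mean()) ** 2).sum() * ((y - y.mean()) ** 2).sum()
)
print(r_from_matrix, r_by_hand)
```

The matrix is symmetric, so the same value appears at the mirrored position, and every diagonal entry is 1.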

If you look at the other life factors, for example social support, life expectancy, and freedom to make life choices, you can see that they are all positively correlated with one another as well.

But our work is not done yet. An analysis is not finished until we visualize the results in figures and graphs, so that everyone can understand what we get out of it.

We found that there is a positive correlation between the max infection rate and all of the life factors in our data set.

In this task, I am going to use the seaborn module, which is a very handy tool for visualization. What we want to do is plot each of these columns.

x = data["GDP per capita"]
y = data["max_infection_rate"]
# x and y are passed by keyword; recent seaborn releases no longer accept them positionally
sns.scatterplot(x=x, y=y)

However, this graph is hard to examine in detail. The values on the y axis span a far wider range than those on the x axis, so most of the points are squeezed together and the detail in the data is lost. To solve this problem, we can use log scaling:

x = data["GDP per capita"]
y = data["max_infection_rate"]
sns.scatterplot(x=x, y=np.log(y))

This shows the trend much more clearly. As you can see, the points slope upward: there is a positive correlation.

sns.regplot(x=x, y=np.log(y))

Very clearly, there is a positive slope between these two variables ("max_infection_rate" and "GDP per capita").
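The slope can also be checked numerically. The sketch below is an illustration on invented points, not the article's data: it takes the logarithm of the y values and fits a least-squares straight line with NumPy, which should correspond to the kind of line a default linear regression plot draws. Setting zero rates aside first is an added precaution, since log(0) is -inf; whether the real data contain zeros is not stated in the article.

```python
import numpy as np

# Invented points: x plays the role of GDP per capita, y of max_infection_rate.
x = np.array([0.1, 0.2, 0.5, 0.9, 1.2, 1.5])
y = np.array([0.0, 5.0, 20.0, 150.0, 900.0, 4000.0])

# log(0) is -inf, so rows with a zero rate are set aside before the transform.
keep = y > 0
log_y = np.log(y[keep])

# Least-squares straight line through (x, log y): slope first, then intercept.
slope, intercept = np.polyfit(x[keep], log_y, deg=1)
print(slope > 0)
```

A positive slope here means that log(y) tends to grow with x, which is the same upward trend the regression plot shows.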

Conclusion

We have found a very interesting result in this analysis. It suggests that people living in developed countries are more prone to COVID-19 infection than people living in less developed countries. One could argue that this result simply reflects a lack of coronavirus test kits in less developed countries. To check that this is not the case, I recommend doing a similar analysis on the data for the cumulative number of deaths.

See here for more: https://github.com/fk-pixel/Coursera-Project-Network/blob/master/Covid19_DataAnalysis%20.ipynb


Original article: https://medium.com/think-make/what-does-covid-19-and-world-happiness-report-data-tell-us-c76bdd44b7ac
