What Do COVID-19 and World Happiness Report Data Tell Us?

For many people, the idea of staying home actually sounded good at first, and it was certainly a good time for Netflix and Amazon. But sad truths were waiting for us: the numbers of deaths and intubated patients kept rising, one after the other. We all know the aftermath well.

In this article, we examine COVID-19, which can affect every country in the world, and its relationship with the country-level factors described in the World Happiness Report.

Before we start, let’s get to know our datasets:

  • ‘covid19_Confirmed_dataset.csv’ (data covering 96 days starting from the first case)
  • ‘worldwide_happiness_report.csv’

And of course the libraries we will use:

import pandas as pd 
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

First, we need to clean the data a little. We load the confirmed-cases file and remove the ‘Lat’ and ‘Long’ columns, which we do not need:

corona_dataset_csv = pd.read_csv("covid19_Confirmed_dataset.csv")
corona_dataset_csv.drop(["Lat", "Long"], axis=1, inplace=True)

Next, we group the rows by country, so that only the country names and their case counts are shown, day by day:

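# numeric_only=True leaves out the text column "Province/State", so only the daily counts are summed
corona_dataset_aggregated = corona_dataset_csv.groupby("Country/Region").sum(numeric_only=True)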
Figure: Our first aggregated data frame.
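To make the cleaning and aggregation steps concrete, here is a minimal sketch on a tiny hand-made frame that mimics the layout assumed for the confirmed-cases file (one row per province/state, coordinate columns, one column of cumulative cases per day). The variable names `raw` and `aggregated`, the province names and all numbers are made up for illustration.

```python
import pandas as pd

# Toy stand-in for covid19_Confirmed_dataset.csv (hypothetical values):
# one row per province/state, one column of cumulative cases per day.
raw = pd.DataFrame({
    "Province/State": ["Province A", "Province B", None],
    "Country/Region": ["China", "China", "Italy"],
    "Lat": [30.0, 23.0, 43.0],
    "Long": [112.0, 113.0, 12.0],
    "1/22/20": [100, 10, 0],
    "1/23/20": [150, 20, 5],
})

# Drop the coordinates, then add up all provinces of the same country.
# numeric_only=True leaves out the text column "Province/State".
raw = raw.drop(["Lat", "Long"], axis=1)
aggregated = raw.groupby("Country/Region").sum(numeric_only=True)
print(aggregated)
```

The two Chinese provinces collapse into a single "China" row whose daily values are the sums of the province values, which is exactly the shape the rest of the analysis relies on.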

To show three countries in the same plot:

corona_dataset_aggregated.loc["China"].plot()
corona_dataset_aggregated.loc["Italy"].plot()
corona_dataset_aggregated.loc["Spain"].plot()
plt.legend()
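As an alternative to calling plot() once per country, one could select several countries at once and transpose the result, so that each country becomes a column; pandas' DataFrame.plot() then draws one line per column and builds the legend from the column names. A minimal sketch on a made-up table (the names `aggregated` and `subset` and all values are hypothetical; only the reshaping is shown, so it runs without a display):

```python
import pandas as pd

# Hypothetical aggregated table: one row per country, one column per day.
aggregated = pd.DataFrame(
    {"day_1": [10, 0, 0, 3], "day_2": [30, 2, 1, 4], "day_3": [60, 9, 5, 4]},
    index=["China", "Italy", "Spain", "Other"],
)

# Select the countries of interest and transpose:
# rows become days, columns become countries.
subset = aggregated.loc[["China", "Italy", "Spain"]].T
print(subset)
```

Calling `subset.plot()` (followed by `plt.show()` outside a notebook) would then draw the three curves in one figure.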

To see more clearly which periods stand out and where the infection trend changes, we take a discrete derivative of the cumulative counts: diff() subtracts each day's value from the next day's, which gives the number of new cases per day.

corona_dataset_aggregated.loc["China"].diff().plot()
Figure: diff() for China, with the maximum marked.
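A small sketch on invented numbers shows what diff() returns for a cumulative series and how its maximum picks out the largest single-day jump (the series and day labels are hypothetical):

```python
import pandas as pd

# Hypothetical cumulative confirmed cases for one country over six days.
cumulative = pd.Series([1, 3, 8, 20, 27, 30],
                       index=["d1", "d2", "d3", "d4", "d5", "d6"])

# diff() computes value[t] - value[t-1]; the first entry has no
# predecessor and becomes NaN.
daily_new = cumulative.diff()
print(daily_new.tolist())  # [nan, 2.0, 5.0, 12.0, 7.0, 3.0]

# max() and idxmax() skip the NaN, giving the largest one-day increase.
peak = daily_new.max()
peak_day = daily_new.idxmax()
print(peak, peak_day)      # 12.0 d4
```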

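The maximum of these daily differences is the largest number of new cases recorded in a single day, which we call ‘max_infection_rate’. We compute it for every country and add it to our data frame as a new column.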

countries = list(corona_dataset_aggregated.index)
max_infection_rates = []
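for c in countries:
    # Largest single-day increase in confirmed cases for country c
    max_infection_rates.append(corona_dataset_aggregated.loc[c].diff().max())
corona_dataset_aggregated["max_infection_rate"] = max_infection_rates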
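The loop can also be written without Python-level iteration: diff(axis=1) takes the difference between consecutive day columns for every country at once, and max(axis=1) picks the largest value in each row. A sketch that checks both forms agree on a made-up table (the frame, its labels and values are hypothetical):

```python
import pandas as pd

# Hypothetical aggregated table of cumulative cases (rows: countries, columns: days).
aggregated = pd.DataFrame(
    {"day_1": [1, 0, 5], "day_2": [4, 2, 5], "day_3": [10, 3, 9], "day_4": [12, 9, 20]},
    index=["A", "B", "C"],
)

# Loop version, as in the article.
loop_rates = [aggregated.loc[c].diff().max() for c in aggregated.index]

# Vectorized version: day-to-day differences along the columns, then row-wise max.
vectorized_rates = aggregated.diff(axis=1).max(axis=1)

print(vectorized_rates.tolist())  # [6.0, 6.0, 11.0]
```

Either form should be computed before the new column is added; otherwise diff(axis=1) would treat ‘max_infection_rate’ as one more day.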

Meanwhile, we start processing the data from the happiness report. To import it:

happiness_report_csv = pd.read_csv("worldwide_happiness_report.csv")

We remove the columns “Overall rank”, “Score”, “Generosity”, and “Perceptions of corruption”, which we will not use:

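useless_cols = ["Overall rank", "Score", "Generosity", "Perceptions of corruption"]
happiness_report_csv = happiness_report_csv.drop(columns=useless_cols).set_index("Country or region")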

Now we put “max_infection_rate” into its own data frame and join it with the happiness report, matching the rows of the two tables by country:

corona_data = pd.DataFrame(corona_dataset_aggregated["max_infection_rate"])
data = corona_data.join(happiness_report_csv, how="inner")
data.head()
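join aligns the two frames on their index, which is why both need to be indexed by country name. With how="inner", only countries present in both tables survive, so a country spelled differently in the two sources (for instance, hypothetically, "US" in one file and "United States" in the other) is silently dropped. A toy sketch with invented values; listing the unmatched labels is one simple way to spot such cases:

```python
import pandas as pd

# Hypothetical max_infection_rate values, indexed by country name.
corona_data = pd.DataFrame(
    {"max_infection_rate": [100.0, 40.0, 7.0]},
    index=["Italy", "US", "Nepal"],
)

# Hypothetical happiness-report rows, also indexed by country name.
happiness = pd.DataFrame(
    {"GDP per capita": [1.2, 1.4, 0.4], "Social support": [1.5, 1.4, 1.0]},
    index=["Italy", "United States", "Nepal"],
)

# Inner join: keep only index labels found in both frames.
data = corona_data.join(happiness, how="inner")
print(data)

# Labels from the COVID-19 side that found no partner and were dropped.
unmatched = corona_data.index.difference(happiness.index)
print(list(unmatched))  # ['US']
```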

We use the corr() function to compute the correlation matrix:

data.corr()

As you can see, this matrix consists of the correlation coefficients between every pair of columns in our data set.

Take ‘max_infection_rate’ and ‘GDP per capita’: the cell where this row and this column meet is the correlation coefficient between these two variables. The higher this value, the stronger the positive correlation between them.
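By default, corr() computes the Pearson correlation coefficient: the covariance of two columns divided by the product of their standard deviations, a value between -1 and 1 where values near 1 indicate a strong positive linear relationship. A sketch that computes it by hand on invented numbers (the column "log_rate" and all values are hypothetical) and compares the result with pandas:

```python
import numpy as np
import pandas as pd

# Hypothetical values for two columns.
df = pd.DataFrame({
    "GDP per capita": [0.5, 0.8, 1.1, 1.3, 1.5],
    "log_rate": [1.0, 2.2, 2.9, 4.1, 4.8],
})

x = df["GDP per capita"].to_numpy()
y = df["log_rate"].to_numpy()

# Pearson r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2)),
# where dx and dy are deviations from the respective means.
dx = x - x.mean()
dy = y - y.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# Same value from pandas (method="pearson" is the default).
r_pandas = df.corr().loc["GDP per capita", "log_rate"]
print(round(r_manual, 4), round(r_pandas, 4))
```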

If we look at the other life factors, for example social support, healthy life expectancy and freedom to make life choices, we can see that max_infection_rate is positively correlated with all of them as well.

But our work is not done yet. An analysis is not finished until we visualize the results as figures and graphs, so that everyone can understand what we got out of it.

We found that there is a positive correlation between max_infection_rate and all of the life factors in our data set.

For this task, I am going to use the seaborn module, which is a very handy tool for visualization. What we want to do is plot each of these columns against max_infection_rate.

x = data["GDP per capita"]
y = data["max_infection_rate"]
sns.scatterplot(x=x, y=y)

However, this graph is hard to examine in detail: max_infection_rate spans a much wider range of values than GDP per capita, so the points are squeezed together and we cannot see much detail. To solve this problem, we can use log scaling:

x = data["GDP per capita"]
y = data["max_infection_rate"]
sns.scatterplot(x=x, y=np.log(y))
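Taking the logarithm turns equal ratios into equal steps: values of 10, 100, 1,000 and 10,000 cover four orders of magnitude, but their natural logs are evenly spaced, about 2.303 apart. One caveat: np.log(0) is -inf, so a country whose max_infection_rate is 0 would not appear on the log plot. np.log1p, which computes log(1 + y), is one common workaround; it is an assumption here, not something the original analysis uses. A sketch on hypothetical values:

```python
import numpy as np

# Hypothetical max_infection_rate values spanning several orders of magnitude.
rates = np.array([10.0, 100.0, 1_000.0, 10_000.0])

logged = np.log(rates)
# Consecutive gaps are all log(10): equal ratios become equal steps.
gaps = np.diff(logged)
print(np.round(gaps, 3))  # [2.303 2.303 2.303]

# log1p maps 0 to 0 instead of -inf, keeping zero-valued countries plottable.
print(np.log1p(np.array([0.0, 9.0])))
```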

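The log-scaled scatter plot shows the trend much more clearly: as GDP per capita increases, log(max_infection_rate) tends to increase as well, which indicates a positive correlation. A regression line makes the slope explicit: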

sns.regplot(x=x, y=np.log(y))

The regression line shows very clearly that there is a positive slope between these two variables (“max infection rate” and “GDP per capita”).
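regplot draws a least-squares regression line through the points, and the sign of its slope is what we read off the chart. As a sketch, the same kind of straight-line fit can be computed with NumPy's np.polyfit; the arrays `gdp` and `max_rate` below are invented values, not the real data:

```python
import numpy as np

# Hypothetical GDP per capita values and max_infection_rate values.
gdp = np.array([0.3, 0.6, 0.9, 1.2, 1.5])
max_rate = np.array([5.0, 20.0, 60.0, 300.0, 900.0])

# Fit log(max_rate) = slope * gdp + intercept by ordinary least squares.
# np.polyfit returns coefficients from the highest power down.
slope, intercept = np.polyfit(gdp, np.log(max_rate), deg=1)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

A positive `slope` corresponds to the upward-sloping line in the regplot.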

Consequently

We found a very interesting result in this analysis: people living in developed countries appear to be more prone to COVID-19 infection than people in less developed countries. It could be argued that this result is simply due to a lack of coronavirus test kits in less developed countries.

To check whether this is the case, I recommend doing a similar analysis on the data for the cumulative number of deaths.
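The suggested follow-up can reuse the same steps. Below is a sketch assuming a cumulative-deaths table with the same layout as the confirmed-cases file (a "Country/Region" column, "Lat"/"Long" columns and one column per day); the helper name `max_daily_increase`, the column name "max_death_rate" and the toy data are all hypothetical:

```python
import pandas as pd


def max_daily_increase(raw: pd.DataFrame, value_name: str) -> pd.DataFrame:
    """Aggregate a cumulative time-series table by country and return the
    largest single-day increase per country as a one-column DataFrame."""
    per_country = (
        raw.drop(columns=["Lat", "Long"])
        .groupby("Country/Region")
        .sum(numeric_only=True)  # skip the text column "Province/State"
    )
    peak = per_country.diff(axis=1).max(axis=1)
    return peak.to_frame(name=value_name)


# Hypothetical cumulative deaths: two provinces of country X, one row for Y.
deaths_raw = pd.DataFrame({
    "Province/State": ["P1", "P2", None],
    "Country/Region": ["X", "X", "Y"],
    "Lat": [0.0, 1.0, 2.0],
    "Long": [0.0, 1.0, 2.0],
    "day_1": [0, 1, 0],
    "day_2": [2, 1, 1],
    "day_3": [5, 4, 1],
})

max_death_rate = max_daily_increase(deaths_raw, "max_death_rate")
print(max_death_rate)
```

The resulting frame is indexed by country, so it could be joined with the happiness table in the same way as corona_data.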

See here for more: https://github.com/fk-pixel/Coursera-Project-Network/blob/master/Covid19_DataAnalysis%20.ipynb

Translated from: https://medium.com/think-make/what-does-covid-19-and-world-happiness-report-data-tell-us-c76bdd44b7ac
