Population Density Visualization: Visualizing the Philippines’ Population Density Using GeoPandas

GeoVisualization / Philippines.

Population density is a crucial concept in urban planning. Theories on how it affects economic growth are divided. Some claim, as Rappaport does, that an economy is a form of “spatial equilibrium”: that net flows of residents and employment gradually move to be balanced with one another.

The idea that density is related to economic growth has long been established by multiple studies. But whether the same theory holds for the Philippines, and which precedes which (density follows urban development, or urban development follows density), is a classic data science problem.

Before we can test out any models, however, let’s do a fun exercise and visualize our dataset.

The 2015 Philippines’ Population Dataset

The Philippine Statistics Authority publishes population data every five (5) years. At the time of writing, only the 2015 dataset has been published, so we will be using it.

Importing Packages

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.colors as colors  # to customize our colormap for the legend
import numpy as np
import seaborn as sns; sns.set(style="ticks", color_codes=True)
import geopandas as gpd
import descartes  # important for integrating Shapely geometry with the Matplotlib library
import mapclassify  # you will need this to implement a choropleth
import geoplot  # you will need this to implement a choropleth
%matplotlib inline

Many of the packages we will be using need to be installed first. For those having trouble installing GeoPandas, check out my article about this. Note that geoplot requires the cartopy package, which can be installed like any of the dependencies discussed in that article.

Loading Shapefiles

Shapefiles are needed to give “shape” to your geographical or political boundaries.

Download the shapefile and load it using GeoPandas.

An important note when extracting the zip package: all the contents should stay in one folder, even though you will only be loading the “.shp” file; otherwise it won’t work. (This means that the “.cpg”, “.dbf”, “.prj” and so forth should be in the same location as your “.shp” file.)
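
As a quick guard against the problem described above, here is a minimal sketch that checks whether the companion files sit next to a “.shp” file before you try to load it. The helper name `missing_sidecars` is hypothetical, and including “.shx” in the list is an assumption (it is the index file that normally ships in the same zip package).

```python
from pathlib import Path

# Companion extensions mentioned in the text. ".shx" is added here as an
# assumption, since it is the index file that normally ships with a shapefile.
SIDECAR_EXTENSIONS = (".shx", ".dbf", ".prj", ".cpg")


def missing_sidecars(shp_path):
    """Return the names of companion files that are absent next to a .shp file."""
    shp = Path(shp_path)
    return [
        shp.with_suffix(ext).name
        for ext in SIDECAR_EXTENSIONS
        if not shp.with_suffix(ext).exists()
    ]
```

Calling `missing_sidecars("Shapefiles/gadm36_PHL_shp/gadm36_PHL_2.shp")` should return an empty list when the zip package was extracted into a single folder.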

You can download the shapefile of the Philippines from gadm.org (https://gadm.org/).

Note: You can likewise download shapefiles from PhilGIS (http://philgis.org/). It will probably be better for Philippine data, though some of it is sourced from GADM. Let’s go with GADM here, as I have more experience with it.

# The levels of administrative boundaries run from 0 to 3; the boundaries get more detailed as the level increases
country = gpd.GeoDataFrame.from_file("Shapefiles/gadm36_PHL_shp/gadm36_PHL_0.shp")
provinces = gpd.GeoDataFrame.from_file("Shapefiles/gadm36_PHL_shp/gadm36_PHL_1.shp")
cities = gpd.GeoDataFrame.from_file("Shapefiles/gadm36_PHL_shp/gadm36_PHL_2.shp")
barangay = gpd.GeoDataFrame.from_file("Shapefiles/gadm36_PHL_shp/gadm36_PHL_3.shp")

At this point, you can view the shapefiles and examine the boundaries. You can do this by plotting the shapefiles.

# GeoDataFrame has a built-in plot method that we can use to view the shapefiles
fig, axes = plt.subplots(2, 2, figsize=(10, 10))

# color refers to the fill color of the shapes, while
# edgecolor refers to the line color (color names or hex values work here)
country.plot(ax=axes[0][0], color='white', edgecolor='#2e3131')
provinces.plot(ax=axes[0][1], color='white', edgecolor='#2e3131')
cities.plot(ax=axes[1][0], color='white', edgecolor='#2e3131')
barangay.plot(ax=axes[1][1], color='white', edgecolor='#555555')

# "Adm" means administrative boundary level; others refer to these as "political boundaries"
adm_lvl = ["Country Level", "Provincial Level", "City Level", "Barangay Level"]
i = 0
for ax in axes:
    for axx in ax:
        axx.set_title(adm_lvl[i])
        i = i + 1
        axx.spines['top'].set_visible(False)
        axx.spines['right'].set_visible(False)
        axx.spines['bottom'].set_visible(False)
        axx.spines['left'].set_visible(False)

[Figure: the four administrative levels plotted side by side. Darker fills imply more boundaries.]

Load Population Density Data

Population data and density per square kilometer are usually collected by the Philippine Statistics Authority (PSA).

You can do this with other demographic or macroeconomic data, as the Philippines has been making progress in providing these. (Good job, Philippines!)

Because we want to amp up the challenge, let’s go with the most detailed one: the city and municipality level.

We first load the data and examine it:

df = pd.read_excel(r'data\2015 Population Density.xlsx',
                   header=1,
                   skipfooter=25,
                   usecols='A,B,D,E',
                   names=["City", 'Population', "landArea_sqkms", "Density_sqkms"])
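
One quick way to examine the loaded table is to recompute density from population and land area and compare it with the reported column. The sketch below uses a small made-up frame with the same column names as `df`; the rows, the variable names `df_demo` and `suspicious`, and the 1% tolerance are all assumptions for illustration, not PSA figures.

```python
import numpy as np
import pandas as pd

# Made-up rows using the same column names as the loaded table; not PSA figures.
df_demo = pd.DataFrame({
    "City": ["Town A", "Town B", "Town C"],
    "Population": [1000, 5000, 1200],
    "landArea_sqkms": [10.0, 2.0, 4.0],
    "Density_sqkms": [100.0, 2500.0, 30.0],  # the last value is deliberately wrong
})

# Density is population divided by land area.
recomputed = df_demo["Population"] / df_demo["landArea_sqkms"]

# Allow a small relative tolerance because published densities are rounded.
suspicious = df_demo[~np.isclose(recomputed, df_demo["Density_sqkms"], rtol=0.01)]
print(suspicious[["City", "Density_sqkms"]])
```

Rows that show up in `suspicious` are worth a second look before plotting, since a misread column or a shifted row can easily produce this kind of mismatch.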

Cleaning the Data

Before we can proceed, we have to clean our data. We should:

  • drop rows with empty values
  • remove non-alphabet characters after the names (* denoting footnotes)
  • remove the words “(capital)” and “excluding” after each city name
  • remove leading and trailing spaces
  • and many more…
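
The steps listed above can be sketched with a small helper built on regular expressions. This is not the code from my repository; `clean_city_name` is a hypothetical function, and the exact patterns are assumptions about how footnote markers and the “(capital)” and “excluding” suffixes appear in the PSA file.

```python
import re


def clean_city_name(raw):
    """Apply the cleaning steps listed above to a single city name."""
    name = str(raw)
    # Drop "(capital)" and anything from "excluding" onward, case-insensitively.
    name = re.sub(r"\(capital\)", "", name, flags=re.IGNORECASE)
    name = re.sub(r"\(?\s*excluding.*$", "", name, flags=re.IGNORECASE)
    # Drop footnote markers such as "*" or digits left at the end of the name.
    name = re.sub(r"[^A-Za-zÑñ.\-' )]+$", "", name)
    # Collapse repeated whitespace and strip leading/trailing spaces.
    return re.sub(r"\s+", " ", name).strip()
```

On the loaded table, this could be applied with `df = df.dropna()` followed by `df["City"] = df["City"].map(clean_city_name)`, which also covers the first step of dropping rows with empty values.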

Cleaning really will take the bulk of the work when merging data with shapefiles.

This is especially true for the Philippines, which has many cities with similar names (e.g. San Isidro, San Juan, San Pedro, etc.).

Let’s skip this part in the article, but for those who would like to know how I did it, visit my GitHub repository. The code will apply to any PSA data at the municipality/city level.
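
The plotting code later in this article uses a `merged_df` that joins the cleaned table to the shapefile. The merge itself lives in the repository, so what follows is only a minimal sketch of the idea, using plain pandas frames as stand-ins. The column names `NAME_1` and `NAME_2` are assumed to be the GADM attribute names, `Province` is a hypothetical extra column, and the density values are made up.

```python
import pandas as pd

# Stand-in for the attribute table of the level-2 shapefile. NAME_1 (province)
# and NAME_2 (city/municipality) are assumed to be the GADM column names.
shapes = pd.DataFrame({
    "NAME_1": ["Abra", "Bohol", "Metropolitan Manila"],
    "NAME_2": ["San Isidro", "San Isidro", "Makati"],
})

# Stand-in for the cleaned PSA table. "Province" is a hypothetical extra column
# used to tell same-named cities apart; the density values are made up.
psa = pd.DataFrame({
    "Province": ["Abra", "Metropolitan Manila"],
    "City": ["San Isidro", "Makati"],
    "Density_sqkms": [100.0, 200.0],
})

# A left merge keeps every shape; indicator=True adds a "_merge" column that
# records whether each shape found a matching PSA row.
merged = shapes.merge(
    psa,
    how="left",
    left_on=["NAME_1", "NAME_2"],
    right_on=["Province", "City"],
    indicator=True,
)
unmatched = merged.loc[merged["_merge"] == "left_only", ["NAME_1", "NAME_2"]]
print(unmatched)
```

Merging on province and city together is one way to keep the many “San Isidro”s apart, and the `unmatched` rows show which names still need cleaning. With the real data, the same call on the GeoDataFrame (for example `cities.merge(df, ...)`) should keep the geometry column, so the result can be plotted directly.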

Exploratory Data Analysis

One of my favorite ways to do EDA is through a scatter plot. Let’s make one just to see, in chart form, which cities have high densities.

Matplotlib is workable but I like the style of seaborn plots so I prefer to use it more often.

fig, ax = plt.subplots(figsize=(10, 15))
scatter = sns.scatterplot(x=df.index, y=df.Density_sqkms, ax=ax)

# Label only the top 20 points (limiting so the chart won't get too cluttered).
# Sort by density from highest to lowest; the original index is kept so that
# each label lines up with its point on the x-axis.
sorted_df = df.sort_values("Density_sqkms", ascending=False)[:20]

# Text annotations overlap for something like this, so let's import a library that adjusts them automatically
from adjustText import adjust_text
texts = [ax.text(p[0], p[1], "{},{}".format(sorted_df.City.loc[p[0]], round(p[1])),
                 size='large') for p in zip(sorted_df.index, sorted_df.Density_sqkms)]
adjust_text(texts, arrowprops=dict(arrowstyle="->", color='r', lw=1), precision=0.01)

[Figure: Scatter plot of densities for Philippine cities and municipalities. You can see visually that some cities are outliers in terms of density. Note that I used the adjustText library to make sure the labels are legible.]

With this chart, we can already see which cities are above the average of the “National Capital Region”, namely Mandaluyong, Pasay, Caloocan, Navotas, Makati, Malabon, and Marikina.

Within the top 20 as well, we can see that most of these cities are located in the “National Capital Region” and nearby provinces such as Laguna. Notice as well how the city of Manila is an outlier for this dataset.

GeoPandas Visualization

The First Law of Geography, according to Waldo Tobler, is “everything is related to everything else, but near things are more related than distant things.”

This is why, in real estate, it is important to examine and visualize how proximity affects values. GeoVisualization is one of the ways we can do this.

We can already visualize our data using the built-in plot method of GeoPandas.

k = 1600  # I find that the more colors, the smoother the viz becomes as data points are spread across gradients
cmap = 'Blues'
figsize = (12, 12)
scheme = 'Quantiles'
ax = merged_df.plot(column='Density_sqkms', cmap=cmap, figsize=figsize,
                    scheme=scheme, k=k, legend=False)
fig = ax.get_figure()  # the figure that plot() drew on, needed for the colorbar below
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.spines['left'].set_visible(False)

# Adding a colorbar for legibility
# normalize color
vmin, vmax, vcenter = merged_df.Density_sqkms.min(), merged_df.Density_sqkms.max(), merged_df.Density_sqkms.mean()
divnorm = colors.TwoSlopeNorm(vmin=vmin, vcenter=vcenter, vmax=vmax)
# create a normalized colorbar
cbar = plt.cm.ScalarMappable(norm=divnorm, cmap=cmap)
fig.colorbar(cbar, ax=ax)
# plt.show()

[Figure: Visualization using the built-in plot method of GeoPandas.]
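
To see what the `Quantiles` scheme does with skewed data, here is a minimal sketch that reproduces the idea with plain NumPy instead of mapclassify. The values are made up, `k = 5` is chosen only to keep the output readable, and the exact bin edges that mapclassify computes may differ slightly from `np.quantile`.

```python
import numpy as np

# Made-up densities with one extreme outlier, standing in for Density_sqkms.
density = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90, 10000], dtype=float)
k = 5  # number of classes

# Interior quantile cut points (20%, 40%, 60%, 80% for k = 5).
edges = np.quantile(density, np.linspace(0, 1, k + 1)[1:-1])

# Assign each value to a class from 0 (lowest) to k - 1 (highest).
classes = np.digitize(density, edges)

print(edges)
print(np.bincount(classes))
```

Each class ends up with the same number of areas, so a single extreme value cannot push everything else into the lightest color, which is what tends to happen with equal-width bins.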

Some analysts prefer monotonic (sequential) colormaps such as Blues or Greens, but when the data is highly skewed (with many outliers), I find it better to use diverging colormaps.

[Figure: Diverging colormaps to visualize data dispersion.]

Using diverging colormaps, we can visualize the dispersion of density values. Even the colorbar legend alone shows that density values in the Philippines contain outliers on the high side.
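
As a rough illustration of how a diverging colormap treats skewed values, the sketch below centers the color scale on the mean with `TwoSlopeNorm`, the same normalization used for the colorbar earlier. The values are made up, and `"RdBu_r"` is just one of Matplotlib’s diverging colormaps, picked here as an assumption rather than the one used for the figure.

```python
import numpy as np
import matplotlib.colors as colors
import matplotlib.pyplot as plt

# Made-up, right-skewed values standing in for merged_df.Density_sqkms.
density = np.array([50.0, 120.0, 300.0, 450.0, 900.0, 40000.0])

# Center the color scale on the mean: vmin maps to 0.0, the mean to 0.5, vmax to 1.0.
norm = colors.TwoSlopeNorm(vmin=density.min(), vcenter=density.mean(), vmax=density.max())
scaled = np.asarray(norm(density))

# Look up one RGBA color per value in a diverging colormap.
rgba = plt.get_cmap("RdBu_r")(scaled)
print(np.round(scaled, 3))
```

Everything below the mean falls on one side of the colormap and the outlier sits alone on the other side, which is why the dispersion becomes visible. Passing `cmap="RdBu_r"` and `norm=norm` to `merged_df.plot(column="Density_sqkms", ...)` should give the corresponding map, assuming your GeoPandas version forwards `norm` to Matplotlib and no `scheme` is set.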

Plotting using Geoplot

In addition to the built-in plot function of GeoPandas, you can plot this using geoplot.

k = 1600  # defined as in the previous example, but not passed to geoplot below
cmap = 'Greens'
figsize = (12, 12)
scheme = 'Quantiles'
geoplot.choropleth(
    merged_df, hue=merged_df.Density_sqkms, scheme=scheme,
    cmap=cmap, figsize=figsize
)

In the next part of this series, let’s try to plot this more interactively or use some machine learning algorithms to extract more insights.

For the full code, check out my GitHub repository.

The code to preprocess data at the municipality and city level applies to other PSA-reported statistics as well.

Let me know what dataset you would like for us to try and visualize in the future.

Translated from: https://towardsdatascience.com/psvisualizing-the-philippines-population-density-using-geopandas-ab8190f52ed1
