Using GeoPandas for Spatial Visualization

Recently, I was working on a project where I was trying to build a model that could predict housing prices in King County, Washington — the area that surrounds Seattle. After looking at the features, I wanted a way to determine the houses’ worth based on location.

The dataset included latitude and longitude, and it was easy to google the coordinates to look at individual houses, their neighborhoods, their distance from the water, and so on. But with over 17,000 observations, that was a fool’s task. I had to find an easier way.

I had used Geographic Information Systems (GIS) only once before, and not in Python. So I did what I do best: I googled, and ran into an amazing package called GeoPandas. I am going to let the GeoPandas team sum up what they do, because they can say it much better than I can.

GeoPandas is an open source project to make working with geospatial data in python easier. GeoPandas extends the datatypes used by pandas to allow spatial operations on geometric types. Geometric operations are performed by shapely. GeoPandas further depends on fiona for file access and descartes and matplotlib for plotting. — Description from GeoPandas Website (2020)

This blew my mind, and what I wanted was really just the most basic of the features. I am going to show you how to run this code and do what I did — plotting accurate points on a map.

You are going to need several packages and some files in addition to the basic pandas and matplotlib. They include:

  • geopandas: the package that makes all of this possible

  • shapely: a package for manipulation and analysis of planar geometric objects

  • descartes: provides a nicer integration of Shapely geometry objects with Matplotlib. It’s not needed every time, but I import it just to be safe

  • Any .shp file: this is going to be the backdrop of the plot. Mine covers King County, but you should be able to find one from any city’s data department. Don’t delete any of the files in the .zip archive it comes in; the .shp file depends on them, and something always breaks.

More information about shapefiles can be found here, but the long and short of it is that these aren’t normal images. They are a vector data storage format that has information linking to locations — coordinates and the rest.
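
The warning about not deleting files from the .zip can be turned into a quick check. A shapefile is really a set of sibling files that share one base name: the format requires `.shp`, `.shx`, and `.dbf`, and a `.prj` file usually carries the coordinate system. The helper below is only a sketch; the function name `missing_shapefile_parts` is made up for this example and is not part of GeoPandas.

```python
from pathlib import Path

# Sidecar files the shapefile format requires next to the .shp file.
REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")
# Optional in the format, but it stores the coordinate system.
RECOMMENDED_EXTENSIONS = (".prj",)


def missing_shapefile_parts(shp_path):
    """Return (missing required, missing recommended) extensions for a shapefile.

    Hypothetical helper for illustration; it only checks that the files exist.
    """
    base = Path(shp_path)
    missing_required = [ext for ext in REQUIRED_EXTENSIONS
                        if not base.with_suffix(ext).exists()]
    missing_recommended = [ext for ext in RECOMMENDED_EXTENSIONS
                           if not base.with_suffix(ext).exists()]
    return missing_required, missing_recommended
```

Calling it on the path you are about to load and getting back two empty lists means the usual companions are all in place.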

First I imported the basic packages that I needed and then the new packages:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from shapely.geometry import Point, Polygon
import geopandas as gpd
import descartes  # optional; older GeoPandas releases used it to draw polygons

The Point and Polygon classes are what help me match my data to the map I make.
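
As a small illustration of what these two classes do, here is a sketch with made-up coordinates (not taken from the King County data): a rectangular "district" as a Polygon and two "houses" as Points. Shapely works in (x, y) order, so longitude goes first and latitude second.

```python
from shapely.geometry import Point, Polygon

# A hypothetical rectangular "district" given as (longitude, latitude) corners
district = Polygon([(-122.4, 47.5), (-122.2, 47.5), (-122.2, 47.7), (-122.4, 47.7)])

# Shapely uses (x, y), so longitude comes first and latitude second
house_inside = Point(-122.3, 47.6)
house_outside = Point(-121.9, 47.6)

print(district.contains(house_inside))   # True
print(district.contains(house_outside))  # False
```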

Next, I load in my data. This is basic pandas, but for those who are new: the string in quotation marks is the name of the file that holds the housing records.

df = pd.read_csv('kc_house_data_train.csv')

With all of the packages imported and the data ready to go, I wanted to take a look at the map I was going to be plotting on. I did this by finding a shape file published on the King County government website. They have done all the hard work of surveying and cataloging the land, and it would be rude not to use their freely offered services. Loading a shape file is easy and comparable to loading a csv file with pandas.

kings_county = gpd.read_file('*file_path_here*/School_Districts_in_King_County___schdst_area.shp')

You can open this up if you want to take a look at the data. The King County shape file was just a dataframe of locations matched with their school districts, geometry coordinates, and area. But the best part is when we plot it and yes, we have to plot it. This isn’t an image you can just call — it will have the coordinates built in so our data can be placed down like a point on a 5th grade (x,y) graph.
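
For a sense of what "opening it up" looks like, here is a sketch that builds a tiny stand-in GeoDataFrame instead of reading the real shape file. The two square "districts" and their names are invented for the example; the attributes used (`head`, `crs`, `geom_type`, `total_bounds`) apply to any GeoDataFrame, including one returned by `gpd.read_file`.

```python
import geopandas as gpd
from shapely.geometry import Polygon

# Two made-up square "districts" standing in for the real shape file
toy_districts = gpd.GeoDataFrame(
    {"NAME": ["District A", "District B"]},
    geometry=[
        Polygon([(-122.4, 47.5), (-122.3, 47.5), (-122.3, 47.6), (-122.4, 47.6)]),
        Polygon([(-122.3, 47.5), (-122.2, 47.5), (-122.2, 47.6), (-122.3, 47.6)]),
    ],
    crs="EPSG:4326",
)

print(toy_districts.head())        # attribute columns plus the geometry column
print(toy_districts.crs)           # coordinate reference system of the geometries
print(toy_districts.geom_type)     # geometry type of each row
print(toy_districts.total_bounds)  # [min x, min y, max x, max y]
```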

I plotted it with the code below (notice that I edited it the same way I would edit any graph):

fig, ax = plt.subplots(figsize = (15,15))
kings_county.plot(ax=ax)
ax.set_title('King County',fontdict = {'fontsize': 30})
ax.set_ylabel('Latitude',fontdict = {'fontsize': 20})
ax.set_xlabel('Longitude',fontdict = {'fontsize': 20})

My output looked like this:

[Figure: plot of the King County shape file. Graphic by Author]

Before we start adding our housing data we should look at utilizing the shape file to the fullest. Let’s take a look at the file.

   OID   D#  NAME           geometry
0    1    1  Seattle        MULTIPOLYGON (((-122.40324 47.66637...
1    2  210  Federal Way    POLYGON ((-122.29057 47.39374...
2    3  216  Enumclaw       POLYGON ((-121.84898 47.34708...
3    4  400  Mercer Island  POLYGON ((-122.24475 47.59601...
4    5  401  Highline       POLYGON ((-122.35853 47.51553...
(Truncated for clarity)

As you can see, the county is divided into school districts, each with a shape that serves as its boundary. We will now plot the shape file and annotate the districts using the data provided, like so:

left = ['Riverview', 'Snoqualmie Valley']
center = ['Skykomish', 'Kent', 'Auburn', 'Tahoma', 'Vashon Island', 'Northshore', 'Shoreline', 'Renton', 'Highline', 'Issaquah', 'Enumclaw', 'Seattle', 'Federal Way', 'Bellevue', 'Mercer Island', 'Lake Washington', 'Tukwila']
right = ['Fife']

# The loop below reads row['coords'], which this listing never defines.
# One way to create it: a point guaranteed to lie inside each district's polygon.
kings_county['coords'] = kings_county['geometry'].apply(lambda geom: geom.representative_point().coords[0])

kings_county.plot(figsize=(15, 15), cmap='gist_earth')
for idx, row in kings_county.iterrows():
    # The label text is passed positionally; newer Matplotlib no longer accepts s=
    if row['NAME'] in left:
        plt.annotate(row['NAME'], xy=row['coords'], ha='left', color='red')
    elif row['NAME'] in center:
        plt.annotate(row['NAME'], xy=row['coords'], ha='center', color='red')
    elif row['NAME'] in right:
        plt.annotate(row['NAME'], xy=row['coords'], ha='right', color='red')
plt.title('School Districts in King County, WA', fontdict={'fontsize': 20})
plt.ylabel('Latitude', fontdict={'fontsize': 20})
plt.xlabel('Longitude', fontdict={'fontsize': 20})

The lists — left, right, center — are from trial and error with the placement of the district names. Some overlapped or needed to be manipulated so that they did not stray too far from their actual district.

I’ve changed the color map to gist_earth for clarity. Next, I iterated through each row, took the entry in the NAME series, and placed it as a label at a point that was definitely inside the polygon. I aligned the names based on the lists I had made earlier. And this was our output:

[Figure: School Districts of King County. Graphic by Author]
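
Why a point that is "definitely in the polygon" matters for labels: the centroid of an irregular shape can fall outside the shape, while Shapely's `representative_point()` is guaranteed to lie within it. The sketch below uses a made-up U-shaped polygon, not a real district, to show the difference.

```python
from shapely.geometry import Polygon

# A made-up U-shaped "district": a 3x3 square with a notch cut into the top
u_shape = Polygon([(0, 0), (3, 0), (3, 3), (2, 3), (2, 1), (1, 1), (1, 3), (0, 3)])

centroid = u_shape.centroid
label_point = u_shape.representative_point()

print(u_shape.contains(centroid))     # False: the centroid falls in the notch
print(u_shape.contains(label_point))  # True: guaranteed to lie within the shape

# The (x, y) tuple that an annotation call can take as its xy argument
print(label_point.coords[0])
```

For compact, convex districts the two points are close together, so the difference only shows up on shapes with notches, holes, or several separate parts.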

Each of the regions signifies a school district in King County. This matches the data I found about the twenty school districts in the county. I never really thought about the size and shape of a county, so I googled it just to be sure.

[Figure: Washington State with King County highlighted. Source: Google Maps]

It seemed like the Google Maps image was the perfect hole for my puzzle piece. From here, it was just a matter of formatting my data to fit the shape file. I did that by initiating my coordinate system and creating applicable points using the latitude and longitude of my houses.

crs = 'EPSG:4326'  # coordinate system (WGS 84 longitude/latitude); the older {'init': 'epsg:4326'} form is deprecated
geometry = [Point(x, y) for x, y in zip(df.long, df.lat)]  # creating points, longitude first

If you look at an entry in geometry, all you get back is a shapely Point object. The points still need to be attached to our original dataframe. Below, I make a brand new GeoDataFrame from three pieces: the old dataframe, the coordinate system, and the points built from each house’s longitude and latitude.

geo_df = gpd.GeoDataFrame(df,                 # the dataframe
                          crs=crs,            # coordinate system
                          geometry=geometry)  # geometric points
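
An alternative sketch of the same step, using a few made-up rows instead of the real housing file. `gpd.points_from_xy` builds the points without an explicit list comprehension. The reprojection at the end is only an illustration of what to do if a map layer and the points ever disagree on coordinate system; in this article both layers appear to be in longitude/latitude, so no reprojection should be needed.

```python
import pandas as pd
import geopandas as gpd

# Made-up rows using the same column names as the housing data (long, lat, price)
houses = pd.DataFrame({
    "long": [-122.33, -122.20, -121.98],
    "lat": [47.61, 47.56, 47.53],
    "price": [820000, 540000, 410000],
})

# points_from_xy takes x (longitude) first, then y (latitude)
geo_houses = gpd.GeoDataFrame(
    houses,
    geometry=gpd.points_from_xy(houses["long"], houses["lat"]),
    crs="EPSG:4326",
)

# Illustration only: if the map layer were in another CRS (Web Mercator here),
# the points would have to be reprojected to match it before plotting.
houses_web_mercator = geo_houses.to_crs("EPSG:3857")

print(geo_houses.crs)
print(houses_web_mercator.crs)
```

With a real map layer, passing that layer's own `crs` attribute to `to_crs` avoids hard-coding an EPSG code.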

That was the last step before we can plot the houses. Now, we put it all together.

fig, ax = plt.subplots(figsize = (15,16))
kings_county.plot(ax=ax, alpha = 0.8, color = 'black')
geo_df.plot(ax = ax , markersize = 2, color = 'blue',marker ='o',label = 'House', aspect = 1)
plt.legend(prop = {'size':10} )
ax.set_title('Houses in King County, WA', fontdict = {'fontsize':20})
ax.set_ylabel('Latitude',fontdict = {'fontsize': 20})
ax.set_xlabel('Longitude',fontdict = {'fontsize': 20})

In the code above, the steps include:

  1. Creating the figure and axes to plot on.

  2. Plotting the King County shape file.

  3. Plotting the data I made that includes the geometry points. This includes making markers, choosing the aspect, and adding the label for the legend.

  4. Adding a legend, title, and axis labels.

These steps were done for each of the graphs.

Our output:

[Figure: the houses plotted over the King County map]

This is a great product, but our goal is to learn something from this visualization. While it gives some information, like the outliers in the far eastern part of the county, it doesn’t give much else. We have to play with the parameters. Let’s try splitting the data by price. These are the houses that are listed for less than $750,000.

fig, ax = plt.subplots(figsize = (15,25))
kings_county.plot(ax=ax, alpha = 0.8, color = 'black')
geo_df[geo_df['price'] < 750000].plot(ax = ax , markersize = 2,color = 'red',marker = 's',label = 'Price < 750k',aspect = 1.5)
plt.legend(prop = {'size':15} )
ax.set_title('Houses by Price in King County, WA', fontdict ={'fontsize': 20})
ax.set_ylabel('Latitude',fontdict = {'fontsize': 20})
ax.set_xlabel('Longitude',fontdict = {'fontsize': 20})

[Figure: Houses priced below $750,000. Graphic by Author]

Now we graph the houses priced at or above $750,000.

fig, ax = plt.subplots(figsize = (15,25))
kings_county.plot(ax=ax, alpha = 0.8, color = 'black')
geo_df[geo_df['price'] >= 750000].plot(ax = ax , markersize = 2,color = 'yellow',marker = 'v',label = 'Price >=750k', aspect = 1.5)
plt.legend(prop = {'size':15})
ax.set_title('Houses by Price in King County, WA', fontdict ={'fontsize': 20})
ax.set_ylabel('Latitude',fontdict = {'fontsize': 20})
ax.set_xlabel('Longitude',fontdict = {'fontsize': 20})

[Figure: Houses priced at or above $750,000. Graphic by Author]

There is a big difference in terms of both location and quantity. But that is not the end: we can also layer one on top of the other. We will plot the expensive houses on top of the cheap ones because they are scarcer.

fig, ax = plt.subplots(figsize = (15,25))
kings_county.plot(ax=ax, alpha = 0.8, color = 'black')
geo_df[geo_df['price'] < 750000].plot(ax = ax , markersize = 1,color = 'red',marker = 's',label = 'Price <750k = Red', aspect = 1.5)
geo_df[geo_df['price'] >= 750000].plot(ax = ax , markersize = 1,color = 'yellow',marker = 'v',label = 'Price>= 750k = Yellow',aspect = 1.5)
plt.legend(prop = {'size':12})
ax.set_title('Houses by Price in King County, WA', fontdict ={'fontsize': 20})
ax.set_ylabel('Latitude',fontdict = {'fontsize': 20})
ax.set_xlabel('Longitude',fontdict = {'fontsize': 20})

[Figure: Side by side comparison. Graphic by Author]

The picture painted by this map is interesting. There is a plethora of housing in King County that falls below the bar we’ve set. Most of the houses on the lower end of the price scale sit farther inland than the more expensive ones.

If you zoom in, the more expensive houses dot the waterside. They also are more centrally located around the Seattle city center. There are several physical outliers but the trend is clear.

Overall, the visualization has done its job. We have made several determinations from the houses on the map. Pricier houses are clustered around the downtown area and spread along Puget Sound. They are also a minority in the data, which could be telling for predicting housing prices. The houses on the cheaper side are much more numerous and more varied in location. This will be useful for further EDA.
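
The observation that pricier houses are a minority can also be checked with numbers rather than by eye. The sketch below uses a handful of made-up prices; with the real data, the same boolean mask that selects rows for plotting would be applied to the price column of the housing dataframe.

```python
import pandas as pd

# Made-up prices; the real values come from the price column of the housing data
prices = pd.DataFrame({"price": [410000, 540000, 690000, 749999, 750000, 1200000]})

threshold = 750_000
is_expensive = prices["price"] >= threshold  # same mask used to pick rows for plotting

counts = is_expensive.value_counts()   # number of rows on each side of the threshold
share_expensive = is_expensive.mean()  # fraction of rows at or above the threshold

print(counts)
print(f"{share_expensive:.1%} of houses are at or above ${threshold:,}")
```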

If you want to connect to talk more about this technique, you can find me on LinkedIn. If you would like to check out the code, take a look at my Github.

Sources

  • King County Dataset

  • King County Shape File

  • Geopandas

  • Shapely

  • Descartes

  • Fiona

Translated from: https://towardsdatascience.com/using-geopandas-for-spatial-visualization-21e78984dc37
