[Plotly + Datashader] Visualizing Large Geospatial Datasets

Introduction (what we’ll create):

Unlike the previous tutorials in this map-based visualization series, we will be dealing with a very large dataset in this tutorial (about 2GB of lat, lon coordinates). We will learn how to use the Datashader library to convert this data into a pixel-density raster, which can be superimposed on a Mapbox base-map to create cool visualizations. The image below shows what you will create by the end of this tutorial.

Structure of the tutorial:

The tutorial is structured into the following sections:

  1. Pre-requisites
  2. About Datashader
  3. Getting started with the tutorial
  4. When to use this library

Pre-requisites:

This tutorial assumes that you are familiar with Python and that you have Python downloaded and installed on your machine. If you are not familiar with Python but have some programming experience in other languages, you may still be able to follow this tutorial, depending on your proficiency.

It is very strongly recommended that you go through the Plotly tutorial before going through this tutorial. In this tutorial, the installation of plotly and the concepts covered in the Plotly tutorial will not be repeated.

Also, you are strongly encouraged to go through the ‘About Mapbox’ section in the [Plotly + Mapbox] Interactive Choropleth visualization tutorial. We will not repeat that section here, but it is very much a part of this tutorial.

About Datashader:

Quoting the official Datashader website,

Datashader is a graphics pipeline system for creating meaningful representations of large datasets quickly and flexibly

In layman’s terms, datashader converts millions of lat-lon coordinates into a pixel-density map. Say you have a million lat-lon coordinates bounded by latitudes [x, y] and longitudes [a, b]. You create a 100x100-pixel image whose corners correspond to the extreme lat-lon pairs, giving you a total of 10,000 pixels. Each pixel corresponds to a physical tile of, say, 100 sq. km (the actual area will depend on the values of x, y, a, b). If tile1 contains 100 coordinates and tile2 contains 1,000, then tile2 has a coordinate density 10 times higher than tile1, and with a linear mapping the pixel corresponding to tile2 will be 10 times brighter than the pixel corresponding to tile1. In essence, a million lat-lon coordinates get reduced to 10,000 pixel-density values: the coordinates have been converted into a raster image. This is what makes datashader so powerful.
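
The binning idea described above can be sketched in a few lines of plain Python. This is only an illustration of the concept, not how datashader is implemented internally; the function name bin_points, the tiny 4 x 4 grid and the sample points are all made up for this example.

```python
def bin_points(points, lat_range, lon_range, height, width):
    """Count how many (lat, lon) points fall into each cell of a height x width grid."""
    lat_min, lat_max = lat_range
    lon_min, lon_max = lon_range
    grid = [[0] * width for _ in range(height)]
    for lat, lon in points:
        # Skip points outside the bounding box
        if not (lat_min <= lat <= lat_max and lon_min <= lon <= lon_max):
            continue
        # Scale each coordinate to a cell index; clamp the upper edge into the last cell
        row = min(int((lat - lat_min) / (lat_max - lat_min) * height), height - 1)
        col = min(int((lon - lon_min) / (lon_max - lon_min) * width), width - 1)
        grid[row][col] += 1
    return grid


# Hypothetical sample: three points share one tile, one point sits alone in another
points = [(10.1, 70.2), (10.4, 70.9), (10.8, 70.5), (30.0, 90.0)]
grid = bin_points(points, lat_range=(6, 38), lon_range=(68, 98), height=4, width=4)
for row in grid:
    print(row)
```

Each cell count plays the role of a pixel's density; a shading step then turns those counts into colors.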

Installing datashader:

If you are using Anaconda,

conda install datashader

Otherwise, you can use the pip installer:

pip install datashader

See the Getting Started guide on the datashader website for more information.

Getting started with the tutorial:

GitHub repo: https://github.com/carnot-technologies/MapVisualizations

Relevant notebook: DatashaderDemo.ipynb

View notebook on NBViewer: Click Here

Import relevant packages:

import dask.dataframe as dd
import datashader as ds
import plotly.express as px

Note the import of dask.dataframe instead of pandas. Because we are dealing with a large dataset, dask will be much faster than pandas. For perspective, the .read_csv() call takes 19 seconds with pandas and less than a second with dask. Part of that gap comes from dask being lazy: read_csv only sets up the computation, and the file is actually read when a result is needed. The gist of why dask is preferred for large datasets is that it works on the data in partitions and can use all the cores on your machine, which pandas is unable to do.

Import and clean data:

Since the CSV for this tutorial is about 2 GB in size (74 million+ coordinates), it was not possible to host it on GitHub. It can be downloaded from this Google Drive link. It is recommended that you download this file and save it in your data folder. Once that is done, you can simply import it like any other CSV.

Note: Make sure that you don’t have any other heavy software open when you are loading this dataset, especially if your RAM is comparable to the file size.

df = dd.read_csv('data/lat_lon_data.csv')

Now, we will perform some basic cleaning of the data. Since our region of interest is India, we will make sure that all coordinates outside the lat-lon bounds of India are excluded.

# Remove any unwanted columns
df = df[['latitude', 'longitude']]
# Clean data, remove any out of bounds points
df = df[df['latitude'] > 6]
df = df[df['latitude'] < 38]
df = df[df['longitude'] > 68]
df = df[df['longitude'] < 98]

Creating the datashader canvas:

cvs = ds.Canvas(plot_width=1000, plot_height=1000)
agg = cvs.points(df, x='longitude', y='latitude')
# agg is an xarray object, see http://xarray.pydata.org/en/stable/
coords_lat, coords_lon = agg.coords['latitude'].values, agg.coords['longitude'].values
# Corners of the image, which need to be passed to mapbox
coordinates = [[coords_lon[0], coords_lat[0]],
               [coords_lon[-1], coords_lat[0]],
               [coords_lon[-1], coords_lat[-1]],
               [coords_lon[0], coords_lat[-1]]]

We have created a 1000 x 1000 canvas, cvs. Next, we projected the longitude and latitude from the dataframe onto the canvas using cvs.points. Then we fetched the projected coordinates and determined the corner points of the image.
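
To see where the corner coordinates come from, here is a minimal sketch of how an axis range can be split into equal-width bins, with each bin labelled by its center. It assumes that the coordinates stored in agg are bin centers, which is our understanding of datashader's behaviour rather than something verified in this tutorial. bin_centers is a hypothetical helper, not a datashader function, and the 3 x 4 canvas is deliberately tiny.

```python
def bin_centers(lo, hi, n):
    """Return the centers of n equal-width bins spanning [lo, hi]."""
    step = (hi - lo) / n
    return [lo + (i + 0.5) * step for i in range(n)]


# Hypothetical axis ranges matching the bounds used when cleaning the data
coords_lon = bin_centers(68, 98, 3)   # [73.0, 83.0, 93.0]
coords_lat = bin_centers(6, 38, 4)    # [10.0, 18.0, 26.0, 34.0]

# Same corner construction as in the tutorial code
coordinates = [[coords_lon[0], coords_lat[0]],
               [coords_lon[-1], coords_lat[0]],
               [coords_lon[-1], coords_lat[-1]],
               [coords_lon[0], coords_lat[-1]]]
print(coordinates)
```

Under this assumption the corners sit half a pixel inside the data's bounding box, which is negligible on a 1000 x 1000 canvas.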

Now that we have the canvas ready, let us define the colormap for the visualization. We will use the hot colormap. You can use other alternatives, like fire, or any other color map of your choice.

from matplotlib.cm import hot
import datashader.transfer_functions as tf
# PIL stands for Python Imaging Library
img = tf.shade(agg, cmap=hot, how='log')[::-1].to_pil()

A couple of things to note here. We are using a transfer function to shade the projected coordinates with the hot colormap. We have specified the mapping methodology as log. This ensures that even the low-intensity points are represented adequately in the visualization. Had we chosen the linear mapping, the high-intensity points would have completely overshadowed the low-intensity points.
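
The effect of the two mappings can be illustrated with a handful of made-up pixel counts. The sketch below is a simplified stand-in for what a shading step does (datashader's own log mapping may differ in its details), and both helper functions are hypothetical names introduced for this example.

```python
import math


def normalize_linear(counts):
    """Scale counts to [0, 1] in proportion to their value."""
    peak = max(counts)
    return [c / peak for c in counts]


def normalize_log(counts):
    """Scale counts to [0, 1] after a log1p transform, which compresses large values."""
    peak = math.log1p(max(counts))
    return [math.log1p(c) / peak for c in counts]


# Hypothetical per-pixel counts: several sparse pixels and one extreme hotspot
counts = [1, 10, 100, 100000]
for c, lin, log in zip(counts, normalize_linear(counts), normalize_log(counts)):
    print(f"count={c:>6}  linear={lin:.5f}  log={log:.3f}")
```

With the linear scaling, every pixel except the hotspot ends up very close to zero brightness, while the log scaling spreads the sparse pixels across a visible range.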

Another mapping option is eq_hist, which produces a result similar to the log transformation. You can read more about it here. A comparison of the outputs of the 3 transformations is shown below.
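
The idea behind histogram equalization can be sketched with a simplified, rank-based version: each pixel's brightness is set by the fraction of pixels whose count is less than or equal to its own. This is an illustration under that simplification, not datashader's actual eq_hist implementation, and normalize_eq_hist is a hypothetical name.

```python
from bisect import bisect_right


def normalize_eq_hist(counts):
    """Map each count to the fraction of pixels with a count less than or equal to it."""
    ordered = sorted(counts)
    n = len(ordered)
    return [bisect_right(ordered, c) / n for c in counts]


# Hypothetical per-pixel counts: several sparse pixels and one extreme hotspot
counts = [1, 10, 100, 100000]
print(normalize_eq_hist(counts))  # [0.25, 0.5, 0.75, 1.0]
```

Because only the ordering of the counts matters, the brightness levels are spread evenly no matter how extreme the hotspot is.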

[Figure: Different transformations (linear, log, eq_hist)]

As you can see, almost nothing is visible with the linear transformation. This is because a couple of pixels with extremely high intensity have overshadowed all the others. You will need to zoom in to identify those hotspots.

Similar to the transformation, different color map options are also available. To get the list of all color maps, click here. Below, the examples with a few different color maps are shown.

[Figure: Different color maps]

Creating the visualization:

fig = px.scatter_mapbox(df.tail(1),
                        lat='latitude',
                        lon='longitude',
                        zoom=4, width=1000, height=1000)
# Add the datashader image as a mapbox layer image
fig.update_layout(mapbox_style="carto-darkmatter",
                  mapbox_layers=[
                      {
                          "sourcetype": "image",
                          "source": img,
                          "coordinates": coordinates
                      }]
                  )
fig.show()

Here, we are plotting just one point from the dataframe (the last one), so that plotly can create the scatter visualization. We are using the carto-darkmatter style from Mapbox and overlaying the image output of datashader as a layer on top of the visualization. Congratulations!! Your visualization is ready!

When to use this library:

The answer is perhaps the simplest for this library. Use this when you have a very large data set. If you find this visualization aesthetically appealing as I do, then you can use this for smaller datasets as well, but the results will depend on the density distribution of your data. You won’t get high interactivity, because datashader essentially overlays an image on the Mapbox base-map. But you can still zoom and pan the visualization.

We are trying to fix some broken benches in the Indian agriculture ecosystem through technology, to improve farmers’ income. If you share the same passion, join us in the pursuit, or simply drop us a line at report@carnot.co.in

Translated from: https://medium.com/tech-carnot/plotly-datashader-visualizing-large-geospatial-datasets-bea27b9d7824
