Visualize Open Data Using MongoDB in Real Time

Using Python to connect to the Taiwan government's PM2.5 open data API and schedule real-time data updates to MongoDB (Part 2)

Goal

This time I’m using the same PM2.5 open data API as in Part 1 to show how to refresh real-time data into MongoDB every 2 minutes, which matches how often the government portal refreshes its API. MongoDB's strength is that it is simple to use, especially with data in JSON document format, which makes connecting to open data much easier. We can also show real-time changes directly from the database using its Charts and Dashboard features.

How convenient!

The demo below uses Taipei City (the capital city of Taiwan) as an example:

Skills covered:

  • Connect to the API with the parameters needed to select all sensor data in Taipei City
  • Insert the first batch of data into MongoDB
  • Schedule the extraction of each new batch of PM2.5 data from the API into MongoDB
  • Create charts and add them to a dashboard

So, let’s get started.

Process

Import all required libraries:

# connect to mongoDB cloud cluster
import pymongo
from pymongo import MongoClient

# convert timezone
import pytz, dateutil.parser

# connect to government open data API
import requests

Connect to the API with the parameters needed to select all sensor data in Taipei City. The raw response covers 100 sensors in total.
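The query parameters described above can be assembled and percent-encoded in code rather than pasted as one long string. The following is a minimal sketch using only the standard library; the helper name `build_url` and its `city` parameter are hypothetical and not part of the original article. It reproduces the encoded URL used in this post.

```python
from urllib.parse import quote

# Base endpoint taken from the article's URL.
BASE_URL = "https://sta.ci.taiwan.gov.tw/STA_AirQuality_EPAIoT/v1.0/Datastreams"


def build_url(city="臺北市"):
    """Build the Datastreams query URL for PM2.5 sensors in one city.

    Hypothetical helper: the filter text is copied from the article,
    only the percent-encoding step is automated here.
    """
    filter_expr = (
        "name eq'PM2.5' and Observations/result gt 0 "
        f"and Thing/properties/city eq '{city}'"
    )
    # Keep "/" unescaped so property paths stay readable, as in the article's URL.
    encoded_filter = quote(filter_expr, safe="/")
    return (
        f"{BASE_URL}?$expand=Thing,Observations($top=1)"
        f"&$filter={encoded_filter}"
        "&$count=true"
    )


if __name__ == "__main__":
    print(build_url())
```

Building the filter from readable text makes it easier to swap in another city name later, assuming the API accepts the same filter shape for other cities.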

All data is stored in the “first_batch” variable:

# Parameters: the latest data, observation value > 0, PM2.5 data only, Taipei city
# https://sta.ci.taiwan.gov.tw/STA_AirQuality_EPAIoT/v1.0/Datastreams?$expand=Thing,Observations($top=1)&$filter=name eq'PM2.5' and Observations/result gt 0 and Thing/properties/city eq '臺北市'&$count=true

def API_data():
    API_URL = "https://sta.ci.taiwan.gov.tw/STA_AirQuality_EPAIoT/v1.0/Datastreams?$expand=Thing,Observations($top=1)&$filter=name%20eq%27PM2.5%27%20and%20Observations/result%20gt%200%20and%20Thing/properties/city%20eq%20%27%E8%87%BA%E5%8C%97%E5%B8%82%27&$count=true"
    total = requests.get(API_URL).json()
    data = total['value']
    first_batch = []
    for item in data:
        dic = {}
        # use the station ID as the MongoDB document ID
        dic['_id'] = item['Thing']['properties']['stationID']
        dic['name'] = item['name']
        dic['areaDescription'] = item['Thing']['properties']['areaDescription']
        dic['city'] = item['Thing']['properties']['city']
        dic['township'] = item['Thing']['properties']['township']
        dic['observedArea'] = item['observedArea']
        # keep the original timestamp and convert it from UTC+0 to Taipei time (UTC+8)
        dic['iso8601_UTC_0'] = item['Observations'][0]['phenomenonTime']
        UTC_0 = dateutil.parser.parse(dic['iso8601_UTC_0'])
        dic['UTC_0'] = str(UTC_0)
        UTC_8 = UTC_0.astimezone(pytz.timezone("Asia/Taipei"))
        dic['UTC_8'] = str(UTC_8)
        dic['result'] = item['Observations'][0]['result']
        dic['unitOfMeasurement'] = item['unitOfMeasurement']['symbol']
        first_batch.append(dic)
    return first_batch

first_batch = API_data()

The first item in the “first_batch” list is one sensor station's reading:

print(first_batch[0])
# output:
{'_id': '10189360662', 'name': 'PM2.5', 'areaDescription': '營建混合物土資場', 'city': '臺北市', 'township': '北投區', 'observedArea': {'type': 'Point', 'coordinates': [121.4871916, 25.121195]}, 'iso8601_UTC_0': '2020-08-20T05:22:58.000Z', 'UTC_0': '2020-08-20 05:22:58+00:00', 'UTC_8': '2020-08-20 13:22:58+08:00', 'result': 22.0, 'unitOfMeasurement': 'μg/m3'}
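To make the relationship between the three time fields in this output easier to follow, here is a small sketch that reproduces them with the standard library only. It is not the article's code, which uses `dateutil` and `pytz`. It assumes the timestamp always has the `...000Z` shape shown above, and it uses a fixed UTC+8 offset for Taipei rather than a named time zone. The helper name `to_taipei` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed +08:00 offset stands in for "Asia/Taipei".
TAIPEI = timezone(timedelta(hours=8))


def to_taipei(iso_utc):
    """Return (UTC_0, UTC_8) strings for a timestamp like '2020-08-20T05:22:58.000Z'."""
    # Parse the fractional seconds and the literal trailing "Z", then mark the value as UTC.
    utc = datetime.strptime(iso_utc, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)
    return str(utc), str(utc.astimezone(TAIPEI))


if __name__ == "__main__":
    utc_0, utc_8 = to_taipei("2020-08-20T05:22:58.000Z")
    print(utc_0)  # 2020-08-20 05:22:58+00:00
    print(utc_8)  # 2020-08-20 13:22:58+08:00
```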

Then connect to my MongoDB Atlas cluster and insert the first batch of data:

# connect to my mongoDB cloud cluster
cluster = MongoClient("mongodb+srv://<username>:<password>@cluster0.dd7sd.mongodb.net/<dbname>?retryWrites=true&w=majority")

# my database name
db = cluster["test"]

# my collection's name
collection = db["test2"]

results = collection.insert_many(first_batch)
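Because each document uses the station ID as its `_id`, and MongoDB requires `_id` to be unique within a collection, an insert fails with a duplicate key error if the batch repeats a station ID. A quick pre-check along the following lines may help before calling `insert_many`. This is only a sketch; the helper name `duplicate_ids` is hypothetical and it works on plain dictionaries, so it needs no database connection.

```python
from collections import Counter


def duplicate_ids(batch):
    """Return the sorted list of '_id' values that appear more than once in a batch."""
    counts = Counter(doc["_id"] for doc in batch)
    return sorted(_id for _id, n in counts.items() if n > 1)


if __name__ == "__main__":
    # Made-up sample documents for illustration only.
    sample = [
        {"_id": "A001", "result": 22.0},
        {"_id": "A002", "result": 18.5},
        {"_id": "A001", "result": 23.1},
    ]
    print(duplicate_ids(sample))
```

An empty list means the batch itself is safe to insert into an empty collection. Re-running the insert against a collection that already holds these stations would still raise a duplicate key error, which is why the later updates use an upsert instead.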

Next, set up a scheduler that pulls the latest PM2.5 readings from the API every 2 minutes, stops at a time of our choosing (here, after 5 minutes), and updates the data in MongoDB by “_id”, i.e. the “stationID” of each station:

import schedule
import time
import datetime
import sys

def update_content():
    # get a new batch
    new_batch = API_data()
    for item in new_batch:
        # only the timestamp fields and the reading change between batches
        update_data = {"iso8601_UTC_0": item['iso8601_UTC_0'], "UTC_0": item['UTC_0'], "UTC_8": item['UTC_8'], "result": item['result']}
        # match on the station ID; insert the document if it does not exist yet
        results = collection.update_one({"_id": item['_id']}, {"$set": update_data}, upsert=True)

def stop_update():
    sys.exit()

# refresh every 2 minutes and stop the whole program after 5 minutes
schedule.every(2).minutes.do(update_content)
schedule.every(5).minutes.do(stop_update)

while True:
    schedule.run_pending()
    time.sleep(1)
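The timing of the loop above (run a job every 2 minutes, stop after 5) can also be expressed without the `schedule` package and without calling `sys.exit()`. The sketch below is an alternative technique, not the article's method. The names `run_for`, `clock` and `sleep` are hypothetical; the last two are injectable so the timing logic can be checked without actually waiting.

```python
import time


def run_for(job, interval_s, duration_s, clock=time.monotonic, sleep=time.sleep):
    """Call job() every interval_s seconds until duration_s seconds have elapsed.

    Returns the number of times the job ran.
    """
    start = clock()
    next_run = start + interval_s
    runs = 0
    while True:
        now = clock()
        if now - start >= duration_s:
            return runs  # time budget used up: leave the loop normally
        if now >= next_run:
            job()
            runs += 1
            next_run += interval_s
        sleep(1)


if __name__ == "__main__":
    # Short demo with small numbers: runs at about 1 s and 2 s, then stops at 3 s.
    count = run_for(lambda: print("refresh"), interval_s=1, duration_s=3)
    print(count)
```

With the article's function, the equivalent call would be `run_for(update_content, 120, 300)`, which triggers updates at the 2-minute and 4-minute marks and returns at 5 minutes, so any code after the loop still gets a chance to run.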

In mongoDB it will look like this:

PM2.5 intensity score was 19.47.

After 2 min, it became 20.16.

Lastly, we create each chart on the dashboard as follows:

Add a new data source (my real-time data is saved in the collection “test2”).

Create a new dashboard.

Create a heat map.

Once we drag the chart onto the dashboard, we can turn on the dashboard's auto-refresh feature. While our application runs in the background and updates the data in MongoDB, the charts are updated accordingly.

We can also create a scatter plot with customized tooltips. It shows a construction site, which may result in a higher level of PM2.5.

Note that the date format of the time series line chart needs to be modified in the Customize tab.

We can also create a gauge chart (the maximum PM2.5 score is 100).

Conclusion

With the above 4 charts, our dashboard is ready.

We can further modify the color according to the intensity levels set by the government; in Taiwan, for example, 0–30 μg/m3 is low, 30–50 μg/m3 is medium, and so on. The clip below shows how the PM2.5 intensity changed slightly across different sensors in Taipei City on both maps within 5 minutes. This clip was recorded later than the previous demo, around 19:00–19:30 on the same day.

At the bottom-left corner of the scatter plot, a countdown shows how much time is left before the data from MongoDB is refreshed again. Alternatively, just watch the clip below for 10 seconds and you may spot the difference :D

Recorded at 19:00–19:30 on Aug 20, 2020
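The intensity bands mentioned above could also be attached to each reading as a label before charting, which makes color rules easier to define. The sketch below covers only the two bands the article names. The function name `pm25_band` is hypothetical, treating 30 as the start of "medium" is an assumption (the article lists 30 in both ranges), and readings of 50 and above get a placeholder label because the article does not give the higher thresholds.

```python
def pm25_band(value):
    """Map a PM2.5 reading in μg/m3 to the intensity bands named in the article."""
    if value < 0:
        raise ValueError("PM2.5 reading cannot be negative")
    if value < 30:
        return "low"
    if value < 50:  # assumption: 30 itself counts as medium
        return "medium"
    # Higher bands are not listed in the article, so no specific label is guessed here.
    return "above medium"


if __name__ == "__main__":
    for reading in (19.47, 20.16, 22.0):
        print(reading, pm25_band(reading))
```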

That’s it. Hope you find this helpful.

Have a wonderful day!

Translated from: https://medium.com/li-ting-liao-tiffany/visualize-open-data-using-mongodb-in-real-time-2cca4bcca26e
