Regression Plots with Pandas and Numpy

Data Visualization

I like the plotting facilities that come with Pandas. Yes, there are many other plotting libraries such as Seaborn, Bokeh and Plotly but for most purposes, I am very happy with the simplicity of Pandas plotting.

But there is one thing I miss: the ability to plot a regression line over a complex line or scatter plot.

But, as I have discovered, this is very easily solved. With the Numpy library you can generate regression data in a couple of lines of code and plot it in the same figure as your original line or scatter plot.

So that is what we are going to do in this article.

First, let’s get some data. If you’ve read any of my previous articles on data visualization, you know what’s coming next. I’m going to use a set of weather data that you can download from my GitHub account. It records the temperatures, sunshine levels and rainfall over several decades for London in the UK and is stored as a CSV file. This file has been created from public domain data recorded by the UK Met Office.

Are London summers getting hotter?

We are going to check whether the temperatures in London are rising over time. It’s not obvious from the raw data but by plotting a regression line over that data we will be better able to see the trend.

So, to begin, we import the libraries that we will need.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Nothing very unusual there: we import Pandas to help with data analysis and visualization, Numpy to give us the routines we need to create the regression data, and Matplotlib, which Pandas uses to create the plots.

Next, we download the data.

weather = pd.read_csv('https://raw.githubusercontent.com/alanjones2/dataviz/master/londonweather.csv')

We have read the CSV file into a Pandas DataFrame, and this is what it looks like: a table of monthly data recording the maximum and minimum temperatures, the rainfall and the number of hours of sunshine, starting in 1957 and ending partway through 2019.

[Image: the weather DataFrame]

I posed the question of whether summers are getting hotter, so I’m going to filter the data to keep only July, the month when the hottest temperatures are normally recorded. And, for convenience, I’m going to add a column that numbers the years starting at year 0 (you’ll see how this is used later).

july = weather.query('Month == 7').copy()   # copy, so insert() changes an independent frame
july.insert(0, 'Yr', range(0, len(july)))   # year counter: 0, 1, 2, ...

The code above applies a query to the weather dataframe that returns only the rows where Month is equal to 7 (i.e. July) and creates a new dataframe called july from the result.
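
If you prefer boolean indexing to query strings, the same filter can be written as a mask. The sketch below is a minimal illustration on a small, made-up DataFrame (the values are invented, and only Year, Month and Tmax columns are assumed); it checks that both forms select the same rows. The .copy() calls are this sketch's addition, so that later changes such as insert apply to an independent frame.

```python
import pandas as pd

# Small made-up frame standing in for the weather data (values are illustrative only)
weather = pd.DataFrame({
    "Year":  [2000, 2000, 2001, 2001, 2002, 2002],
    "Month": [6, 7, 6, 7, 6, 7],
    "Tmax":  [20.1, 23.4, 19.8, 24.0, 21.0, 22.7],
})

# query() with a string expression...
july_q = weather.query("Month == 7").copy()
# ...selects the same rows as an explicit boolean mask
july_m = weather[weather["Month"] == 7].copy()

pd.testing.assert_frame_equal(july_q, july_m)
print(july_q)
```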

Next, we insert a new column called Yr at position 0 (the first column), which numbers the rows from 0 up to one less than the number of rows in the table.
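
A small sketch of what insert does here, again on made-up rows. Note that Yr is a positional counter: it equals the number of years since the first row only if there is exactly one July row per year with no gaps, which this sketch assumes.

```python
import pandas as pd

# Made-up July rows; after filtering, the index is no longer 0..n-1
july = pd.DataFrame(
    {"Year": [2000, 2001, 2002], "Tmax": [23.4, 24.0, 22.7]},
    index=[6, 18, 30],
)

# insert(loc, column, value) puts the new column at position loc (0 = first)
july.insert(0, "Yr", range(0, len(july)))

print(july.columns.tolist())  # ['Yr', 'Year', 'Tmax']
print(july["Yr"].tolist())    # [0, 1, 2]

# With one row per year and no gaps, Yr matches the offset from the first year
assert (july["Yr"] == july["Year"] - july["Year"].iloc[0]).all()
```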

july looks like this:

[Image: the july DataFrame]

Now we can plot the maximum temperatures for July since 1957.

july.plot(y='Tmax', x='Yr')

[Image: line plot of July Tmax against Yr]

There is a lot of variation, and high temperatures are not limited to recent years. But there does seem to be a trend: temperatures seem to be rising a little over time.

We can make this more obvious with a linear regression, that is, by finding the straight line that best represents the trend in the temperature. To do this we use the polyfit function from Numpy, which performs a least-squares polynomial fit on the data it is given. We want a linear regression of Tmax on Yr, so we pass those two columns as the x and y parameters. The final parameter is the degree of the polynomial; for linear regression the degree is 1. polyfit returns the polynomial’s coefficients with the highest power first, so for a straight line we get the slope followed by the intercept.
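
To make "least squares" concrete: for a straight line the fit has a closed form, slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept = ȳ − slope · x̄. The sketch below compares that formula with np.polyfit(x, y, 1) on made-up noisy data (a stand-in for Yr and Tmax, not the London figures; the seed and trend are arbitrary). It also shows the order of the returned coefficients: [slope, intercept] for degree 1.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.arange(30, dtype=float)                   # stand-in for the Yr column
y = 21.0 + 0.05 * x + rng.normal(0.0, 1.5, 30)   # made-up "Tmax" values with noise

# Closed-form least-squares straight line
x_mean, y_mean = x.mean(), y.mean()
slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
intercept = y_mean - slope * x_mean

# Degree-1 polyfit returns the coefficients highest power first: [slope, intercept]
d = np.polyfit(x, y, 1)
print(d)
print(slope, intercept)
```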

We then use the convenience class poly1d to turn those coefficients into a function that evaluates the fitted line for any value of Yr.
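
A minimal sketch of poly1d on its own, using hand-picked coefficients [2.0, 1.0] in the [slope, intercept] order that np.polyfit returns for a degree-1 fit. The resulting object can be called on a single number or on a whole array, which is what lets f(july['Yr']) produce a value for every row. np.polyval is shown as an equivalent way to evaluate the same coefficients.

```python
import numpy as np

# Coefficients in the order np.polyfit returns them for degree 1: [slope, intercept]
d = np.array([2.0, 1.0])

f = np.poly1d(d)                  # callable polynomial: f(x) = 2x + 1
print(f(3))                       # 7.0
print(f(np.array([0, 1, 2])))     # [1. 3. 5.]

# np.polyval evaluates the same coefficients without building a poly1d object
print(np.polyval(d, [0, 1, 2]))   # [1. 3. 5.]
```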

d = np.polyfit(july['Yr'], july['Tmax'], 1)   # coefficients: [slope, intercept]
f = np.poly1d(d)                              # callable: f(x) = slope * x + intercept

We now use the function f to produce our linear regression data and insert it into a new column called Treg.

july.insert(6, 'Treg', f(july['Yr']))
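
The sketch below repeats the fit on a few made-up July values (not Met Office data) to show that Treg is simply slope × Yr + intercept. It also illustrates one likely benefit of the 0-based Yr column: fitting against the calendar Year gives the same slope, but the intercept then refers to year 0, far outside the data, whereas with Yr the intercept is the fitted value for the first year. The column is added with plain assignment here because the made-up frame has fewer columns than the real one, so position 6 would not apply.

```python
import numpy as np
import pandas as pd

# Made-up July rows, one per year (illustrative values, not Met Office data)
july = pd.DataFrame({
    "Year": [2000, 2001, 2002, 2003, 2004],
    "Tmax": [22.0, 23.5, 22.8, 24.1, 23.9],
})
july.insert(0, "Yr", range(0, len(july)))

d = np.polyfit(july["Yr"], july["Tmax"], 1)  # [slope, intercept]
f = np.poly1d(d)
july["Treg"] = f(july["Yr"])                  # appended as the last column

# Treg is slope * Yr + intercept for every row
assert np.allclose(july["Treg"], d[0] * july["Yr"] + d[1])

# Against calendar years the slope is unchanged, but the intercept would
# describe year 0; with Yr the intercept is the fitted value for the first year
d_year = np.polyfit(july["Year"], july["Tmax"], 1)
print("slope per year:", d[0], d_year[0])
print("fitted value for the first year:", july["Treg"].iloc[0], d[1])
```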

Next, we create a line plot of Tmax against Yr (the wiggly plot we saw above) and another of Treg against Yr, which will be our straight-line regression plot. We combine the two plots by assigning the Axes returned by the first plot to the variable ax and passing it to the second plot through the ax parameter, so that both lines are drawn on the same axes.

ax = july.plot(x='Yr', y='Tmax')
july.plot(x='Yr', y='Treg', color='Red', ax=ax)   # draw on the same axes

[Image: July Tmax against Yr with the regression line (red) overlaid]
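
A self-contained sketch of the same overlay on made-up data, useful for checking the ax= mechanism outside a notebook. It assumes a non-interactive Matplotlib backend (Agg) so it runs without a display; in a notebook you would omit the matplotlib.use call. Both lines end up on the single Axes object returned by the first plot call.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up July maxima (illustrative values only)
july = pd.DataFrame({"Yr": range(6), "Tmax": [22.0, 23.5, 22.8, 24.1, 23.9, 24.6]})
f = np.poly1d(np.polyfit(july["Yr"], july["Tmax"], 1))
july["Treg"] = f(july["Yr"])

# The first call creates a figure and returns its Axes...
ax = july.plot(x="Yr", y="Tmax")
# ...and ax=ax tells the second call to draw on that same Axes
july.plot(x="Yr", y="Treg", color="red", ax=ax)

print(len(ax.get_lines()))  # 2: the data line and the regression line
plt.close(ax.get_figure())
```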

That’s it, done!

We can now see much more clearly the upward trend of temperature over the years.

And here is the same thing done with a scatter chart.

ax = july.plot.scatter(x='Yr', y='Tmax')
july.plot(x='Yr', y='Treg', color='Red', legend=False, ax=ax)

[Image: scatter plot of July Tmax against Yr with the regression line (red)]
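
A variant of the scatter version, sketched on made-up data: instead of storing the fitted values in a Treg column, it evaluates the poly1d function directly and draws the line with Matplotlib's ax.plot on the Axes returned by july.plot.scatter. This is a swapped-in alternative, not the article's method; it avoids adding a column when the fitted values are only needed for the chart. As above, the Agg backend is an assumption for running without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up July maxima (illustrative values only)
july = pd.DataFrame({"Yr": range(6), "Tmax": [22.0, 23.5, 22.8, 24.1, 23.9, 24.6]})
f = np.poly1d(np.polyfit(july["Yr"], july["Tmax"], 1))

ax = july.plot.scatter(x="Yr", y="Tmax")
# Evaluate the fitted line at each Yr and draw it on the same Axes as the scatter
ax.plot(july["Yr"], f(july["Yr"]), color="red")

plt.close(ax.get_figure())
```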

That was fairly straightforward, I think, and I hope you found it useful.

For an introduction to plotting with Pandas see this:

Source: https://towardsdatascience.com/regression-plots-with-pandas-and-numpy-faf2edbfad4f
