Quick Guide to Labelling Data for Common Seaborn Plots

While exploring data, I often find myself looking at plots like the one below. They are great for observing trends, but they make it difficult to tell exactly where each data point lies and what its value is.

Figure: line plot showing the total number of passengers yearly. How many passengers were there in 1956?

This article is a quick guide to labelling the seaborn graphs commonly used in data exploration. All the code used can be found here.

Set-Up

Seaborn’s flights dataset will be used for the purposes of demonstration.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

# load dataset
flights = sns.load_dataset('flights')
flights.head()
Figure: first 5 rows of the data in flights.

To make some of the plots easier to create, a couple of additional data frames are set up first.

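# set up flights by year dataframe
# (select 'passengers' before summing: the categorical 'month' column cannot be summed in recent pandas versions)
year_flights = flights.groupby('year')['passengers'].sum().reset_index()
year_flights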
Figure: data frame showing each year and the total number of passengers for that year.
# set up average number of passengers by month dataframe
month_flights = flights.groupby('month').agg({'passengers': 'mean'}).reset_index()
month_flights
Figure: data frame showing each month of the year and the average number of passengers for that month.
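
To see what the two aggregations produce without downloading the dataset, here is a minimal sketch on a tiny made-up frame with the same three columns. The frame `toy` and its numbers are invented for illustration and are not taken from the flights data.

```python
import pandas as pd

# Made-up stand-in for the flights data: same columns, invented numbers.
toy = pd.DataFrame({
    "year": [1949, 1949, 1950, 1950],
    "month": ["Jan", "Feb", "Jan", "Feb"],
    "passengers": [100, 120, 110, 130],
})

# Total passengers per year (one row per year).
toy_year = toy.groupby("year")["passengers"].sum().reset_index()

# Average passengers per month (one row per month).
toy_month = toy.groupby("month").agg({"passengers": "mean"}).reset_index()

print(toy_year)
print(toy_month)
```

Each result has one row per group, which is the shape the plotting calls in the following sections expect.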

Line Plot

Plotting a graph of passengers per year:

# plot line graph
sns.set(rc={'figure.figsize':(10,5)})
ax = sns.lineplot(x='year', y='passengers', data=year_flights, marker='*', color='#965786')
ax.set(title='Total Number of Passengers Yearly')

# label points on the plot
for x, y in zip(year_flights['year'], year_flights['passengers']):
    # the position of the data label relative to the data point can be adjusted
    # by adding/subtracting a value from the x and/or y coordinates
    plt.text(x = x, # x-coordinate position of data label
             y = y-150, # y-coordinate position of data label, adjusted to be 150 below the data point
             s = '{:.0f}'.format(y), # data label, formatted to ignore decimals
             color = 'purple') # set colour of the data label text
Figure: line plot showing the total number of passengers yearly, with data labels.

Sometimes the data labels need to be more visible, which can be achieved by giving them a background colour:

# add set_backgroundcolor('color') after plt.text('…')
plt.text(x, y-150, '{:.0f}'.format(y), color='white').set_backgroundcolor('#965786')
Figure: line plot showing the total number of passengers yearly, with data labels that have a background colour.
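
If more control over the label background is wanted (padding, rounded corners, no border), matplotlib's text functions also accept a `bbox` dictionary. The sketch below uses plain matplotlib and made-up values; since seaborn returns a matplotlib Axes, the same `ax.text` call should work on a seaborn plot as well.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# Made-up values, for illustration only.
ax.plot([1949, 1950, 1951], [1500, 1700, 2000], marker="*", color="#965786")

# bbox draws a box behind the text; boxstyle controls its shape and padding.
label = ax.text(1950, 1700 - 150, "1700", color="white", ha="center",
                bbox=dict(facecolor="#965786", edgecolor="none",
                          boxstyle="round,pad=0.3"))
```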

Histogram

Plotting a histogram of the passenger counts (each row in the dataset is one month's count):

# plot histogram
# (sns.distplot is deprecated in recent seaborn releases; histplot is its replacement for histograms)
ax = sns.histplot(flights['passengers'], color='#9d94ba', bins=10)
ax.set(title='Distribution of Passengers')

# label each bar in histogram
for p in ax.patches:
    height = p.get_height() # get the height of each bar
    # adding text to each bar
    ax.text(x = p.get_x()+(p.get_width()/2), # x-coordinate position of data label, padded to be in the middle of the bar
            y = height+0.2, # y-coordinate position of data label, padded 0.2 above bar
            s = '{:.0f}'.format(height), # data label, formatted to ignore decimals
            ha = 'center') # sets horizontal alignment (ha) to center
Figure: histogram showing the distribution of passenger counts, with a data label on each bar.

Another piece of information that might be worth showing in the graph is the mean of the dataset, drawn as a vertical line:

# plot histogram
# …

# adding a vertical line for the mean number of passengers
plt.axvline(flights['passengers'].mean(), color='purple', label='mean')

# adding data label to mean line
plt.text(x = flights['passengers'].mean()+3, # x-coordinate position of data label, adjusted to be 3 right of the mean line
         y = max([h.get_height() for h in ax.patches]), # y-coordinate position of data label, to take max height
         s = 'mean: {:.0f}'.format(flights['passengers'].mean()), # data label
         color = 'purple') # colour of the data label text, matching the mean line

# label each bar in histogram
# …
Figure: histogram showing the distribution of passenger counts, with a vertical line indicating the mean.

Bar Plot

Vertical Bar Plot

Plotting the total number of passengers for each year:

# plot vertical barplot
sns.set(rc={'figure.figsize':(10,5)})
ax = sns.barplot(x='year', y='passengers', data=year_flights)
ax.set(title='Total Number of Passengers Yearly') # title barplot

# label each bar in barplot
for p in ax.patches:
    # get the height of each bar
    height = p.get_height()
    # adding text to each bar
    ax.text(x = p.get_x()+(p.get_width()/2), # x-coordinate position of data label, padded to be in the middle of the bar
            y = height+100, # y-coordinate position of data label, padded 100 above bar
            s = '{:.0f}'.format(height), # data label, formatted to ignore decimals
            ha = 'center') # sets horizontal alignment (ha) to center
Figure: bar plot with vertical bars showing the total number of passengers yearly.
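
For bar plots specifically, matplotlib 3.4 and newer also provide `Axes.bar_label`, which places a label on every bar of a container in one call. The sketch below is an alternative to the manual loop, not the method used above; it assumes the seaborn bars are exposed through `ax.containers`, and `toy_year` holds made-up values.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up yearly totals, for illustration only.
toy_year = pd.DataFrame({"year": [1949, 1950, 1951],
                         "passengers": [1500, 1700, 2000]})

fig, ax = plt.subplots()
sns.barplot(x="year", y="passengers", data=toy_year, ax=ax)

# Label every bar in every container; padding is the gap above the bar, in points.
for container in ax.containers:
    ax.bar_label(container, fmt="%.0f", padding=3)
```

The manual loop remains useful when each label needs its own position or text, while `bar_label` keeps the common case short.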

Horizontal Bar Plot

Plotting the average number of passengers for each month:

# plot horizontal barplot
sns.set(rc={'figure.figsize':(10,5)})
ax = sns.barplot(x='passengers', y='month', data=month_flights, orient='h')
ax.set(title='Average Number of Flight Passengers Monthly') # title barplot

# label each bar in barplot
for p in ax.patches:
    height = p.get_height() # height of each horizontal bar is the same
    width = p.get_width() # width (average number of passengers)
    # adding text to each bar
    ax.text(x = width+3, # x-coordinate position of data label, padded 3 to right of bar
            y = p.get_y()+(height/2), # y-coordinate position of data label, padded to be in the middle of the bar
            s = '{:.0f}'.format(width), # data label, formatted to ignore decimals
            va = 'center') # sets vertical alignment (va) to center
Figure: bar plot with horizontal bars showing the average number of passengers for each month.
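
Labels placed to the right of horizontal bars can run past the edge of the plot area when the longest bar already reaches the axis limit. One way to leave room, sketched here with made-up monthly averages in `toy_month`, is to extend the x-axis a little beyond the longest bar. The 10% margin is an arbitrary choice for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up monthly averages, for illustration only.
toy_month = pd.DataFrame({"month": ["Jan", "Feb", "Mar"],
                          "passengers": [105.0, 125.0, 118.0]})

fig, ax = plt.subplots()
sns.barplot(x="passengers", y="month", data=toy_month, orient="h", ax=ax)

for p in ax.patches:
    width = p.get_width()
    # The leading space in the text pads the label away from the end of the bar.
    ax.text(width, p.get_y() + p.get_height() / 2,
            " {:.0f}".format(width), va="center")

# Extend the x-axis 10% past the longest bar so the labels stay inside the plot area.
longest = max(p.get_width() for p in ax.patches)
ax.set_xlim(0, longest * 1.1)
```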

Notes on Usage

Adding data labels can be beneficial for some plots, especially bar plots. It is worth experimenting with different configurations (such as labelling only certain meaningful points instead of everything) and taking care not to overdo the labelling, especially when there are many points. A clean and informative graph is usually preferable to a cluttered one.

# only labelling some points on graph

# plot line graph
sns.set(rc={'figure.figsize':(10,5)})
ax = sns.lineplot(x='year', y='passengers', data=year_flights, marker='*', color='#965786')

# title the plot
ax.set(title='Total Number of Passengers Yearly')

mean = year_flights['passengers'].mean()

# label points on the plot only if they are higher than the mean
for x, y in zip(year_flights['year'], year_flights['passengers']):
    if y > mean:
        plt.text(x = x, # x-coordinate position of data label
                 y = y-150, # y-coordinate position of data label, adjusted to be 150 below the data point
                 s = '{:.0f}'.format(y), # data label, formatted to ignore decimals
                 color = 'purple') # set colour of the data label text
Figure: line plot showing the total number of passengers yearly, with data labels only on the points above the mean.

Translated from: https://medium.com/swlh/quick-guide-to-labelling-data-for-common-seaborn-plots-736e10bf14a9
