Seaborn: Python

Seaborn is a Python data visualization library built on top of matplotlib and closely integrated with pandas data structures. Visualization is central to Seaborn and helps in exploring and understanding data.

To learn Seaborn, one should already be familiar with NumPy, Matplotlib, and pandas.

Seaborn offers the following functionalities:

  1. A dataset-oriented API for examining the relationships between variables.
  2. Automatic estimation and plotting of linear regression fits.
  3. High-level abstractions for multi-plot grids.
  4. Visualization of univariate and bivariate distributions.

These are only some of the functionalities offered by Seaborn; there are many more to explore.

To import the Seaborn library, the command used is:

import seaborn as sns

Using Seaborn we can draw a wide variety of plots, such as:

  1. Distribution Plots
  2. Pie Chart & Bar Chart
  3. Scatter Plots
  4. Pair Plots
  5. Heat maps

Throughout this article, we use the Google Play Store dataset downloaded from Kaggle.

1. Distribution Plots

We can compare the distribution plot in Seaborn to a histogram in Matplotlib; the two offer very similar functionality. Instead of the frequency counts of a histogram, the y-axis here shows an approximate probability density.

We will be using sns.distplot() in the code to plot distribution graphs. (In seaborn 0.11 and later, distplot is deprecated in favour of sns.histplot() and sns.displot().)

Before going further, let's first load our dataset.

Accessing Dataset from our system
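
The loading snippet itself did not survive in this copy of the article. Below is a minimal sketch of what such a step could look like, assuming the Kaggle download is a CSV file. The file name `googleplaystore.csv` is hypothetical, and the three inline rows are made-up stand-ins so the example runs without the download. The name `inp1` matches the variable used in the later snippets.

```python
import io

import pandas as pd

# Made-up rows standing in for the Kaggle CSV (illustration only).
sample_csv = io.StringIO(
    "App,Rating,Reviews,Size,Price,Content Rating\n"
    "App A,4.1,159,19000,0,Everyone\n"
    "App B,3.9,967,14000,0,Teen\n"
    "App C,4.7,87510,8700,4.99,Everyone\n"
)

# With the real download this would be something like
# pd.read_csv("googleplaystore.csv")  (hypothetical file name).
inp1 = pd.read_csv(sample_csv)

print(inp1.head())  # first rows of the table
print(inp1.shape)   # (number of rows, number of columns)
```

`head()` and `shape` give a quick check that the columns were parsed as expected before any plotting.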

The dataset looks like this:

Google Play Store Dataset from Kaggle

Now, let's see what the distribution plot looks like for the 'Rating' column of the above dataset.

Code for Rating column distribution plot

The distribution plot for the Rating column looks like this:

Distribution Plot: Rating

Here, the curve drawn over the histogram is the kernel density estimate (KDE), an approximate probability density curve.

As with histograms in matplotlib, we can change the number of bins in a distribution plot to make the graph easier to read.

We just have to pass the number of bins in the code:

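# Change the number of bins and switch off the KDE curve
# (inp1 is the DataFrame loaded earlier)
import matplotlib.pyplot as plt
import seaborn as sns
sns.distplot(inp1.Rating, bins=20, kde=False)
plt.show()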

Now, the graph looks like this:

Distribution Plot with specific number of bins

The above graph has no probability density curve. To remove the curve, we just have to pass 'kde = False' in the code.

As in matplotlib, we can also give the distribution plot a title and set the color of the bars. Let's see the code for that.

The distribution graph for the same Rating column looks like this:

Distribution plot with Title

Styling the Seaborn graphs

One of the biggest advantages of using Seaborn is the wide range of ready-made styling options it offers for our graphs.

These are the style sheets available through matplotlib (the output of plt.style.available), several of which are Seaborn styles. In matplotlib 3.6 and later, the seaborn-* names are listed with a seaborn-v0_8 prefix instead.

'Solarize_Light2',
'_classic_test_patch',
'bmh',
'classic',
'dark_background',
'fast',
'fivethirtyeight',
'ggplot',
'grayscale',
'seaborn',
'seaborn-bright',
'seaborn-colorblind',
'seaborn-dark',
'seaborn-dark-palette',
'seaborn-darkgrid',
'seaborn-deep',
'seaborn-muted',
'seaborn-notebook',
'seaborn-paper',
'seaborn-pastel',
'seaborn-poster',
'seaborn-talk',
'seaborn-ticks',
'seaborn-white',
'seaborn-whitegrid',
'tableau-colorblind10'

We just have to write one line of code to apply one of these styles to our graph.

After applying the dark background to our graph, the distribution plot looks like this:

Distribution plot with dark background

2. Pie Chart & Bar Chart

A pie chart is generally used to analyze how a numeric variable is split across different categories.

In the dataset we are using, we'll analyze how the top 4 categories in the Content Rating column compare.

First, we'll do some data cleaning on the Content Rating column and check which categories it contains.
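
The check can be sketched as follows, assuming the column is named `Content Rating`. The rows are made up, so the counts do not match the real dataset.

```python
import pandas as pd

# Made-up categories standing in for the real column (illustration only).
ratings = (
    ["Everyone"] * 6 + ["Teen"] * 3 + ["Mature 17+"] * 2
    + ["Everyone 10+"] * 2 + ["Adults only 18+", "Unrated"]
)
inp1 = pd.DataFrame({"Content Rating": ratings})

# Count how many apps fall into each category, most frequent first.
counts = inp1["Content Rating"].value_counts()
print(counts)
```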

Now, the list of categories is:

Content rating count

As the above output shows, the counts for "Adults only 18+" and "Unrated" are far lower than the others, so we'll drop those categories from Content Rating and update the dataset.
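
One possible way to drop the two rare categories is a boolean filter with `isin`. This is a sketch on made-up rows, not necessarily the approach used for the original figures.

```python
import pandas as pd

# Made-up categories standing in for the real column (illustration only).
ratings = (
    ["Everyone"] * 6 + ["Teen"] * 3 + ["Mature 17+"] * 2
    + ["Everyone 10+"] * 2 + ["Adults only 18+", "Unrated"]
)
inp1 = pd.DataFrame({"Content Rating": ratings})

# Keep every row whose category is not in the drop list.
rare = ["Adults only 18+", "Unrated"]
inp1 = inp1[~inp1["Content Rating"].isin(rare)].reset_index(drop=True)

print(inp1["Content Rating"].value_counts())
```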

The categories present in the "Content Rating" column after updating the dataset are:

Content Rating count after updating the dataset

Now, let's plot a pie chart for the categories present in the Content Rating column.
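
Seaborn has no pie chart function of its own, so the sketch below uses the pandas plotting wrapper around matplotlib. Whether the original code used this exact call is an assumption, and the rows are made up.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up categories standing in for the cleaned column (illustration only).
ratings = (
    ["Everyone"] * 6 + ["Teen"] * 3 + ["Mature 17+"] * 2 + ["Everyone 10+"] * 2
)
inp1 = pd.DataFrame({"Content Rating": ratings})

# One wedge per category, sized by its count and labelled with its share.
counts = inp1["Content Rating"].value_counts()
ax = counts.plot.pie(autopct="%1.0f%%")
ax.set_ylabel("")  # hide the default axis label

# plt.show()  # uncomment to display the figure
```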

The pie chart for the above code looks like this:

Pie Chart for Content Rating

From the above pie chart, we cannot reliably tell which of "Everyone 10+" and "Mature 17+" is larger. It is very difficult to assess the difference between two categories when their values are close to each other.

We can overcome this by plotting the same data as a bar chart.
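
A minimal sketch of the bar chart version, again through the pandas plotting wrapper and on made-up rows. The title and axis label are illustrative choices.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up categories standing in for the cleaned column (illustration only).
ratings = (
    ["Everyone"] * 6 + ["Teen"] * 3 + ["Mature 17+"] * 2 + ["Everyone 10+"] * 2
)
inp1 = pd.DataFrame({"Content Rating": ratings})

# One bar per category; bar heights are much easier to compare than wedges.
counts = inp1["Content Rating"].value_counts()
ax = counts.plot.bar()
ax.set_title("Apps per content rating")
ax.set_ylabel("Number of apps")

# plt.show()  # uncomment to display the figure
```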

Now, the bar chart looks like this:

Bar Chart for Content rating column

As with the pie chart, we can customize the bar chart too, with different bar colors, a chart title, and so on.

3. Scatter Plots

Up until now, we have been dealing with only a single column of the dataset at a time, like Rating, Reviews, or Size. But what if we have to infer a relationship between two numeric columns, say "Rating and Size" or "Rating and Reviews"?

A scatter plot is used when we want to plot the relationship between any two numeric columns of a dataset. Scatter plots are among the most powerful visualization tools used in the field of machine learning.

Let's see what the scatter plot looks like for two numeric columns of the dataset, "Rating" and "Size". First we'll plot the graph using matplotlib, and after that we'll see how it looks in seaborn.

Scatter Plot using matplotlib

# Import the necessary libraries
import matplotlib.pyplot as plt
# Plot the scatter plot (pstore is the Play Store DataFrame)
plt.scatter(pstore.Size, pstore.Rating)
plt.show()

Now, the plot looks like this:

Scatter Plot using Matplotlib

Scatter Plot using Seaborn

We will be using sns.jointplot() to draw the scatter plot together with histograms of the two variables, and sns.scatterplot() to draw only the scatter plot.

The scatter plot for the above code looks like this:

Scatter Plot using Seaborn

The main advantage of using jointplot in seaborn is that we get both the scatter plot and the histograms in one graph.

If we want to see only the scatter plot, we just replace "jointplot" with "scatterplot" in the code.

Regression Plot

A regression plot draws a regression line through the points of a jointplot (scatter plot) of two numeric variables and helps to visualize their linear relationship.

The graph looks like this:

Regression Plot using jointplot in Seaborn

From the above graph, we can infer that the Rating rises steadily as the Price of the apps increases.

4. Pair Plots

Pair plots are used when we want to see the pattern of relationships among three or more numeric variables. For example, if we want to see how a company's sales are affected by three different factors, pair plots will be very helpful.

Let's create a pair plot for the Reviews, Size, Price, and Rating columns of our dataset.

We will be using sns.pairplot() in the code to plot multiple scatter plots at once.

The output for the above code looks like this:

Pair Plot using Seaborn

  • For the off-diagonal views, the graph is a scatter plot between 2 numeric variables.
  • For the diagonal views, it plots a histogram, since both axes (x, y) represent the same variable.

5. Heatmaps

A heatmap represents data in a 2-dimensional form. Its goal is to summarize information in a colored graph, using colors and color intensities to visualize a range of values.

Most of us have seen the following type of graphic in a football match:

Heatmap of a player in football

Heatmaps in Seaborn apply the same idea to a table of values.

We'll be using sns.heatmap() to plot the visualization.

When we have data like the following, we can create a heatmap.

The above table was created using a pivot table from pandas. You can see how pivot tables are created in my previous article, Pandas.

Now, let's see how we can create a heatmap for the above table.
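
The table and the code are missing from this copy, so the sketch below is an assumption about what they might look like. The rows are made up, and grouping the mean Rating by `Content Rating` and `Type` is a hypothetical choice. Only the variable name `heat` comes from the article.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up rows; the "Type" column is a hypothetical grouping (illustration only).
pstore = pd.DataFrame({
    "Content Rating": ["Everyone", "Everyone", "Teen", "Teen", "Everyone", "Teen"],
    "Type": ["Free", "Paid", "Free", "Paid", "Free", "Paid"],
    "Rating": [4.0, 4.4, 3.8, 4.6, 4.2, 4.2],
})

# Pivot table: one row per content rating, one column per type,
# each cell holding the mean rating of that group.
heat = pd.pivot_table(
    pstore, values="Rating", index="Content Rating", columns="Type", aggfunc="mean"
)

# Default heatmap: each cell is colored according to its value.
ax = sns.heatmap(heat)

# plt.show()  # uncomment to display the figure
```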

In the above code, we have saved the data in a new variable, "heat".

The heatmap looks like this:

Default heatmap created on Seaborn

We can customize the above graph and change the color gradient so that the highest value is darker in color and the lowest value is lighter.

The updated code will be something like this:
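
A minimal sketch of such a customization. The values in `heat` are made up, and the "Greens" colormap is one possible choice that runs from light (low values) to dark (high values). The original code may have used a different colormap.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up pivot table standing in for the article's "heat" (illustration only).
heat = pd.DataFrame(
    {"Free": [4.1, 3.8], "Paid": [4.4, 4.4]},
    index=pd.Index(["Everyone", "Teen"], name="Content Rating"),
)

# cmap picks the color gradient; annot=True writes each value into its cell.
ax = sns.heatmap(heat, cmap="Greens", annot=True)

# plt.show()  # uncomment to display the figure
```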

The heatmap for the updated code looks like this:

Heatmap with some customizations to the code

Notice that the code passes "annot = True". When annot is True, each cell in the graph displays its value. If we don't pass annot, the cells are not annotated.

Seaborn also supports other types of graphs, such as line plots, bar graphs, and stacked bar charts. However, they don't offer anything different from the ones created through matplotlib.

Conclusion

So, this is how Seaborn works in Python, along with the different types of graphs we can create with it. As I have already mentioned, Seaborn is built on top of the matplotlib library. So, if we are already familiar with Matplotlib and its functions, we can easily build Seaborn graphs and explore more advanced concepts.

Thank you for reading and Happy Coding!!!

Check out my previous articles about Python here

  • Pandas: Python
  • Matplotlib: Python
  • NumPy: Python
  • Time Complexity and Its Importance in Python
  • Python Recursion or Recursive Function in Python
  • Python Programs to check for Armstrong Number (n digit) and Fenced Matrix
  • Python: Problems for Basics Reference - Swapping, Factorial, Reverse Digits, Pattern Print

Translated from: https://towardsdatascience.com/seaborn-python-8563c3d0ad41
