Seaborn is a Python data visualization library built on top of Matplotlib and closely integrated with pandas data structures. Visualization is at the core of Seaborn and helps us explore and understand data.
To learn Seaborn, you should be familiar with NumPy, Matplotlib, and Pandas.
Seaborn offers the following functionalities:
- A dataset-oriented API for examining relationships between variables.
- Automatic estimation and plotting of linear regression models.
- High-level abstractions for multi-plot grids.
- Visualization of univariate and bivariate distributions.
These are only some of the features Seaborn offers; there are many more, all described in the Seaborn documentation.
To import the Seaborn library, use:
import seaborn as sns
With Seaborn we can create a wide variety of plots, such as:
- Distribution Plots
- Pie Chart & Bar Chart
- Scatter Plots
- Pair Plots
- Heatmaps
Throughout this article, we use the Google Play Store dataset downloaded from Kaggle.
1. Distribution Plots
A distribution plot in Seaborn is comparable to a histogram in Matplotlib, and the two offer very similar functionality. Instead of the frequency counts of a histogram, however, the y-axis here shows an approximate probability density.
We will use sns.distplot() to plot distribution graphs; in seaborn 0.11 and later this function is deprecated in favor of sns.histplot() and sns.displot().
Before going further, let’s first load our dataset.
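The loading code is not shown here, so the following is only a minimal sketch with pandas. It assumes the Kaggle file was saved as googleplaystore.csv, that apps without a Rating are dropped before plotting, and that Size and Price have already been converted to numbers; the tiny inline CSV is a hypothetical stand-in that just makes the snippet runnable.

```python
import io

import pandas as pd


def load_playstore(source):
    """Read the Play Store CSV and keep only apps that have a Rating."""
    df = pd.read_csv(source)
    # Assumption: apps without a rating are dropped before plotting Rating
    return df.dropna(subset=["Rating"])


# With the real download (file name is an assumption):
# inp1 = load_playstore("googleplaystore.csv")

# Tiny hypothetical stand-in with the same column names; Size (MB) and
# Price (USD) are assumed to be already converted to numbers
sample_csv = io.StringIO(
    "App,Category,Rating,Reviews,Size,Price,Content Rating\n"
    "App A,GAME,4.5,1200,25.0,0,Everyone\n"
    "App B,TOOLS,,30,3.2,0,Everyone\n"
    "App C,FAMILY,3.9,450,12.5,1.99,Teen\n"
)
inp1 = load_playstore(sample_csv)
print(inp1.head())
```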
The dataset looks like this,

Now, let’s see what the distribution plot looks like for the ‘Rating’ column of the above dataset.
The distribution plot for the Rating column looks like this,

Here, the curve drawn over the distribution plot (the KDE, or kernel density estimate) is the approximate probability density curve.
As with histograms in Matplotlib, we can change the number of bins in a distribution plot to make the graph easier to understand.
We just have to pass the number of bins in the code,
# Change the number of bins and hide the KDE curve
sns.distplot(inp1.Rating, bins=20, kde=False)
Now, the graph looks like this,

In the above graph, there is no probability density curve. To remove the curve, we just pass ‘kde=False’ in the code.
As in Matplotlib, we can also give distribution plots a title and set the color of the bins. Let’s see the code for that,
The distribution plot for the same Rating column now looks like this:

Styling the Seaborn graphs
One of the biggest advantages of using Seaborn is that it offers a wide range of default styling options for our graphs.
Seaborn offers five preset styles: darkgrid, whitegrid, dark, white, and ticks.
We just have to write one line of code to incorporate these styles into our graph.
After applying the dark background to our graph, the distribution plot looks like this,

2. Pie Chart & Bar Chart
A pie chart is generally used to analyze how a numeric variable varies across different categories.
In the dataset we are using, we’ll analyze how the top 4 categories in the Content Rating column are performing.
First, we’ll do some data cleaning on the Content Rating column and check which categories it contains.
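The cleaning code is not shown; a minimal sketch, assuming the cleaning amounts to trimming stray whitespace, lists the categories with value_counts(). The counts in this stand-in DataFrame are invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the Content Rating column; counts are invented
inp1 = pd.DataFrame({
    "Content Rating": ["Everyone"] * 6 + ["Teen"] * 3 + ["Everyone 10+"] * 2
    + ["Mature 17+"] * 2 + ["Adults only 18+", "Unrated"]
})

# Assumed cleaning step: trim stray whitespace so identical labels group together
inp1["Content Rating"] = inp1["Content Rating"].str.strip()

# Number of apps in each category, most frequent first
counts = inp1["Content Rating"].value_counts()
print(counts)
```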
The resulting list of categories is,

As the above output shows, the counts of “Adults only 18+” and “Unrated” are significantly lower than the others, so we’ll drop those categories from the Content Rating column and update the dataset.
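One way to drop those two categories, sketched on the same kind of invented stand-in data, is a boolean mask built with isin():

```python
import pandas as pd

# Hypothetical stand-in for the Content Rating column; counts are invented
inp1 = pd.DataFrame({
    "Content Rating": ["Everyone"] * 6 + ["Teen"] * 3 + ["Everyone 10+"] * 2
    + ["Mature 17+"] * 2 + ["Adults only 18+", "Unrated"]
})

# Keep every row whose category is NOT one of the two rare ones
rare = ["Adults only 18+", "Unrated"]
inp1 = inp1[~inp1["Content Rating"].isin(rare)].copy()
print(inp1["Content Rating"].value_counts())
```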
The categories present in the “Content Rating” column after updating the dataset are,

Now, let’s plot Pie Chart for the categories present in the Content Rating column.
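Seaborn itself has no pie-chart function, so this sketch uses pandas' Matplotlib-backed Series.plot.pie() on the category counts. The counts are invented stand-ins, not the Play Store figures.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the cleaned Content Rating column; counts are invented
content = pd.Series(
    ["Everyone"] * 8 + ["Teen"] * 4 + ["Everyone 10+"] * 3 + ["Mature 17+"] * 3,
    name="Content Rating",
)

# One wedge per category, sized by its count
counts = content.value_counts()
ax = counts.plot.pie()
ax.set_ylabel("")  # hide the default y-axis label
plt.show()
```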
The Pie Chart for the above code looks like the following,

From the above pie chart, we cannot correctly infer whether “Everyone 10+” or “Mature 17+” has the larger share. When two categories have similar values, it is very difficult to assess the difference between them.
We can overcome this by plotting the same data as a bar chart.
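A sketch of the bar chart with sns.countplot(), which counts rows per category itself. The order argument sorts bars from most to least frequent, while color and set_title() show how the chart can be customized; the data is an invented stand-in.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the cleaned Content Rating column; counts are invented
content = pd.Series(
    ["Everyone"] * 8 + ["Teen"] * 4 + ["Everyone 10+"] * 3 + ["Mature 17+"] * 3,
    name="Content Rating",
)

# Horizontal bars, one per category, longest first
ax = sns.countplot(y=content, order=content.value_counts().index, color="steelblue")
ax.set_title("Apps per Content Rating")
plt.show()
```

Bar lengths share a common baseline, which makes close values such as “Everyone 10+” and “Mature 17+” easier to compare than wedge angles.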
The bar chart looks like the following,

As with the pie chart, we can customize the bar chart too, for example with different bar colors or a chart title.
3. Scatter Plots
Up until now, we have been dealing with only a single numeric column from the dataset, such as Rating, Reviews, or Size. But what if we have to infer a relationship between two numeric columns, say “Rating and Size” or “Rating and Reviews”?
A scatter plot is used when we want to plot the relationship between any two numeric columns of a dataset. Scatter plots are among the most widely used visualization tools in machine learning.
Let’s see what the scatter plot looks like for two numeric columns in the dataset, “Rating” and “Size”. First, we’ll plot the graph using Matplotlib; after that, we’ll see how it looks in Seaborn.
Scatter Plot using matplotlib
# Import the necessary library
import matplotlib.pyplot as plt
# Plot Size (x-axis) against Rating (y-axis); pstore is the Play Store DataFrame
plt.scatter(pstore.Size, pstore.Rating)
Now, the plot looks like this

Scatter Plot using Seaborn
We will use sns.jointplot() to draw a scatter plot together with a histogram of each variable, and sns.scatterplot() when we want only the scatter plot.
The scatter plot for the above code looks like this,

The main advantage of drawing the scatter plot with Seaborn’s jointplot is that we get both the scatter plot and the histograms in one graph.
If we want to see only the scatter plot, we just replace “jointplot” with “scatterplot” in the code.
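Swapping in sns.scatterplot() on the same kind of hypothetical stand-in data gives just the scatter plot, without the marginal histograms:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for pstore: numeric Size (MB) and Rating columns
rng = np.random.default_rng(3)
pstore = pd.DataFrame({
    "Size": rng.uniform(1, 100, 300),
    "Rating": np.clip(rng.normal(4.1, 0.5, 300), 1, 5),
})

# Only the scatter plot; axis labels come from the column names
ax = sns.scatterplot(x="Size", y="Rating", data=pstore)
plt.show()
```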
Regression Plot
Regression plots draw a regression line between two numerical variables on the joint plot (scatter plot) and help visualize their linear relationship.
The graph looks like the following,

From the above graph, we can infer that the Rating increases steadily as the Price of the apps increases.
4. Pair Plots
Pair plots are used when we want to see the relationship patterns among more than three numeric variables. For example, if we want to see how a company’s sales are affected by three different factors, a pair plot is very helpful.
Let’s create a pair plot for the Reviews, Size, Price, and Rating columns of the dataset.
We will be using sns.pairplot() in the code to plot multiple scatter plots at a time.
The resulting grid of graphs looks like this,

For the off-diagonal views, each graph is a scatter plot between two numeric variables.
For the diagonal views, it plots a histogram, since the x and y axes show the same variable.
5. Heatmaps
A heatmap represents data in two-dimensional form. Its ultimate goal is to show a summary of information in a colored graph, using colors and color intensities to visualize a range of values.
Most of us have probably seen this kind of graphic during a football match,

Heatmaps in Seaborn create exactly these types of graphs.
We’ll be using sns.heatmap() to plot the visualization.
When we have data like the following, we can create a heatmap.

The above table was created using a pivot table in pandas. You can see how pivot tables are created in my previous article on Pandas.
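For reference, a table of that shape can be sketched with pd.pivot_table(). The layout here (rows for Content Rating, columns for Category, cells holding the mean Rating) is an assumption about what the original table held, and the data is synthetic.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for inp1 (values invented)
rng = np.random.default_rng(6)
n = 400
inp1 = pd.DataFrame({
    "Content Rating": rng.choice(["Everyone", "Teen", "Everyone 10+", "Mature 17+"], n),
    "Category": rng.choice(["FAMILY", "GAME", "TOOLS"], n),
    "Rating": np.clip(rng.normal(4.1, 0.5, n), 1, 5),
})

# Rows: Content Rating, columns: Category, cells: mean Rating of that group
heat = pd.pivot_table(data=inp1, index="Content Rating", columns="Category",
                      values="Rating", aggfunc="mean")
print(heat.round(2))
```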
Now, let’s see how we can create a heatmap for the above table.
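A minimal sketch of the heatmap call. It rebuilds a synthetic heat pivot table so the snippet runs on its own; the pivot layout is an assumption, not the article's actual table.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in data and an assumed pivot layout
rng = np.random.default_rng(6)
n = 400
inp1 = pd.DataFrame({
    "Content Rating": rng.choice(["Everyone", "Teen", "Everyone 10+", "Mature 17+"], n),
    "Category": rng.choice(["FAMILY", "GAME", "TOOLS"], n),
    "Rating": np.clip(rng.normal(4.1, 0.5, n), 1, 5),
})
heat = pd.pivot_table(data=inp1, index="Content Rating", columns="Category",
                      values="Rating", aggfunc="mean")

# One colored cell per (Content Rating, Category) pair, plus a colorbar
ax = sns.heatmap(heat)
plt.show()
```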
In the above code, we have saved the data in the new variable “heat.”
The heatmap looks like the following,

We can customize the above graph further, for example by changing the color gradient so that the highest values are darker and the lowest values are lighter.
The updated code will be something like this,
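A sketch of such a customized version on synthetic stand-in data. The sequential "Greens" colormap makes higher values darker and lower values lighter, annot=True writes each cell's value, and fmt=".2f" (our own choice) rounds the annotations to two decimals.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in data and an assumed pivot layout
rng = np.random.default_rng(6)
n = 400
inp1 = pd.DataFrame({
    "Content Rating": rng.choice(["Everyone", "Teen", "Everyone 10+", "Mature 17+"], n),
    "Category": rng.choice(["FAMILY", "GAME", "TOOLS"], n),
    "Rating": np.clip(rng.normal(4.1, 0.5, n), 1, 5),
})
heat = pd.pivot_table(data=inp1, index="Content Rating", columns="Category",
                      values="Rating", aggfunc="mean")

# Darker green = higher mean rating; each cell shows its value to 2 decimals
ax = sns.heatmap(heat, cmap="Greens", annot=True, fmt=".2f")
plt.show()
```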
The heatmap for the above-updated code looks like this,

Notice that the code passes “annot = True”. When annot is true, each cell in the graph displays its value. If we don’t pass annot at all, its default is None, which behaves like False, so no values are displayed.
Seaborn also supports other types of graphs, such as line plots, bar graphs, and stacked bar charts. However, they don’t offer anything different from the ones created with Matplotlib.
Conclusion
So, this is how Seaborn works in Python, along with the different types of graphs we can create with it. As mentioned earlier, Seaborn is built on top of the Matplotlib library, so if we are already familiar with Matplotlib and its functions, we can easily build Seaborn graphs and explore more advanced concepts.
