

For visualization in Python, the Matplotlib library has been the workhorse for quite some time now. It has held its own even after more nimble rivals with easier code interfaces, such as seaborn, plotly and bokeh, have arrived on the scene. Though Matplotlib may lack the interactive capabilities of the new kids on the block, it does a more than adequate job of visualizing our data during Exploratory Data Analysis (EDA).


During EDA, we may come across situations where we want to display a group of related plots as part of a larger picture to drive home an insight. The subplot function of Matplotlib does the job for us. However, in certain situations we may want to combine several subplots and give each one a different aspect ratio. How can we achieve a layout where some subplots span several rows or columns of the overall figure?


Enter the gridspec submodule of Matplotlib.



We first need to create an instance of GridSpec, which lets us specify the total number of rows and columns in the overall figure as arguments, along with a figure object.



We store the GridSpec instance in a variable called gs and specify that we want to have 4 rows and 4 columns in the overall figure.
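
The original code was shown as an image that did not survive, so here is a minimal sketch of this step. The figure size is an assumption chosen for illustration; only the 4 x 4 grid and the variable name gs come from the text.

```python
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec

# Figure object that will hold all the subplots.
# The figsize value is an assumption, not taken from the article.
fig = plt.figure(figsize=(8, 8))

# A 4 x 4 grid attached to that figure. Nothing is drawn yet;
# the GridSpec only describes how the figure area is divided.
gs = GridSpec(nrows=4, ncols=4, figure=fig)
```

At this point the figure is still empty. The grid only becomes visible once subplots are added to it.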


Now we need to specify how each subplot will span the rows and columns of the overall figure. It is useful to make a rough sketch on paper of how you want the subplots laid out, so that they don't overlap. Once done, we convey this information through the GridSpec object we created. The row/column span is passed in the same index notation we use for subsetting arrays and dataframes: row and column index numbers start from zero, and : specifies a range. The indexed GridSpec object is then passed to the add_subplot function of the figure object.
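
As a rough sketch of the indexing described above, the snippet below lays out five subplots on the 4 x 4 grid. The particular spans and the axes names (ax_top, ax_left and so on) are hypothetical, since the layout in the original image is not recoverable; they were picked so that the cells are fully covered without overlap.

```python
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec

fig = plt.figure(figsize=(8, 8))  # figsize is an assumption
gs = GridSpec(nrows=4, ncols=4, figure=fig)

# Hypothetical layout: each gs[rows, cols] index selects a block of cells.
ax_top = fig.add_subplot(gs[0, :])           # row 0, all four columns
ax_left = fig.add_subplot(gs[1:3, 0:2])      # rows 1-2, columns 0-1
ax_right = fig.add_subplot(gs[1:, 2:])       # rows 1-3, columns 2-3
ax_bottom_a = fig.add_subplot(gs[3, 0])      # single cell: row 3, column 0
ax_bottom_b = fig.add_subplot(gs[3, 1])      # single cell: row 3, column 1
```

As with array slicing, the end of a range is exclusive, so gs[1:3, 0:2] covers rows 1 and 2 and columns 0 and 1.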



We add an overall title for the figure and remove the ticks to visualize the layout better, as the objective here is to demonstrate how subplots can span multiple rows/columns. When you implement this, you will obviously want to add axis ticks, labels etc. from your dataframe and tweak the spacing and figure size to accommodate these plot elements.
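
One way this step might look is sketched below. The spans, the title text and the ax1 to ax5 labels are all assumptions made for illustration; the article's own code is not available.

```python
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec

fig = plt.figure(figsize=(8, 8))  # figsize is an assumption
gs = GridSpec(nrows=4, ncols=4, figure=fig)

# Hypothetical spans that tile the 4 x 4 grid without overlapping.
spans = [gs[0, :], gs[1:3, 0:2], gs[1:, 2:], gs[3, 0], gs[3, 1]]

for i, span in enumerate(spans, start=1):
    ax = fig.add_subplot(span)
    # Empty tick lists remove the ticks so only the layout is visible.
    ax.set_xticks([])
    ax.set_yticks([])
    # Label each subplot at its centre (axes coordinates run from 0 to 1).
    ax.text(0.5, 0.5, f"ax{i}", ha="center", va="center",
            transform=ax.transAxes)

# Overall title for the whole figure, not for any single subplot.
title = fig.suptitle("GridSpec layout with subplots spanning rows and columns")
```

Calling plt.show() or fig.savefig(...) afterwards would render the layout.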



Boom! This may come in handy in multi-variable time series plots, where we may want to show the time series stretching across all the columns in the top row and other uni-variate or multi-variate visualizations in the subplots below. You can customize what your jigsaw looks like by specifying the rows/columns in the overall figure and the spans of your individual subplots.
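
To make the time series idea concrete, here is a small sketch using synthetic data. The random walk, the 2 x 2 grid and the choice of a histogram and a lag scatter plot are all assumptions for illustration; with real data you would plot columns from your own dataframe instead.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec

# Synthetic data (an assumption): a random walk built from normal steps.
rng = np.random.default_rng(0)
steps = rng.normal(size=200)
series = steps.cumsum()

fig = plt.figure(figsize=(9, 6))
gs = GridSpec(nrows=2, ncols=2, figure=fig)

# Top row: the time series stretches across both columns.
ax_ts = fig.add_subplot(gs[0, :])
ax_ts.plot(series)
ax_ts.set_title("Time series")

# Bottom left: uni-variate view of the step sizes.
ax_hist = fig.add_subplot(gs[1, 0])
ax_hist.hist(steps, bins=20)
ax_hist.set_title("Distribution of step sizes")

# Bottom right: each value plotted against the previous one.
ax_lag = fig.add_subplot(gs[1, 1])
ax_lag.scatter(series[:-1], series[1:], s=8)
ax_lag.set_title("Value vs. previous value")

# Adjust spacing so titles and tick labels do not collide.
fig.tight_layout()
```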


In R, achieving the above is ridiculously easy with the patchwork package: a single line of code with nothing more than the + and / operators, plus ( ) for nested subplots if you want to go bonkers. Click on the link below to see how you can get this done in R.


Thanks for reading. If you liked this article, you may also like the one below on how to do EDA with minimal lines of code and maximum output.


Would love to hear your feedback and comments. Thanks!







