A Gentle Introduction to Python for Tableau Developers (Part 2)

EXPLORING DATA WITH PYTHON

And we’re back! Let’s pick up where we left off in the first article of this series and use the visual we built there as a starting point.

Before we dive in, let’s set the stage for what we’ll accomplish here:

  1. Build upon what we know about creating calculated fields in Tableau and show how that translates to Python
  2. Demonstrate how we can use color to add depth to the insights our visuals provide

In Tableau, you can often get your brain all twisted around a tricky situation that requires you to produce calculations at various levels of aggregation. Some of you might have Googled your way through this to the point that if you type ‘level’ into the search bar, it autocompletes to ‘level of detail tableau’ (I feel you).

While it will take time to learn all the nuances of Python, one of the powerful aspects of using any programming language is the freedom you have to creatively solve problems. You’ll find that in Tableau there may have been a handful of different ways to get to the correct results, while in Python (or any programming language) the options are nearly limitless.

Something I think Tableau did a great job with is the marks card. That’s the area where you can drag fields to modify aspects of your visual such as size, color, labels, and details provided in your tooltip. Let’s explore how we can take something that was a click away with the marks card in Tableau, and recreate its output in Python.

In today’s exercise, we will use our ‘Profit’ and ‘Sales’ values to create a new column (similar to a calculated field) and name it ‘Profit Ratio’. We will then apply that new column to enhance the visual we created in the first article, such that the profit ratio dictates the color in the visual.

First things first, let’s build a baseline in Tableau to compare with.

Step 1: setting a goal by building it first in Tableau

Let’s revisit where we left off last week:

[Image: Sales by Sub-Category]
As someone who used to teach Tableau trainings, this brings back memories.

Okay, so aside from removing gridlines (AKA non-data ink), what could make it better?

We could try coloring by profitability… let’s give that a try by dragging ‘Profit’ to ‘Color’ on the marks card:

[Image: Sales by Sub-Category, colored by Profit]

Adding color gradients to visuals can be valuable. But let’s keep in mind that right now we are coloring by overall profit. If you ignore ‘Tables’, which is screaming for attention, then the color in this visual conveys an obvious message: sub-categories with higher volumes of sales also have higher volumes of profit.

But what if we’re more interested in knowing which sub-category is relatively more profitable? Analyzing profitability by volume alone doesn’t tell the whole story: at comparable prices, you could sell one million items at a 0.1% profit ratio and still make more profit than by selling one thousand items at a 50% profit ratio. Perhaps our business could shift priorities towards products that are more profitable per unit sold, if only we knew what those products were.
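
To make that arithmetic concrete, here is a minimal sketch. The unit price is a made-up number, and the sketch assumes both products sell at that same price; neither figure comes from the dataset used in this series.

```python
# Hypothetical numbers: both products are assumed to sell at the same unit price.
unit_price = 10.00

high_volume_units, high_volume_ratio = 1_000_000, 0.001  # 0.1% profit ratio
low_volume_units, low_volume_ratio = 1_000, 0.50         # 50% profit ratio

# Sales is units times price; profit is sales times profit ratio.
high_volume_sales = high_volume_units * unit_price
low_volume_sales = low_volume_units * unit_price

high_volume_profit = high_volume_sales * high_volume_ratio
low_volume_profit = low_volume_sales * low_volume_ratio

print(f"High volume: sales={high_volume_sales:,.0f}, profit={high_volume_profit:,.0f}")
print(f"Low volume:  sales={low_volume_sales:,.0f}, profit={low_volume_profit:,.0f}")
```

The high-volume product earns about twice the total profit, yet each unit of the low-volume product earns far more. A chart colored by total profit would only show the first of those two facts.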

So let’s see if looking at this same data through a different lens gives us any insights.

In Tableau, the calculation for profit ratio looks like this:

SUM([Profit]) / SUM([Sales])

The calculation above takes the total profit and divides that by the total sales to provide the profit ratio.

Here’s what our visual looks like when we color by the profit ratio instead of the raw profits:

[Image: Sales by Sub-Category, colored by Profit Ratio]

The differences here are subtle, but we can immediately see that the coloration is no longer following the previous rule of ‘darker at the top, lighter at the bottom’.

Paper and Labels, which were bottom feeders in our previous visual, seem to be bringing in the highest profit per dollar of sales.

When we color by profit ratio, we get a relative picture: per unit of sales, Paper and Labels are doing well. While these products bring in lower volumes of sales, the revenue they do bring in is highly profitable. Without this insight, we might neglect this profitable niche of our business.

Step 2: calculating the profit ratio in Python

No need to make this more complicated than necessary. First, let’s revisit where we left off in the last article in terms of our Python code.

We had a Pandas DataFrame named ‘subcat_sales_df’, which had columns ‘Sub-Category’, ‘Sales’, and ‘Profit’.

Here’s what it looked like:

[Image: the ‘subcat_sales_df’ DataFrame]

Given this DataFrame, here’s what it takes to create a new ‘Profit Ratio’ column:

subcat_sales_df['Profit Ratio'] = subcat_sales_df['Profit'] / subcat_sales_df['Sales']

Translating that into plain English, we are defining our ‘Profit Ratio’ column to be our ‘Profit’ divided by our ‘Sales’. Nothing too crazy, right?

Keep in mind that our ‘subcat_sales_df’ DataFrame was pre-processed from our raw data in the previous article. This loops back to what we said earlier regarding freedom and flexibility in Python: you control your own destiny. Want to reshape your data and store that in a different variable like we did here? You’re completely free to do so.
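
The pre-processing step matters because the order of operations changes the answer. Here is a minimal sketch of that idea, using a few made-up order rows (the names raw_df, subcat_df and ‘Row Ratio’ are hypothetical and not from the previous article). Summing first and then dividing mirrors the Tableau calculation; dividing row by row and then averaging answers a different question.

```python
import pandas as pd

# Hypothetical order-level rows standing in for the raw data.
raw_df = pd.DataFrame({
    "Sub-Category": ["Paper", "Paper", "Tables", "Tables"],
    "Sales": [10.0, 90.0, 200.0, 800.0],
    "Profit": [5.0, 27.0, -60.0, -40.0],
})

# Aggregate first, then divide: the same shape as SUM([Profit]) / SUM([Sales]).
subcat_df = raw_df.groupby("Sub-Category", as_index=False)[["Sales", "Profit"]].sum()
subcat_df["Profit Ratio"] = subcat_df["Profit"] / subcat_df["Sales"]

# Divide first, then average: every order counts equally, regardless of its size.
raw_df["Row Ratio"] = raw_df["Profit"] / raw_df["Sales"]
mean_row_ratio = raw_df.groupby("Sub-Category")["Row Ratio"].mean()

print(subcat_df)
print(mean_row_ratio)
```

For the made-up ‘Paper’ rows, the aggregate-then-divide ratio is 32 / 100 = 0.32, while the average of the row ratios is (0.5 + 0.3) / 2 = 0.4. Neither is wrong; they just sit at different levels of aggregation, which is the same thing a level of detail calculation controls in Tableau.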

Here’s what our ‘Profit Ratio’ looks like:

[Image: ‘subcat_sales_df’ with the new ‘Profit Ratio’ column]

Step 3: plotting with a color gradient in Python

In our previous article, we created a bar graph showing the sales volume per sub-category.

Let’s see how we can color that chart by profit ratio. Here’s the code I used to add ‘Profit Ratio’ as a color gradient for the chart:

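import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib import cm

fig, ax = plt.subplots(figsize=(10, 6))
# Color each bar by its scaled profit ratio, mapped through the RdBu colormap.
ax = sns.barplot(
    data=subcat_sales_df,
    x='Sales',
    y='Sub-Category',
    palette=cm.RdBu(subcat_sales_df['Profit Ratio'] * 7.5)
)
# Formatting: hide tick marks, label the x-axis, remove the left and bottom spines.
ax.tick_params(axis='both', which='both', length=0)
ax.set_xlabel("Sales")
sns.despine(left=True, bottom=True)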

Don’t worry about all the ‘ax’ and ‘fig’ nonsense here; it mostly handles formatting (axis labels and so on). Feel free to play around with the scalar (7.5) seen here; adjusting it changes how the profit ratios spread across the color gradient.

The key here is the ‘palette’ being set to a matplotlib Colormap value. The ‘RdBu’ indicates that this color mapping runs from red to blue. The values of the color mapping are decided by the values we pass in, which in this case is the ‘Profit Ratio’ data.
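
To see what the scalar is doing, here is a minimal sketch of how the colormap treats the numbers it receives. The profit ratios below are made up for illustration and are not the values from our DataFrame. When a matplotlib colormap is called with floats, it expects values between 0 and 1; anything outside that range gets the color at the nearest end.

```python
import numpy as np
from matplotlib import cm

# Hypothetical profit ratios, including one loss-making sub-category.
profit_ratios = np.array([-0.09, 0.02, 0.13, 0.44])

# Compare the unscaled ratios with the 7.5 scalar used in the plotting code.
colors_by_scalar = {}
for scalar in (1.0, 7.5):
    # Each call returns one RGBA row per input value.
    colors_by_scalar[scalar] = cm.RdBu(profit_ratios * scalar)

for scalar, rgba in colors_by_scalar.items():
    print(f"scalar={scalar}")
    for ratio, color in zip(profit_ratios, rgba):
        print(f"  ratio={ratio:+.2f} -> RGBA={np.round(color, 2)}")
```

With ‘RdBu’, an input of 0.5 lands on the pale midpoint of the gradient, so multiplying by 7.5 places that midpoint at a profit ratio of 0.5 / 7.5, or roughly 6.7%. Ratios below that lean red and ratios above it lean blue, which is why changing the scalar shifts the whole picture.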

Here’s what it looks like:

[Image: Sales by Sub-Category, colored by Profit Ratio, built in Python]

Step 4: reconciling the differences

You probably noticed that the coloration from Tableau and what we built here isn’t an exact match. That’s fine! In Tableau, many things are decided for you and that’s why it’s such a great click-and-drag tool. In Python, you control the details.
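
As one example of controlling those details, here is a rough sketch that swaps the hand-picked scalar for an explicit normalization, so that a profit ratio of exactly zero sits at the pale midpoint of the gradient. This is a different technique from the one used in Step 3, the profit ratios are made up, and whether the result matches Tableau’s defaults is an assumption rather than something verified here.

```python
import numpy as np
from matplotlib import cm
from matplotlib.colors import TwoSlopeNorm

# Hypothetical profit ratios; the real ones would come from the 'Profit Ratio' column.
profit_ratios = np.array([-0.09, 0.02, 0.13, 0.44])

# Map the lowest ratio to 0, zero to 0.5, and the highest ratio to 1.
# TwoSlopeNorm requires vmin < vcenter < vmax, so this needs both losses and gains.
norm = TwoSlopeNorm(vmin=profit_ratios.min(), vcenter=0.0, vmax=profit_ratios.max())

# One RGBA row per bar: losses come out red, gains come out blue.
bar_colors = cm.RdBu(norm(profit_ratios))

for ratio, color in zip(profit_ratios, bar_colors):
    print(f"ratio={ratio:+.2f} -> RGBA={np.round(color, 2)}")
```

The resulting array could be passed as the ‘palette’ in the plotting code from Step 3 in place of the scaled values. The trade-off is that the gradient now depends on the range of the data rather than on a number you chose yourself.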

In the next article, we’ll take an even closer look at how those details come together with more advanced calculations! Down the road, we’ll explore joining in data from other sources.

See you there!

Translated from: https://towardsdatascience.com/a-gentle-introduction-to-python-for-tableau-developers-part-2-c18095a03e0e
