Exploring Data with Python
And we’re back! Let’s pick up where we left off in the first article of this series and use the visual we built there as a starting point.
Before we dive in, let’s set the stage for what we’ll accomplish here:
- Build upon what we know about creating calculated fields in Tableau and show how that translates to Python
- Demonstrate how we can use color to add depth to the insights our visuals provide
In Tableau, you can often get your brain all twisted around a tricky situation that requires you to produce calculations at various levels of aggregation. Some of you might have Googled your way through this to the point that if you type ‘level’ into the search bar, it autocompletes to ‘level of detail tableau’ (I feel you).
While it will take time to learn all the nuances of Python, one of the powerful aspects of using any programming language is the freedom you have to creatively solve problems. You’ll find that in Tableau there may have been a handful of different ways to get to the correct results, while in Python (or any programming language) the options are nearly limitless.
Something I think Tableau did a great job with is the marks card. That’s the area where you can drag fields to modify aspects of your visual such as size, color, labels, and details provided in your tooltip. Let’s explore how we can take something that was a click away with the marks card in Tableau, and recreate its output in Python.
In today’s exercise, we will use our ‘Profit’ and ‘Sales’ values to create a new column (similar to a calculated field) and name it ‘Profit Ratio’. We will then apply that new column to enhance the visual we created in the first article, such that the profit ratio dictates the color in the visual.
First things first, let’s build a baseline in Tableau to compare with.
Step 1: setting a goal by building it first in Tableau
Let’s revisit where we left off last week:
Okay, so aside from removing gridlines (AKA non-data ink), what could make it better?
We could try coloring by profitability… let’s give that a try by dragging ‘Profit’ to ‘Color’ on the marks card:
Adding color gradients to visuals can be valuable. But let’s keep in mind that right now we are coloring by overall profit. If you ignore ‘Tables’, which is screaming for attention, then the color in this visual conveys an obvious message: sub-categories with higher volumes of sales also have higher volumes of profit.
But what if we’re more interested in knowing which sub-category is relatively more profitable? Analyzing profitability by volume alone doesn’t tell the whole story: you could sell one million items at a 0.1% profit ratio and still make more profit than by selling one thousand same-priced items at a 50% profit ratio. Perhaps our business could shift priorities towards products that are more profitable per unit sold, if only we knew what those products were.
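If you want to check that arithmetic yourself, here is a quick sketch. The $10 unit price is purely an assumption of mine to make the comparison concrete; any shared price gives the same ordering.

```python
# Hypothetical: every item sells for the same price in both scenarios.
unit_price = 10.0

# One million items at a 0.1% profit ratio
high_volume_profit = 1_000_000 * unit_price * 0.001

# One thousand items at a 50% profit ratio
low_volume_profit = 1_000 * unit_price * 0.50

# The high-volume, low-ratio product earns more in absolute terms,
# even though each sale is far less profitable.
print(high_volume_profit > low_volume_profit)
```

Total profit rewards volume, while the profit ratio tells you how much of each sales dollar you actually keep.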
So let’s see if looking at this same data through a different lens gives us any insights.
In Tableau, the calculation for profit ratio looks like this:
SUM([Profit]) / SUM([Sales])
The calculation above takes the total profit and divides that by the total sales to provide the profit ratio.
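One subtlety is worth seeing in code: a ratio of totals is not the same thing as the average of each row’s own ratio. Here is a minimal sketch in pandas using made-up order rows (the `orders` DataFrame and the variable names are mine, for illustration only, and are not the article’s dataset):

```python
import pandas as pd

# Made-up order rows for illustration only.
orders = pd.DataFrame({
    "Sub-Category": ["Paper", "Paper", "Tables", "Tables"],
    "Sales": [10.0, 100.0, 200.0, 800.0],
    "Profit": [5.0, 40.0, 20.0, -120.0],
})

# Equivalent in spirit to SUM([Profit]) / SUM([Sales]) per sub-category:
# aggregate first, then divide.
totals = orders.groupby("Sub-Category")[["Sales", "Profit"]].sum()
ratio_of_sums = totals["Profit"] / totals["Sales"]

# A different calculation: divide row by row, then average the ratios.
row_ratio = orders["Profit"] / orders["Sales"]
mean_of_ratios = row_ratio.groupby(orders["Sub-Category"]).mean()

print(ratio_of_sums)
print(mean_of_ratios)
```

The two results differ because the row-level average gives a tiny order the same weight as a huge one, while the ratio of totals weights every sales dollar equally.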
Here’s what our visual looks like when we color by the profit ratio instead of the raw profits:
The differences here are subtle, but we can immediately see that the coloration is no longer following the previous rule of ‘darker at the top, lighter at the bottom’.
Paper and Labels, which were bottom feeders in our previous visual, seem to be bringing in the highest profits per sale.
When we color by profit ratio, we get a relative picture: per unit of sales, Paper and Labels are doing well. These products bring in lower volumes of sales, but the revenue they do bring in is highly profitable. Without this insight, we might neglect this profitable niche of our business.
Step 2: calculating the profit ratio in Python
No need to make this more complicated than necessary. First, let’s revisit where we left off in the last article in terms of our Python code.
We had a Pandas DataFrame named ‘subcat_sales_df’, which had columns ‘Sub-Category’, ‘Sales’, and ‘Profit’.
Here’s what it looked like:
Given this DataFrame, here’s what it takes to create a new ‘Profit Ratio’ column:
subcat_sales_df['Profit Ratio'] = subcat_sales_df['Profit'] / subcat_sales_df['Sales']
Translating that into plain English, we are defining our ‘Profit Ratio’ column to be our ‘Profit’ divided by our ‘Sales’. Nothing too crazy, right?
Keep in mind that our ‘subcat_sales_df’ DataFrame was pre-processed from our raw data in the previous article. This loops back to what we said earlier regarding freedom and flexibility in Python: you control your own destiny. Want to reshape your data and store that in a different variable like we did here? You’re completely free to do so.
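To make that concrete, here is a rough sketch of how a DataFrame like ‘subcat_sales_df’ could be produced from row-level data and then extended with the new column. The `raw_df` rows are invented for illustration, and the exact pre-processing used in the previous article may differ.

```python
import pandas as pd

# Hypothetical raw order rows; the real data comes from the previous article.
raw_df = pd.DataFrame({
    "Sub-Category": ["Phones", "Phones", "Labels", "Tables"],
    "Sales": [300.0, 200.0, 50.0, 400.0],
    "Profit": [40.0, 20.0, 22.0, -30.0],
})

# Total sales and profit per sub-category, largest sales first.
subcat_sales_df = (
    raw_df.groupby("Sub-Category", as_index=False)[["Sales", "Profit"]]
    .sum()
    .sort_values("Sales", ascending=False)
    .reset_index(drop=True)
)

# Same one-liner as above: profit divided by sales, row by row.
subcat_sales_df["Profit Ratio"] = subcat_sales_df["Profit"] / subcat_sales_df["Sales"]

print(subcat_sales_df)
```

Because the division happens after the groupby, each row’s ratio is total profit over total sales for that sub-category, which is the same thing the Tableau calculation computes.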
Here’s what our ‘Profit Ratio’ looks like:
Step 3: plotting with a color gradient in Python
In our previous article, we created a bar graph showing the sales volume per sub-category.
Now let’s see how we can color those bars by profit ratio. Here’s the code I used to add ‘Profit Ratio’ as a color gradient for the chart:
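import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib import cm

fig, ax = plt.subplots(figsize=(10, 6))
# Color each bar by its profit ratio; 7.5 is a hand-tuned scale factor
ax = sns.barplot(
    data=subcat_sales_df,
    x='Sales',
    y='Sub-Category',
    palette=cm.RdBu(subcat_sales_df['Profit Ratio'] * 7.5)
)
# Hide tick marks, label the x-axis, and remove the plot borders
ax.tick_params(axis='both', which='both', length=0)
ax.set_xlabel("Sales")
sns.despine(left=True, bottom=True)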
Don’t worry about all the ‘ax’ and ‘fig’ nonsense here; it is mostly related to formatting (axis labels, etc.). Feel free to play around with the scalar (7.5) seen here. Matplotlib colormaps expect floats between 0 and 1, so the scalar controls how the profit ratios are stretched across the red-to-blue range, and adjusting it will change the nature of the color gradient.
The key here is the ‘palette’ being set to a matplotlib Colormap value. The ‘RdBu’ indicates that this color mapping runs from red to blue. The values of the color mapping are decided by the values we pass in, which in this case is the ‘Profit Ratio’ data.
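If the hand-tuned scalar feels a bit magical, one alternative is to let matplotlib rescale the ratios for you. The sketch below is my own variation, not the code used for the chart in this article: it uses `matplotlib.colors.Normalize` with symmetric limits so that a profit ratio of zero lands on the neutral midpoint of ‘RdBu’, and it draws the bars with plain matplotlib instead of seaborn. The DataFrame values are made up.

```python
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import colors
from matplotlib.cm import ScalarMappable

# Made-up values for illustration; substitute the real subcat_sales_df.
subcat_sales_df = pd.DataFrame({
    "Sub-Category": ["Phones", "Tables", "Labels"],
    "Sales": [500.0, 400.0, 50.0],
    "Profit Ratio": [0.12, -0.075, 0.44],
})

ratios = subcat_sales_df["Profit Ratio"]

# Symmetric limits around zero: losses map to red, gains to blue.
limit = ratios.abs().max()
norm = colors.Normalize(vmin=-limit, vmax=limit)
cmap = plt.get_cmap("RdBu")

# norm() rescales the ratios to 0..1; cmap() turns them into RGBA colors.
bar_colors = cmap(norm(ratios.to_numpy()))

fig, ax = plt.subplots(figsize=(10, 6))
ax.barh(subcat_sales_df["Sub-Category"], subcat_sales_df["Sales"], color=bar_colors)
ax.invert_yaxis()  # keep the first row at the top, like the seaborn chart
ax.set_xlabel("Sales")

# A colorbar documents what the colors mean.
fig.colorbar(ScalarMappable(norm=norm, cmap=cmap), ax=ax, label="Profit Ratio")
```

The trade-off is a few more lines in exchange for a gradient whose meaning doesn’t depend on a tuned constant.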
Here’s what it looks like:
Step 4: reconciling the differences
You probably noticed that the coloration from Tableau and what we built here aren’t an exact match. That’s fine! In Tableau, many things are decided for you, and that’s why it’s such a great click-and-drag tool. In Python, you control the details.
In the next article, we’ll take an even closer look at how those details come together with more advanced calculations! Down the road, we’ll explore joining in data from other sources.
See you there!
Original article: https://towardsdatascience.com/a-gentle-introduction-to-python-for-tableau-developers-part-2-c18095a03e0e