Pandas Datasets

Logical comparisons are used everywhere.

The Pandas library gives you many ways to compare a DataFrame or Series to other Pandas objects, lists, scalar values, and more. The traditional comparison operators (<, >, <=, >=, ==, !=) can be used to compare a DataFrame to another set of values.

However, you can also use wrappers for more flexibility in your logical comparison operations. These wrappers allow you to specify the axis for comparison, so you can choose to perform the comparison at the row or column level. Also, if you are working with a MultiIndex, you may specify which index you want to work with.
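
To make the axis option concrete, here is a minimal sketch on a toy DataFrame (the values and the threshold Series are invented for illustration and are not the Tesla data). It assumes the default `axis='columns'` matches a Series to column labels, while `axis='index'` matches it to row labels.

```python
import pandas as pd

# Toy prices, invented for illustration (not the Tesla data).
prices = pd.DataFrame({"Open": [1.0, 2.0], "Close": [2.0, 1.0]})

# Default axis='columns': the Series labels are matched to column names,
# so each column is compared against its own threshold.
per_column = pd.Series({"Open": 1.5, "Close": 1.5})
by_column = prices.gt(per_column)

# axis='index': the Series labels are matched to row labels,
# so each row is compared against its own threshold.
per_row = pd.Series([0.5, 2.5], index=prices.index)
by_row = prices.gt(per_row, axis="index")

print(by_column)
print(by_row)
```

With the plain `>` operator you only get the column-wise alignment, which is why the wrappers are handy when you need the row-wise one.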

In this piece, we’ll first take a quick look at logical comparisons with the standard operators. After that, we’ll go through five different examples of how you can use these logical comparison wrappers to process and better understand your data.

The data used in this piece is sourced from Yahoo Finance. We’ll be using a subset of Tesla stock price data. Run the code below if you want to follow along. (And if you’re curious about the function I used to get the data, scroll to the very bottom and click on the first link.)

import pandas as pd

# fixed data so sample data will stay the same
df = pd.read_html("https://finance.yahoo.com/quote/TSLA/history?period1=1277942400&period2=1594857600&interval=1d&filter=history&frequency=1d")[0]
df = df.head(10)  # only work with the first 10 points

Figure: Tesla stock data from Yahoo Finance

Logical Comparisons With Pandas

The wrappers available for use are:

eq (equivalent to ==): equal to
ne (equivalent to !=): not equal to
le (equivalent to <=): less than or equal to
lt (equivalent to <): less than
ge (equivalent to >=): greater than or equal to
gt (equivalent to >): greater than
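
As a quick check of the list above, the sketch below pairs each wrapper with its operator on a toy Series (the values are invented for illustration) and confirms that both return the same Boolean Series.

```python
import pandas as pd

# Toy values, invented for illustration.
s = pd.Series([1, 2, 3])

# Each wrapper paired with the operator expression it mirrors.
pairs = {
    "eq": (s.eq(2), s == 2),
    "ne": (s.ne(2), s != 2),
    "le": (s.le(2), s <= 2),
    "lt": (s.lt(2), s < 2),
    "ge": (s.ge(2), s >= 2),
    "gt": (s.gt(2), s > 2),
}

# True for a wrapper when it returns the same Boolean Series as its operator.
matches = {name: wrapped.equals(plain) for name, (wrapped, plain) in pairs.items()}
print(matches)
```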

Before we dive into the wrappers, let’s quickly review how to perform a logical comparison in Pandas.

With the regular comparison operators, a basic example of comparing a DataFrame column to an integer would look like this:

old = df['Open'] >= 270

Here, we’re looking to see whether each value in the “Open” column is greater than or equal to the fixed integer “270”. However, if you try to run this, at first it won’t work.

You’ll most likely see this:

TypeError: '>=' not supported between instances of 'str' and 'int'

This is important to take care of now because, whether you use the regular comparison operators or the wrappers, you’ll need to make sure that the two elements can actually be compared. Remember to do something like the following in your pre-processing, not just for these exercises, but in general when you’re analyzing data:

df = df.astype({"Open":'float',
                "High":'float',
                "Low":'float',
                "Close*":'float',
                "Adj Close**":'float',
                "Volume":'float'})

Now, if you run the original comparison again, you’ll get this series back:

You can see that the operation returns a series of Boolean values. If you check the original DataFrame, you’ll see that each row has a corresponding “True” or “False”, depending on whether its “Open” value was greater than or equal to (>=) 270.
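
The whole flow can be sketched end to end on toy data. The column values below are invented, and `pd.to_numeric` with `errors="coerce"` is used here as an alternative to `astype` on the assumption that a scraped column might contain entries that cannot be parsed as numbers. The sketch also shows two common uses of the resulting Boolean Series: filtering rows and counting matches.

```python
import pandas as pd

# Hypothetical scraped column: numbers arrive as strings, plus one placeholder.
toy = pd.DataFrame({"Open": ["265.00", "270.00", "281.50", "-"]})

# Coerce to numbers; entries that cannot be parsed become NaN.
toy["Open"] = pd.to_numeric(toy["Open"], errors="coerce")

# Comparing now works; a NaN compares as False.
mask = toy["Open"] >= 270

selected = toy[mask]      # keep only the rows where the mask is True
count = int(mask.sum())   # True counts as 1, so the sum is the number of matches

print(selected)
print(count)
```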

Now, let’s dive into how you can do the same and more with the wrappers.

1. Comparing two columns for inequality

In the data set, you’ll see that there is a “Close*” column and an “Adj Close**” column. The Adjusted Close price is altered to reflect potential dividends and splits, whereas the Close price is only adjusted for splits. To see if these events may have happened, we can do a basic test to see if values in the two columns are not equal.

To do so, we run the following:

# is the adj close different from the close?
df['Close Comparison'] = df['Adj Close**'].ne(df['Close*'])

Here, all we did is call the .ne() function on the “Adj Close**” column and pass “Close*”, the column we want to compare, as an argument to the function.

If we take a look at the resulting DataFrame, you’ll see that we’ve created a new column “Close Comparison” that will show “True” if the two original Close columns are different and “False” if they are the same. In this case, you can see that the values for “Close*” and “Adj Close**” on every row are the same, so the “Close Comparison” only has “False” values. Technically, this would mean that we could remove the “Adj Close**” column, at least for this subset of data, since for these rows it only duplicates the values in the “Close*” column.
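
One detail worth knowing before relying on `.ne()` to spot duplicate columns: missing values never compare as equal, so two NaN entries in the same row are reported as different. The sketch below uses invented values and simplified column names to show this, along with `.any()` as a one-line check for whether the columns differ anywhere.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration; the last row is missing in both columns.
toy = pd.DataFrame({
    "Close": [10.0, 11.0, np.nan],
    "Adj Close": [10.0, 10.8, np.nan],
})

# True wherever the two columns differ; NaN vs NaN also counts as "different".
differs = toy["Adj Close"].ne(toy["Close"])

# True if at least one row differs.
any_difference = bool(differs.any())

print(differs)
print(any_difference)
```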

2. Checking if one column is greater than another

We’d often like to see whether a stock’s price increased by the end of the day. One way to do this would be to see a “True” value if the “Close*” price was greater than the “Open” price or “False” otherwise.

To implement this, we run the following:

# is the close greater than the open?
df['Bool Price Increase'] = df['Close*'].gt(df['Open'])

Here, we see that the “Close*” price at the end of the day was higher than the “Open” price at the beginning of the day on 4 of the 10 days in the first two weeks of July 2020. This might not be that informative because it’s such a small sample, but if you were to extend this to months or even years of data, it could indicate the overall trend of the stock (up or down).
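
Counting the “True” values does not require reading the column by eye. Here is a small sketch on invented prices (not the Tesla data): because True counts as 1, `.sum()` gives the number of up days and `.mean()` gives their share.

```python
import pandas as pd

# Toy prices, invented for illustration.
toy = pd.DataFrame({
    "Open": [10.0, 12.0, 11.0, 9.0],
    "Close": [11.0, 11.5, 11.0, 9.5],
})

# True on days where the close finished above the open.
up_day = toy["Close"].gt(toy["Open"])

n_up = int(up_day.sum())        # number of up days
share_up = float(up_day.mean()) # fraction of days that were up days

print(n_up, share_up)
```

Note that a day where the close equals the open is not an up day under `.gt()`; use `.ge()` if ties should count.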

3. Checking if a column is greater than a scalar value

So far, we’ve just been comparing columns to one another. You can also use the logical operators to compare values in a column to a scalar value like an integer. For example, let’s say that if the volume traded per day is greater than or equal to 100 million, we’ll call it a “High Volume” day.

To do so, we run the following:

# was the volume greater than or equal to 100m?
df['High Volume'] = df['Volume'].ge(100000000)

Instead of passing a column to the logical comparison function, this time we simply have to pass our scalar value “100000000”.

Now, we can see that on 5/10 days the volume was greater than or equal to 100 million.
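
The choice between `.ge()` and `.gt()` only matters at the boundary. A minimal sketch with invented volumes, one of which sits exactly on the 100 million threshold:

```python
import pandas as pd

# Toy volumes, invented for illustration; the middle one is exactly 100 million.
volume = pd.Series([99_999_999, 100_000_000, 150_000_000])

at_least = volume.ge(100_000_000)   # boundary value counts as high volume
strictly = volume.gt(100_000_000)   # boundary value does not count

print(at_least.tolist())
print(strictly.tolist())
```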

4. Checking if a column is greater than itself

Earlier, we checked whether the “Close*” value in each row was greater than the “Open” value. It would be cool if, instead, we compared each value in a column to the preceding day’s value, to track an increase or decrease over time. Doing this means we can check if the “Close*” value for July 15 was greater than the value for July 14.

To do so, we run the following:

# was the close greater than yesterday's close?
df['Close (t-1)'] = df['Close*'].shift(-1)
df['Bool Over Time Increase'] = df['Close*'].gt(df['Close*'].shift(-1))

For illustration purposes, I included the “Close (t-1)” column so you can compare each row directly. In practice, you don’t need to add an entirely new column, as all we’re doing is passing the “Close*” column again into the logical operator, but we’re also calling shift(-1) on it to move all the values “up by one”.

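What’s going on here is that shift(-1) moves every value up by one row, which is like subtracting one from its index position. Because the rows run from the most recent date downwards, the value for July 14 moves “up” into the July 15 row, which lets us compare it to the real value on July 15. As a result, you can see that on 7 of the 10 days the “Close*” value was greater than the “Close*” value on the day before.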
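
The effect of shift(-1) is easier to see on three rows. The sketch below uses invented closing prices and assumes, as in the walkthrough above, that the rows are ordered newest first. It also shows what happens on the last row, which has no earlier day to compare against.

```python
import pandas as pd

# Toy closing prices, invented for illustration, ordered newest first.
closes = pd.Series([15.0, 14.0, 16.0], index=["Jul 15", "Jul 14", "Jul 13"])

# Move every value up one row: each row now holds the previous day's close.
previous = closes.shift(-1)

# The oldest row has no previous day, so it holds NaN and compares as False.
increased = closes.gt(previous)

print(previous)
print(increased)
```

If the data were sorted oldest first instead, `shift(1)` would be the call that lines up each row with the day before it.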

5. Comparing a column to a list

As a final exercise, let’s say that we developed a model to predict the stock prices for 10 days. We’ll store those predictions in a list, then compare both the “Open” and “Close*” values of each day to the list values.

To do so, we run the following:

# did the open and close price match the predictions?
predictions = [309.2, 303.36, 300, 489, 391, 445, 402.84, 274.32, 410, 223.93]
df2 = df[['Open','Close*']].eq(predictions, axis='index')

Here, we’ve compared our generated list of predictions for the daily stock prices to both the “Open” and “Close*” columns. To do so, we pass “predictions” into the eq() function and set axis='index'. By default, the comparison wrappers have axis='columns', but in this case, we actually want to work with each row in each column.

What this means is Pandas will compare “309.2”, which is the first element in the list, to the first values of “Open” and “Close*”. Then it will move on to the second value in the list and the second values of the DataFrame and so on. Remember that the index of a list and a DataFrame both start at 0, so you would look at “308.6” and “309.2” respectively for the first DataFrame column values (scroll back up if you want to double-check the results).

Based on these arbitrary predictions, you can see that there were no matches between the “Open” column values and the list of predictions. There were 4/10 matches between the “Close*” column values and the list of predictions.
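
Exact equality is a strict test for floating-point prices, so a prediction that is off by a rounding error will not match. As a sketch on invented numbers, the block below runs the same row-wise `.eq()` comparison and then, as an optional alternative that goes beyond the walkthrough above, a tolerance-based check with NumPy’s `np.isclose`.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration; 0.1 + 0.2 is not exactly 0.3 in floating point.
toy = pd.DataFrame({"Open": [1.0, 2.0, 3.0], "Close": [1.5, 2.0, 0.1 + 0.2]})
preds = [1.5, 2.0, 0.3]

# Exact match, row by row: list element i is compared with row i of every column.
exact = toy.eq(preds, axis="index")

# Tolerance-based match: reshape the list to a column so it broadcasts across columns.
close_enough = pd.DataFrame(
    np.isclose(toy.to_numpy(), np.array(preds)[:, None]),
    index=toy.index,
    columns=toy.columns,
)

print(exact)
print(close_enough)
```

The last “Close” row is the only place where the two results disagree: it fails the exact test but passes the tolerance test.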

I hope you found this very basic introduction to logical comparisons in Pandas using the wrappers useful. Remember to only compare data that can be compared (e.g., don’t try to compare a string to a float) and manually double-check the results to make sure your calculations are producing the intended results.

Go forth and compare!

More by me:
- 2 Easy Ways to Get Tables From a Website
- Top 4 Repositories on GitHub to Learn Pandas
- An Introduction to the Cohort Analysis With Tableau
- How to Quickly Create and Unpack Lists with Pandas
- Learning to Forecast With Tableau in 5 Minutes Or Less

Translated from: https://towardsdatascience.com/using-logical-comparisons-with-pandas-dataframes-3520eb73ae63
