熊猫数据集_对熊猫数据框使用逻辑比较

熊猫数据集

P (tPYTHON)

Logical comparisons are used everywhere.

逻辑比较随处可见

The Pandas library gives you a lot of different ways that you can compare a DataFrame or Series to other Pandas objects, lists, scalar values, and more. The traditional comparison operators (<, >, <=, >=, ==, !=) can be used to compare a DataFrame to another set of values.

Pandas库为您提供了许多不同的方式,您可以将DataFrame或Series与其他Pandas对象,列表,标量值等进行比较。 传统的比较运算符( <, >, <=, >=, ==, != )可用于将DataFrame与另一组值进行比较。

However, you can also use wrappers for more flexibility in your logical comparison operations. These wrappers allow you to specify the axis for comparison, so you can choose to perform the comparison at the row or column level. Also, if you are working with a MultiIndex, you may specify which index you want to work with.

但是,还可以使用包装器在逻辑比较操作中提供更大的灵活性。 这些包装器允许您指定要进行比较的 ,因此您可以选择在行或列级别执行比较。 另外,如果您使用的是MultiIndex,则可以指定要使用的索引

In this piece, we’ll first take a quick look at logical comparisons with the standard operators. After that, we’ll go through five different examples of how you can use these logical comparison wrappers to process and better understand your data.

在本文中,我们将首先快速了解与标准运算符的逻辑比较。 之后,我们将介绍五个不同的示例,说明如何使用这些逻辑比较包装器来处理和更好地理解您的数据。

The data used in this piece is sourced from Yahoo Finance. We’ll be using a subset of Tesla stock price data. Run the code below if you want to follow along. (And if you’re curious as to the function I used to get the data scroll to the very bottom and click on the first link.)

本文中使用的数据来自Yahoo Finance。 我们将使用特斯拉股价数据的子集。 如果要继续,请运行下面的代码。 (如果您对我用来使数据滚动到最底部并单击第一个链接的功能感到好奇)。

import pandas as pd# fixed data so sample data will stay the same
df = pd.read_html("https://finance.yahoo.com/quote/TSLA/history?period1=1277942400&period2=1594857600&interval=1d&filter=history&frequency=1d")[0]df = df.head(10) # only work with the first 10 points
Image for post
Tesla stock data from Yahoo Finance雅虎财经的特斯拉股票数据

与熊猫的逻辑比较 (Logical Comparisons With Pandas)

The wrappers available for use are:

可用的包装器有:

  • eq (equivalent to ==) — equals to

    eq (等于== )—等于

  • ne (equivalent to !=) — not equals to

    ne (等于!= )-不等于

  • le (equivalent to <=) — less than or equals to

    le (等于<= )-小于或等于

  • lt (equivalent to <) — less than

    lt (等于< )-小于

  • ge (equivalent to >=) — greater than or equals to

    ge (等于>= )-大于或等于

  • gt (equivalent to >) — greater than

    gt (等于> )-大于

Before we dive into the wrappers, let’s quickly review how to perform a logical comparison in Pandas.

在深入探讨包装之前,让我们快速回顾一下如何在Pandas中进行逻辑比较。

With the regular comparison operators, a basic example of comparing a DataFrame column to an integer would look like this:

使用常规比较运算符,将DataFrame列与整数进行比较的基本示例如下所示:

old = df['Open'] >= 270

Here, we’re looking to see whether each value in the “Open” column is greater than or equal to the fixed integer “270”. However, if you try to run this, at first it won’t work.

在这里,我们正在查看“ Open”列中的每个值是否大于或等于固定整数“ 270”。 但是,如果尝试运行此命令,则一开始它将无法工作。

You’ll most likely see this:

您很可能会看到以下内容:

TypeError: '>=' not supported between instances of 'str' and 'int'

TypeError: '>=' not supported between instances of 'str' and 'int'

This is important to take care of now because when you use both the regular comparison operators and the wrappers, you’ll need to make sure that you are actually able to compare the two elements. Remember to do something like the following in your pre-processing, not just for these exercises, but in general when you’re analyzing data:

这一点现在很重要,因为当您同时使用常规比较运算符和包装器时,需要确保您确实能够比较这两个元素。 请记住,在预处理过程中,不仅要针对这些练习,而且在分析数据时通常要执行以下操作:

df = df.astype({"Open":'float',
"High":'float',
"Low":'float',
"Close*":'float',
"Adj Close**":'float',
"Volume":'float'})

Now, if you run the original comparison again, you’ll get this series back:

现在,如果再次运行原始比较,您将获得以下系列:

Image for post
Easy logical comparison example
简单的逻辑比较示例

You can see that the operation returns a series of Boolean values. If you check the original DataFrame, you’ll see that there should be a corresponding “True” or “False” for each row where the value was greater than or equal to (>=) 270 or not.

您可以看到该操作返回了一系列布尔值。 如果检查原始DataFrame,您会发现值大于或等于( >= )270的每一行都应该有一个对应的“ True”或“ False”。

Now, let’s dive into how you can do the same and more with the wrappers.

现在,让我们深入研究如何使用包装器做同样的事情。

1.比较两列的不平等 (1. Comparing two columns for inequality)

In the data set, you’ll see that there is a “Close*” column and an “Adj Close**” column. The Adjusted Close price is altered to reflect potential dividends and splits, whereas the Close price is only adjusted for splits. To see if these events may have happened, we can do a basic test to see if values in the two columns are not equal.

在数据集中,您将看到有一个“ Close *”列和一个“ Adj Close **”列。 调整后的收盘价被更改以反映潜在的股息和分割,而收盘价仅针对分割进行调整。 要查看是否可能发生了这些事件,我们可以进行基本测试以查看两列中的值是否不相等。

To do so, we run the following:

为此,我们运行以下命令:

# is the adj close different from the close?
df['Close Comparison'] = df['Adj Close**'].ne(df['Close*'])
Image for post
Results of column inequality comparison
列不相等比较的结果

Here, all we did is call the .ne() function on the “Adj Close**” column and pass “Close*”, the column we want to compare, as an argument to the function.

在这里,我们.ne()在“ Adj Close **”列上调用.ne()函数,并传递“ Close *”(我们要比较的列)作为该函数的参数。

If we take a look at the resulting DataFrame, you’ll see that we‘ve created a new column “Close Comparison” that will show “True” if the two original Close columns are different and “False” if they are the same. In this case, you can see that the values for “Close*” and “Adj Close**” on every row are the same, so the “Close Comparison” only has “False” values. Technically, this would mean that we could remove the “Adj Close**” column, at least for this subset of data, since it only contains duplicate values to the “Close*” column.

如果我们看一下生成的DataFrame,您会看到我们创建了一个新列“ Close Compare”,如果两个原始的Close列不同,则显示“ True”,如果相同,则显示“ False”。 在这种情况下,您可以看到每行上“ Close *”和“ Adj Close **”的值相同,因此“ Close Compare”只有“ False”值。 从技术上讲,这意味着我们至少可以删除此数据子集的“ Adj Close **”列,因为它仅包含“ Close *”列的重复值。

2.检查一列是否大于另一列 (2. Checking if one column is greater than another)

We’d often like to see whether a stock’s price increased by the end of the day. One way to do this would be to see a “True” value if the “Close*” price was greater than the “Open” price or “False” otherwise.

我们经常想看看一天结束时股票的价格是否上涨了。 一种方法是,如果“收盘价”大于“开盘价”,则查看“真”值,否则查看“假”价。

To implement this, we run the following:

为了实现这一点,我们运行以下命令:

# is the close greater than the open?
df['Bool Price Increase'] = df['Close*'].gt(df['Open'])
Image for post
Results of column greater than comparison
列结果大于比较结果

Here, we see that the “Close*” price at the end of the day was higher than the “Open” price at the beginning of the day 4/10 times in the first two weeks of July 2020. This might not be that informative because it’s such a small sample, but if you were to extend this to months or even years of data, it could indicate the overall trend of the stock (up or down).

在这里,我们看到,在2020年7月的前两周,一天结束时的“收盘价”比一天开始时的“开盘价”高出4/10倍。因为这是一个很小的样本,但是如果您将其扩展到数月甚至数年的数据,则可能表明存量的总体趋势(上升或下降)。

3.检查列是否大于标量值 (3. Checking if a column is greater than a scalar value)

So far, we’ve just been comparing columns to one another. You can also use the logical operators to compare values in a column to a scalar value like an integer. For example, let’s say that if the volume traded per day is greater than or equal to 100 million, we’ll call it a “High Volume” day.

到目前为止,我们只是在相互比较列。 您还可以使用逻辑运算符将列中的值与标量值(例如整数)进行比较。 例如,假设每天的交易量大于或等于1亿,我们将其称为“高交易量”日。

To do so, we run the following:

为此,我们运行以下命令:

# was the volume greater than 100m?
df['High Volume'] = df['Volume'].ge(100000000)
Image for post
Result of column and scalar greater than comparison
列和标量的结果大于比较结果

Instead of passing a column to the logical comparison function, this time we simply have to pass our scalar value “100000000”.

这次我们不必将列传递给逻辑比较函数,而只需传递标量值“ 100000000”。

Now, we can see that on 5/10 days the volume was greater than or equal to 100 million.

现在,我们可以看到5/10天的交易量大于或等于1亿。

4.检查列是否大于自身 (4. Checking if a column is greater than itself)

Earlier, we compared if the “Open” and “Close*” value in each row were different. It would be cool if instead, we compared the value of a column to the preceding value, to track an increase or decrease over time. Doing this means we can check if the “Close*” value for July 15 was greater than the value for July 14.

之前,我们比较了每行中的“打开”和“关闭*”值是否不同。 相反,如果我们将列的值与先前的值进行比较,以跟踪随时间的增加或减少,那将很酷。 这样做意味着我们可以检查7月15日的“ Close *”值是否大于7月14日的值。

To do so, we run the following:

为此,我们运行以下命令:

# was the close greater than yesterday's close?
df['Close (t-1)'] = df['Close*'].shift(-1)
df['Bool Over Time Increase'] = df['Close*'].gt(df['Close*'].shift(-1))
Image for post
Result of column greater than comparison
列结果大于比较结果

For illustration purposes, I included the “Close (t-1)” column so you can compare each row directly. In practice, you don’t need to add an entirely new column, as all we’re doing is passing the “Close*” column again into the logical operator, but we’re also calling shift(-1) on it to move all the values “up by one”.

为了便于说明,我在“ Close(t-1)”列中添加了一个标题,以便您可以直接比较每一行。 实际上,您不需要添加全新的列,因为我们要做的只是将“ Close *”列再次传递到逻辑运算符中,但是我们还对其调用了shift(-1)来进行移动所有值“加一”。

What’s going on here is basically subtracting one from the index, so the value for July 14 moves “up”, which lets us compare it to the real value on July 15. As a result, you can see that on 7/10 days the “Close*” value was greater than the “Close*” value on the day before.

这里发生的基本上是从索引中减去1,因此7月14日的值“向上”移动,这使我们可以将其与7月15日的实际值进行比较。结果,您可以看到在7/10天“关闭*”值大于前一天的“关闭*”值。

5.比较列与列表 (5. Comparing a column to a list)

As a final exercise, let’s say that we developed a model to predict the stock prices for 10 days. We’ll store those predictions in a list, then compare the both the “Open” and “Close*” values of each day to the list values.

作为最后的练习,假设我们开发了一个模型来预测10天的股价。 我们将这些预测存储在列表中,然后将每天的“打开”和“关闭*”值与列表值进行比较。

To do so, we run the following:

为此,我们运行以下命令:

# did the open and close price match the predictions?
predictions = [309.2, 303.36, 300, 489, 391, 445, 402.84, 274.32, 410, 223.93]
df2 = df[['Open','Close*']].eq(predictions, axis='index')
Image for post
Result of column and list comparison
列和列表比较的结果

Here, we’ve compared our generated list of predictions for the daily stock prices and compared it to the “Close*” column. To do so, we pass “predictions” into the eq() function and set axis='index'. By default, the comparison wrappers have axis='columns', but in this case, we actually want to work with each row in each column.

在这里,我们比较了生成的每日股票价格预测列表,并将其与“收盘价*”列进行了比较。 为此,我们将“预测”传递给eq()函数并设置axis='index' 。 默认情况下,比较包装器具有axis='columns' ,但是在这种情况下,我们实际上要处理每一列中的每一行。

What this means is Pandas will compare “309.2”, which is the first element in the list, to the first values of “Open” and “Close*”. Then it will move on to the second value in the list and the second values of the DataFrame and so on. Remember that the index of a list and a DataFrame both start at 0, so you would look at “308.6” and “309.2” respectively for the first DataFrame column values (scroll back up if you want to double-check the results).

这意味着熊猫将把列表中的第一个元素“ 309.2”与“打开”和“关闭*”的第一个值进行比较。 然后它将移至列表中的第二个值和DataFrame的第二个值,依此类推。 请记住,列表的索引和DataFrame的索引都从0开始,因此对于第一个DataFrame列值,您将分别查看“ 308.6”和“ 309.2”(如果要仔细检查结果,请向上滚动)。

Based on these arbitrary predictions, you can see that there were no matches between the “Open” column values and the list of predictions. There were 4/10 matches between the “Close*” column values and the list of predictions.

根据这些任意的预测,您可以看到“ Open”列值和预测列表之间没有匹配项。 “ Close *”列值和预测列表之间有4/10个匹配项。

I hope you found this very basic introduction to logical comparisons in Pandas using the wrappers useful. Remember to only compare data that can be compared (i.e. don’t try to compare a string to a float) and manually double-check the results to make sure your calculations are producing the intended results.

我希望您发现使用包装程序对熊猫进行逻辑比较非常基础的介绍很有用。 请记住仅比较可以比较的数据(即不要尝试将字符串与浮点数进行比较),并手动仔细检查结果以确保您的计算产生了预期的结果。

Go forth and compare!

继续比较吧!

More by me:- 2 Easy Ways to Get Tables From a Website
- Top 4 Repositories on GitHub to Learn Pandas
- An Introduction to the Cohort Analysis With Tableau
- How to Quickly Create and Unpack Lists with Pandas
- Learning to Forecast With Tableau in 5 Minutes Or Less

翻译自: https://towardsdatascience.com/using-logical-comparisons-with-pandas-dataframes-3520eb73ae63

熊猫数据集

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.mzph.cn/news/389631.shtml

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈email:809451989@qq.com,一经查实,立即删除!

相关文章

初级功能笔试题-1

给我徒弟整理的一些理论性的笔试题&#xff0c;不喜勿喷。&#xff08;所以没有答案哈&#xff09; 1、测试人员返测缺陷时&#xff0c;如果缺陷未修复&#xff0c;把缺陷的状态置为下列什么状态&#xff08;&#xff09;。 2、当验证被测系统的主要业务流程和功能是否实现时&a…

ansbile--playbook剧本案例

个人博客转至&#xff1a; www.zhangshoufu.com 通过ansible批量管理三台服务器&#xff0c;使三台服务器实现备份&#xff0c;web01、nfs、backup&#xff0c;把web和nfs上的重要文件被分到backup上&#xff0c;主机ip地址分配如下 CharacterIP地址IP地址主机名Rsync--server1…

5938. 找出数组排序后的目标下标

5938. 找出数组排序后的目标下标 给你一个下标从 0 开始的整数数组 nums 以及一个目标元素 target 。 目标下标 是一个满足 nums[i] target 的下标 i 。 将 nums 按 非递减 顺序排序后&#xff0c;返回由 nums 中目标下标组成的列表。如果不存在目标下标&#xff0c;返回一…

决策树之前要不要处理缺失值_不要使用这样的决策树

决策树之前要不要处理缺失值As one of the most popular classic machine learning algorithm, the Decision Tree is much more intuitive than the others for its explainability. In one of my previous article, I have introduced the basic idea and mechanism of a Dec…

说说 C 语言中的变量与算术表达式

我们先来写一个程序&#xff0c;打印英里与公里之间的对应关系表。公式&#xff1a;1 mile1.61 km 程序如下&#xff1a; #include <stdio.h>/* print Mile to Kilometre table*/ main() {float mile, kilometre;int lower 0;//lower limitint upper 1000;//upper limi…

gl3520 gl3510_带有gl gl本机的跨平台地理空间可视化

gl3520 gl3510Editor’s note: Today’s post is by Ib Green, CTO, and Ilija Puaca, Founding Engineer, both at Unfolded, an “open core” company that builds products and services on the open source deck.gl / vis.gl technology stack, and is also a major contr…

uiautomator +python 安卓UI自动化尝试

使用方法基本说明&#xff1a;https://www.cnblogs.com/mliangchen/p/5114149.html&#xff0c;https://blog.csdn.net/Eugene_3972/article/details/76629066 环境准备&#xff1a;https://www.cnblogs.com/keeptheminutes/p/7083816.html 简单实例 1.自动化安装与卸载 &#…

5922. 统计出现过一次的公共字符串

5922. 统计出现过一次的公共字符串 给你两个字符串数组 words1 和 words2 &#xff0c;请你返回在两个字符串数组中 都恰好出现一次 的字符串的数目。 示例 1&#xff1a;输入&#xff1a;words1 ["leetcode","is","amazing","as",&…

Python+Appium寻找蓝牙/wifi匹配

前言&#xff1a; 此篇是介绍怎么去寻找蓝牙&#xff0c;进行匹配。主要2个问题点&#xff1a; 1.在不同环境下&#xff0c;搜索到的蓝牙数量有变 2.在不同环境下&#xff0c;搜索到的蓝牙排序会变 简单思路&#xff1a; 将搜索出来的蓝牙名字添加到一个list去&#xff0c;然后…

power bi中的切片器_在Power Bi中显示选定的切片器

power bi中的切片器Just recently, while presenting my session: “Magnificent 7 — Simple tricks to boost your Power BI Development” at the New Stars of Data conference, one of the questions I’ve received was:就在最近&#xff0c;在“新数据之星”会议上介绍我…

字符串匹配 sunday算法

#include"iostream" #include"string.h" using namespace std;//BF算法 int strfind(char *s1,char *s2,int pos){int len1 strlen(s1);int len2 strlen(s2);int i pos - 1,j 0;while(j < len2){if(s1[i j] s2[j]){j;}else{i;j 0;}}if(j len2){…

5939. 半径为 k 的子数组平均值

5939. 半径为 k 的子数组平均值 给你一个下标从 0 开始的数组 nums &#xff0c;数组中有 n 个整数&#xff0c;另给你一个整数 k 。 半径为 k 的子数组平均值 是指&#xff1a;nums 中一个以下标 i 为 中心 且 半径 为 k 的子数组中所有元素的平均值&#xff0c;即下标在 i …

Adobe After Effects CS6 操作记录

安装 After Effects CS6 在Mac OS 10.12.5 上无法直接安装, 需要浏览到安装的执行文件后才能进行 https://helpx.adobe.com/creative-cloud/kb/install-creative-suite-mac-os-sierra.html , 但是即使安装成功, 也不能正常启动, 会报"You can’t use this version of the …

数据库逻辑删除的sql语句_通过数据库的眼睛查询sql的逻辑流程

数据库逻辑删除的sql语句Structured Query Language (SQL) is famously known as the romance language of data. Even thinking of extracting the single correct answer from terabytes of relational data seems a little overwhelming. So understanding the logical flow…

好用的模块

import requests# 1、发get请求urlhttp://api.xxx.xx/api/user/sxx_infodata{stu_name:xxx}reqrequests.get(url,paramsdata) #发get请求print(req.json()) #字典print(req.text) #string,json串# 返回的都是什么# 返回的类型是什么# 中文的好使吗# 2、发请求posturlhttp://api…

5940. 从数组中移除最大值和最小值

5940. 从数组中移除最大值和最小值 给你一个下标从 0 开始的数组 nums &#xff0c;数组由若干 互不相同 的整数组成。 nums 中有一个值最小的元素和一个值最大的元素。分别称为 最小值 和 最大值 。你的目标是从数组中移除这两个元素。 一次 删除 操作定义为从数组的 前面 …

BZOJ4127Abs——树链剖分+线段树

题目描述 给定一棵树,设计数据结构支持以下操作 1 u v d  表示将路径 (u,v) 加d 2 u v 表示询问路径 (u,v) 上点权绝对值的和 输入 第一行两个整数n和m&#xff0c;表示结点个数和操作数接下来一行n个整数a_i,表示点i的权值接下来n-1行,每行两个整数u,v表示存在一条(u,v)的…

数据挖掘流程_数据流挖掘

数据挖掘流程1-简介 (1- Introduction) The fact that the pace of technological change is at its peak, Silicon Valley is also introducing new challenges that need to be tackled via new and efficient ways. Continuous research is being carried out to improve th…

北门外的小吃街才是我的大学食堂

学校北门外的那些小吃摊&#xff0c;陪我度过了漫长的大学四年。 细数下来&#xff0c;我最怀念的是…… &#xff08;1&#xff09;烤鸡翅 吸引指数&#xff1a;★★★★★ 必杀技&#xff1a;酥流油 烤鸡翅有蜂蜜味、香辣味、孜然味……最爱店家独创的秘制鸡翅。鸡翅的外皮被…

786. 第 K 个最小的素数分数

786. 第 K 个最小的素数分数 给你一个按递增顺序排序的数组 arr 和一个整数 k 。数组 arr 由 1 和若干 素数 组成&#xff0c;且其中所有整数互不相同。 对于每对满足 0 < i < j < arr.length 的 i 和 j &#xff0c;可以得到分数 arr[i] / arr[j] 。 那么第 k 个…