pandas and SQL side by side [helping SQL users get started with pandas quickly]

This page gives examples of how to perform common SQL operations with pandas, to help SQL users get up to speed with pandas quickly.

Contents

    • SQL syntax
      • I. SELECT
        • 1. Selecting columns
        • 2. Adding a computed column
      • II. JOIN ... ON
        • 1. Inner join
        • 2. Left outer join
        • 3. Right outer join
        • 4. Full outer join
      • III. Filtering (WHERE)
        • 1. AND
        • 2. OR
        • 3. IS NULL
        • 4. IS NOT NULL
        • 5. BETWEEN
        • 6. LIKE
        • 7. CASE WHEN
      • IV. GROUP BY
        • 1. count()
        • 2. avg()
        • 3. sum(), max(), min()
      • V. HAVING
      • VI. ORDER BY
      • VII. LIMIT/OFFSET
        • 1. LIMIT
        • 2. Top N rows by a column
        • 3. OFFSET
      • VIII. UNION ALL/UNION
        • 1. UNION ALL
        • 2. UNION
      • IX. Window functions
        • 1. ROW_NUMBER()
        • 2. RANK()
        • 3. SUM()

SQL syntax

  SELECT [DISTINCT | ALL] column1, column2, …, aggregate_function(columnN), …
  FROM table_name [AS alias]
  [join_type JOIN table2_name [AS alias2] ON join_condition]
  [join_type JOIN table3_name [AS alias3] ON join_condition …]
  [WHERE condition]
  [GROUP BY column1, column2, …]
  [HAVING condition]
  [ORDER BY column1 [ASC | DESC], column2 [ASC | DESC], …]
  [LIMIT number [OFFSET offset]]
  [UNION [ALL] SELECT …]  -- further UNION SELECT statements can be chained
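  1. DISTINCT: ensures that the rows in the result set are unique. ALL (the default) returns every matching row, including duplicates.
  2. aggregate_function(): an aggregate function such as SUM(), AVG(), COUNT(), MAX(), or MIN(), which computes over a set of values and returns a single value.
  3. join_type: the kind of join, such as INNER JOIN, LEFT JOIN, RIGHT JOIN, or FULL JOIN. ON join_condition: the condition used to match rows between the tables.
  4. WHERE condition: filters the rows, returning only those that satisfy the condition.
  5. GROUP BY: groups the result set by one or more columns; usually used together with aggregate functions.
  6. HAVING condition: filters the grouped result, returning only the groups that satisfy the condition.
  7. ORDER BY: sorts the result set. Several columns can be given, each with a direction (ASC for ascending, the default, or DESC for descending).
  8. LIMIT number [OFFSET offset]: limits the number of rows returned and optionally skips the first offset rows (MySQL, PostgreSQL, and SQLite syntax; other databases use different forms).
  9. UNION [ALL]: combines the result sets of two or more SELECT statements. UNION removes duplicate rows, while UNION ALL keeps all rows.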
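DISTINCT (item 1) has no section of its own below. A minimal sketch of the usual pandas counterpart, drop_duplicates() applied to the selected columns; the rows here are made up for illustration:

```python
import pandas as pd

# Made-up rows standing in for the tips data
data = pd.DataFrame({
    "day": ["Sun", "Sun", "Sat", "Sat", "Sun"],
    "time": ["Dinner", "Dinner", "Dinner", "Lunch", "Dinner"],
})

# SELECT DISTINCT day, time FROM data;
# drop_duplicates() keeps the first occurrence of each distinct row
distinct_rows = data[["day", "time"]].drop_duplicates()
print(distinct_rows)
```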

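I. SELECT

In SQL, you select columns with a comma-separated list of column names (or * for all columns). In pandas, pass a list of column names to the DataFrame's [] operator. In the examples below, data is a DataFrame holding the restaurant tips dataset (columns total_bill, tip, sex, smoker, day, time, size).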

1. Selecting columns

SQL:

SELECT total_bill, tip, smoker, time
FROM data;

pandas equivalent:

In :data[["total_bill", "tip", "smoker", "time"]]
Out :
total_bill	tip	smoker	time
0	16.99	1.01	No	Dinner
1	10.34	1.66	No	Dinner
2	21.01	3.50	No	Dinner
3	23.68	3.31	No	Dinner
4	24.59	3.61	No	Dinner
...	...	...	...	...
239	29.03	5.92	No	Dinner
240	27.18	2.00	Yes	Dinner
241	22.67	2.00	Yes	Dinner
242	17.82	1.75	No	Dinner
243	18.78	3.00	No	Dinner
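2. Adding a computed column

SQL:

SELECT *, tip / total_bill AS tip_rate
FROM data;

pandas equivalent:

1) Use DataFrame.assign() to append the new column. assign() returns a new DataFrame and leaves the original unchanged, so the result is assigned back to data.

In :data = data.assign(tip_rate=data["tip"] / data["total_bill"])
In :data
Out :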
total_bill	tip	sex	smoker	day	time	size	tip_rate
0	16.99	1.01	Female	No	Sun	Dinner	2	0.059447
1	10.34	1.66	Male	No	Sun	Dinner	3	0.160542
2	21.01	3.50	Male	No	Sun	Dinner	3	0.166587
3	23.68	3.31	Male	No	Sun	Dinner	2	0.139780
4	24.59	3.61	Female	No	Sun	Dinner	4	0.146808
...	...	...	...	...	...	...	...	...
239	29.03	5.92	Male	No	Sat	Dinner	3	0.203927
240	27.18	2.00	Female	Yes	Sat	Dinner	2	0.073584
241	22.67	2.00	Male	Yes	Sat	Dinner	2	0.088222
242	17.82	1.75	Male	No	Sat	Dinner	2	0.098204
243	18.78	3.00	Female	No	Thur	Dinner	2	0.159744

2) Alternatively, assign to a new column directly with []; this modifies data in place.

In :data['tip_rate2'] = data["tip"] / data["total_bill"]
In :data
Out :
total_bill	tip	sex	smoker	day	time	size	tip_rate	tip_rate2
0	16.99	1.01	Female	No	Sun	Dinner	2	0.059447	0.059447
1	10.34	1.66	Male	No	Sun	Dinner	3	0.160542	0.160542
2	21.01	3.50	Male	No	Sun	Dinner	3	0.166587	0.166587
3	23.68	3.31	Male	No	Sun	Dinner	2	0.139780	0.139780
4	24.59	3.61	Female	No	Sun	Dinner	4	0.146808	0.146808
...	...	...	...	...	...	...	...	...	...
239	29.03	5.92	Male	No	Sat	Dinner	3	0.203927	0.203927
240	27.18	2.00	Female	Yes	Sat	Dinner	2	0.073584	0.073584
241	22.67	2.00	Male	Yes	Sat	Dinner	2	0.088222	0.088222
242	17.82	1.75	Male	No	Sat	Dinner	2	0.098204	0.098204
243	18.78	3.00	Female	No	Thur	Dinner	2	0.159744	0.159744
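assign() also accepts a callable, which receives the DataFrame being built. This makes it possible to compute a new column inside a method chain without referring to the variable name. A minimal sketch with made-up numbers:

```python
import pandas as pd

# Made-up rows in the shape of the tips data
data = pd.DataFrame({"total_bill": [10.0, 20.0], "tip": [1.0, 5.0]})

# SELECT *, tip / total_bill AS tip_rate FROM data;
# The lambda receives the intermediate DataFrame, so it also works mid-chain
result = data.assign(tip_rate=lambda df: df["tip"] / df["total_bill"])

print(result)
# data itself is unchanged: assign() returned a new DataFrame
print(data.columns.tolist())
```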

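II. JOIN ... ON

Build test data (the values come from np.random.randn, so your numbers will differ from the output below):

In :import numpy as np
In :import pandas as pd
In :df1 = pd.DataFrame({"key": ["A", "B", "C", "D"], "value": np.random.randn(4)})
In :df2 = pd.DataFrame({"key": ["B", "D", "D", "E"], "value": np.random.randn(4)})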
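1. Inner join

SQL:

SELECT *
FROM df1
INNER JOIN df2 ON df1.key = df2.key;

pandas equivalent (merge() performs an inner join by default; non-key columns present in both tables get the suffixes _x and _y):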

In :pd.merge(df1, df2, on="key")
Out :	
key	value_x	value_y
0	B	0.227232	1.011278
1	D	1.415853	-0.149207
2	D	1.415853	-0.608430
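When the join columns have different names, as in ON o.customer_id = c.id, merge() takes left_on and right_on, and suffixes replaces the default _x/_y. A minimal sketch; the table and column names are hypothetical:

```python
import pandas as pd

# Hypothetical tables whose join keys have different names
orders = pd.DataFrame({"customer_id": [1, 2, 3], "value": [10, 20, 30]})
customers = pd.DataFrame({"id": [1, 3, 4], "value": [100, 300, 400]})

# SELECT * FROM orders o INNER JOIN customers c ON o.customer_id = c.id;
joined = pd.merge(
    orders,
    customers,
    left_on="customer_id",
    right_on="id",
    how="inner",
    suffixes=("_order", "_customer"),  # instead of the default _x / _y
)
print(joined)
```

Both key columns are kept in the result, much as SELECT * would return o.customer_id and c.id.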
2. Left outer join

SQL:

SELECT *
FROM df1
LEFT OUTER JOIN df2 ON df1.key = df2.key;

pandas equivalent:

In :pd.merge(df1, df2, on="key", how="left")
Out :	
key	value_x	value_y
0	A	1.418532	NaN
1	B	0.227232	1.011278
2	C	-0.578408	NaN
3	D	1.415853	-0.149207
4	D	1.415853	-0.608430
3. Right outer join

SQL:

SELECT *
FROM df1
RIGHT OUTER JOIN df2 ON df1.key = df2.key;

pandas equivalent:

In :pd.merge(df1, df2, on="key", how="right")
Out :
key	value_x	value_y
0	B	0.227232	1.011278
1	D	1.415853	-0.149207
2	D	1.415853	-0.608430
3	E	NaN	1.437388
4. Full outer join

SQL:

SELECT *
FROM df1
FULL OUTER JOIN df2 ON df1.key = df2.key;

pandas equivalent:

In :pd.merge(df1, df2, on="key", how="outer")
Out :
key	value_x	value_y
0	A	1.418532	NaN
1	B	0.227232	1.011278
2	C	-0.578408	NaN
3	D	1.415853	-0.149207
4	D	1.415853	-0.608430
5	E	NaN	1.437388
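merge(indicator=True) adds a _merge column recording whether each row came from the left table only, the right table only, or both. Filtering on it gives a pandas form of the common LEFT JOIN ... WHERE right.key IS NULL pattern. A sketch using the same keys as df1/df2 above but fixed values:

```python
import pandas as pd

df1 = pd.DataFrame({"key": ["A", "B", "C", "D"], "value": [1, 2, 3, 4]})
df2 = pd.DataFrame({"key": ["B", "D", "D", "E"], "value": [5, 6, 7, 8]})

full = pd.merge(df1, df2, on="key", how="outer", indicator=True)

# Rows of df1 with no match in df2:
# SELECT df1.* FROM df1 LEFT JOIN df2 ON df1.key = df2.key WHERE df2.key IS NULL;
left_only = full[full["_merge"] == "left_only"]

print(full)
print(left_only["key"].tolist())
```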

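III. Filtering (WHERE)

In SQL, filtering is done with the WHERE clause. In pandas, index the DataFrame with a boolean Series; only the rows where it is True are kept.

SQL:

SELECT *
FROM data
WHERE total_bill > 10;

pandas equivalent: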

In :data[data["total_bill"] > 10]
Out :
total_bill	tip	sex	smoker	day	time	size	tip_rate	tip_rate2
0	16.99	1.01	Female	No	Sun	Dinner	2	0.059447	0.059447
1	10.34	1.66	Male	No	Sun	Dinner	3	0.160542	0.160542
2	21.01	3.50	Male	No	Sun	Dinner	3	0.166587	0.166587
3	23.68	3.31	Male	No	Sun	Dinner	2	0.139780	0.139780
4	24.59	3.61	Female	No	Sun	Dinner	4	0.146808	0.146808
...	...	...	...	...	...	...	...	...	...
239	29.03	5.92	Male	No	Sat	Dinner	3	0.203927	0.203927
240	27.18	2.00	Female	Yes	Sat	Dinner	2	0.073584	0.073584
241	22.67	2.00	Male	Yes	Sat	Dinner	2	0.088222	0.088222
242	17.82	1.75	Male	No	Sat	Dinner	2	0.098204	0.098204
243	18.78	3.00	Female	No	Thur	Dinner	2	0.159744	0.159744
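DataFrame.query() accepts a WHERE-like expression string and can refer to local Python variables with @. A minimal sketch on made-up rows, checked against the boolean-mask form:

```python
import pandas as pd

data = pd.DataFrame({
    "total_bill": [8.50, 16.99, 10.00, 24.59],
    "tip": [1.00, 1.01, 2.00, 3.61],
})

# SELECT * FROM data WHERE total_bill > 10;
by_mask = data[data["total_bill"] > 10]
by_query = data.query("total_bill > 10")

# A local variable referenced with @ inside the expression
threshold = 10
by_var = data.query("total_bill > @threshold")

print(by_query)
```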
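1. AND

AND corresponds to & in pandas. Wrap each condition in parentheses, because & binds more tightly than comparison operators such as == and >.

SQL:

-- Dinners with a tip of more than $5
SELECT *
FROM data
WHERE time = 'Dinner' AND tip > 5.00;

pandas equivalent: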

In :data[(data["time"] == "Dinner") & (data["tip"] > 5.00)]
Out :
total_bill	tip	sex	smoker	day	time	size	tip_rate	tip_rate2
23	39.42	7.58	Male	No	Sat	Dinner	4	0.192288	0.192288
44	30.40	5.60	Male	No	Sun	Dinner	4	0.184211	0.184211
47	32.40	6.00	Male	No	Sun	Dinner	4	0.185185	0.185185
52	34.81	5.20	Female	No	Sun	Dinner	4	0.149382	0.149382
59	48.27	6.73	Male	No	Sat	Dinner	4	0.139424	0.139424
116	29.93	5.07	Male	No	Sun	Dinner	4	0.169395	0.169395
155	29.85	5.14	Female	No	Sun	Dinner	5	0.172194	0.172194
170	50.81	10.00	Male	Yes	Sat	Dinner	3	0.196812	0.196812
172	7.25	5.15	Male	Yes	Sun	Dinner	2	0.710345	0.710345
181	23.33	5.65	Male	Yes	Sun	Dinner	2	0.242177	0.242177
183	23.17	6.50	Male	Yes	Sun	Dinner	4	0.280535	0.280535
211	25.89	5.16	Male	Yes	Sat	Dinner	4	0.199305	0.199305
212	48.33	9.00	Male	No	Sat	Dinner	4	0.186220	0.186220
214	28.17	6.50	Female	Yes	Sat	Dinner	3	0.230742	0.230742
239	29.03	5.92	Male	No	Sat	Dinner	3	0.203927	0.203927
2. OR

OR corresponds to | in pandas (again with each condition in parentheses).

SQL:

-- Parties of at least 5 diners, or a total bill over $45
SELECT *
FROM data
WHERE size >= 5 OR total_bill > 45;

pandas equivalent:

In :data[(data["size"] >= 5) | (data["total_bill"] > 45)]
Out :
total_bill	tip	sex	smoker	day	time	size	tip_rate	tip_rate2
59	48.27	6.73	Male	No	Sat	Dinner	4	0.139424	0.139424
125	29.80	4.20	Female	No	Thur	Lunch	6	0.140940	0.140940
141	34.30	6.70	Male	No	Thur	Lunch	6	0.195335	0.195335
142	41.19	5.00	Male	No	Thur	Lunch	5	0.121389	0.121389
143	27.05	5.00	Female	No	Thur	Lunch	6	0.184843	0.184843
155	29.85	5.14	Female	No	Sun	Dinner	5	0.172194	0.172194
156	48.17	5.00	Male	No	Sun	Dinner	6	0.103799	0.103799
170	50.81	10.00	Male	Yes	Sat	Dinner	3	0.196812	0.196812
182	45.35	3.50	Male	Yes	Sun	Dinner	3	0.077178	0.077178
185	20.69	5.00	Male	No	Sun	Dinner	5	0.241663	0.241663
187	30.46	2.00	Male	Yes	Sun	Dinner	5	0.065660	0.065660
212	48.33	9.00	Male	No	Sat	Dinner	4	0.186220	0.186220
216	28.15	3.00	Male	Yes	Sat	Dinner	5	0.106572	0.106572
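To return only some columns, as in SELECT col1, col2 ... WHERE ..., pass the boolean mask and the column list to .loc together. A minimal sketch on made-up rows:

```python
import pandas as pd

data = pd.DataFrame({
    "total_bill": [48.27, 29.80, 10.34],
    "tip": [6.73, 4.20, 1.66],
    "size": [4, 6, 3],
})

# SELECT total_bill, tip FROM data WHERE size >= 5 OR total_bill > 45;
mask = (data["size"] >= 5) | (data["total_bill"] > 45)
subset = data.loc[mask, ["total_bill", "tip"]]
print(subset)
```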
3. IS NULL

Build test data:

In :frame = pd.DataFrame({"col1": ["A", "B", np.nan, "C", "D"], "col2": ["F", np.nan, "G", "H", "I"]})

SQL:

SELECT *
FROM frame
WHERE col2 IS NULL;

pandas equivalent (isna() is True for missing values such as NaN and None):

In :frame[frame["col2"].isna()]
Out :
col1	col2
1	B	NaN
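Just as WHERE col2 = NULL never matches in SQL, comparing with NaN does not find missing values in pandas, because NaN is not equal to itself; isna() is the reliable test. A short check on the frame built above:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({"col1": ["A", "B", np.nan, "C", "D"],
                      "col2": ["F", np.nan, "G", "H", "I"]})

# Comparing with NaN is always False, so this selects nothing
wrong = frame[frame["col2"] == np.nan]

# isna() is the correct way to find missing values
right = frame[frame["col2"].isna()]

print(len(wrong), len(right))
```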
4. IS NOT NULL

SQL:

SELECT *
FROM frame
WHERE col1 IS NOT NULL;

pandas equivalent (notna() is the inverse of isna()):

In :frame[frame["col1"].notna()]
Out :
col1	col2
0	A	F
1	B	NaN
3	C	H
4	D	I
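dropna(subset=[...]) is another way to express IS NOT NULL on one or more columns; with several columns it drops rows missing a value in any of them (how="any" is the default). A minimal sketch:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({"col1": ["A", "B", np.nan, "C", "D"],
                      "col2": ["F", np.nan, "G", "H", "I"]})

# SELECT * FROM frame WHERE col1 IS NOT NULL;
not_null = frame.dropna(subset=["col1"])

# SELECT * FROM frame WHERE col1 IS NOT NULL AND col2 IS NOT NULL;
both_not_null = frame.dropna(subset=["col1", "col2"])

print(not_null.index.tolist(), both_not_null.index.tolist())
```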
5. BETWEEN

SQL:

SELECT *
FROM data
WHERE tip BETWEEN 5 AND 7;

pandas equivalent (like SQL BETWEEN, Series.between() includes both endpoints by default):

In :data[data['tip'].between(5, 7)]
Out :
total_bill	tip	sex	smoker	day	time	size	tip_rate	tip_rate2
11	35.26	5.00	Female	No	Sun	Dinner	4	0.141804	0.141804
39	31.27	5.00	Male	No	Sat	Dinner	3	0.159898	0.159898
44	30.40	5.60	Male	No	Sun	Dinner	4	0.184211	0.184211
46	22.23	5.00	Male	No	Sun	Dinner	2	0.224921	0.224921
47	32.40	6.00	Male	No	Sun	Dinner	4	0.185185	0.185185
52	34.81	5.20	Female	No	Sun	Dinner	4	0.149382	0.149382
59	48.27	6.73	Male	No	Sat	Dinner	4	0.139424	0.139424
73	25.28	5.00	Female	Yes	Sat	Dinner	2	0.197785	0.197785
83	32.68	5.00	Male	Yes	Thur	Lunch	2	0.152999	0.152999
85	34.83	5.17	Female	No	Thur	Lunch	4	0.148435	0.148435
88	24.71	5.85	Male	No	Thur	Lunch	2	0.236746	0.236746
116	29.93	5.07	Male	No	Sun	Dinner	4	0.169395	0.169395
141	34.30	6.70	Male	No	Thur	Lunch	6	0.195335	0.195335
142	41.19	5.00	Male	No	Thur	Lunch	5	0.121389	0.121389
143	27.05	5.00	Female	No	Thur	Lunch	6	0.184843	0.184843
155	29.85	5.14	Female	No	Sun	Dinner	5	0.172194	0.172194
156	48.17	5.00	Male	No	Sun	Dinner	6	0.103799	0.103799
172	7.25	5.15	Male	Yes	Sun	Dinner	2	0.710345	0.710345
181	23.33	5.65	Male	Yes	Sun	Dinner	2	0.242177	0.242177
183	23.17	6.50	Male	Yes	Sun	Dinner	4	0.280535	0.280535
185	20.69	5.00	Male	No	Sun	Dinner	5	0.241663	0.241663
197	43.11	5.00	Female	Yes	Thur	Lunch	4	0.115982	0.115982
211	25.89	5.16	Male	Yes	Sat	Dinner	4	0.199305	0.199305
214	28.17	6.50	Female	Yes	Sat	Dinner	3	0.230742	0.230742
239	29.03	5.92	Male	No	Sat	Dinner	3	0.203927	0.203927
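Series.between() also takes an inclusive argument ("both", "neither", "left", or "right"; the string form needs pandas 1.3 or later), which covers half-open ranges such as tip >= 5 AND tip < 7. A minimal sketch on made-up values:

```python
import pandas as pd

tips = pd.Series([4.99, 5.00, 6.50, 7.00, 7.01])

# WHERE tip BETWEEN 5 AND 7  (both endpoints included)
closed = tips[tips.between(5, 7)]

# WHERE tip >= 5 AND tip < 7
half_open = tips[tips.between(5, 7, inclusive="left")]

print(closed.tolist(), half_open.tolist())
```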
6. LIKE

Prefix and suffix matches (LIKE 'abc%' / LIKE '%abc') can be done with str.startswith() / str.endswith().

SQL:

SELECT *
FROM data
WHERE time LIKE 'Di%';

pandas equivalent:

In :data[data['time'].str.startswith('Di')]
Out :
total_bill	tip	sex	smoker	day	time	size	tip_rate	tip_rate2
0	16.99	1.01	Female	No	Sun	Dinner	2	0.059447	0.059447
1	10.34	1.66	Male	No	Sun	Dinner	3	0.160542	0.160542
2	21.01	3.50	Male	No	Sun	Dinner	3	0.166587	0.166587
3	23.68	3.31	Male	No	Sun	Dinner	2	0.139780	0.139780
4	24.59	3.61	Female	No	Sun	Dinner	4	0.146808	0.146808
...	...	...	...	...	...	...	...	...	...
239	29.03	5.92	Male	No	Sat	Dinner	3	0.203927	0.203927
240	27.18	2.00	Female	Yes	Sat	Dinner	2	0.073584	0.073584
241	22.67	2.00	Male	Yes	Sat	Dinner	2	0.088222	0.088222
242	17.82	1.75	Male	No	Sat	Dinner	2	0.098204	0.098204
243	18.78	3.00	Female	No	Thur	Dinner	2	0.159744	0.159744

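Substring matches (LIKE '%abc%') can use str.contains(). na=False makes missing values count as non-matches, and case=False makes the match case-insensitive. Note that str.contains() treats the pattern as a regular expression by default; pass regex=False to match it literally.

SQL:

SELECT *
FROM data
WHERE time LIKE '%inne%';

pandas equivalent: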

In :data[data['time'].str.contains('inne', na=False, case=False)]
Out :
total_bill	tip	sex	smoker	day	time	size	tip_rate	tip_rate2
0	16.99	1.01	Female	No	Sun	Dinner	2	0.059447	0.059447
1	10.34	1.66	Male	No	Sun	Dinner	3	0.160542	0.160542
2	21.01	3.50	Male	No	Sun	Dinner	3	0.166587	0.166587
3	23.68	3.31	Male	No	Sun	Dinner	2	0.139780	0.139780
4	24.59	3.61	Female	No	Sun	Dinner	4	0.146808	0.146808
...	...	...	...	...	...	...	...	...	...
239	29.03	5.92	Male	No	Sat	Dinner	3	0.203927	0.203927
240	27.18	2.00	Female	Yes	Sat	Dinner	2	0.073584	0.073584
241	22.67	2.00	Male	Yes	Sat	Dinner	2	0.088222	0.088222
242	17.82	1.75	Male	No	Sat	Dinner	2	0.098204	0.098204
243	18.78	3.00	Female	No	Thur	Dinner	2	0.159744	0.159744
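A LIKE pattern can also be translated mechanically into a regular expression: % becomes .*, _ becomes ., and every other character is escaped. like_to_regex below is a hypothetical helper written for this sketch, not a pandas function, and it ignores LIKE's ESCAPE clause:

```python
import re

import pandas as pd


def like_to_regex(pattern: str) -> str:
    """Hypothetical helper: convert a SQL LIKE pattern to an anchored regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # any run of characters, including none
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return "^" + "".join(parts) + "$"


times = pd.Series(["Dinner", "Lunch", None, "Diner"])

for pat in ["Di%", "%inne%", "D_ner"]:
    # Case-sensitive here; na=False treats the missing value as a non-match
    mask = times.str.contains(like_to_regex(pat), regex=True, na=False)
    print(pat, times[mask].tolist())
```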
7. CASE WHEN

SQL:

SELECT tip,
       CASE WHEN tip < 2 THEN 'LOW'
            WHEN tip BETWEEN 2 AND 3 THEN 'MID'
            WHEN tip > 3 THEN 'HIG'
       END AS flag
FROM data;

pandas equivalent:

In :data['flag'] = data['tip'].apply(lambda x: 'LOW' if x < 2 else ('MID' if 2 <= x <= 3 else 'HIG'))
In :data[['tip', 'flag']]
Out :
tip	flag
0	1.01	LOW
1	1.66	LOW
2	3.50	HIG
3	3.31	HIG
4	3.61	HIG
...	...	...
239	5.92	HIG
240	2.00	MID
241	2.00	MID
242	1.75	LOW
243	3.00	MID
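apply() with a lambda runs Python code once per value. A vectorized alternative is numpy.select(), which takes a list of conditions and a list of choices and, like CASE WHEN, uses the first condition that matches, falling back to default. A minimal sketch on made-up tips:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({"tip": [1.01, 2.00, 3.00, 3.50]})

# CASE WHEN tip < 2 THEN 'LOW'
#      WHEN tip BETWEEN 2 AND 3 THEN 'MID'
#      ELSE 'HIG' END
conditions = [
    data["tip"] < 2,
    data["tip"].between(2, 3),
]
choices = ["LOW", "MID"]
data["flag"] = np.select(conditions, choices, default="HIG")
print(data)
```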

IV. GROUP BY

In pandas, SQL's GROUP BY maps to the similarly named groupby() method, followed by an aggregation.

1. count()

SQL:

SELECT sex, COUNT(*)
FROM data
GROUP BY sex;

pandas equivalent (size() counts the rows in each group, like COUNT(*)):

In :data.groupby("sex").size()
Out :
sex
Female     87
Male      157
dtype: int64
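size() counts every row, matching COUNT(*), while count() counts the non-null values of a column, matching COUNT(column). They differ only when the column has missing values. A minimal sketch on made-up rows:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    "sex": ["Female", "Male", "Male", "Female"],
    "tip": [1.0, np.nan, 2.0, 3.0],
})

# SELECT sex, COUNT(*) FROM data GROUP BY sex;
rows_per_group = data.groupby("sex").size()

# SELECT sex, COUNT(tip) FROM data GROUP BY sex;  -- NULL tips are not counted
tips_per_group = data.groupby("sex")["tip"].count()

print(rows_per_group.to_dict(), tips_per_group.to_dict())
```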
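2. avg()

SQL:

SELECT day, AVG(tip), COUNT(*)
FROM data
GROUP BY day;

pandas equivalent (agg() takes a dict mapping each column to an aggregation; "size" on day gives the row count per group):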

In :data.groupby("day").agg({"tip": "mean", "day": "size"})
Out :
tip	day
day		
Fri	2.734737	19
Sat	2.993103	87
Sun	3.255132	76
Thur	2.771452	62
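Named aggregation (pandas 0.25 and later) sets the output column names directly, much like AS aliases in SQL, and avoids two-level column headers. A minimal sketch on made-up rows:

```python
import pandas as pd

data = pd.DataFrame({
    "day": ["Fri", "Sat", "Sat", "Sun"],
    "tip": [2.0, 3.0, 5.0, 4.0],
})

# SELECT day, AVG(tip) AS avg_tip, COUNT(*) AS n FROM data GROUP BY day;
result = (
    data.groupby("day")
    .agg(avg_tip=("tip", "mean"), n=("tip", "size"))
    .reset_index()
)
print(result)
```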
3. sum(), max(), min()

SQL:

SELECT day, AVG(tip), SUM(tip), MAX(tip), MIN(tip), COUNT(*)
FROM data
GROUP BY day;

pandas equivalent (a list of functions per column produces two-level column headers):

In :data.groupby("day").agg({"tip": ["mean", "sum", "max", "min"], "day": "size"}).reset_index()
Out :
day	tip	day
mean	sum	max	min	size
0	Fri	2.734737	51.96	4.73	1.00	19
1	Sat	2.993103	260.40	10.00	1.00	87
2	Sun	3.255132	247.39	6.50	1.01	76
3	Thur	2.771452	171.83	6.70	1.25	62

V. HAVING

pandas has no separate HAVING step: aggregate first, then filter the aggregated DataFrame like any other.

SQL:

SELECT day, AVG(tip), SUM(tip), MAX(tip), MIN(tip), COUNT(*)
FROM data
GROUP BY day
HAVING SUM(tip) > 200;

pandas equivalent:

In :result = data.groupby("day").agg({"tip": ["mean", "sum", "max", "min"], "day": "size"}).reset_index()
In :result.columns = ['day', 'avg_tip', 'sum_tip', 'max_tip', 'min_tip', 'count_tips']
In :result[result['sum_tip'] > 200].reset_index(drop=True)
Out :
day	avg_tip	sum_tip	max_tip	min_tip	count_tips
0	Sat	2.993103	260.40	10.0	1.00	87
1	Sun	3.255132	247.39	6.5	1.01	76
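A more compact sketch of the same HAVING pattern, assuming pandas 0.25 or later for named aggregation: choose the output names while aggregating, then filter with query(). The rows are made up so the sums are easy to check (Fri 2, Sat 210, Sun 220):

```python
import pandas as pd

data = pd.DataFrame({
    "day": ["Fri", "Sat", "Sat", "Sun", "Sun"],
    "tip": [2.0, 150.0, 60.0, 100.0, 120.0],
})

# SELECT day, SUM(tip) AS sum_tip, COUNT(*) AS count_tips
# FROM data GROUP BY day HAVING SUM(tip) > 200;
result = (
    data.groupby("day")
    .agg(sum_tip=("tip", "sum"), count_tips=("tip", "size"))
    .reset_index()
    .query("sum_tip > 200")
    .reset_index(drop=True)
)
print(result)
```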

VI. ORDER BY

SQL:

SELECT *
FROM data
ORDER BY tip;

pandas equivalent (sort_values() sorts in ascending order by default, like ASC):

In :data.sort_values("tip")
Out :
total_bill	tip	sex	smoker	day	time	size	tip_rate	tip_rate2
67	3.07	1.00	Female	Yes	Sat	Dinner	1	0.325733	0.325733
236	12.60	1.00	Male	Yes	Sat	Dinner	2	0.079365	0.079365
92	5.75	1.00	Female	Yes	Fri	Dinner	2	0.173913	0.173913
111	7.25	1.00	Female	No	Sat	Dinner	1	0.137931	0.137931
0	16.99	1.01	Female	No	Sun	Dinner	2	0.059447	0.059447
...	...	...	...	...	...	...	...	...	...
141	34.30	6.70	Male	No	Thur	Lunch	6	0.195335	0.195335
59	48.27	6.73	Male	No	Sat	Dinner	4	0.139424	0.139424
23	39.42	7.58	Male	No	Sat	Dinner	4	0.192288	0.192288
212	48.33	9.00	Male	No	Sat	Dinner	4	0.186220	0.186220
170	50.81	10.00	Male	Yes	Sat	Dinner	3	0.196812	0.196812
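Several rows above share tip = 1.00, and, as in SQL without a tiebreaker, their relative order is not guaranteed: sort_values() defaults to quicksort, which is not stable. Passing kind="stable" keeps tied rows in their original order. A minimal sketch on made-up rows:

```python
import pandas as pd

data = pd.DataFrame({
    "total_bill": [3.07, 12.60, 5.75, 7.25, 16.99],
    "tip": [1.00, 1.00, 1.00, 1.00, 0.50],
})

# SELECT * FROM data ORDER BY tip;  tied rows keep their original order
ordered = data.sort_values("tip", kind="stable")
print(ordered.index.tolist())
```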

Sorting by several columns:

SQL:

SELECT *
FROM data
ORDER BY tip, total_bill;

pandas equivalent:

In :data.sort_values(["tip","total_bill"])
Out :
total_bill	tip	sex	smoker	day	time	size	tip_rate	tip_rate2
67	3.07	1.00	Female	Yes	Sat	Dinner	1	0.325733	0.325733
92	5.75	1.00	Female	Yes	Fri	Dinner	2	0.173913	0.173913
111	7.25	1.00	Female	No	Sat	Dinner	1	0.137931	0.137931
236	12.60	1.00	Male	Yes	Sat	Dinner	2	0.079365	0.079365
0	16.99	1.01	Female	No	Sun	Dinner	2	0.059447	0.059447
...	...	...	...	...	...	...	...	...	...
141	34.30	6.70	Male	No	Thur	Lunch	6	0.195335	0.195335
59	48.27	6.73	Male	No	Sat	Dinner	4	0.139424	0.139424
23	39.42	7.58	Male	No	Sat	Dinner	4	0.192288	0.192288
212	48.33	9.00	Male	No	Sat	Dinner	4	0.186220	0.186220
170	50.81	10.00	Male	Yes	Sat	Dinner	3	0.196812	0.196812

Mixed directions: pass a list to ascending, one entry per sort column.

SQL:

SELECT *
FROM data
ORDER BY tip ASC, total_bill DESC;

pandas equivalent:

In :data.sort_values(by=["tip", "total_bill"], ascending=[True, False])
Out :
total_bill	tip	sex	smoker	day	time	size	tip_rate	tip_rate2
236	12.60	1.00	Male	Yes	Sat	Dinner	2	0.079365	0.079365
111	7.25	1.00	Female	No	Sat	Dinner	1	0.137931	0.137931
92	5.75	1.00	Female	Yes	Fri	Dinner	2	0.173913	0.173913
67	3.07	1.00	Female	Yes	Sat	Dinner	1	0.325733	0.325733
0	16.99	1.01	Female	No	Sun	Dinner	2	0.059447	0.059447
...	...	...	...	...	...	...	...	...	...
141	34.30	6.70	Male	No	Thur	Lunch	6	0.195335	0.195335
59	48.27	6.73	Male	No	Sat	Dinner	4	0.139424	0.139424
23	39.42	7.58	Male	No	Sat	Dinner	4	0.192288	0.192288
212	48.33	9.00	Male	No	Sat	Dinner	4	0.186220	0.186220
170	50.81	10.00	Male	Yes	Sat	Dinner	3	0.196812	0.196812

VII. LIMIT/OFFSET

1. LIMIT

In pandas, use head().

SQL:

SELECT *
FROM data
LIMIT 10;

pandas equivalent:

In :data.head(10)
Out :
total_bill	tip	sex	smoker	day	time	size	tip_rate	tip_rate2
0	16.99	1.01	Female	No	Sun	Dinner	2	0.059447	0.059447
1	10.34	1.66	Male	No	Sun	Dinner	3	0.160542	0.160542
2	21.01	3.50	Male	No	Sun	Dinner	3	0.166587	0.166587
3	23.68	3.31	Male	No	Sun	Dinner	2	0.139780	0.139780
4	24.59	3.61	Female	No	Sun	Dinner	4	0.146808	0.146808
5	25.29	4.71	Male	No	Sun	Dinner	4	0.186240	0.186240
6	8.77	2.00	Male	No	Sun	Dinner	2	0.228050	0.228050
7	26.88	3.12	Male	No	Sun	Dinner	4	0.116071	0.116071
8	15.04	1.96	Male	No	Sun	Dinner	2	0.130319	0.130319
9	14.78	3.23	Male	No	Sun	Dinner	2	0.218539	0.218539
2. Top N rows by a column

SQL:

SELECT *
FROM data
ORDER BY tip DESC
LIMIT 10;

pandas equivalent:

In :data.nlargest(10, columns="tip")
or
In :data.sort_values(by="tip", ascending=False).head(10)
Out :
total_bill	tip	sex	smoker	day	time	size	tip_rate	tip_rate2
170	50.81	10.00	Male	Yes	Sat	Dinner	3	0.196812	0.196812
212	48.33	9.00	Male	No	Sat	Dinner	4	0.186220	0.186220
23	39.42	7.58	Male	No	Sat	Dinner	4	0.192288	0.192288
59	48.27	6.73	Male	No	Sat	Dinner	4	0.139424	0.139424
141	34.30	6.70	Male	No	Thur	Lunch	6	0.195335	0.195335
183	23.17	6.50	Male	Yes	Sun	Dinner	4	0.280535	0.280535
214	28.17	6.50	Female	Yes	Sat	Dinner	3	0.230742	0.230742
47	32.40	6.00	Male	No	Sun	Dinner	4	0.185185	0.185185
239	29.03	5.92	Male	No	Sat	Dinner	3	0.203927	0.203927
88	24.71	5.85	Male	No	Thur	Lunch	2	0.236746	0.236746
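nlargest() has a keep parameter for ties at the cut-off: "first" (the default) and "last" return exactly n rows, while "all" keeps every row tied with the last selected value, so the result can be longer than n. A minimal sketch on made-up tips:

```python
import pandas as pd

data = pd.DataFrame({"tip": [10.00, 9.00, 6.50, 6.50, 5.00]})

top2 = data.nlargest(2, columns="tip")                   # exactly 2 rows
top3_ties = data.nlargest(3, columns="tip", keep="all")  # 6.50 tie gives 4 rows

print(top2.index.tolist(), top3_ties.index.tolist())
```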
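3. OFFSET

Skip the first 5 rows after sorting and return the next 10:

SQL:

SELECT *
FROM data
ORDER BY tip DESC
LIMIT 10 OFFSET 5;

pandas equivalent (iloc slices by position; [5:15] covers positions 5 through 14):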

In :data.sort_values(by="tip", ascending=False).iloc[5:15]
Out :	
total_bill	tip	sex	smoker	day	time	size	tip_rate	tip_rate2
214	28.17	6.50	Female	Yes	Sat	Dinner	3	0.230742	0.230742
183	23.17	6.50	Male	Yes	Sun	Dinner	4	0.280535	0.280535
47	32.40	6.00	Male	No	Sun	Dinner	4	0.185185	0.185185
239	29.03	5.92	Male	No	Sat	Dinner	3	0.203927	0.203927
88	24.71	5.85	Male	No	Thur	Lunch	2	0.236746	0.236746
181	23.33	5.65	Male	Yes	Sun	Dinner	2	0.242177	0.242177
44	30.40	5.60	Male	No	Sun	Dinner	4	0.184211	0.184211
52	34.81	5.20	Female	No	Sun	Dinner	4	0.149382	0.149382
85	34.83	5.17	Female	No	Thur	Lunch	4	0.148435	0.148435
211	25.89	5.16	Male	Yes	Sat	Dinner	4	0.199305	0.199305
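Put together, LIMIT n OFFSET m is positional slicing with iloc[m:m + n]; slicing past the end simply returns fewer rows, as SQL does. limit_offset is a hypothetical helper name used only for this sketch:

```python
import pandas as pd


def limit_offset(df: pd.DataFrame, limit: int, offset: int = 0) -> pd.DataFrame:
    """Hypothetical helper: emulate LIMIT limit OFFSET offset by row position."""
    return df.iloc[offset:offset + limit]


data = pd.DataFrame({"tip": [5.0, 1.0, 4.0, 3.0, 2.0]})

# SELECT * FROM data ORDER BY tip DESC LIMIT 2 OFFSET 1;
page = limit_offset(data.sort_values("tip", ascending=False), limit=2, offset=1)
print(page)
```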

VIII. UNION ALL/UNION

In pandas, both are done with pd.concat().

Build test data:

In :df1 = pd.DataFrame({"city": ["Chicago", "San Francisco", "New York City"], "rank": range(1, 4)})
In :df2 = pd.DataFrame({"city": ["Chicago", "Boston", "Los Angeles"], "rank": [1, 4, 5]})
1. UNION ALL

SQL:

SELECT city, rank
FROM df1
UNION ALL
SELECT city, rank
FROM df2;

pandas equivalent:

In :pd.concat([df1, df2])
Out :
city	rank
0	Chicago	1
1	San Francisco	2
2	New York City	3
0	Chicago	1
1	Boston	4
2	Los Angeles	5
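pd.concat() keeps each input's original index, so the labels 0, 1, 2 appear twice above. ignore_index=True renumbers the result 0 to n-1, which is closer to a SQL result set. A minimal sketch with the same test data:

```python
import pandas as pd

df1 = pd.DataFrame({"city": ["Chicago", "San Francisco", "New York City"], "rank": range(1, 4)})
df2 = pd.DataFrame({"city": ["Chicago", "Boston", "Los Angeles"], "rank": [1, 4, 5]})

# UNION ALL with a fresh 0..n-1 index
union_all = pd.concat([df1, df2], ignore_index=True)
print(union_all)
```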
2. UNION

SQL:

SELECT city, rank
FROM df1
UNION
SELECT city, rank
FROM df2;

pandas equivalent (drop_duplicates() removes duplicate rows, keeping the first occurrence, as UNION does):

In :pd.concat([df1, df2]).drop_duplicates()
Out :
city	rank
0	Chicago	1
1	San Francisco	2
2	New York City	3
1	Boston	4
2	Los Angeles	5

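IX. Window functions

1. ROW_NUMBER()

Assigns a unique sequential number to each row within its partition: 1, 2, 3, 4, 5, …

SQL:

-- Top two rows by total_bill for each day
SELECT * FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY day ORDER BY total_bill DESC) AS rn
    FROM data t
) ranked
WHERE rn < 3
ORDER BY day, rn;

pandas equivalent (sort, number the rows within each day with cumcount(), which starts at 0, then filter):

In :(data.assign(rn=data.sort_values(["total_bill"], ascending=False).groupby(["day"]).cumcount() + 1).query("rn < 3").sort_values(["day", "rn"]))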
Out :
total_bill	tip	sex	smoker	day	time	size	tip_rate	tip_rate2	rn
95	40.17	4.73	Male	Yes	Fri	Dinner	4	0.117750	0.117750	1
90	28.97	3.00	Male	Yes	Fri	Dinner	2	0.103555	0.103555	2
170	50.81	10.00	Male	Yes	Sat	Dinner	3	0.196812	0.196812	1
212	48.33	9.00	Male	No	Sat	Dinner	4	0.186220	0.186220	2
156	48.17	5.00	Male	No	Sun	Dinner	6	0.103799	0.103799	1
182	45.35	3.50	Male	Yes	Sun	Dinner	3	0.077178	0.077178	2
197	43.11	5.00	Female	Yes	Thur	Lunch	4	0.115982	0.115982	1
142	41.19	5.00	Male	No	Thur	Lunch	5	0.121389	0.121389	2
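When the rn column itself is not needed, a shorter route to the top two rows per day is to sort first and then take groupby().head(2), which keeps the first n rows of each group in the current order. A minimal sketch on made-up rows:

```python
import pandas as pd

data = pd.DataFrame({
    "day": ["Fri", "Fri", "Fri", "Sat", "Sat"],
    "total_bill": [28.97, 40.17, 5.75, 50.81, 48.33],
})

top2 = (
    data.sort_values("total_bill", ascending=False)
    .groupby("day")
    .head(2)                      # first two rows of each day after sorting
    .sort_values(["day", "total_bill"], ascending=[True, False])
)
print(top2)
```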
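2. RANK()

Assigns a rank to each row within its partition; rows with equal values get the same rank and the following ranks are skipped: 1, 2, 2, 4, 5, 5, 5, 8, …

SQL:

-- Top two rows by total_bill for each day (ties can return more than two rows)
SELECT * FROM (
    SELECT t.*,
           RANK() OVER (PARTITION BY day ORDER BY total_bill DESC) AS rn
    FROM data t
) ranked
WHERE rn < 3
ORDER BY day, rn;

pandas equivalent (method="min" gives tied rows the same, lowest rank, which is what RANK() does; method="first" would number them like ROW_NUMBER()):

In :(data.assign(rnk=data.groupby(["day"])["total_bill"].rank(method="min", ascending=False)).query("rnk < 3").sort_values(["day", "rnk"]))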
Out :
total_bill	tip	sex	smoker	day	time	size	tip_rate	tip_rate2	rnk
95	40.17	4.73	Male	Yes	Fri	Dinner	4	0.117750	0.117750	1.0
90	28.97	3.00	Male	Yes	Fri	Dinner	2	0.103555	0.103555	2.0
170	50.81	10.00	Male	Yes	Sat	Dinner	3	0.196812	0.196812	1.0
212	48.33	9.00	Male	No	Sat	Dinner	4	0.186220	0.186220	2.0
156	48.17	5.00	Male	No	Sun	Dinner	6	0.103799	0.103799	1.0
182	45.35	3.50	Male	Yes	Sun	Dinner	3	0.077178	0.077178	2.0
197	43.11	5.00	Female	Yes	Thur	Lunch	4	0.115982	0.115982	1.0
142	41.19	5.00	Male	No	Thur	Lunch	5	0.121389	0.121389	2.0
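The method argument of rank() decides how ties are numbered: "first" numbers tied rows in order of appearance (ROW_NUMBER), "min" gives them the lowest rank and skips the next ones (RANK), and "dense" gives them the same rank without gaps (DENSE_RANK). A minimal sketch on made-up bills with one tie:

```python
import pandas as pd

bills = pd.Series([50.0, 48.0, 48.0, 40.0])

ranks = pd.DataFrame({
    "total_bill": bills,
    "row_number": bills.rank(method="first", ascending=False),
    "rank": bills.rank(method="min", ascending=False),
    "dense_rank": bills.rank(method="dense", ascending=False),
})
print(ranks)
```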
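3. SUM()

SQL (a running total per day; id is a hypothetical column holding the original row order, because pandas cumsum() accumulates in the DataFrame's current row order):

SELECT t.*,
       SUM(total_bill) OVER (PARTITION BY day ORDER BY id
                             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS sn
FROM data t;

pandas equivalent:

In :data['sn'] = data.groupby('day')['total_bill'].cumsum()
In :data
Out :
total_bill	tip	sex	smoker	day	time	size	tip_rate	tip_rate2	sn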
0	16.99	1.01	Female	No	Sun	Dinner	2	0.059447	0.059447	16.99
1	10.34	1.66	Male	No	Sun	Dinner	3	0.160542	0.160542	27.33
2	21.01	3.50	Male	No	Sun	Dinner	3	0.166587	0.166587	48.34
3	23.68	3.31	Male	No	Sun	Dinner	2	0.139780	0.139780	72.02
4	24.59	3.61	Female	No	Sun	Dinner	4	0.146808	0.146808	96.61
...	...	...	...	...	...	...	...	...	...	...
239	29.03	5.92	Male	No	Sat	Dinner	3	0.203927	0.203927	1710.73
240	27.18	2.00	Female	Yes	Sat	Dinner	2	0.073584	0.073584	1737.91
241	22.67	2.00	Male	Yes	Sat	Dinner	2	0.088222	0.088222	1760.58
242	17.82	1.75	Male	No	Sat	Dinner	2	0.098204	0.098204	1778.40
243	18.78	3.00	Female	No	Thur	Dinner	2	0.159744	0.159744	1096.33
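groupby().cumsum() produces a running total. If the intended SQL is SUM(total_bill) OVER (PARTITION BY day) with no ORDER BY, which repeats each partition's total on every row, the pandas counterpart is transform("sum"). A minimal sketch contrasting the two on made-up rows:

```python
import pandas as pd

data = pd.DataFrame({
    "day": ["Sun", "Sun", "Sat", "Sun", "Sat"],
    "total_bill": [10.0, 20.0, 5.0, 30.0, 15.0],
})

# Running total per day, in the current row order
data["running_sum"] = data.groupby("day")["total_bill"].cumsum()

# SUM(total_bill) OVER (PARTITION BY day): the whole day's total on every row
data["day_total"] = data.groupby("day")["total_bill"].transform("sum")

print(data)
```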

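This article was submitted by an Internet user; its views are the author's own and do not represent this site. The site only provides information storage, does not own the content, and bears no related legal liability. If you repost it, please credit the source: http://www.mzph.cn/web/65814.shtml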
