Common Python Libraries for AI Data Analysis, Part 3: pandas

Table of Contents

  • I. Creating Objects
    • 1. Series objects
      • (1) From a list
      • (2) From a 1-D numpy array
      • (3) From a dict
      • (4) From a scalar
    • 2. DataFrame objects
      • (1) From a Series object
      • (2) From a dict of Series objects
      • (3) From a list of dicts
      • (4) From a 2-D numpy array
  • II. DataFrame Properties
    • 1. Attributes
      • (1) values: the data as a numpy array
      • (2) index: the row index
      • (3) columns: the column index
      • (4) shape: the shape
      • (5) size: the number of elements
      • (6) dtypes: the dtype of each column
    • 2. Indexing
      • (1) Selecting columns
      • (2) Selecting rows
      • (3) Selecting scalars
      • (4) Indexing a Series
    • 3. Slicing
      • (1) Row slices
      • (2) Column slices
      • (3) Mixed selection
    • 4. Boolean indexing
    • 5. Assignment
  • III. Numerical Operations and Statistics
    • 1. Inspecting data
      • (1) Viewing the first rows
      • (2) Viewing the last rows
      • (3) Viewing summary information
    • 2. numpy ufuncs work on pandas objects
      • (1) Vectorized operations
      • (2) Matrix operations
      • (3) Broadcasting
    • 3. pandas-specific behavior
      • (1) Index alignment
      • (2) Statistics
  • IV. Handling Missing Values
    • 1. Detecting missing values
    • 2. Dropping missing values
      • (1) Dropping rows
      • (2) Dropping columns
    • 3. Filling missing values
  • V. Combining Data
    • 1. Vertical concatenation
    • 2. Horizontal concatenation
    • 3. Overlapping indexes
    • 4. Aligned merging with merge()
    • 5. Example: merging city information
  • VI. Grouping and Pivot Tables
    • 1. Grouping
      • (1) Lazy evaluation
      • (2) Selecting columns
      • (3) Iterating over groups
      • (4) Calling methods
      • (5) More complex operations
      • (6) Filtering
      • (7) Transforming
      • (8) The apply() method
      • (9) Using a list or array as group keys
      • (10) Mapping the index to groups with a dict
      • (11) Any Python function
      • (12) A list of valid keys
      • (13) Example: processing planet observation data
    • 2. Pivot tables
  • VII. Multi-level Indexes: for Multidimensional Data
  • VIII. High-Performance pandas
    • 1. Using eval() and query()
    • 2. When to use eval() and query()

I. Creating Objects

1. Series objects

A Series is a one-dimensional array of labeled data.

(1) From a list

pd.Series(data, index=index, dtype=dtype)
data: the data; may be a list, dict, or numpy array
index: the index (optional)
dtype: the data type (optional)

① Basic example

import pandas as pd
# index omitted; defaults to an integer sequence
data = pd.Series([2,4,3,6])
print(data)
'''
0    2
1    4
2    3
3    6
dtype: int64
'''

② Specifying an index

import pandas as pd
data = pd.Series([2,4,3,6], index=["a", "b", "c", "d"])
print(data)
'''
a    2
b    4
c    3
d    6
dtype: int64
'''

③ Specifying a dtype

import pandas as pd
data = pd.Series([2,4,3,6], index=["a", "b", "c", "d"], dtype=float)
print(data)
'''
a    2.0
b    4.0
c    3.0
d    6.0
dtype: float64
'''
print(data["c"])    # 3.0

④ Values are coerced to the given dtype

import pandas as pd
data = pd.Series([2,4,"3",6], index=["a", "b", "c", "d"], dtype=float)
print(data)
'''
a    2.0
b    4.0
c    3.0
d    6.0
dtype: float64
'''
print(data["c"])    # 3.0

(2) From a 1-D numpy array

import pandas as pd
import numpy as np

x = np.arange(5)
data = pd.Series(x)
print(data)
'''
0    0
1    1
2    2
3    3
4    4
dtype: int32
'''

(3) From a dict

By default, the dict keys become the index and the dict values the data.

import pandas as pd
dic = {"x":1,"y":10}
data = pd.Series(dic)
print(data)
'''
x     1
y    10
dtype: int64
'''

When an explicit index is given, pandas looks each label up among the dict keys; labels that are not found get NaN.

import pandas as pd
dic = {"x":1,"y":10}
data = pd.Series(dic, index=["x","z"])
print(data)
'''
x    1.0
z    NaN
dtype: float64
'''

(4) From a scalar

x    5
z    5
dtype: int64
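The code for this case did not survive in the original; a minimal sketch that would reproduce the output above (the index labels are taken from that output) broadcasts the scalar over the given index:

```python
import pandas as pd

# A scalar data value is repeated for every label in the index
data = pd.Series(5, index=["x", "z"])
print(data)
'''
x    5
z    5
dtype: int64
'''
```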

2. DataFrame objects

A DataFrame is a two-dimensional array of labeled data.
pd.DataFrame(data, index=index, columns=columns)
data: the data; may be a list, dict, or numpy array
index: the row index (optional)
columns: the column labels (optional)

(1) From a Series object

import pandas as pd
dic = {"beijing":110, "shanghai":370}
popu = pd.Series(dic)
dpopu = pd.DataFrame(popu)
print(dpopu)
'''0
beijing   110
shanghai  370
'''
import pandas as pd
dic = {"beijing":110, "shanghai":370}
popu = pd.Series(dic)
dpopu = pd.DataFrame(popu,columns=["icode"])
print(dpopu)
'''icode
beijing     110
shanghai    370
'''

(2) From a dict of Series objects

import pandas as pd
dic1 = {"beijing":2300, "shanghai":2100}
dic2 = {"beijing":110, "shanghai":370}
popu = pd.Series(dic1)
icode = pd.Series(dic2)
data = pd.DataFrame({"population":popu, "icode":icode, "country":"China"})
# country is a scalar; it is broadcast to fill every row
print(data)
'''population  icode country
beijing         2300    110   China
shanghai        2100    370   China
'''

(3) From a list of dicts

The list positions become the index and the dict keys become the columns.

import pandas as pd
data = [{"a":i, "b":2*i} for i in range(3)]
print(pd.DataFrame(data))
'''a  b
0  0  0
1  1  2
2  2  4
'''

Keys missing from a dict default to NaN.

import pandas as pd
data = [{"a":1,"b":2},{"b":3,"c":10}]
print(pd.DataFrame(data))
'''a  b     c
0  1.0  2   NaN
1  NaN  3  10.0
'''

(4) From a 2-D numpy array

import pandas as pd
import numpy as np

data = pd.DataFrame(np.random.randint(10, size=(3,2)),
                    columns=["foo","bar"], index=["a","b","c"])
print(data)
'''foo  bar
a    3    2
b    1    8
c    3    5
'''

II. DataFrame Properties

1. Attributes

(1) values returns the data as a numpy array

import pandas as pd
import numpy as np

data = pd.DataFrame(np.random.randint(10, size=(3,2)),
                    columns=["foo","bar"], index=["a","b","c"])
# print(data)
print(data.values)
'''
[[0 5]
 [7 0]
 [4 8]]
'''

(2) index returns the row index

print(data.index)
'''
Index(['a', 'b', 'c'], dtype='object')
'''

(3) columns returns the column index

print(data.columns)
'''
Index(['foo', 'bar'], dtype='object')
'''

(4) shape returns the shape

print(data.shape)   # (3, 2)

(5) size returns the number of elements

print(data.size)   # 6

(6) dtypes returns the dtype of each column

print(data.dtypes)
'''
foo    int32
bar    int32
dtype: object
'''

2. Indexing

(1) Selecting columns

Dictionary style:

import pandas as pd
import numpy as np

data = pd.DataFrame(np.arange(6).reshape(3,2),
                    columns=["foo","bar"], index=["a","b","c"])
print(data)
'''foo  bar
a    0    1
b    2    3
c    4    5
'''
print(data["foo"])
'''
a    0
b    2
c    4
Name: foo, dtype: int32
'''
print(data[["foo", "bar"]])
'''foo  bar
a    0    1
b    2    3
c    4    5
'''

Attribute style:

print(data.bar)
'''
a    1
b    3
c    5
Name: bar, dtype: int32
'''

(2) Selecting rows

Label-based indexing with loc:

print(data.loc["b"])
'''
foo    2
bar    3
Name: b, dtype: int32
'''

Position-based indexing with iloc:

print(data.iloc[1])
'''
foo    2
bar    3
Name: b, dtype: int32
'''
print(data.iloc[[0,2]])
'''foo  bar
a    0    1
c    4    5
'''

(3) Selecting scalars

print(data.loc["b","bar"])  # 3
print(data.iloc[0,1])       # 1
print(data.values[0][1])    # 1

(4) Indexing a Series

print(type(data.foo))   # <class 'pandas.core.series.Series'>
print(data.foo["c"])    # 4

3. Slicing

import pandas as pd
import numpy as np

datas = pd.date_range(start="2019-01-01", periods=6)
print(datas)
'''
DatetimeIndex(['2019-01-01', '2019-01-02', '2019-01-03', '2019-01-04',
               '2019-01-05', '2019-01-06'],
              dtype='datetime64[ns]', freq='D')
'''
df = pd.DataFrame(np.random.randn(6,4), index=datas,columns=["A","B","C","D"])
print(df)
'''A         B         C         D
2019-01-01 -0.593472 -0.526596 -0.663579 -0.475506
2019-01-02  0.029637 -1.542327  1.446231 -0.219709
2019-01-03  0.312669 -0.540142  0.106548 -0.569854
2019-01-04 -0.031100  1.409991 -0.625770  1.349713
2019-01-05 -0.752705 -0.302528  0.043599  0.592143
2019-01-06  0.956202 -0.393068  0.466223 -1.890532
'''

(1) Row slices

print(df["2019-01-01":"2019-01-03"])
print(df.loc["2019-01-01":"2019-01-03"])
print(df.iloc[0:3])
'''A         B         C         D
2019-01-01 -0.563258 -0.981668 -0.038098  0.313748
2019-01-02  1.453888 -1.075848  1.452511 -0.562839
2019-01-03  0.797852  0.774357  1.796320  1.337514
'''

(2) Column slices

print(df.loc[:, "A":"C"])
print(df.iloc[:,0:3])
'''A         B         C
2019-01-01  0.121463 -2.668285  0.175662
2019-01-02 -0.042151  1.250018  0.964810
2019-01-03  0.641962  0.892863 -0.091651
2019-01-04 -0.381722  0.014011 -0.962964
2019-01-05  1.158018 -0.030124  0.599618
2019-01-06  0.569749 -0.435110 -0.319675
'''

(3) Mixed selection

Slicing rows and columns at the same time:

print(df.loc["2019-01-02":"2019-01-04", "B":"C"])
print(df.iloc[1:4,1:3])
'''B         C
2019-01-02  1.885370  0.439749
2019-01-03 -1.054281  0.271491
2019-01-04 -0.781519 -0.872194
'''

Row slice with scattered columns:

print(df.loc["2019-01-04":"2019-01-06", ["A","C"]])
print(df.iloc[3:, [0,2]])
'''A         C
2019-01-04  0.057934  0.415995
2019-01-05  0.656228  0.836275
2019-01-06 -0.956402  0.720133
'''

Scattered rows with a column slice:

print(df.loc[["2019-01-04", "2019-01-06"], "C":"D"])
print(df.iloc[[3,5],2:4])
'''C         D
2019-01-04 -0.796464 -1.371296
2019-01-06  2.131938 -1.106263
'''

Scattered rows and scattered columns:

print(df.loc[["2019-01-04","2019-01-06"], ["B", "D"]])
print(df.iloc[[3,5],[1,3]])
'''B         D
2019-01-04 -0.320283  1.346262
2019-01-06 -0.216891 -0.844410
'''

4. Boolean indexing

print(df[df>0])
'''A         B         C         D
2019-01-01  1.170066       NaN       NaN       NaN
2019-01-02  0.786002  2.158762       NaN       NaN
2019-01-03       NaN       NaN  0.322335  0.602991
2019-01-04       NaN       NaN       NaN       NaN
2019-01-05  0.416069       NaN  0.838723  0.687255
2019-01-06  0.277207  0.086217       NaN       NaN
'''
print(df.A>0)   # is each element of column A greater than 0?
'''
2019-01-01     True
2019-01-02     True
2019-01-03    False
2019-01-04    False
2019-01-05    False
2019-01-06     True
'''
print(df[df.A>0])
'''A         B         C         D
2019-01-01  0.590420 -1.282202  0.318478  0.415096
2019-01-02  2.072327 -0.121314  1.713179  1.663085
2019-01-06  0.106245 -0.522096  0.417755 -0.524761
'''

The isin() method:

df2 = df.copy()
df2["E"] = ["one","one","two","three","four","three"]
print(df2)
'''A         B         C         D      E
2019-01-01 -0.432689  1.960850  0.079677  0.609651    one
2019-01-02  0.026600  0.081690  0.555260 -0.193917    one
2019-01-03  1.346473 -0.249037 -0.398267  1.376942    two
2019-01-04  1.631712 -1.757012 -0.386546 -0.215699  three
2019-01-05  0.802655 -0.033013  0.771480 -1.589764   four
2019-01-06  0.615043 -0.240700  0.678544 -0.838852  three
'''
ind = df2["E"].isin(["two","four"])
print(ind)
'''
2019-01-01    False
2019-01-02    False
2019-01-03     True
2019-01-04    False
2019-01-05     True
2019-01-06    False
'''
print(df2[ind])
'''A         B         C         D     E
2019-01-03  0.704706  0.123659  1.147022  0.104124   two
2019-01-05  0.065825  0.207168  1.425794 -0.267355  four
'''

5. Assignment

Adding a new column to a DataFrame:

s1 = pd.Series([1,2,3,4,5,6], index=pd.date_range("20190101", periods=6))
print(s1)
'''
2019-01-01    1
2019-01-02    2
2019-01-03    3
2019-01-04    4
2019-01-05    5
2019-01-06    6
Freq: D, dtype: int64
'''
df["E"] = s1
print(df)
'''A         B         C         D  E
2019-01-01 -2.192860  1.744378 -0.671842  0.704741  1
2019-01-02  0.125302 -1.141235  1.145471  1.860608  2
2019-01-03  1.462714 -0.632829 -0.046127  0.379126  3
2019-01-04  1.745818 -0.688786  0.574567 -0.900502  4
2019-01-05  0.680510 -0.194625 -1.047654  1.482277  5
2019-01-06  1.627649 -0.205627 -1.003146  0.453174  6
'''

Modifying values:

df.loc["2019-01-01", "A"] = 0
df.iloc[0,1] = 0
df["D"] = np.array([5]*len(df)) # can be shortened to df["D"] = 5; len(df) is the number of rows
print(df)
'''A         B         C  D
2019-01-01  0.000000  0.000000  1.095675  5
2019-01-02 -2.028600  2.048896 -1.527212  5
2019-01-03  2.149004 -0.904068  0.471809  5
2019-01-04 -0.034528  2.151367 -0.219636  5
2019-01-05 -0.544008 -1.098587 -1.873869  5
2019-01-06 -1.547652 -2.084554 -0.701767  5
'''

Replacing the index and columns:

df.index = [i for i in range(len(df))]
df.columns = [i*10 for i in range(df.shape[1])]
print(df)
'''0         10        20        30
0 -0.942362  0.191228  0.891761 -0.520997
1 -1.330733 -0.462275 -0.711679  1.503393
2 -0.187491  1.461077  0.557227 -0.798765
3 -0.012331 -1.728701  0.018166  0.659837
4  0.518749  0.776088  2.482731 -0.020565
5  0.475219 -1.025717  1.293841  1.236391
'''

III. Numerical Operations and Statistics

1. Inspecting data

import pandas as pd
import numpy as np

dates = pd.date_range(start="2019-01-01", periods=6)
df = pd.DataFrame(np.random.randn(6,4), index=dates,columns=["A","B","C","D"])
print(df)
'''A         B         C         D
2019-01-01 -1.061156  0.591245 -0.885117  1.123434
2019-01-02 -1.142466 -0.807766 -1.519887  0.051029
2019-01-03 -0.739533  1.907320 -1.359995  0.335202
2019-01-04 -0.290423 -1.784109 -1.033240  0.706024
2019-01-05  1.179959  0.660133  0.596361  0.384645
2019-01-06  1.093600 -0.395159 -0.799479 -0.308565
'''

(1) Viewing the first rows

print(df.head(2))   # head() defaults to the first 5 rows
'''A         B         C         D
2019-01-01 -1.062086 -1.966453  0.638081  0.922812
2019-01-02  0.683613  1.363954  0.004098  1.308496
'''

(2) Viewing the last rows

print(df.tail(2))   # tail() defaults to the last 5 rows
'''A         B         C         D
2019-01-05 -0.370315  0.187505 -0.272255  0.296648
2019-01-06  1.393871 -0.341858  0.361288  0.834284
'''

(3) Viewing summary information

df.iloc[0, 3] = np.nan  # set the value in row 1, column 4 to NaN
print(df)
'''A         B         C         D
2019-01-01  0.529576 -0.582373  1.174552       NaN
2019-01-02  1.381525  2.005128 -0.084598 -0.680730
2019-01-03  0.634071 -0.421678 -0.695929  1.936779
2019-01-04 -0.146882  1.434341  0.553859 -0.452890
2019-01-05 -0.257330 -0.119174 -0.859402  0.163590
2019-01-06 -1.684116  0.372460  1.312178 -1.548088
'''
df.info()
'''
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 6 entries, 2019-01-01 to 2019-01-06
Freq: D
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   A       6 non-null      float64
 1   B       6 non-null      float64
 2   C       6 non-null      float64
 3   D       5 non-null      float64
dtypes: float64(4)
memory usage: 240.0 bytes
'''

2. numpy ufuncs work on pandas objects

(1) Vectorized operations

x = pd.DataFrame(np.arange(4).reshape(1,4))
print(x)
'''0  1  2  3
0  0  1  2  3
'''
print(x+5)
'''0  1  2  3
0  5  6  7  8
'''
print(np.exp(x))
'''0         1         2          3
0  1.0  2.718282  7.389056  20.085537
'''
x = pd.DataFrame(np.arange(4).reshape(1,4))
print(x)
'''0  1  2  3
0  0  1  2  3
'''
y = pd.DataFrame(np.arange(4,8).reshape(1,4))
print(y)
'''0  1  2  3
0  4  5  6  7
'''
print(x*y)
'''0  1   2   3
0  0  5  12  21
'''

(2) Matrix operations

np.random.seed(42)
x = pd.DataFrame(np.random.randint(10, size=(5,5)))
print(x)
'''0  1  2  3  4
0  6  3  7  4  6
1  9  2  6  7  4
2  3  7  7  2  5
3  4  1  7  5  1
4  4  0  9  5  8
'''
print(x.dtypes)
'''
0    int32
1    int32
2    int32
3    int32
4    int32
dtype: object
'''
np.random.seed(42)
x = pd.DataFrame(np.random.randint(10, size=(5,5)))
print(x)
'''0  1  2  3  4
0  6  3  7  4  6
1  9  2  6  7  4
2  3  7  7  2  5
3  4  1  7  5  1
4  4  0  9  5  8
'''
z = x.T     # transpose
print(z)
'''0  1  2  3  4
0  6  9  3  4  4
1  3  2  7  1  0
2  7  6  7  7  9
3  4  7  2  5  5
4  6  4  5  1  8
'''
print(x.dot(z))
'''0    1    2    3    4
0  146  154  126  102  155
1  154  186  117  119  157
2  126  117  136   83  125
3  102  119   83   92  112
4  155  157  125  112  186
'''
print(np.dot(x,z))
'''
[[146 154 126 102 155]
 [154 186 117 119 157]
 [126 117 136  83 125]
 [102 119  83  92 112]
 [155 157 125 112 186]]
'''

For the same computation, pure number crunching generally runs faster in numpy; numpy is geared toward computation, while pandas is geared toward data handling.
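That claim can be spot-checked with a rough timing sketch (the sizes and repeat counts here are arbitrary, and results vary by machine):

```python
import timeit

import numpy as np
import pandas as pd

a = np.random.randint(10, size=(100, 100))
x = pd.DataFrame(a)

# The same matrix product, once through numpy and once through pandas
t_np = timeit.timeit(lambda: a.dot(a.T), number=100)
t_pd = timeit.timeit(lambda: x.dot(x.T), number=100)
print("numpy: %.4fs  pandas: %.4fs" % (t_np, t_pd))
```

pandas adds index-alignment bookkeeping and object construction on top of the raw computation, which is where the extra time goes.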

(3) Broadcasting

np.random.seed(42)
x = pd.DataFrame(np.random.randint(10, size=(3,3)), columns=list("ABC"))
print(x)
'''A  B  C
0  6  3  7
1  4  6  9
2  2  6  7
'''

Broadcasting along rows:

print(x.iloc[0])
'''
A    6
B    3
C    7
Name: 0, dtype: int32
'''
print(x/x.iloc[0])
'''A    B         C
0  1.000000  1.0  1.000000
1  0.666667  2.0  1.285714
2  0.333333  2.0  1.000000
'''

Broadcasting along columns:

print(x.A)
'''
0    6
1    4
2    2
Name: A, dtype: int32
'''
print(x.div(x.A, axis=0))    # divide every column by column A
'''A    B         C
0  1.0  0.5  1.166667
1  1.0  1.5  2.250000
2  1.0  3.0  3.500000
'''
print(x.iloc[0])
'''
A    6
B    3
C    7
Name: 0, dtype: int32
'''
print(x.div(x.iloc[0], axis=1)) # axis=1 is the default, i.e. divide along rows
'''A    B         C
0  1.000000  1.0  1.000000
1  0.666667  2.0  1.285714
2  0.333333  2.0  1.000000
'''

3. pandas-specific behavior

(1) Index alignment

np.random.seed(42)
x = pd.DataFrame(np.random.randint(0,20,size=(2,2)), columns=list("AB"))
print(x)
'''A   B
0   6  19
1  14  10
'''
y = pd.DataFrame(np.random.randint(0,10,size=(3,3)), columns=list("ABC"))
print(y)
'''A  B  C
0  7  4  6
1  9  2  6
2  7  4  3
'''

pandas automatically aligns the indexes of the two objects; positions missing from either side become np.nan.

print(x+y)
'''A     B   C
0  13.0  23.0 NaN
1  23.0  12.0 NaN
2   NaN   NaN NaN
'''

Missing positions can instead be filled with fill_value:

print(x.add(y, fill_value=0))
'''A     B    C
0  13.0  23.0  6.0
1  23.0  12.0  6.0
2   7.0   4.0  3.0
'''

(2) Statistics

Counting distinct values:

import pandas as pd
import numpy as np
from collections import Counter

np.random.seed(42)
y = np.random.randint(3, size=10)
print(y)            # [2 0 2 2 0 0 2 1 2 2]
print(np.unique(y)) # [0 1 2]
print(Counter(y))   # Counter({2: 6, 0: 3, 1: 1})
y1 = pd.DataFrame(y,columns=["A"])
print(y1)
'''A
0  2
1  0
2  2
3  2
4  0
5  0
6  2
7  1
8  2
9  2
'''
print(np.unique(y1))    # [0 1 2]
print(y1["A"].value_counts())
'''
2    6
0    3
1    1
Name: A, dtype: int64
'''

Deriving a new column and sorting:

import pandas as pd
import numpy as np

population_dict = {"BeiJing":2154,"ShangHai":2424,"ShenZhen":1303,"HangZhou":981}
population = pd.Series(population_dict)
GDP_dict = {"BeiJing":30320,"ShangHai":32680,"ShenZhen":24222,"HangZhou":13468}
GDP = pd.Series(GDP_dict)
city_info = pd.DataFrame({"population":population,"GDP":GDP})
city_info["per_GDP"] = city_info["GDP"]/city_info["population"]
print(city_info)
'''population    GDP    per_GDP
BeiJing         2154  30320  14.076137
ShangHai        2424  32680  13.481848
ShenZhen        1303  24222  18.589409
HangZhou         981  13468  13.728848
'''

① Ascending sort

print(city_info.sort_values(by="per_GDP"))
'''population    GDP    per_GDP
ShangHai        2424  32680  13.481848
HangZhou         981  13468  13.728848
BeiJing         2154  30320  14.076137
ShenZhen        1303  24222  18.589409
'''

② Descending sort

print(city_info.sort_values(by="per_GDP", ascending=False))
'''population    GDP    per_GDP
ShenZhen        1303  24222  18.589409
BeiJing         2154  30320  14.076137
HangZhou         981  13468  13.728848
ShangHai        2424  32680  13.481848
'''

③ Sorting by index

data = pd.DataFrame(np.random.randint(20, size=(3,4)),index=[2,1,0],columns=list("CBAD"))
print(data)
'''C   B   A   D
2   2   5  19  16
1  14  11   9   4
0   6  18   5  17
'''
print(data.sort_index())    # sort rows by index
'''C   B   A   D
0   6  18   5  17
1  14  11   9   4
2   2   5  19  16
'''
print(data.sort_index(axis=1)) # sort columns by label
'''A   B   C   D
2   3  15   1  14
1  10   7  18   6
0  15  13  11  14
'''
print(data.sort_index(axis=1, ascending=False))
'''D   C   B  A
2  3  10   9  6
1  5  11  15  5
0  5   7  16  2
'''

Statistical methods:

np.random.seed(10)
df = pd.DataFrame(np.random.normal(2, 4, size=(6, 4)),columns=list("ABCD"))
print(df)
'''A         B         C          D
0  7.326346  4.861116 -4.181601   1.966465
1  4.485344 -0.880342  3.062046   2.434194
2  2.017166  1.301599  3.732105   6.812149
3 -1.860263  6.113096  2.914521   3.780550
4 -2.546409  2.540548  7.938148  -2.319220
5 -5.910913 -4.973489  3.064281  11.539869
'''
# count of non-null values
print(df.count())
'''
A    6
B    6
C    6
D    6
'''
# sum
print(df.sum())
'''
A     3.511271
B     8.962527
C    16.529499
D    24.214008
dtype: float64
'''
print(df.sum(axis=1))
'''
0     9.972325
1     9.101242
2    13.863019
3    10.947905
4     5.613067
5     3.719748
dtype: float64
'''
# min and max
print(df.min()) # per column
'''
A   -5.910913
B   -4.973489
C   -4.181601
D   -2.319220
dtype: float64
'''
print(df.max(axis=1))   # per row
'''
0     7.326346
1     4.485344
2     6.812149
3     6.113096
4     7.938148
5    11.539869
dtype: float64
'''
print(df.idxmax())  # index label of each column's maximum
'''
A    0
B    3
C    4
D    5
dtype: int64
'''
# mean
print(df.mean())
'''
A    0.585212
B    1.493755
C    2.754917
D    4.035668
dtype: float64
'''
# variance
print(df.var())
'''
A    24.138289
B    16.254343
C    15.230314
D    22.263578
dtype: float64
'''
# standard deviation
print(df.std())
'''
A    4.913073
B    4.031668
C    3.902604
D    4.718430
dtype: float64
'''
# median
print(df.median())
'''
A    0.078452
B    1.921073
C    3.063163
D    3.107372
dtype: float64
'''
# mode
data = pd.DataFrame(np.random.randint(5,size=(10,2)),columns=list("AB"))
print(data)
'''A  B
0  2  0
1  3  4
2  2  0
3  1  2
4  0  0
5  3  1
6  3  4
7  1  4
8  2  0
9  0  4
'''
print(data.mode())
'''A  B
0  2  0
1  3  4
'''
print(df.quantile(0.75))    # 75th percentile
'''
A    3.868299
B    4.280974
C    3.565149
D    6.054250
Name: 0.75, dtype: float64
'''
print(df.describe())
'''A         B         C          D
count  6.000000  6.000000  6.000000   6.000000
mean   0.585212  1.493755  2.754917   4.035668
std    4.913073  4.031668  3.902604   4.718430
min   -5.910913 -4.973489 -4.181601  -2.319220
25%   -2.374872 -0.334857  2.951402   2.083397
50%    0.078452  1.921073  3.063163   3.107372
75%    3.868299  4.280974  3.565149   6.054250
max    7.326346  6.113096  7.938148  11.539869
'''
data2 = pd.DataFrame([["a","a","c","d"],["c","a","c","d"],["a","a","d","c"]],
columns=list("ABCD"))
print(data2)
'''A  B  C  D
0  a  a  c  d
1  c  a  c  d
2  a  a  d  c
'''
print(data2.describe())
'''A  B  C  D
count   3  3  3  3
unique  2  1  2  2
top     a  a  c  d
freq    2  3  2  2
'''
'''
count is the number of values in each column,
unique is the number of distinct values,
top is the most frequent value,
freq is how many times the top value occurs.
'''
# correlation coefficients
print(df.corr())
'''A         B         C         D
A  1.000000  0.409966 -0.655007 -0.383420
B  0.409966  1.000000 -0.255655 -0.631457
C -0.655007 -0.255655  1.000000 -0.152966
D -0.383420 -0.631457 -0.152966  1.000000
'''
print(df.corrwith(df["A"]))
'''
A    1.000000
B    0.409966
C   -0.655007
D   -0.383420
dtype: float64
'''

Custom aggregations
apply(method): by default, applies method to each column.

np.random.seed(10)
df = pd.DataFrame(np.random.normal(2, 4, size=(6, 4)),columns=list("ABCD"))
print(df)
'''A         B         C          D
0  7.326346  4.861116 -4.181601   1.966465
1  4.485344 -0.880342  3.062046   2.434194
2  2.017166  1.301599  3.732105   6.812149
3 -1.860263  6.113096  2.914521   3.780550
4 -2.546409  2.540548  7.938148  -2.319220
5 -5.910913 -4.973489  3.064281  11.539869
'''
print(df.apply(np.cumsum))  # cumulative sum down each column
'''A          B          C          D
0   7.326346   4.861116  -4.181601   1.966465
1  11.811690   3.980774  -1.119555   4.400659
2  13.828856   5.282373   2.612550  11.212808
3  11.968593  11.395469   5.527070  14.993359
4   9.422184  13.936017  13.465218  12.674139
5   3.511271   8.962527  16.529499  24.214008
'''
print(df.apply(np.cumsum, axis=1))  # cumulative sum along each row
'''A          B         C          D
0  7.326346  12.187462  8.005861   9.972325
1  4.485344   3.605002  6.667048   9.101242
2  2.017166   3.318765  7.050870  13.863019
3 -1.860263   4.252834  7.167354  10.947905
4 -2.546409  -0.005861  7.932287   5.613067
5 -5.910913 -10.884402 -7.820122   3.719748
'''
print(df.apply(sum))
'''
A     3.511271
B     8.962527
C    16.529499
D    24.214008
dtype: float64
'''
print(df.apply(lambda x: x.max()-x.min()))
'''
A    13.237259
B    11.086585
C    12.119749
D    13.859089
dtype: float64
'''
def my_describe(x):
    return pd.Series([x.count(), x.mean(), x.max(), x.idxmin(), x.std()],
                     index=["Count", "mean", "max", "idxmin", "std"])
print(df.apply(my_describe))
'''A         B         C          D
Count   6.000000  6.000000  6.000000   6.000000
mean    0.585212  1.493755  2.754917   4.035668
max     7.326346  6.113096  7.938148  11.539869
idxmin  5.000000  5.000000  0.000000   4.000000
std     4.913073  4.031668  3.902604   4.718430
'''

IV. Handling Missing Values

1. Detecting missing values

import pandas as pd
import numpy as np

data = pd.DataFrame(np.array([[1, np.nan, 2],
                              [np.nan, 3, 4],
                              [5, 6, None]]),
                    columns=["A", "B", "C"])
print(data)
'''A    B     C
0    1  NaN     2
1  NaN    3     4
2    5    6  None
'''

Note: if the data contain None, strings, etc., every column's dtype becomes object, which uses more memory and is slower than int or float.

print(data.dtypes)
'''
A    object
B    object
C    object
dtype: object
'''
print(data.isnull())
'''A      B      C
0  False   True  False
1   True  False  False
2  False  False   True
'''
print(data.notnull())
'''A      B      C
0   True  False   True
1  False   True   True
2   True   True  False
'''

2. Dropping missing values

import pandas as pd
import numpy as np

data = pd.DataFrame(np.array([[1, np.nan, 2, 3],
                              [np.nan, 3, 4, 6],
                              [7, 8, np.nan, 9],
                              [10, 11, 12, 13]]),
                    columns=["A", "B", "C", "D"])
print(data)
'''A     B     C     D
0   1.0   NaN   2.0   3.0
1   NaN   3.0   4.0   6.0
2   7.0   8.0   NaN   9.0
3  10.0  11.0  12.0  13.0
'''

Note: np.nan is a special floating-point value.

print(data.dtypes)
'''
A    float64
B    float64
C    float64
D    float64
dtype: object
'''
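A quick check of that note: NaN really is a float, and it has unusual comparison semantics.

```python
import numpy as np

print(type(np.nan))       # <class 'float'>
print(np.nan == np.nan)   # False: NaN never compares equal, even to itself
print(np.isnan(np.nan))   # True: this is the reliable way to test for NaN
```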

(1) Dropping rows

print(data.dropna())
'''A     B     C     D
3  10.0  11.0  12.0  13.0
'''

(2) Dropping columns

print(data.dropna(axis=1))
'''D
0   3.0
1   6.0
2   9.0
3  13.0
'''
data["D"] = np.nan
print(data)
'''A     B     C   D
0   1.0   NaN   2.0 NaN
1   NaN   3.0   4.0 NaN
2   7.0   8.0   NaN NaN
3  10.0  11.0  12.0 NaN
'''
print(data.dropna(axis=1, how="all"))
'''A     B     C
0   1.0   NaN   2.0
1   NaN   3.0   4.0
2   7.0   8.0   NaN
3  10.0  11.0  12.0
'''
data.loc[3] = np.nan
print(data)
'''A    B    C   D
0  1.0  NaN  2.0 NaN
1  NaN  3.0  4.0 NaN
2  7.0  8.0  NaN NaN
3  NaN  NaN  NaN NaN
'''
print(data.dropna(how="all"))
'''A    B    C   D
0  1.0  NaN  2.0 NaN
1  NaN  3.0  4.0 NaN
2  7.0  8.0  NaN NaN
'''

3. Filling missing values

import pandas as pd
import numpy as np

data = pd.DataFrame(np.array([[1, np.nan, 2, 3],
                              [np.nan, 3, 4, 6],
                              [7, 8, np.nan, 9],
                              [10, 11, 12, 13]]),
                    columns=["A", "B", "C", "D"])
print(data)
'''A     B     C     D
0   1.0   NaN   2.0   3.0
1   NaN   3.0   4.0   6.0
2   7.0   8.0   NaN   9.0
3  10.0  11.0  12.0  13.0
'''
print(data.fillna(value=5))
'''A     B     C     D
0   1.0   5.0   2.0   3.0
1   5.0   3.0   4.0   6.0
2   7.0   8.0   5.0   9.0
3  10.0  11.0  12.0  13.0
'''

Filling with the mean:

print(data.fillna(value=data.mean()))   # fill each column with that column's mean
'''A          B     C     D
0   1.0   7.333333   2.0   3.0
1   6.0   3.000000   4.0   6.0
2   7.0   8.000000   6.0   9.0
3  10.0  11.000000  12.0  13.0
'''
print(data.fillna(value=data.stack().mean()))   # fill with the mean of all non-null values in the DataFrame
'''A          B          C     D
0   1.000000   6.846154   2.000000   3.0
1   6.846154   3.000000   4.000000   6.0
2   7.000000   8.000000   6.846154   9.0
3  10.000000  11.000000  12.000000  13.0
'''

V. Combining Data

A helper function that builds a DataFrame:

import pandas as pd

def make_df(cols, ind):
    data = {c: [str(c)+str(i) for i in ind] for c in cols}
    return pd.DataFrame(data, ind)

print(make_df("ABC", range(3)))
'''A   B   C
0  A0  B0  C0
1  A1  B1  C1
2  A2  B2  C2
'''

1. Vertical concatenation

df_1 = make_df("AB", [1, 2])
df_2 = make_df("AB", [3, 4])
print(df_1)
'''A   B
1  A1  B1
2  A2  B2
'''
print(df_2)
'''A   B
3  A3  B3
4  A4  B4
'''
print(pd.concat([df_1, df_2]))
'''A   B
1  A1  B1
2  A2  B2
3  A3  B3
4  A4  B4
'''

2. Horizontal concatenation

df_3 = make_df("AB", [0,1])
df_4 = make_df("CD", [0,1])
print(df_3)
'''A   B
0  A0  B0
1  A1  B1
'''
print(df_4)
'''C   D
0  C0  D0
1  C1  D1
'''
print(pd.concat([df_3, df_4], axis=1))
'''A   B   C   D
0  A0  B0  C0  D0
1  A1  B1  C1  D1
'''

3. Overlapping indexes

df_5 = make_df("AB", [1, 2])
df_6 = make_df("AB", [1, 2])
print(df_5)
'''A   B
1  A1  B1
2  A2  B2
'''
print(df_6)
'''A   B
1  A1  B1
2  A2  B2
'''
print(pd.concat([df_5, df_6]))
'''A   B
1  A1  B1
2  A2  B2
1  A1  B1
2  A2  B2
'''
print(pd.concat([df_5, df_6], ignore_index=True))
'''A   B
0  A1  B1
1  A2  B2
2  A1  B1
3  A2  B2
'''

4. Aligned merging with merge()

df_9 = make_df("AB", [1, 2])
df_10 = make_df("BC", [1, 2])
print(df_9)
'''A   B
1  A1  B1
2  A2  B2
'''
print(df_10)
'''B   C
1  B1  C1
2  B2  C2
'''
print(pd.merge(df_9, df_10))
'''A   B   C
0  A1  B1  C1
1  A2  B2  C2
'''

5. Example: merging city information

import pandas as pd

population_dict = {"city": ("BeiJing", "HangZhou", "ShenZhen"),
                   "pop": (2154, 981, 1303)}
population = pd.DataFrame(population_dict)
print(population)
'''city   pop
0   BeiJing  2154
1  HangZhou   981
2  ShenZhen  1303
'''
GDP_dict = {"city": ("BeiJing", "ShangHai", "HangZhou"),"GDP": (30320, 32680, 13468)}
GDP = pd.DataFrame(GDP_dict)
print(GDP)
'''city    GDP
0   BeiJing  30320
1  ShangHai  32680
2  HangZhou  13468
'''
city_info = pd.merge(population, GDP)
print(city_info)
'''city   pop    GDP
0   BeiJing  2154  30320
1  HangZhou   981  13468
'''
city_info = pd.merge(population, GDP, how="outer")	# outer join keeps the union; the default is the intersection
print(city_info)
'''city     pop      GDP
0   BeiJing  2154.0  30320.0
1  HangZhou   981.0  13468.0
2  ShenZhen  1303.0      NaN
3  ShangHai     NaN  32680.0
'''

VI. Grouping and Pivot Tables

import pandas as pd
import numpy as np

np.random.seed(10)
df = pd.DataFrame({"key": ["A", "B", "C", "A", "B", "C"],
                   "data1": range(6),
                   "data2": np.random.randint(0, 10, size=6)})
print(df)
'''key  data1  data2
0   A      0      9
1   B      1      4
2   C      2      0
3   A      3      1
4   B      4      9
5   C      5      0
'''

1. Grouping

(1) Lazy evaluation

print(df.groupby("key"))
# <pandas.core.groupby.generic.DataFrameGroupBy object at 0x000001E5DB95F610>
print(df.groupby("key").sum())
'''data1  data2
key              
A        3     10
B        5     13
C        7      0
'''

(2) Selecting columns

print(df.groupby("key")["data2"].sum())
'''
key
A    10
B    13
C     0
Name: data2, dtype: int32
'''

(3) Iterating over groups

for data, group in df.groupby("key"):
    print("{0:5} shape={1}".format(data, group.shape))
'''
A     shape=(2, 3)
B     shape=(2, 3)
C     shape=(2, 3)
'''

(4) Calling methods

print(df.groupby("key")["data1"].describe())
'''count  mean      std  min   25%  50%   75%  max
key                                                 
A      2.0   1.5  2.12132  0.0  0.75  1.5  2.25  3.0
B      2.0   2.5  2.12132  1.0  1.75  2.5  3.25  4.0
C      2.0   3.5  2.12132  2.0  2.75  3.5  4.25  5.0
'''

(5) More complex operations

print(df.groupby("key").aggregate(["min", "median", "max"]))
'''
    data1            data2           
      min median max   min median max
key                                  
A       0    1.5   3     1    5.0   9
B       1    2.5   4     4    6.5   9
C       2    3.5   5     0    0.0   0
'''

(6) Filtering

def filter_func(x):
    return x["data2"].std() > 3

print(df.groupby("key")["data2"].std())
'''
key
A    5.656854
B    3.535534
C    0.000000
Name: data2, dtype: float64
'''
print(df.groupby("key").filter(filter_func))
'''key  data1  data2
0   A      0      9
1   B      1      4
3   A      3      1
4   B      4      9
'''

(7) Transforming

print(df.groupby("key").transform(lambda x: x-x.mean()))
'''data1  data2
0   -1.5    4.0
1   -1.5   -2.5
2   -1.5    0.0
3    1.5   -4.0
4    1.5    2.5
5    1.5    0.0
'''

(8) The apply() method

def norm_by_data2(x):
    x["data1"] /= x["data2"].sum()
    return x

print(df.groupby("key").apply(norm_by_data2))
'''key     data1  data2
0   A  0.000000      9
1   B  0.076923      4
2   C       inf      0
3   A  0.300000      1
4   B  0.307692      9
5   C       inf      0
'''

(9) Using a list or array as group keys

L = [0, 1, 0, 1, 2, 0]
print(df.groupby(L).sum())
'''data1  data2
0      7      9
1      4      5
2      4      9
'''

(10) Mapping the index to groups with a dict

df2 = df.set_index("key")
print(df2)
'''data1  data2
key              
A        0      9
B        1      4
C        2      0
A        3      1
B        4      9
C        5      0
'''
mapping = {"A": "first", "B": "constant", "C": "constant"}
print(df2.groupby(mapping).sum())
'''data1  data2
key                   
constant     12     13
first         3     10
'''

(11) Any Python function

print(df2.groupby(str.lower).mean())
'''data1  data2
key              
a      1.5    5.0
b      2.5    6.5
c      3.5    0.0
'''

(12) A list of valid keys

mapping = {"A": "first", "B": "constant", "C": "constant"}
print(df2.groupby([str.lower, mapping]).mean())
'''data1  data2
key key                   
a   first       1.5    5.0
b   constant    2.5    6.5
c   constant    3.5    0.0
'''

(13) Example: processing planet observation data

import seaborn as sns

planets = sns.load_dataset("planets")
# print(planets)
# print(planets.shape)
# print(planets.head())
# print(planets.describe())
decade = 10*(planets["year"]//10)
decade = decade.astype(str) + "s"
decade.name = "decade"
print(decade.head())
# print(planets.groupby(["method", decade]).sum())
print(planets.groupby(["method", decade])[["number"]].sum().unstack().fillna(0))

2. Pivot tables

import seaborn as sns

titanic = sns.load_dataset("titanic")
# print(titanic.head())
# print(titanic.describe())
# print(titanic.groupby("sex")[["survived"]].mean())
'''survived
sex             
female  0.742038
male    0.188908
'''
# print(titanic.groupby("sex")["survived"].mean())
'''
sex
female    0.742038
male      0.188908
Name: survived, dtype: float64
'''
# print(
#     titanic.groupby(["sex", "class"])["survived"].aggregate("mean").unstack()
# )
'''
class      First    Second     Third
sex                                 
female  0.968085  0.921053  0.500000
male    0.368852  0.157407  0.135447
'''
# pivot table
# print(
#     titanic.pivot_table("survived", index="sex", columns="class",
#                         aggfunc="mean", margins=True)
# )
'''
class      First    Second     Third       All
sex                                           
female  0.968085  0.921053  0.500000  0.742038
male    0.368852  0.157407  0.135447  0.188908
All     0.629630  0.472826  0.242363  0.383838
'''
print(titanic.pivot_table(index="sex", columns="class",
                          aggfunc={"survived": sum, "fare": "mean"}))
'''fare                       survived             
class        First     Second      Third    First Second Third
sex                                                           
female  106.125798  21.970121  16.118810       91     70    72
male     67.226127  19.741782  12.661633       45     17    47
'''

VII. Multi-level Indexes: for Multidimensional Data

import pandas as pd
import numpy as np

base_data = np.array([[1771, 11115], [2154, 30320],
                      [2141, 14070], [2424, 32680],
                      [1077, 7806], [1303, 24222],
                      [798, 4789], [981, 13468]])
data = pd.DataFrame(base_data,
                    index=[["BeiJing", "BeiJing", "ShangHai", "ShangHai",
                            "ShenZhen", "ShenZhen", "HangZhou", "HangZhou"],
                           [2008, 2018] * 4],
                    columns=["population", "GDP"])
data.index.names = ["city", "year"]
print(data)
'''
               population    GDP
city     year                   
BeiJing  2008        1771  11115
         2018        2154  30320
ShangHai 2008        2141  14070
         2018        2424  32680
ShenZhen 2008        1077   7806
         2018        1303  24222
HangZhou 2008         798   4789
         2018         981  13468
'''
print(data["GDP"])
'''
city      year
BeiJing   2008    11115
          2018    30320
ShangHai  2008    14070
          2018    32680
ShenZhen  2008     7806
          2018    24222
HangZhou  2008     4789
          2018    13468
Name: GDP, dtype: int32
'''
print(data.loc["ShangHai", "GDP"])
'''
year
2008    14070
2018    32680
Name: GDP, dtype: int32
'''
print(data.loc["ShangHai", 2018]["GDP"])    # 32680
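A MultiIndex can also be built directly with pd.MultiIndex.from_product, and cross-sections taken with xs(). A minimal sketch with a subset of the numbers above:

```python
import pandas as pd

# build the (city, year) index as a Cartesian product
index = pd.MultiIndex.from_product(
    [["BeiJing", "ShangHai"], [2008, 2018]], names=["city", "year"]
)
data = pd.DataFrame({"GDP": [11115, 30320, 14070, 32680]}, index=index)

# cross-section: every city for one year, without naming the outer level
print(data.xs(2018, level="year"))

# unstack the inner level into columns, turning the 1-D Series into a 2-D table
print(data["GDP"].unstack(level="year"))
```

xs() selects along any named level, while unstack() is the usual way to move a level from rows to columns.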

八、High-performance pandas

1、Using eval() and query()

They reduce the intermediate memory allocations produced while evaluating compound algebraic expressions.

import pandas as pd
import numpy as np

df1, df2, df3, df4 = (pd.DataFrame(np.random.random((10000, 100))) for i in range(4))
print(np.allclose((df1 + df2) / (df3 + df4),
                  pd.eval("(df1+df2)/(df3+df4)")))  # True

query() takes expression strings the same way eval() does, but is used to filter rows.
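A minimal sketch of query() (the column names A/B/C and the threshold are arbitrary):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((1000, 3)), columns=["A", "B", "C"])

# plain boolean-mask filtering vs. the query() expression string
r1 = df[(df["A"] < 0.5) & (df["B"] < 0.5)]
r2 = df.query("A < 0.5 and B < 0.5")
print(r1.equals(r2))  # True
```

As with eval(), the expression string lets pandas evaluate the compound condition without materializing each intermediate boolean array.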

2、When to use eval() and query()

For small arrays, the ordinary methods are faster; eval() and query() pay off on large arrays.

# Memory occupied by all elements of DataFrame df1, in bytes
print(df1.values.nbytes)    # 8000000

