目录
写字前面:
结果展示
获取数据
行业数据
控制人数据
十大流通股东数据
开始合并
1 从行业数据中提取证券股的行业数据
2 合并控制人数据
3 合并十大流通股东
4 把三个结果按列合并
写字前面:
在分析数据的时候,常常需要的字段分散在多个文档里,这时候就需要把这些字段合并到一起方便查看和分析。本文以证券板块的股票为例,整合行业、控制人、十大流通股东这三项数据。
结果展示
获取数据
行业数据
数据来源优矿【具体优矿的使用过程看“python_寻找底部股票”博文】
下载的文件存储为industry_list.csv
控制人数据
巨潮-》数据-》专题统计-》股本股东-》实际控制人持股变动
可以直接下载
注意:好像非交易日没法打开,那我这就不截网站的图了。
下图是下载后的数据:
十大流通股东数据
通过akshare获取
akshare文档中 Welcome to AKShare’s Online Documentation! — AKShare 1.10.57 文档
有股票基本面数据的接口
路径:数据字典-》股票数据-》基本面数据-》流通股东
注意:最好使用python3.8或python3.8以上版本安装akshare包执行接口代码
import akshare as akpre_dir = r'D:/temp001/'code_list = ['000166','000686','000712','000728','000750','000776','000783','002500','002670','002673','002736','002797','002926','002939','002945','300059','600030','600061','600095','600109','600155','600369','600621','600837','600864','600906','600909','600918','600858','600999','601059','601066','601099','601108','601136','601162','601198','601211','601236','601375','601377','601456','601555','601688','601696','601788','601878','601881','601901','601990','601995']for code in code_list:# 流通股东try:df = ak.stock_circulate_stock_holder(symbol=code)df.to_excel(f'{pre_dir}流通股东_{code}.xlsx', engine='openpyxl')except Exception as e:print(str(e))print(f'{code} 失败')passtime.sleep(5)
开始合并
已知:证券股代码列表
1 从行业数据中提取证券股的行业数据
def method_001():res_dir = r'D:/temp000/'def pd_000(x):sec_str = x['secID']res_str = sec_str.split('.')[0]return res_strindustry_path = r'D:/temp000/industry_list.csv'code_list = ['000166','000686','000712','000728','000750','000776','000783','002500','002670','002673','002736','002797','002926','002939','002945','300059','600030','600061','600095','600109','600155','600369','600621','600837','600864','600906','600909','600918','600858','600999','601059','601066','601099','601108','601136','601162','601198','601211','601236','601375','601377','601456','601555','601688','601696','601788','601878','601881','601901','601990','601995']df = pd.read_csv(industry_path,encoding='utf-8')df = df.loc[df['isNew']==1].copy()df['ticker00'] = df.apply(pd_000,axis=1)df00 = df.loc[df['ticker00'].isin(code_list)].copy()df00 = df00.loc[:,['ticker00','secShortName','industryName1','industryName2','industryName3']]df00.to_excel(res_dir+'industry_000.xlsx',engine='openpyxl')pass
执行后结果
2 合并控制人数据
def method_002():res_dir = r'D:/temp003/'code_list = ['000166', '000686', '000712', '000728', '000750', '000776', '000783', '002500', '002670', '002673','002736', '002797', '002926', '002939', '002945', '300059', '600030', '600061', '600095', '600109','600155', '600369', '600621', '600837', '600864', '600906', '600909', '600918', '600858', '600999','601059', '601066', '601099', '601108', '601136', '601162', '601198', '601211', '601236', '601375','601377', '601456', '601555', '601688', '601696', '601788', '601878', '601881', '601901', '601990','601995']control_path = r'D:/temp000/专题统计_实际控制人持股变动.xlsx'df = pd.read_excel(control_path,engine='openpyxl')df = df.loc[:,['证券代码','实际控制人名称','控股数量(万股)','控股比例(%)','直接控制人名称','控制类型']].copy()df = df.loc[df['证券代码'].isin(code_list)].copy()df.to_excel(res_dir+'control_000.xlsx',engine='openpyxl')pass
执行后结果
3 合并十大流通股东
先看下每个股票十大流通股东的数据结构
它是每季度十行数据,只取最新一季度的十大流通股东,把十行的股东名称转成一个字符串,彼此之间用分号隔开
def method_003():code_list = ['000166', '000686', '000712', '000728', '000750', '000776', '000783', '002500', '002670', '002673','002736', '002797', '002926', '002939', '002945', '300059', '600030', '600061', '600095', '600109','600155', '600369', '600621', '600837', '600864', '600906', '600909', '600918', '600858', '600999','601059', '601066', '601099', '601108', '601136', '601162', '601198', '601211', '601236', '601375','601377', '601456', '601555', '601688', '601696', '601788', '601878', '601881', '601901', '601990','601995']# pre_dir 十大流通股东文件存储的文件夹pre_dir = r'D:/temp000/000/'results_list = []for code in code_list:print(code)file_name = f"流通股东_{code}.xlsx"file_path = pre_dir + file_namedf = pd.read_excel(file_path,engine='openpyxl')# 取最新的一次十大流通股东df = df.iloc[0:10].copy()holders = df['股东名称'].values.tolist()holders_str = ';'.join(holders)date = df.iloc[0]['截止日期']results_list.append({'ticker':code,'date':date,'circulation_holders':holders_str})passres_df = pd.DataFrame(results_list)res_df.to_excel('D:/temp003/holders_000.xlsx',engine='openpyxl')pass
执行结果
4 把三个结果按列合并
def method_004():one_path = r'D:/temp003/industry_000.xlsx'two_path = r'D:/temp003/control_000.xlsx'three_path = r'D:/temp003/holders_000.xlsx'df_one = pd.read_excel(one_path,engine='openpyxl',converters={'ticker00':str})df_two = pd.read_excel(two_path,engine='openpyxl',converters={'证券代码':str})df_three = pd.read_excel(three_path,engine='openpyxl',converters={'ticker':str})df00 = pd.merge(df_one,df_two,how='inner',left_on='ticker00',right_on='证券代码')df01 = pd.merge(df00,df_three,how='inner',left_on='ticker00',right_on='ticker')df01.to_excel(r'D:/temp003/res_000.xlsx',engine='openpyxl')pass
最终结果
把数据都合并到一个表后,就可以进行下一步的分析了。