Contents
Prerequisites:
Steps:
step one Install the package
step two Get the stock codes
step three Write the code
step four Convert page URLs to PDF URLs
Video
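Prerequisites:
1 Posts in this series have titles starting with "PyQt6实例_批量下载pdf工具" (PyQt6 example: batch PDF download tool) and are filed under the 【PyQt6实例】 column.
2 This post covers batch-collecting PDF URLs. It involves no PyQt6 knowledge; it is one step in building the batch PDF download tool.
3 The batch PDF download tool example uses downloading PDF files from Cninfo (巨潮) as its scenario, so the PDF URLs are collected from Cninfo.
4 This series is also recorded as videos on Bilibili, with links posted at the end of the post. I still recommend reading the post first and watching the videos only for the parts that remain unclear; that is more efficient and saves time.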
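Steps:
step one Install the package
1 Create a new project and a virtual environment
2 Install the package: pip install akshare
step two Get the stock codes
Open Tongdaxin (通达信) -> Quotes (行情) -> A-shares (A股) -> type "34" and press Enter
Take the stock code column (代码) and save it to a txt file, one code per line. The code in step three reads it from ./stock_ticker.txt.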
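A column exported by hand can pick up a header line, blank lines, or duplicates. The following is a minimal sketch of a cleanup step, assuming every A-share code is exactly six digits; the function name parse_tickers and the sample text are made up for illustration and are not part of the original tool.

```python
import re
from typing import List


def parse_tickers(text: str) -> List[str]:
    """Return the 6-digit codes found one per line, in order, without duplicates."""
    seen = set()
    tickers = []
    # Drop a UTF-8 byte order mark if the editor wrote one at the start of the file.
    for line in text.lstrip("\ufeff").splitlines():
        code = line.strip()
        # Assumption: a valid code is exactly six digits; everything else is skipped.
        if re.fullmatch(r"\d{6}", code) and code not in seen:
            seen.add(code)
            tickers.append(code)
    return tickers


# Made-up sample: a header line, a blank line, and a repeated code.
sample = "代码\n000001\n600000\n\n000001\n"
print(parse_tickers(sample))
```

To use it on the real file, pass in the text read from stock_ticker.txt and write the cleaned list back, one code per line.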
step three Write the code
import akshare as ak
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime

# Values accepted by the category parameter:
# {'年报', '半年报', '一季报', '三季报', '业绩预告', '权益分派',
# '董事会', '监事会', '股东大会', '日常经营', '公司治理', '中介报告',
# '首发', '增发', '股权激励', '配股', '解禁', '公司债', '可转债', '其他融资',
# '股权变动', '补充更正', '澄清致歉', '风险提示', '特别处理和退市', '退市整理期'}


def req_from_ak(thread_num: int, stock_ticker_list: list):
    category_str = '权益分派'
    end_date_str = '20250329'
    pre_dir = r'E:/temp003/'
    print(f'thread {thread_num} start.')
    for symbol_str in stock_ticker_list:
        try:
            df = ak.stock_zh_a_disclosure_report_cninfo(symbol=symbol_str, market="沪深京", category=category_str, start_date="20000101", end_date=end_date_str)
            # One Excel file per stock code
            df.to_excel(pre_dir + symbol_str + '.xlsx', engine='openpyxl')
        except Exception:
            # Print the code that failed so it can be retried later
            print(symbol_str)
    # %S is seconds (the lowercase %s is not a portable strftime code)
    print(f'thread {thread_num} execute end. {datetime.now().strftime("%Y-%m-%d %H:%M:%S")}')


def start_execute():
    with open('./stock_ticker.txt', mode='r', encoding='utf-8') as fr:
        contents = fr.read()
    # One code per line; skip blank lines such as a trailing newline
    stock_ticker_list = [line.strip() for line in contents.split('\n') if line.strip()]
    print(len(stock_ticker_list))
    thread_count = 5
    interval = len(stock_ticker_list) // thread_count
    if interval == 0:
        # Fewer codes than threads: a single thread handles all of them
        thread_count = 1
    params_list = []
    thread_num_list = []
    for i in range(0, thread_count):
        if i == thread_count - 1:
            # The last thread also takes the remainder
            pre_list = stock_ticker_list[i * interval:]
        else:
            pre_list = stock_ticker_list[i * interval:i * interval + interval]
        thread_num_list.append(i)
        params_list.append(pre_list)
    with ThreadPoolExecutor() as executor:
        executor.map(req_from_ak, thread_num_list, params_list)
    # Leaving the with block waits for every task to finish
    print('All thread pool tasks have finished.')


if __name__ == '__main__':
    start_execute()
Using multiple threads makes the fetching faster.
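The way step three divides the code list among threads can be tried on its own, without calling akshare. The sketch below repeats the same slicing rule (equal groups, with the last group taking the remainder) and feeds the groups to ThreadPoolExecutor.map. The names split_into_groups and fake_worker and the twelve sample codes are made up for illustration; fake_worker only stands in for the real download function.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def split_into_groups(items: List[str], group_count: int) -> List[List[str]]:
    """Split items into consecutive groups; the last group takes the remainder."""
    interval = len(items) // group_count
    if interval == 0:
        # Fewer items than groups: fall back to a single group.
        return [list(items)]
    groups = []
    for i in range(group_count):
        start = i * interval
        if i == group_count - 1:
            groups.append(items[start:])
        else:
            groups.append(items[start:start + interval])
    return groups


def fake_worker(thread_num: int, group: List[str]) -> str:
    """Hypothetical stand-in for the real per-thread download function."""
    return f"thread {thread_num}: {len(group)} codes"


codes = [f"{n:06d}" for n in range(1, 13)]  # 12 made-up codes
groups = split_into_groups(codes, 5)
with ThreadPoolExecutor() as executor:
    # map pairs the i-th thread number with the i-th group and keeps that order.
    results = list(executor.map(fake_worker, range(len(groups)), groups))
for line in results:
    print(line)
```

With 12 codes and 5 groups the interval is 2, so the first four groups hold 2 codes each and the last one holds the remaining 4.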
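In the resulting Excel files, the 公告链接 (announcement link) column is the one used in the next step.
step four Convert page URLs to PDF URLs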
import os
import pandas as pd


def trans_url_to_pdfurl():
    pre_dir = r'E:/temp003/'  # Excel files written in step three
    tar_dir = r'E:/temp005/'  # one txt file of PDF URLs per stock code
    file_list = os.listdir(pre_dir)
    for file_one in file_list:
        # The first six characters of the file name are the stock code
        ticker = file_one[0:6]
        pre_file_path = pre_dir + file_one
        df = pd.read_excel(pre_file_path, engine='openpyxl')
        url_list = df['公告链接'].to_list()
        pdf_url_list = []
        for u_one in url_list:
            u_one_00 = u_one.split('&')
            # The second query parameter is announcementId
            node_00 = u_one_00[1].replace('announcementId=', '')
            # The last query parameter is announcementTime; keep only the date part
            node_01 = u_one_00[-1].replace('announcementTime=', '')
            node_01 = node_01[0:10]
            tar_node = f'http://static.cninfo.com.cn/finalpage/{node_01}/{node_00}.PDF'
            pdf_url_list.append(tar_node)
        pdf_url_list_str = '\n'.join(pdf_url_list)
        with open(tar_dir + ticker + '.txt', mode='w', encoding='utf-8') as fw:
            fw.write(pdf_url_list_str)


if __name__ == '__main__':
    trans_url_to_pdfurl()
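The conversion in step four relies on the position of each query parameter in the link. As an alternative sketch, the parameters can be looked up by name with the standard library's urllib.parse, which keeps working if their order changes. The function name to_pdf_url and the sample link are made up for illustration; the sample is only shaped like the links step four expects and is not a real announcement.

```python
from urllib.parse import parse_qs, urlparse


def to_pdf_url(detail_url: str) -> str:
    """Build the static PDF URL from an announcement detail link."""
    query = parse_qs(urlparse(detail_url).query)
    announcement_id = query["announcementId"][0]
    # Keep only the YYYY-MM-DD part, as the step-four code does.
    announcement_date = query["announcementTime"][0][:10]
    return f"http://static.cninfo.com.cn/finalpage/{announcement_date}/{announcement_id}.PDF"


# Made-up link with the parameter layout assumed by step four.
sample_url = (
    "http://www.cninfo.com.cn/new/disclosure/detail"
    "?stockCode=000001&announcementId=1234567890"
    "&orgId=gssz0000001&announcementTime=2025-03-15"
)
print(to_pdf_url(sample_url))
```

A link that lacks either parameter raises KeyError here, which makes a malformed row easy to spot instead of silently producing a wrong URL.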
At this point, the PDF URLs that the batch PDF download tool will download from are ready.
Video
https://www.bilibili.com/video/BV1ASZwYhEGn/
https://www.bilibili.com/video/BV1oEZwYDE6N/
https://www.bilibili.com/video/BV1wuZwYZEJe/
https://www.bilibili.com/video/BV1XtZwYyEo4/