[Alphalens] Dual Moving Average Factor Analysis with Alphalens and Akshare, with Source Code and Common Problems

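Alphalens is a very well-known Python library for factor analysis. However, it is no longer actively maintained and has many problems. The current recommendation is to use alphalens-reloaded instead, available at stefan-jansen/alphalens-reloaded: Performance analysis of predictive (alpha) stock factors (github.com).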

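The library's demos are all based on yfinance, the interface to Yahoo Finance. Using the library with akshare, a data source for the Chinese market, runs into several problems, and solving them requires knowing the Alphalens interface well. I recommend reading the docstrings in the original source, especially the one for get_clean_factor_and_forward_returns.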

def get_clean_factor_and_forward_returns(factor,
                                         prices,
                                         groupby=None,
                                         binning_by_group=False,
                                         quantiles=5,
                                         bins=None,
                                         periods=(1, 5, 10),
                                         filter_zscore=20,
                                         groupby_labels=None,
                                         max_loss=0.35,
                                         zero_aware=False,
                                         cumulative_returns=True):
    """
    Formats the factor data, pricing data, and group mappings into a DataFrame
    that contains aligned MultiIndex indices of timestamp and asset. The
    returned data will be formatted to be suitable for Alphalens functions.

    It is safe to skip a call to this function and still make use of Alphalens
    functionalities as long as the factor data conforms to the format returned
    from get_clean_factor_and_forward_returns and documented here

    Parameters
    ----------
    factor : pd.Series - MultiIndex
        A MultiIndex Series indexed by timestamp (level 0) and asset
        (level 1), containing the values for a single alpha factor.
        ::
            -----------------------------------
                date    |    asset   |
            -----------------------------------
                        |   AAPL     |   0.5
                        -----------------------
                        |   BA       |  -1.1
                        -----------------------
            2014-01-01  |   CMG      |   1.7
                        -----------------------
                        |   DAL      |  -0.1
                        -----------------------
                        |   LULU     |   2.7
                        -----------------------

    prices : pd.DataFrame
        A wide form Pandas DataFrame indexed by timestamp with assets
        in the columns.
        Pricing data must span the factor analysis time period plus an
        additional buffer window that is greater than the maximum number
        of expected periods in the forward returns calculations.
        It is important to pass the correct pricing data in depending on
        what time of period your signal was generated so to avoid lookahead
        bias, or  delayed calculations.
        'Prices' must contain at least an entry for each timestamp/asset
        combination in 'factor'. This entry should reflect the buy price
        for the assets and usually it is the next available price after the
        factor is computed but it can also be a later price if the factor is
        meant to be traded later (e.g. if the factor is computed at market
        open but traded 1 hour after market open the price information should
        be 1 hour after market open).
        'Prices' must also contain entries for timestamps following each
        timestamp/asset combination in 'factor', as many more timestamps
        as the maximum value in 'periods'. The asset price after 'period'
        timestamps will be considered the sell price for that asset when
        computing 'period' forward returns.
        ::
            ----------------------------------------------------
                        | AAPL |  BA  |  CMG  |  DAL  |  LULU  |
            ----------------------------------------------------
               Date     |      |      |       |       |        |
            ----------------------------------------------------
            2014-01-01  |605.12| 24.58|  11.72| 54.43 |  37.14 |
            ----------------------------------------------------
            2014-01-02  |604.35| 22.23|  12.21| 52.78 |  33.63 |
            ----------------------------------------------------
            2014-01-03  |607.94| 21.68|  14.36| 53.94 |  29.37 |
            ----------------------------------------------------

    groupby : pd.Series - MultiIndex or dict
        Either A MultiIndex Series indexed by date and asset,
        containing the period wise group codes for each asset, or
        a dict of asset to group mappings. If a dict is passed,
        it is assumed that group mappings are unchanged for the
        entire time period of the passed factor data.
    binning_by_group : bool
        If True, compute quantile buckets separately for each group.
        This is useful when the factor values range vary considerably
        across gorups so that it is wise to make the binning group relative.
        You should probably enable this if the factor is intended
        to be analyzed for a group neutral portfolio
    quantiles : int or sequence[float]
        Number of equal-sized quantile buckets to use in factor bucketing.
        Alternately sequence of quantiles, allowing non-equal-sized buckets
        e.g. [0, .10, .5, .90, 1.] or [.05, .5, .95]
        Only one of 'quantiles' or 'bins' can be not-None
    bins : int or sequence[float]
        Number of equal-width (valuewise) bins to use in factor bucketing.
        Alternately sequence of bin edges allowing for non-uniform bin width
        e.g. [-4, -2, -0.5, 0, 10]
        Chooses the buckets to be evenly spaced according to the values
        themselves. Useful when the factor contains discrete values.
        Only one of 'quantiles' or 'bins' can be not-None
    periods : sequence[int]
        periods to compute forward returns on.
    filter_zscore : int or float, optional
        Sets forward returns greater than X standard deviations
        from the the mean to nan. Set it to 'None' to avoid filtering.
        Caution: this outlier filtering incorporates lookahead bias.
    groupby_labels : dict
        A dictionary keyed by group code with values corresponding
        to the display name for each group.
    max_loss : float, optional
        Maximum percentage (0.00 to 1.00) of factor data dropping allowed,
        computed comparing the number of items in the input factor index and
        the number of items in the output DataFrame index.
        Factor data can be partially dropped due to being flawed itself
        (e.g. NaNs), not having provided enough price data to compute
        forward returns for all factor values, or because it is not possible
        to perform binning.
        Set max_loss=0 to avoid Exceptions suppression.
    zero_aware : bool, optional
        If True, compute quantile buckets separately for positive and negative
        signal values. This is useful if your signal is centered and zero is
        the separation between long and short signals, respectively.
    cumulative_returns : bool, optional
        If True, forward returns columns will contain cumulative returns.
        Setting this to False is useful if you want to analyze how predictive
        a factor is for a single forward day.

    Returns
    -------
    merged_data : pd.DataFrame - MultiIndex
        A MultiIndex Series indexed by date (level 0) and asset (level 1),
        containing the values for a single alpha factor, forward returns for
        each period, the factor quantile/bin that factor value belongs to, and
        (optionally) the group the asset belongs to.
        - forward returns column names follow  the format accepted by
          pd.Timedelta (e.g. '1D', '30m', '3h15m', '1D1h', etc)
        - 'date' index freq property (merged_data.index.levels[0].freq) will be
          set to a trading calendar (pandas DateOffset) inferred from the input
          data (see infer_trading_calendar for more details). This is currently
          used only in cumulative returns computation
        ::
           -------------------------------------------------------------------
                      |       | 1D  | 5D  | 10D  |factor|group|factor_quantile
           -------------------------------------------------------------------
               date   | asset |     |     |      |      |     |
           -------------------------------------------------------------------
                      | AAPL  | 0.09|-0.01|-0.079|  0.5 |  G1 |      3
                      --------------------------------------------------------
                      | BA    | 0.02| 0.06| 0.020| -1.1 |  G2 |      5
                      --------------------------------------------------------
           2014-01-01 | CMG   | 0.03| 0.09| 0.036|  1.7 |  G2 |      1
                      --------------------------------------------------------
                      | DAL   |-0.02|-0.06|-0.029| -0.1 |  G3 |      5
                      --------------------------------------------------------
                      | LULU  |-0.03| 0.05|-0.009|  2.7 |  G1 |      2
                      --------------------------------------------------------

    See Also
    --------
    utils.get_clean_factor
        For use when forward returns are already available.
    """
    forward_returns = compute_forward_returns(
        factor,
        prices,
        periods,
        filter_zscore,
        cumulative_returns,
    )

    factor_data = get_clean_factor(factor, forward_returns, groupby=groupby,
                                   groupby_labels=groupby_labels,
                                   quantiles=quantiles, bins=bins,
                                   binning_by_group=binning_by_group,
                                   max_loss=max_loss, zero_aware=zero_aware)

    return factor_data
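To make the two input shapes described in the docstring concrete, here is a minimal pandas-only sketch that turns a long table into a MultiIndex factor Series and a wide prices DataFrame. The helper name to_alphalens_inputs and the column names date, asset, close and factor are assumptions made for illustration, and the factor values on the second day are made up.

```python
import pandas as pd


def to_alphalens_inputs(long_df):
    """Split a long table into (factor, prices).

    Assumes hypothetical columns: date, asset, close, factor.
    """
    out = long_df.copy()
    # Make the dates timezone-aware (UTC) before they become index levels
    out["date"] = pd.to_datetime(out["date"]).dt.tz_localize("UTC")
    # factor: Series indexed by (date, asset)
    factor = out.set_index(["date", "asset"])["factor"].sort_index()
    # prices: one row per date, one column per asset
    prices = out.pivot(index="date", columns="asset", values="close").sort_index()
    return factor, prices


demo = pd.DataFrame({
    "date": ["2014-01-01", "2014-01-01", "2014-01-02", "2014-01-02"],
    "asset": ["AAPL", "BA", "AAPL", "BA"],
    "close": [605.12, 24.58, 604.35, 22.23],
    "factor": [0.5, -1.1, 0.3, -0.9],
})
factor, prices = to_alphalens_inputs(demo)
print(factor)
print(prices)
```

In real use the prices table also needs rows after the last factor date, at least as many as the largest value in periods, as the docstring points out.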

Source code

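The code fetches data for the A-share stock 600519 with Akshare and then runs the most basic factor analysis with alphalens-reloaded. The factor is the crossover of the 5-day and 10-day moving averages, computed as their difference. The code is as follows: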

import warnings
warnings.filterwarnings('ignore')

import pandas as pd
import alphalens
import seaborn as sns
import akshare as ak

# %matplotlib inline
sns.set_style('white')
# pd.set_option('display.max_columns', None)
# pd.set_option('display.max_rows', None)

# Fetch daily forward-adjusted (qfq) bars with akshare's stock_zh_a_hist function
df = ak.stock_zh_a_hist(symbol='600519', period="daily", start_date='20200101', end_date='20201231', adjust="qfq")

# Rename the Chinese column names
df.rename(columns={
    '日期': 'date',
    '开盘': 'open',
    '收盘': 'close',
    '最高': 'high',
    '最低': 'low',
    '成交量': 'volume'
}, inplace=True)
df['asset'] = '600519'

# Factor: 5-day moving average of the close minus the 10-day moving average
# df['factor'] = df['close']
df['ma5'] = df['close'].rolling(window=5).mean().fillna(0)
df['ma10'] = df['close'].rolling(window=10).mean().fillna(0)
df['factor'] = df['ma5'] - df['ma10']
# Skip the first 20 rows, which include the warm-up rows filled with 0
df = df.iloc[20:].copy()
df.head(30)

# Work on a copy (dff) so the original df is not affected
dff = df.copy()
dff['date'] = pd.to_datetime(dff['date'])
dff = dff.set_index(['date', 'asset'])
# Localize the date level of the index to UTC
dff.index = dff.index.set_levels([dff.index.levels[0].tz_localize('UTC'), dff.index.levels[1]])
factor = dff['factor']
# factor.head()
# print(factor)

# Convert the date column to datetime with the UTC timezone
df['date'] = pd.to_datetime(df['date']).dt.tz_localize('UTC')
df.set_index(['date', 'asset'], inplace=True)

# Select the 'close' column to create the wide prices DataFrame
prices = df['close'].unstack('asset')
prices.head()
print(prices.index.tz)
print(factor.index.levels[0].tz)
# print(prices)

# Optionally align factor and prices
# factor, prices = factor.align(prices, join='inner', axis=0)

factor_data = alphalens.utils.get_clean_factor_and_forward_returns(
    factor,
    prices,
    groupby=None,
    binning_by_group=False,
    quantiles=2,
    bins=None,
    periods=(1, 5, 10),
    filter_zscore=20,
    groupby_labels=None,
    max_loss=0.35,
    zero_aware=True,
    cumulative_returns=True,
)
# factor_data.head()

alphalens.tears.create_full_tear_sheet(factor_data, long_short=False)
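The factor logic can be checked offline, without a network call to akshare. The following is a minimal sketch on synthetic prices; the helper name ma_cross_factor is hypothetical, and unlike the script above it leaves the warm-up rows as NaN instead of filling them with 0.

```python
import pandas as pd


def ma_cross_factor(close, fast=5, slow=10):
    """Fast moving average minus slow moving average.

    Values stay NaN until `slow` bars are available.
    """
    return close.rolling(fast).mean() - close.rolling(slow).mean()


# Synthetic, steadily rising closing prices: 1.0, 2.0, ..., 30.0
close = pd.Series(range(1, 31), dtype=float)
factor = ma_cross_factor(close)
print(factor.dropna().head())
```

On a steadily rising series the fast average sits above the slow one, so every defined factor value is positive; a falling series gives negative values. That sign split is what the two-bucket setting (quantiles=2 with zero_aware=True) in the script relies on.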

The result is shown in the figure:

Common errors

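  1. AttributeError: 'Index' object has no attribute 'tz'
    A timezone problem. Data from overseas sources carries a timezone by default, whereas data from Chinese sources such as tushare and akshare needs the timezone added manually. See how the source code above handles it.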

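  2. MaxLossExceededError: max_loss (35.0%) exceeded 100.0%, consider increasing it.
    get_clean_factor_and_forward_returns defaults max_loss to 35.0%, and you can also set it yourself. This error first appeared with the default quantiles=5; changing the quantiles argument to 2 resolves it, because this factor splits naturally into two classes, positive and negative values.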

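  3. Inferred frequency None from passed values does not conform to passed frequency C
    A frequency problem. It may be caused by NaN values in part of the data. Syncing the data, for example by aligning the factor with the prices, can resolve it.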
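The three fixes above can be sketched as small pandas-only helpers. This is an illustration under stated assumptions, not Alphalens's own code: the helper names ensure_utc, can_bin and sync_factor_to_prices are hypothetical, the demo data is synthetic, and the second helper assumes that the binning step relies on pd.qcut applied to each date's factor values.

```python
import pandas as pd


def ensure_utc(index):
    """Error 1: turn strings or naive datetimes into a UTC DatetimeIndex."""
    idx = pd.DatetimeIndex(pd.to_datetime(index))
    return idx.tz_localize("UTC") if idx.tz is None else idx.tz_convert("UTC")


def can_bin(values, q):
    """Error 2: report whether pd.qcut can split `values` into q buckets."""
    try:
        pd.qcut(pd.Series(values), q, labels=False)
        return True
    except ValueError:
        return False


def sync_factor_to_prices(factor, prices):
    """Error 3: drop NaN factor values and factor dates with no price row."""
    factor = factor.dropna()
    dates = factor.index.get_level_values(0)
    return factor[dates.isin(prices.index)]


# Synthetic demo data; the dates and values are made up for illustration.
raw_dates = pd.Index(["2020-02-10", "2020-02-11", "2020-02-12", "2020-02-13"])
print(hasattr(raw_dates, "tz"))   # a plain Index of strings has no tz attribute
dates = ensure_utc(raw_dates)
print(dates.tz)

print(can_bin([12.3], 5))             # one asset gives one factor value per date
print(can_bin(list(range(10)), 5))    # ten distinct values split into five buckets

factor = pd.Series(
    [0.1, float("nan"), 0.3, 0.4],
    index=pd.MultiIndex.from_product([dates, ["600519"]], names=["date", "asset"]),
)
prices = pd.DataFrame({"600519": [100.0, 101.0, 102.0]}, index=dates[:3])
print(sync_factor_to_prices(factor, prices))
```

The prices table is left untouched on purpose, so the rows needed for forward returns after the last factor date are kept. Whether this kind of syncing clears the frequency error depends on the data.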

If you have questions, feel free to leave a comment or send a private message.
