Introduction
It seemed interesting to build a machine learning model that tries to predict Apple's stock price through linear regression analysis. For this I used PyCaret, an open-source, low-code machine learning library for Python that automates machine learning workflows and is well suited to machine learning beginners like me.
Linear Regression Analysis
Linear regression analysis is used to predict the value of one variable based on the value of another variable, such as a stock's closing price.
In this project, machine learning is used through PyCaret's regression module to create, test, and select the best regression model for the dataset. For more background on machine learning and how it affects our daily lives, see the article published on the MIT Sloan School website, which explores the subject in depth.
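To make the idea concrete, here is a minimal, hypothetical sketch (not part of the original notebook) that fits a one-variable linear regression with scikit-learn, predicting a closing price from an opening price; the sample values are made up for illustration:

```python
# Hypothetical toy example: predict Close from Open with a one-variable linear regression
import numpy as np
from sklearn.linear_model import LinearRegression

open_prices = np.array([[23.87], [24.32], [25.09], [25.08], [24.81]])  # made-up opening prices
close_prices = np.array([24.00, 24.92, 24.94, 24.33, 24.55])           # made-up closing prices

model = LinearRegression().fit(open_prices, close_prices)
print(model.predict([[24.50]]))  # predicted closing price for an opening price of 24.50
```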
Obtaining the Data
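The original notebook's import cell is not shown; a sketch of the imports assumed throughout, inferred from the aliases used in the code below (`pd`, `pp`, `go`, `px`):

```python
# Imports assumed throughout the notebook (inferred from the aliases used below)
import pandas as pd
import pandas_profiling as pp        # newer releases ship under the name ydata-profiling
import plotly.graph_objects as go
import plotly.express as px
```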
# Obtaining data
aapl = pd.read_csv('./data/aapl.csv') # Last 10 years
aapl = aapl.set_index('Date')
# See dataframe
aapl.head(10)
Date | Open | High | Low | Close | Volume | Dividends | Stock Splits
---|---|---|---|---|---|---|---
2015-01-07 00:00:00-05:00 | 23.872833 | 24.095527 | 23.761486 | 23.995316 | 160423600 | 0.0 | 0.0 |
2015-01-08 00:00:00-05:00 | 24.324901 | 24.975168 | 24.206871 | 24.917267 | 237458000 | 0.0 | 0.0 |
2015-01-09 00:00:00-05:00 | 25.090962 | 25.220125 | 24.543135 | 24.943985 | 214798000 | 0.0 | 0.0 |
2015-01-12 00:00:00-05:00 | 25.075387 | 25.082067 | 24.229149 | 24.329361 | 198603200 | 0.0 | 0.0 |
2015-01-13 00:00:00-05:00 | 24.814830 | 25.119922 | 24.253641 | 24.545370 | 268367600 | 0.0 | 0.0 |
2015-01-14 00:00:00-05:00 | 24.282595 | 24.605501 | 24.162340 | 24.451843 | 195826400 | 0.0 | 0.0 |
2015-01-15 00:00:00-05:00 | 24.496380 | 24.509741 | 23.752582 | 23.788212 | 240056000 | 0.0 | 0.0 |
2015-01-16 00:00:00-05:00 | 23.834977 | 23.957459 | 23.427446 | 23.603374 | 314053200 | 0.0 | 0.0 |
2015-01-20 00:00:00-05:00 | 24.015355 | 24.267000 | 23.716945 | 24.211327 | 199599600 | 0.0 | 0.0 |
2015-01-21 00:00:00-05:00 | 24.262544 | 24.732429 | 24.111112 | 24.396162 | 194303600 | 0.0 | 0.0 |
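The article does not show how `aapl.csv` was produced. A plausible sketch, assuming the file was exported from Yahoo Finance via the yfinance package (whose daily history yields exactly these Open/High/Low/Close/Volume/Dividends/Stock Splits columns):

```python
# Assumed data-export step (not shown in the original article)
import yfinance as yf

aapl_raw = yf.Ticker("AAPL").history(period="10y")  # last 10 years of daily bars
aapl_raw.to_csv("./data/aapl.csv")
```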
Exploratory Analysis
# Using Pandas Profiling to generate a report on our dataframe
Profile_1 = pp.ProfileReport(aapl)
Profile_1.to_file("Report1.html")
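Note that pandas-profiling has since been renamed to ydata-profiling; in newer environments the equivalent call would look roughly like this:

```python
# Equivalent report generation with the renamed package (ydata-profiling)
from ydata_profiling import ProfileReport

ProfileReport(aapl).to_file("Report1.html")
```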
Dropping Columns
# Removing 'Dividends' and 'Stock Splits' columns
aapl = aapl.drop(['Dividends','Stock Splits'], axis = 1)
# See results
aapl.head(10)
Date | Open | High | Low | Close | Volume
---|---|---|---|---|---
2015-01-07 00:00:00-05:00 | 23.872833 | 24.095527 | 23.761486 | 23.995316 | 160423600 |
2015-01-08 00:00:00-05:00 | 24.324901 | 24.975168 | 24.206871 | 24.917267 | 237458000 |
2015-01-09 00:00:00-05:00 | 25.090962 | 25.220125 | 24.543135 | 24.943985 | 214798000 |
2015-01-12 00:00:00-05:00 | 25.075387 | 25.082067 | 24.229149 | 24.329361 | 198603200 |
2015-01-13 00:00:00-05:00 | 24.814830 | 25.119922 | 24.253641 | 24.545370 | 268367600 |
2015-01-14 00:00:00-05:00 | 24.282595 | 24.605501 | 24.162340 | 24.451843 | 195826400 |
2015-01-15 00:00:00-05:00 | 24.496380 | 24.509741 | 23.752582 | 23.788212 | 240056000 |
2015-01-16 00:00:00-05:00 | 23.834977 | 23.957459 | 23.427446 | 23.603374 | 314053200 |
2015-01-20 00:00:00-05:00 | 24.015355 | 24.267000 | 23.716945 | 24.211327 | 199599600 |
2015-01-21 00:00:00-05:00 | 24.262544 | 24.732429 | 24.111112 | 24.396162 | 194303600 |
Adding Simple Moving Averages
To increase the number of features for PyCaret's regression models to analyze, I will add two simple moving averages of the target variable (the closing price), in the expectation that they will improve the model's accuracy.
# Adding two simple moving averages in order to increase the number of features to be analyzed by PyCaret Regression models
aapl['SMA7'] = aapl.Close.rolling(window=7).mean().round(2)
aapl['SMA30'] = aapl.Close.rolling(window=30).mean().round(2)
# See results
aapl.head(10)
Date | Open | High | Low | Close | Volume | SMA7 | SMA30
---|---|---|---|---|---|---|---
2015-01-07 00:00:00-05:00 | 23.872833 | 24.095527 | 23.761486 | 23.995316 | 160423600 | NaN | NaN |
2015-01-08 00:00:00-05:00 | 24.324901 | 24.975168 | 24.206871 | 24.917267 | 237458000 | NaN | NaN |
2015-01-09 00:00:00-05:00 | 25.090962 | 25.220125 | 24.543135 | 24.943985 | 214798000 | NaN | NaN |
2015-01-12 00:00:00-05:00 | 25.075387 | 25.082067 | 24.229149 | 24.329361 | 198603200 | NaN | NaN |
2015-01-13 00:00:00-05:00 | 24.814830 | 25.119922 | 24.253641 | 24.545370 | 268367600 | NaN | NaN |
2015-01-14 00:00:00-05:00 | 24.282595 | 24.605501 | 24.162340 | 24.451843 | 195826400 | NaN | NaN |
2015-01-15 00:00:00-05:00 | 24.496380 | 24.509741 | 23.752582 | 23.788212 | 240056000 | 24.42 | NaN |
2015-01-16 00:00:00-05:00 | 23.834977 | 23.957459 | 23.427446 | 23.603374 | 314053200 | 24.37 | NaN |
2015-01-20 00:00:00-05:00 | 24.015355 | 24.267000 | 23.716945 | 24.211327 | 199599600 | 24.27 | NaN |
2015-01-21 00:00:00-05:00 | 24.262544 | 24.732429 | 24.111112 | 24.396162 | 194303600 | 24.19 | NaN |
# Generating new report with pandas_profiling
Profile_2 = pp.ProfileReport(aapl)
Profile_2.to_file("Report2.html")
We can see that the SMA7 column has 6 missing values and the SMA30 column has 29 missing values, which is expected: those columns only have enough data to show their moving averages from the 7th and 30th trading day onward, respectively.
From the correlation matrix, we can see that both of the added moving averages are highly correlated with the open, low, high, and close prices.
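The missing-value counts and the correlation matrix above come from the pandas-profiling report; the same checks can be reproduced directly with pandas (a small sketch, not in the original article):

```python
# Reproducing the report's findings directly with pandas
print(aapl.isna().sum())        # missing-value counts per column (SMA7: 6, SMA30: 29)
print(aapl.corr()['Close'])     # correlation of each feature with the closing price
```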
Plotting a Candlestick Chart of the Last 10 Years
# Using plotly to plot a candlestick chart of the last 10 years
fig = go.Figure(data=[
    go.Candlestick(x=aapl.index, open=aapl.Open, high=aapl.High, low=aapl.Low, close=aapl.Close),
    go.Scatter(x=aapl.index, y=aapl.SMA7, line=dict(color='orange', width=1), name='SMA7'),
    go.Scatter(x=aapl.index, y=aapl.SMA30, line=dict(color='green', width=1.5), name='SMA30')
])
fig.update_layout(title='Apple stock from January 7th, 2015 to January 6th, 2025')
fig.update_layout(autosize=False, width=1200, height=800)
fig.show()
Creating a New DataFrame
A new DataFrame containing the last 2 years of data will be created; this data will later be used to test how well our prediction model can forecast the closing prices of the last 2 trading years.
# Creating a new dataframe containing the last 2 years data to later test how well our predicting model will compare to the closing prices
aapl_predict = aapl.tail(506)
# See results
aapl_predict
Date | Open | High | Low | Close | Volume | SMA7 | SMA30
---|---|---|---|---|---|---|---
2022-12-30 00:00:00-05:00 | 127.073710 | 128.597677 | 126.103905 | 128.577881 | 77034200 | 129.38 | 139.47 |
2023-01-03 00:00:00-05:00 | 128.924229 | 129.537772 | 122.877812 | 123.768448 | 112117500 | 127.91 | 138.63 |
2023-01-04 00:00:00-05:00 | 125.569520 | 127.321104 | 123.778358 | 125.045036 | 89113600 | 127.08 | 137.81 |
2023-01-05 00:00:00-05:00 | 125.807014 | 126.440353 | 123.461682 | 123.718971 | 80962700 | 126.11 | 137.05 |
2023-01-06 00:00:00-05:00 | 124.698677 | 128.934129 | 123.590330 | 128.271103 | 87754700 | 126.05 | 136.37 |
... | ... | ... | ... | ... | ... | ... | ... |
2024-12-30 00:00:00-05:00 | 252.229996 | 253.500000 | 250.750000 | 252.199997 | 35557500 | 254.94 | 243.14 |
2024-12-31 00:00:00-05:00 | 252.440002 | 253.279999 | 249.429993 | 250.419998 | 39480700 | 255.03 | 243.99 |
2025-01-02 00:00:00-05:00 | 248.929993 | 249.100006 | 241.820007 | 243.850006 | 55740700 | 253.51 | 244.52 |
2025-01-03 00:00:00-05:00 | 243.360001 | 244.179993 | 241.889999 | 243.360001 | 40244100 | 251.81 | 245.02 |
2025-01-06 00:00:00-05:00 | 244.309998 | 247.330002 | 243.199997 | 245.000000 | 45007100 | 249.92 | 245.55 |
506 rows × 7 columns
aapl_predict contains data from December 30, 2022 through January 6, 2025.
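The 506-row tail corresponds to roughly two trading years (about 253 trading days each); the covered date range can be verified, or the split expressed by date instead, with a sketch like this:

```python
# Verifying the date range of the hold-out set (≈ 2 trading years of ~253 days each)
print(aapl_predict.index.min(), aapl_predict.index.max())

# Equivalent date-based alternative to tail(506)
aapl_predict_by_date = aapl.loc['2022-12-30':]
```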
# Removing last 2 years from the original dataframe
aapl.drop(aapl_predict.index,inplace = True)
# See results
aapl.head(10)
Date | Open | High | Low | Close | Volume | SMA7 | SMA30
---|---|---|---|---|---|---|---
2015-01-07 00:00:00-05:00 | 23.872833 | 24.095527 | 23.761486 | 23.995316 | 160423600 | NaN | NaN |
2015-01-08 00:00:00-05:00 | 24.324901 | 24.975168 | 24.206871 | 24.917267 | 237458000 | NaN | NaN |
2015-01-09 00:00:00-05:00 | 25.090962 | 25.220125 | 24.543135 | 24.943985 | 214798000 | NaN | NaN |
2015-01-12 00:00:00-05:00 | 25.075387 | 25.082067 | 24.229149 | 24.329361 | 198603200 | NaN | NaN |
2015-01-13 00:00:00-05:00 | 24.814830 | 25.119922 | 24.253641 | 24.545370 | 268367600 | NaN | NaN |
2015-01-14 00:00:00-05:00 | 24.282595 | 24.605501 | 24.162340 | 24.451843 | 195826400 | NaN | NaN |
2015-01-15 00:00:00-05:00 | 24.496380 | 24.509741 | 23.752582 | 23.788212 | 240056000 | 24.42 | NaN |
2015-01-16 00:00:00-05:00 | 23.834977 | 23.957459 | 23.427446 | 23.603374 | 314053200 | 24.37 | NaN |
2015-01-20 00:00:00-05:00 | 24.015355 | 24.267000 | 23.716945 | 24.211327 | 199599600 | 24.27 | NaN |
2015-01-21 00:00:00-05:00 | 24.262544 | 24.732429 | 24.111112 | 24.396162 | 194303600 | 24.19 | NaN |
The original dataframe, which will be used to create the prediction model, now contains data from 2015-01-07 through 2022-12-29.
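A quick sanity check (a sketch, not in the original) confirms that the training data and the hold-out data do not overlap in time:

```python
# Sanity check: the training set must end before the hold-out set begins
assert aapl.index.max() < aapl_predict.index.min()
```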
# Removing NaN
aapl.dropna(inplace=True)
# See Results
aapl.head(10)
Date | Open | High | Low | Close | Volume | SMA7 | SMA30
---|---|---|---|---|---|---|---
2015-02-19 00:00:00-05:00 | 28.724684 | 28.847650 | 28.691150 | 28.717978 | 149449600 | 28.28 | 25.96 |
2015-02-20 00:00:00-05:00 | 28.755990 | 28.952736 | 28.628555 | 28.952736 | 195793600 | 28.52 | 26.12 |
2015-02-23 00:00:00-05:00 | 29.068984 | 29.735231 | 28.988498 | 29.735231 | 283896400 | 28.78 | 26.28 |
2015-02-24 00:00:00-05:00 | 29.721825 | 29.869385 | 29.326100 | 29.549673 | 276912400 | 28.96 | 26.44 |
2015-02-25 00:00:00-05:00 | 29.413290 | 29.422235 | 28.650904 | 28.793991 | 298846800 | 29.02 | 26.59 |
2015-02-26 00:00:00-05:00 | 28.793997 | 29.259030 | 28.306609 | 29.158422 | 365150000 | 29.10 | 26.74 |
2015-02-27 00:00:00-05:00 | 29.064518 | 29.191956 | 28.671030 | 28.720217 | 248059200 | 29.09 | 26.88 |
2015-03-02 00:00:00-05:00 | 28.896841 | 29.127121 | 28.684447 | 28.861069 | 192386800 | 29.11 | 27.05 |
2015-03-03 00:00:00-05:00 | 28.832006 | 28.957207 | 28.637495 | 28.921434 | 151265200 | 29.11 | 27.23 |
2015-03-04 00:00:00-05:00 | 28.863306 | 28.966148 | 28.688919 | 28.738102 | 126665200 | 28.96 | 27.38 |
Creating a Prediction Model with PyCaret
# Importing regression lib from PyCaret and setting up the experiment
from pycaret.regression import *

setup(data=aapl, target='Close', session_id=123, remove_multicollinearity=True, multicollinearity_threshold=0.9)
 | Description | Value
---|---|---
0 | Session id | 123 |
1 | Target | Close |
2 | Target type | Regression |
3 | Original data shape | (1981, 7) |
4 | Transformed data shape | (1981, 3) |
5 | Transformed train set shape | (1386, 3) |
6 | Transformed test set shape | (595, 3) |
7 | Numeric features | 6 |
8 | Preprocess | True |
9 | Imputation type | simple |
10 | Numeric imputation | mean |
11 | Categorical imputation | mode |
12 | Remove multicollinearity | True |
13 | Multicollinearity threshold | 0.900000 |
14 | Fold Generator | KFold |
15 | Fold Number | 10 |
16 | CPU Jobs | -1 |
17 | Use GPU | False |
18 | Log Experiment | False |
19 | Experiment Name | reg-default-name |
20 | USI | bff1 |
<pycaret.regression.oop.RegressionExperiment at 0x14fc6f430>
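The setup summary shows the transformed data shape dropping from 7 columns to 3, meaning the multicollinearity filter (threshold 0.9) removed most of the highly correlated price features. Depending on the PyCaret version, the surviving features can be inspected with get_config; a sketch assuming PyCaret 3.x, where the 'X_train_transformed' key is available:

```python
# Inspecting which features survived the multicollinearity filter (assumes PyCaret 3.x)
X_train_transformed = get_config('X_train_transformed')
print(X_train_transformed.columns.tolist())
```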
# Obtaining top 3 best models
top3 = compare_models(n_select = 3)
 | Model | MAE | MSE | RMSE | R2 | RMSLE | MAPE | TT (Sec)
---|---|---|---|---|---|---|---|---
lr | Linear Regression | 0.5492 | 0.9135 | 0.9472 | 0.9996 | 0.0100 | 0.0072 | 0.8300 |
en | Elastic Net | 0.5479 | 0.9128 | 0.9469 | 0.9996 | 0.0100 | 0.0071 | 0.0090 |
lar | Least Angle Regression | 0.5492 | 0.9135 | 0.9472 | 0.9996 | 0.0100 | 0.0072 | 0.2030 |
llar | Lasso Least Angle Regression | 0.5479 | 0.9128 | 0.9469 | 0.9996 | 0.0100 | 0.0071 | 0.2100 |
br | Bayesian Ridge | 0.5491 | 0.9135 | 0.9472 | 0.9996 | 0.0100 | 0.0072 | 0.0090 |
lasso | Lasso Regression | 0.5479 | 0.9128 | 0.9469 | 0.9996 | 0.0100 | 0.0071 | 0.2210 |
ridge | Ridge Regression | 0.5491 | 0.9135 | 0.9472 | 0.9996 | 0.0100 | 0.0072 | 0.0090 |
rf | Random Forest Regressor | 0.6098 | 1.1406 | 1.0606 | 0.9995 | 0.0104 | 0.0075 | 0.0530 |
et | Extra Trees Regressor | 0.6197 | 1.2088 | 1.0921 | 0.9995 | 0.0104 | 0.0075 | 0.0420 |
gbr | Gradient Boosting Regressor | 0.6332 | 1.1324 | 1.0585 | 0.9995 | 0.0110 | 0.0083 | 0.0290 |
lightgbm | Light Gradient Boosting Machine | 0.6249 | 1.1827 | 1.0797 | 0.9995 | 0.0109 | 0.0080 | 0.7900 |
xgboost | Extreme Gradient Boosting | 0.7092 | 1.5144 | 1.2141 | 0.9994 | 0.0126 | 0.0091 | 0.0200 |
catboost | CatBoost Regressor | 0.7079 | 1.5438 | 1.2157 | 0.9994 | 0.0130 | 0.0095 | 0.2650 |
dt | Decision Tree Regressor | 0.7477 | 1.7109 | 1.3031 | 0.9993 | 0.0131 | 0.0094 | 0.0080 |
ada | AdaBoost Regressor | 1.9198 | 5.8222 | 2.4085 | 0.9976 | 0.0616 | 0.0440 | 0.0240 |
omp | Orthogonal Matching Pursuit | 40.9516 | 2098.7460 | 45.7906 | 0.1216 | 0.6953 | 0.8042 | 0.0100 |
dummy | Dummy Regressor | 43.5007 | 2398.9002 | 48.9572 | -0.0028 | 0.7055 | 0.8631 | 0.0110 |
knn | K Neighbors Regressor | 40.2368 | 2399.9460 | 48.9487 | -0.0047 | 0.6900 | 0.7649 | 0.0170 |
huber | Huber Regressor | 49.0278 | 4457.0476 | 66.7169 | -0.8638 | 1.0082 | 0.6933 | 0.0090 |
par | Passive Aggressive Regressor | 84.8315 | 13155.8677 | 108.9942 | -4.5973 | 1.1798 | 2.0757 | 0.0080 |
Creating the Models
ridge = create_model('ridge',fold = 10)
Fold | MAE | MSE | RMSE | R2 | RMSLE | MAPE
---|---|---|---|---|---|---
0 | 0.6455 | 1.2438 | 1.1153 | 0.9995 | 0.0108 | 0.0079 |
1 | 0.5131 | 0.6515 | 0.8072 | 0.9997 | 0.0094 | 0.0072 |
2 | 0.5559 | 0.8599 | 0.9273 | 0.9997 | 0.0107 | 0.0074 |
3 | 0.5645 | 1.1979 | 1.0945 | 0.9995 | 0.0104 | 0.0071 |
4 | 0.5961 | 1.2698 | 1.1269 | 0.9995 | 0.0113 | 0.0078 |
5 | 0.4975 | 0.5994 | 0.7742 | 0.9998 | 0.0081 | 0.0064 |
6 | 0.5088 | 0.6393 | 0.7996 | 0.9997 | 0.0092 | 0.0066 |
7 | 0.5078 | 0.8698 | 0.9326 | 0.9996 | 0.0087 | 0.0065 |
8 | 0.5362 | 1.0269 | 1.0134 | 0.9995 | 0.0109 | 0.0068 |
9 | 0.5661 | 0.7767 | 0.8813 | 0.9997 | 0.0111 | 0.0081 |
Mean | 0.5491 | 0.9135 | 0.9472 | 0.9996 | 0.0100 | 0.0072 |
Std | 0.0442 | 0.2441 | 0.1276 | 0.0001 | 0.0011 | 0.0006 |
br = create_model('br',fold=10)
Fold | MAE | MSE | RMSE | R2 | RMSLE | MAPE
---|---|---|---|---|---|---
0 | 0.6455 | 1.2438 | 1.1153 | 0.9995 | 0.0108 | 0.0079 |
1 | 0.5131 | 0.6515 | 0.8072 | 0.9997 | 0.0094 | 0.0072 |
2 | 0.5559 | 0.8599 | 0.9273 | 0.9997 | 0.0107 | 0.0074 |
3 | 0.5645 | 1.1979 | 1.0945 | 0.9995 | 0.0104 | 0.0071 |
4 | 0.5961 | 1.2698 | 1.1269 | 0.9995 | 0.0113 | 0.0078 |
5 | 0.4975 | 0.5994 | 0.7742 | 0.9998 | 0.0081 | 0.0064 |
6 | 0.5088 | 0.6393 | 0.7996 | 0.9997 | 0.0092 | 0.0066 |
7 | 0.5078 | 0.8698 | 0.9326 | 0.9996 | 0.0087 | 0.0065 |
8 | 0.5362 | 1.0269 | 1.0134 | 0.9995 | 0.0109 | 0.0068 |
9 | 0.5661 | 0.7767 | 0.8813 | 0.9997 | 0.0111 | 0.0081 |
Mean | 0.5491 | 0.9135 | 0.9472 | 0.9996 | 0.0100 | 0.0072 |
Std | 0.0442 | 0.2441 | 0.1276 | 0.0001 | 0.0011 | 0.0006 |
lar = create_model('lar',fold=10)
Fold | MAE | MSE | RMSE | R2 | RMSLE | MAPE
---|---|---|---|---|---|---
0 | 0.6455 | 1.2438 | 1.1153 | 0.9995 | 0.0108 | 0.0079 |
1 | 0.5131 | 0.6515 | 0.8072 | 0.9997 | 0.0094 | 0.0072 |
2 | 0.5559 | 0.8599 | 0.9273 | 0.9997 | 0.0107 | 0.0074 |
3 | 0.5645 | 1.1979 | 1.0945 | 0.9995 | 0.0104 | 0.0071 |
4 | 0.5961 | 1.2698 | 1.1269 | 0.9995 | 0.0113 | 0.0078 |
5 | 0.4975 | 0.5994 | 0.7742 | 0.9998 | 0.0081 | 0.0064 |
6 | 0.5089 | 0.6394 | 0.7996 | 0.9997 | 0.0092 | 0.0066 |
7 | 0.5078 | 0.8698 | 0.9326 | 0.9996 | 0.0087 | 0.0065 |
8 | 0.5362 | 1.0269 | 1.0134 | 0.9995 | 0.0109 | 0.0068 |
9 | 0.5661 | 0.7767 | 0.8813 | 0.9997 | 0.0111 | 0.0081 |
Mean | 0.5492 | 0.9135 | 0.9472 | 0.9996 | 0.0100 | 0.0072 |
Std | 0.0442 | 0.2441 | 0.1276 | 0.0001 | 0.0011 | 0.0006 |
Tuning the Models
To improve the regression error metrics of each of our models, we can use PyCaret's tune_model() to see what improvements can be made.
# Tuning ridge
ridge_params = {'alpha': [0.02, 0.024, 0.025, 0.026, 0.03]}
tune_ridge = tune_model(ridge, n_iter=1000, optimize='RMSE', custom_grid=ridge_params)
Fold | MAE | MSE | RMSE | R2 | RMSLE | MAPE
---|---|---|---|---|---|---
0 | 0.6455 | 1.2438 | 1.1153 | 0.9995 | 0.0108 | 0.0079 |
1 | 0.5131 | 0.6515 | 0.8072 | 0.9997 | 0.0094 | 0.0072 |
2 | 0.5559 | 0.8599 | 0.9273 | 0.9997 | 0.0107 | 0.0074 |
3 | 0.5645 | 1.1979 | 1.0945 | 0.9995 | 0.0104 | 0.0071 |
4 | 0.5961 | 1.2698 | 1.1269 | 0.9995 | 0.0113 | 0.0078 |
5 | 0.4975 | 0.5994 | 0.7742 | 0.9998 | 0.0081 | 0.0064 |
6 | 0.5089 | 0.6394 | 0.7996 | 0.9997 | 0.0092 | 0.0066 |
7 | 0.5078 | 0.8698 | 0.9326 | 0.9996 | 0.0087 | 0.0065 |
8 | 0.5362 | 1.0269 | 1.0134 | 0.9995 | 0.0109 | 0.0068 |
9 | 0.5661 | 0.7767 | 0.8813 | 0.9997 | 0.0111 | 0.0081 |
Mean | 0.5492 | 0.9135 | 0.9472 | 0.9996 | 0.0100 | 0.0072 |
Std | 0.0442 | 0.2441 | 0.1276 | 0.0001 | 0.0011 | 0.0006 |
An improvement in the MAE metric value was achieved.
# Tuning Bayesian Ridge
tune_br = tune_model(br,n_iter=1000, optimize='RMSE')
Fold | MAE | MSE | RMSE | R2 | RMSLE | MAPE
---|---|---|---|---|---|---
0 | 0.6455 | 1.2438 | 1.1153 | 0.9995 | 0.0108 | 0.0079 |
1 | 0.5131 | 0.6515 | 0.8072 | 0.9997 | 0.0094 | 0.0072 |
2 | 0.5559 | 0.8599 | 0.9273 | 0.9997 | 0.0107 | 0.0074 |
3 | 0.5645 | 1.1979 | 1.0945 | 0.9995 | 0.0104 | 0.0071 |
4 | 0.5961 | 1.2698 | 1.1269 | 0.9995 | 0.0113 | 0.0078 |
5 | 0.4975 | 0.5994 | 0.7742 | 0.9998 | 0.0081 | 0.0064 |
6 | 0.5088 | 0.6393 | 0.7996 | 0.9997 | 0.0092 | 0.0066 |
7 | 0.5077 | 0.8698 | 0.9326 | 0.9996 | 0.0087 | 0.0065 |
8 | 0.5362 | 1.0269 | 1.0134 | 0.9995 | 0.0109 | 0.0068 |
9 | 0.5661 | 0.7767 | 0.8813 | 0.9997 | 0.0111 | 0.0081 |
Mean | 0.5491 | 0.9135 | 0.9472 | 0.9996 | 0.0100 | 0.0072 |
Std | 0.0442 | 0.2441 | 0.1276 | 0.0001 | 0.0011 | 0.0006 |
The MAE metric value was improved.
# Tuning Least Angle Regression
tune_lar = tune_model(lar,n_iter=1000, optimize='RMSE')
Fold | MAE | MSE | RMSE | R2 | RMSLE | MAPE
---|---|---|---|---|---|---
0 | 0.6455 | 1.2438 | 1.1153 | 0.9995 | 0.0108 | 0.0079 |
1 | 0.5131 | 0.6515 | 0.8072 | 0.9997 | 0.0094 | 0.0072 |
2 | 0.5559 | 0.8599 | 0.9273 | 0.9997 | 0.0107 | 0.0074 |
3 | 0.5645 | 1.1979 | 1.0945 | 0.9995 | 0.0104 | 0.0071 |
4 | 0.5961 | 1.2698 | 1.1269 | 0.9995 | 0.0113 | 0.0078 |
5 | 0.4975 | 0.5994 | 0.7742 | 0.9998 | 0.0081 | 0.0064 |
6 | 0.5089 | 0.6394 | 0.7996 | 0.9997 | 0.0092 | 0.0066 |
7 | 0.5078 | 0.8698 | 0.9326 | 0.9996 | 0.0087 | 0.0065 |
8 | 0.5362 | 1.0269 | 1.0134 | 0.9995 | 0.0109 | 0.0068 |
9 | 0.5661 | 0.7767 | 0.8813 | 0.9997 | 0.0111 | 0.0081 |
Mean | 0.5492 | 0.9135 | 0.9472 | 0.9996 | 0.0100 | 0.0072 |
Std | 0.0442 | 0.2441 | 0.1276 | 0.0001 | 0.0011 | 0.0006 |
Visualizing the Data
Another important feature of PyCaret is the ability to plot charts that visualize information about our models.
# Error plot
plot_model(tune_br, plot = 'error')
The prediction error plot for BayesianRidge includes an R² value showing how well the model fits the data. An R² of 0.9999 means the model explains 99.99% of the variance in the data, which is excellent!
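For reference, R² is defined from the residual and total sums of squares as

$$R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$$

where $y_i$ are the observed closing prices, $\hat{y}_i$ the model's predictions, and $\bar{y}$ their mean.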
# Importance Feature Plot
plot_model(tune_br, plot = 'feature')
The feature importance plot helps us understand how the model weighs each feature of the dataset. For example, we can see that the most useful feature for the model's predictions is each trading day's low price, while the volume and the 30-period simple moving average are not used at all.
Finalizing the Model
# Finalizing model
final_br_model = finalize_model(tune_br)
# Predicting last 2 years
prediction = predict_model(final_br_model,data = aapl_predict)
prediction
 | Model | MAE | MSE | RMSE | R2 | RMSLE | MAPE
---|---|---|---|---|---|---|---
0 | Bayesian Ridge | 1.1944 | 2.3733 | 1.5405 | 0.9971 | 0.0081 | 0.0064 |
Date | Open | High | Low | Volume | SMA7 | SMA30 | Close | prediction_label
---|---|---|---|---|---|---|---|---
2022-12-30 00:00:00-05:00 | 127.073708 | 128.597672 | 126.103905 | 77034200 | 129.380005 | 139.470001 | 128.577881 | 127.606610 |
2023-01-03 00:00:00-05:00 | 128.924225 | 129.537766 | 122.877815 | 112117500 | 127.910004 | 138.630005 | 123.768448 | 124.459987 |
2023-01-04 00:00:00-05:00 | 125.569519 | 127.321106 | 123.778358 | 89113600 | 127.080002 | 137.809998 | 125.045036 | 125.289650 |
2023-01-05 00:00:00-05:00 | 125.807014 | 126.440353 | 123.461685 | 80962700 | 126.110001 | 137.050003 | 123.718971 | 124.938037 |
2023-01-06 00:00:00-05:00 | 124.698677 | 128.934128 | 123.590332 | 87754700 | 126.050003 | 136.369995 | 128.271103 | 125.093709 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
2024-12-30 00:00:00-05:00 | 252.229996 | 253.500000 | 250.750000 | 35557500 | 254.940002 | 243.139999 | 252.199997 | 254.026220 |
2024-12-31 00:00:00-05:00 | 252.440002 | 253.279999 | 249.429993 | 39480700 | 255.029999 | 243.990005 | 250.419998 | 252.700273 |
2025-01-02 00:00:00-05:00 | 248.929993 | 249.100006 | 241.820007 | 55740700 | 253.509995 | 244.520004 | 243.850006 | 245.032614 |
2025-01-03 00:00:00-05:00 | 243.360001 | 244.179993 | 241.889999 | 40244100 | 251.809998 | 245.020004 | 243.360001 | 245.046563 |
2025-01-06 00:00:00-05:00 | 244.309998 | 247.330002 | 243.199997 | 45007100 | 249.919998 | 245.550003 | 245.000000 | 246.394365 |
506 rows × 8 columns
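As a sanity check on the hold-out metrics reported above, the error between Close and prediction_label can also be computed directly (a sketch assuming scikit-learn is installed):

```python
# Recomputing the hold-out error directly from the prediction dataframe
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

mae = mean_absolute_error(prediction['Close'], prediction['prediction_label'])
rmse = np.sqrt(mean_squared_error(prediction['Close'], prediction['prediction_label']))
print(f"MAE: {mae:.4f}  RMSE: {rmse:.4f}")
```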
Plotting Apple's Closing Price and Predicted Price
For each date we now have Open, High, Low, Close, Volume, SMA7, SMA30, and prediction_label, which represents the predicted closing price of AAPL for each day contained in the dataframe.
# Plotting the actual closing price against the predicted price for the hold-out period
fig = px.line(round(prediction, 2), x=prediction.index, y=['Close', 'prediction_label'])
newnames = {'Close': 'Closing Price', 'prediction_label': 'Predicted Price'}
fig.for_each_trace(lambda t: t.update(name=newnames[t.name],
                                      legendgroup=newnames[t.name],
                                      hovertemplate=t.hovertemplate.replace(t.name, newnames[t.name])))
fig.update_traces(line=dict(width=2.5))
fig.update_layout(autosize=False, width=1200, height=800,
                  title='AAPL Closing Price vs Predicted Price from December 30th, 2022 to January 6th, 2025',
                  margin=dict(l=0, r=0, t=80, b=0), font=dict(size=14))
fig.show("png")
Conclusion
The PyCaret library provides a simple way to explore and test different regression models and to choose which of them best fits the dataset in use.
Through this study, it was possible to find a model that fits our data well and was able to predict the closing prices of the last 2 years with high accuracy, successfully indicating the direction of AAPL's stock over that period.
References
What Is Linear Regression?
Tutorial: Understanding Regression Error Metrics in Python