Stage 1: Getting Started with Quantitative Investing Through a Simple Strategy

1-2 Moving Average Crossover Strategy (Part 1)

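- Preface
- Getting the Data
- The Moving Average Crossover Strategy
- Data Visualization
  - Drawing a Line Chart
  - Drawing a Candlestick Chart
  - Drawing Moving Averages
- Backtesting the Moving Average Crossover Strategy
  - What Is Backtesting
  - Backtracking the Buy and Sell Signals
  - Computing the Profit
- To Be Continued
- Complete Code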

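Preface

I set two goals for this semester: to explore quantitative investing and to become proficient in Python.
With quantitative investing as the entry point and Python as the tool, I can practice Python while researching strategies, so the two goals reinforce each other. It seemed like a nice idea, and that is how this article came about.
As a beginner, I set the following goal for the first stage:
Use one simple strategy to walk through the whole process of researching and coding a trading strategy, and in doing so get started with quantitative investing.
I found a good introductory tutorial online: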
Python股市数据分析教程——学会它，或可以实现半“智能”炒股 (Part 1)
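It is my main reference. This article was put together on that basis, and it can also be read as a record of my learning process. On to the main text.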

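Getting the Data

In 1-1, the stock data preprocessing exercise, we already covered how to fetch the data for a single stock and adjust it to suit our needs.
The following code obtains Apple's stock data from 2010 to the present.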

import numpy as np
import pandas as pd
import pandas_datareader.data as web
import datetime
import time

##### get historical data of a stock
# The first time, we fetch the stock data from the Yahoo interface.
# We do not want to repeat this step on every run, so we save it to a .csv file.
# (start is set to 2010 to match the date range used in the rest of this article)
#start = datetime.datetime(2010, 1, 1)
#end = datetime.date.today()
#apple = web.DataReader("AAPL", "yahoo", start, end)
#print(apple.head())
#apple.to_csv(path_or_buf='data_AAPL.csv')

##### read the data from csv
apple = pd.read_csv(filepath_or_buffer='data_AAPL.csv')
print(apple.head())
# Note that some data types change when the data is read back from .csv:
# 'Date' should be the index of the dataframe, but it comes back as an
# ordinary column, and its type has changed from datetime to string.
# These changes would break some of the methods used later,
# so we convert the column back and restore it as the index.
date_list = []
for i in range(len(apple)):
    date_str = apple['Date'][i]
    t = time.strptime(date_str, "%Y-%m-%d")
    temp_date = datetime.datetime(t[0], t[1], t[2])
    date_list.append(temp_date)
apple['DateTime'] = pd.Series(date_list, apple.index)
del apple['Date']
apple = apple.set_index('DateTime')

With that, the data preparation is complete.
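As a side note, the date conversion can also be done while the file is being read. The sketch below is an alternative, not the approach used in this article: it relies on the parse_dates and index_col parameters of pd.read_csv, and the three CSV rows are made-up sample values standing in for data_AAPL.csv.

```python
import io
import pandas as pd

# Hypothetical sample standing in for data_AAPL.csv (values are made up).
csv_text = """Date,Open,High,Low,Close,Adj Close,Volume
2010-01-04,30.49,30.64,30.34,30.57,27.41,123432400
2010-01-05,30.66,30.80,30.46,30.63,27.45,150476200
2010-01-06,30.63,30.75,30.11,30.14,27.02,138040000
"""

# parse_dates converts the 'Date' strings to datetimes while reading,
# and index_col makes that column the index in the same step.
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")
sample.index.name = "DateTime"  # match the index name used in this article

print(sample.index.dtype)
print(sample.head())
```

With a real file, io.StringIO(csv_text) would simply be replaced by the file name.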

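The Moving Average Crossover Strategy

Many trading strategies try to identify trends in stock prices, and the moving average crossover strategy is one of the simplest. Let us look at how it works.
What is a moving average?
For a series $x_t$ and a time $t$, the q-day moving average is the mean of the prices over the past q days. That is, if $MA_t^q$ denotes the q-day moving average at time $t$, then:
$$MA_t^q = \frac{1}{q} \sum_{i=0}^{q-1} x_{t-i}$$
Clearly, a moving average smooths the series and helps identify the trend of the market. The larger q is, the less the moving average reflects short-term fluctuations in $x_t$, but the better it captures the overall trend.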
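To make the definition concrete, here is a minimal plain-Python sketch that computes a q-day moving average by hand. The function name moving_average and the price list are illustrative assumptions; later in the article the same quantity is computed with pandas.

```python
def moving_average(prices, q):
    """Return the q-day simple moving average for each day t.

    Days with fewer than q observations available get None,
    mirroring the NaN values that a pandas rolling mean produces.
    """
    averages = []
    for t in range(len(prices)):
        if t + 1 < q:
            averages.append(None)
        else:
            window = prices[t - q + 1 : t + 1]  # x_{t-q+1}, ..., x_t
            averages.append(sum(window) / q)
    return averages


prices = [10.0, 11.0, 12.0, 13.0, 14.0]  # hypothetical closing prices
ma3 = moving_average(prices, q=3)
print(ma3)  # [None, None, 11.0, 12.0, 13.0]
```

Increasing q makes each average depend on more days, which is why a long average reacts more slowly than a short one.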
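What is the moving average crossover strategy?
The idea behind the strategy is that a pair of moving averages, one short and one long, can pick out the market trend from the "noise". The short-term average has a small q, follows the stock closely, and better represents short-term price movement. The long-term average has a large q, so it responds less to fluctuations in the stock and is smoother.
The moving average crossover strategy can therefore be stated as follows:
When the short-term average rises above the long-term average, the price is trending upward in the short term, and we treat this as the time to buy. When the short-term average falls below the long-term average, we consider the bull run over and sell the stock.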
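The rule can be sketched in a few lines of plain Python. This is only an illustration under simple assumptions: the function name crossover_signals and both lists of averages are hypothetical, and None marks days on which an average is not yet defined.

```python
def crossover_signals(short_ma, long_ma):
    """Return a list of (day index, 'Buy' or 'Sell') crossover events."""
    signals = []
    for t in range(1, len(short_ma)):
        # Skip days where either average is undefined (warm-up period)
        if None in (short_ma[t - 1], long_ma[t - 1], short_ma[t], long_ma[t]):
            continue
        was_above = short_ma[t - 1] > long_ma[t - 1]
        is_above = short_ma[t] > long_ma[t]
        if is_above and not was_above:
            signals.append((t, "Buy"))   # short average crossed above the long one
        elif was_above and not is_above:
            signals.append((t, "Sell"))  # short average fell back to or below it
    return signals


short_ma = [None, 10.0, 10.5, 11.5, 12.0, 11.0, 10.0]  # hypothetical values
long_ma = [None, 11.0, 11.0, 11.0, 11.2, 11.3, 11.2]   # hypothetical values
print(crossover_signals(short_ma, long_ma))  # [(3, 'Buy'), (5, 'Sell')]
```

Only the days on which the relation between the two averages flips produce a signal. While the short average stays above the long one, the position is simply held.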
Next, we plot the stock's historical data together with a long and a short moving average to get an intuitive feel for the strategy.

Data Visualization

Drawing a Line Chart

We use the adjusted closing prices in the data we obtained to draw a line chart.

apple["Adj Close"].plot(grid=True)

[Figure: line chart of historical closing prices]

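Drawing a Candlestick Chart

This presentation is rather plain, and it leaves unused the opening, high, and low prices that our data also contains.
Financial data is commonly drawn as a Japanese candlestick chart (also known as a K-line chart), which takes its name from the Japanese rice merchants who first used it in the 18th century. matplotlib can draw such charts, but doing so is fairly involved. The author of the tutorial mentioned above therefore implemented a function that makes it easier to create candlestick charts from a pandas DataFrame, and used it to plot the stock data. You can also refer to the example provided by matplotlib.
We call this function to visualize the stock data (the code can be found at the end of the article).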

draw_candle.pandas_candlestick_ohlc(apple, stick = 'month')

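The stick parameter specifies the time interval that each candle covers. The result looks like this:
[Figure: candlestick chart of the Apple stock data, one candle per month]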
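The source of draw_candle is not reproduced in this article, so the following is only a guess at the aggregation a setting such as stick = 'month' implies: the daily rows of each month are collapsed into one candle that takes the first open, the highest high, the lowest low, and the last close. The DataFrame daily and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical daily OHLC rows spanning two months (values are made up).
daily = pd.DataFrame(
    {
        "Open": [10.0, 10.5, 11.0, 12.0],
        "High": [10.8, 11.2, 11.5, 12.6],
        "Low": [9.8, 10.1, 10.7, 11.9],
        "Close": [10.5, 11.0, 11.4, 12.5],
    },
    index=pd.to_datetime(["2016-01-04", "2016-01-05", "2016-02-01", "2016-02-02"]),
)

# One candle per calendar month: first open, highest high, lowest low, last close.
monthly = daily.groupby(daily.index.to_period("M")).agg(
    {"Open": "first", "High": "max", "Low": "min", "Close": "last"}
)
print(monthly)
```

Each row of monthly then carries the four values needed to draw one candle.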

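Drawing Moving Averages

First we create three moving averages for Apple stock, with windows of 20, 50, and 200 days, and then plot them together with the stock data.
pandas makes it easy to compute moving averages, as the code below shows.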

apple["20d"] = np.round(apple["Close"].rolling(window = 20, center = False).mean(), 2)
apple["50d"] = np.round(apple["Close"].rolling(window = 50, center = False).mean(), 2)
# the 200-day average is stored under "200d" so that the column name matches its window
apple["200d"] = np.round(apple["Close"].rolling(window = 200, center = False).mean(), 2)
draw_candle.pandas_candlestick_ohlc(apple.loc['2016-01-04':'2017-09-01',:], stick = 'week', otherseries = ["20d", "50d", "200d"])

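[Figure: weekly candlestick chart of Apple stock with the 20-day, 50-day, and 200-day moving averages]
Looking at crossover strategies built from moving averages with different window lengths, and at charts of the strategy applied to different stocks, gives a better feel for how it behaves.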

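Backtesting the Moving Average Crossover Strategy

What Is Backtesting

Backtesting a stock strategy means that, once the strategy is defined, we start from some point in the past and, using real historical market data, select stocks strictly according to the defined rules and simulate buying and selling under the rules of a real market. The result is a set of figures such as the return and the maximum drawdown over a period.
Backtesting is an indispensable step in building a stock model. A strategy that backtests very well will not necessarily perform well in live trading, but a strategy that cannot even backtest well has no footing at all.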
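The two figures named above can be computed from a series of account values. The sketch below uses common textbook definitions, which are an assumption here rather than something specified in this article, and the equity list is hypothetical.

```python
def total_return(equity):
    """Overall return of an equity curve, as a fraction of the starting value."""
    return equity[-1] / equity[0] - 1.0


def max_drawdown(equity):
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)                    # highest value seen so far
        worst = max(worst, (peak - value) / peak)  # decline from that peak
    return worst


equity = [100.0, 120.0, 90.0, 110.0, 150.0, 135.0]  # hypothetical account values
print(f"return: {total_return(equity):.1%}")        # return: 35.0%
print(f"max drawdown: {max_drawdown(equity):.1%}")  # max drawdown: 25.0%
```

Here the worst decline is the fall from 120 to 90, even though the account ends higher than it started.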

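Backtracking the Buy and Sell Signals

Following the rules of the moving average crossover strategy, we now backtrack the buy and sell signals that the strategy would have generated on Apple stock, which makes the later profit calculation easier.
According to the strategy, buy and sell points are defined as follows:
Buy point: short-term average > long-term average
Sell point: short-term average <= long-term average
This can be done with the following code: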

##### compute the difference between the short and the long moving average
# and use Regime to represent its sign
apple['20d-50d'] = apple['20d'] - apple['50d']
apple["Regime"] = np.where(apple['20d-50d'] > 0, 1, 0)
apple["Regime"] = np.where(apple['20d-50d'] < 0, -1, apple["Regime"])
regime_count = apple["Regime"].value_counts()

##### use Regime to compute the trading signal
# temporarily set the last Regime to 0 so that all trades close out
# (.ix has been removed from pandas, so .iloc with the column position is used)
regime_col = apple.columns.get_loc("Regime")
regime_orig = apple.iloc[-1, regime_col]
apple.iloc[-1, regime_col] = 0
apple["Signal"] = np.sign(apple["Regime"] - apple["Regime"].shift(1))
apple.iloc[-1, regime_col] = regime_orig

##### build the trading signal dataframe (a list of the strategy's buy and sell operations)
apple_signals = pd.concat([
    pd.DataFrame({"Price": apple.loc[apple["Signal"] == 1, "Close"],
                  "Regime": apple.loc[apple["Signal"] == 1, "Regime"],
                  "Signal": "Buy"}),
    pd.DataFrame({"Price": apple.loc[apple["Signal"] == -1, "Close"],
                  "Regime": apple.loc[apple["Signal"] == -1, "Regime"],
                  "Signal": "Sell"}),
])
apple_signals.sort_index(inplace = True)
print(apple_signals.head())

The backtracked table of buy and sell signals looks like this:

DateTime	Price	Regime	Signal
2010-03-15	31.977142	1.0	Buy
2010-06-11	36.215714	-1.0	Sell
2010-06-18	39.152859	1.0	Buy
2010-07-22	37.002857	-1.0	Sell
2010-08-16	35.377144	0.0	Buy
…	…	…	…
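To see how a regime turns into signals, it may help to run the same two steps on a tiny series. The six values of diff below are hypothetical differences between the short and long averages. The np.where and np.sign calls are the same ones used in the article's code.

```python
import numpy as np
import pandas as pd

# Hypothetical difference between the short and long averages on six days.
diff = pd.Series([np.nan, 0.4, 0.7, -0.2, -0.5, 0.3])

# Regime: 1 when the short average is above the long one, -1 when below, 0 otherwise.
regime = pd.Series(np.where(diff > 0, 1, 0))
regime = pd.Series(np.where(diff < 0, -1, regime))

# Signal: sign of the day-to-day change in regime (1 = buy, -1 = sell, 0 = hold).
signal = np.sign(regime - regime.shift(1))
print(pd.DataFrame({"diff": diff, "Regime": regime, "Signal": signal}))
```

The first day has no previous regime to compare with, so its signal is NaN. After that, a signal appears only on days where the regime changes.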

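Computing the Profit

We call one buy followed by one sell a complete trade. Using the backtracked table of buy and sell points, we can backtest every trade that the crossover strategy would have made on the historical data, along with the profit from each trade. The code is as follows: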

##### use the trading signal dataframe apple_signals to compute the profit
# rows that immediately follow a Buy made in a bullish regime close that trade
closes_long = ((apple_signals["Signal"].shift(1) == "Buy") &
               (apple_signals["Regime"].shift(1) == 1))
apple_long_profits = pd.DataFrame({
    # parentheses added around the Regime test, because & binds tighter than ==
    "Price": apple_signals.loc[(apple_signals["Signal"] == "Buy") &
                               (apple_signals["Regime"] == 1), "Price"],
    "Profit": (apple_signals["Price"] - apple_signals["Price"].shift(1)).loc[closes_long].tolist(),
    "End Date": apple_signals.loc[closes_long].index,
})
print(apple_long_profits.head())
draw_candle.pandas_candlestick_ohlc(apple, stick = 45, otherseries = ["20d", "50d"])

The resulting list of trades is as follows:

DateTime	End Date	Price	Profit
2010-03-15	2010-06-11	31.977142	4.238572
2010-06-18	2010-07-22	39.152859	-2.150002
2010-09-20	2011-03-30	40.461430	9.342857
2011-05-12	2011-05-27	49.509998	-1.308571
2011-07-14	2011-11-17	51.110001	2.805713
…	…	…	…

[Figure: candlestick chart of Apple stock with the 20-day and 50-day moving averages]
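The pairing of buys and sells can also be written as an explicit loop, which may be easier to follow than the vectorized pandas expression. This is a simplified sketch rather than the article's method: the function pair_trades is hypothetical, and it ignores the Regime column. The four input rows are the first four rows of the signal table shown earlier.

```python
def pair_trades(signals):
    """Pair each Buy with the next Sell and compute the per-share profit.

    `signals` is a chronologically sorted list of (date, price, action) tuples.
    """
    trades = []
    open_trade = None  # (date, price) of the pending Buy, if any
    for date, price, action in signals:
        if action == "Buy" and open_trade is None:
            open_trade = (date, price)
        elif action == "Sell" and open_trade is not None:
            buy_date, buy_price = open_trade
            trades.append((buy_date, date, buy_price, price - buy_price))
            open_trade = None
    return trades


# First four rows of the signal table shown earlier in this article
signals = [
    ("2010-03-15", 31.977142, "Buy"),
    ("2010-06-11", 36.215714, "Sell"),
    ("2010-06-18", 39.152859, "Buy"),
    ("2010-07-22", 37.002857, "Sell"),
]
for trade in pair_trades(signals):
    print(trade)
```

The two profits it produces, about 4.238572 and -2.150002, agree with the first two rows of the trade list.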
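Some simple statistics show that, for Apple stock over the period from January 1, 2010 to the present:
If we had bought the stock on the first day, held it throughout, and sold it on the last day, the profit would have been $124.02 per share, a return of 412%.
If we had traded according to our strategy, we would have completed 21 trades for a profit of $82.35 per share, a return of 273%.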
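The two percentages appear to share one denominator, namely the price on the first day. That is an inference from the figures rather than something stated in the text, and the short check below backs out the implied first-day price from the buy-and-hold numbers and applies it to the strategy profit.

```python
hold_profit = 124.02     # buy-and-hold profit per share, from the text
hold_return = 4.12       # 412% buy-and-hold return, from the text
strategy_profit = 82.35  # strategy profit per share, from the text

# Return = profit / first-day price, so the implied first-day price is:
first_price = hold_profit / hold_return

# Apply the same denominator to the strategy profit
strategy_return = strategy_profit / first_price
print(f"implied first-day price: {first_price:.2f}")
print(f"strategy return: {strategy_return:.1%}")
```

The result lands between 273% and 274%, consistent with the 273% quoted above. Note that summing per-share profits this way does not reinvest gains from one trade into the next.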

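To Be Continued…

The result is plainly a little embarrassing: the moving average crossover strategy apparently does worse than using no strategy at all. Is that really the case?
Moreover, our strategy still has many loopholes, and the backtesting process has many problems as well. These will be addressed in later work.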

Complete Code

import numpy as np
import pandas as pd
import pandas_datareader.data as web
import datetime
import time
import matplotlib.pyplot as plt
import pylab
import draw_candle

##### get historical data of a stock
# The first time, we fetch the stock data from the Yahoo interface.
# We do not want to repeat this step on every run, so we save it to a .csv file.
# (start is set to 2010 to match the date range used in the rest of this article)
#start = datetime.datetime(2010, 1, 1)
#end = datetime.date.today()
#apple = web.DataReader("AAPL", "yahoo", start, end)
#print(apple.head())
#apple.to_csv(path_or_buf='data_AAPL.csv')

##### read the data from csv
apple = pd.read_csv(filepath_or_buffer='data_AAPL.csv')
#print(apple.head())
# Note that some data types change when the data is read back from .csv:
# 'Date' should be the index of the dataframe, but it comes back as an
# ordinary column, and its type has changed from datetime to string.
# These changes would break the methods in draw_candle.py,
# so we convert the column back and restore it as the index.
date_list = []
for i in range(len(apple)):
    date_str = apple['Date'][i]
    t = time.strptime(date_str, "%Y-%m-%d")
    temp_date = datetime.datetime(t[0], t[1], t[2])
    date_list.append(temp_date)
apple['DateTime'] = pd.Series(date_list, apple.index)
del apple['Date']
apple = apple.set_index('DateTime')

##### we can visualize the data roughly
pylab.rcParams['figure.figsize'] = (10, 6)
#apple["Adj Close"].plot(grid=True)
#draw_candle.pandas_candlestick_ohlc(apple, stick = 'month')

##### compute the moving average of the historical data for different time spans
# use np.round to round the data to two decimal places
apple["20d"] = np.round(apple["Close"].rolling(window = 20, center = False).mean(), 2)
apple["50d"] = np.round(apple["Close"].rolling(window = 50, center = False).mean(), 2)
# the 200-day average is stored under "200d" so that the column name matches its window
apple["200d"] = np.round(apple["Close"].rolling(window = 200, center = False).mean(), 2)
#draw_candle.pandas_candlestick_ohlc(apple.loc['2016-01-04':'2017-09-01',:], stick = 'week', otherseries = ["20d", "50d", "200d"])

##### compute the difference between the short and the long moving average
# use regime to represent it
apple['20d-50d'] = apple['20d'] - apple['50d']
apple["Regime"] = np.where(apple['20d-50d'] > 0, 1, 0)
apple["Regime"] = np.where(apple['20d-50d'] < 0, -1, apple["Regime"])
regime_count = apple["Regime"].value_counts()
#print(regime_count)
#apple.loc['2017-01-01':'2017-09-01',"Regime"].plot(ylim = (-2,2)).axhline(y = 0, color = "black", lw = 2)
#apple["Regime"].plot(ylim = (-2,2)).axhline(y = 0, color = "black", lw = 2)

##### use Regime to compute the trading signal
# temporarily set the last Regime to 0 so that all trades close out
# (.ix has been removed from pandas, so .iloc with the column position is used)
regime_col = apple.columns.get_loc("Regime")
regime_orig = apple.iloc[-1, regime_col]
apple.iloc[-1, regime_col] = 0
apple["Signal"] = np.sign(apple["Regime"] - apple["Regime"].shift(1))
apple.iloc[-1, regime_col] = regime_orig

##### build the trading signal dataframe (a list of the strategy's buy and sell operations)
# the apple_signals looks like:
#                  Price  Regime Signal
# Date
# 2010-03-15   31.977142     1.0    Buy
# 2010-06-11   36.215714    -1.0   Sell
# 2010-06-18   39.152859     1.0    Buy
# 2010-07-22   37.002857    -1.0   Sell

#print(apple.loc[apple["Signal"] == 1, "Close"])
#print(apple.loc[apple["Signal"] == -1, "Close"])
apple_signals = pd.concat([
    pd.DataFrame({"Price": apple.loc[apple["Signal"] == 1, "Close"],
                  "Regime": apple.loc[apple["Signal"] == 1, "Regime"],
                  "Signal": "Buy"}),
    pd.DataFrame({"Price": apple.loc[apple["Signal"] == -1, "Close"],
                  "Regime": apple.loc[apple["Signal"] == -1, "Regime"],
                  "Signal": "Sell"}),
])
apple_signals.sort_index(inplace = True)

##### use the trading signal dataframe apple_signals to compute the profit
# the apple_long_profits looks like:
#               End Date       Price     Profit
# Date
# 2010-03-15  2010-06-11   31.977142   4.238572
# 2010-06-18  2010-07-22   39.152859  -2.150002
# 2010-09-20  2011-03-30   40.461430   9.342857
# rows that immediately follow a Buy made in a bullish regime close that trade
closes_long = ((apple_signals["Signal"].shift(1) == "Buy") &
               (apple_signals["Regime"].shift(1) == 1))
apple_long_profits = pd.DataFrame({
    # parentheses added around the Regime test, because & binds tighter than ==
    "Price": apple_signals.loc[(apple_signals["Signal"] == "Buy") &
                               (apple_signals["Regime"] == 1), "Price"],
    "Profit": (apple_signals["Price"] - apple_signals["Price"].shift(1)).loc[closes_long].tolist(),
    "End Date": apple_signals.loc[closes_long].index,
})
#print(apple_long_profits)
#draw_candle.pandas_candlestick_ohlc(apple, stick = 45, otherseries = ["20d", "50d"])

##### take a simple analysis
# compute a rough profit (transaction fees are not considered)
rough_profit = apple_long_profits['Profit'].sum()
print(rough_profit)

# compute the profit if we do not trade at all
# (take a long position on the first day and sell it on the last day of the data)
no_operation_profit = apple['Close'].iloc[-1] - apple['Close'].iloc[0]
print(no_operation_profit)

plt.show()