Machine Learning for Stocks
Note from Towards Data Science’s editors: While we allow independent authors to publish articles in accordance with our rules and guidelines, we do not endorse each author’s contribution. You should not rely on an author’s works without seeking professional advice. See our Reader Terms for details.
Probabilistic machine learning goes hand in hand with stock trading: it uses past instances to estimate the probability of certain events occurring in future instances. This can be applied directly to stock trading, to predict how stock prices will move.
The Concept:
This program uses Gaussian Naive Bayes to classify the data into one of two classes: the stock price will increase, or the stock price will decrease.
Because stock prices are volatile, I will not use the raw closing price as the input. Instead, I will use the ratios between consecutive closing prices over the preceding days. To understand how the program works, we must first understand the underlying algorithm:
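The ratio idea can be illustrated with a minimal sketch. The prices here are made up and closing_ratios is an illustrative helper name. Each ratio compares one close with the close before it, so the feature describes the relative move rather than the price level:

```python
def closing_ratios(prices):
    """Return each price divided by the price before it."""
    return [prices[i] / prices[i - 1] for i in range(1, len(prices))]

# Hypothetical closing prices, chosen only to make the arithmetic easy to follow
closes = [100.0, 110.0, 99.0, 99.0]
ratios = closing_ratios(closes)
print(ratios)  # above 1 is a rise, below 1 is a fall, exactly 1 is no change
```

Four closing prices give three ratios, in the same way that ten closing prices give nine.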
What Is a Gaussian Naive Bayes Classifier?
Gaussian Naive Bayes is an algorithm that classifies data by modelling each feature with a Gaussian distribution (another name for the normal distribution) and combining the resulting likelihoods using Bayes' theorem.
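As a rough sketch of the idea, assume the usual formulation: each feature gets one Gaussian per class, and the class with the largest prior times product of densities wins. The toy data, the function names fit and predict, and the use of Python's standard-library statistics.NormalDist in place of scipy are all assumptions made for illustration.

```python
from statistics import NormalDist, mean, pstdev

def fit(samples_by_class):
    """For every class, fit one Gaussian per feature column."""
    model = {}
    total = sum(len(rows) for rows in samples_by_class.values())
    for label, rows in samples_by_class.items():
        columns = list(zip(*rows))  # transpose rows into feature columns
        gaussians = [NormalDist(mean(col), pstdev(col)) for col in columns]
        model[label] = (len(rows) / total, gaussians)  # (prior, likelihoods)
    return model

def predict(model, x):
    """Pick the class with the largest prior * product of feature densities."""
    scores = {}
    for label, (prior, gaussians) in model.items():
        score = prior
        for gaussian, value in zip(gaussians, x):
            score *= gaussian.pdf(value)
        scores[label] = score
    return max(scores, key=scores.get)

# Made-up two-feature data: "up" samples sit near 1.02, "down" samples near 0.98
toy = {
    "up": [(1.02, 1.01), (1.03, 1.02), (1.01, 1.03)],
    "down": [(0.98, 0.99), (0.97, 0.98), (0.99, 0.97)],
}
model = fit(toy)
print(predict(model, (1.02, 1.02)))
print(predict(model, (0.98, 0.98)))
```

The "naive" part is the multiplication inside predict: it treats the features as independent once the class is known.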
Advantages:
- Works on small datasets
Unlike a fully connected neural network, in which each neuron is connected to every neuron in the adjacent layers, Naive Bayes assumes that the features are independent of one another given the class. It therefore has few parameters to estimate and can be fitted on a small dataset.
- Not computationally intensive
The parameters of a Naive Bayes classifier (a mean and a standard deviation for each feature and class) are computed once, directly from the data. They are not updated on every iteration, unlike the weights that power a neural network. This makes the algorithm much less computationally intensive.
Disadvantages:
- Falls behind on big data
When there is enough data to optimize all of its parameters, the complex mapping of a neural network outperforms the simple structure of the Naive Bayes algorithm.
The Code:
With a better understanding of how the Gaussian Naive Bayes algorithm works, let’s get to the program:
Step 1 | Prerequisites:
import numpy as np
import yfinance
from scipy import stats

# Daily AAPL prices from the start of 2016 to the start of 2020
aapl = yfinance.download('AAPL', start='2016-01-01', end='2020-01-01')
These are the libraries that I will use for the project: yfinance downloads the stock data, scipy creates the Gaussian distributions, and numpy computes the means and standard deviations used in the later steps.
I downloaded Apple stock data from 2016 to 2020, so that the results are reproducible.
Step 2 | Converting to Gaussian Distributions:
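def calculate_prereq(values):
    # Standard deviation and mean fully define a Gaussian
    std = np.std(values)
    mean = np.mean(values)
    return std, mean

def calculate_distribution(mean, std):
    # Frozen scipy normal distribution with these parameters
    norm = stats.norm(mean, std)
    return norm

def extrapolate(norm, x):
    # Probability density of the distribution at x
    return norm.pdf(x)

def values_to_norm(dicts):
    # Replace each collection of observed values with its fitted Gaussian
    for dictionary in dicts:
        for term in dictionary:
            std, mean = calculate_prereq(dictionary[term])
            norm = calculate_distribution(mean, std)
            dictionary[term] = norm
    return dicts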
The “calculate_prereq” function calculates the standard deviation and the mean: the two values needed to define a Gaussian distribution.
I could have written the Gaussian distribution function from scratch, but scipy’s functions are highly optimized and should therefore cope better with datasets that have more features.
Gaussian distributions approximate many kinds of real-world data. Take IQ test scores as an example. The scores are centered on an average of 100, so the peak of the Gaussian distribution is at 100. Toward both ends of the spectrum, the number of people decreases as the scores become more extreme. With a Gaussian distribution, one can estimate how likely a given value is and therefore gain insight into it.
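The IQ example can be sketched in a few lines. The standard deviation of 15 is an assumption (a common convention for IQ scales, not stated in the article), and the standard-library statistics.NormalDist stands in for scipy's stats.norm. Note that pdf returns a density, which is useful for comparing values, while the probability of landing in a range comes from the cdf:

```python
from statistics import NormalDist

# Assumption: IQ scores follow a Gaussian with mean 100 and standard deviation 15
iq = NormalDist(mu=100, sigma=15)

for score in (70, 100, 130):
    print(score, iq.pdf(score))  # density is highest at the mean

# Share of people expected to score between 85 and 115 (one standard deviation)
within_one_sd = iq.cdf(115) - iq.cdf(85)
print(round(within_one_sd, 3))
```

The densities at 70 and 130 are equal because the distribution is symmetric around its mean.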
Step 3 | Compare Possibilities:
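def compare_possibilities(dicts, x):
    probabilities = []
    for dictionary in dicts:
        # Density of every observed ratio under this class's Gaussians
        dict_probs = []
        for i in range(len(x)):
            value = x[i]
            dict_probs.append(extrapolate(dictionary[i], value))
        # Naive Bayes: multiply the per-feature densities together
        probabilities.append(np.prod(dict_probs))
    # Index of the class with the highest combined value
    return probabilities.index(max(probabilities))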
This function runs through the dictionaries (one per class) and calculates, for each class, the likelihood of the observed ratios between the closing prices of the last ten days. It then returns the index, within the list of dictionaries, of the class that the Bayes classifier calculates to have the highest probability.
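One optional variation, which is not part of the original program: the product of many densities can underflow to zero when there are many features, so a common alternative is to add log-densities instead. The sketch below assumes each class is a list of standard-library NormalDist objects (standing in for the scipy distributions), and the parameters are made up.

```python
import math
from statistics import NormalDist

def compare_log_possibilities(classes, x):
    """Return the index of the class with the highest summed log-density."""
    scores = []
    for gaussians in classes:
        # log(a * b) = log(a) + log(b), so the ranking matches the product form
        scores.append(sum(math.log(g.pdf(v)) for g, v in zip(gaussians, x)))
    return scores.index(max(scores))

# Hypothetical per-feature Gaussians for two classes: index 0 = increase, 1 = drop
increase = [NormalDist(1.01, 0.01), NormalDist(1.01, 0.01)]
drop = [NormalDist(0.99, 0.01), NormalDist(0.99, 0.01)]

print(compare_log_possibilities([increase, drop], [1.02, 1.00]))
print(compare_log_possibilities([increase, drop], [0.98, 0.99]))
```

Because the logarithm is monotonic, this picks the same class as the product version whenever the product does not underflow.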
Step 4 | Run the Program:
# Plain 1-D array of closes; ravel() flattens it in case a one-column table is returned
close = aapl['Close'].to_numpy().ravel()

def closing_ratios(prices):
    # Ratio of each close to the previous one: 10 prices give 9 ratios
    return [prices[i] / prices[i-1] for i in range(1, len(prices))]

drop = {}
increase = {}
for day in range(10, len(close) - 1):
    ratios = closing_ratios(close[day-10:day])
    if close[day+1] > close[day]:
        target = increase
    elif close[day+1] < close[day]:
        target = drop
    else:
        continue  # unchanged close: belongs to neither class
    for i, ratio in enumerate(ratios):
        # Keep every ratio, including the first one seen for each position
        target.setdefault(i, []).append(ratio)

# Ratios for the window we want a prediction for
X = closing_ratios(close[-11:-1])
print(X)
dicts = [increase, drop]
dicts = values_to_norm(dicts)
print(compare_possibilities(dicts, X))
This last part runs all the functions together. It gathers the 9 ratios from 10 recent closing prices, runs the classifier, and reports whether the price is predicted to increase or drop. The returned value is the index of the chosen dictionary in the list dicts: 0 means the price is predicted to increase, and 1 means it is predicted to drop.
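To make the returned index easier to read, it can be mapped back to a label. This is a small convenience sketch: the names LABELS and describe_prediction are hypothetical, and the order assumes the list dicts = [increase, drop] used in the program.

```python
# Order must match dicts = [increase, drop] in the program above
LABELS = ("increase", "drop")

def describe_prediction(index):
    """Translate the class index returned by the classifier into words."""
    return f"The model predicts the price will {LABELS[index]}."

print(describe_prediction(0))
print(describe_prediction(1))
```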
Conclusion:
This program is just the basic framework of a Gaussian Naive Bayes algorithm. Here are a few ways that you can improve my program:
- Increase the number of features
You can include features such as volume and opening price to widen the scope of the data. However, too much data could make Gaussian Naive Bayes less effective, as it does not perform well with big data.
- Link to the Alpaca API
The Alpaca API is a great platform for testing trading strategies. Try linking this program to it to place buy or sell trades based on the model's predictions!
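For the first suggestion, one possible approach is to treat volume the same way as the closing price and append its day-to-day ratios to the feature list. The data and the helper names ratios and build_features below are made up for illustration:

```python
def ratios(values):
    """Ratio of each value to the one before it."""
    return [values[i] / values[i - 1] for i in range(1, len(values))]

def build_features(closes, volumes):
    """Concatenate closing-price ratios and volume ratios into one feature list."""
    return ratios(closes) + ratios(volumes)

# Made-up three-day window, for illustration only
closes = [100.0, 102.0, 101.0]
volumes = [1000, 1500, 1200]
features = build_features(closes, volumes)
print(len(features))  # 2 close ratios followed by 2 volume ratios
```

Each position in the longer list would then need its own Gaussian per class, just as each of the nine closing-price ratios does in the program.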
Translated from: https://medium.com/analytics-vidhya/using-probabilistic-machine-learning-to-improve-your-stock-trading-b40782f3710d