Using Probabilistic Machine Learning to Improve Your Stock Trading

Note from Towards Data Science’s editors: While we allow independent authors to publish articles in accordance with our rules and guidelines, we do not endorse each author’s contribution. You should not rely on an author’s works without seeking professional advice. See our Reader Terms for details.

Probabilistic Machine Learning goes hand in hand with stock trading: it uses past instances to predict the probabilities of certain events happening in future instances. This can be applied directly to stock trading, to predict future stock prices.

The Concept:

This program will use Gaussian Naive Bayes to classify data into two classes: increasing stock price or decreasing stock price.

Because of the volatility of stocks, I will not use the closing price itself as the input to the prediction, but rather the ratio between each closing price and the one before it. To understand how the program works, we must first understand the underlying algorithm at play:
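
To make the ratio feature concrete, here is a minimal sketch on made-up prices. The helper name `close_ratios` and the sample numbers are illustrative assumptions and are not part of the original program.

```python
def close_ratios(closes):
    """Return each closing price divided by the previous day's close."""
    return [closes[i] / closes[i - 1] for i in range(1, len(closes))]


# Hypothetical closing prices, for illustration only.
sample_closes = [100.0, 102.0, 99.96, 104.958]
sample_ratios = close_ratios(sample_closes)

# A ratio above 1 means the price rose that day; below 1 means it fell.
print([round(r, 2) for r in sample_ratios])
```

Because a ratio is scale-free, a move from 100 to 102 and a move from 300 to 306 produce the same feature value, which is what makes ratios easier to compare across time than raw prices.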

What is a Gaussian Naive Bayes Classifier?

Gaussian Naive Bayes is an algorithm that classifies data by modelling each feature with a Gaussian distribution (another name for the normal distribution) and combining the resulting likelihoods with Bayes' theorem.
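
Written out, the standard Gaussian Naive Bayes decision rule looks like the following. The notation is my own, not the article's: $x_1, \dots, x_n$ are the feature values, $c$ ranges over the classes, $P(c)$ is the class prior, and $\mu_{c,i}$ and $\sigma_{c,i}$ are the mean and standard deviation of feature $i$ within class $c$.

```latex
\hat{y} = \arg\max_{c} \; P(c) \prod_{i=1}^{n}
  \frac{1}{\sqrt{2\pi\sigma_{c,i}^{2}}}
  \exp\!\left( -\frac{(x_i - \mu_{c,i})^{2}}{2\sigma_{c,i}^{2}} \right)
```

The program in this article compares only the product of the Gaussian densities, which corresponds to treating the priors $P(c)$ as equal for both classes.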

Advantages:

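  • Works on small datasets

Unlike a fully connected neural network, in which every neuron in one layer is connected to every neuron in the next, Naive Bayes assumes that the features are independent of one another given the class. Each class therefore needs only a mean and a standard deviation per feature, which can be estimated from relatively few samples.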

  • Not computationally intensive

Because fitting a Naive Bayes classifier is deterministic, its parameters are computed once from the data rather than updated on every iteration, unlike the weights that power a neural network. This makes the algorithm much less computationally intensive.
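
To illustrate why no iterative training is needed, the sketch below fits a toy model in a single pass: one mean and one standard deviation per feature per class. The function name `fit_gaussian_params` and the sample values are hypothetical, and only the standard library is used.

```python
from statistics import fmean, pstdev


def fit_gaussian_params(samples_by_class):
    """Map each class label to a list of (mean, std) pairs, one per feature."""
    params = {}
    for label, rows in samples_by_class.items():
        columns = list(zip(*rows))  # transpose rows into per-feature columns
        params[label] = [(fmean(col), pstdev(col)) for col in columns]
    return params


# Toy data: two features per sample, two classes (values are made up).
toy = {
    "increase": [(1.01, 1.02), (1.03, 1.00), (1.02, 1.01)],
    "drop": [(0.99, 0.98), (0.97, 1.00), (0.98, 0.99)],
}
params = fit_gaussian_params(toy)
print(params["increase"][0])
```

`pstdev` is the population standard deviation, which matches the default behaviour of `np.std`.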

Disadvantages:

  • Struggles to take advantage of big data

When there is enough data to optimize all of its parameters, the complex mapping of a neural network outperforms the simple architecture of the Naive Bayes algorithm.

The Code:

With a better understanding of how the Gaussian Naive Bayes algorithm works, let’s get to the program:

Step 1| Prerequisites:

import numpy as np
import yfinance
from scipy import stats

# Download daily price data for Apple (ticker AAPL)
aapl = yfinance.download('AAPL', '2016-1-1', '2020-1-1')

These are the libraries that I will use for the project: yfinance downloads the stock data, scipy creates the Gaussian distributions, and numpy (which the later steps call as np) computes the means, standard deviations and products.

I downloaded Apple stock data, from 2016 to 2020, for reproducible results.

Step 2| Converting to Gaussian Distributions:

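def calculate_prereq(values):
    # Standard deviation and mean of one feature's observed values
    std = np.std(values)
    mean = np.mean(values)
    return std, mean

def calculate_distribution(mean, std):
    # Frozen scipy normal distribution with the given parameters
    norm = stats.norm(mean, std)
    return norm

def extrapolate(norm, x):
    # Probability density of the fitted distribution at x
    return norm.pdf(x)

def values_to_norm(dicts):
    # Replace each feature's tuple of values with its fitted distribution
    for dictionary in dicts:
        for term in dictionary:
            std, mean = calculate_prereq(dictionary[term])
            norm = calculate_distribution(mean, std)
            dictionary[term] = norm
    return dicts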

The “calculate_prereq” function calculates the standard deviation and the mean, the two things needed to create a Gaussian distribution.

I could write the function that creates a Gaussian distribution from scratch, but scipy’s functions have been highly optimized and would therefore work better on datasets with more features.

Gaussian distributions approximate many kinds of real-world probabilistic data. Take IQ test scores as an example. The average IQ score is 100, and most people score close to it, so the peak of the Gaussian distribution is at 100. Toward both ends of the spectrum, the number of people with extremely low or extremely high scores decreases as the scores become more extreme. With a Gaussian distribution, one can estimate how likely a person is to get a certain score and therefore gain insight into it.
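
The IQ example can be sketched in a few lines. The mean of 100 comes from the text; the standard deviation of 15 is an assumption made only for illustration. The standard library's `statistics.NormalDist` stands in here for `scipy.stats.norm`, since both expose `pdf` and `cdf`.

```python
from statistics import NormalDist

# Mean of 100 comes from the text; the standard deviation of 15 is an assumption.
iq = NormalDist(mu=100, sigma=15)

density_at_mean = iq.pdf(100)      # height of the curve at its peak
density_at_tail = iq.pdf(145)      # height far out in the upper tail
share_above_130 = 1 - iq.cdf(130)  # fraction of scores above 130

print(density_at_mean > density_at_tail)
print(round(share_above_130, 3))
```

Note that `pdf` returns a density rather than a probability. Probabilities come from areas under the curve, which is what `cdf` provides. For comparing classes, as the program does, relative densities are enough.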

Step 3| Compare Possibilities:

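def compare_possibilities(dicts, x):
    probabilities = []
    for dictionary in dicts:
        # Density of each feature value under this class's distributions
        dict_probs = []
        for i in range(len(x)):
            value = x[i]
            dict_probs.append(extrapolate(dictionary[i], value))
        # Naive Bayes: multiply the per-feature densities together
        probabilities.append(np.prod(dict_probs))
    # Index of the class with the highest combined density
    return probabilities.index(max(probabilities))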

This function simply runs through the dictionaries (one per class) and calculates how likely it is that the price will increase or drop, given the ratios between the closing prices of the last ten days. It then returns the index, within the list of dictionaries, of the class that the Bayes classifier calculates to have the highest probability.
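
One practical caveat: multiplying many small densities can underflow to zero as the number of features grows. A common workaround is to sum log-densities instead, which leaves the winning class unchanged. The sketch below is a hypothetical variant, not the article's function: the name `compare_log_possibilities` and the class parameters are made up, and `statistics.NormalDist` stands in for scipy's frozen distributions.

```python
import math
from statistics import NormalDist


def compare_log_possibilities(dicts, x):
    """Return the index of the class whose summed log-density of x is highest."""
    scores = []
    for dictionary in dicts:
        score = 0.0
        for i, value in enumerate(x):
            density = dictionary[i].pdf(value)
            # Guard against log(0) when a density underflows.
            score += math.log(density) if density > 0 else float("-inf")
        scores.append(score)
    return scores.index(max(scores))


# Made-up per-feature distributions for two classes (0 = increase, 1 = drop).
increase = {0: NormalDist(1.01, 0.01), 1: NormalDist(1.01, 0.01)}
drop = {0: NormalDist(0.99, 0.01), 1: NormalDist(0.99, 0.01)}

prediction = compare_log_possibilities([increase, drop], [1.012, 1.008])
print(prediction)
```

With only nine features the plain product is unlikely to underflow, so this matters mainly if more features are added.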

Step 4| Run the Program:

# Flatten the closing prices to a plain array so they can be indexed by position
close = aapl['Close'].to_numpy().ravel()

drop = {}
increase = {}
for day in range(10, len(close) - 1):
    previous_close = close[day-10:day]
    ratios = []
    for i in range(1, len(previous_close)):
        ratios.append(previous_close[i] / previous_close[i-1])
    # Label the window by whether the next day's close went up or down
    if close[day+1] > close[day]:
        for i in range(len(ratios)):
            if i in increase:
                increase[i] += (ratios[i],)
            else:
                # Start the tuple with the first ratio instead of discarding it
                increase[i] = (ratios[i],)
    elif close[day+1] < close[day]:
        for i in range(len(ratios)):
            if i in drop:
                drop[i] += (ratios[i],)
            else:
                drop[i] = (ratios[i],)

# Ratios for the ten closes leading up to the final day, used as the input to classify
new_close = close[-11:-1]
ratios = []
for i in range(1, len(new_close)):
    ratios.append(new_close[i] / new_close[i-1])
X = ratios
print(X)
dicts = [increase, drop]
dicts = values_to_norm(dicts)
print(compare_possibilities(dicts, X))

This last part runs all the functions together. It gathers the 9 ratios between the closing prices of the last 10 days, executes the program, and returns whether the price is predicted to increase or drop. The value it returns is the index of the dictionary in the list dicts: 0 means the price is predicted to increase, and 1 means it is predicted to drop.

Conclusion:

This program is just the basic framework of a Gaussian Naive Bayes algorithm. Here are a few ways that you can improve my program:

  • Increase the number of features

You can include features such as volume and opening price to increase the scope of the data. However, an overload of data could make Gaussian Naive Bayes less effective, as it does not perform well with big data.

  • Link to Alpaca API

The Alpaca API is a great platform for testing trading strategies. Try linking this program to it to place buy or sell trades based on the predictions of the model!
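
For the first suggestion above, one possible way to add volume as a feature is to compute day-over-day volume ratios and append them to the close-price ratios, so that the feature indices stay contiguous. This is only a sketch under that assumption: the helper names `day_over_day` and `build_features` and the sample numbers are hypothetical.

```python
def day_over_day(values):
    """Ratio of each value to the one before it."""
    return [values[i] / values[i - 1] for i in range(1, len(values))]


def build_features(closes, volumes):
    """Concatenate close-price ratios and volume ratios into one feature list."""
    return day_over_day(closes) + day_over_day(volumes)


# Made-up three-day window.
features = build_features([100.0, 101.0, 99.0], [1_000_000, 1_200_000, 900_000])
print(len(features))
```

Since the dictionaries in the program are keyed by feature index, a longer feature list like this should carry over to `values_to_norm` and `compare_possibilities` without changes, with each extra feature getting its own Gaussian per class.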

Translated from: https://medium.com/analytics-vidhya/using-probabilistic-machine-learning-to-improve-your-stock-trading-b40782f3710d
