Beating the Stock Market

Personal Projects

Note from Towards Data Science’s editors: While we allow independent authors to publish articles in accordance with our rules and guidelines, we do not endorse each author’s contribution. You should not rely on an author’s works without seeking professional advice. See our Reader Terms for details.

Summary

This is a personal project in which I set out to build a trading application using machine learning tools. Starting from data modelling and a categorisation based on the distribution of price changes and on machine learning techniques, I developed a trading strategy intended to help beginner investors generate low-risk profit with this application.

Introduction

Market analysis is both interesting and complex, as the discussion in [1] shows. Nevertheless, several works based on machine learning try to shed light on this field.

In this piece of work, I have created an application consisting of two main parts:

  1. A screen where a stock market index can be analysed over different time horizons. It contains a candlestick chart; a chart for analysing technical indicators [2]; a line chart showing the percentage price change between days; and a box plot of that same series, to make its distribution easier to understand.

  2. A screen for analysing the trading strategy I have developed (Strategyone). The strategy has two parts: the first predicts stock market index movements by means of machine learning, while the second compares the data vector of the current prediction with what happened in the past. The available time horizons are 7, 14, 21 and 28 days.

This second screen is explained thoroughly in the sections “How to beat the market” and “Trading Strategy”.

Price data has been obtained through the Alpha Vantage API [3], and the list of stock market indices from the Finnhub API [4].

Context

As a physicist, I have always been fascinated by the world of complex systems: how certain formulae can be applied, with interesting results, to biological systems, to financial ones, or to a group of electrons.

Likewise, I am fascinated by how an element studied in isolation may behave differently when it is studied within the system.

Consequently, this project stems from curiosity about the stock market, together with the software and intellectual challenge of understanding a system as complex as the market.

The project has gone through three stages. The first version was developed as the final thesis of the Master’s Degree in Data Science that I took at [5]; its only aim was to create a classification model that could predict the future of a stock in the market using machine learning. The second version was developed outside the Master’s and tried to improve on the first. Finally, the third version is the one discussed here, and it offers a significant improvement: the development of a trading strategy.

How to beat the market

In order to use a classification model to predict market movements, I needed to categorise the data. The prediction categories are: “Strong bull”, a significant price increase; “Bull”, a price increase; “Keep”, the price remains the same; “Bear”, a price decrease; and “Strong bear”, a significant price decrease [6].

How are the stock market index categories chosen?

This is done through the distribution of the percentage variation in the stock price. As our aim is to predict the future, the percentage-variation column of each daily record must hold how the price varies between that day and the end of the time horizon we want to predict.

The percentage variation to be categorised is then compared with the distribution of the last 4 months, and one of the above-mentioned categories is selected according to the percentile range of that distribution in which it falls.

In this way, all the data can be categorised for a given time horizon, and the category always refers to the future.
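
The categorisation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the project's code: the function names, the percentile boundaries in `cuts` (20/40/60/80) and the way percentile ranges map to categories are all assumptions, since the article does not state them. `window` stands for the percentage variations observed over the previous 4 months.

```python
from bisect import bisect_right
from statistics import quantiles

# Ordered from the lowest percentile range to the highest.
CATEGORIES = ["Strong bear", "Bear", "Keep", "Bull", "Strong bull"]


def forward_change(prices, horizon):
    """Percentage change between each day and the day `horizon` steps later."""
    return [
        (prices[i + horizon] - prices[i]) / prices[i] * 100.0
        for i in range(len(prices) - horizon)
    ]


def categorise(change, window, cuts=(20, 40, 60, 80)):
    """Label `change` by the percentile range of `window` it falls in.

    `cuts` are hypothetical percentile boundaries, not the article's values.
    """
    # quantiles(..., n=100) returns the 1st to 99th percentile cut points.
    percentiles = quantiles(window, n=100, method="inclusive")
    thresholds = [percentiles[c - 1] for c in cuts]
    return CATEGORIES[bisect_right(thresholds, change)]
```

For a 7-day horizon, for instance, `forward_change(prices, 7)` would give the column to categorise, and each value would be labelled against the changes seen in the 4 months before it.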

Once the categorisation was done, the next step was to find the most precise way of applying a classification algorithm. After a number of trials and different ideas, the selected process was to scale the data with the robust scaler technique and to use Random Forest as the classification algorithm. These were chosen because they provide a higher average precision across all the categories.

Following only these steps, we obtain a model that is able to predict “Strong bull” with 40 % accuracy.
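
To make the scaling step concrete, the sketch below shows what a robust scaler does to a single feature column: it centres the values on the median and divides by the interquartile range, so a few extreme values distort the scale less than they would with a mean and standard deviation. This is a pure-Python illustration with a hypothetical `robust_scale` helper, not the implementation used in the project, which the article does not show.

```python
from statistics import median, quantiles


def robust_scale(column):
    """Centre a feature column on its median and divide by its IQR."""
    q1, _, q3 = quantiles(column, n=4, method="inclusive")
    centre = median(column)
    iqr = (q3 - q1) or 1.0  # fall back to 1.0 on a constant column
    return [(x - centre) / iqr for x in column]


# The outlier (100) stays far away, but the other values keep a usable scale.
scaled = robust_scale([1, 2, 3, 4, 100])
```

In practice the scaled columns would then be passed to the Random Forest classifier, which is left out here.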

Trading Strategy

The trading strategy is based on what happened in the past and on the idea that a guess counts as correct whenever the operation wins, dropping the requirement that the exact predicted category must also be guessed.

That is, if the prediction is “Bull”, we open a long position and the actual outcome is “Strong bull”, the prediction is considered accurate. The same holds if we predict “Strong bull” and the result is “Bull”. Likewise, if the prediction is “Strong bear”, we open a short position and the outcome is “Bear”, or the other way round, the prediction is considered accurate.

If none of the above-mentioned cases takes place, the operation is considered a failure.
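
The success rule above amounts to checking that the predicted and actual categories point in the same direction. A minimal sketch follows; the `is_success` name is hypothetical, and treating a “Keep” prediction as never successful (because no position is opened) is an assumption, not something the article states.

```python
LONG = {"Bull", "Strong bull"}   # predictions that lead to a long position
SHORT = {"Bear", "Strong bear"}  # predictions that lead to a short position


def is_success(prediction, outcome):
    """True when the operation implied by `prediction` wins given `outcome`."""
    if prediction in LONG:
        return outcome in LONG
    if prediction in SHORT:
        return outcome in SHORT
    return False  # assumption: "Keep" opens no position, so it never wins
```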

With this in mind, the strategy consists only of long positions, opened only when the model predicts “Strong bull”, since that is the category on which the classification model is most accurate.

How does the strategy work?

Once the robust scaler has been applied to all the records, and each record has both its predicted category and its actual category, a PCA is applied to reduce the number of dimensions to 4 while retaining 95 % of the data variability. Each record therefore has 4 new variables, together with its prediction and its actual category. This lets us see what the variables look like when a given category is predicted and a given actual category results: we group the records by prediction and actual category and calculate the median of each variable, which gives a profile curve describing each group.

As a result, we have a description of the variables for the cases in which “Strong bull” was predicted and the actual outcome was “Strong bull” or any other category.

All of this is limited to the 6 months of data preceding the prediction day, in order to avoid the influence of an old market state on the strategy. The results obtained are summarised below:

Figure: Description of the variables for each prediction-category pair after the PCA.

The interpretation of this table is that, in the 6 months before the prediction, for the records where “Strong bull” was predicted and the category was guessed correctly, the principal-component variables had the medians shown.

Consequently, in order to carry out an operation, we must apply a robust scaler and a PCA to the data of the day for which we are making the prediction.

If the prediction obtained is “Strong bull”, the first condition for carrying out the operation is met. The second step is to check which of the previous profile curves is most similar to the data being predicted. This is done with cosine similarity, which identifies the profile vector closest to the data. If it corresponds to “Strong bull-Strong bull”, we have the key to performing a safer operation.
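
The second step can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names are hypothetical, the profiles are keyed as "prediction-actual category" strings, and the profile values in the usage note are made up, not taken from the article's table.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def closest_profile(vector, profiles):
    """Name of the median profile most similar to `vector`."""
    return max(profiles, key=lambda name: cosine_similarity(vector, profiles[name]))


def should_operate(prediction, vector, profiles):
    """Both conditions of the strategy: the prediction and the closest profile."""
    return (
        prediction == "Strong bull"
        and closest_profile(vector, profiles) == "Strong bull-Strong bull"
    )
```

With two made-up profiles, `{"Strong bull-Strong bull": [1.0, 0.0, 0.0, 0.0], "Strong bull-Bear": [0.0, 1.0, 0.0, 0.0]}`, a day whose 4 PCA components are `[0.9, 0.1, 0.0, 0.0]` would be closest to the first profile, so a “Strong bull” prediction on that day would pass both checks.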

Following this trading strategy, we obtain almost 50 % accuracy but, as mentioned at the beginning, guessing correctly does not imply guessing the category too.

In our case, predicting “Strong bull” and obtaining “Bull” as the final result also counts as a correct guess. The accuracy of the strategy reaches 58 % when this is taken into account.

Conclusion

The aim of this piece of work was to develop a strategy that allows a beginner investor to generate low-risk profit without suffering a total loss. As I have mentioned, the strategy reaches 58 % accuracy under the described conditions but, on a personal note, it is not a strategy to be run automatically, because the assumed error rate rises to about 40 %.

However, it is interesting to see that an accuracy of over 50 % is obtained in the operations performed, following a strategy based only on data and with limited, minimal knowledge of the stock market.

All the project code can be read at: GitHub/esan94/bsm03

Next Steps

Possible next steps for improvement include:

  • Changing the data model.
  • Improving the classification algorithm.
  • Adding more knowledge about the stock market to the project.
  • Assigning a value to the principal components when applying the cosine similarity.

Resources

  • [1] https://en.wikipedia.org/wiki/Efficient-market_hypothesis

  • [2] https://www.investopedia.com/technical-analysis-4689657

  • [3] https://www.alphavantage.co/

  • [4] https://finnhub.io/

  • [5] https://kschool.com/

  • [6] https://www.investopedia.com/insights/digging-deeper-bull-and-bear-markets/

You can follow me on LinkedIn, GitHub or Medium.

Translation made by Paloma Sánchez Narváez.

Translated from: https://towardsdatascience.com/beating-stock-market-8b33c5afb633
