How I get options data for free

by Harry Sauers

An introduction to web scraping for finance

Ever wished you could access historical options data, but got blocked by a paywall? What if you just want it for research, fun, or to develop a personal trading strategy?

In this tutorial, you’ll learn how to use Python and BeautifulSoup to scrape financial data from the Web and build your own dataset.

Getting Started

You should have at least a working knowledge of Python and Web technologies before beginning this tutorial. To build these up, I highly recommend checking out a site like codecademy to learn new skills or brush up on old ones.

First, let's spin up your favorite IDE. Normally I use PyCharm, but for a quick script like this, Repl.it will do the job too. Add a quick print("Hello world") to ensure your environment is set up correctly.

Now we need to figure out a data source.

Unfortunately, Cboe’s awesome options chain data is pretty locked down, even for current delayed quotes. Luckily, Yahoo Finance has solid enough options data here. We’ll use it for this tutorial, as web scrapers often need some content awareness, but it is easily adaptable for any data source you want.

Dependencies

We don't need many external dependencies, just the Requests and BeautifulSoup modules in Python. If you don't have them yet, pip install requests beautifulsoup4 will install both. Add these imports at the top of your program:

from bs4 import BeautifulSoup
import requests

Create a main method:

def main():
    print("Hello World!")

if __name__ == "__main__":
    main()

Scraping HTML

Now you’re ready to start scraping! Inside main(), add these lines to fetch the page’s full HTML:

data_url = "https://finance.yahoo.com/quote/SPY/options"
data_html = requests.get(data_url).content
print(data_html)

This fetches the page’s full HTML content, so we can find the data we want in it. Feel free to give it a run and observe the output.
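
One practical note from experience, not part of the original tutorial: Yahoo sometimes serves an error page or empty markup to the default Requests user agent. If the output looks wrong, spoofing a browser User-Agent often helps; the header value below is just an illustrative example:

# optional: pretend to be a browser if Yahoo rejects the default user agent
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}
data_html = requests.get(data_url, headers=headers).content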

Feel free to comment out print statements as you go — these are just there to help you understand what the program is doing at any given step.

BeautifulSoup is the perfect tool for working with HTML data in Python. Let’s narrow down the HTML to just the options pricing tables so we can better understand it:

content = BeautifulSoup(data_html, "html.parser")
# print(content)

options_tables = content.find_all("table")
print(options_tables)

That’s still quite a bit of HTML — we can’t get much out of that, and Yahoo’s code isn’t the most friendly to web scrapers. Let’s break it down into two tables, for calls and puts:

options_tables = []
tables = content.find_all("table")
for i in range(0, len(tables)):
    options_tables.append(tables[i])

print(options_tables)

Yahoo’s data contains options that are pretty deep in- and out-of-the-money, which might be great for certain purposes. I’m only interested in near-the-money options, namely the two calls and two puts closest to the current price.

Let’s find these, using BeautifulSoup and Yahoo’s differential table entries for in-the-money and out-of-the-money options:

calls = options_tables[0].find_all("tr")[1:]  # first row is header

itm_calls = []
otm_calls = []

for call_option in calls:
    if "in-the-money" in str(call_option):
        itm_calls.append(call_option)
    else:
        otm_calls.append(call_option)

itm_call = itm_calls[-1]
otm_call = otm_calls[0]

print(str(itm_call) + " \n\n " + str(otm_call))

Now we have the HTML table entries for the two call options nearest the money. Let's scrape the pricing data, volume, and implied volatility from the first call option:

itm_call_data = []
for td in BeautifulSoup(str(itm_call), "html.parser").find_all("td"):
    itm_call_data.append(td.text)

print(itm_call_data)

itm_call_info = {'contract': itm_call_data[0], 'strike': itm_call_data[2], 'last': itm_call_data[3],
                 'bid': itm_call_data[4], 'ask': itm_call_data[5], 'volume': itm_call_data[8],
                 'iv': itm_call_data[10]}

print(itm_call_info)
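
One caveat on the hard-coded indices above: they assume Yahoo keeps its column order fixed. A slightly more defensive sketch (my own, not from the original article, and the column names are assumptions about Yahoo's markup) maps cells by the table's header row instead:

# sketch: map cell text by column header instead of fixed positions
header_names = [th.text for th in options_tables[0].find_all("th")]
row_by_header = dict(zip(header_names, itm_call_data))
# then e.g. row_by_header.get("Strike") or row_by_header.get("Bid"),
# assuming Yahoo labels its columns "Strike", "Bid", and so on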

Adapt this code for the next call option:

# otm call
otm_call_data = []
for td in BeautifulSoup(str(otm_call), "html.parser").find_all("td"):
    otm_call_data.append(td.text)

# print(otm_call_data)

otm_call_info = {'contract': otm_call_data[0], 'strike': otm_call_data[2], 'last': otm_call_data[3],
                 'bid': otm_call_data[4], 'ask': otm_call_data[5], 'volume': otm_call_data[8],
                 'iv': otm_call_data[10]}

print(otm_call_info)

Give your program a run!

You now have dictionaries of the two near-the-money call options. Now it's just a matter of scraping the same data from the put options table:

puts = options_tables[1].find_all("tr")[1:]  # first row is header

itm_puts = []
otm_puts = []

for put_option in puts:
    if "in-the-money" in str(put_option):
        itm_puts.append(put_option)
    else:
        otm_puts.append(put_option)

itm_put = itm_puts[0]
otm_put = otm_puts[-1]

# print(str(itm_put) + " \n\n " + str(otm_put) + "\n\n")

itm_put_data = []
for td in BeautifulSoup(str(itm_put), "html.parser").find_all("td"):
    itm_put_data.append(td.text)

# print(itm_put_data)

itm_put_info = {'contract': itm_put_data[0], 'last_trade': itm_put_data[1][:10],
                'strike': itm_put_data[2], 'last': itm_put_data[3],
                'bid': itm_put_data[4], 'ask': itm_put_data[5], 'volume': itm_put_data[8],
                'iv': itm_put_data[10]}

# print(itm_put_info)

# otm put
otm_put_data = []
for td in BeautifulSoup(str(otm_put), "html.parser").find_all("td"):
    otm_put_data.append(td.text)

# print(otm_put_data)

otm_put_info = {'contract': otm_put_data[0], 'last_trade': otm_put_data[1][:10],
                'strike': otm_put_data[2], 'last': otm_put_data[3],
                'bid': otm_put_data[4], 'ask': otm_put_data[5], 'volume': otm_put_data[8],
                'iv': otm_put_data[10]}

Congratulations! You just scraped data for all near-the-money options of the S&P 500 ETF, and can view them like this:

print("\n\n") print(itm_call_info) print(otm_call_info) print(itm_put_info) print(otm_put_info)

Give your program a run — you should get data like this printed to the console:

{'contract': 'SPY190417C00289000', 'last_trade': '2019-04-15', 'strike': '289.00', 'last': '1.46', 'bid': '1.48', 'ask': '1.50', 'volume': '4,646', 'iv': '8.94%'}
{'contract': 'SPY190417C00290000', 'last_trade': '2019-04-15', 'strike': '290.00', 'last': '0.80', 'bid': '0.82', 'ask': '0.83', 'volume': '38,491', 'iv': '8.06%'}
{'contract': 'SPY190417P00290000', 'last_trade': '2019-04-15', 'strike': '290.00', 'last': '0.77', 'bid': '0.75', 'ask': '0.78', 'volume': '11,310', 'iv': '7.30%'}
{'contract': 'SPY190417P00289000', 'last_trade': '2019-04-15', 'strike': '289.00', 'last': '0.41', 'bid': '0.40', 'ask': '0.42', 'volume': '44,319', 'iv': '7.79%'}

Setting up recurring data collection

Yahoo, by default, only returns the options for the date you specify. It’s this part of the URL: https://finance.yahoo.com/quote/SPY/options?date=1555459200

This is a Unix timestamp, so we’ll need to generate or scrape one, rather than hardcoding it in our program.
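
To make that concrete: Unix time counts seconds since January 1, 1970 (UTC), and Python's standard library can produce one for any date. A quick sanity check, with the caveat that time.mktime uses your machine's local timezone, so the exact value can differ from Yahoo's:

import datetime, time

# Unix timestamp for midnight on 2019-04-17 (local time)
print(int(time.mktime(datetime.date(2019, 4, 17).timetuple())))
# prints 1555459200 if your machine is set to UTC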

Add some dependencies:

import datetime, time

Let’s write a quick script to generate and verify a Unix timestamp for our next set of options:

def get_datestamp():
    options_url = "https://finance.yahoo.com/quote/SPY/options?date="
    today = int(time.time())
    # print(today)
    date = datetime.datetime.fromtimestamp(today)
    yy = date.year
    mm = date.month
    dd = date.day

The above code holds the base URL of the page we are scraping and breaks today's date into year, month, and day values for us to use next.

Let’s increment this date by one day, so we don’t get options that have already expired.

dd += 1

Now, we need to convert it back into a Unix timestamp and make sure it’s a valid date for options contracts:

options_day = datetime.date(yy, mm, dd)
datestamp = int(time.mktime(options_day.timetuple()))
# print(datestamp)
# print(datetime.datetime.fromtimestamp(datestamp))

# vet timestamp, then return if valid
for i in range(0, 7):
    test_req = requests.get(options_url + str(datestamp)).content
    content = BeautifulSoup(test_req, "html.parser")
    # print(content)
    tables = content.find_all("table")

    if tables != []:
        # print(datestamp)
        return str(datestamp)
    else:
        # print("Bad datestamp!")
        dd += 1
        options_day = datetime.date(yy, mm, dd)
        datestamp = int(time.mktime(options_day.timetuple()))

return str(-1)
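
One caveat worth flagging, which the original tutorial doesn't address: incrementing dd directly breaks at month boundaries (dd can become 32, and datetime.date will raise a ValueError). If you want a month-safe variant, a small sketch using datetime.timedelta:

# month-safe alternative to dd += 1
options_day = datetime.date.today() + datetime.timedelta(days=1)
datestamp = int(time.mktime(options_day.timetuple()))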

Let's adapt our main method (soon to be renamed fetch_options) to use a dynamic timestamp to fetch options data, rather than whatever Yahoo wants to give us as the default.

Change this line:

data_url = "https://finance.yahoo.com/quote/SPY/options"

To this:

datestamp = get_datestamp()
data_url = "https://finance.yahoo.com/quote/SPY/options?date=" + datestamp

Congratulations! You just scraped real-world options data from the web.

Now we need to do some simple file I/O and set up a timer to record this data each day after market close.

Improving the program

Rename main() to fetch_options() and add these lines to the bottom:

options_list = {'calls': {'itm': itm_call_info, 'otm': otm_call_info},
                'puts': {'itm': itm_put_info, 'otm': otm_put_info},
                'date': datetime.date.fromtimestamp(time.time()).strftime("%Y-%m-%d")}

return options_list

Create a new method called schedule(). We’ll use this to control when we scrape for options, every twenty-four hours after market close. Add this code to schedule our first job at the next market close:

from apscheduler.schedulers.background import BackgroundScheduler

scheduler = BackgroundScheduler()

def schedule():
    scheduler.add_job(func=run, trigger="date", run_date=datetime.datetime.now())
    scheduler.start()

In your if __name__ == “__main__”: statement, delete main() and add a call to schedule() to set up your first scheduled job.
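
One caveat from me, not the original article: BackgroundScheduler runs jobs on daemon threads, so if the main thread exits right after schedule() returns, the job may never fire. A minimal way to keep the process alive:

if __name__ == "__main__":
    schedule()
    # keep the main thread alive so the background scheduler can fire
    while True:
        time.sleep(60)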

if __name__ == “__main__”:语句中,删除main()并添加对schedule()的调用以设置您的第一个计划作业。

Create another method called run(). This is where we’ll handle the bulk of our operations, including actually saving the market data. Add this to the body of run():

today = int(time.time())
date = datetime.datetime.fromtimestamp(today)
yy = date.year
mm = date.month
dd = date.day

# must use 12:30 for Unix time instead of 4:30 NY time
next_close = datetime.datetime(yy, mm, dd, 12, 30)

# do operations here
""" This is where we'll write our last bit of code. """

# schedule next job
scheduler.add_job(func=run, trigger="date", run_date=next_close)

print("Job scheduled! | " + str(next_close))

This lets our code call itself in the future, so we can just put it on a server and build up our options data each day. Add this code to actually fetch data under """ This is where we'll write our last bit of code. """:

options = {}

# ensures option data doesn't break the program if internet is out
try:
    if next_close > datetime.datetime.now():
        print("Market is still open! Waiting until after close...")
    else:
        # ensures program was run after market hours
        if next_close < datetime.datetime.now():
            dd += 1
            next_close = datetime.datetime(yy, mm, dd, 12, 30)
            options = fetch_options()
            print(options)
            # write to file
            write_to_csv(options)
except:
    print("Check your connection and try again.")

Saving data

You may have noticed that write_to_csv isn’t implemented yet. No worries — let’s take care of that here:

def write_to_csv(options_data):
    import csv
    with open('options.csv', 'a', newline='\n') as csvfile:
        spamwriter = csv.writer(csvfile, delimiter=',')
        spamwriter.writerow([str(options_data)])

Cleaning up

As options contracts are time-sensitive, we might want to add a field for their expiration date. This information is not included in the raw HTML we scraped.

Add this line of code near the top of fetch_options() to capture and format the expiration date:

expiration = datetime.datetime.fromtimestamp(int(get_datestamp())).strftime("%Y-%m-%d")

Add 'expiration': expiration to the end of each option_info dictionary, like so:

itm_call_info = {'contract': itm_call_data[0], 'strike': itm_call_data[2], 'last': itm_call_data[3],
                 'bid': itm_call_data[4], 'ask': itm_call_data[5], 'volume': itm_call_data[8],
                 'iv': itm_call_data[10], 'expiration': expiration}

Give your new program a run — it’ll scrape the latest options data and write it to a .csv file as a string representation of a dictionary. The .csv file will be ready to be parsed by a backtesting program or served to users through a webapp. Congratulations!
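
Since each row is the string representation of a Python dictionary, ast.literal_eval offers a simple way to parse it back later. A quick sketch, assuming the options.csv produced above:

import ast, csv

with open('options.csv', newline='\n') as csvfile:
    for row in csv.reader(csvfile, delimiter=','):
        options = ast.literal_eval(row[0])  # string back to a dict of dicts
        print(options['date'], options['calls']['itm']['strike'])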

Originally published at: https://www.freecodecamp.org/news/how-i-get-options-data-for-free-fba22d395cc8/
