Can an Algorithm Pick a Winning NBA Fantasy Draft?

Note from Towards Data Science’s editors: While we allow independent authors to publish articles in accordance with our rules and guidelines, we do not endorse each author’s contribution. You should not rely on an author’s works without seeking professional advice. See our Reader Terms for details.

I enjoy basketball. It’s a fast-paced, competitive game, and I’ve enjoyed both playing and watching it for a long time. The NBA is famous for generating very clean data, which has long been used by enthusiasts (like myself) for data visualizations, modeling and game predictions.

Recently, I was contacted by DraftKings about an interview for a potential job. As part of my preparation, I started using their platform and competing in mock competitions to get acquainted with the DraftKings (DK) contest process. It was during this period that I really got into the idea of using data to model and predict a winning roster.

I built the algorithm iteratively and from scratch: starting with a naive version 1, then a more robust version 2, and I’m currently working on a winning version 3.

You can follow along with my algorithm design journey in the rest of the article.

Quick Level Set: Scoring and Rules

DK’s rules and scoring for their NBA classic fantasy contests are fairly intuitive, even if you have no prior basketball knowledge. In a nutshell, the objective is to:

Create an 8-player lineup while staying under the $50,000 salary cap.

Players earn different points for different actions (more details below), and the lineup with the most points at the end of all games in a night wins. Sounds simple enough :)

The breakdown of the actions that result in positive (or negative) points can be seen below.

NBA Fantasy points breakdown- DraftKings. Photo by Author.

One last constraint that makes drafting slightly more complicated is player positions. According to DK: Lineups will consist of 8 players and must include players from at least 2 different NBA games.

Further, the 8 roster spots are broken down by position, as shown below.

NBA Fantasy player positions- DraftKings. Photo by Author.
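
The rules stated in the text so far (8 players, the $50,000 cap, at least 2 different games) could be captured in a small validity check. This is only a sketch: the `Player` fields and the `is_valid_lineup` helper are hypothetical names chosen for illustration, and the per-position slots are left out because they appear only in the figure.

```python
from dataclasses import dataclass

SALARY_CAP = 50_000  # salary cap from the contest rules
ROSTER_SIZE = 8      # players per lineup
MIN_GAMES = 2        # lineup must span at least this many NBA games


@dataclass(frozen=True)
class Player:
    # Hypothetical fields, for illustration only.
    name: str
    salary: int
    game: str  # an identifier for the game the player appears in


def is_valid_lineup(lineup):
    """Check roster size, salary cap and the minimum number of games."""
    if len(lineup) != ROSTER_SIZE:
        return False
    # A lineup that spends exactly the cap is treated as valid here.
    if sum(p.salary for p in lineup) > SALARY_CAP:
        return False
    return len({p.game for p in lineup}) >= MIN_GAMES
```

A full validator would also need the position slots from the figure above.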

There you have it! A simple optimization problem with a set of constraints. Sounds like something an algorithm would excel at. Or would it?

Algorithm Version 1- Naive

My goal with this algorithm was to build it as fast as possible, with little to no hope of winning, mainly because I was interested in setting up a strong foundation without worrying about building complex logic early in the process. To do this, I downloaded a player dataset from DK and started a Jupyter notebook. If you’re interested, you can find the full raw data here and my notebook here.

Let’s see what our data looks like.

Players dataset- DraftKings. Photo by Author.

Right off the bat, we can tell that for a simple algorithm, given our requirements and constraints, the following columns will be useful: ID, Salary and AvgPointsPerGame (fantasy points). These allow us to pick the “best” players while staying under the $50,000 salary cap. Sure, without positional information we could end up with too many players at one position, but that’s an issue for a later version. Remember, version 1 should be the simplest implementation of your product.

Given this data, our first-pass optimization algorithm can be broken up into the following simple steps:

  1. Randomly select 8 players from the dataset.
  2. If the sum of the players’ salaries is greater than $50,000, go back to step 1 (too expensive).
  3. Otherwise, sum the AvgPointsPerGame of each player in the roster and compare it with a running maximum. If it is greater, replace the maximum value and the roster.
  4. Unless all possible combinations have been explored, return to step 1. Once no combinations remain, return the maximum value and the roster.
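
The steps above can be sketched roughly as follows. This is not the code from the author's notebook: it assumes the players are available as dictionaries with the three columns named earlier, the sample rows are made up, and it swaps random sampling for `itertools.combinations` so that every roster is visited exactly once.

```python
from itertools import combinations

# Made-up sample rows using the column names from the dataset.
players = [
    {"ID": 1, "Salary": 9000, "AvgPointsPerGame": 50.0},
    {"ID": 2, "Salary": 8000, "AvgPointsPerGame": 42.0},
    {"ID": 3, "Salary": 7000, "AvgPointsPerGame": 40.0},
    {"ID": 4, "Salary": 6000, "AvgPointsPerGame": 30.0},
    {"ID": 5, "Salary": 5000, "AvgPointsPerGame": 28.0},
    {"ID": 6, "Salary": 4000, "AvgPointsPerGame": 15.0},
]


def best_roster(players, roster_size, salary_cap):
    """Exhaustively search all rosters and keep the highest-scoring legal one."""
    best_points, best_ids = 0.0, None
    for roster in combinations(players, roster_size):
        # Step 2: skip rosters that are too expensive.
        if sum(p["Salary"] for p in roster) > salary_cap:
            continue
        # Step 3: compare expected points with the running maximum.
        points = sum(p["AvgPointsPerGame"] for p in roster)
        if points > best_points:
            best_points = points
            best_ids = [p["ID"] for p in roster]
    # Step 4: all combinations explored.
    return best_points, best_ids


# Toy run: 3-player rosters under a 20,000 cap.
print(best_roster(players, roster_size=3, salary_cap=20_000))
```

For the real contest the call would use `roster_size=8` and `salary_cap=50_000`, which is where the running time becomes a problem.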

There we have it: a simple naive algorithm that picks 8 players at random and keeps the roster with the maximum expected fantasy points while staying under the $50,000 salary cap. But this algorithm has a few glaring issues:

  • No control over player positions. The algorithm could generate a roster with more than 3 players at one position (G/F), in which case the roster would be invalid.
  • No check for players who are injured or not scheduled to play. Drafting such a player would almost certainly lose the contest, since every player’s points matter for a winning lineup.
  • Lastly, the algorithm is very inefficient. Since we need to check every possible roster, for a given number of players n and roster size r the number of possible rosters is:

$$C(n, r) = \frac{n!}{r!\,(n - r)!}$$

To appreciate this complexity, take a look at the table below, which shows the number of checks when the total number of available players is 100.

Time complexity magnitude for first algorithm. Photo by Author.
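
Since the table itself is an image, the same counts can be regenerated with a few lines of Python. This is a minimal sketch that assumes Python 3.8 or later for `math.comb`; the `roster_counts` helper is a name made up for this example.

```python
from math import comb


def roster_counts(n, max_roster_size):
    """Number of distinct rosters C(n, r) for r = 1..max_roster_size."""
    return {r: comb(n, r) for r in range(1, max_roster_size + 1)}


# Number of checks for each roster size with 100 available players.
for r, count in roster_counts(100, 8).items():
    print(f"C(100, {r}) = {count:,}")
```

The last row, C(100, 8), is the 186,087,894,300 figure used in the version 2 comparison.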

It’s safe to assume that our algorithm will take a VERY long time to output a roster of 8 players. But because this is a first-pass algorithm, we’re happy with what we got. You can see the algorithm in action below, picking the 5 players with the highest expected points under a combined salary of $35,000. Not bad.

Output from algorithm 1- Top 5 players with the maximum expected points under $35,000 combined salary. Note: first row shows combined expected fantasy points, second row is combined salary and third are the IDs of the players, followed by the names. Photo by Author.

Because we’re on a mission to build a winning algorithm, let’s talk about version 2 optimizations.

Algorithm Version 2- Intermediate Sports Bettor

Now, this is where our algorithm goes from being a naive optimizer to an intermediate-level sports bettor. Based on the drawbacks of version 1, and the factorial time complexity, I decided to implement a few data and algorithm level optimizations.

First, I cleaned the data to include only players who are confirmed to play. This was an easy way to decrease the total number of available players from ~100 to ~85. This might look like a small decrease, but for a roster of 8 players the number of checks drops drastically as the total number of players decreases. The change in the number of checks can be seen below.

  • C (100, 8) = 186,087,894,300
  • C (85, 8) = 48,124,511,370

Our total number of operations (or checks) in the algorithm went down by ~75%!
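
A quick way to double-check that reduction, and to sketch the cleaning step, is shown below. The `drop_inactive` helper and the idea of passing a set of inactive player IDs are assumptions made for illustration; the article does not say how the confirmed players were identified.

```python
from math import comb


def drop_inactive(players, inactive_ids):
    """Keep only players whose ID is not in the (hypothetical) inactive set."""
    return [p for p in players if p["ID"] not in inactive_ids]


def reduction(before, after):
    """Fraction of checks saved when going from `before` to `after`."""
    return 1 - after / before


checks_before = comb(100, 8)  # all ~100 players
checks_after = comb(85, 8)    # ~85 players confirmed to play
print(f"Checks saved: {reduction(checks_before, checks_after):.1%}")
```

The saving comes out at roughly three quarters of the checks, in line with the figure quoted above.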

Next up, I modified the algorithm itself to pick specific positions. Now, instead of considering every possible roster drawn from all available players, the algorithm picks 3 guards from the available guards only, followed by 3 forwards and lastly 1 center. As you can see, the total here is only 7 players, which leaves the last pick to the user. This is a quick way to save some additional time, as the user can manually find the best remaining player (highest expected points given the salary remaining).

This was a huge optimization because there are only ~40 guards among the ~85 players. The number is similar for forwards and even smaller for centers. Note that there’s a slight overlap between the categories, as some players play multiple positions, but this was easy to deal with: I removed players who were already picked as guards before picking forwards, and so on. The performance boost from these changes can be seen below:

  • C (85, 8) = 48,124,511,370
  • C (40, 3) x C (40, 3) x C (20, 1) = 1,952,288,000
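
The position-by-position search described above might look something like the sketch below. It is an illustration under stated assumptions, not the author's notebook code: the `positions` field, the helper names, and the toy data are all hypothetical, and the eighth pick is left to the user as in the article.

```python
from itertools import combinations
from math import comb


def eligible(players, position, used_ids=frozenset()):
    """Players who can fill `position` and have not been picked already."""
    return [p for p in players
            if position in p["positions"] and p["ID"] not in used_ids]


def best_by_position(players, salary_cap, n_guards=3, n_forwards=3, n_centers=1):
    """Search guards, then forwards, then centers, skipping reused players."""
    best_points, best_ids = 0.0, None
    for guards in combinations(eligible(players, "G"), n_guards):
        used = frozenset(p["ID"] for p in guards)
        for forwards in combinations(eligible(players, "F", used), n_forwards):
            used_gf = used | {p["ID"] for p in forwards}
            for centers in combinations(eligible(players, "C", used_gf), n_centers):
                roster = guards + forwards + centers
                if sum(p["Salary"] for p in roster) > salary_cap:
                    continue  # too expensive
                points = sum(p["AvgPointsPerGame"] for p in roster)
                if points > best_points:
                    best_points = points
                    best_ids = [p["ID"] for p in roster]
    return best_points, best_ids


# Size of the search space with ~40 guards, ~40 forwards and ~20 centers.
print(f"{comb(40, 3) * comb(40, 3) * comb(20, 1):,}")
```

How much of the cap to pass as `salary_cap` is a judgment call, since some salary has to be held back for the eighth player.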

This is huge. The algorithm now performs roughly 96% fewer operations, and we get the best possible roster, broken up by position and under our salary cap. Let’s test our results!

Real World Results

If you’ve made it this far, congratulations. You’ve worked through the technical stuff, now it’s time for the results! I tried the algorithm’s picks over the course of three days in DK’s classic multiplier contests. Each time my entry fee was $1 and the payoff was $3 for the top 30% of the finishers. You can see the lineups generated by the algorithm and the results below.

Day 1: Oh no, finished at rank 26 and lost money. DK screenshot by Author.
Day 2: Oh no again, finished at rank 26 and lost another dollar. DK screenshot by Author.
Day 3: Woohoo! Finished at rank 3 and made $3. DK screenshot by Author.

As you can see from the above results, the real-world outcomes of the competition have been good! Out of the three days that I created lineups using the algorithm, we lost twice and won once. Our intermediate-level sports bettor algorithm has done better than I expected, but there’s still a long way to go.

I noticed a few nuances in the results. One is that our algorithm (before the v2 optimization) made a mistake on day 1, drafting an injured player (P. Beverley), which resulted in a weak lineup. This was fixed in version 2 and will not happen again. Additionally, one cool thing is that despite the mixed results, the algorithm has consistently created lineups that score more than 200 fantasy points, which is pretty high!

What’s next?

Well, there you have it. So far, I’ve spent $3 on entry fees and made $3 in winnings, for a grand total of $0 change! I have $25 left to spend on this project before my inner alarm bells start ringing, so I clearly need to improve this algorithm. After talking to some of my friends, who know a lot more about basketball than I do, I have a few hypotheses to test out. Some of these include:

  • Using additional player data over the last n games. This way the model would have more context, instead of just a snapshot value
  • Using prior team match-up data to adjust the weights placed on certain games. For example, this could help avoid picking a player in a match-up where (based on previous meetings) the player has failed to perform
  • Exploring dual optimization strategies
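
As one possible starting point for the first hypothesis, the snapshot AvgPointsPerGame could be replaced by an average over a player's most recent n games. The sketch below is only an illustration: the `recent_average` helper and the per-game points list are assumptions, since the dataset shown in this article contains only the single average.

```python
from statistics import mean


def recent_average(points_by_game, n):
    """Mean fantasy points over the most recent n games (oldest game first)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    recent = points_by_game[-n:]
    # A player with no recorded games gets an expected value of 0.
    return mean(recent) if recent else 0.0


# Made-up game log: the last 2 games average out higher than the full log.
game_log = [10, 20, 30, 40]
print(recent_average(game_log, 2), recent_average(game_log, 4))
```

Swapping this value in for AvgPointsPerGame would leave the rest of the search unchanged.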

And more! If you have any ideas about how to improve this project, please feel free to reach out to me on LinkedIn or over email, which you can find on my website. Additionally, all the data and code for this project can be found in my GitHub repository, so feel free to clone/fork it and test your own hypotheses! And, as always, any and all feedback is greatly appreciated.

Stay safe out there, everyone, and keep building cool stuff.

Translated from: https://towardsdatascience.com/can-an-algorithm-pick-a-winning-nba-fantasy-draft-c05342f130f2
