Rapido: Using Data to Improve Ride Dispatch

This post follows on from our last blog post in this series, which can be found here.

We thought it would be helpful to explain how we put the ideas from that post into practice in an on-ground experiment. In that post, we described how the lack of a logical time-based control group forced us to pivot to geo-temporal control formation. Here, I would like to talk about an experiment we ran as part of the Dispatch team at Rapido.

What is Ride Dispatch?

‘Dispatch’ is the system that decides which Captain(s) each order request (created when you tap the Request Rapido button, aka the Book my Ride button, in the app) should be sent to, so that a Captain reaches the customer as quickly and efficiently as possible. The name is a nod to the old days when taxi services were run over the telephone: a customer who called in for a pickup was patched through to an agent, who would find a willing cabbie (often after multiple calls), and that driver was “dispatched” for the order.

Dispatch is one of the key levers of a ride-hailing marketplace. EVERY ride request has to pass through it, so the room for error is low and the stakes are very high.

One of the first questions we had to answer, before even deciding on a product to build, was: “What metrics do we look at to see whether marketplace conditions are improving?” Is ETA the gold metric for this system, or do we also look at Matching Time, the distance the captain drives to reach the customer, and cancellations on both the demand and supply sides? We had to keep all of these metrics in mind when evaluating any change to our system.
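
To make these candidate metrics concrete, here is a minimal sketch that summarises one group of rides into median ETA, median matching time, mean pickup distance and cancellation rates on each side. The `RideRecord` fields, their units and the function name are hypothetical illustrations, not Rapido's actual logging schema.

```python
from dataclasses import dataclass
from statistics import mean, median
from typing import Dict, List, Optional


@dataclass
class RideRecord:
    # Hypothetical record layout; names and units are illustrative only.
    matching_time_s: float                 # request -> a captain accepts
    eta_min: Optional[float]               # accept -> captain reaches pickup
    pickup_distance_km: Optional[float]    # distance driven to reach the customer
    cancelled_by: Optional[str] = None     # "customer", "captain" or None


def dispatch_metrics(rides: List[RideRecord]) -> Dict[str, float]:
    """Summarise the dispatch metrics discussed above for one group of rides."""
    # ETA and pickup distance are only meaningful for rides that were not cancelled.
    completed = [r for r in rides if r.cancelled_by is None]
    n = len(rides)
    return {
        "median_eta_min": median(r.eta_min for r in completed),
        "median_matching_time_s": median(r.matching_time_s for r in rides),
        "mean_pickup_distance_km": mean(r.pickup_distance_km for r in completed),
        "customer_cancel_rate": sum(r.cancelled_by == "customer" for r in rides) / n,
        "captain_cancel_rate": sum(r.cancelled_by == "captain" for r in rides) / n,
    }
```

Comparing such a summary between groups, rather than ETA alone, is one way to check that an ETA gain is not being bought with more cancellations.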

What was Dispatch @ Rapido like before we started rebuilding it?

Before the rebuild, dispatch was a simple radial system: when a customer requested a ride in the app, the system drew a circle of some radius (say, 2 km) around the pickup point, looked at all the captains inside it, calculated each captain's crow-flies distance to the customer, and sent out pings in order of that distance.
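
A minimal sketch of the radial baseline described above, for illustration only: crow-flies (haversine) distance, a fixed radius defaulting to the 2 km used as the example, and nearest-first ping order. The `Captain` type and function names are hypothetical, and the sketch models only the steps described in the text.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass
class Captain:
    captain_id: str
    lat: float
    lng: float


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle ("crow-flies") distance between two points, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def radial_dispatch_order(pickup_lat, pickup_lng, captains, radius_km=2.0):
    """Return ids of captains inside the circle, nearest first (the ping order)."""
    in_range = []
    for c in captains:
        d = haversine_km(pickup_lat, pickup_lng, c.lat, c.lng)
        if d <= radius_km:
            in_range.append((d, c.captain_id))
    in_range.sort()  # sort by distance; ties broken by id
    return [cid for _, cid in in_range]
```

Nothing here knows about roads: a captain 0.5 km away on the far side of a railway line still ranks ahead of one 1 km away on the near side, which is exactly the weakness discussed next.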

As a first solution, this is fine, but discerning data enthusiasts will probably spot several issues with it. How should the optimal radius be chosen? And what happens when a major barrier, such as a ring road or a railway crossing, produces a short Euclidean distance but a long route-based distance? The latter case counts as a sub-optimal match: the captain has to spend more time driving empty kilometers to reach the customer, and the customer is frustrated at being matched to a captain who looks close by but takes twice as long to reach the pickup location.

This specific case can be generalised into a higher-level question: for a given pickup location, is there a nearby area that should be geo-fenced out of its “dispatch radius”?

Conversely, is there a location that is further away in a Euclidean sense, but closer in terms of driving time?

Won’t this problem be alleviated by paying for a maps API?

It is too expensive at a per-request level. Even though our volumes are currently at 20% of our pre-COVID levels (and recovering every week!), serving each request through the Google Maps API would be prohibitively expensive for a growing startup like Rapido, especially at a time when innovation is warranted. The goal was to deploy a smart solution that would still have a high impact on the ground, without breaking the bank.

Building the driving time estimates

The most crucial component of a smart Dispatch system is reliable driving time estimates. We build these by leveraging the large store of data from our historical rides. As part of our internal logging, we record the time taken from:

  1. The captain to the customer, aka the ETA

  2. The customer’s pickup to the customer’s drop, aka the Ridetime

Together, these give us coverage of pickup-to-drop driving times across a city: the ETA gives us short-distance coverage, and the Ridetime gives us longer-distance coverage. We combine the two sources of data, group them by time of day and day of week, remove outliers, filter out buckets with fewer than a minimum number of rides (so that only well-supported buckets are considered valid), and store the output in a dataset that any interested team can consume.
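
A minimal sketch of this aggregation in plain Python. Assumptions not stated in the text: each log row has already been mapped to an (origin hex, destination hex) pair, outliers are removed with a 1.5 × IQR rule, the aggregate is the median, and the minimum-rides threshold is a hypothetical 5. At production scale this would more naturally be a SQL or Spark job; plain Python just keeps the sketch self-contained.

```python
from collections import defaultdict
from statistics import median, quantiles

MIN_RIDES_PER_BUCKET = 5  # hypothetical validity threshold


def drop_iqr_outliers(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR]; tiny samples pass through."""
    if len(values) < 4:
        return list(values)
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return [v for v in values if q1 - k * iqr <= v <= q3 + k * iqr]


def build_driving_time_map(eta_logs, ride_logs, min_rides=MIN_RIDES_PER_BUCKET):
    """Combine ETA and Ridetime logs into a median driving time per bucket.

    Each log row is (origin_hex, destination_hex, day_of_week, time_of_day, minutes).
    Returns {(origin_hex, destination_hex, day_of_week, time_of_day): minutes}.
    """
    buckets = defaultdict(list)
    # ETA rows cover short hops, Ridetime rows cover longer trips.
    for origin, dest, dow, tod, minutes in list(eta_logs) + list(ride_logs):
        buckets[(origin, dest, dow, tod)].append(minutes)

    driving_time = {}
    for key, values in buckets.items():
        kept = drop_iqr_outliers(values)
        if len(kept) < min_rides:
            continue  # too few rides for this bucket to be considered valid
        driving_time[key] = median(kept)
    return driving_time
```

The median is used here because it is robust to whatever outliers survive the filter; a mean would work too if the outlier rule is strict enough.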

Designing the experiment

With a pickup-to-drop driving time map at the time-of-day and day-of-week level in hand, we get to the dirty work of actually designing an experiment. The first step was to answer the question: “For a pickup location, can we find a nearby area that has a worse driving time to the source than an area further away?” Let me use this segue to introduce some of the terminology we use in this regard:

source_hex: the Uber H3 hex at resolution 8 (hex8) in which the ride request originates

bad_hex: an H3 hex8 that is geometrically closer to the source_hex, but not closer in driving time

good_hex: an H3 hex8 that is geometrically further away from the source_hex, but has a faster driving time to it than the bad_hex

We do this analysis at the time_of_day and day_of_week level. A trio of hexes HexA, HexB and HexC could be mapped as source_hex -> HexA, bad_hex -> HexB, good_hex -> HexC on a Monday morning, but on a Sunday evening the relative driving times of HexB and HexC to HexA need not be the same. We were careful not to make too many dangerous assumptions here.
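
The definitions above translate directly into a pairwise check, run once per (source_hex, day_of_week, time_of_day) bucket. This is a sketch under assumptions: geometric distance is taken to be H3 grid (ring) distance from the source hex, driving times come from the driving-time map described earlier, and `min_gain_min` is a hypothetical margin to avoid acting on noise.

```python
def find_hex_trios(source_hex, neighbours, min_gain_min=0.0):
    """Find (source_hex, bad_hex, good_hex) trios for one day/time bucket.

    neighbours maps hex id -> (grid_distance, driving_minutes_to_source), where
    grid_distance is the geometric ring distance from source_hex.
    A bad_hex is geometrically closer than a good_hex but slower to drive from.
    """
    trios = []
    for bad, (bad_dist, bad_time) in neighbours.items():
        for good, (good_dist, good_time) in neighbours.items():
            closer_but_slower = bad_dist < good_dist
            if closer_but_slower and bad_time - good_time > min_gain_min:
                trios.append((source_hex, bad, good))
    return trios
```

For example, with the brown source hex from the figure below, a yellow hex one ring away that takes 12 minutes and an orange hex two rings away that takes 6 minutes yield the trio ("brown", "yellow", "orange"); the driving times in this example are invented.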

[Image: An example of a source_hex, bad_hex and good_hex]

Here the brown hex is the source_hex, the yellow hex is the bad-level-1-hex and the orange one is the good-level-2-hex. From this map alone, it is not clear why the ride time from yellow to brown is higher than from orange to brown. But when we look at the Google Maps view, it becomes evident:

[Image: Google Maps view of the same area]

We see that the brown and yellow hex8s are separated by a huge railway track (Vijayawada is one of the biggest railway junctions in the country, and trains regularly cross its roads). The orange hex8, on the other hand, has clear, unfettered access to the source_hex.

Once we have the universe of such hex trios, we are back to the problem of how to do a test-control split. Given that a time-based control is not an option, we took the dispatch-relevant features of each hex trio and passed them through a vector-similarity measure to score the similarity of each pair of trios, restricting pairs to trios that are valid on the same day and time period (i.e., both the test and control source hexes have bad and good hexes on the same day and time period).

[Image: Example of a test group]
[Image: Example of a control group]

It doesn’t make a lot of sense to say HexA on a Monday morning is similar to HexB on a Wednesday afternoon. So we only do the split if HexA and HexB are both source_hexes on the same day and time period.
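
A minimal sketch of the similarity scoring with the same-bucket restriction. It assumes cosine similarity as the vector-similarity measure (the post does not say which measure was used) and assumes each trio's dispatch-relevant features have already been standardised to comparable scales; the dict layout and function names are hypothetical.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def score_trio_pairs(trios):
    """Score every pair of trios that share a day/time bucket.

    trios: list of dicts with keys "source_hex", "bucket" (day_of_week, time_of_day)
    and "features" (a list of standardised numbers).
    Returns (similarity, source_hex_a, source_hex_b) tuples, most similar first.
    """
    scored = []
    for a, b in combinations(trios, 2):
        if a["bucket"] != b["bucket"]:
            continue  # e.g. Monday morning vs Wednesday afternoon: not comparable
        if a["source_hex"] == b["source_hex"]:
            continue
        sim = cosine_similarity(a["features"], b["features"])
        scored.append((sim, a["source_hex"], b["source_hex"]))
    scored.sort(reverse=True)
    return scored
```

Cosine similarity ignores vector magnitude, which is why standardising features first matters; a distance-based measure such as Euclidean distance on standardised features would be a reasonable alternative.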

Once every pair in the universe has been scored, we build the test-control split, ensuring that no hex in the test group also appears in the control group through some other mapping, as this would contaminate the experiment results.
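
One simple way to enforce the no-contamination rule is a greedy pass over the scored pairs, most similar first, that accepts a pair only when none of its hexes (source, bad or good) already sits on the opposite side. This is a hypothetical sketch, not necessarily how the split was actually built: pairs are identified by trio ids, and the seeded coin flip that decides which member of a pair is treated is an added assumption.

```python
import random


def split_test_control(scored_pairs, trio_hexes, seed=0):
    """Greedily build matched test/control groups with no hex shared between them.

    scored_pairs: (similarity, trio_a, trio_b) tuples, most similar first.
    trio_hexes: trio id -> set of hexes it touches (source, bad and good hex).
    Returns (test_trios, control_trios), matched one-to-one by position.
    """
    rng = random.Random(seed)
    test, control = [], []
    test_hexes, control_hexes, used = set(), set(), set()
    for _, a, b in scored_pairs:
        if a in used or b in used:
            continue  # each trio is matched at most once
        if rng.random() < 0.5:
            a, b = b, a  # randomise which member of the pair is treated
        hexes_a, hexes_b = trio_hexes[a], trio_hexes[b]
        if hexes_a & hexes_b:
            continue  # overlapping trios cannot sit on opposite sides
        if hexes_a & control_hexes or hexes_b & test_hexes:
            continue  # would leak a hex across groups via another mapping
        test.append(a)
        control.append(b)
        test_hexes |= hexes_a
        control_hexes |= hexes_b
        used.update((a, b))
    return test, control
```

Greedy matching is easy to reason about but not optimal; it may discard pairs that a global assignment could have kept.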

With the test-control split in place, the intervention is as follows: when building the “dispatch radius” for a test-group source_hex, we include its good_hex and exclude its bad_hex, whereas control-group source_hexes keep the default radius (the bad_hex is not excluded and the good_hex is not added). With everything else held equal, the test group should show a lower ETA than the control group during the experiment.
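
The intervention itself is a small change to the candidate set. A minimal sketch, assuming the dispatch radius is represented as a set of hex ids (for example an H3 k-ring around the source hex) and that the trios passed in have already been filtered to the current day/time bucket; the function name is hypothetical.

```python
def dispatch_hexes(source_hex, ring_hexes, trios, in_test_group):
    """Return the set of hexes whose captains are considered for a request.

    ring_hexes: the default "dispatch radius" around source_hex.
    trios: (source_hex, bad_hex, good_hex) tuples valid for the current day/time.
    """
    hexes = set(ring_hexes)
    if not in_test_group:
        return hexes  # control group: default radius, unchanged
    for src, bad, good in trios:
        if src == source_hex:
            hexes.discard(bad)  # close as the crow flies, slow to drive from
            hexes.add(good)     # further away, but faster to drive from
    return hexes
```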

We then ran this experiment for 2 weeks and aimed to accumulate 1000+ orders in both the test and control groups, so that data sparsity would not be a problem when analysing the results.

Experiment Results

We ran this experiment in Hyderabad, where the test group’s ETA was around 9% lower than the control group’s when comparing median ETAs, and almost 13% lower when comparing mean ETAs. Before the experiment, the two groups differed by only about 3% on both mean and median ETA, which shows that the changes we made genuinely improved on-ground ETAs.
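
For clarity, here is how such percentages can be computed, assuming they are relative differences measured against the control group, i.e. (control - test) / control. The example values in the check below are made up and are not Rapido's data.

```python
from statistics import mean, median


def relative_eta_reduction(test_etas, control_etas):
    """How much lower the test group's ETA is than control's, as a fraction.

    Returns values for both mean and median; 0.09 means test is 9% lower.
    """
    return {
        "mean": (mean(control_etas) - mean(test_etas)) / mean(control_etas),
        "median": (median(control_etas) - median(test_etas)) / median(control_etas),
    }
```

Reporting both statistics is useful because ETA distributions are typically right-skewed: a larger gain in the mean than in the median suggests the change trimmed the long tail of slow pickups.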

We know that no experiment can be called successful without statistical tests of significance, so we went into the experiment with our hypotheses defined as follows:

H0 (null hypothesis): hex-based swaps have no effect on realized ETAs

H1 (alternative hypothesis): hex-based swaps DO have an effect on realized ETAs

Rejecting the null hypothesis at the 95% confidence level (a significance level of 0.05) was the gold standard we were striving for, and we are happy to report that our results were statistically significant at around the 98% level, with p-values in the 0.01 range across the few significance tests we used.
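
The post does not name the tests used, so as one illustration, here is a two-sided permutation test on the difference in mean ETA between the groups, which makes no normality assumption about ETAs. Mann-Whitney U or Welch's t-test (both available in SciPy) would be common alternatives; this sketch sticks to the standard library, and the function name and defaults are hypothetical.

```python
import random


def _mean(xs):
    return sum(xs) / len(xs)


def permutation_test(test_etas, control_etas, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in mean ETA between groups.

    Under H0 the test/control labels are exchangeable, so shuffled labellings
    should produce differences as large as the observed one reasonably often.
    """
    rng = random.Random(seed)
    observed = _mean(control_etas) - _mean(test_etas)
    pooled = list(test_etas) + list(control_etas)
    n_test = len(test_etas)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = _mean(pooled[n_test:]) - _mean(pooled[:n_test])
        if abs(diff) >= abs(observed):
            extreme += 1
    # The +1 terms count the observed labelling itself, so p is never exactly 0.
    return (extreme + 1) / (n_permutations + 1)
```

A p-value below 0.05 rejects H0 at the 95% level; a p-value around 0.02 corresponds to the roughly 98% level reported above.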

Visually, the result looked something like this:

[Image: Test vs control group change visualized. The blue vertical line represents the mean of the test group, and the purple vertical line represents the mean of the control group.]

The image shows that, on a relative scale and after adjusting for the pre-experiment ETA difference, the center of the test group’s ETA distribution has shifted lower than that of the control group, showing that our changes lowered ETAs as we expected.
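
One simple way to read “adjusting for the pre-experiment ETA difference” is a difference-in-differences on relative gaps: subtract the gap that already existed between the groups before the experiment from the gap observed during it. This is an interpretation, not necessarily the exact adjustment behind the figure, and the function name is hypothetical.

```python
def adjusted_relative_effect(pre_test, pre_control, post_test, post_control):
    """Relative ETA gap during the experiment, net of the gap that already existed.

    Each argument is one group's mean (or median) ETA for that period.
    Positive values mean the test group improved relative to control.
    """
    pre_gap = (pre_control - pre_test) / pre_control
    post_gap = (post_control - post_test) / post_control
    return post_gap - pre_gap
```

With invented numbers, a test group that was 3% faster before the experiment and 13% faster during it would show a net relative improvement of about 10 percentage points.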

Conclusion

As mentioned at the start, the high-level goal was to improve a key aspect of dispatch: ETA. We wanted to add real value without doing anything cost-intensive, by leveraging the technology and information we already had. This is the hallmark of any data-science team: using common sense and best practices to uncover hidden insights with as simple an approach as possible.

If you enjoyed this blog post, check out what we’ve posted so far over here, and keep an eye on this space for some really cool upcoming blogs in the near future. If you have any questions about the problems we face as Data Scientists at Rapido, about transitioning to a start-up after a few years in a different field, or about anything else, please reach out to me on LinkedIn or at siddharth.p@rapido.bike. I look forward to answering any questions!

Originally published at: https://medium.com/rapido-labs/improving-dispatch-with-data-6a307dab7ecc
