Three strategies for effective data projects

Many data science projects never go into production. Why is that? There is no doubt in my mind that data science is an efficient tool with impressive performance. However, a successful data project is also about effectiveness: doing the right things, as Russell Ackoff would write in “A systemic view of transformational leadership”.

Successful problem solving requires finding the right solution to the right problem. We fail more often because we solve the wrong problem than because we get the wrong solution to the right problem — Russell L. Ackoff (1974)

How do you focus on your projects and make sure they will bring value to the company? Are you strategically thinking about how to bring your project to fruition?

NB: I will use golf — a strategic sport — as an illustrative analogy here.

OKR: Setting objectives that you commit to achieve

Objectives and Key Results (OKRs) have been adopted by successful organisations (Intel, Google, …) to drive tremendous growth. They were developed by Andy Grove at Intel and popularised by John Doerr, who brought them to Google, to increase the focus on work that produces value.

The general idea is to set Objectives that motivate you. Imagine you are passionate about golf and next Friday there is a big competition. In the last few years, nobody has won it while performing well on more than 15 of the 18 holes on the course. Setting out to win it is a good objective: it is specific, ambitious, and tied to a given date. You then set Key Results that measure how you are doing on this objective. In this golf example, they could be:

  • Make par (the expected number of strokes to complete a hole) on at least 16 of the 18 holes.
  • Avoid landing the ball in a sand trap more than three times, because you know that you are bad at getting out of them.
  • Go for a 20-minute practice session before the competition, as you usually make a few bad shots with cold muscles.

Meeting all the key results is then a good indicator that you could win.

In another scenario, imagine you work for a large bank and are tasked with building a loan risk model with 80% accuracy. Here are some possible key results:

  • Get 80% of the client repayment behaviour data by XX/YY/ZZZZ.
  • Test three explainable model types by AA/BB/CCCC.
  • Define and track four metrics to follow the model’s performance and understand where the model is wrong.
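
The loan model key results above could be tracked with a small structure like the one below. This is only a minimal Python sketch, not a standard OKR tool: the class names, the progress numbers, and the rule of averaging progress across key results are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class KeyResult:
    """A measurable outcome with a numeric target (illustrative, hypothetical class)."""
    description: str
    target: float
    current: float = 0.0

    def progress(self) -> float:
        # Fraction of the target reached so far, capped at 1.0.
        if self.target <= 0:
            return 1.0
        return min(self.current / self.target, 1.0)


@dataclass
class Objective:
    """An objective together with the key results that measure it."""
    title: str
    key_results: List[KeyResult] = field(default_factory=list)

    def score(self) -> float:
        # Assumed scoring rule: plain average of key-result progress.
        if not self.key_results:
            return 0.0
        return sum(kr.progress() for kr in self.key_results) / len(self.key_results)

    def achieved(self) -> bool:
        # The objective counts as achieved only when every key result is met.
        return bool(self.key_results) and all(
            kr.progress() >= 1.0 for kr in self.key_results
        )


# Hypothetical progress values for the loan risk model example.
loan_model = Objective(
    title="Build a loan risk model with 80% accuracy",
    key_results=[
        KeyResult("Client repayment data collected (%)", target=80, current=60),
        KeyResult("Explainable model types tested", target=3, current=3),
        KeyResult("Performance metrics defined and tracked", target=4, current=2),
    ],
)

print(f"{loan_model.title}: {loan_model.score():.0%} complete")
for kr in loan_model.key_results:
    print(f"  {kr.description}: {kr.progress():.0%}")
```

Keeping each key result numeric is what makes "when will it be good enough?" answerable: the objective is either met on every key result or it is not.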

OKRs can be used to drive focus on anything. I find them useful for defining my goals on a project: when building a model or an application, when will it be good enough? Aiming for the key results brings clarity. Failing becomes a learning experience that stimulates better OKR definitions and better work. Success, on the other hand, is then crystal clear, and you should enjoy it.

Must-read on the topic: Measure What Matters by John Doerr.

The Drivetrain Approach

The drivetrain approach is a comprehensive strategy for defining data products. The following diagram shows its essential steps:

[Image: diagram of the drivetrain approach steps. iStock elements under license to M. Koutero.]

In a new project we might ask ourselves:

  • Objectives

Setting objectives includes answering questions such as: Does it add value to the business? Is it aligned with the current roadmap? When should it be done? Does it open new perspectives?

  • Levers

What elements in the final product are under my control? Can I change the price of the product? The ranking on the recommendation page? …

  • Data

Given the objectives and levers, what kind of data could I use? What are the compliance issues?

  • Model / Simulation

Simulations should indicate whether your data, combined with your levers, contain enough information to reach your objectives. In the loan model example, could you drive more sales with less risk?
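
To make the simulation step concrete, here is a minimal Python sketch for the loan example. Everything in it is an assumption made for illustration: the lever is an approval threshold on a predicted default probability, the applicants are randomly generated rather than real data, and the interest rate and loss rate are invented numbers.

```python
import random


def simulate_profit(threshold, applicants, interest=0.10, loss_given_default=0.60):
    """Total expected profit when approving every applicant whose predicted
    default probability is below `threshold` (the lever).

    Assumes each loan is one unit of money; `interest` and
    `loss_given_default` are hypothetical rates.
    """
    profit = 0.0
    for p_default in applicants:
        if p_default < threshold:
            # Expected gain if repaid minus expected loss if defaulted.
            profit += (1 - p_default) * interest - p_default * loss_given_default
    return profit


rng = random.Random(0)  # fixed seed so the run is repeatable
# Hypothetical predicted default probabilities for 1,000 applicants.
applicants = [rng.betavariate(2, 10) for _ in range(1000)]

# Try the lever at several settings and keep the most profitable one.
thresholds = [t / 100 for t in range(0, 51, 5)]
results = {t: simulate_profit(t, applicants) for t in thresholds}
best = max(results, key=results.get)

for t in thresholds:
    print(f"threshold {t:.2f} -> expected profit {results[t]:8.2f}")
print(f"best threshold in this simulation: {best:.2f}")
```

If no setting of the lever produced a positive result, that would be a signal that the data and levers at hand cannot reach the objective.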

Every step is also an exit point. If you can’t find a solution alone or collectively, it might be an indication that the project is not worth your time and that you should focus on something else.

Must-read on the topic: Designing Great Data Products by Jeremy Howard, Margit Zwemer and Mike Loukides.

Decision intelligence

Decision intelligence is a more general discipline that tackles how to build a strategy for given objectives in complex situations. The general process integrates notions such as external causes, multiple causal links, and feedback loops. Teams that create causal diagrams can then decide rationally on a strategy, with a clear perception of the problem at hand. One might understand decision intelligence as an extended merger of OKRs and the drivetrain approach.

[Image: example causal diagram. iStock elements under license to M. Koutero.]

In the small example above, once you select a club, whether the ball flies high (and hopefully far) or stays rolling on the ground determines how much the wind is likely to affect it. Staying on the ground might be safer, but if you only make short shots, you will need more of them. Having a good strategy means finding a reasonable equilibrium that lets you achieve your objectives.

In the OKR example about the loan risk model, we would dig deeper here. Would a loan model that makes mistakes on certain types of customers be a hazard to equity? Is it possible that the employees in charge of validating loans would rely only on the model, become less critical thinkers, and be less likely to adjust their behaviour when delicate cases occur? Causal diagrams enable you to understand the indirect consequences of your decisions. If you consider that getting the right clean data and building a production-ready model often takes months, is it not worth spending some time on the reasons you are doing it?
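
A causal diagram can be explored with very little code. The sketch below, in Python, stores a diagram as a mapping from each cause to its effects and walks it to list the indirect consequences of a decision. The links themselves are hypothetical, loosely based on the questions above, and are not taken from any real analysis.

```python
from collections import deque

# Hypothetical causal links for the loan model example: cause -> effects.
causal_links = {
    "deploy loan model": ["faster loan decisions", "staff rely on model"],
    "staff rely on model": ["less critical review"],
    "less critical review": ["delicate cases mishandled"],
    "model errors on some customer types": [
        "delicate cases mishandled",
        "unfair loan refusals",
    ],
    "delicate cases mishandled": ["customer complaints"],
}


def downstream_effects(graph, decision):
    """Return every effect reachable from `decision`, mapped to the number
    of causal steps separating it from the decision (breadth-first search)."""
    distances = {}
    queue = deque([(decision, 0)])
    while queue:
        node, depth = queue.popleft()
        for effect in graph.get(node, []):
            # Skipping already-seen effects also keeps feedback loops finite.
            if effect not in distances:
                distances[effect] = depth + 1
                queue.append((effect, depth + 1))
    return distances


effects = downstream_effects(causal_links, "deploy loan model")
# Indirect consequences are those more than one causal step away.
indirect = sorted(name for name, steps in effects.items() if steps > 1)

for name in indirect:
    print(f"{name} ({effects[name]} steps from the decision)")
```

Effects that sit several steps away from the decision are the ones most easily missed without drawing the diagram.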

For engineers and scientists, this is not very different from specifying a classical digital product with its constraints and target performance, only with a broader perspective. What interests me is the focus on decision making (“should I build this product, and how?”), which brings business and technical people together to make sure that, at the scale of a whole ecosystem, the next move is the right one.

Must-read on the topic: Link by Lorien Pratt.

Strategy is not limited to a top-down practice falling under the umbrella of leaders, managers, product managers, etc. I think it is part of any job to meet halfway and to have some strategic thinking under the hood. Maybe these frameworks are sometimes too elaborate, but at their core they start with a simple question that we can ask ourselves: why should I do this project?

As a field growing beyond the AI hype, we cannot stay in an isolated system, extending our level of specialisation without clearly showing its value. “Missing middle” professionals are likely to be important in this task (Paul R. Daugherty, CTIO at Accenture, and Lorien Pratt). Whether they will be called decision intelligence specialists, data strategists or data product managers will be a matter of semantics and of establishing new practices in the field.

___________

References:

  • Ackoff, R. L.: 1998, A Systemic View of Transformational Leadership (Systemic Practice and Action Research).

  • Ackoff, R. L.: 1974, Redesigning the Future: A Systems Approach to Societal Problems (John Wiley & Sons).

  • Doerr, J.: 2018, Measure What Matters: How Google, Bono, and the Gates Foundation Rock the World with OKRs (Portfolio Penguin).

  • Pratt, L.: 2019, Link (Emerald Publishing Limited).

Translated from: https://towardsdatascience.com/three-strategies-towards-effective-data-projects-eed29ad05ded
