When you were growing up, did you ever play the name game? The modern data organization has something similar, and it’s called the “Bad Data Blame Game.” Unlike the name game, however, the Bad Data Blame Game is played when data downtime strikes and no amount of rhyming and dancing can save the day.
Data downtime refers to periods of time when your data is partial, erroneous, missing, or otherwise inaccurate, and nine times out of ten, you have no idea what caused it. All you know is that it’s 3 a.m., your CEO is pissed, your dashboards are wrong, and you need to fix it — stat.
After speaking to over 200 data teams, we’ve identified the major data personas involved in the Bad Data Blame Game. Maybe you recognize one or two?
In this article, we’ll introduce these roles, zero in on their hopes, dreams, and fears, and share our approach to conquering data reliability at your company.
Chief Data Officer
This is Ophelia, your Chief Data Officer (CDO). Although she’s probably not (wo)manning your company’s data pipelines or Looker dashboards, Ophelia’s impact is hitched to the consistency, accuracy, relevance, interpretability, and reliability of the data her team provides.
Ophelia wakes up every day and asks herself two things. First, are different departments getting the data they need to be effective? And second, are we managing risk around that data effectively?
She would sleep much easier with a clear, bird’s-eye view showing that her data ecosystem is operating as it should. At the end of the day, if bad data gets in front of the CEO, out to the public, or to any other data consumer, she’s on the line.
Business Intelligence Analyst
Betty, the business intelligence lead or data analyst, wants a punchy and insightful dashboard she can share with her stakeholders in marketing, sales, and operations to answer their multifarious questions about how their business functions are performing. When things go wrong at the practitioner level, Betty is the first one paged.
To ensure reliable data, she needs to answer these questions:
- Are we translating data into metrics and insights that are meaningful to the business?
- Are we confident that the data is reliable and means what we think it means?
- Is it easy for others to access and understand these insights?
Null values and duplicated entries are Betty’s arch-nemeses, and she’s a fan of anything that can prevent data downtime from compromising her peace of mind. She’s fatigued by business stakeholders who ask her to investigate a funny value in a report; it’s a long process to chase the data upstream and validate whether it’s right!
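Null values and duplicated entries are the kind of problem that can be checked for mechanically before a dashboard ever refreshes. The snippet below is only a minimal sketch of that idea: the function `find_quality_issues`, the field names, and the sample rows are all hypothetical, and a real team would more likely run an equivalent check inside its warehouse or BI tooling.

```python
from collections import Counter


def find_quality_issues(rows, key_fields, required_fields):
    """Return (null_issues, duplicate_keys) for a list of dict rows.

    null_issues: (row_index, field) pairs where a required field is None.
    duplicate_keys: key tuples that appear in more than one row.
    """
    null_issues = []
    for index, row in enumerate(rows):
        for field in required_fields:
            if row.get(field) is None:
                null_issues.append((index, field))

    # Count how often each key combination occurs across all rows.
    key_counts = Counter(
        tuple(row.get(field) for field in key_fields) for row in rows
    )
    duplicate_keys = [key for key, count in key_counts.items() if count > 1]
    return null_issues, duplicate_keys


# Hypothetical sample rows standing in for a reporting table.
sample_rows = [
    {"order_id": 1, "region": "EMEA", "revenue": 120.0},
    {"order_id": 2, "region": None, "revenue": 75.5},
    {"order_id": 2, "region": "APAC", "revenue": 75.5},
]

null_issues, duplicate_keys = find_quality_issues(
    sample_rows,
    key_fields=["order_id"],
    required_fields=["region", "revenue"],
)
print("Null values:", null_issues)
print("Duplicated keys:", duplicate_keys)
```

A check like this does not say why the bad rows appeared, but it gives Betty something concrete to hand upstream instead of a "funny value" and a hunch.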
Data Scientist
Sam, the data scientist, studied forestry in undergrad but decided to make the jump to industry to pay off his student loans. Somewhere between a line of Python code and a data visualization, he fell in love with data science. The rest is history.
To do his job well, Sam needs to know 1) where the data comes from and 2) that it’s reliable, because if it’s not, his team’s A/B tests won’t work and all downstream consumers (analysts, managers, executives, and customers) will suffer.
Sam’s team spends roughly 80 percent of their time scrubbing, cleaning, and understanding the context of the data, so they need tools and solutions that can make their lives easier.
Data Governance Lead
Proud owner of a seven-month-old puppy, Gerald is the company’s very first data governance specialist. He started off on the legal team and then, when GDPR and CCPA entered the picture, focused his efforts exclusively on data compliance. It’s a novel role, but one that is becoming increasingly important as the organization grows.
When it comes to data reliability, Gerald cares about 1) unified definitions of data and metrics across the company and 2) understanding who has access to, and visibility into, which data.
For Gerald, bad data can mean costly fines, erosion of customer trust, and lawsuits. Despite the criticality of his role, he sometimes jests that it’s like accounting: “you’re only front and center if something has gone wrong!”
Data Engineer
When it comes to data reliability, Emerson, the data engineer, is at the crux of the equation.
Emerson started out as a full-stack developer at a small e-commerce startup, but as the company grew, so too did its data needs. Before she knew it, she was responsible not just for building the company’s data product but also for integrating the data sources the team relies on to make decisions about the business. Now, she’s a Snowflake expert, Power BI guru, and general data tooling whiz.
Emerson and her team are the glue that holds the company’s data ecosystem together. They implement technologies that monitor the reliability of the company’s data, and if something goes awry, she’s the one who gets paged by the analytics team at 3 a.m. to fix it. Like Betty, she’s lost countless hours of sleep because of this.
To be successful at her job, Emerson must tackle a lot of things, including:
- Designing a data platform solution that scales
- Ensuring that data ingestion is reliable
- Making the platform accessible to other teams
- Being able to fix data downtime quickly when it happens
- And above all else, making life sustainable for the entire data organization
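Two of the items above, reliable ingestion and fast recovery from data downtime, both depend on noticing quickly when a table has stopped updating. As a rough sketch of what such a freshness check could look like (the table names, timestamps, six-hour threshold, and the helpers `is_stale` and `stale_tables` are assumptions made for illustration, not part of any specific platform):

```python
from datetime import datetime, timedelta, timezone


def is_stale(last_loaded_at, max_age, now=None):
    """Return True if the last successful load is older than max_age."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_loaded_at > max_age


def stale_tables(load_times, max_age, now=None):
    """Return the sorted names of tables whose last load is too old."""
    return sorted(
        name
        for name, loaded_at in load_times.items()
        if is_stale(loaded_at, max_age, now)
    )


# Hypothetical load timestamps; in practice these would come from
# pipeline metadata or the warehouse itself.
check_time = datetime(2024, 1, 1, 3, 0, tzinfo=timezone.utc)
load_times = {
    "orders": datetime(2024, 1, 1, 2, 30, tzinfo=timezone.utc),
    "customers": datetime(2023, 12, 31, 20, 0, tzinfo=timezone.utc),
}

overdue = stale_tables(load_times, max_age=timedelta(hours=6), now=check_time)
print("Tables overdue for a refresh:", overdue)
```

Passing `now` explicitly keeps the check deterministic and easy to test. The right threshold for each table is a judgment call that depends on how often its upstream source is expected to land.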
Data Product Manager
This is Peter. He’s a data product manager. Peter got his start as a back-end developer, but made the jump to product management a few years ago. Like Gerald, he’s the company’s first-ever hire in this role, which is simultaneously exciting and challenging.
He’s up to date on all the latest data engineering and data analytics solutions, and is often called upon to make decisions on what offerings his organization needs to invest in to be successful. He knows firsthand how automation and self-serve tooling make all the difference when it comes to delivering an accessible, scalable data product.
All other data stakeholders, from analysts to social media managers, depend on him to build a platform that ingests and unifies data from a myriad of sources and makes it accessible to consumers all over the business. Oh, and did we mention that this data must be compliant with GDPR, CCPA, and other industry regulations? It’s a challenging role, and it’s difficult to keep everyone happy; it seems like his platform is always one transformation away from what BI actually wanted.
Who is responsible for data reliability?
So, who in your data organization owns the reliability piece of your data ecosystem?
As you can imagine, the answer isn’t simple. From your company’s CDO to your data engineers, it’s ultimately everyone’s responsibility to ensure data reliability. And although nearly every arm of every organization at every company relies on data, not every data team has the same structure, and various industries have different requirements. (For instance, it’s the norm for financial institutions to hire entire teams of data governance experts, but at a small startup, not so much. And for those startups that do — we commend you!)
Below, we outline our approach to mapping data responsibilities, from accessibility to reliability, across your data organization using the RACI (Responsible, Accountable, Consulted, and Informed) matrix guidelines:
At companies that ingest and transform terabytes of data (like Netflix or Uber), we’ve found that it is common for data engineers and data product managers to tackle the responsibility of monitoring and alerting for data reliability issues.
Barring these behemoths, the responsibility often falls on data engineers and product managers. They must balance the organization’s demand for data with what can be provided reliably. Notably, the brunt of any bad choices made here is often borne by the BI analysts, whose dashboards may wind up containing bad information or break from silent changes. In very early data organizations, these roles are often combined into a jack-of-all-trades data person or a product manager.
Regardless of your team’s situation, you’re not alone.
Fortunately, there’s a better way to start trusting your data: data observability. It’s an approach that’s taking off with the most innovative companies, no matter who is ultimately responsible for ensuring data reliability in your organization.
In fact, with the right data reliability strategy, the Bad Data Blame Game is a thing of the past and full end-to-end observability is in sight.
Interested in learning more? Reach out to Barr Moses, Will Robins, and the rest of the Monte Carlo team.
This article was written by Barr Moses and Will Robins.
Original article: https://towardsdatascience.com/which-of-the-six-major-data-personas-are-you-8dbf434b7c9e