Tired of Data Quality Discussions?

I have been in countless meetings where data and the results of all sorts of analyses were presented. Most of those meetings had one thing in common: the data quality was challenged, and most of the meeting time went into discussing potential data quality issues. The number one follow-up of such a meeting is to verify the open questions, and then we start all over again. Sound familiar?

It can be different. There are meetings where these discussions don't take place, or where they start but are resolved right away. I have seen, and been part of, a few. Over and over again, I have seen ONE difference between these two types of meetings: in the meetings that got stuck, the person presenting the data was not on top of their data, did not anticipate questions, and did not think a step ahead.

In fact, many of these data quality discussions are not about data quality at all, but about understanding what the data means: for example, its hierarchy or structure, or how the metrics should be interpreted. When you don't understand something, it is very easy to blame the data quality, but usually the issue lies somewhere else.

Let's assume you are doing some exploratory data analysis to get started with AI. The key to success is to really understand the data you are working with. If the quality is not up to standard, bring it up to standard or find a way to work with the data nonetheless. Being proactive will go a long way.

1. Start small

The key here, as with so many things, is to start small. If you are looking at only a handful of features, you can really dig into what each of them means; if you start with hundreds, that becomes much harder. As an example, let's look at the number of products per customer, which is clearly a small starting point.
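
A minimal sketch of this starting feature in Python, assuming the sales data can be reduced to hypothetical (customer_id, product_id) pairs; all IDs below are invented for illustration. Even this small feature forces a choice: here, repeat purchases of the same product count only once.

```python
from collections import defaultdict

# Hypothetical sales records: (customer_id, product_id) pairs.
# The IDs are made up for illustration only.
SALES = [
    ("C1", "P1"), ("C1", "P2"), ("C1", "P1"),  # C1 bought P1 twice
    ("C2", "P3"),
    ("C3", "P1"), ("C3", "P2"), ("C3", "P3"),
]


def products_per_customer(sales):
    """Count the number of *distinct* products each customer has bought."""
    products = defaultdict(set)
    for customer_id, product_id in sales:
        products[customer_id].add(product_id)
    return {customer: len(items) for customer, items in products.items()}


print(products_per_customer(SALES))  # {'C1': 2, 'C2': 1, 'C3': 3}
```

Whether to count distinct products or order lines is itself a definition worth agreeing on before the meeting.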

2. Make sure you understand your data

Because you started small, you are able to dig deep. Make correlation plots, look at the frequency distributions, and read the documentation for these features.
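
Correlation plots themselves need a plotting library, but the numbers behind them are easy to compute. The sketch below uses only the Python standard library and made-up values for two hypothetical per-customer features; on Python 3.10+, `statistics.correlation` gives the same Pearson coefficient.

```python
from collections import Counter
from math import sqrt

# Hypothetical per-customer values, made up for illustration:
# number of products held and annual revenue (in thousands).
products = [1, 2, 2, 3, 5, 1, 2]
revenue = [10.0, 22.0, 19.0, 31.0, 48.0, 12.0, 20.0]


def frequencies(values):
    """Frequency table as (value, count) pairs, most common first."""
    return Counter(values).most_common()


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


print(frequencies(products))  # [(2, 3), (1, 2), (3, 1), (5, 1)]
print(f"r = {pearson(products, revenue):.2f}")
```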

In our example, we basically have two features to look at (products and customers), and both actually have a large potential for discussion. I once took about three months to define what is meant by “customer”, an especially difficult question when working in a B2B environment. Depending on the company you work for, products may be grouped at different levels, and each level may be of interest to a different role: a product manager may care about a different level of the hierarchy than a regional head of sales.
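
To illustrate why the definition of “customer” matters, here is a sketch with invented B2B data in which orders are placed by sites that roll up to parent accounts. The site-to-account mapping and all IDs are hypothetical. The same orders produce a different number of products per customer depending on which level you call the customer.

```python
from collections import defaultdict

# Hypothetical B2B data: orders are placed by sites (branches or legal
# entities), and several sites can belong to one parent account.
ORDERS = [("S1", "P1"), ("S1", "P2"), ("S2", "P2"), ("S2", "P3"), ("S3", "P1")]
SITE_TO_ACCOUNT = {"S1": "ACME", "S2": "ACME", "S3": "Globex"}


def distinct_products_by(orders, customer_of):
    """Distinct products per customer, where customer_of maps a site to
    whichever entity we currently treat as the 'customer'."""
    products = defaultdict(set)
    for site, product in orders:
        products[customer_of(site)].add(product)
    return {customer: len(items) for customer, items in products.items()}


by_site = distinct_products_by(ORDERS, lambda site: site)
by_account = distinct_products_by(ORDERS, SITE_TO_ACCOUNT.__getitem__)
print(by_site)     # {'S1': 2, 'S2': 2, 'S3': 1}
print(by_account)  # {'ACME': 3, 'Globex': 1}
```

Averaged over customers, that is about 1.7 products per site but 2 per account, a gap that is easily mistaken for a data quality problem.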

3. Verify the data quality

There may already be standard data quality checks in place; you should understand them and be able to explain them. I recommend going a step beyond those usual checks. Check for inconsistencies from a business perspective: is the job title of most of your customers “Accountant”? Think again; it may simply be the top option in a drop-down list. Another typical quality issue is inconsistency between systems. Make sure you know these inconsistencies, what drives them, and what they imply.
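
One way to automate the “Accountant” check is to flag any categorical value that covers an unusually large share of records. This is a sketch with hypothetical data; the 50% threshold is an arbitrary assumption, and a flag is only a prompt to ask how the field gets filled in, not proof of an error.

```python
from collections import Counter

# Hypothetical job titles as captured by a CRM drop-down field.
JOB_TITLES = ["Accountant"] * 7 + ["Engineer", "Buyer", "CFO"]


def suspicious_dominant_value(values, threshold=0.5):
    """Return (value, share) if one value covers more than `threshold` of the
    non-empty records, else None. A very dominant value may be a drop-down
    default rather than a business fact. The threshold is an assumption."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return None
    value, count = Counter(non_empty).most_common(1)[0]
    share = count / len(non_empty)
    return (value, share) if share > threshold else None


print(suspicious_dominant_value(JOB_TITLES))  # ('Accountant', 0.7)
```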

4. Anticipate the issues

You can anticipate quite a few of the questions and issues. What questions do you typically get? Which KPIs have been reported to your audience so far? What discussions have taken place in the past? Which terms are used in day-to-day discussions? The answers should, for example, give you a good sense of the product split your audience expects (spoiler alert: it may well be none of the splits in your data). Make sure you understand the different levels, why they are used, and how.

Anticipating the issues allows you to steer away from the data quality discussion.
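
For the question of which KPIs have already been reported, a simple pre-meeting check is to compare your numbers against the figures your audience has seen before. The sketch assumes both sets are available as plain dictionaries; the KPI names, values, and the 2% tolerance are hypothetical.

```python
# Hypothetical KPI values: what the audience saw in earlier reports versus
# what the new analysis produces. KPI names and numbers are made up.
REPORTED = {"active_customers": 1200, "avg_products_per_customer": 2.4}
COMPUTED = {"active_customers": 1187, "avg_products_per_customer": 1.9}


def kpi_gaps(reported, computed, rel_tolerance=0.02):
    """Return (kpi, note) pairs for KPIs that are missing on one side or whose
    computed value deviates from the reported one by more than rel_tolerance."""
    gaps = []
    for name in sorted(reported.keys() | computed.keys()):
        if name not in reported or name not in computed:
            gaps.append((name, "missing"))
            continue
        old, new = reported[name], computed[name]
        if old == 0:
            deviation = 0.0 if new == 0 else float("inf")
        else:
            deviation = abs(new - old) / abs(old)
        if deviation > rel_tolerance:
            gaps.append((name, f"{old} -> {new} ({deviation:.0%})"))
    return gaps


for kpi, note in kpi_gaps(REPORTED, COMPUTED):
    print(kpi, note)  # avg_products_per_customer 2.4 -> 1.9 (21%)
```

Any gap it lists is a question you can answer before someone raises it in the meeting.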

In my example, different audiences used different product hierarchies, coming from different systems. I built both hierarchies into my dashboard and was able to explain the overlap and the differences between the two.
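
The overlap and differences between two hierarchies can be summarised mechanically before building them into a dashboard. A sketch, assuming each system can export a hypothetical product-to-category mapping (system names, products, and categories are invented):

```python
# Hypothetical product-to-category mappings exported from two source systems.
ERP_HIERARCHY = {"P1": "Hardware", "P2": "Software", "P3": "Services"}
CRM_HIERARCHY = {"P1": "Hardware", "P2": "Services", "P4": "Software"}


def compare_hierarchies(a, b):
    """Summarise overlap and differences between two product mappings."""
    shared = a.keys() & b.keys()
    return {
        "only_in_a": sorted(a.keys() - b.keys()),
        "only_in_b": sorted(b.keys() - a.keys()),
        "same_category": sorted(p for p in shared if a[p] == b[p]),
        "different_category": sorted(
            (p, a[p], b[p]) for p in shared if a[p] != b[p]
        ),
    }


report = compare_hierarchies(ERP_HIERARCHY, CRM_HIERARCHY)
for key, value in report.items():
    print(f"{key}: {value}")
```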

Find out which systems your audience uses and what data they typically see. Then go through the data and results upfront with someone you trust, to take out all possible flaws.

5. Know the issues and work around them

Once you know which issues exist, it's time to work around them. One way is to tackle the issue at the source. That may not be your job, but it could be critical for a follow-up project in which these features will be used.

If you are still in the exploratory phase, you could instead make the issues and your assumptions explicit. The key is to be able to explain them and their implications, so that you gain the trust of your audience.
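
Making issues and assumptions explicit can be as simple as keeping a small log of each known issue, the workaround applied, and its implication. The structure and the example entry below (including the row counts) are hypothetical: a sketch of one way to keep this explicit, not a prescribed format.

```python
from dataclasses import dataclass


@dataclass
class KnownIssue:
    """One data issue, the workaround applied, and what it means for results."""
    description: str
    workaround: str
    implication: str
    affected_rows: int


def summarise(issues, total_rows):
    """Render the issue log as plain-text lines for a slide or appendix."""
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    lines = []
    for issue in issues:
        share = issue.affected_rows / total_rows
        lines.append(
            f"- {issue.description} ({share:.0%} of rows): "
            f"{issue.workaround}. Implication: {issue.implication}."
        )
    return lines


# Hypothetical example entry; the numbers are made up.
issues = [
    KnownIssue(
        description="Job title defaults to 'Accountant'",
        workaround="treated as unknown",
        implication="role-level splits cover fewer customers",
        affected_rows=700,
    ),
]
print("\n".join(summarise(issues, total_rows=1000)))
```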

Do you think this is a lot of work? Think again: once this is sorted out, you can actually do your job, start creating actionable insights, and take action.

About me: I am an Analytics Consultant and Director of Studies for “AI Management” at a local business school. I am on a mission to help organizations generate business value with AI and to create an environment in which Data Scientists can thrive. Sign up to my newsletter for new articles, insights, and offerings on AI Management.

Source: https://towardsdatascience.com/tired-of-data-quality-discussions-654106ce2e00
