Data governance, data literacy, and data quality management: A literature review

Note: This is not a formally published paper, but just an essay for homework.

Abstract

With the rise of the data era, data governance, data literacy, and data quality management have emerged as the core pillars of organizational data management. This paper reviews these three areas, examining their definitions, their interconnections, and their application in the data middle platform and AI data services. Drawing on practices from Chinese Internet enterprises, it highlights the data middle platform's role as a key tool in modern data governance and its benefits for governance, data mining, and intelligent applications.

Keywords: data governance, data literacy, data quality management, data middle platform, AI data service

1. Introduction

Driven by big data and artificial intelligence, data has become a core organizational asset. Data governance, data literacy, and data quality management are interdependent and together form the basis on which modern organizations make data-driven decisions and create value. This paper explores the relationships among the three and explains the value that the data middle platform and AI data services bring to a modern data governance system.

2. Core concepts

The definitions of these concepts do not originate from a single individual or institution; they emerged from the development of the data management field, shaped by academia, industry standards organizations, and corporate practice. The three concepts can be defined as follows:

  1. Data Governance: Data governance refers to the framework and practices for ensuring the quality, integrity, security, and availability of organizational data. It involves policies, roles, and processes that enable efficient and responsible data management.
  2. Data Literacy: Data literacy is the ability to read, understand, create, and communicate data as information, enabling individuals to make informed decisions in a data-driven environment.
  3. Data Quality Management: Data quality management refers to the set of practices aimed at maintaining high data quality standards, focusing on accuracy, completeness, reliability, and relevance throughout the data lifecycle.
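
Quality dimensions such as completeness become easier to manage once they are measured. The following is a minimal sketch, assuming records are plain Python dictionaries; the function names and the sample `customers` data are hypothetical. Uniqueness is not one of the dimensions listed above, but it is a commonly used companion measure and is included here only for illustration.

```python
from typing import Any

Record = dict[str, Any]


def completeness(records: list[Record], field: str) -> float:
    """Share of records in which `field` is present and non-empty."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def uniqueness(records: list[Record], field: str) -> float:
    """Share of non-empty values of `field` that are distinct."""
    values = [r[field] for r in records if r.get(field) not in (None, "")]
    if not values:
        return 0.0
    return len(set(values)) / len(values)


# Hypothetical sample data: one empty email and one repeated email.
customers = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": ""},
    {"id": 3, "email": "a@example.com"},
    {"id": 4, "email": "b@example.com"},
]

email_completeness = completeness(customers, "email")  # 3 of 4 filled
email_uniqueness = uniqueness(customers, "email")      # 2 distinct of 3 filled
```

Scores like these can be tracked over time, which turns a general quality goal into something a governance team can monitor.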

3. Relationships among the three concepts

3.1 The relationship among the three concepts

Data governance provides the strategic, top-level design that standardizes the rules and processes of data management. Data quality management, in turn, maintains the accuracy and consistency of data through technical means such as validation, cleaning, and monitoring. Data literacy strengthens the ability of individuals and organizations to take part in both governance and quality management. Together, the three support data-driven decision-making and foster innovation within organizations.

3.2 Related cases

(1) Airbnb's Data Literacy and Governance Practices

Airbnb's "Data University" enhances employee data literacy, promoting responsible data processing, governance, and decision-making. The program supports data democratization and a data-driven culture.

(2) Milliman MedInsight's Medical Data Governance

Milliman MedInsight improves data quality and governance through document optimization, query automation, and tailored training. These efforts enhance data literacy, unify governance standards, and improve data usability.

(3) Data Management in Scientific Research

The eagle-i project advances data governance in scientific research by improving data literacy and standardizing biological resource management. It emphasizes early education, community involvement, and institutional support to boost data sharing and utilization.

4. Modern data governance system

Over time, the data lake and data warehouse architectures of traditional data governance systems have been joined by newer alternatives, namely the data middle platform and AI data services. As an evolution of the data lake, the data middle platform addresses a number of the data lake's problems and remains compatible with AI data services, which helps enterprises transform their data governance.

4.1 Role and challenges of the data lake

As a centralized storage architecture, the data lake provides a foundation for diverse data processing and analysis and an effective basis for enterprise data governance, but it also raises problems of data quality, security, and management complexity.

4.1.2 Definition and core characteristics of data lake

A data lake is a centralized architecture for storing large-scale raw data, accommodating structured, semi-structured, and unstructured formats. It offers flexibility, scalability, and cost-effectiveness to meet growing data management demands. Its core features include:

  1. Large storage capacity: Supports various data types and formats.
  2. Cost efficiency: Utilizes low-cost storage media like HDFS or cloud services.
  3. Open architecture: Compatible with tools like Hadoop, Spark, and Flink.
  4. Customizable formats: Adapts to diverse user needs.
  5. Data security: Ensures safety through access control, encryption, and auditing.

In summary, data lakes empower data-intensive organizations to optimize their data assets efficiently.

4.1.3 The role of data lake

  1. Data Integration: Unify data from diverse internal and external sources for streamlined governance.
  2. Data Cleaning: Facilitate cleaning processes, including quality rules, transformation, and calibration, to ensure data accuracy and consistency.
  3. Data Quality Assessment: Store historical data to assess quality and identify issues.
  4. Data Security: Ensure safety through mechanisms like access control, encryption, and auditing.
  5. Data Lifecycle Management: Support archiving, backup, and deletion to optimize costs and enhance availability.

In summary, data lakes enhance the efficiency and reliability of data governance systems.
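
The data cleaning role above mentions quality rules. A minimal sketch of how such rules might be expressed and applied is given here, assuming records are Python dictionaries; the rule names, the `RULES` table, and the sample `orders` are hypothetical and not taken from any particular data lake product.

```python
from typing import Any, Callable

Record = dict[str, Any]
Rule = Callable[[Record], bool]

# Hypothetical quality rules; each returns True when the record passes.
RULES: dict[str, Rule] = {
    "order_id is present": lambda r: r.get("order_id") is not None,
    "amount is non-negative": lambda r: (
        isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0
    ),
    "currency is a 3-letter code": lambda r: (
        isinstance(r.get("currency"), str) and len(r["currency"]) == 3
    ),
}


def validate(records: list[Record]) -> list[tuple[int, str]]:
    """Return (record index, rule name) for every rule a record fails."""
    return [
        (index, name)
        for index, record in enumerate(records)
        for name, rule in RULES.items()
        if not rule(record)
    ]


# Hypothetical sample data: the second and third records break rules.
orders = [
    {"order_id": 1, "amount": 19.9, "currency": "CNY"},
    {"order_id": None, "amount": 5.0, "currency": "CNY"},
    {"order_id": 3, "amount": -2.0, "currency": "yuan"},
]

violations = validate(orders)
```

Keeping the rules in one named table means they can be reviewed and versioned like any other governance policy, separately from the code that applies them.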

4.1.4 Challenges facing the data lake

Long-term industrial practice has also exposed a range of problems with data lakes. Storing data from multiple sources can lead to duplication and inconsistency, which harms data quality. A lack of metadata management reduces data availability and traceability. Security and access control are critical, especially for sensitive data, where missing safeguards can cause compliance problems. Weak governance and poor coordination across departments can produce conflicting data definitions. Finally, without proper governance, a data lake may turn into a "data swamp" that is difficult to use.
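
The inconsistency problem can be illustrated with a small check that compares the same entity across sources. This is a sketch under stated assumptions: the source names (`crm`, `billing`), the key `customer_id`, and the function `find_conflicts` are all hypothetical.

```python
from collections import defaultdict
from typing import Any

Record = dict[str, Any]


def find_conflicts(
    sources: dict[str, list[Record]], key: str, field: str
) -> dict[Any, dict[str, Any]]:
    """Map each key value to {source name: field value} where sources disagree."""
    seen: dict[Any, dict[str, Any]] = defaultdict(dict)
    for source_name, records in sources.items():
        for record in records:
            seen[record[key]][source_name] = record.get(field)
    # Keep only the keys for which more than one distinct value was reported.
    return {
        key_value: by_source
        for key_value, by_source in seen.items()
        if len(set(by_source.values())) > 1
    }


# Hypothetical sample data: the two systems disagree about customer 2.
sources = {
    "crm": [
        {"customer_id": 1, "city": "Hangzhou"},
        {"customer_id": 2, "city": "Shenzhen"},
    ],
    "billing": [
        {"customer_id": 1, "city": "Hangzhou"},
        {"customer_id": 2, "city": "Beijing"},
    ],
}

conflicts = find_conflicts(sources, key="customer_id", field="city")
```

The result records which source reported which value, so the disagreement can be traced back to its origin rather than silently overwritten.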

4.2 The Value of data middle platform

The data middle platform combines the strengths of data lakes and data warehouses, serving as a crucial component of modern data governance. By standardizing interfaces, it reduces governance complexity, enhances data mining efficiency, and supports intelligent applications, particularly AI data services. This technology has been widely validated in Chinese enterprises, demonstrating its effectiveness and practicality in improving data governance systems.

  1. Alibaba data middle platform: Alibaba's data middle platform is one of its core competitive advantages. It provides data support for Alibaba's business lines, such as e-commerce, cloud computing, and digital media and entertainment.
  2. Tencent data middle platform: Tencent's data middle platform provides data support for its social networking, gaming, finance, and other services, for example user profiling, content recommendation, and intelligent customer service.
  3. Baidu data middle platform: Baidu's data middle platform provides data support for its search, advertising, autonomous driving, and other businesses, for example user behavior analysis, advertising optimization, and path planning for autonomous driving.
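
The idea of reducing governance complexity through standardized interfaces can be sketched as a small catalog through which every dataset is registered and queried in the same way. This is only an illustration of the principle, not a description of how any of the platforms above is built; the class names `DataService` and `MiddlePlatform`, the dataset `user_profile`, and its owner are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

Row = dict[str, Any]


@dataclass
class DataService:
    """Hypothetical catalog entry: one named dataset behind a uniform interface."""
    name: str
    owner: str
    schema: dict[str, type]
    fetch: Callable[[], list[Row]]


@dataclass
class MiddlePlatform:
    """Hypothetical registry that exposes every dataset through one query method."""
    services: dict[str, DataService] = field(default_factory=dict)

    def register(self, service: DataService) -> None:
        self.services[service.name] = service

    def query(self, name: str, columns: list[str]) -> list[Row]:
        service = self.services[name]
        # Reject columns that the registered schema does not declare.
        unknown = [c for c in columns if c not in service.schema]
        if unknown:
            raise KeyError(f"{name} has no columns {unknown}")
        return [{c: row[c] for c in columns} for row in service.fetch()]


platform = MiddlePlatform()
platform.register(DataService(
    name="user_profile",
    owner="growth-team",
    schema={"user_id": int, "segment": str},
    fetch=lambda: [
        {"user_id": 1, "segment": "new"},
        {"user_id": 2, "segment": "loyal"},
    ],
))

segments = platform.query("user_profile", ["segment"])
```

Because consumers only see `query`, the storage behind each dataset can change without affecting them, and the declared schema and owner give governance a single place to check.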

4.3 Driving force of AI data services

AI data services use artificial intelligence to process, analyze, and mine data in order to provide intelligent decision support for enterprises. One of the biggest changes in the data middle platform compared with the data lake is that, while organizing the data system, it remains fully compatible with AI data services. AI data services play an important role in the modern data governance system, mainly in the following aspects:

4.3.1 Intelligent Data Analysis

Automatic Modeling: Build predictive, classification, and clustering models to improve analysis efficiency and accuracy.

Intelligent Recommendation: Suggest goods, content, or services based on user behavior to enhance experience and conversion rates.

Intelligent Forecasting: Predict trends using historical and real-time data to inform strategic decisions.

Intelligent Decision-Making: Offer actionable insights to support scientific decision-making.
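
As a concrete illustration of forecasting from historical data, the sketch below uses simple exponential smoothing. This is a basic statistical method chosen for brevity, standing in for the more capable models an AI data service would use; the function name, the smoothing factor `alpha`, and the `daily_orders` series are assumptions made for the example.

```python
def exponential_smoothing_forecast(history: list[float], alpha: float = 0.5) -> float:
    """One-step-ahead forecast using simple exponential smoothing.

    Each new observation is blended with the running level, so recent
    values count more than older ones. `alpha` must lie in (0, 1].
    """
    if not history:
        raise ValueError("history must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = history[0]
    for value in history[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


# Hypothetical daily order counts.
daily_orders = [100.0, 120.0, 110.0, 130.0]
next_day = exponential_smoothing_forecast(daily_orders, alpha=0.5)
```

With `alpha=0.5` the running level moves from 100 to 110, stays at 110, and ends at 120, which becomes the forecast for the next day.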

4.3.2 Automation of Data Governance

Data Cleaning: Automatically detect and resolve errors or inconsistencies to enhance data quality.

Data Classification: Automate labeling and categorization for easier management.

Data Security: Identify and mitigate risks such as leaks or tampering.
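
Automated classification can be illustrated with a rule-based labeler for sensitive columns. Regular expressions are used here as a simple stand-in for a learned classifier; the patterns are deliberately loose, and the `PATTERNS` table, the `threshold` parameter, and the sample `table` are hypothetical.

```python
import re

# Hypothetical, deliberately loose patterns for two sensitive data types.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "phone": re.compile(r"^\+?\d{7,15}$"),
}


def classify_column(values: list[str], threshold: float = 0.8) -> str:
    """Label a column by the first pattern matching enough non-empty values."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return "unknown"
    for label, pattern in PATTERNS.items():
        hits = sum(1 for v in non_empty if pattern.match(v))
        if hits / len(non_empty) >= threshold:
            return label
    return "general"


# Hypothetical sample table, stored column by column.
table = {
    "contact": ["a@example.com", "b@example.org", ""],
    "mobile": ["+8613800000000", "13900000000"],
    "note": ["called twice", "prefers email"],
}

labels = {column: classify_column(values) for column, values in table.items()}
```

Labels produced this way can then drive access control or masking policies, which links automated classification back to the data security item above.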

4.3.3 Data Application Innovation

Personalized Services: Deliver tailored recommendations and marketing strategies.

Intelligent Customer Service: Use technologies like speech recognition and NLP to boost service efficiency and quality.

Intelligent Risk Control: Provide tools for fraud detection and credit assessment to reduce losses.

4.3.4 Enhancing the Data Governance System

Data Quality Improvement: Ensure reliable foundations for governance.

Data Security Enhancement: Strengthen protective measures.

Data Value Enhancement: Drive data-centric decision-making and value creation.

5. Conclusion

Data governance, data literacy, and data quality management are the three pillars of the modern data management system. They complement each other and promote data-driven decision-making and innovation in organizations. The data middle platform and AI data services provide the technical support and application scenarios, and have become core tools of modern data governance.

6. References

1. Koltay, Tibor. "Data Governance, Data Literacy and the Management of Data Quality." *IFLA Journal*, vol. 42, no. 4, 2016, pp. 303–312. https://doi.org/10.1177/0340035216672238.

2. Tableau. "Top Data Literacy Skills for Becoming Data Literate." Tableau Software, LLC, 2023.

3. Data Management Association International. *Data Management Body of Knowledge (DMBOK) Guide*. DAMA International, 2017.

4. Data Literacy Project. “What Is Data Literacy?” *The Data Literacy Project*, Qlik, 2016, www.thedataliteracyproject.org.

5. Wang, Richard Y., and Diane M. Strong. “Beyond Accuracy: What Data Quality Means to Data Consumers.” *Journal of Management Information Systems*, vol. 12, no. 4, 1996, pp. 5–33.  

6. Forrester Research. *Data Literacy Trends*. Cambridge, MA: Forrester Research, 2022.

7. Zha, Di, et al. "Data-centric Artificial Intelligence: A Survey." arXiv, 2023. arXiv:2303.10158.

8. Benaich, Nathan, and Ian Hogarth. State of AI Report 2023. Air Street Capital, 2023.

9. Wright, T. "Data Quality and Decision Making: The Role of Confidence in Business Data." Journal of Information Management, vol. 14, no. 2, 2006, pp. 72–85.

10. Newman, H. "EIM Governance and Logical Data Models: A Comparative Study." International Journal of Information Systems, vol. 10, no. 4, 2006, pp. 245–260.

11. Atlan. "5 Data Governance Examples: Case Studies, Takeaways & More." Atlan Blog, https://www.atlan.com/data-governance-case-studies. Accessed 18 Nov. 2024.

12. MedInsight. "Analytic Maturity in Data Governance, Quality & Literacy." MedInsight Blog, https://www.medinsight.com/analytic-maturity-data-governance. Accessed 18 Nov. 2024.

13. Palmer, Carole. "Dealing with Data: A Case Study on Information and Data Management Literacy." PLOS Biology, https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1001171. Accessed 18 Nov. 2024.
