Data governance, data literacy, and data quality management: A literature review

Note: This is not a formally published paper; it is an essay written for a course assignment.

Abstract

With the rise of the data era, data governance, data literacy, and data quality management have emerged as the core pillars of organizational data management. This paper reviews these three areas, examining their definitions, interconnections, and applications in the data middle platform and AI data services. Drawing on practices from Chinese Internet enterprises, it highlights the platform’s role as a key tool in modern data governance, showcasing its benefits in governance, data mining, and intelligent applications.

Keywords: data governance, data literacy, data quality management, data middle platform, AI data services

1. Introduction

Driven by big data and artificial intelligence, data has become a core organizational asset. Data governance, data literacy, and data quality management are interdependent and together form the basis on which modern organizations achieve data-driven decision-making and value creation. This paper explores the relationship among the three and explains the value of the data middle platform and AI data services in a modern data governance system.

2. Core concepts

These concepts were not defined by any single individual or institution; they emerged as the field of data management developed, driven by academia, industry standards organizations, and corporate practice. Each concept is defined as follows:

  1. Data Governance: Data governance refers to the framework and practices for ensuring the quality, integrity, security, and availability of organizational data. It involves policies, roles, and processes that enable efficient and responsible data management.
  2. Data Literacy: Data literacy is the ability to read, understand, create, and communicate data as information, enabling individuals to make informed decisions in a data-driven environment.
  3. Data Quality Management: Data quality management refers to the set of practices aimed at maintaining high data quality standards, focusing on accuracy, completeness, reliability, and relevance throughout the data lifecycle.
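
Quality dimensions such as completeness become easier to manage once they are measured. The following is a minimal sketch, not a standard implementation: the function names `completeness` and `uniqueness`, the sample `customers` records, and the choice to treat `None` and empty strings as missing are all assumptions made for illustration.

```python
from typing import Any

Record = dict[str, Any]


def completeness(records: list[Record], field: str) -> float:
    """Share of records in which `field` is present and non-empty."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def uniqueness(records: list[Record], field: str) -> float:
    """Share of the non-empty values of `field` that are distinct."""
    values = [r[field] for r in records if r.get(field) not in (None, "")]
    if not values:
        return 0.0
    return len(set(values)) / len(values)


# Hypothetical sample data: one missing email and one duplicated email.
customers = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": ""},
    {"id": 3, "email": "a@example.com"},
    {"id": 4, "email": "b@example.com"},
]

email_completeness = completeness(customers, "email")
email_uniqueness = uniqueness(customers, "email")
```

Tracking such ratios over time gives a governance team a simple way to see whether quality is improving or degrading.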

3. Relevance of the three concepts

3.1 The relationship among the three concepts

Data governance offers a strategic, top-level design that standardizes the rules and processes of data management. Data quality management, in turn, ensures the accuracy and consistency of data through technical means. Data literacy enhances the ability of individuals and organizations to take part effectively in governance and quality management activities. Together, these components promote data-driven decision-making and foster innovation within organizations.

3.2 Related cases

(1) Airbnb's Data Literacy and Governance Practices

Airbnb's "Data University" enhances employee data literacy, promoting responsible data processing, governance, and decision-making. The program supports data democratization and a data-driven culture.

(2) Milliman MedInsight's Medical Data Governance

Milliman MedInsight improves data quality and governance through document optimization, query automation, and tailored training. These efforts enhance data literacy, unify governance standards, and improve data usability.

(3) Data Management in Scientific Research

The eagle-i project advances data governance in scientific research by improving data literacy and standardizing biological resource management. It emphasizes early education, community involvement, and institutional support to boost data sharing and utilization.

4. Modern data governance system

Over time, the data lake and data warehouse architectures of traditional data governance systems have been joined by new alternatives, namely the data middle platform and AI data services. As an iteration on the data lake, the data middle platform addresses a series of problems that data lakes face and remains compatible with AI data services, which helps enterprises transform their data governance.

4.1 Role and challenges of the data lake

As a centralized data storage architecture, the data lake provides a basis for diverse data processing and analysis and an effective solution for enterprise data governance, but it also faces problems of data quality, security, and management complexity.

4.1.2 Definition and core characteristics of data lake

A data lake is a centralized architecture for storing large-scale raw data, accommodating structured, semi-structured, and unstructured formats. It offers flexibility, scalability, and cost-effectiveness to meet growing data management demands. Its core features include:

  1. Large storage capacity: Supports various data types and formats.
  2. Cost efficiency: Utilizes low-cost storage media like HDFS or cloud services.
  3. Open architecture: Compatible with tools like Hadoop, Spark, and Flink.
  4. Customizable formats: Adapts to diverse user needs.
  5. Data security: Ensures safety through access control, encryption, and auditing.

In summary, data lakes empower data-intensive organizations to optimize their data assets efficiently.

4.1.3 The role of data lake

  1. Data Integration: Unify data from diverse internal and external sources for streamlined governance.
  2. Data Cleaning: Facilitate cleaning processes, including quality rules, transformation, and calibration, to ensure data accuracy and consistency.
  3. Data Quality Assessment: Store historical data to assess quality and identify issues.
  4. Data Security: Ensure safety through mechanisms like access control, encryption, and auditing.
  5. Data Lifecycle Management: Support archiving, backup, and deletion to optimize costs and enhance availability.

In summary, data lakes enhance the efficiency and reliability of data governance systems.

4.1.4 Challenges facing the data lake

Long-term industrial practice has also exposed a range of problems with data lakes. Storing data from multiple sources can lead to duplication and inconsistency, which undermines data quality. A lack of metadata management reduces data availability and traceability. Security and access control are critical, especially for sensitive data, where missing safeguards can cause compliance problems. Weak governance and poor coordination across departments can produce conflicting data definitions. Finally, without proper governance, a data lake may degrade into a "data swamp" that is difficult to use.
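
The multi-source inconsistency problem described above can be made concrete with a small check. This is an illustrative sketch only: the function `find_conflicts`, the source names `crm` and `billing`, and the sample records are hypothetical, and a real data lake would run such checks with its own tooling.

```python
from collections import defaultdict
from typing import Any


def find_conflicts(
    sources: dict[str, list[dict[str, Any]]], key: str, field: str
) -> dict[Any, dict[str, Any]]:
    """Return {key_value: {source_name: field_value}} for keys whose sources disagree."""
    seen: dict[Any, dict[str, Any]] = defaultdict(dict)
    for source_name, records in sources.items():
        for record in records:
            seen[record[key]][source_name] = record[field]
    # Keep only the keys for which more than one distinct value was observed.
    return {k: v for k, v in seen.items() if len(set(v.values())) > 1}


# Hypothetical sources that both describe the same customers.
sources = {
    "crm": [
        {"customer_id": 1, "country": "CN"},
        {"customer_id": 2, "country": "US"},
    ],
    "billing": [
        {"customer_id": 1, "country": "CN"},
        {"customer_id": 2, "country": "CA"},
    ],
}

conflicts = find_conflicts(sources, "customer_id", "country")
```

Here customer 2 is reported because the two systems disagree on the country, while customer 1 is consistent and is left out.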

4.2 The value of the data middle platform

The data middle platform combines the strengths of data lakes and data warehouses, serving as a crucial component of modern data governance. By standardizing interfaces, it reduces governance complexity, enhances data mining efficiency, and supports intelligent applications, particularly AI data services. This technology has been widely validated in Chinese enterprises, demonstrating its effectiveness and practicality in improving data governance systems.

  1. Alibaba data middle platform: Alibaba's data middle platform is one of its core competitive advantages. It provides strong data support for Alibaba's business operations, such as e-commerce, cloud computing, and digital media and entertainment.
  2. Tencent data middle platform: Tencent's data middle platform provides data support for its social networking, gaming, finance, and other services, for example user profiling, content recommendation, and intelligent customer service.
  3. Baidu data middle platform: Baidu's data middle platform provides data support for its search, advertising, autonomous driving, and other businesses, for example user behavior analysis, advertising optimization, and path planning for autonomous driving.
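
The idea of standardized interfaces mentioned in this section can be sketched as a single query entry point that hides where each dataset is stored. The class `DataServiceRegistry`, its methods, and the `orders` dataset are hypothetical names invented for this illustration; they do not describe how any of the platforms listed above are actually built.

```python
from typing import Callable, Iterable

Row = dict[str, object]


class DataServiceRegistry:
    """Hypothetical registry: one query interface over differently stored datasets."""

    def __init__(self) -> None:
        self._loaders: dict[str, Callable[[], Iterable[Row]]] = {}

    def register(self, name: str, loader: Callable[[], Iterable[Row]]) -> None:
        """Attach a loader function that knows how to fetch the named dataset."""
        self._loaders[name] = loader

    def query(self, name: str, **filters: object) -> list[Row]:
        """Return rows of the named dataset whose fields equal every given filter."""
        if name not in self._loaders:
            raise KeyError(f"unknown dataset: {name}")
        return [
            row
            for row in self._loaders[name]()
            if all(row.get(k) == v for k, v in filters.items())
        ]


registry = DataServiceRegistry()
# The loader could read from a lake, a warehouse, or an API; here it is in memory.
registry.register(
    "orders",
    lambda: [
        {"order_id": 1, "status": "paid"},
        {"order_id": 2, "status": "refunded"},
    ],
)

paid_orders = registry.query("orders", status="paid")
```

Because callers only see `query`, the storage behind a dataset can change without changes to the applications that consume it, which is one way standardization can reduce governance complexity.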

4.3 Driving force of AI data services

AI data services are services that use artificial intelligence to process, analyze, and mine data in order to provide intelligent decision support for enterprises. One of the biggest changes in the data middle platform compared with the data lake is that, while organizing the data system, it is fully compatible with AI data services. AI data services play an important role in the modern data governance system, mainly in the following respects:

4.3.1 Intelligent Data Analysis

Automatic Modeling: Build predictive, classification, and clustering models to improve analysis efficiency and accuracy.

Intelligent Recommendation: Suggest goods, content, or services based on user behavior to enhance experience and conversion rates.

Intelligent Forecasting: Predict trends using historical and real-time data to inform strategic decisions.

Intelligent Decision-Making: Offer actionable insights to support scientific decision-making.
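
As a deliberately simple stand-in for the forecasting capability described above, the sketch below predicts the next value of a series as the mean of its most recent observations. Production AI data services would use far richer models; the function name `moving_average_forecast`, the window size, and the `daily_orders` figures are assumptions for illustration.

```python
def moving_average_forecast(history: list[float], window: int = 3) -> float:
    """Forecast the next value as the mean of the last `window` observations."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(history) < window:
        raise ValueError("not enough history for the chosen window")
    return sum(history[-window:]) / window


# Hypothetical daily order counts; the forecast averages the last three days.
daily_orders = [120, 132, 128, 140, 150]
next_day_forecast = moving_average_forecast(daily_orders, window=3)
```

A baseline like this is also useful for judging whether a more complex model adds any accuracy.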

4.3.2 Automation of Data Governance

Data Cleaning: Automatically detect and resolve errors or inconsistencies to enhance data quality.

Data Classification: Automate labeling and categorization for easier management.

Data Security: Identify and mitigate risks such as leaks or tampering.
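
Automated classification can start from something as simple as pattern matching on column values. The sketch below is an assumption-laden illustration rather than a description of any specific product: the patterns in `PATTERNS` are intentionally loose, and the function `classify_column`, the 0.8 threshold, and the sample columns are all hypothetical.

```python
import re

# Loose, illustrative patterns; real classifiers need stricter, locale-aware rules.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "phone": re.compile(r"^\+?\d[\d\- ]{6,14}\d$"),
}


def classify_column(values: list[str], threshold: float = 0.8) -> str:
    """Label a column by the first pattern matching at least `threshold` of its non-empty values."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return "unknown"
    for label, pattern in PATTERNS.items():
        hits = sum(1 for v in non_empty if pattern.match(v))
        if hits / len(non_empty) >= threshold:
            return label
    return "unclassified"


# Hypothetical columns sampled from a table.
labels = {
    "contact": classify_column(["a@example.com", "b@example.org", ""]),
    "mobile": classify_column(["+86 138-0000-0000", "13900000000"]),
    "note": classify_column(["hello", "world"]),
}
```

Columns labeled as `email` or `phone` could then be routed to stricter access control or masking rules.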

4.3.3 Data Application Innovation

Personalized Services: Deliver tailored recommendations and marketing strategies.

Intelligent Customer Service: Use technologies like speech recognition and NLP to boost service efficiency and quality.

Intelligent Risk Control: Provide tools for fraud detection and credit assessment to reduce losses.
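
A basic statistical check illustrates the kind of signal a risk control service might build on. This sketch flags transaction amounts that lie far from the mean in units of standard deviation; it is not a fraud detection method in itself, and the function `flag_outliers`, the threshold of 2.0, and the sample amounts are assumptions chosen for the example.

```python
from statistics import mean, pstdev


def flag_outliers(amounts: list[float], z_threshold: float = 2.0) -> list[float]:
    """Return the amounts whose z-score (population std) exceeds `z_threshold`."""
    if not amounts:
        return []
    mu = mean(amounts)
    sigma = pstdev(amounts)
    if sigma == 0:
        # All values are identical, so nothing stands out.
        return []
    return [a for a in amounts if abs(a - mu) / sigma > z_threshold]


# Hypothetical transaction amounts with one unusually large payment.
transactions = [100, 102, 98, 101, 99, 1000]
flagged = flag_outliers(transactions)
```

With very small samples the largest possible z-score is limited, which is why the threshold here is lower than the value of 3 often used on large datasets.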

4.3.4 Enhancing the Data Governance System

Data Quality Improvement: Ensure reliable foundations for governance.

Data Security Enhancement: Strengthen protective measures.

Data Value Enhancement: Drive data-centric decision-making and value creation.

5. Conclusion

Data governance, data literacy, and data quality management are the three pillars of the modern data management system. They complement each other and promote data-driven decision-making and innovation in organizations. The data middle platform and AI data services provide the technical support and application scenarios, and have become core tools of modern data governance.

6. References

1. Koltay, Tibor. "Data Governance, Data Literacy and the Management of Data Quality." *IFLA Journal*, vol. 42, no. 4, 2016, pp. 303–312. https://doi.org/10.1177/0340035216672238.

2. Tableau. "Top Data Literacy Skills for Becoming Data Literate." Tableau Software, LLC, 2023.

3. Data Management Association International. *Data Management Body of Knowledge (DMBOK) Guide*. DAMA International, 2017.

4. Data Literacy Project. "What Is Data Literacy?" *The Data Literacy Project*, Qlik, 2016, www.thedataliteracyproject.org.

5. Wang, Richard Y., and Diane M. Strong. "Beyond Accuracy: What Data Quality Means to Data Consumers." *Journal of Management Information Systems*, vol. 12, no. 4, 1996, pp. 5–33.

6. Forrester Research. *Data Literacy Trends*. Cambridge, MA: Forrester Research, 2022.

7. Zha, Di, et al. "Data-centric Artificial Intelligence: A Survey." arXiv, 2023. arXiv:2303.10158.

8. Benaich, Nathan, and Ian Hogarth. *State of AI Report 2023*. Air Street Capital, 2023.

9. Wright, T. "Data Quality and Decision Making: The Role of Confidence in Business Data." Journal of Information Management, vol. 14, no. 2, 2006, pp. 72–85.

10. Newman, H. "EIM Governance and Logical Data Models: A Comparative Study." International Journal of Information Systems, vol. 10, no. 4, 2006, pp. 245–260.

11. Atlan. "5 Data Governance Examples: Case Studies, Takeaways & More." Atlan Blog, https://www.atlan.com/data-governance-case-studies. Accessed 18 Nov. 2024.

12. MedInsight. "Analytic Maturity in Data Governance, Quality & Literacy." MedInsight Blog, https://www.medinsight.com/analytic-maturity-data-governance. Accessed 18 Nov. 2024.

13. Palmer, Carole. "Dealing with Data: A Case Study on Information and Data Management Literacy." PLOS Biology, https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1001171. Accessed 18 Nov. 2024.
