System Design - basic - Sharding in horizontal scaling of databases

Sharding in horizontal scaling of databases is a technique used to distribute data across multiple database servers to enhance performance, scalability, and availability. Here’s a detailed explanation:

What is Sharding?

Sharding involves breaking up a large database into smaller, more manageable pieces called shards. Each shard holds a portion of the total data and runs on a separate database server. The shards work together to form the complete dataset.

Horizontal vs. Vertical Scaling

  • Vertical Scaling (Scaling Up): Adding more resources (CPU, RAM, storage) to a single server.
  • Horizontal Scaling (Scaling Out): Adding more servers to handle the load. Sharding is a form of horizontal scaling.

How Sharding Works

  1. Data Partitioning: Data is divided into shards based on a shard key. The shard key can be a specific column or set of columns that determines how data is distributed.
  2. Shard Key Selection: The choice of shard key is crucial as it impacts data distribution and performance. Common strategies for mapping shard key values to shards include:
    • Range-based Sharding: Data is divided into ranges based on the shard key. For example, if sharding by user ID, user IDs 1-1000 might go to shard 1, 1001-2000 to shard 2, and so on.
    • Hash-based Sharding: A hash function is applied to the shard key, and data is distributed based on the hash value. This helps achieve more even data distribution.
    • Geographical Sharding: Data is divided based on geographic regions.
  3. Shard Management: Each shard operates independently but is part of the overall system. Data requests are routed to the appropriate shard based on the shard key.
  4. Query Routing: Middleware or application logic routes each query to the correct shard(s), so the database client doesn’t need to know the details of the underlying sharding.
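
The partitioning and routing steps above can be sketched in a few lines of Python. This is a minimal illustration of hash-based sharding, not a production design: the shard count, the `shard_for_key` function, and the `ShardRouter` class are all hypothetical names chosen for this example, and in-memory dictionaries stand in for real database servers.

```python
import hashlib

NUM_SHARDS = 4  # assumed shard count, for illustration only


def shard_for_key(shard_key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a shard key to a shard index in [0, num_shards) using a stable hash."""
    # hashlib gives the same result in every process, unlike Python's built-in
    # hash(), which is randomized per process for strings.
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardRouter:
    """Hypothetical routing layer; each dict stands in for one database server."""

    def __init__(self, num_shards: int = NUM_SHARDS) -> None:
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, shard_key: str, row: dict) -> None:
        index = shard_for_key(shard_key, len(self.shards))
        self.shards[index][shard_key] = row

    def get(self, shard_key: str):
        index = shard_for_key(shard_key, len(self.shards))
        return self.shards[index].get(shard_key)


router = ShardRouter()
for user_id in range(1, 1001):
    router.put(f"user:{user_id}", {"id": user_id})

print([len(shard) for shard in router.shards])  # roughly 250 rows per shard
print(router.get("user:42"))
```

The caller only ever passes a shard key; which server holds the row is decided inside the router. One known trade-off of this simple modulo scheme is that changing the number of shards remaps most keys.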

Benefits of Sharding

  • Scalability: Adding more shards increases the database capacity.
  • Performance: Distributing data across multiple servers can improve read and write performance by reducing the load on each server.
  • Availability: In case of a failure, only the data on the failed shard is affected, not the entire dataset.

Challenges of Sharding

  • Complexity: Managing and maintaining multiple shards can be complex.
  • Data Distribution: Uneven data distribution can lead to hotspots where some shards handle more load than others.
  • Cross-Shard Queries: Queries that span multiple shards can be more complicated and less efficient.
  • Consistency: Ensuring data consistency across shards, especially in transactions, can be challenging.
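
To make the cross-shard query cost concrete, here is a small scatter-gather sketch in Python. It assumes a query whose filter does not include the shard key, so every shard has to be asked and the partial results merged afterwards. The sample data and the `query_shard` and `scatter_gather` functions are hypothetical; lists stand in for the orders table on each shard.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each list stands in for the orders table on one shard.
SHARDS = [
    [{"user_id": 1, "amount": 30}, {"user_id": 5, "amount": 20}],
    [{"user_id": 2, "amount": 50}],
    [{"user_id": 3, "amount": 10}, {"user_id": 7, "amount": 40}],
    [],
]


def query_shard(orders, min_amount):
    """Run the filter on a single shard (stand-in for a per-shard query)."""
    return [order for order in orders if order["amount"] >= min_amount]


def scatter_gather(min_amount):
    """Send the query to every shard, then merge and sort the partial results."""
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        partials = list(pool.map(lambda shard: query_shard(shard, min_amount), SHARDS))
    merged = [order for partial in partials for order in partial]
    # The global ordering has to be redone here; no single shard can provide it.
    return sorted(merged, key=lambda order: order["amount"], reverse=True)


print(scatter_gather(30))
```

Even in this toy version the query touches all four shards and needs a merge step in the routing layer, which is why queries that can be answered from a single shard are generally preferred.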

Example Scenario

Consider an online store with millions of users and transactions:

  • Shard Key: User ID
  • Shards: 4 shards (each on a separate server)
    • Shard 1: User IDs 1-250,000
    • Shard 2: User IDs 250,001-500,000
    • Shard 3: User IDs 500,001-750,000
    • Shard 4: User IDs 750,001-1,000,000

When a user with ID 123,456 logs in, the system routes the request to Shard 1. If another user with ID 678,901 makes a purchase, the request is routed to Shard 3.
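
The lookup in this scenario can be sketched as a range-based router. The following Python snippet is one possible way to do it, using a sorted list of range upper bounds and a binary search; the `UPPER_BOUNDS` list and the `shard_for_user` function are names introduced here for illustration.

```python
from bisect import bisect_left

# Inclusive upper bound of each shard's user ID range, taken from the scenario.
UPPER_BOUNDS = [250_000, 500_000, 750_000, 1_000_000]


def shard_for_user(user_id: int) -> int:
    """Return the 1-based shard number whose range contains user_id."""
    if not 1 <= user_id <= UPPER_BOUNDS[-1]:
        raise ValueError(f"user ID {user_id} is outside every shard range")
    # bisect_left finds the first upper bound that is >= user_id.
    return bisect_left(UPPER_BOUNDS, user_id) + 1


print(shard_for_user(123_456))  # 1
print(shard_for_user(678_901))  # 3
```

Adding a fifth shard for IDs above 1,000,000 would only require appending a new upper bound, which is one reason range-based schemes are easy to extend. The usual caveat is that newly registered users all land on the last shard.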

Conclusion

Sharding is a powerful technique for horizontally scaling databases to handle large volumes of data and high traffic. By carefully selecting a shard key and managing shards effectively, organizations can achieve significant improvements in performance, scalability, and availability.

A note on terminology: the correct term is “sharding,” not “shading.” Sharding derives from the word “shard,” which means a fragment or piece of a whole. The term describes the process of dividing a database into smaller, more manageable pieces.

Why is it Called Sharding?

  1. Shard: In English, a shard refers to a small part or piece of a larger object, often broken off from the main body. Similarly, in database sharding, the entire database is divided into smaller parts called shards.
  2. Fragmentation: The concept of sharding involves breaking the database into fragments or shards. Each shard is a self-contained subset of the data that can be stored and queried independently of the others.
  3. Distributed Storage: By distributing these shards across multiple servers, the database can handle more load and store more data than a single server could manage on its own.

Key Concepts:

  • Shard Key: A key that determines how data is divided into shards. A well-chosen shard key spreads data evenly across the shards; a poorly chosen one leads to hotspots.
  • Shard: Each individual part of the larger database. Shards can reside on separate servers or even in different geographic locations.
  • Horizontal Scaling: Adding more servers (shards) to handle the increased load, as opposed to vertical scaling, which involves adding more resources (CPU, RAM) to a single server.

Example:

Imagine you have a large book and you tear it into smaller sections, distributing each section to different people to read. Each person has a shard of the book. Together, all the people represent the entire book, but each one holds only a part of it. This way, multiple people can read different sections at the same time, speeding up the process.

Conclusion:

Sharding is called sharding because it involves dividing a large database into smaller, manageable pieces called shards. These shards help distribute the load and data across multiple servers, improving performance and scalability. The term “shard” aptly describes these fragments of the larger whole.
