MySQL Partitioning: Partition Operations and Performance Tests

PARTITION OR NOT PARTITION IN MYSQL

Bill Karwin says “In most circumstances, you’re better off using indexes instead of partitioning as your main method of query optimization.”
Rick James says: “It is so tempting to believe that PARTITIONing will solve performance problems. But it is so often wrong.”
Let’s find out what’s going on by building a test case.

GETTING TWO TABLES READY

How many partitions? Rick James’s advice: have 20 to 50 partitions; no more.
In this test we use 10 partitions.
Remember: always test your real case.

  1. Partition table with 10 partitions
CREATE TABLE points_partition (
    id INT NOT NULL AUTO_INCREMENT, x FLOAT, y FLOAT, z FLOAT, created_time DATETIME,
    -- every unique key, including the primary key, must contain the partitioning column
    PRIMARY KEY (id, created_time)
)
-- p16 holds every year before 2016 and p25 holds only 2024; there is no MAXVALUE
-- partition, so a row dated 2025 or later is rejected
PARTITION BY RANGE (YEAR(created_time)) (
    PARTITION p16 VALUES LESS THAN (2016), PARTITION p17 VALUES LESS THAN (2017),
    PARTITION p18 VALUES LESS THAN (2018), PARTITION p19 VALUES LESS THAN (2019),
    PARTITION p20 VALUES LESS THAN (2020), PARTITION p21 VALUES LESS THAN (2021),
    PARTITION p22 VALUES LESS THAN (2022), PARTITION p23 VALUES LESS THAN (2023),
    PARTITION p24 VALUES LESS THAN (2024), PARTITION p25 VALUES LESS THAN (2025)
);
  2. Normal table
CREATE TABLE points_full_table 
(id INT NOT NULL AUTO_INCREMENT,x FLOAT,y FLOAT,z FLOAT,created_time DATETIME,PRIMARY KEY(id, created_time));
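To see how PARTITION BY RANGE (YEAR(created_time)) routes rows, here is a small Python sketch. It only imitates MySQL's rule and is not MySQL code: a row goes to the first partition whose LESS THAN bound is greater than the row's year. The boundary list mirrors the DDL above. The helper name partition_for is hypothetical.

```python
from bisect import bisect_right
from datetime import datetime

# (partition name, VALUES LESS THAN bound) as declared for points_partition
PARTITIONS = [(f"p{year % 100}", year) for year in range(2016, 2026)]
BOUNDS = [bound for _, bound in PARTITIONS]


def partition_for(created_time: datetime) -> str:
    """Mimic RANGE (YEAR(created_time)): pick the first partition whose bound > year."""
    year = created_time.year
    idx = bisect_right(BOUNDS, year)  # number of bounds <= year
    if idx == len(BOUNDS):
        # MySQL rejects such a row because no partition (and no MAXVALUE) covers it
        raise ValueError(f"no partition for year {year}")
    return PARTITIONS[idx][0]


print(partition_for(datetime(2015, 6, 1)))  # p16: every year before 2016
print(partition_for(datetime(2024, 3, 1)))  # p25: the only year in p25 is 2024
```

Note the naming: p16 holds 2015 and everything earlier, while p25 holds 2024 only.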

Create millions of rows

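For this test, each table holds 10 million rows.
When inserting with plain SQL, example 2 (one multi-row INSERT) is faster than example 1 (one statement per row), because it saves a round trip and the per-statement overhead for every row.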

-- sql example 1: one statement per row
INSERT INTO `table1` (`field1`, `field2`) VALUES ('data1', 'data2');
INSERT INTO `table1` (`field1`, `field2`) VALUES ('data1', 'data2');
INSERT INTO `table1` (`field1`, `field2`) VALUES ('data1', 'data2');
-- sql example 2: one statement with a multi-row VALUES list
INSERT INTO `table1` (`field1`, `field2`) VALUES ('data1', 'data2'), ('data1', 'data2'), ('data1', 'data2');
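If you generate the SQL yourself, you can build the multi-row form programmatically. Below is a minimal Python sketch. It assumes a DB-API driver that uses %s placeholders, as PyMySQL and mysqlclient do. build_multi_row_insert and flatten are hypothetical helpers. In practice you would tune the number of rows per statement so that each statement stays under max_allowed_packet.

```python
def build_multi_row_insert(table, columns, n_rows):
    """Build one INSERT with n_rows placeholder groups, like sql example 2."""
    group = "(" + ", ".join(["%s"] * len(columns)) + ")"
    cols = ", ".join(f"`{c}`" for c in columns)
    return f"INSERT INTO `{table}` ({cols}) VALUES " + ", ".join([group] * n_rows)


def flatten(rows):
    """Turn [(a, b), (c, d)] into [a, b, c, d] so values line up with the placeholders."""
    return [value for row in rows for value in row]


rows = [("data1", "data2")] * 3
sql = build_multi_row_insert("table1", ["field1", "field2"], len(rows))
print(sql)
# INSERT INTO `table1` (`field1`, `field2`) VALUES (%s, %s), (%s, %s), (%s, %s)
# with a DB-API cursor: cursor.execute(sql, flatten(rows))
```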

Add large data with tools

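from faker import Faker
import random


def insert_large_data(connection, nums=10, table="points_partition"):
    """Insert `nums` random points with a created_time within the last 10 years."""
    fake = Faker()
    data = [
        (random.random(), random.random(), random.random(),
         str(fake.date_time_between(start_date='-10y', end_date='now')))
        for _ in range(nums)
    ]
    # the table name is interpolated into the SQL, so only pass trusted values
    sql = f"INSERT INTO {table} (x, y, z, created_time) VALUES (%s, %s, %s, %s)"
    cursor = connection.cursor()
    cursor.executemany(sql, data)  # DB-API batch execute (e.g. PyMySQL, mysqlclient)
    connection.commit()
    cursor.close()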
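insert_large_data builds its whole list in memory. That is fine for a small nums but not for 10 million rows. Below is a hedged sketch of a chunked loader that needs only the standard library:

- random_datetime stands in for Faker's date_time_between.
- load_in_chunks, the chunk size and the 3650-day window (approximating '-10y') are assumptions for illustration.
- The loader only calls cursor(), executemany(), commit() and close() on the DB-API connection you pass in.

With the DDL above, any generated date in 2025 or later would be rejected.

```python
import random
from datetime import datetime, timedelta
from itertools import islice


def random_datetime(days_back=3650, now=None):
    """Stand-in for Faker's date_time_between(start_date='-10y', end_date='now')."""
    now = now or datetime.now()
    return now - timedelta(seconds=random.randint(0, days_back * 86400))


def generate_rows(n):
    """Yield n (x, y, z, created_time) tuples lazily instead of building one big list."""
    for _ in range(n):
        created = random_datetime().strftime("%Y-%m-%d %H:%M:%S")
        yield (random.random(), random.random(), random.random(), created)


def chunks(iterable, size):
    """Yield lists of at most `size` items from any iterable."""
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch


def load_in_chunks(connection, table, total, size=10_000):
    """Insert `total` rows, committing once per chunk of `size` rows."""
    sql = f"INSERT INTO {table} (x, y, z, created_time) VALUES (%s, %s, %s, %s)"
    cursor = connection.cursor()
    try:
        for batch in chunks(generate_rows(total), size):
            cursor.executemany(sql, batch)
            connection.commit()
    finally:
        cursor.close()


# e.g. load_in_chunks(conn, "points_partition", 10_000_000)
```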

DB-status

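The partitioned table needs extra files to store its data (with innodb_file_per_table, InnoDB keeps a separate .ibd file for each partition), and it also takes extra disk space.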
[Screenshots: on-disk files of the tables; one is captioned “partition table”]

TEST RESULTS WITHOUT AN EXTRA INDEX ON created_time

test-1
select SQL_NO_CACHE * from sample.points_partition where created_time > '2024-01-01' limit 100;
-- the same query is also run against sample.points_full_table

FROM: explain

id | select_type | table             | partitions | type | possible_keys | key  | key_len | ref  | rows    | filtered | Extra
1  | SIMPLE      | points_partition  | p25        | ALL  | NULL          | NULL | NULL    | NULL | 911625  | 33.33    | Using where
1  | SIMPLE      | points_full_table | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 9747207 | 33.33    | Using where
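The partitions column shows pruning. Because the WHERE clause is on the partitioning column, the optimizer reads only p25. Below is a rough Python sketch of that decision for a lower-bound predicate, using the same boundaries as the DDL. prune_for_greater_than is a hypothetical helper, and MySQL's real pruning logic handles many more cases.

```python
from datetime import datetime

# (partition name, VALUES LESS THAN bound) as declared for points_partition
PARTITIONS = [(f"p{year % 100}", year) for year in range(2016, 2026)]


def prune_for_greater_than(lower: datetime):
    """Partitions that can contain rows with created_time > lower (RANGE on YEAR)."""
    keep = []
    for name, bound in PARTITIONS:
        highest_year = bound - 1  # a partition only stores years below its bound
        if highest_year >= lower.year:
            keep.append(name)
    return keep


print(prune_for_greater_than(datetime(2024, 1, 1)))  # ['p25'], matching EXPLAIN
```

The sketch is conservative, like real pruning: it may keep a partition that turns out to have no matching rows, but it never drops one that does.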

FROM: mysqlslap

# partition_table
Benchmark
    Running for engine innodb
    Average number of seconds to run all queries: 0.156 seconds
    Minimum number of seconds to run all queries: 0.156 seconds
    Maximum number of seconds to run all queries: 0.156 seconds
    Number of clients running queries: 10
    Average number of queries per client: 10
# full_table
Benchmark
    Running for engine innodb
    Average number of seconds to run all queries: 0.172 seconds
    Minimum number of seconds to run all queries: 0.172 seconds
    Maximum number of seconds to run all queries: 0.172 seconds
    Number of clients running queries: 10
    Average number of queries per client: 10
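Reading these reports by eye works for a few runs; for more runs, a small parser helps. Below is a Python sketch that extracts the numbers from mysqlslap's report text. parse_mysqlslap is a hypothetical helper, and its patterns assume the exact labels mysqlslap prints above.

```python
import re

# Labels as printed in the mysqlslap report; numbers are captured after the colon
FIELDS = {
    "avg_seconds": "Average number of seconds to run all queries",
    "min_seconds": "Minimum number of seconds to run all queries",
    "max_seconds": "Maximum number of seconds to run all queries",
    "clients": "Number of clients running queries",
    "queries_per_client": "Average number of queries per client",
}


def parse_mysqlslap(report: str) -> dict:
    """Pull the numeric fields out of a mysqlslap benchmark report."""
    result = {}
    for key, label in FIELDS.items():
        match = re.search(re.escape(label) + r":\s*([\d.]+)", report)
        if match:
            value = match.group(1)
            result[key] = float(value) if key.endswith("seconds") else int(value)
    return result
```

The report text can come from a saved log or from subprocess.run(..., capture_output=True, text=True).stdout. Fields that are missing from the text are simply left out of the result.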

In general, fewer touched rows should mean a shorter query time.
But this query only asks for 100 rows that match the condition, with no ORDER BY, so MySQL can stop scanning as soon as it has found 100 matches; the optimizer does a good job here.
The worst case for the full table is a full table scan. However, the dates are random and roughly one row in ten falls after 2024-01-01, so getting 100 target rows means reading only on the order of a thousand rows, which takes far less time.
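The effect of LIMIT without ORDER BY can be shown with a toy scan in Python. Rows are read in storage order, and the scan stops at the 100th match. This is a simplified model: it ignores pages, buffering and the real InnoDB cursor. The 1-in-10 match rate is an assumption based on one year out of ten.

```python
import random


def rows_examined_with_limit(rows, predicate, limit=100):
    """Scan rows in storage order; stop at the `limit`-th match and report rows read."""
    found = 0
    examined = 0
    for row in rows:
        examined += 1
        if predicate(row):
            found += 1
            if found == limit:
                break
    return examined


def is_recent(year):
    return year >= 2024  # stands in for created_time > '2024-01-01'


# Toy table: 200,000 rows whose "year" is spread evenly over 2015-2024
random.seed(42)
table = [random.randint(2015, 2024) for _ in range(200_000)]
print(rows_examined_with_limit(table, is_recent))
```

With one match in ten, the scan needs on average 100 / 0.1 = 1,000 rows, a tiny fraction of the table. It reads the whole table only if fewer than 100 rows match.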

However, if we add an ORDER BY to the query, things are very different.

test-2
select SQL_NO_CACHE * from sample.points_partition where created_time > '2024-01-01' order by created_time limit 100;
-- the same query is also run against sample.points_full_table

FROM: explain

id | select_type | table             | partitions | type | possible_keys | key  | key_len | ref  | rows    | filtered | Extra
1  | SIMPLE      | points_partition  | p25        | ALL  | NULL          | NULL | NULL    | NULL | 911625  | 33.33    | Using where; Using filesort
1  | SIMPLE      | points_full_table | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 9747207 | 33.33    | Using where; Using filesort

FROM: mysqlslap

# partition table
Benchmark
    Running for engine innodb
    Average number of seconds to run all queries: 4.931 seconds
    Minimum number of seconds to run all queries: 4.931 seconds
    Maximum number of seconds to run all queries: 4.931 seconds
    Number of clients running queries: 10
    Average number of queries per client: 10
# full table
Benchmark
    Running for engine innodb
    Average number of seconds to run all queries: 54.652 seconds
    Minimum number of seconds to run all queries: 54.652 seconds
    Maximum number of seconds to run all queries: 54.652 seconds
    Number of clients running queries: 10
    Average number of queries per client: 10

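There is a huge time gap between the two queries.
What’s going on?
With ORDER BY and no usable index, MySQL cannot stop after the first 100 matches. It has to read every candidate row and filesort the matches before it knows which 100 come first.
The full table has to scan all ~9.7 million rows. The partitioned table, after pruning to p25, scans only ~0.9 million rows and sorts just that partition’s matches.
We always say: test your real case. This is how you find the circumstances in which a partitioned table pays off.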
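A toy Python model shows why ORDER BY changes the picture. Without an index that already delivers rows in created_time order, every candidate row must be examined before the first 100 are known. The model counts examined rows for a pruned partition versus the whole table. heapq.nsmallest plays the role of a bounded top-N sort. MySQL can use a similar priority-queue strategy for ORDER BY ... LIMIT, but the details differ. The (year, fraction) row layout is an assumption.

```python
import heapq
import random


def order_by_limit(rows, predicate, key, limit=100):
    """WHERE predicate ORDER BY key LIMIT n with no usable index: every row is read."""
    examined = 0

    def matching():
        nonlocal examined
        for row in rows:
            examined += 1
            if predicate(row):
                yield row

    top = heapq.nsmallest(limit, matching(), key=key)  # bounded top-n sort
    return top, examined


def is_recent(row):
    return row[0] >= 2024  # row = (year, fraction of the year)


def sort_key(row):
    return row  # order by (year, fraction) = order by created_time


random.seed(7)
full_table = [(random.randint(2015, 2024), random.random()) for _ in range(100_000)]
p25 = [row for row in full_table if row[0] == 2024]  # all that pruning leaves to scan

top_full, full_examined = order_by_limit(full_table, is_recent, sort_key)
top_part, part_examined = order_by_limit(p25, is_recent, sort_key)
print(full_examined, part_examined)
```

Both calls return the same 100 rows. The full table, however, examines every row, while the pruned partition examines only its own rows, roughly a tenth as many.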

WHY: In most circumstances, you’re better off using indexes instead of partitioning

The tests are not done yet.
In the EXPLAIN output, the Extra column shows “Using filesort”.
Normally, that is a sign to consider an index to improve performance (see MYSQL: explain-extra-information).

Let’s add an index:

ALTER TABLE `points_partition` ADD INDEX `created_time_index` (`created_time`);
ALTER TABLE `points_full_table` ADD INDEX `created_time_index` (`created_time`);
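Conceptually, an index on created_time keeps its keys sorted. The server can therefore seek to the first key after '2024-01-01' and read forward, already in created_time order, stopping after 100 entries. Below is a rough Python stand-in using a sorted list and bisect. It models only the ordering and the seek, not B-tree pages or InnoDB internals. index_range_scan is a hypothetical helper.

```python
import random
from bisect import bisect_right
from datetime import datetime, timedelta


def index_range_scan(index, lower, limit=100):
    """index is a sorted list of (created_time, id) entries, like a B-tree's leaf order.
    Seek to the first created_time > lower and read forward at most `limit` entries."""
    start = bisect_right(index, (lower, float("inf")))  # skip every entry <= lower
    return index[start:start + limit]


random.seed(1)
base = datetime(2015, 1, 1)
rows = [(base + timedelta(days=random.uniform(0, 3650)), row_id)
        for row_id in range(1, 50_001)]
index = sorted(rows)  # the ordering an index on created_time maintains for you

first_100 = index_range_scan(index, datetime(2024, 1, 1))
print(first_100[0][0], len(first_100))
```

Because the entries come out already sorted, an ORDER BY created_time ... LIMIT 100 needs no separate sort step in this model.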

TEST RESULTS WITH INDEX

test-3
select SQL_NO_CACHE * from sample.points_partition where created_time > '2024-01-01' limit 100;
-- the same query is also run against sample.points_full_table

FROM: explain

id | select_type | table             | partitions | type  | possible_keys      | key                | key_len | ref  | rows    | filtered | Extra
1  | SIMPLE      | points_partition  | p25        | range | created_time_index | created_time_index | 5       | NULL | 455812  | 100.00   | Using index condition
1  | SIMPLE      | points_full_table | NULL       | range | created_time_index | created_time_index | 5       | NULL | 2641784 | 100.00   | Using index condition; Using MRR

FROM: mysqlslap

# partition table
Benchmark
    Running for engine innodb
    Average number of seconds to run all queries: 0.168 seconds
    Minimum number of seconds to run all queries: 0.168 seconds
    Maximum number of seconds to run all queries: 0.168 seconds
    Number of clients running queries: 10
    Average number of queries per client: 10
# full table
Benchmark
    Running for engine innodb
    Average number of seconds to run all queries: 0.368 seconds
    Minimum number of seconds to run all queries: 0.368 seconds
    Maximum number of seconds to run all queries: 0.368 seconds
    Number of clients running queries: 10
    Average number of queries per client: 10

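Again: in general, fewer touched rows should mean a shorter query time.
Yet the new queries take a little more time than without the extra index (0.168 s vs 0.156 s for the partitioned table, 0.368 s vs 0.172 s for the full table).
What happened? EXPLAIN shows that “Using index condition” (index condition pushdown) is being used here.
With InnoDB, every row found through a secondary index needs an extra lookup in the clustered (primary key) index to fetch the full row. The scan without an index, by contrast, reads rows sequentially and stops as soon as 100 rows match.
So stop here: this is not what indexes are introduced for. An index does not always help when the goal is just to retrieve any 100 matching rows. It helps the worst case, but not every case.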
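The extra lookup can be sketched with two Python dicts and a sorted list. This is a toy model only. Each secondary-index entry carries just (created_time, id), so SELECT * needs a second lookup in the clustered index, keyed by the primary key (id, created_time). In Python each lookup is cheap; on disk it can be a random page read. MySQL's MRR optimization (shown in the EXPLAIN above) tries to reduce that cost by ordering these lookups. fetch_via_secondary is a hypothetical helper.

```python
import random
from bisect import bisect_right
from datetime import datetime, timedelta


def fetch_via_secondary(secondary, clustered, lower, limit=100):
    """Toy model of SELECT * ... WHERE created_time > lower LIMIT n via a secondary index.
    Each entry holds only (created_time, id); the full row needs a clustered-index lookup."""
    start = bisect_right(secondary, (lower, float("inf")))
    rows, lookups = [], 0
    for created_time, row_id in secondary[start:start + limit]:
        rows.append(clustered[(row_id, created_time)])  # extra primary-key lookup
        lookups += 1
    return rows, lookups


random.seed(3)
base = datetime(2015, 1, 1)
clustered = {}  # primary key (id, created_time) -> full row (x, y, z, created_time)
for row_id in range(1, 20_001):
    created = base + timedelta(days=random.uniform(0, 3650))
    clustered[(row_id, created)] = (random.random(), random.random(), random.random(), created)
secondary = sorted((created, row_id) for row_id, created in clustered)

rows, lookups = fetch_via_secondary(secondary, clustered, datetime(2024, 1, 1))
print(len(rows), lookups)  # 100 100
```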

Let’s add an ORDER BY to see the magic.

test-4
select SQL_NO_CACHE * from sample.points_partition where created_time > '2024-01-01' order by created_time limit 100;
-- the same query is also run against sample.points_full_table

FROM: explain

id | select_type | table             | partitions | type  | possible_keys      | key                | key_len | ref  | rows    | filtered | Extra
1  | SIMPLE      | points_partition  | p25        | range | created_time_index | created_time_index | 5       | NULL | 455812  | 100.00   | Using index condition
1  | SIMPLE      | points_full_table | NULL       | range | created_time_index | created_time_index | 5       | NULL | 2641784 | 100.00   | Using index condition

FROM: mysqlslap

# partition table
Benchmark
    Running for engine innodb
    Average number of seconds to run all queries: 0.162 seconds
    Minimum number of seconds to run all queries: 0.162 seconds
    Maximum number of seconds to run all queries: 0.162 seconds
    Number of clients running queries: 10
    Average number of queries per client: 10
# full table
Benchmark
    Running for engine innodb
    Average number of seconds to run all queries: 0.185 seconds
    Minimum number of seconds to run all queries: 0.185 seconds
    Maximum number of seconds to run all queries: 0.185 seconds
    Number of clients running queries: 10
    Average number of queries per client: 10

The touched rows are the same as without ORDER BY, but now the query times of the two tables are really close (0.162 s vs 0.185 s).
This is what “in most circumstances, you’re better off using indexes instead of partitioning” means.
After all, partitioning affects many other types of queries too, and maintaining partitions is also a big job.
For example, SELECT COUNT(*) is much slower on a partitioned table, unless you count a single partition (for example SELECT COUNT(*) FROM points_partition PARTITION (p25)).

More tests?
Let’s stop here.

Summary table (all test cases)

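case                         | disk space | mysqlslap avg | EXPLAIN rows | index
partition                    | ~590 MB    | 0.156 s       | 911625       | none
normal                       | ~540 MB    | 0.172 s       | 9747207      | none
partition + order by         | ~590 MB    | 4.931 s       | 911625       | none
normal + order by            | ~540 MB    | 54.652 s      | 9747207      | none
partition + index            | ~750 MB    | 0.168 s       | 455812       | created_time_index
normal + index               | ~700 MB    | 0.368 s       | 2641784      | created_time_index
partition + order by + index | ~750 MB    | 0.162 s       | 455812       | created_time_index
normal + order by + index    | ~700 MB    | 0.185 s       | 2641784      | created_time_index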
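Ratios make the comparison easier to read. A few lines of Python compute them from the numbers in the table above; this is just arithmetic, with no new measurements.

```python
# mysqlslap averages in seconds from the summary table: (partitioned, normal)
timings = {
    "no index": (0.156, 0.172),
    "order by": (4.931, 54.652),
    "index": (0.168, 0.368),
    "order by + index": (0.162, 0.185),
}
ratios = {case: normal / part for case, (part, normal) in timings.items()}
for case, ratio in ratios.items():
    print(f"{case:>16}: normal table takes {ratio:.2f}x the partitioned time")

# EXPLAIN rows without an index: full table vs pruned partition p25
rows_ratio = 9747207 / 911625
print(f"rows examined ratio: {rows_ratio:.2f}x")
```

This gives 1.10x, 11.08x, 2.19x and 1.14x, plus a rows ratio of 10.69x. In the ORDER BY case without an index, the time gap (about 11x) closely tracks the gap in rows scanned (about 10.7x), which is consistent with the scan dominating the cost.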
index////created_time_indexcreated_time_indexcreated_time_indexcreated_time_index

POINTS BASED ON THE TESTS (mysqlslap & MySQL Workbench)

  1. An index works well without partitioning, and in most cases even better
  2. For range queries on the partitioning column, partitioned tables do work well
  3. Dropping partitions is much more efficient than a big DELETE
  4. If a query targets a specific partition, performance is better
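Here is point 3 in practice. Removing old years becomes a single ALTER TABLE ... DROP PARTITION instead of a row-by-row DELETE FROM ... WHERE created_time < .... Below is a hedged Python helper that picks the partitions lying entirely before a cutoff year and builds that statement. drop_partitions_sql and its cutoff logic are illustrative assumptions, so review the generated SQL before running it.

```python
# (partition name, VALUES LESS THAN bound) as declared for points_partition
PARTITIONS = [(f"p{year % 100}", year) for year in range(2016, 2026)]


def drop_partitions_sql(table, cutoff_year):
    """Build ALTER TABLE ... DROP PARTITION for partitions holding only years < cutoff_year."""
    # a partition stores years < bound, so all of it is older than the cutoff if bound <= cutoff
    old = [name for name, bound in PARTITIONS if bound <= cutoff_year]
    if not old:
        return None
    return f"ALTER TABLE {table} DROP PARTITION {', '.join(old)};"


print(drop_partitions_sql("points_partition", 2018))
# ALTER TABLE points_partition DROP PARTITION p16, p17, p18;
```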

Other related points, documents and links:

  1. Partitioning mainly helps when your full table is larger than RAM
  2. No partitioning for tables under a million rows; only BY RANGE provides any performance…
  3. Index order (DESC or ASC) is also important
  4. mysqlslap: benchmark tool
  5. Questions about partitioning

This article was submitted by a user; the views expressed are the author’s own. When reposting, please credit the source: http://www.mzph.cn/web/62184.shtml

