MYSQL PARTITIONING分区操作和性能测试

PARTITION OR NOT PARTITION IN MYSQl

Bill Karwin says “In most circumstances, you’re better off using indexes instead of partitioning as your main method of query optimization.”
According to RICK JAMES: “It is so tempting to believe that PARTITIONing will solve performance problems. But it is so often wrong.”
let’s find out what’s going on by building a test case

TWO TABLES READY

How many partitions? views from Rick James: Have 20-50 partitions; no more.
In this page, we do 10 partitions
Remember: Always test your real case.

  1. Partition table with 10 partitions
CREATE TABLE points_partition 
(id INT NOT NULL AUTO_INCREMENT,x FLOAT,y FLOAT,z FLOAT,created_time DATETIME,PRIMARY KEY(id, created_time))
PARTITION BY RANGE( YEAR(created_time) ) (PARTITION p16 VALUES less than (2016),PARTITION p17 VALUES less than (2017),PARTITION p18 VALUES less than (2018),PARTITION p19 VALUES less than (2019),PARTITION p20 VALUES less than (2020),PARTITION p21 VALUES less than (2021),PARTITION p22 VALUES less than (2022),PARTITION p23 VALUES less than (2023),PARTITION p24 VALUES less than (2024),PARTITION p25 VALUES less than (2025)
) ;
  1. Normal table
CREATE TABLE points_full_table 
(id INT NOT NULL AUTO_INCREMENT,x FLOAT,y FLOAT,z FLOAT,created_time DATETIME,PRIMARY KEY(id, created_time));

Create millions of rows

For test case, each table holds 10 millions of rows
If using mysql to insert, example 2 is better than example 1

-- sql example 1
INSERT INTO `table1` (`field1`, `field2`) VALUES ("data1", "data2");
INSERT INTO `table1` (`field1`, `field2`) VALUES ("data1", "data2");
INSERT INTO `table1` (`field1`, `field2`) VALUES ("data1", "data2");
-- sql example 2
INSERT INTO `table1` (`field1`, `field2`) VALUES ("data1", "data2"),("data1", "data2"),("data1", "data2");

Add large data with tools

from faker import Faker
import randomdef insert_large_data(nums=10):fake = Faker()data = [(random.random(), random.random(), random.random(),str(fake.date_time_between(start_date='-10y', end_date='now'))) for i in range(nums)]cursor = connection.cursor()sql = f"INSERT INTO points_partition (x, y, z, created_time) VALUES (%s, %s, %s, %s)"# execute sql with your idea tool

DB-status

partition table take extra files to preserve data, also, extra disk space
请添加图片描述
partition table
请添加图片描述

TEST RESULTS WITHOUT EXTRA INDEX(created_time)

test-1
select SQL_NO_CACHE * from sample.points_partition where created_time > '2024-01-01' limit 100;

FROM: explain

idselect_typetablepartitionstypepossible_keyskeykey_lenrefrowsfilteredExtra
1SIMPLEpoints_partitionp25ALL91162533.33Using where
idselect_typetablepartitionstypepossible_keyskeykey_lenrefrowsfilteredExtra
1SIMPLEpoints_full_tableALL974720733.33Using where

FROM:mysqlslap

# partition_table
BenchmarkRunning for engine innodbAverage number of seconds to run all queries: 0.156 secondsMinimum number of seconds to run all queries: 0.156 secondsMaximum number of seconds to run all queries: 0.156 secondsNumber of clients running queries: 10Average number of queries per client: 10
# full_table
BenchmarkRunning for engine innodbAverage number of seconds to run all queries: 0.172 secondsMinimum number of seconds to run all queries: 0.172 secondsMaximum number of seconds to run all queries: 0.172 secondsNumber of clients running queries: 10Average number of queries per client: 10

In general, it is expected that fewer touched rows would result in less time for query execution.
since this query only required limit rows under condition without order, mysql optimizer is doing a good job here.
the worse case for the full table is that do a full table scan, but to get just 100 target rows from random data, much less time is needed.

however, if we put a order by in where clause, things will be a huge different.

test-2
select SQL_NO_CACHE * from sample.points_partition where created_time > '2024-01-01' order by created_time limit 100;

FROM: explain

idselect_typetablepartitionstypepossible_keyskeykey_lenrefrowsfilteredExtra
1SIMPLEpoints_partitionp25ALL91162533.33Using where; Using filesort
idselect_typetablepartitionstypepossible_keyskeykey_lenrefrowsfilteredExtra
1SIMPLEpoints_full_tableALL974720733.33Using where; Using filesort

FROM:mysqlslap

# partition table
BenchmarkRunning for engine innodbAverage number of seconds to run all queries: 4.931 secondsMinimum number of seconds to run all queries: 4.931 secondsMaximum number of seconds to run all queries: 4.931 secondsNumber of clients running queries: 10Average number of queries per client: 10
# full table
BenchmarkRunning for engine innodbAverage number of seconds to run all queries: 54.652 secondsMinimum number of seconds to run all queries: 54.652 secondsMaximum number of seconds to run all queries: 54.652 secondsNumber of clients running queries: 10Average number of queries per client: 10

A huge time gap between two queries.
what’ going on?
under condition of “order by”
a full table needs a full table-field sort, that’s cost a lot,
a partition table only need to sort a partition after located target partition.
we always say: test your real case, by this way, you find your circumstance to do a partition table.

WHY:In most circumstances, you’re better off using indexes instead of partitioning

the test are not done yet
From mysql explain, the extra field print a message: “Using filesort”
normally, you should considering a index here to improve performance: MYSQL: explain-extra-information

let’s add a index

ALTER TABLE `points_partition` ADD INDEX `created_time_index` (`created_time`);
ALTER TABLE `points_full_table` ADD INDEX `created_time_index` (`created_time`);

TEST RESULTS WITH INDEX

test-3
select SQL_NO_CACHE * from sample.points_partition where created_time > '2024-01-01' limit 100;

FROM: explain

idselect_typetablepartitionstypepossible_keyskeykey_lenrefrowsfilteredExtra
1SIMPLEpoints_partitionp25rangecreated_time_indexcreated_time_index5455812100.00Using index condition
idselect_typetablepartitionstypepossible_keyskeykey_lenrefrowsfilteredExtra
1SIMPLEpoints_full_tablerangecreated_time_indexcreated_time_index52641784100.00Using index condition; Using MRR

FROM: mysqlslap

# partition table
BenchmarkRunning for engine innodbAverage number of seconds to run all queries: 0.168 secondsMinimum number of seconds to run all queries: 0.168 secondsMaximum number of seconds to run all queries: 0.168 secondsNumber of clients running queries: 10Average number of queries per client: 10
# full table
BenchmarkRunning for engine innodbAverage number of seconds to run all queries: 0.368 secondsMinimum number of seconds to run all queries: 0.368 secondsMaximum number of seconds to run all queries: 0.368 secondsNumber of clients running queries: 10Average number of queries per client: 10

again: In general, it is expected that fewer touched rows would result in less time for query execution.
new queries cost a little more time than without extra index.
what happens? explain shows “condition index” are being used here.
stop here, it’s not how indexes are introduced.
sometimes, index is not help if the goal was retrieve 100 target rows. the worst case, yes, but not all.

let’s put a “order by” to see the magic

test-4
select SQL_NO_CACHE * from sample.points_partition where created_time > '2024-01-01' order by created_time limit 100;

FROM: explain

idselect_typetablepartitionstypepossible_keyskeykey_lenrefrowsfilteredExtra
1SIMPLEpoints_partitionp25rangecreated_time_indexcreated_time_index5455812100.00Using index condition
idselect_typetablepartitionstypepossible_keyskeykey_lenrefrowsfilteredExtra
1SIMPLEpoints_full_tablerangecreated_time_indexcreated_time_index52641784100.00Using index condition

FROM: mysqlslap

# partition table
BenchmarkRunning for engine innodbAverage number of seconds to run all queries: 0.162 secondsMinimum number of seconds to run all queries: 0.162 secondsMaximum number of seconds to run all queries: 0.162 secondsNumber of clients running queries: 10Average number of queries per client: 10
# full table
BenchmarkRunning for engine innodbAverage number of seconds to run all queries: 0.185 secondsMinimum number of seconds to run all queries: 0.185 secondsMaximum number of seconds to run all queries: 0.185 secondsNumber of clients running queries: 10Average number of queries per client: 10

same touched rows as no “order by”.
but the time cost of queries are getting really closed.
makes sense “In this circumstance, you’re better off using indexes instead of partitioning”.
after all, there are different types of queries were influenced and Maintenance of PARTITION is also a big thing.
For example: select count() is much slower for partition tables. unless doing a partition count()

more tests?
let’s stop here

table vs (better view)

key/typepartitionnormalpartition+ordernormal+orderpartition+indexnormal+indexpartition+order+indexnormal+order+index
diskspace~590m~540m~590m~540m~750m~700m~750m~700m
mysqlslap-benchmark0.156s0.172s4.931s54.652s0.168s0.368s0.162s0.185s
mysql-explain-touched-rows9116259747207911625974720745581226417844558122641784
index////created_time_indexcreated_time_indexcreated_time_indexcreated_time_index

POINTS BASED ON TEST(mysqlslap & mysql workbench)

  1. Index works good without partitioning, most of cases even better
  2. Under condition of range query by partition field, partitioning tables works good indeed
  3. drop partitions is much more efficient when doing a big delete
  4. if queries use specific partition, performance will better

Other Points Related & documents & Links:

  1. Partitioning mainly helps when your full table is larger than RAM
  2. No partitioning without million rows, Only BY RANGE provides any performance…
  3. index order(DESC or ASC) is also important
  4. mysqlslap–benchmark tool
  5. questions about partition

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.mzph.cn/web/62184.shtml

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈email:809451989@qq.com,一经查实,立即删除!

相关文章

基于 AutoFlow 快速搭建基于 TiDB 向量搜索的本地知识库问答机器人

导读 本文将详细介绍如何通过 PingCAP 开源项目 AutoFlow 实现快速搭建基于 TiDB 的本地知识库问答机器人。如果提前准备好 Docker、TiDB 环境,整个搭建过程估计在 10 分钟左右即可完成,无须开发任何代码。 文中使用一篇 TiDB 文档作为本地数据源作为示…

基于XML的AOP开发

AOP 为 Aspect Oriented Programming 的缩写,意思为面向切面编程。 AOP相关术语: 目标对象(Target): 你要去代理的对象,可以理解为之前很单纯的那个对象。 代理对象(Proxy): 你把你那个单纯的对象给我&#xff0c…

记录blender学习过程中遇到的问题

物体发射的方向不对 被发射物体(例如一棵树)n键看旋转归0 切换正视图 将被发射物体的局部坐标的Z轴 指向 全局方向的X轴时 并且把粒子系统设置的物体旋转勾选上 方向就对了 做倒角发现有问题 检查缩放应用、面朝向、有没有重合点(融合点&am…

Ubuntu系统中Redis的安装步骤及服务配置

目录 内容概括 系统环境 安装方式 1、apt包管理器安装 (1)安装redis服务 (2)安装客户端(进入命令行操作使用,包含redis-cli) (3)安装检验 (4&#xf…

半导体设备中的微型导轨应如何选择合适的润滑油?

微型导轨的润滑对于保证其高精度和高稳定性至关重要,尤其是在半导体设备中,微型导轨的润滑油选择需要考虑多个因素,以确保设备的最佳性能和寿命。以下是一些关键点: 1、黏度:润滑油的黏度是影响其流动性和润滑效果的重…

RocketMq详解:六、RocketMq的负载均衡机制

上一章:《SpringBootAop实现RocketMq的幂等》 文章目录 1.背景1.1 什么是负载均衡1.2 负载均衡的意义 2.RocketMQ消息消费2.1 消息的流转过程2.2 Consumer消费消息的流程 3.RocketMq的负载均衡策略3.1 Broker负载均衡3.2 Producer发送消息负载均衡3.3 消费端的负载均…

主打极致性价比,AMD RX 8600/8800显卡定了

*以下内容仅为网络爆料及传闻,一切以官方消息为准。 这谁能想到,率先掏出下一代桌面独立显卡的不是老大哥 NVIDIA,也不是 AMD,反而是三家中存在感最弱的 Intel! 就在 12 月 3 日,Intel 正式发布了自家第二…

npm, yarn, pnpm之间的区别

前言 在现代化的开发中,一个人可能同时开发多个项目,安装的项目越来越多,所随之安装的依赖包也越来越臃肿,而且有时候所安装的速度也很慢,甚至会安装失败。 因此我们就需要去了解一下,我们的包管理器&#…

C语言连接数据库

文章目录 一、初始化数据库二、创建数据库连接三、执行增删改查语句1、增删改2、查 四、执行增删改查语句 接下来我简单的介绍一下怎么用C语言连接数据库。 初始化数据库创建数据库连接执行增删改查语句关闭数据库连接 一、初始化数据库 // 数据库初始化 MYSQL mysql; MYSQL* r…

优化LabVIEW数据运算效率的方法

在LabVIEW中进行大量数据运算时,提升计算效率并减少时间占用是开发过程中常遇到的挑战。为此,可以从多个角度着手优化,包括合理选择数据结构与算法、并行处理、多线程技术、硬件加速、内存管理和界面优化等。通过采用这些策略,可以…

计算机的错误计算(一百七十六)

摘要 利用某一大语言模型计算 的值,输出为 0 . 例1. 在某一大语言模型下,计算 的值。其中sin中值取弧度。结果保留16位有效数字。 直接贴图吧: 点评: (1)以上为一个大模型给的答案。从其回答可知&…

数据结构与算法——1204—递归分治法

1、斐波那契数列优化 使用滚动变量&#xff0c;保存当前计算结果和前两项值 (1)RAB (2)更新计算对象&#xff0c;AB&#xff0c;BR #include<iostream> using namespace std;int fun(int n) {if (n 0)return 0;if (n 1 || n 2)return 1;int num11;int num21;int su…

openstack内部rpc消息通信源码分析

我们知道openstack内部消息队列基于AMQP协议&#xff0c;默认使用的rabbitmq 消息队列。谈到rabbitmq&#xff0c;大家或许并不陌生&#xff0c;但或许会对oslo message有些陌生。openstack内部并不是直接使用rabbitmq&#xff0c;而是使用了oslo.message 。oslo.message 后端的…

Postman自定义脚本Pre-request-script以及Test

这两个都是我们进行自定义script脚本的地方&#xff0c;分别是在请求执行的前后运行。 我们举两个可能经常运用到的场景。 (一)请求A先执行&#xff0c;请求B使用请求A响应结果作为参数。如果我们不用自定义脚本&#xff0c;可能得先执行请求A&#xff0c;然后手动复制响应结果…

总结的一些MySql面试题

目录 一&#xff1a;基础篇 二&#xff1a;索引原理和SQL优化 三&#xff1a;事务原理 四&#xff1a;缓存策略 一&#xff1a;基础篇 1&#xff1a;定义&#xff1a;按照数据结构来组织、存储和管理数据的仓库&#xff1b;是一个长期存储在计算机内的、有组织的、可共享 的…

116. UE5 GAS RPG 实现击杀掉落战利品功能

这一篇&#xff0c;我们实现敌人被击败后&#xff0c;掉落战利品的功能。首先&#xff0c;我们将创建一个新的结构体&#xff0c;用于定义掉落体的内容&#xff0c;方便我们设置掉落物。然后&#xff0c;我们实现敌人死亡时的掉落函数&#xff0c;并在蓝图里实现对应的逻辑&…

Excel技巧:如何批量调整excel表格中的图片?

插入到excel表格中的图片大小不一&#xff0c;如何做到每张图片都完美的与单元格大小相同&#xff1f;并且能够根据单元格来改变大小&#xff1f;今天分享&#xff0c;excel表格里的图片如何批量调整大小。 方法如下&#xff1a; 点击表格中的一个图片&#xff0c;然后按住Ct…

智能合约

06-智能合约 0 啥是智能合约&#xff1f; 定义 智能合约&#xff0c;又称加密合约&#xff0c;在一定条件下可直接控制数字货币或资产在各方之间转移的一种计算机程序。 角色 区块链网络可视为一个分布式存储服务&#xff0c;因为它存储了所有交易和智能合约的状态 智能合约还…

智慧油客:从初识、再识OceanBase,到全栈上线

今天&#xff0c;我们邀请了智慧油客的研发总监黄普友&#xff0c;为我们讲述智慧油客与 OceanBase 初识、熟悉和结缘的故事。 智慧油客自2016年诞生以来&#xff0c;秉持新零售的思维&#xff0c;成功从过去二十年间以“以销售产品为中心”的传统思维模式&#xff0c;转向“以…

【深度学习】手机SIM卡托缺陷检测【附链接】

一、手机SIM卡托用途 SIM卡托是用于固定和保护SIM卡的部件&#xff0c;通过连接SIM卡与手机主板的方式&#xff0c;允许设备访问移动网络&#xff0c;用户可以通过SIM卡进行通话、发送短信和使用数据服务。 二、手机SIM卡托不良影响 SIM卡接触不良&#xff0c;造成信号中断&…