2019独角兽企业重金招聘Python工程师标准>>>
准备工作
1、查看数据库的信息
了解当前的Mysql数据库的版本和平台以及字符集等相关信息
mysql> status
--------------
mysql Ver 14.14 Distrib 5.6.17, for Win64 (x86_64)Connection id: 4
Current database:
Current user: root@localhost
SSL: Not in use
Using delimiter: ;
Server version: 5.6.17-log MySQL Community Server (GPL)
Protocol version: 10
Connection: localhost via TCP/IP
Server characterset: utf8
Db characterset: utf8
Client characterset: gbk
Conn. characterset: gbk
TCP port: 3306
Uptime: 37 min 55 secThreads: 2 Questions: 35 Slow queries: 2 Opens: 294
Flush tables: 1 Open tables: 286 Queries per second avg: 0.015
--------------
2、检查是否支持分区
MySQL从5.1版本开始支持分区的功能。
mysql> show plugins;
+----------------------------+----------+--------------------+---------+---------+
| Name | Status | Type | Library | License |
+----------------------------+----------+--------------------+---------+---------+
| binlog | ACTIVE | STORAGE ENGINE | NULL | GPL |
| mysql_native_password | ACTIVE | AUTHENTICATION | NULL | GPL |
| mysql_old_password | ACTIVE | AUTHENTICATION | NULL | GPL |
| sha256_password | ACTIVE | AUTHENTICATION | NULL | GPL |
| CSV | ACTIVE | STORAGE ENGINE | NULL | GPL |
| MEMORY | ACTIVE | STORAGE ENGINE | NULL | GPL |
| MyISAM | ACTIVE | STORAGE ENGINE | NULL | GPL |
| MRG_MYISAM | ACTIVE | STORAGE ENGINE | NULL | GPL |
| ARCHIVE | ACTIVE | STORAGE ENGINE | NULL | GPL |
| BLACKHOLE | ACTIVE | STORAGE ENGINE | NULL | GPL |
| FEDERATED | DISABLED | STORAGE ENGINE | NULL | GPL |
| InnoDB | ACTIVE | STORAGE ENGINE | NULL | GPL |
// 中间略
| PERFORMANCE_SCHEMA | ACTIVE | STORAGE ENGINE | NULL | GPL |
| partition | ACTIVE | STORAGE ENGINE | NULL | GPL |
+----------------------------+----------+--------------------+---------+---------+
42 rows in set (0.01 sec)
注意:之前版本的MySQL有have_partitioning这个变量,可以查看是否支持分区,但MySQL 5.6.1开始该变量被废弃并经被删除。
mysql> select version();
+------------+
| version() |
+------------+
| 5.6.17-log |
+------------+
1 row in set (0.07 sec)mysql> show variables like 'have_partitioning';
Empty set (0.00 sec)mysql> show variables like 'have_part%';
Empty set (0.00 sec)mysql> select version();
+-----------+
| version() |
+-----------+
| 5.5.19 |
+-----------+
1 row in set (0.00 sec)mysql> show variables like 'have_part%';
+-------------------+-------+
| Variable_name | Value |
+-------------------+-------+
| have_partitioning | YES |
+-------------------+-------+
1 row in set (0.00 sec)
同样可以查询INFORMATION_SCHEMA.PLUGINS表来验证是否支持分区。
mysql> SELECT PLUGIN_NAME, PLUGIN_VERSION, PLUGIN_STATUS
FROM information_schema.PLUGINS WHERE PLUGIN_TYPE = 'STORAGE ENGINE';
+--------------------+----------------+---------------+
| PLUGIN_NAME | PLUGIN_VERSION | PLUGIN_STATUS |
+--------------------+----------------+---------------+
| binlog | 1.0 | ACTIVE |
| CSV | 1.0 | ACTIVE |
| MEMORY | 1.0 | ACTIVE |
| MyISAM | 1.0 | ACTIVE |
| MRG_MYISAM | 1.0 | ACTIVE |
| ARCHIVE | 3.0 | ACTIVE |
| BLACKHOLE | 1.0 | ACTIVE |
| FEDERATED | 1.0 | DISABLED |
| InnoDB | 5.6 | ACTIVE |
| PERFORMANCE_SCHEMA | 0.1 | ACTIVE |
| partition | 1.0 | ACTIVE |
+--------------------+----------------+---------------+
11 rows in set (0.03 sec)
如果输出数据中无Active状态的partition插件,那么说明该MySQL版本不支持分区(partitioning )。
分区类型
分区有利于管理非常大的表,分区键用于根据某个区间值(或者范围值)、特定值列表或者HASH函数值执行数据的聚集,让数据根据规则分布在不同的分区中。
RANGE分区:基于一个给定连续区间范围,把数据分配到不同的分区。
LIST分区:类似RANGE分区,区别在LIST分区是基于枚举出的值列表分区,RANGE是基于给定的连续区间范围分区。
HASH分区:基于给定的分区个数,把数据分配到不同的分区。
KEY分区:类似于HASH分区。
Columns分区:支持多列分区。
注意:
RANGE分区、LIST分区、HASH分区都要求分区键必须是INT类型,或者通过表达式返回INT类型。
无论是哪种MySQL分区类型,要么分区表上没有主键/唯一键,要么分区表的主键/唯一键都必须包含分区键,也就是说不能使用主键/唯一键字段之外的其他字段作为分区键盘。
-- 原始数据
CREATE TABLE `my_sample_bdmain` (`id` int(12) COMMENT '不能使用主键,不可自增',`dmain` varchar(512) DEFAULT NULL COMMENT '不可使用UNIQUE约束',`isip` int(4) DEFAULT NULL COMMENT '1:ip,2:域名',`isImport` int(4) DEFAULT NULL COMMENT '1:手动,2:审核',`isreopen` int(4) DEFAULT NULL COMMENT '1:封堵,2:解封',`blocktime` datetime DEFAULT CURRENT_TIMESTAMP COMMENT '时间',`reason` varchar(200) DEFAULT NULL COMMENT '原因',`remark` varchar(200) DEFAULT NULL COMMENT '备注',`isabroad` varchar(8) DEFAULT NULL COMMENT '归属情况(0-国内,1-国际)',`is_ip_block` int(11) DEFAULT '0' COMMENT '是否为IP封堵(默认为0;1为ip)',`block_user` varchar(16) DEFAULT NULL COMMENT '人员'
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='测试数据(2009-2015),共计500W条'-- 创建测试使用的表
create table my_test_bdmain like my_sample_bdmain;-- 对于域名及封堵时间列分别添加索引
alter table my_test_bdmain add index sample_bdmain_sdmain_idx (dmain(255));
alter table my_test_bdmain add index sample_bdmain_blktime_idx (blocktime);-- 导入数据到test表,后续分区表测试与该表对比,原始数据不变
insert into my_test_bdmain select * from my_sample_bdmain;-- 原始数据
CREATE TABLE `my_test_sdmain` (`id` int(11) NOT NULL DEFAULT 0 COMMENT '不能使用主键,不可自增',`dmain` varchar(512) COMMENT '名称, 不可使用UNIQUE约束',`ipstr` varchar(1024) COMMENT '对应IP',`iplocal` varchar(8) COMMENT '网内网外(0:网内; 1:网外; 2:网内+网外)',`ipabroad` varchar(8) COMMENT '境内境外(0:境内; 1:境外; 2:境内+境外)',`sendtime` datetime COMMENT '上报时间',`dmaintype` int(3) DEFAULT '1' COMMENT '0:正常;1:不良;2:违法:3其他',`dmainsource` varchar(16) COMMENT '来源(1、2、3、4、5、6)',`accpoint` varchar(70) COMMENT '网站类型(0为WAP;1为WWW;2为WAP+WWW)',`serviceip` varchar(150) NOT NULL COMMENT '汇聚服务器IP地址',`sourcename` varchar(150) COMMENT '原始ZIP包名称',`dmain_handle_flag` varchar(150) COMMENT '原始ZIP包标识',`dmainflag` int(11) DEFAULT 4 COMMENT '1:疑似,2:特定名称拨测,3:钓鱼',`flag` int(1) COMMENT '处理标识(1:需处理; 2:过滤; 3:重现; 4:处理中)',`createtime` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间'
) ENGINE=INNODB DEFAULT CHARSET=utf8 COMMENT='测试数据(4月份),共80W条数据'
RANGE分区
http://dev.mysql.com/doc/refman/5.6/en/partitioning-range.html
-- 示例分区表:
CREATE TABLE t_employees_range_part (id INT NOT NULL,name VARCHAR(30),hired DATE NOT NULL DEFAULT '1970-01-01',dept_id INT NOT NULL comment '部门'
) ENGINE=INNODB DEFAULT CHARSET=utf8 COMMENT='测试RANAGE分区'
PARTITION BY RANGE (dept_id) (PARTITION p1 VALUES LESS THAN (6),PARTITION p2 VALUES LESS THAN (11),PARTITION p3 VALUES LESS THAN (16),PARTITION p4 VALUES LESS THAN MAXVALUE
);-- 按年范围创建RANGE分区表
mysql> CREATE TABLE `t_bdmain_year_range_part` (`id` int(12) COMMENT '不能使用主键,不可自增',`dmain` varchar(512) DEFAULT NULL COMMENT '不可使用UNIQUE约束',`isip` int(4) DEFAULT NULL COMMENT '1:ip,2:名称',`isImport` int(4) DEFAULT NULL COMMENT '1:人工,2:非人工',`isreopen` int(4) DEFAULT NULL COMMENT '1:封堵,2:解封',`blocktime` datetime DEFAULT CURRENT_TIMESTAMP COMMENT '日期时间类型,不可为timestamp',`reason` varchar(200) DEFAULT NULL COMMENT '原因',`remark` varchar(200) DEFAULT NULL COMMENT '备注',`isabroad` varchar(8) DEFAULT NULL COMMENT '归属情况(0-国内,1-国际)',`is_ip_block` int(11) DEFAULT '0' COMMENT '是否为IP封堵(默认为0;1表示IP)',`block_user` varchar(16) DEFAULT NULL COMMENT '人员') ENGINE=INNODB DEFAULT CHARSET=utf8 COMMENT='按年份创建的RANGE分区表'PARTITION BY RANGE(YEAR(blocktime)) (PARTITION p1 VALUES LESS THAN (2010),PARTITION p2 VALUES LESS THAN (2011),PARTITION p3 VALUES LESS THAN (2012),PARTITION p4 VALUES LESS THAN (2013),PARTITION p5 VALUES LESS THAN (2014),PARTITION p6 VALUES LESS THAN (2015),PARTITION p7 VALUES LESS THAN MAXVALUE);
Query OK, 0 rows affected (2.70 sec)-- 对于dmain及时间列分别添加索引
alter table t_bdmain_year_range_part add index year_part_bdmain_sdmain_idx (dmain(255));
alter table t_bdmain_year_range_part add index year_part_bdmain_blktime_idx (blocktime);mysql> SELECT TABLE_NAME, TABLE_SCHEMA, PARTITION_NAME, TABLE_ROWS, AVG_ROW_LENGTH, DATA_LENGTHFROM INFORMATION_SCHEMA.PARTITIONSWHERE TABLE_SCHEMA = 'big_data' AND TABLE_NAME ='t_bdmain_year_range_part';
+--------------------------+--------------+----------------+------------+----------------+-------------+
| TABLE_NAME | TABLE_SCHEMA | PARTITION_NAME | TABLE_ROWS | AVG_ROW_LENGTH | DATA_LENGTH |
+--------------------------+--------------+----------------+------------+----------------+-------------+
| t_bdmain_year_range_part | big_data | p1 | 0 | 0 | 16384 |
| t_bdmain_year_range_part | big_data | p2 | 0 | 0 | 16384 |
| t_bdmain_year_range_part | big_data | p3 | 0 | 0 | 16384 |
| t_bdmain_year_range_part | big_data | p4 | 0 | 0 | 16384 |
| t_bdmain_year_range_part | big_data | p5 | 0 | 0 | 16384 |
| t_bdmain_year_range_part | big_data | p6 | 0 | 0 | 16384 |
| t_bdmain_year_range_part | big_data | p7 | 0 | 0 | 16384 |
+--------------------------+--------------+----------------+------------+----------------+-------------+
7 rows in set (0.04 sec)-- 添加分区表数据
insert into t_bdmain_year_range_part select * from my_sample_bdmain;-- 验证分区表
mysql> SELECT TABLE_NAME, TABLE_SCHEMA, PARTITION_NAME, TABLE_ROWS, AVG_ROW_LENGTH, DATA_LENGTHFROM INFORMATION_SCHEMA.PARTITIONSWHERE TABLE_SCHEMA = 'big_data' AND TABLE_NAME ='t_bdmain_year_range_part';
+--------------------------+--------------+----------------+------------+----------------+-------------+
| TABLE_NAME | TABLE_SCHEMA | PARTITION_NAME | TABLE_ROWS | AVG_ROW_LENGTH | DATA_LENGTH |
+--------------------------+--------------+----------------+------------+----------------+-------------+
| t_bdmain_year_range_part | big_data | p1 | 5055 | 314 | 1589248 |
| t_bdmain_year_range_part | big_data | p2 | 377145 | 112 | 42549248 |
| t_bdmain_year_range_part | big_data | p3 | 1949760 | 77 | 151715840 |
| t_bdmain_year_range_part | big_data | p4 | 784214 | 78 | 61440000 |
| t_bdmain_year_range_part | big_data | p5 | 558226 | 104 | 58294272 |
| t_bdmain_year_range_part | big_data | p6 | 1179288 | 109 | 128614400 |
| t_bdmain_year_range_part | big_data | p7 | 335621 | 145 | 48840704 |
+--------------------------+--------------+----------------+------------+----------------+-------------+
分区表在磁盘上的存储结构为(在我本机上innodb_file_per_table=1)
分区表于基本表查询效率对比
a. 普通表
-- \G后面如果有;号会有错误输出ERROR: No query specified
mysql> explain select * from my_test_bdmain where dmain = 'www.19lou.com'\G
*************************** 1. row ***************************id: 1select_type: SIMPLEtable: my_test_bdmaintype: ref
possible_keys: sample_bdmain_sdmain_idxkey: sample_bdmain_sdmain_idxkey_len: 768ref: constrows: 3Extra: Using where
1 row in set (0.00 sec)mysql>
mysql> explain select * from my_test_bdmain where blocktime >= '2013-01-01 00:00:00' and blocktime <= '2013-01-10 23:59:59'\G
*************************** 1. row ***************************id: 1select_type: SIMPLEtable: my_test_bdmaintype: range
possible_keys: sample_bdmain_blktime_idxkey: sample_bdmain_blktime_idxkey_len: 6ref: NULLrows: 71422Extra: Using index condition
1 row in set (0.00 sec)-- 当blocktime的结束时间到某一特定的时刻时就会出现不走索引的情况
mysql> explain select * from my_test_bdmain where blocktime >= '2013-01-01 00:00:00' and blocktime <= '2013-08-31 23:59:59'\G
*************************** 1. row ***************************id: 1select_type: SIMPLEtable: my_test_bdmaintype: ALL
possible_keys: sample_bdmain_blktime_idxkey: NULLkey_len: NULLref: NULLrows: 5020180Extra: Using where
1 row in set (0.03 sec)
b. 分区表
mysql> explain select * from t_bdmain_year_range_part where dmain = 'www.19lou.com'\G
*************************** 1. row ***************************id: 1select_type: SIMPLEtable: t_bdmain_year_range_parttype: ref
possible_keys: year_part_bdmain_sdmain_idxkey: year_part_bdmain_sdmain_idxkey_len: 768ref: constrows: 6Extra: Using where
1 row in set (0.00 sec)mysql> explain partitions select * from t_bdmain_year_range_partwhere dmain = 'www.19lou.com'\G
*************************** 1. row ***************************id: 1select_type: SIMPLEtable: t_bdmain_year_range_partpartitions: p1,p2,p3,p4,p5,p6,p7type: ref
possible_keys: year_part_bdmain_sdmain_idxkey: year_part_bdmain_sdmain_idxkey_len: 768ref: constrows: 6Extra: Using where
1 row in set (0.00 sec)mysql> explain partitions select * from t_bdmain_year_range_partwhere blocktime >= '2013-01-01 00:00:00' and blocktime <= '2013-01-10 23:59:59'\G
*************************** 1. row ***************************id: 1select_type: SIMPLEtable: t_bdmain_year_range_partpartitions: p5type: range
possible_keys: year_part_bdmain_blktime_idxkey: year_part_bdmain_blktime_idxkey_len: 6ref: NULLrows: 68108Extra: Using where
1 row in set (0.00 sec)-- 分区全扫描,如果是从blocktime查询的时候是从2013-2014则扫描两个分区
mysql> explain partitions select * from t_bdmain_year_range_partwhere blocktime >= '2013-01-01 00:00:00' and blocktime <= '2013-12-20 23:59:59'\G
*************************** 1. row ***************************id: 1select_type: SIMPLEtable: t_bdmain_year_range_partpartitions: p5type: ALL
possible_keys: year_part_bdmain_blktime_idxkey: NULLkey_len: NULLref: NULLrows: 558226Extra: Using where
1 row in set (0.83 sec)
创建月或天分区的分区表
-- 按月范围创建的RANGE分区表
mysql> CREATE TABLE `t_bdmain_month_range_part` (`id` int(12) COMMENT '不能使用主键,不可自增',`dmain` varchar(512) DEFAULT NULL COMMENT '不可使用UNIQUE约束',`isip` int(4) DEFAULT NULL COMMENT '1:ip,2:名称',`isImport` int(4) DEFAULT NULL COMMENT '1:人工,2:非人工',`isreopen` int(4) DEFAULT NULL COMMENT '1:封堵,2:解封',`blocktime` datetime DEFAULT CURRENT_TIMESTAMP COMMENT '日期时间类型,不可为timestamp',`reason` varchar(200) DEFAULT NULL COMMENT '原因',`remark` varchar(200) DEFAULT NULL COMMENT '备注',`isabroad` varchar(8) DEFAULT NULL COMMENT '归属情况(0-国内,1-国际)',`is_ip_block` int(11) DEFAULT '0' COMMENT '是否为IP封堵(默认为0;1为ip)',`block_user` varchar(16) DEFAULT NULL COMMENT '人员') ENGINE=INNODB DEFAULT CHARSET=utf8 COMMENT='按月创建的RANGE分区表'PARTITION BY RANGE(MONTH(blocktime)) (PARTITION p1 VALUES LESS THAN (2),PARTITION p2 VALUES LESS THAN (3),PARTITION p3 VALUES LESS THAN (4),PARTITION p4 VALUES LESS THAN (5),PARTITION p5 VALUES LESS THAN (6),PARTITION p6 VALUES LESS THAN (7),PARTITION p7 VALUES LESS THAN (8),PARTITION p8 VALUES LESS THAN (9),PARTITION p9 VALUES LESS THAN (10),PARTITION p10 VALUES LESS THAN (11),PARTITION p11 VALUES LESS THAN (12),PARTITION p12 VALUES LESS THAN MAXVALUE);
Query OK, 0 rows affected (10.69 sec)mysql>
-- 对于名称及封堵时间列分别添加索引
alter table t_bdmain_month_range_part add index month_part_bdmain_sdmain_idx (dmain(255));
alter table t_bdmain_month_range_part add index month_part_bdmain_blktime_idx (blocktime);mysql> explain select * from t_bdmain_month_range_part where dmain = 'www.19lou.com'\G
*************************** 1. row ***************************id: 1select_type: SIMPLEtable: t_bdmain_month_range_parttype: ref
possible_keys: month_part_bdmain_sdmain_idxkey: month_part_bdmain_sdmain_idxkey_len: 768ref: constrows: 11Extra: Using where
1 row in set (0.00 sec)mysql> explain partitions select * from t_bdmain_month_range_partwhere blocktime >= '2013-01-01 00:00:00' and blocktime <= '2013-01-31 23:59:59'\G
*************************** 1. row ***************************id: 1select_type: SIMPLEtable: t_bdmain_month_range_partpartitions: p1,p2,p3,p4,p5,p6,p7,p8,p9,p10,p11,p12type: range
possible_keys: month_part_bdmain_blktime_idxkey: month_part_bdmain_blktime_idxkey_len: 6ref: NULLrows: 471106Extra: Using where
1 row in set (0.03 sec)mysql> explain partitions select * from t_bdmain_month_range_partwhere blocktime >= '2013-01-01 00:00:00' and blocktime <= '2013-10-31 23:59:59'\G
*************************** 1. row ***************************id: 1select_type: SIMPLEtable: t_bdmain_month_range_partpartitions: p1,p2,p3,p4,p5,p6,p7,p8,p9,p10,p11,p12type: ALL
possible_keys: month_part_bdmain_blktime_idxkey: NULLkey_len: NULLref: NULLrows: 5201742Extra: Using where
1 row in set (0.00 sec)-- 按日期范围创建的RANGE分区表
mysql> CREATE TABLE `t_bdmain_day_range_part` (`id` int(12) COMMENT '不能使用主键,不可自增',`dmain` varchar(512) DEFAULT NULL COMMENT '不可使用UNIQUE约束',`isip` int(4) DEFAULT NULL COMMENT '1:ip,2:名称',`isImport` int(4) DEFAULT NULL COMMENT '1:人工,2:非人工',`isreopen` int(4) DEFAULT NULL COMMENT '1:封堵,2:解封',`blocktime` datetime DEFAULT CURRENT_TIMESTAMP COMMENT '日期时间类型,不可为timestamp',`reason` varchar(200) DEFAULT NULL COMMENT '原因',`remark` varchar(200) DEFAULT NULL COMMENT '备注',`isabroad` varchar(8) DEFAULT NULL COMMENT '归属情况(0-国内,1-国际)',`is_ip_block` int(11) DEFAULT '0' COMMENT '是否为IP封堵(默认为0;1为ip)',`block_user` varchar(16) DEFAULT NULL COMMENT '人员') ENGINE=INNODB DEFAULT CHARSET=utf8 COMMENT='按时间范围创建的RANGE分区表'PARTITION BY RANGE(TO_DAYS(blocktime)) (PARTITION p1 VALUES LESS THAN (TO_DAYS('2010-01-01 00:00:00')),PARTITION p2 VALUES LESS THAN (TO_DAYS('2011-01-01 00:00:00')),PARTITION p3 VALUES LESS THAN (TO_DAYS('2012-01-01 00:00:00')),PARTITION p4 VALUES LESS THAN (TO_DAYS('2013-01-01 00:00:00')),PARTITION p5 VALUES LESS THAN (TO_DAYS('2014-01-01 00:00:00')),PARTITION p6 VALUES LESS THAN (TO_DAYS('2015-01-01 00:00:00')),PARTITION p7 VALUES LESS THAN MAXVALUE);Query OK, 0 rows affected (3.65 sec)-- 对于名称及封堵时间列分别添加索引
alter table t_bdmain_day_range_part add index day_part_bdmain_sdmain_idx (dmain(255));
alter table t_bdmain_day_range_part add index day_part_bdmain_blktime_idx (blocktime);mysql> explain select * from t_bdmain_day_range_part where dmain = 'www.19lou.com'\G
*************************** 1. row ***************************id: 1
select_type: SIMPLEtable: t_bdmain_day_range_parttype: ref
possible_keys: day_part_bdmain_sdmain_idxkey: day_part_bdmain_sdmain_idxkey_len: 768ref: constrows: 4Extra: Using where
1 row in set (0.00 sec)mysql> explain partitions select * from t_bdmain_day_range_partwhere blocktime >= '2013-01-01 00:00:00' and blocktime <= '2014-08-31 23:59:59'\G
*************************** 1. row ***************************id: 1
select_type: SIMPLEtable: t_bdmain_day_range_part
partitions: p1,p2type: ALL
possible_keys: day_part_bdmain_blktime_idxkey: NULLkey_len: NULLref: NULLrows: 4574554Extra: Using where
1 row in set (0.13 sec)--001:blocktime timestamp
-- ERROR 1486 (HY000): Constant, random or timezone-dependent expressions
-- in (sub)partitioning function are not allowed
-- 说明:分区函数不允许为常量、随机数或者与时区相关联的表达式,如RANGE(1970)、RANGE(rand())-- 002:`id` int(11) PRIMARY KEY AUTO_INCREMENT
-- ERROR 1503 (HY000): A PRIMARY KEY must include all columns in the table's partitioning function
-- id不能具有主键属性(PRIMARY KEY AUTO_INCREMENT),分区键必须要包含unique约束的一部分-- 按时间范围创建的RANGE分区表
mysql> CREATE TABLE `t_sdmain_time_range_part` (`id` int(11) NOT NULL DEFAULT 0 COMMENT '不能使用主键,不可自增',`dmain` varchar(512) COMMENT '名称, 不可使用UNIQUE约束',`ipstr` varchar(1024) COMMENT '对应IP',`iplocal` varchar(8) COMMENT '网内网外(0:网内; 1:网外; 2:网内+网外)',`ipabroad` varchar(8) COMMENT '境内境外(0:境内; 1:境外; 2:境内+境外)',`sendtime` datetime COMMENT '上报时间',`dmaintype` int(3) DEFAULT '1' COMMENT '0:正常;1:不良;2:违法:3其他',`dmainsource` varchar(16) COMMENT '来源(1、2、3、4、5、6)',`accpoint` varchar(70) COMMENT '网站类型 0为WAP;1为WWW;2为WAP+WWW',`serviceip` varchar(150) NOT NULL COMMENT '解析入库的汇聚服务器IP地址',`sourcename` varchar(150) COMMENT '原始ZIP包名称',`dmain_handle_flag` varchar(150) COMMENT '原始ZIP包标识',`dmainflag` int(11) DEFAULT 4 COMMENT '1 疑似违规;2 特定拨测;3 疑似钓鱼',`flag` int(1) COMMENT '处理标识(1:需处理; 2:过滤; 3:重现; 4:处理中)',`createtime` timestamp DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间,只能为timestamp' ) ENGINE=INNODB DEFAULT CHARSET=utf8 COMMENT='按时间创建的RANGE分区表'PARTITION BY RANGE(UNIX_TIMESTAMP(createtime)) (PARTITION p1 VALUES LESS THAN (UNIX_TIMESTAMP('2014-06-01 00:00:00')),PARTITION p2 VALUES LESS THAN (UNIX_TIMESTAMP('2015-01-01 00:00:00')),PARTITION p3 VALUES LESS THAN (UNIX_TIMESTAMP('2015-06-01 00:00:00')),PARTITION p4 VALUES LESS THAN (UNIX_TIMESTAMP('2016-01-01 00:00:00')),PARTITION p5 VALUES LESS THAN MAXVALUE);Query OK, 0 rows affected (3.65 sec)Any other expressions involving TIMESTAMP values are not permitted. (See Bug #42849.)
-- Any other expressions involving TIMESTAMP values are not permitted. (See Bug #42849.)
LIST分区
List分区类似于Range分区,只是Range提供了范围,List提供以逗号分割的数值列表。
http://dev.mysql.com/doc/refman/5.6/en/partitioning-list.html
示例程序
CREATE TABLE employees_list_part_by_int (id INT NOT NULL,name VARCHAR(30),hired DATE NOT NULL DEFAULT '1970-01-01',dept_id INT NOT NULL comment '部门'
) ENGINE=INNODB DEFAULT CHARSET=utf8 COMMENT='测试LIST分区-基于整型'
PARTITION BY LIST(dept_id) (PARTITION p1 VALUES IN (1, 2, 3, 4, 5),PARTITION p2 VALUES IN (6, 7, 8, 9 ,10),PARTITION p3 VALUES IN (11, 12, 13, 14, 15),PARTITION p5 VALUES IN (16, 17, 18, 19, 20)
);
-- 说明:List分区无MaxValue的概念,必须指定所有值CREATE TABLE employees_list_part_by_varchar (id INT NOT NULL,name VARCHAR(30),hired DATE NOT NULL DEFAULT '1970-01-01',job_code VARCHAR(30) DEFAULT NULL COMMENT '职务:manager,staff'
) ENGINE=INNODB DEFAULT CHARSET=utf8 COMMENT='测试LIST分区-基于VARCHAR类型'
PARTITION BY LIST(job_code) (PARTITION p1 VALUES IN ('manager'),PARTITION p2 VALUES IN ('staff')
);
ERROR 1697 (HY000): VALUES value for partition 'p1' must have type INT
MySQL 5.6 provides support for LIST COLUMNS partitioning. This is a variant of LIST partitioning that enables you to use columns of types other than integer types for partitioning columns, as well as to use multiple columns as partitioning keys.
CREATE TABLE `t_sdmain_dmainsource_list_part` (`id` int(11) NOT NULL DEFAULT 0 COMMENT '不能使用主键,不可自增',`dmain` varchar(512) COMMENT '名称, 不可使用UNIQUE约束',`ipstr` varchar(1024) COMMENT '对应IP',`iplocal` varchar(8) COMMENT '网内网外(0:网内; 1:网外; 2:网内+网外)',`ipabroad` varchar(8) COMMENT '境内境外(0:境内; 1:境外; 2:境内+境外)',`sendtime` datetime COMMENT '上报时间',`dmaintype` int(3) DEFAULT '1' COMMENT '0:正常;1:不良;2:违法:3其他',`dmainsource` int(3) COMMENT '来源(1、2、3、4、5、6)--必须为int类型',`accpoint` varchar(70) COMMENT '网站类型 0为WAP;1为WWW;2为WAP+WWW',`serviceip` varchar(150) NOT NULL COMMENT '解析入库的汇聚服务器IP地址',`sourcename` varchar(150) COMMENT '原始ZIP包名称',`dmain_handle_flag` varchar(150) COMMENT '原始ZIP包标识',`dmainflag` int(11) DEFAULT 4 COMMENT '1 疑似违规;2 特定拨测;3 疑似钓鱼',`flag` int(1) COMMENT '处理标识(1:需处理; 2:过滤; 3:重现; 4:处理中)',`createtime` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间'
) ENGINE=INNODB DEFAULT CHARSET=utf8 COMMENT='根据大区类型创建的分区'PARTITION BY LIST(dmainsource) (PARTITION p1 VALUES IN (1),PARTITION p2 VALUES IN (2),PARTITION p3 VALUES IN (3),PARTITION p4 VALUES IN (4),PARTITION p5 VALUES IN (5),PARTITION p6 VALUES IN (6)
);-- dmainsource = 1或者dmainsource in (1)或者dmainsource in ('1')均走分区
mysql> explain partitions select * from t_sdmain_dmainsource_list_partwhere dmainsource = 1\G
*************************** 1. row ***************************id: 1select_type: SIMPLEtable: t_sdmain_dmainsource_list_partpartitions: p1type: ALL
possible_keys: NULLkey: NULLkey_len: NULLref: NULLrows: 221144Extra: Using where
1 row in set (0.00 sec)mysql> explain partitions select * from t_source_sdmain_201504 where dmainsource = 1\G
*************************** 1. row ***************************id: 1select_type: SIMPLEtable: t_source_sdmain_201504partitions: NULLtype: ALL
possible_keys: NULLkey: NULLkey_len: NULLref: NULLrows: 700330Extra: Using where
1 row in set (0.00 sec)
COLUMNS分区
包括RANGE COLUMNS 和 LIST COLUMNS两种,分别为RANGE及LIST类型分区的变种。
允许作为分区键数据类型如下:
整型类型
支持TINYINT, SMALLINT, MEDIUMINT, INT (INTEGER), 以及BIGINT,其他数值类型不支持,如DECIMAL、FLOAT
日期类型
支持DATE 和 DATETIME,不支持其他日期、时间类型
字符类型
支持 CHAR, VARCHAR, BINARY, 以及 VARBINARY,不支持TEXT、 BLOB
RANGE COLUMNS分区:
RANGE columns分区类似于range分区,允许基于多列值定义分区范围。另外,可以使用非整型类型来定义分区范围。
RANGE COLUMNS分区主要在如下方面区别于RANGE分区:
a、RANGE COLUMNS不接受表达式,只允许提供列名称;
b、RANGE COLUMNS可以使用一列或多列;
RANGE Columns分区键的比较是基于元组的比较,也就是基于字段组的比较。
c、RANGE COLUMNS不再限定分区键必须为整型(integer),字符串, DATE 以及 DATETIME同样允许使用。
创建RANGE COLUMNS分区表的语法如下:
CREATE TABLE partition_table_name
PARTITIONED BY RANGE COLUMNS(column_list) (PARTITION partition_name VALUES LESS THAN (value_list)[,PARTITION partition_name VALUES LESS THAN (value_list)][,...]
)
column_list:column_name[, column_name][, ...]
value_list:value[, value][, ...] -- 示例如下
CREATE TABLE my_range_column_part_001 (a INT,b INT
)
PARTITION BY RANGE COLUMNS(a,b) (PARTITION p0 VALUES LESS THAN (0,10),PARTITION p1 VALUES LESS THAN (10,20),PARTITION p2 VALUES LESS THAN (10,30),PARTITION p3 VALUES LESS THAN (10,35),PARTITION p4 VALUES LESS THAN (20,40),PARTITION p5 VALUES LESS THAN (MAXVALUE, MAXVALUE));mysql> CREATE TABLE my_range_column_part_002 (a INT,b INT,c CHAR(3),d INT)PARTITION BY RANGE COLUMNS(a,d,c) (PARTITION p0 VALUES LESS THAN (5,10,'ggg'),PARTITION p1 VALUES LESS THAN (10,20,'mmmm'),PARTITION p2 VALUES LESS THAN (15,30,'sss'),PARTITION p3 VALUES LESS THAN (MAXVALUE, MAXVALUE, MAXVALUE));
Query OK, 0 rows affected (0.15 sec) CREATE TABLE my_range_column_part_003 (first_name VARCHAR(25),last_name VARCHAR(25),street_1 VARCHAR(30),street_2 VARCHAR(30),city VARCHAR(15),renewal DATE
)
PARTITION BY RANGE COLUMNS(renewal) (PARTITION pWeek_1 VALUES LESS THAN('2010-02-09'),PARTITION pWeek_2 VALUES LESS THAN('2010-02-15'),PARTITION pWeek_3 VALUES LESS THAN('2010-02-22'),PARTITION pWeek_4 VALUES LESS THAN('2010-03-01')
);
-- 不再需要讲Date类型转换成返回整型的表达式了
LIST COLUMNS分区:
分区键只能使用列名称,而不允许使用表达式。
CREATE TABLE my_list_column_part_001 (first_name VARCHAR(25),last_name VARCHAR(25),street_1 VARCHAR(30),street_2 VARCHAR(30),city VARCHAR(15),renewal DATE
)
PARTITION BY LIST COLUMNS(city) (PARTITION pRegion_1 VALUES IN('Oskarshamn', 'H?gsby', 'M?nster?s'),PARTITION pRegion_2 VALUES IN('Vimmerby', 'Hultsfred', 'V?stervik'),PARTITION pRegion_3 VALUES IN('N?ssj?', 'Eksj?', 'Vetlanda'),PARTITION pRegion_4 VALUES IN('Uppvidinge', 'Alvesta', 'V?xjo')
);CREATE TABLE my_list_column_part_002 (first_name VARCHAR(25),last_name VARCHAR(25),street_1 VARCHAR(30),street_2 VARCHAR(30),city VARCHAR(15),renewal DATE
)
PARTITION BY LIST COLUMNS(renewal) (PARTITION pWeek_1 VALUES IN('2010-02-01', '2010-02-02', '2010-02-03','2010-02-04', '2010-02-05', '2010-02-06', '2010-02-07'),PARTITION pWeek_2 VALUES IN('2010-02-08', '2010-02-09', '2010-02-10','2010-02-11', '2010-02-12', '2010-02-13', '2010-02-14'),PARTITION pWeek_3 VALUES IN('2010-02-15', '2010-02-16', '2010-02-17','2010-02-18', '2010-02-19', '2010-02-20', '2010-02-21'),PARTITION pWeek_4 VALUES IN('2010-02-22', '2010-02-23', '2010-02-24','2010-02-25', '2010-02-26', '2010-02-27', '2010-02-28')
);
HASH分区
http://dev.mysql.com/doc/refman/5.6/en/partitioning-hash.html
指定分区字段,由MySQL数据库来决定数据存储到哪一个分区。
CREATE TABLE () PARTITION BY HASH (expr)
说明:
expr表达式需要返回一个整数值(must return a nonconstant, nonrandom integer value)。
每一次的insert或update操作都会执行expr的计算,因此负责的expr将会引发性能问题,特别是影响大量数据行的操作(如批量添加)。
CREATE TABLE employees_hash_part_by_int (id INT NOT NULL,name VARCHAR(30),hired DATE NOT NULL DEFAULT '1970-01-01',dept_id INT NOT NULL comment '部门'
)
PARTITION BY HASH(dept_id)
PARTITIONS 4;CREATE TABLE employees_hash_part_by_varchar (id INT NOT NULL,name VARCHAR(30),hired DATE NOT NULL DEFAULT '1970-01-01',dept_id INT NOT NULL comment '部门'
)
PARTITION BY HASH(name)
PARTITIONS 4;
-- ERROR 1659 (HY000): Field 'name' is of a not allowed type for this type of partitioning
-- 说明:若未指定PARTITIONS NUM语句,默认的partitions数量为1,如果省略NUM则报错。
最有效的hash功能是基于单个表中值固定增加或减少的列。
MySQL通过HASH函数来确认数据应该存储于哪一个分区,计算方式为N = MOD(expr, num)
CREATE TABLE t1 (col1 INT, col2 CHAR(5), col3 DATE)
PARTITION BY HASH( YEAR(col3) )
PARTITIONS 4;
当向表t1的col3这一列插入值'2015-08-20'时,该行数据存入哪一个分区按照如下语句确认:
MOD(YEAR('2015-08-20'),4)
= MOD(2015,4)
= 3
业务示例
CREATE TABLE `t_sdmain_day_hash_part` (`id` int(11) NOT NULL DEFAULT 0 COMMENT '不能使用主键,不可自增',`dmain` varchar(512) COMMENT '名称, 不可使用UNIQUE约束',`ipstr` varchar(1024) COMMENT '对应IP',`iplocal` varchar(8) COMMENT '网内网外(0:网内; 1:网外; 2:网内+网外)',`ipabroad` varchar(8) COMMENT '境内境外(0:境内; 1:境外; 2:境内+境外)',`sendtime` datetime COMMENT '上报时间',`dmaintype` int(3) DEFAULT '1' COMMENT '0:正常;1:不良;2:违法:3其他',`dmainsource` int(3) COMMENT '来源(1、2、3、4、5、6)--必须为int类型',`accpoint` varchar(70) COMMENT '网站类型 0为WAP;1为WWW;2为WAP+WWW',`serviceip` varchar(150) NOT NULL COMMENT '解析入库的汇聚服务器IP地址',`sourcename` varchar(150) COMMENT '原始ZIP包名称',`dmain_handle_flag` varchar(150) COMMENT '原始ZIP包标识',`dmainflag` int(11) DEFAULT 4 COMMENT '1 疑似违规;2 特定拨测;3 疑似钓鱼',`flag` int(1) COMMENT '处理标识(1:需处理; 2:过滤; 3:重现; 4:处理中)',`createtime` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间'
) ENGINE=INNODB DEFAULT CHARSET=utf8 COMMENT='按天区表'PARTITION BY HASH(DAY(createtime))PARTITIONS 4;-- 按日期查询
mysql> explain partitions select * from t_sdmain_day_hash_partwhere createtime = '2015-04-10 07:49:15'\G
*************************** 1. row ***************************id: 1select_type: SIMPLEtable: t_sdmain_day_hash_partpartitions: p2type: ALL
possible_keys: NULLkey: NULLkey_len: NULLref: NULLrows: 143910Extra: Using where
1 row in set (0.00 sec)-- HASH分区有个问题,就是where条件是指定的一个范围的话,将会扫描所有分区,
-- 这就达不到使用分区表来减少扫描范围获得性能的提高的目的。mysql> explain partitions select * from t_sdmain_day_hash_partwhere createtime >= '2015-03-31 00:03:22' and createtime <= '2015-03-31 01:49:15'\G
*************************** 1. row ***************************id: 1select_type: SIMPLEtable: t_sdmain_day_hash_partpartitions: p0,p1,p2,p3type: ALL
possible_keys: NULLkey: NULLkey_len: NULLref: NULLrows: 789530Extra: Using where
1 row in set (0.00 sec)-- 以上语句就是扫描所有分区.
线性HASH分区
MySQL同样支持线性哈希分区,采用powers-of-two algorithm来确认数据存储于哪一个分区中。
CREATE TABLE employees_linear_part (id INT NOT NULL,name VARCHAR(30),hired DATE NOT NULL DEFAULT '1970-01-01',dept_id INT NOT NULL comment '部门'
)
PARTITION BY LINEAR HASH( YEAR(hired) )
PARTITIONS 4;
对于给定的expr,分区数目num,MySQL计算数据存储于具体哪个分区算法如下:
1、Find the next power of 2 greater than num. We call this value V; it can be calculated as:V = POWER(2, CEILING(LOG(2, num)))(Suppose that num is 13. Then LOG(2,13) is 3.7004397181411. CEILING(3.7004397181411) is 4, and V = POWER(2,4), which is 16.)2、Set N = F(column_list) & (V - 1).3、While N >= num:Set V = CEIL(V / 2)Set N = N & (V - 1)
示例如下:
CREATE TABLE my_linear_part_001 (col1 INT, col2 CHAR(5), col3 DATE)
PARTITION BY LINEAR HASH( YEAR(col3) )
PARTITIONS 6;insert into my_linear_part_001(col1, col2, col3) values(1, 'hello', '2003-04-14');
insert into my_linear_part_001(col1, col2, col3) values(2, 'hello', '1998-10-19');
V = POWER(2, CEILING( LOG(2,6) )) = 8
a. 判定2003-04-14这一行数据存储于哪个分区
N = YEAR('2003-04-14') & (8 - 1)= 2003 & 7= 3(3 >= 6 is FALSE: record stored in partition #3)b. 判定1998-10-19这一行数据存储于哪个分区
N = YEAR('1998-10-19') & (8-1)= 1998 & 7= 6(6 >= 6 is TRUE: additional step required)N = 6 & CEILING(8 / 2)= 6 & 3= 2(2 >= 6 is FALSE: record stored in partition #2)
KEY分区
http://dev.mysql.com/doc/refman/5.6/en/partitioning-key.html
Key分区类似于Hash分区,只是Hash分区是基于用户提供的expr来计算属于哪个分区,而Key分区的hash函数是由MySQL Server提供的,这些函数是基于与PASSWORD()一样的运算法则。
Key分区于Hash分区区别:
1、PARTITION BY KEY而非PARTITION BY HASH
2、KEY只能指定零个或多个列名称。
KEY takes only a list of zero or more column names. Any columns used as the partitioning key must comprise part or all of the table's primary key, if the table has one. Where no column name is specified as the partitioning key, the table's primary key is used, if there is one.
CREATE TABLE k_part_by_pk (id INT NOT NULL PRIMARY KEY,name VARCHAR(20) DEFAULT NULL
)
PARTITION BY KEY() -- 采用主键来分区
PARTITIONS 2;
mysql> ALTER TABLE k_part_by_pk DROP PRIMARY KEY;
ERROR 1488 (HY000): Field in list of fields for partition function not found in tableCREATE TABLE k_part_by_unique_key (id INT NOT NULL,name VARCHAR(20) DEFAULT NULL,UNIQUE KEY (name)
)
PARTITION BY KEY() -- 采用UNIQUE KEY来分区
PARTITIONS 2;CREATE TABLE k_part_by_unique_key_with_null (id INT,name VARCHAR(20) DEFAULT NULL,UNIQUE KEY (id)
)
PARTITION BY KEY() -- UNIQUE KEY未加NOT NULL将报错
PARTITIONS 2;
ERROR 1488 (HY000): Field in list of fields for partition function not found in tableCREATE TABLE k_part_by_varchar (id INT NOT NULL,name VARCHAR(20) DEFAULT NULL
)
PARTITION BY KEY(name)
PARTITIONS 2;CREATE TABLE k_part_by_int_and_varchar (id INT NOT NULL,name VARCHAR(20) DEFAULT NULL
)
PARTITION BY KEY(id, name)
PARTITIONS 2;-- MySQL支持Linear Key形式的分区表
CREATE TABLE k_part_of_linear_key (col1 INT NOT NULL,col2 CHAR(5),col3 DATE
)
PARTITION BY LINEAR KEY (col1)
PARTITIONS 3;-- 业务示例
CREATE TABLE `t_sdmain_sdmain_key_part` (`id` int(11) NOT NULL DEFAULT 0 COMMENT '不能使用主键,不可自增',`dmain` varchar(512) COMMENT '名称, 不可使用UNIQUE约束',`ipstr` varchar(4000) COMMENT '对应IP',`iplocal` varchar(8) COMMENT '网内网外(0:网内; 1:网外; 2:网内+网外)',`ipabroad` varchar(8) COMMENT '境内境外(0:境内; 1:境外; 2:境内+境外)',`sendtime` datetime COMMENT '上报时间',`dmaintype` int(3) DEFAULT '1' COMMENT '0:正常;1:不良;2:违法:3其他',`dmainsource` varchar(16) COMMENT '来源(1、2、3、4、5、6)',`accpoint` varchar(70) COMMENT '网站类型 0为WAP;1为WWW;2为WAP+WWW',`serviceip` varchar(150) NOT NULL COMMENT '解析入库的汇聚服务器IP地址',`sourcename` varchar(150) COMMENT '原始ZIP包名称',`dmain_handle_flag` varchar(150) COMMENT '原始ZIP包标识',`dmainflag` int(11) DEFAULT 4 COMMENT '1 疑似违规;2 特定拨测;3 疑似钓鱼',`flag` int(1) COMMENT '处理标识(1:需处理; 2:过滤; 3:重现; 4:处理中)',`createtime` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间'
) ENGINE=INNODB DEFAULT CHARSET=utf8 COMMENT='按名称名称分区表'PARTITION BY KEY(dmain)PARTITIONS 6;-- dmain字段的长度不能超过1023,否则报错如下
-- ERROR 1660 (HY000): The total length of the partitioning fields is too large
-- 说明:该问题暂未解决(对于截取名称作为分区依据同样报错KEY(LEFT(dmain, 50)))。mysql> explain partitions select * from t_sdmain_sdmain_key_partwhere dmain in ('s3-us-west-2.amazonaws.com', 'aaa077.com', 'mydress.com')\G
*************************** 1. row ***************************id: 1select_type: SIMPLEtable: t_sdmain_sdmain_key_partpartitions: p2,p4type: ALL
possible_keys: NULLkey: NULLkey_len: NULLref: NULLrows: 516467Extra: Using where
1 row in set (0.00 sec)-- Key分区和HASH分区一样,有范围扫描的问题。