SQL优化之浅见
- 一、查询优化
- 二、导入大量数据时的优化
- 三、INSERT优化
- 四、字段注释
- 五、说明
用了段mysql/Oracle/hive等数据库,对SQL语言以Mysql为例,总结一下对sql优化方面的见解,欢迎大家补充。
一、查询优化
1.在关联查询中,关联键的数据类型一定要相同,最常见的是字符串类型的数字被当作INT类型与INT类型的键进行关联,隐性类型转换会使性能受到很大影响。
2.模糊查询使用LIKE时尽量不使用左侧模糊,比如“%其实也厉害”,“%其实%”,不如使用右侧模糊查询,如:“软辅其实也%”;
3.在关联或者WHERE条件中,尽量不要对等号左侧的字段进行任何处理,以免索引失效,如将8位数字类型的字段转换为日期:
select * from a
where date_format(a.col1,'%Y-%m-%d') > '2020-09-10';
-- 性能不如:
select * from a
where a.col1 > date_format('2020-09-10','%Y%m%d');
4.如果可以,请限制大表的数据量,包括过滤无效、无意义数据,合理分区等;
5.落表也是一种很好的办法。
6.添加索引,优先考虑where/group by 使用的字段。
7.where条件中注意避免索引失效。
8.合理分区;
9.数据更新一般delete+insert into方式要比update等更新方式要快的多。
10.in 和 exists性能比较:
in:把外层表和内层表做hash连接,以内层表(可以理解为默认in条件中为小表)为驱动表;
exists:用内层表数据对外层表做LOOP循环,以外层表为驱动表。
以哪个表为驱动表就会优先使用哪个表的索引,所以如果外层表较大,内层表较小,我们一般使用in;反之使用exists,这样我们会用到内层表(大表)的索引列;(其实后来的优化器中,如果两种方法都会使用外层表做为驱动,具体还要看执行计划.)
外表大,用in;
内表大,用exists;
现在,虽然优化器的不断完善,两者在相同情况下执行是一样的,都会使用子查询的索引。如下:
mysql> EXPLAIN SELECT * FROM user_buy_product_dtl_f WHERE CNAME IN (SELECT PNAME FROM later_fo_word_record_f );
+----+--------------------+------------------------+----------------+---------------+------------+---------+------+---------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+------------------------+----------------+---------------+------------+---------+------+---------+--------------------------+
| 1 | PRIMARY | user_buy_product_dtl_f | ALL | NULL | NULL | NULL | NULL | 1016485 | Using where |
| 2 | DEPENDENT SUBQUERY | later_fo_word_record_f | index_subquery | name_index | name_index | 63 | func | 3 | Using index; Using where |
+----+--------------------+------------------------+----------------+---------------+------------+---------+------+---------+--------------------------+
2 rows in set (0.00 sec)mysql> EXPLAIN SELECT * FROM later_fo_word_record_f WHERE PNAME IN (SELECT CNAME FROM user_buy_product_dtl_f );
+----+--------------------+------------------------+----------------+---------------+-------------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+------------------------+----------------+---------------+-------------+---------+------+------+--------------------------+
| 1 | PRIMARY | later_fo_word_record_f | ALL | NULL | NULL | NULL | NULL | 36 | Using where |
| 2 | DEPENDENT SUBQUERY | user_buy_product_dtl_f | index_subquery | name_indexc | name_indexc | 63 | func | 641 | Using index; Using where |
+----+--------------------+------------------------+----------------+---------------+-------------+---------+------+------+--------------------------+
2 rows in set (0.00 sec)mysql> EXPLAIN SELECT * FROM user_buy_product_dtl_f F WHERE EXISTS (SELECT 1 FROM later_fo_word_record_f T WHERE F.CNAME = T.PNAME );
+----+--------------------+-------+------+---------------+------------+---------+--------------+---------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+------+---------------+------------+---------+--------------+---------+--------------------------+
| 1 | PRIMARY | F | ALL | NULL | NULL | NULL | NULL | 1016485 | Using where |
| 2 | DEPENDENT SUBQUERY | T | ref | name_index | name_index | 63 | test.F.CNAME | 3 | Using where; Using index |
+----+--------------------+-------+------+---------------+------------+---------+--------------+---------+--------------------------+
2 rows in set (0.00 sec)mysql> EXPLAIN SELECT * FROM later_fo_word_record_f F WHERE EXISTS (SELECT 1 FROM user_buy_product_dtl_f T WHERE F.PNAME = T.CNAME );
+----+--------------------+-------+------+---------------+-------------+---------+--------------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+------+---------------+-------------+---------+--------------+------+--------------------------+
| 1 | PRIMARY | F | ALL | NULL | NULL | NULL | NULL | 36 | Using where |
| 2 | DEPENDENT SUBQUERY | T | ref | name_indexc | name_indexc | 63 | test.F.PNAME | 641 | Using where; Using index |
+----+--------------------+-------+------+---------------+-------------+---------+--------------+------+--------------------------+
2 rows in set (0.00 sec)
但是not in和not exists就不一样了:
not in 会对内外表进行全表扫描,不会用到索引;
not exists 依然会用到子查询的索引;
所以not in的效率往往不如not exists.
然而实际操作中,未必是这样的,也可能两种语法中都会使用子查询的表索引,如下later_fo_word_record_f为小表,user_buy_product_dtl_f为大表,执行计划为:
mysql> EXPLAIN SELECT * FROM user_buy_product_dtl_f WHERE CNAME NOT IN (SELECT PNAME FROM later_fo_word_record_f );
+----+--------------------+------------------------+----------------+---------------+------------+---------+------+---------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+------------------------+----------------+---------------+------------+---------+------+---------+------------------------------------+
| 1 | PRIMARY | user_buy_product_dtl_f | ALL | NULL | NULL | NULL | NULL | 1016485 | Using where |
| 2 | DEPENDENT SUBQUERY | later_fo_word_record_f | index_subquery | name_index | name_index | 63 | func | 6 | Using index; Full scan on NULL key |
+----+--------------------+------------------------+----------------+---------------+------------+---------+------+---------+------------------------------------+
2 rows in set (0.00 sec)mysql> EXPLAIN SELECT * FROM later_fo_word_record_f WHERE PNAME NOT IN (SELECT CNAME FROM user_buy_product_dtl_f );
+----+--------------------+------------------------+----------------+---------------+-------------+---------+------+------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+------------------------+----------------+---------------+-------------+---------+------+------+------------------------------------+
| 1 | PRIMARY | later_fo_word_record_f | ALL | NULL | NULL | NULL | NULL | 36 | Using where |
| 2 | DEPENDENT SUBQUERY | user_buy_product_dtl_f | index_subquery | name_indexc | name_indexc | 63 | func | 1282 | Using index; Full scan on NULL key |
+----+--------------------+------------------------+----------------+---------------+-------------+---------+------+------+------------------------------------+
2 rows in set (0.00 sec)mysql> EXPLAIN SELECT * FROM user_buy_product_dtl_f F WHERE NOT EXISTS (SELECT 1 FROM later_fo_word_record_f T WHERE F.CNAME = T.PNAME );
+----+--------------------+-------+------+---------------+------------+---------+--------------+---------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+------+---------------+------------+---------+--------------+---------+--------------------------+
| 1 | PRIMARY | F | ALL | NULL | NULL | NULL | NULL | 1016485 | Using where |
| 2 | DEPENDENT SUBQUERY | T | ref | name_index | name_index | 63 | test.F.CNAME | 3 | Using where; Using index |
+----+--------------------+-------+------+---------------+------------+---------+--------------+---------+--------------------------+
2 rows in set (0.00 sec)mysql> EXPLAIN SELECT * FROM later_fo_word_record_f F WHERE NOT EXISTS (SELECT 1 FROM user_buy_product_dtl_f T WHERE F.PNAME = T.CNAME );
+----+--------------------+-------+------+---------------+-------------+---------+--------------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+------+---------------+-------------+---------+--------------+------+--------------------------+
| 1 | PRIMARY | F | ALL | NULL | NULL | NULL | NULL | 36 | Using where |
| 2 | DEPENDENT SUBQUERY | T | ref | name_indexc | name_indexc | 63 | test.F.PNAME | 641 | Using where; Using index |
+----+--------------------+-------+------+---------------+-------------+---------+--------------+------+--------------------------+
2 rows in set (0.00 sec)
11.union all
如果union all要优化的话,落个临时表吧,insert into效率还是不错的。
12.union all 与 union
众所周知,union 包含有去重操作,而union all没有,但
在大数据量的情况下distinct + union all 性能大于 UNION 的性能。
-- distict+union all语法:
select distinct aa,bb,cc from (select aa,bb,cc from t1union allselect aa,bb,cc from t2union allselect aa,bb,cc from t3
);
-- union 语法select aa,bb,cc from t1unionselect aa,bb,cc from t2unionselect aa,bb,cc from t3
当然,如果确认没有重复记录,直接用union all就OK了。
13.有子查询的group by 字段数和group by数都要尽量少。
14.尽量避免使用select *,无效字段会影响查询效率。
15.尽量避免空值判断
空值判断会使引擎放弃索引进行全表扫描,因而降低查询效率。
Mysql中字段要求尽量设置为not null,可以用0或’'代替设计的表模型中的Null;
一定要注意‘’与null的区别:
1.在进行count()统计某列的记录数的时候,如果采用的NULL值,系统会自动忽略掉,但是空值是会进行统计到其中的。
2.MySQL中,对于timestamp数据类型,如果往这个数据类型插入的列插入NULL值,则出现的值是当前系统时间。而插入空值,则会出现 0000-00-00 00:00:00
16.避免使用or,会使索引失效,可以使用union all代替。
17数据量较大时,避免使用where 1=1,虽然便于拼接,但也会导致索引失效。
二、导入大量数据时的优化
1.MylSAM引擎的表可以使用DISABLE KEYS/ENABLE KEYS来打开表非唯一索引校验:
alter table t1 DISABLE KEYS;
loading data...
alter table t1 ENABLE KEYS;
2.InnoDB引擎插入数据前可以提前对自增长主键排序,也会提高数据导入效率。
3.InnoDB引擎开关唯一性校验:
SET UNIQUE_CHECKS = 0;
SET UNIQUE_CHECKS = 1;
4.InnoDB引擎开关自动提交:
SET AUTOCOMMIT =0;
SET AUTOCOMMIT =1;
三、INSERT优化
多值插入比每个insert插入一个值的连接次数要少很多,因此多值插入比多个insert插入要快。
四、字段注释
如果节点性能超级差,在建表时尽量把陪伴我的注释放在ddl语句内,这样只需要连接一次。而注释在DDL之外会连接多次。
五、说明
这也只是我在平时工作中的一些技巧总结,实际的优化还是要根据日志来对慢sql优化,比如配置或物理存储也会影响执行的性能。