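Preface: Today a developer colleague ran into a strange problem. In the test environment, drop database dbname hung and still had not completed after a long wait, so the problem landed on my desk.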
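1. What is a MetaData Lock?
A MetaData Lock (MDL) is a lock on metadata. In a database, metadata is the data dictionary information, which covers db, table, function, procedure, trigger, event, and so on. The metadata lock exists mainly to keep metadata consistent: it handles synchronization and mutual exclusion when different threads operate on the same data object.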
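2. The history of MetaData Lock
MDL was introduced in MySQL 5.5.3 to fix a well-known bug, bug#989. Versions before 5.5 also had a mechanism that protected metadata; it just was not explicitly called MDL. The notable difference between versions before 5.5 (such as 5.1) and 5.5 and later is the scope of that protection: in 5.1 metadata is protected at the statement level, while in 5.5 it is protected at the transaction level. Statement level means that once a statement finishes, other sessions can change the table structure whether or not the transaction has committed or rolled back. Transaction level means the MDL is released only when the transaction ends. MDL was introduced mainly to solve two problems:
Transaction isolation: for example, under the REPEATABLE READ isolation level, if session B changes the table structure between two queries in session A, the two queries return inconsistent results, which violates repeatable read.
Data replication: for example, while session A is running several update statements, another session B alters the table structure and commits first. When the slave replays the log, it replays the ALTER first and the UPDATEs second, which causes a replication error. This is bug#989 mentioned above.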
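The transaction-level behaviour described above can be sketched with two sessions. This is only an illustration, not taken from the original incident: the table t and the column note are hypothetical names, and the comments describe the expected behaviour on MySQL 5.5 and later.

```sql
-- Hypothetical table for illustration only: t(id INT PRIMARY KEY, v INT).

-- Session A
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION;
SELECT * FROM t;      -- takes a shared MDL on t, held until the transaction ends

-- Session B
ALTER TABLE t ADD COLUMN note VARCHAR(32);   -- needs an exclusive MDL, so it waits for session A

-- Session A
SELECT * FROM t;      -- still runs against the original table definition
COMMIT;               -- releases the MDL; session B's ALTER can now proceed
```

Because the ALTER cannot finish before session A commits, the two statements reach the binary log in an order the slave can replay safely.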
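3. Reproducing the Waiting For Table MetaData Lock scenario (this is the problem we hit today)
session A: note that this session explicitly starts a transaction and never commits it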
root@localhost:mysql.sock 18:03:49 [tom]>desc test;
+------------+-------------+------+-----+-------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+-------------+------+-----+-------------------+----------------+
| id | int(10) | NO | PRI | NULL | auto_increment |
| name | varchar(32) | YES | | NULL | |
| age | int(10) | YES | | NULL | |
| createtime | datetime | NO | | CURRENT_TIMESTAMP | |
+------------+-------------+------+-----+-------------------+----------------+
4 rows in set (0.01 sec)
root@localhost:mysql.sock 18:03:43 [tom]>start transaction;
Query OK, 0 rows affected (0.00 sec)
root@localhost:mysql.sock 18:03:46 [tom]>select c99 from test;
ERROR 1054 (42S22): Unknown column 'c99' in 'field list'
session B: runs an Online DDL (this is the official MySQL 5.7.14 release)
root@localhost:mysql.sock 18:02:26 [tom]>Start transaction;
Query OK, 0 rows affected (0.00 sec)
root@localhost:mysql.sock 18:04:16 [tom]>alter table test drop column age;
(the statement blocks here...)
session C: the processlist shows no operation on the test table, yet the ALTER is waiting on an MDL
root@localhost:mysql.sock 18:02:31 [tom]>show processlist;
+-------+---------+----------------+------+---------+------+---------------------------------+----------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+-------+---------+----------------+------+---------+------+---------------------------------+----------------------------------+
| 743 | monitor | 10.0.0.6:54020 | NULL | Sleep | 3 | | NULL |
| 92210 | monitor | 10.0.0.6:46778 | NULL | Sleep | 1 | | NULL |
| 93740 | root | localhost | tom | Query | 0 | starting | show processlist |
| 93742 | root | localhost | tom | Sleep | 64 | | NULL |
| 93743 | root | localhost | tom | Query | 8 | Waiting for table metadata lock | alter table test drop column age |
+-------+---------+----------------+------+---------+------+---------------------------------+----------------------------------+
5 rows in set (0.00 sec)
The InnoDB engine monitor shows no lock conflict information
------------
TRANSACTIONS
------------
Trx id counter 112477
Purge done for trx's n:o < 112477 undo n:o < 0 state: running but idle
History list length 556
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 421340178270032, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421340178271856, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421340178270944, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
--------
FILE I/O
--------
Check information_schema
root@localhost:mysql.sock 18:18:46 [tom]>select trx_id,trx_state,trx_started,trx_mysql_thread_id from information_schema.innodb_trx;
Empty set (0.00 sec)
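This situation is a special case. A statement failed and returned an error (here, a query on a nonexistent column), but the transaction was never committed, so the ALTER is still blocked. show processlist shows no operation on the table, and information_schema.innodb_trx shows no transaction in progress. The most likely cause is that a statement against the table failed inside an explicit transaction: the InnoDB transaction has not started at that point, but the metadata lock acquired by the failed statement remains in effect. The failed statement can be found in performance_schema.events_statements_current.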
If the server acquires metadata locks for a statement that is syntactically valid but fails during execution, it does not release the locks early. Lock release is still deferred to the end of the transaction because the failed statement is written to the binary log and the locks protect log consistency.
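The lock held by the idle session can also be inspected directly. The following is a minimal sketch, assuming MySQL 5.7 with Performance Schema enabled. The MDL instrument is off by default in 5.7, and locks taken before it is switched on may not be listed. The schema and table names tom and test come from the reproduction above.

```sql
-- Enable metadata lock instrumentation (disabled by default in MySQL 5.7).
UPDATE performance_schema.setup_instruments
   SET ENABLED = 'YES', TIMED = 'YES'
 WHERE NAME = 'wait/lock/metadata/sql/mdl';

-- List metadata locks on tom.test together with the owning connection.
-- PROCESSLIST_ID is the Id column shown by SHOW PROCESSLIST.
SELECT ml.OBJECT_SCHEMA,
       ml.OBJECT_NAME,
       ml.LOCK_TYPE,
       ml.LOCK_DURATION,
       ml.LOCK_STATUS,
       t.PROCESSLIST_ID,
       t.PROCESSLIST_STATE
  FROM performance_schema.metadata_locks AS ml
  JOIN performance_schema.threads AS t
    ON t.THREAD_ID = ml.OWNER_THREAD_ID
 WHERE ml.OBJECT_TYPE = 'TABLE'
   AND ml.OBJECT_SCHEMA = 'tom'
   AND ml.OBJECT_NAME = 'test';
```

A row with LOCK_STATUS = 'GRANTED' that belongs to a sleeping connection, next to a 'PENDING' row for the ALTER, points at the session that is holding things up.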
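To resolve it, locate the offending SQL and kill the session that ran it. The statements below show what each session is executing or last executed. Use them to identify the offending session, then kill it.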
select * from performance_schema.events_statements_current\G
select * from sys.session\G
select * from sys.processlist\G
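The select * queries above return a lot of output. A narrower version might look like the sketch below. It assumes MySQL 5.7 with the sys schema installed, and sys.schema_table_lock_waits only returns rows when the wait/lock/metadata/sql/mdl instrument is enabled. The connection id 93742 is inferred from the processlist output earlier (the sleeping root session on database tom); verify it before killing anything.

```sql
-- Connections whose most recent statement ended with an error,
-- such as the failed SELECT on a nonexistent column.
SELECT t.PROCESSLIST_ID,
       esc.SQL_TEXT,
       esc.MYSQL_ERRNO,
       esc.MESSAGE_TEXT
  FROM performance_schema.events_statements_current AS esc
  JOIN performance_schema.threads AS t
    ON t.THREAD_ID = esc.THREAD_ID
 WHERE esc.ERRORS > 0;

-- Who is waiting on a table metadata lock and who is blocking them.
-- The view also generates a ready-made KILL statement for the blocker.
SELECT object_schema,
       object_name,
       waiting_pid,
       waiting_query,
       blocking_pid,
       sql_kill_blocking_connection
  FROM sys.schema_table_lock_waits;

-- Kill the blocking connection (id inferred from the processlist above).
KILL 93742;
```

Killing the connection rolls back its open transaction and releases its metadata locks, after which the waiting ALTER should continue.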