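Original article by 张政俊, published on 老叶茶馆
From the series
MySQL修行
Author: 张政俊
Works at 中欧基金, a fan of 知数堂, and a database enthusiast familiar with RDBMS, NoSQL, NewSQL, and other kinds of databases.
Work through O'Reilly's 《高性能mysql》 (High Performance MySQL) and Mr. Jiang's 《MySQL技术内幕》, add two or three years of hands-on experience, and you can more or less become a DBA who handles problems independently. Every so often, though, a truly tricky problem comes along and leaves you stuck.
So if you want to take your skills a step further, debugging the source code is unavoidable.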
Introduction to GDB
GDB is a very common debugging tool on Linux. It provides the following capabilities:
- Start your program, specifying anything that might affect its behavior.
- Make your program stop on specified conditions.
- Examine what has happened, when your program has stopped.
- Change things in your program, so you can experiment with correcting the effects of one bug and go on to learn about another.
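Commonly used commands:
- info threads: list all threads
- thread n: switch to thread n
- b: set a breakpoint at a location
- c: continue execution until the next breakpoint is hit
- s: execute one line of code, stepping into a function call if there is one
- n: execute one line of code, stepping over function calls
- p: print the value of a variable
- list: print the source code around the current line
- bt: show the stack frames (backtrace) of the current thread
- info b: show information about all current breakpoints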
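Setting up the debugging environment
Using gdb directly on Linux is probably the simplest and most effective approach available today.
1. Install gdb and the build dependencies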
yum install -y cmake make gcc gcc-c++ ncurses-devel bison gdb
2. Download and unpack the source
wget https://dev.mysql.com/get/Downloads/MySQL-5.7/mysql-boost-5.7.25.tar.gz
tar zxvf mysql-boost-5.7.25.tar.gz
mkdir -p /gdb/mysql/
mkdir -p /gdb/data/
3. Build and install the database
cmake -DCMAKE_INSTALL_PREFIX=/gdb/mysql/ -DMYSQL_DATADIR=/gdb/data/ -DSYSCONFDIR=/gdb/mysql/ -DWITH_INNOBASE_STORAGE_ENGINE=1 -DWITH_ARCHIVE_STORAGE_ENGINE=1 -DWITH_BLACKHOLE_STORAGE_ENGINE=1 -DWITH_FEDERATED_STORAGE_ENGINE=1 -DWITH_PARTITION_STORAGE_ENGINE=1 -DMYSQL_UNIX_ADDR=/gdb/mysql/mysql3.sock -DMYSQL_TCP_PORT=3306 -DENABLED_LOCAL_INFILE=1 -DEXTRA_CHARSETS=all -DDEFAULT_CHARSET=utf8 -DDEFAULT_COLLATION=utf8_general_ci -DMYSQL_USER=mysql -DWITH_BINLOG_PREALLOC=ON -DWITH_BOOST=/gdb/mysql-5.7.25/boost/boost_1_59_0 -DWITH_DEBUG=1
-DWITH_DEBUG=1 is the most important flag: it produces a debug build with the DBUG facility enabled.
make && make install
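Most of the cmake flags above are optional for a debugging build. As a minimal sketch, the flags that matter for this walkthrough could be generated as below. The helper name `debug_cmake_args` is hypothetical, and the reduced flag set is an assumption rather than what the author ran; every flag shown is taken from the command above.

```shell
# Hypothetical helper: print a reduced set of cmake flags for a debug build.
# $1 = install prefix, $2 = data directory, $3 = path to the bundled Boost.
debug_cmake_args() {
  printf '%s\n' \
    "-DCMAKE_INSTALL_PREFIX=$1" \
    "-DMYSQL_DATADIR=$2" \
    "-DWITH_BOOST=$3" \
    "-DWITH_DEBUG=1"
}

# Example (not executed here), run from the unpacked source directory:
#   cmake . $(debug_cmake_args /gdb/mysql/ /gdb/data/ /gdb/mysql-5.7.25/boost/boost_1_59_0)
#   make && make install
```

Passing the source directory (`.`) explicitly avoids relying on how a given cmake version treats a missing path argument.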
4. Prepare the configuration
vim /etc/my.cnf
# a minimal my.cnf
[client]
port = 3306
socket = /gdb/data/mysqld.sock

[mysqld]
port = 3306
socket =/gdb/data/mysqld.sock
skip-external-locking
key_buffer_size = 8M
max_allowed_packet = 1M
table_open_cache = 64
sort_buffer_size = 512K
net_buffer_length = 8K
read_buffer_size = 128K
read_rnd_buffer_size = 256K
myisam_sort_buffer_size = 8M
lower_case_table_names=1
innodb_buffer_pool_size=300M
log-bin=mysql-bin
character_set_server=utf8
binlog_format=row
datadir=/gdb/data
log-error =/gdb/data/error.log
pid-file = /gdb/data/mysql.pid
innodb_log_file_size=512M
innodb_log_files_in_group = 3
sql_mode=''
autocommit=1
server-id = 1
max_connections=1500
wait_timeout=70
interactive_timeout=70
skip-name-resolve

[mysqldump]
quick
max_allowed_packet = 16M

[myisamchk]
key_buffer_size = 20M
sort_buffer_size = 20M
read_buffer = 2M
write_buffer = 2M
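The socket path in my.cnf (/gdb/data/mysqld.sock) differs from the -DMYSQL_UNIX_ADDR default passed to cmake, so it is worth checking which value is actually configured before connecting. The following is a rough sketch of an ini-style lookup; `cnf_get` is a hypothetical helper, not a MySQL tool, and it does not treat `-` and `_` in option names as interchangeable the way MySQL does.

```shell
# Hypothetical helper: print the value of KEY inside [SECTION] of an ini-style file.
# usage: cnf_get SECTION KEY FILE
cnf_get() {
  awk -v section="[$1]" -v key="$2" '
    /^[ \t]*\[/ {                      # a new [section] header starts here
      gsub(/[ \t]/, "")
      in_section = ($0 == section)
      next
    }
    in_section {
      eq = index($0, "=")
      if (eq == 0) next                # flags such as skip-name-resolve have no value
      name = substr($0, 1, eq - 1)
      value = substr($0, eq + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", name)
      gsub(/^[ \t]+|[ \t]+$/, "", value)
      if (name == key) print value
    }
  ' "$3"
}

# Example (not executed here):
#   cnf_get mysqld socket /etc/my.cnf
```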
5. Initialize and start the database
Grant ownership so that the mysql user can create files in the data directory:
chown -R mysql:mysql /gdb/data
Initialize the database:
cd /gdb/mysql/bin
./mysqld --initialize --user=mysql --basedir=/gdb/mysql --datadir=/gdb/data
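On MySQL 5.7, `mysqld --initialize` generates a temporary password for root@localhost and writes it to the error log, which is needed for the first login. A small sketch for pulling it out is shown below; the helper name `temp_root_password` is hypothetical, and it assumes the log line ends with the password, as in the usual 5.7 message "A temporary password is generated for root@localhost: ...".

```shell
# Hypothetical helper: print the temporary root password from a MySQL 5.7 error log.
# Assumes the matching log line ends with "... root@localhost: <password>".
temp_root_password() {
  awk '/temporary password/ { pw = $NF } END { if (pw != "") print pw }' "$1"
}

# Example (not executed here), using the log-error path from my.cnf:
#   temp_root_password /gdb/data/error.log
```

If the data directory is initialized more than once, the last matching line wins, which is why the helper keeps only the final match.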
Start the database:
cd /gdb/mysql/support-files
./mysql.server start
Debugging insert with a breakpoint
1. Find the mysql process ID
[root@ops sql]# ps aux | grep mysql
root 629 0.0 0.0 112724 972 pts/2 S+ 14:52 0:00 grep -E --color=auto mysql
root 20926 0.0 0.0 113312 1628 pts/0 S 11:15 0:00 /bin/sh /gdb/mysql/bin/mysqld_safe --datadir=/gdb/data --pid-file=/gdb/data/mysql.pid
mysql 21357 0.0 5.8 1740820 223820 pts/0 Sl 11:15 0:01 /gdb/mysql/bin/mysqld --basedir=/gdb/mysql --datadir=/gdb/data --plugin-dir=/gdb/mysql/lib/plugin --user=mysql --log-error=/gdb/data/error.log --pid-file=/gdb/data/mysql.pid --socket=/gdb/data/mysqld.sock --port=3306
The mysqld process ID is 21357. PID 20926 belongs to the mysqld_safe wrapper script, which is not the process to attach to.
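The ps output lists both the mysqld_safe wrapper and the real mysqld server, and gdb has to attach to the latter. As a sketch, the right PID can be picked out of `ps aux` output by matching the command column exactly; `mysqld_pid` is a hypothetical helper name, and the column positions assume the standard `ps aux` layout shown above.

```shell
# Hypothetical helper: read `ps aux`-style lines on stdin and print the PID of
# processes whose command (column 11) is exactly mysqld, not mysqld_safe.
mysqld_pid() {
  awk '$11 == "mysqld" || $11 ~ /\/mysqld$/ { print $2 }'
}

# Example (not executed here):
#   ps aux | mysqld_pid
```

The pid-file configured in my.cnf (/gdb/data/mysql.pid) is another place to read the same number from.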
2. Attach to the mysql process in gdb
[root@ops ~]# gdb
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-119.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:.
(gdb) attach 21357
Attaching to process 21357
Reading symbols from /gdb/mysql/bin/mysqld...done.
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
[New LWP 21617]
[New LWP 21387]
[New LWP 21386]
[New LWP 21384]
[New LWP 21383]
[New LWP 21382]
[New LWP 21381]
[New LWP 21380]
[New LWP 21379]
[New LWP 21378]
[New LWP 21377]
[New LWP 21376]
[New LWP 21375]
[New LWP 21374]
[New LWP 21373]
[New LWP 21369]
[New LWP 21368]
[New LWP 21367]
[New LWP 21366]
[New LWP 21365]
[New LWP 21364]
[New LWP 21363]
[New LWP 21362]
[New LWP 21361]
[New LWP 21360]
[New LWP 21359]
[New LWP 21358]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libcrypt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libcrypt.so.1
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/librt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/librt.so.1
Reading symbols from /lib64/libstdc++.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libstdc++.so.6
Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libgcc_s.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libgcc_s.so.1
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libfreebl3.so...Reading symbols from /lib64/libfreebl3.so...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libfreebl3.so
Reading symbols from /lib64/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libnss_files.so.2
Reading symbols from /lib64/libnss_sss.so.2...Reading symbols from /lib64/libnss_sss.so.2...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libnss_sss.so.2
0x00002b15ce803f0d in poll () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.17-222.el7.x86_64 libgcc-4.8.5-39.el7.x86_64 libstdc++-4.8.5-39.el7.x86_64 nss-softokn-freebl-3.34.0-2.el7.x86_64 sssd-client-1.16.0-19.el7.x86_64
(gdb)
3. Find the breakpoint location
This time we trace the insert path, so locate the file sql_insert.cc:
The function in the source is: Sql_cmd_insert::mysql_insert
4. Set the breakpoint
(gdb) b Sql_cmd_insert::mysql_insert
Breakpoint 1 at 0x175aed9: file /gdb/mysql-5.7.25/sql/sql_insert.cc, line 423.
Then look at the stack frames of the current thread:
(gdb) bt
#0  0x00002b15ce803f0d in poll () from /lib64/libc.so.6
#1  0x0000000001667f87 in Mysqld_socket_listener::listen_for_connection_event (this=0x3967430) at /gdb/mysql-5.7.25/sql/conn_handler/socket_connection.cc:852
#2  0x0000000000eb15cc in Connection_acceptor::connection_event_loop (this=0x4f882e0) at /gdb/mysql-5.7.25/sql/conn_handler/connection_acceptor.h:66
#3  0x0000000000ea904a in mysqld_main (argc=38, argv=0x383c248) at /gdb/mysql-5.7.25/sql/mysqld.cc:5149
#4  0x0000000000ea01bd in main (argc=9, argv=0x7ffc73765b88) at /gdb/mysql-5.7.25/sql/main.cc:25
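5. Log in to the database
With the breakpoint set in gdb, open a new database connection:
The login hangs at this point, because attaching gdb stops every thread of mysqld. Run next in gdb:
(gdb) n
Single stepping until exit from function poll,
which has no line number information.
Mysqld_socket_listener::listen_for_connection_event (this=0x3967430) at /gdb/mysql-5.7.25/sql/conn_handler/socket_connection.cc:859
859 if (retval < 0 && socket_errno != SOCKET_EINTR)
The output shows that the server is in its listener loop, polling the listening sockets for new connections. Stepping through everything that follows would take many steps, so we simply use continue, which resumes execution until the next breakpoint is hit:
(gdb) c
Continuing.
The new client connection now succeeds:
[root@ops bin]# mysql -uroot -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 14
Server version: 5.7.25-debug-log Source distribution
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql>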
6. Insert into the database
Before the insert, switching schema and querying both work fine:
mysql> use gdb
Database changed
mysql> show tables;
+---------------+
| Tables_in_gdb |
+---------------+
| test          |
+---------------+
1 row in set (0.00 sec)
mysql> select * from test;
+------+
| id   |
+------+
|    1 |
|    2 |
+------+
2 rows in set (0.00 sec)
Inserting a row with id=3 blocks:
mysql> insert into test values (3);
7. Analyze the breakpoint
The breakpoint is hit:
(gdb) c
Continuing.
[Switching to Thread 0x2b15faf02700 (LWP 21617)]
Breakpoint 1, Sql_cmd_insert::mysql_insert (this=0x2b1614008348, thd=0x2b1614003af0, table_list=0x2b1614007db8) at /gdb/mysql-5.7.25/sql/sql_insert.cc:423
423 DBUG_ENTER("mysql_insert");
The bt command shows the stack frames:
(gdb) bt
#0  Sql_cmd_insert::mysql_insert (this=0x2b1614008348, thd=0x2b1614003af0, table_list=0x2b1614007db8) at /gdb/mysql-5.7.25/sql/sql_insert.cc:423
#1  0x000000000176256e in Sql_cmd_insert::execute (this=0x2b1614008348, thd=0x2b1614003af0) at /gdb/mysql-5.7.25/sql/sql_insert.cc:3118
#2  0x000000000153b093 in mysql_execute_command (thd=0x2b1614003af0, first_level=true) at /gdb/mysql-5.7.25/sql/sql_parse.cc:3596
#3  0x0000000001540820 in mysql_parse (thd=0x2b1614003af0, parser_state=0x2b15faf01690) at /gdb/mysql-5.7.25/sql/sql_parse.cc:5570
#4  0x0000000001536131 in dispatch_command (thd=0x2b1614003af0, com_data=0x2b15faf01df0, command=COM_QUERY) at /gdb/mysql-5.7.25/sql/sql_parse.cc:1484
#5  0x0000000001534f9a in do_command (thd=0x2b1614003af0) at /gdb/mysql-5.7.25/sql/sql_parse.cc:1025
#6  0x00000000016658dc in handle_connection (arg=0x39610f0) at /gdb/mysql-5.7.25/sql/conn_handler/connection_handler_per_thread.cc:306
#7  0x0000000001ced592 in pfs_spawn_thread (arg=0x5508e50) at /gdb/mysql-5.7.25/storage/perfschema/pfs.cc:2190
#8  0x00002b15cd699e25 in start_thread () from /lib64/libpthread.so.0
#9  0x00002b15ce80ebad in clone () from /lib64/libc.so.6
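A backtrace like this is easier to read as a call chain once the addresses and arguments are stripped away. The sketch below reduces saved `bt` output to one "frame function" pair per line; `bt_functions` is a hypothetical helper, and it assumes each frame starts on its own line with `#N`, as gdb prints it.

```shell
# Hypothetical helper: reduce saved gdb `bt` output (on stdin) to "frame function" pairs.
# Frame #0 has no address; deeper frames look like "#N 0x... in function (...)".
bt_functions() {
  awk '/^#[0-9]+/ {
    frame = $1
    name = ($2 ~ /^0x/ && $3 == "in") ? $4 : $2
    print frame, name
  }'
}

# Example (not executed here), with the backtrace saved to a file:
#   bt_functions < insert_bt.txt
```

Read bottom-up, the result for the insert above runs from handle_connection through do_command, dispatch_command, mysql_parse, and mysql_execute_command down to Sql_cmd_insert::execute and Sql_cmd_insert::mysql_insert.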
From here, entering n steps through the code line by line. We simply continue instead, and the blocked insert completes:
mysql> insert into test values (3);
Query OK, 1 row affected (2 min 49.57 sec)
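The insert took almost three minutes only because the server sat stopped while commands were typed by hand. When only the backtrace is wanted, the same steps can be run non-interactively so the pause stays short. This is a sketch under the assumption that gdb's standard `-p`, `-batch`, and `-ex` options are available; `bt_on_break` is a hypothetical helper name.

```shell
# Hypothetical helper: attach to a process, break on a function, wait for one hit,
# print the backtrace, then detach so the server keeps running.
# usage: bt_on_break PID FUNCTION
bt_on_break() {
  gdb -p "$1" -batch \
    -ex "break $2" \
    -ex "continue" \
    -ex "bt" \
    -ex "detach"
}

# Example (not executed here):
#   bt_on_break 21357 Sql_cmd_insert::mysql_insert > insert_bt.txt
```

The `continue` step blocks until some client actually triggers the breakpoint, so an insert has to be issued from another session while the helper is waiting.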
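Shipping a special build of mysql
Some functions are easy to locate in the source, such as the insert path above or the delete path. Could we modify the delete function and package a mysql in which data can never be deleted?
Locating the function
First attach to the mysql process: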
(gdb) attach 21357
Attaching to process 21357
Reading symbols from /gdb/mysql/bin/mysqld...done.
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
[New LWP 5584]
[New LWP 5583]
[New LWP 21617]
[New LWP 21387]
[New LWP 21386]
[New LWP 21384]
[New LWP 21383]
[New LWP 21382]
[New LWP 21381]
[New LWP 21380]
[New LWP 21379]
[New LWP 21378]
[New LWP 21377]
[New LWP 21376]
[New LWP 21375]
[New LWP 21374]
[New LWP 21373]
[New LWP 21369]
[New LWP 21368]
[New LWP 21367]
[New LWP 21366]
[New LWP 21365]
[New LWP 21364]
[New LWP 21363]
[New LWP 21362]
[New LWP 21361]
[New LWP 21360]
[New LWP 21359]
[New LWP 21358]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libcrypt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libcrypt.so.1
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/librt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/librt.so.1
Reading symbols from /lib64/libstdc++.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libstdc++.so.6
Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libgcc_s.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libgcc_s.so.1
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libfreebl3.so...Reading symbols from /lib64/libfreebl3.so...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libfreebl3.so
Reading symbols from /lib64/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libnss_files.so.2
Reading symbols from /lib64/libnss_sss.so.2...Reading symbols from /lib64/libnss_sss.so.2...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libnss_sss.so.2
0x00002b15ce803f0d in poll () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.17-222.el7.x86_64 libgcc-4.8.5-39.el7.x86_64 libstdc++-4.8.5-39.el7.x86_64 nss-softokn-freebl-3.34.0-2.el7.x86_64 sssd-client-1.16.0-19.el7.x86_64
Set a breakpoint on the delete function:
(gdb) b Sql_cmd_delete::mysql_delete
Breakpoint 1 at 0x175198b: file /gdb/mysql-5.7.25/sql/sql_delete.cc, line 50.
Run a delete statement in the database to trigger the breakpoint:
mysql> delete from test where id =3;
Breakpoint information in gdb:
Breakpoint 1, Sql_cmd_delete::mysql_delete (this=0x2b16040020c8, thd=0x2b1604007e30, limit=18446744073709551615) at /gdb/mysql-5.7.25/sql/sql_delete.cc:50
50 DBUG_ENTER("mysql_delete");
Look at the relevant stack frames:
#0  Sql_cmd_delete::mysql_delete (this=0x2b16040020c8, thd=0x2b1604007e30, limit=18446744073709551615) at /gdb/mysql-5.7.25/sql/sql_delete.cc:50
#1  0x0000000001755cc6 in Sql_cmd_delete::execute (this=0x2b16040020c8, thd=0x2b1604007e30) at /gdb/mysql-5.7.25/sql/sql_delete.cc:1392
#2  0x000000000153b12f in mysql_execute_command (thd=0x2b1604007e30, first_level=true) at /gdb/mysql-5.7.25/sql/sql_parse.cc:3606
#3  0x0000000001540820 in mysql_parse (thd=0x2b1604007e30, parser_state=0x2b15faf43690) at /gdb/mysql-5.7.25/sql/sql_parse.cc:5570
#4  0x0000000001536131 in dispatch_command (thd=0x2b1604007e30, com_data=0x2b15faf43df0, command=COM_QUERY) at /gdb/mysql-5.7.25/sql/sql_parse.cc:1484
#5  0x0000000001534f9a in do_command (thd=0x2b1604007e30) at /gdb/mysql-5.7.25/sql/sql_parse.cc:1025
#6  0x00000000016658dc in handle_connection (arg=0x54b6510) at /gdb/mysql-5.7.25/sql/conn_handler/connection_handler_per_thread.cc:306
#7  0x0000000001ced592 in pfs_spawn_thread (arg=0x5508e50) at /gdb/mysql-5.7.25/storage/perfschema/pfs.cc:2190
#8  0x00002b15cd699e25 in start_thread () from /lib64/libpthread.so.0
#9  0x00002b15ce80ebad in clone () from /lib64/libc.so.6
Modifying the source
Frame #1 is Sql_cmd_delete::execute, the function that handles delete. Find it in the source:
Comment out the code that actually performs the deletion and set the returned res directly to true:
bool Sql_cmd_delete::execute(THD *thd)
{
  DBUG_ASSERT(thd->lex->sql_command == SQLCOM_DELETE);

  LEX *const lex= thd->lex;
  SELECT_LEX *const select_lex= lex->select_lex;
  SELECT_LEX_UNIT *const unit= lex->unit;
  TABLE_LIST *const first_table= select_lex->get_table_list();
  TABLE_LIST *const all_tables= first_table;

  if (delete_precheck(thd, all_tables))
    return true;
  DBUG_ASSERT(select_lex->offset_limit == 0);
  unit->set_limit(select_lex);

  /* Push ignore / strict error handler */
  Ignore_error_handler ignore_handler;
  Strict_error_handler strict_handler;
  if (thd->lex->is_ignore())
    thd->push_internal_handler(&ignore_handler);
  else if (thd->is_strict_mode())
    thd->push_internal_handler(&strict_handler);

  /* Comment out the deletion logic below */
  /*
  MYSQL_DELETE_START(const_cast<char*>(thd->query().str));
  bool res = mysql_delete(thd, unit->select_limit_cnt);
  MYSQL_DELETE_DONE(res, (ulong) thd->get_row_count_func());
  */
  /* Return true directly */
  bool res =true;

  /* Pop ignore / strict error handler */
  if (thd->lex->is_ignore() || thd->is_strict_mode())
    thd->pop_internal_handler();

  return res;
}
Then rebuild mysql using the steps described earlier. After starting the new build, delete statements no longer remove any data. Note that in the MySQL code base a true return value from functions like this conventionally signals failure, so the client may see an error rather than a normal "Query OK" response.
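One way to confirm the patched build behaves as intended is to compare row counts before and after a delete. The sketch below assumes the `mysql` client can log in without prompting (for example through credentials in a client option file); `count_rows` and `delete_is_noop` are hypothetical helper names, while the schema and table (gdb, test) come from the walkthrough.

```shell
# Hypothetical helpers for checking that DELETE no longer removes rows.
count_rows() {   # $1 = schema, $2 = table
  mysql -N -e "SELECT COUNT(*) FROM $1.$2"
}

delete_is_noop() {   # $1 = schema, $2 = table, $3 = WHERE condition
  before=$(count_rows "$1" "$2")
  # The patched server may report an error for the DELETE; ignore it here.
  mysql -e "DELETE FROM $1.$2 WHERE $3" >/dev/null 2>&1 || true
  after=$(count_rows "$1" "$2")
  [ "$before" = "$after" ]
}

# Example (not executed here):
#   delete_is_noop gdb test 'id = 3' && echo "delete had no effect"
```

Against an unmodified server the same check fails as soon as the WHERE condition matches a row, which makes it a quick sanity test for either build.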
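Debugging summary
If you want to study the source in depth, stack frames are a good starting point. This only works, however, when you already know which function implements a feature. If you do not know which functions a feature calls, breakpoint debugging is hard to get going.
Reading the entire mysql source is too costly, and the code base is not easy to navigate, so the time and effort are not worth it. A better approach is to look for the function entry point on demand, when you hit a problem or want to understand one specific feature, and then dig deeper step by step.
I hope this article helps anyone who wants to get started with mysql source debugging. When I run into unusual problems in the future, I will also use gdb to debug the functions involved, so stay tuned~
End of article.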
Enjoy MySQL :)