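Contents
- Background
- References
- Analyze the Coredump file to find the problem SQL
- 1. Check where Coredump files are written
- 2. Open the Coredump file with gdb
- 3. Record the crashing thread's backtrace
- 4. Record the crashing thread's ID
- 5. Analyze the Coredump file with dmrdc
- 6. Find the SQL for the thread ID
- 7. Re-run the SQL to reproduce the problem
- Record the stack traces of all threads in the Coredump file and their SQL statements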
Background
An abnormal condition in the environment caused the DM server to generate a core file.
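References
Reference 1: 达梦core文件分析(学习笔记) [DM core file analysis (study notes)]
Reference 2: Coredump文件分析记录 [Coredump file analysis notes]
Reference 3: 达梦core文件分析 [DM core file analysis]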
Analyze the Coredump file to find the problem SQL
1. Check where Coredump files are written
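cat /proc/sys/kernel/core_pattern
# or (the value configured to persist across reboots):
cat /etc/sysctl.conf
# Look for kernel.core_pattern, e.g. kernel.core_pattern=/dmdata/core.%e_%p_%t; here /dmdata is the core file directory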
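If you want to script this step, you can take the directory from the pattern itself. The following is a minimal sketch that assumes the pattern is an absolute path such as /dmdata/core.%e_%p_%t. The name core_dir_from_pattern is hypothetical. The sketch reports two other cases instead of guessing:
- A pattern that starts with | means the kernel pipes core dumps to a handler program instead of writing a file.
- A relative pattern means the core file lands in the crashing process's working directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the directory part of a kernel.core_pattern value.
# Returns 1 (with a message on stderr) when the pattern does not name a fixed directory.
core_dir_from_pattern() {
  local pattern="$1"
  case "$pattern" in
    \|*)
      # Leading "|": the kernel pipes the core to this program instead of writing a file
      echo "core dumps are piped to: ${pattern#\|}" >&2
      return 1 ;;
    /*)
      # Absolute path: the directory is everything before the last "/"
      dirname -- "$pattern" ;;
    *)
      # Relative path: written in the crashing process's current working directory
      echo "relative pattern: core goes to the process's working directory" >&2
      return 1 ;;
  esac
}
```

Usage: `core_dir_from_pattern "$(cat /proc/sys/kernel/core_pattern)"` prints /dmdata for the example pattern above.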
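2. Open the Coredump file with gdb
As the dmdba user, change to the directory that holds the Coredump file (for example /dmdata) and open the file with gdb. If gdb is not installed, install it as root with yum -y install gdb.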
# Short form (works when dmserver is in the current directory or on PATH):
gdb dmserver ${COREDUMP_FILE}
# Full form:
gdb $DM_HOME/bin/dmserver ${COREDUMP_FILE}
Example:
gdb dmserver corefile-dm_sql_thd.960540
3. Record the crashing thread's backtrace
Use the bt (or where) command to print and record the backtrace of the crashing thread:
bt
(gdb) bt
#0 0x0000000001c024d4 in assert_fun ()
#1 0x0000000001c025b4 in sigterm_handler ()
#2 <signal handler called>
#3 0x0000000001006d68 in nsym_node_is_rownum ()
#4 0x0000000001165e78 in pha_rel_is_one_row_low ()
#5 0x0000000001165f30 in pha_rel_is_one_row ()
#6 0x00000000011663a8 in pha_select_item_is_distinct ()
#7 0x0000000001166aa0 in pha2_is_distinct_for_nrel_extend ()
#8 0x00000000011407bc in pha_remove_redundant_tables_in_nrel_low ()
#9 0x0000000001140998 in pha_remove_redundant_tables_in_nrel_recursively ()
#10 0x0000000001140c60 in pha_remove_redundant_tables_in_from_lst ()
#11 0x00000000011410d0 in pha_remove_redundant_tables_single_for_left_join ()
#12 0x0000000001154d88 in pha_remove_redundant_tables_single ()
#13 0x0000000001154ed4 in pha_remove_redundant_tables_in_sels_recursively ()
#14 0x0000000001154f48 in pha_remove_redundant_tables_if_necessary ()
#15 0x000000000114f690 in pha_select_low ()
#16 0x00000000011515d0 in pha_from_normal_tv2 ()
#17 0x0000000001151ab4 in pha_from_normal_tv ()
#18 0x0000000001153494 in pha_from_clause_null ()
#19 0x000000000114dd44 in pha_query_exp ()
#20 0x000000000114e288 in pha_subquery_recursively ()
#21 0x000000000114e8bc in pha_subquery_low ()
#22 0x000000000114f36c in pha_select_low ()
#23 0x00000000011554ac in pha_select2 ()
#24 0x0000000001155e68 in pha_single_sql ()
#25 0x00000000011561f4 in pha_main_low ()
#26 0x00000000011565d0 in pha_main ()
#27 0x0000000001c4ae68 in ntsk_pha_main ()
#28 0x0000000001c4c6f8 in ntsk_process_prepare_low2 ()
#29 0x0000000001c4dbcc in ntsk_process_prepare_low ()
#30 0x0000000001c4dd68 in ntsk_process_prepare ()
#31 0x0000000001c780a4 in ntsk_process_cop ()
#32 0x0000000001b1a278 in uthr_db_main_for_sess ()
--Type <RET> for more, q to quit, c to continue without paging--
#33 0x0000fffddfbb87a0 in ?? () from /usr/lib64/libpthread.so.0
#34 0x0000fffddf83bcbc in ?? () from /usr/lib64/libc.so.6
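When reading a backtrace like this one, the frames break down as follows:
- Frames #0 to #2 are the signal handling itself: assert_fun, sigterm_handler and the <signal handler called> marker.
- The first frame after the marker, here #3 nsym_node_is_rownum, is the function dmserver was executing when the signal arrived.

A small filter can pull that frame out of saved bt output. This is a sketch under two assumptions: crash_frame is a hypothetical name, and the input uses gdb's usual "#N " frame prefix.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read saved gdb "bt" output on stdin and print the first
# frame after the "<signal handler called>" marker, i.e. the function that was
# executing when the signal was delivered. Prints nothing if there is no marker.
crash_frame() {
  awk '
    found && /^#[0-9]+ / { print; exit }
    /<signal handler called>/ { found = 1 }
  '
}
```

Usage: save the bt output to a file (for example bt.txt, a hypothetical name) and run `crash_frame < bt.txt`.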
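4. Record the crashing thread's ID
Use info threads to record the ID of the crashing thread:
info threads
In the output, the thread marked with * is the current (crashing) thread, and the number after LWP is its thread ID, here 960808.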
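Picking the thread ID out of that listing can be scripted. The following sketch assumes gdb's usual layout: the current thread's line starts with * and contains LWP <number>, either bare or inside a "Thread 0x... (LWP n)" target ID. current_lwp is a hypothetical helper that reads the listing on stdin.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read gdb "info threads" output on stdin and print the
# LWP number of the current thread (the line marked with "*").
current_lwp() {
  sed -n 's/^ *\*.*LWP \([0-9][0-9]*\).*/\1/p'
}
```

Usage: `gdb -batch -ex "info threads" "$DM_HOME/bin/dmserver" corefile-dm_sql_thd.960540 | current_lwp`.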
5. Analyze the Coredump file with dmrdc
dmrdc sfile=corefile-dm_sql_thd.960540 dfile=result_corefile-dm_sql_thd.960540.txt
# Parameters:
#   sfile=corefile-dm_sql_thd.960540              the source Coredump file
#   dfile=result_corefile-dm_sql_thd.960540.txt   the result file produced by the analysis
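To run this step for any core file, a small wrapper can derive the result file name the same way the example does: result_ + core file name + .txt, written in the current directory. This is a sketch with two limits:
- run_dmrdc and dmrdc_result_name are hypothetical helpers.
- Only the sfile and dfile parameters shown above are passed to dmrdc.

```shell
#!/usr/bin/env bash
# Hypothetical helpers around dmrdc; only the sfile/dfile parameters shown above are used.

# Build the result file name the way the example does: result_<core file name>.txt
dmrdc_result_name() {
  printf 'result_%s.txt\n' "$(basename -- "$1")"
}

# Run dmrdc on one core file and print the name of the result file it was asked to write.
run_dmrdc() {
  local core="$1" result
  if [ ! -f "$core" ]; then
    echo "run_dmrdc: no such core file: $core" >&2
    return 1
  fi
  if ! command -v dmrdc >/dev/null 2>&1; then
    echo "run_dmrdc: dmrdc not found on PATH" >&2
    return 1
  fi
  result=$(dmrdc_result_name "$core")
  dmrdc sfile="$core" dfile="$result" && printf '%s\n' "$result"
}
```

Usage: `run_dmrdc /dmdata/corefile-dm_sql_thd.960540` writes result_corefile-dm_sql_thd.960540.txt in the current directory.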
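6. Find the SQL for the thread ID
Search the dmrdc result file for the crashing thread ID (960808 in this example) to find the SQL that thread was executing.
The SQL is as follows: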
-- Get schema objects
SELECT * FROM ALL_OBJECTS WHERE OBJECT_TYPE = 'SCH';
-- Switch schema
SET SCHEMA FMS;
-- Execute the statement
SELECT COUNT(1) from (
( select r.RTASK_ID,r.RTASK_BEGIN_TIME, r.RTASK_FINISH_TIME,
  (SELECT approval_status from biz_approval where manager_id = r.RTASK_ID limit 1) as approval_status,
  (select val from sys_dic_val where dic_val_no = (SELECT approval_status from biz_approval where manager_id = r.RTASK_ID limit 1) and isdeleted = 'N') as RTASK_CHECK_S,
  r.RTASK_ITEM,
  (select s.stationname from sys_workstation s where s.stationid = CAST(r.RTASK_ITEM AS INTEGER)) as STATIONAME,
  r.RTASK_NAME,r.RTASK_ESTATE,r.RTASK_STIME,r.RTASK_ETIME,CONCAT(r.RTASK_RECEIVER,'') as RTASK_RECEIVER,r.RTASK_CHECK,
  (select username from sys_users where userno = r.RTASK_RECEIVER) as F_RTASK_RECEIVER,
  (select WM_CONCAT(rolename) from sys_role where FIND_IN_SET(roleno,r.RTASK_USER)) AS ROLENAME,
  sdv.val AS D_RTASK_STATE
  from rout_task r
  LEFT JOIN sys_dic_val sdv ON sdv.dic_val_no = r.RTASK_ESTATE
  LEFT JOIN (select bb.* from (SELECT * FROM biz_approval ORDER BY applicant_time DESC ) bb group by bb.manager_id) ba ON ba.manager_id = r.RTASK_ID and (ba.approval_type = '113' or ba.approval_type = '169')
  where 1 = 1
  AND r.RTASK_STIME >= ?
  AND r.RTASK_ETIME <= ?
  AND r.RTASK_ITEM = concat((79965),'')
  AND ba.approval_status = ?
  and r.ISDELETED = 'N' )
union all
( select b.* from (
    select r.RTASK_ID
    from rout_task r
    LEFT JOIN sys_dic_val sdv ON sdv.dic_val_no = r.RTASK_ESTATE
    LEFT JOIN (select bb.* from (SELECT * FROM biz_approval ORDER BY applicant_time DESC ) bb group by bb.manager_id) ba ON ba.manager_id = r.RTASK_ID and (ba.approval_type = '113' or ba.approval_type = '169')
    where 1 = 1
    AND r.RTASK_STIME >= ?
    AND r.RTASK_ETIME <= ?
    AND r.RTASK_ITEM = concat((79965),'')
    AND ba.approval_status = ?
    and r.ISDELETED = 'N' ) a
  right join (
    select r.RTASK_ID,r.RTASK_BEGIN_TIME, r.RTASK_FINISH_TIME,
    (SELECT approval_status from biz_approval where manager_id = r.RTASK_ID limit 1) as approval_status,
    (select val from sys_dic_val where dic_val_no = (SELECT approval_status from biz_approval where manager_id = r.RTASK_ID limit 1) and isdeleted = 'N') as RTASK_CHECK_S,
    r.RTASK_ITEM,
    (select s.stationname from sys_workstation s where s.stationid = CAST(r.RTASK_ITEM AS INTEGER)) as STATIONAME,
    r.RTASK_NAME,r.RTASK_ESTATE,r.RTASK_STIME,r.RTASK_ETIME,CONCAT(r.RTASK_RECEIVER,'') as RTASK_RECEIVER,r.RTASK_CHECK,
    (select username from sys_users where userno = r.RTASK_RECEIVER) as F_RTASK_RECEIVER,
    (select WM_CONCAT(rolename) from sys_role where FIND_IN_SET(roleno,r.RTASK_USER)) AS ROLENAME,
    sdv.val AS D_RTASK_STATE
    from rout_task r
    LEFT JOIN sys_dic_val sdv ON sdv.dic_val_no = r.RTASK_ESTATE
    LEFT JOIN (select bb.* from (SELECT * FROM biz_approval ORDER BY applicant_time DESC ) bb group by bb.manager_id) ba ON ba.manager_id = r.RTASK_ID and (ba.approval_type = '113' or ba.approval_type = '169')
    where 1 = 1
    AND r.RTASK_FINISH_TIME >= ?
    AND r.RTASK_FINISH_TIME <= ?
    AND r.RTASK_ITEM = concat((79965),'')
    AND ba.approval_status = ?
    and r.ISDELETED = 'N' ) b on a.RTASK_ID = b.RTASK_ID
  where a.RTASK_ID is null ) ) r;
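Finding the thread in the result file can be done with grep. The layout of the dmrdc result file is not shown here, so this sketch makes two assumptions:
- The thread ID appears literally in the file.
- The SQL text follows it within a few lines.

find_thread_sql and its default of 20 context lines are hypothetical choices.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every line of a dmrdc result file that contains the
# given thread ID as a whole word, plus the following CONTEXT lines (default 20),
# which is where the SQL text is assumed to appear.
find_thread_sql() {
  local result_file="$1" lwp="$2" context="${3:-20}"
  if [ ! -r "$result_file" ]; then
    echo "find_thread_sql: cannot read $result_file" >&2
    return 1
  fi
  # -n line numbers, -F fixed string, -w whole word (960808 will not match 9608081)
  grep -n -F -w -A "$context" -- "$lwp" "$result_file"
}
```

Usage: `find_thread_sql result_corefile-dm_sql_thd.960540.txt 960808`.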
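7. Re-run the SQL to reproduce the problem
Run the SQL statements on their own in disql and check whether the problem can be reproduced. The ? markers are bind parameters of the prepared statement, so replace them with concrete values first. The backtrace shows the crash inside ntsk_process_prepare, so the problem may already appear while the statement is being prepared, before any parameter values are used.
To be added.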
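To turn the captured statement into something disql can run directly, the ? markers need concrete values. The sketch below uses a hypothetical helper, fill_placeholders, and works as follows:
- It substitutes the values in order, left to right.
- It fails if the number of values does not match the number of ? markers.
- It is deliberately naive: it assumes no ? appears inside a string literal or comment, which holds for the statement above.
- Values are inserted verbatim, so string and date values must carry their own quotes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace each "?" in an SQL string with the next argument, in order.
# Naive by design: a "?" inside a string literal or comment would also be replaced.
fill_placeholders() {
  local rest="$1" out="" value
  shift
  for value in "$@"; do
    case "$rest" in
      *\?*) ;;
      *) echo "fill_placeholders: more values than ? placeholders" >&2; return 1 ;;
    esac
    # Text before the first "?", then the value; continue after that "?"
    out=$out${rest%%\?*}$value
    rest=${rest#*\?}
  done
  case "$rest" in
    *\?*) echo "fill_placeholders: not enough values for the ? placeholders" >&2; return 1 ;;
  esac
  printf '%s\n' "$out$rest"
}
```

Because already-substituted text is never searched again, a value that itself contains ? is safe. The statement above has nine placeholders, so it needs nine values, for example `fill_placeholders "$(cat stmt.sql)" v1 v2 ... v9`, where stmt.sql is a hypothetical file holding the statement.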
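Record the stack traces of all threads in the Coredump file and their SQL statements
(gdb) set pagination off            # do not stop at "--Type <RET> for more" prompts
(gdb) set logging file <file name>  # set the name of the output file
(gdb) set logging on                # start copying gdb output to that file (newer gdb: set logging enabled on)
(gdb) thread apply all bt           # print the stack of every thread
(gdb) set logging off               # stop writing to the file (newer gdb: set logging enabled off)
(gdb) quit                          # end the gdb session
Then look up the SQL for each thread ID in the dmrdc result file, as in step 6.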
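The same output can be produced without an interactive session by running gdb in batch mode. This is a minimal sketch with the following assumptions and behavior:
- gdb is on PATH.
- dump_all_backtraces and the default output name all_threads_bt.txt are hypothetical.
- -batch runs the -ex commands and exits. Batch mode also disables paging, so no "--Type <RET>" prompts appear.
- -nx skips any .gdbinit file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write "info threads" plus every thread's backtrace from a
# core file into a text file, using gdb in batch mode instead of an interactive session.
dump_all_backtraces() {
  local binary="$1" core="$2" out="${3:-all_threads_bt.txt}"
  if [ ! -f "$binary" ] || [ ! -f "$core" ]; then
    echo "usage: dump_all_backtraces <dmserver binary> <core file> [output file]" >&2
    return 1
  fi
  # -batch: run the -ex commands, then exit (this also disables paging)
  # -nx: do not read any .gdbinit file
  gdb -batch -nx \
      -ex "info threads" \
      -ex "thread apply all bt" \
      "$binary" "$core" > "$out" 2>&1
}
```

Usage: `dump_all_backtraces "$DM_HOME/bin/dmserver" /dmdata/corefile-dm_sql_thd.960540`.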