死事务出现在异常关闭数据库或者事务进程不正常结束,比如KILL -9,shutdown abort的情况下。
当前数据库里的死事务可以通过查询内部表x$ktuxe来获得。
select ADDR,KTUXEUSN,KTUXESLT,KTUXESQN,KTUXESIZ from x$ktuxe where KTUXECFL='DEAD';
ADDR KTUXEUSN KTUXESLT KTUXESQN KTUXESIZ
---------------- ---------- ---------- ---------- ----------
00002B92FF5D5F68 15 12 314961 43611
KTUXESIZ代表需要回滚的回滚块数。
死事务的回滚进程数可以通过参数fast_start_parallel_rollback来设置。
show parameter fast
NAME TYPE VALUE
------------------------------------ ---------------------- ------------------------------
fast_start_io_target integer 0
fast_start_mttr_target integer 120
fast_start_parallel_rollback string LOW
low的设置,会使当产生死事务的时候启用2*CPU数个并行回滚。
我们来模拟一个事务被kill掉的情况,在delete的过程中,把这个事务的进程kill掉:
delete from test;
ERROR:
ORA-03114: not connected to ORACLE
delete from test
*
ERROR at line 1:
ORA-12152: TNS:unable to send break message
然后观察数据库后台的等待事件,发现启动了很多的回滚进程。
SID SPID EVENT P1 P2 P3 SQL_ID SECON
------- ---------- ------------------------- ------------- ---------- ---------- ------------------ -----
1707 2323 SQL*Net message to client 1650815232 1 0 3t37hp1cnkuux 0
762 2312 read by other session 3 298397 36 0
1327 2286 read by other session 3 298367 36 0
382 2308 wait for a undo record 0 0 0 0
384 2274 wait for a undo record 0 0 0 0
573 2276 wait for a undo record 0 0 0 0
574 2310 wait for a undo record 0 0 0 0
762 2312 wait for a undo record 0 0 0 0
763 2278 wait for a undo record 0 0 0 0
950 2280 wait for a undo record 0 0 0 0
951 2314 wait for a undo record 0 0 0 0
1139 2282 wait for a undo record 0 0 0 0
1141 2316 wait for a undo record 0 0 0 0
1516 2320 wait for a undo record 0 0 0 0
1517 2284 wait for a undo record 0 0 0 0
1518 2318 wait for a undo record 0 0 0 0
1519 2288 wait for a undo record 0 0 0 0
1705 2290 wait for a undo record 0 0 0 0
191 2306 wait for a undo record 0 0 0 0
1892 2258 wait for a undo record 0 0 0 0
2 2270 wait for a undo record 0 0 0 0
4 2304 wait for a undo record 0 0 0 0
top - 21:47:42 up 16 days, 3:30, 2 users, load average: 3.44, 1.94, 1.35
Tasks: 295 total, 1 running, 294 sleeping, 0 stopped, 0 zombie
Cpu(s): 2.6%us, 1.2%sy, 0.0%ni, 79.6%id, 16.3%wa, 0.0%hi, 0.2%si, 0.0%st
Mem: 24679196k total, 20316832k used, 4362364k free, 666864k buffers
Swap: 20482864k total, 3004k used, 20479860k free, 6074052k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
23773 oracle 16 0 1680m 100m 45m D 11.6 0.4 0:05.95 ora_arc0_xdbsb
2258 oracle 16 0 1600m 17m 14m S 6.0 0.1 0:01.56 ora_p000_xdbsb
15652 root 10 -5 0 0 0 D 4.6 0.0 0:10.26 [kjournald]
23642 oracle 15 0 1605m 31m 25m S 2.7 0.1 0:03.51 ora_smon_xdbsb
2264 oracle 15 0 1600m 17m 14m S 1.7 0.1 0:00.32 ora_p003_xdbsb
2266 oracle 15 0 1600m 17m 14m S 1.7 0.1 0:00.31 ora_p004_xdbsb
2306 oracle 16 0 1600m 17m 14m S 1.7 0.1 0:00.33 ora_p024_xdbsb
2262 oracle 15 0 1600m 17m 14m S 1.3 0.1 0:00.31 ora_p002_xdbsb
2268 oracle 15 0 1600m 17m 14m S 1.3 0.1 0:00.31 ora_p005_xdbsb
2270 oracle 15 0 1600m 17m 14m S 1.3 0.1 0:00.31 ora_p006_xdbsb
2274 oracle 15 0 1600m 17m 14m S 1.3 0.1 0:00.31 ora_p008_xdbsb
2280 oracle 15 0 1600m 17m 14m S 1.3 0.1 0:00.31 ora_p011_xdbsb
2282 oracle 15 0 1600m 17m 14m S 1.3 0.1 0:00.31 ora_p012_xdbsb
2286 oracle 15 0 1600m 17m 14m S 1.3 0.1 0:00.31 ora_p014_xdbsb
2292 oracle 15 0 1600m 17m 14m S 1.3 0.1 0:00.31 ora_p017_xdbsb
2298 oracle 15 0 1600m 17m 14m S 1.3 0.1 0:00.32 ora_p020_xdbsb
2310 oracle 15 0 1600m 17m 14m S 1.3 0.1 0:00.32 ora_p026_xdbsb
2312 oracle 16 0 1600m 17m 14m S 1.3 0.1 0:00.31 ora_p027_xdbsb
2314 oracle 15 0 1600m 17m 14m S 1.3 0.1 0:00.32 ora_p028_xdbsb
2318 oracle 15 0 1600m 17m 14m S 1.3 0.1 0:00.30 ora_p030_xdbsb
2320 oracle 15 0 1600m 17m 14m S 1.3 0.1 0:00.32 ora_p031_xdbsb
2260 oracle 15 0 1600m 17m 14m S 1.0 0.1 0:00.28 ora_p001_xdbsb
2272 oracle 15 0 1600m 17m 14m S 1.0 0.1 0:00.30 ora_p007_xdbsb
2276 oracle 15 0 1600m 17m 14m S 1.0 0.1 0:00.31 ora_p009_xdbsb
2278 oracle 15 0 1600m 17m 14m S 1.0 0.1 0:00.30 ora_p010_xdbsb
2284 oracle 15 0 1600m 17m 14m S 1.0 0.1 0:00.30 ora_p013_xdbsb
2288 oracle 15 0 1600m 17m 14m S 1.0 0.1 0:00.33 ora_p015_xdbsb
2290 oracle 15 0 1600m 17m 14m S 1.0 0.1 0:00.29 ora_p016_xdbsb
2294 oracle 15 0 1600m 17m 14m S 1.0 0.1 0:00.33 ora_p018_xdbsb
2296 oracle 15 0 1600m 17m 14m S 1.0 0.1 0:00.31 ora_p019_xdbsb
2300 oracle 16 0 1600m 17m 14m S 1.0 0.1 0:00.33 ora_p021_xdbsb
2302 oracle 16 0 1600m 17m 14m S 1.0 0.1 0:00.32 ora_p022_xdbsb
2304 oracle 15 0 1600m 17m 14m S 1.0 0.1 0:00.30 ora_p023_xdbsb
2308 oracle 16 0 1600m 17m 14m S 1.0 0.1 0:00.31 ora_p025_xdbsb
2316 oracle 15 0 1600m 17m 14m S 1.0 0.1 0:00.31 ora_p029_xdbsb
23634 oracle 15 0 1609m 29m 17m S 0.7 0.1 0:06.28 ora_dbw0_xdbsb
23636 oracle 15 0 1607m 26m 15m S 0.7 0.1 0:05.99 ora_dbw1_xdbsb
2321 oracle 15 0 12872 1252 816 R 0.3 0.0 0:00.07 top
由于我的服务器上的CPU数较多,ORACLE启动了2*CPU数个回滚进程,这些ora_pxxx_xdbsb都是后台启动的并行回滚的进程。经常会发现回滚的进程间会产生资源的争用,例如buffer busy waits等待事件。导致回滚的速度非常慢,我们可以通过调整参数 fast_start_parallel_rollback 为false,这样回滚的进程就只会有一个,速度有时候反而比并行回滚还快。而且也会解决回滚进程数太多,导致回滚进程占用了太多的系统资源。大事务运行过程中被异常kill掉是一件比较严重的事情,死事务的回滚可能会占用掉你很多的系统资源。
回滚过程中,我们通过观察x$ktuxe.ktuxesiz减少的速度来评估回滚的速度。可以根据以下算法来粗略的估算回滚需要的时间,这里是小时:
declare
l_start number;
l_end number;
begin
select ktuxesiz
into l_start
from x$ktuxe
where KTUXEUSN = 10
and KTUXESLT = 39; ---------这里根据实际数字来填写
dbms_lock.sleep(60); ---------可以缩小这个时间,但是太小,可能会导致误差较大
select ktuxesiz
into l_end
from x$ktuxe
where KTUXEUSN = 10
and KTUXESLT = 39; ---------这里根据实际数字来填写
dbms_output.put_line('time cost Day:' ||
round(l_end / (l_start - l_end) / 60, 2));
end;
/