This post was last edited by ccton on 2014-2-18 12:08
[root@**** hydata]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.6 (Tikanga)
[root@**** hydata]# uname -a
Linux gywsj.hyb210 2.6.18-238.el5 #1 SMP Sun Dec 19 14:22:44 EST 2010 x86_64 x86_64 x86_64 GNU/Linux
Database version:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
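Problem description: after running for a while, the file system mounted from the disk array hits a logical error and is automatically remounted read-only.
In addition, I once wrote large files in bulk in a loop until the disk was full and no error was reported; after remounting, writing files also works normally.
I suspect that unstable voltage on the array is causing logical block errors on the disk, or that this is an Oracle bug, but I have not found any material that confirms either.
Could the experts here please help diagnose this?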
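The bulk-write test mentioned above could look roughly like the following. This is only a sketch, not the exact commands that were run: the helper name `write_test`, the `fill_N.dat` file names, and the count and size arguments are all illustrative assumptions.

```shell
# Hypothetical helper: write COUNT files of SIZE_MB megabytes each into DIR,
# stopping at the first failure (for example disk full or a read-only file system).
write_test() {
    wt_dir=$1
    wt_count=$2
    wt_size_mb=$3
    wt_i=1
    while [ "$wt_i" -le "$wt_count" ]; do
        # conv=fsync makes dd flush the file to disk before it exits,
        # so a device-level I/O error is reported by dd itself.
        if ! dd if=/dev/zero of="$wt_dir/fill_$wt_i.dat" bs=1048576 \
                count="$wt_size_mb" conv=fsync 2>/dev/null; then
            echo "write failed at file $wt_i" >&2
            return 1
        fi
        wt_i=$((wt_i + 1))
    done
    echo "wrote $wt_count files of $wt_size_mb MB each"
}
```

For example, `write_test /hydata/tmp 100 1024` would try to write 100 files of 1 GB each; the target directory here is an assumed scratch location and should not be the one holding the datafiles.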
The relevant logs follow.
Database log:
Tue Feb 18 09:53:28 2014
Archived Log entry 18018 added for thread 1 sequence 565 ID 0x51475291 dest 1:
Tue Feb 18 10:19:41 2014
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ckpt_30996.trc:
ORA-00206: error in writing (block 3, # blocks 1) of control file
ORA-00202: control file: ''/hydata/flash_recovery_area/orcl/control02.ctl''
ORA-27072: File I/O error
Linux-x86_64 Error: 30: Read-only file system
Additional information: 4
Additional information: 3
Additional information: -1
Tue Feb 18 10:19:41 2014
KCF: read, write or open error, block=0xaa13a online=1
Tue Feb 18 10:19:41 2014
KCF: read, write or open error, block=0xa5dfd online=1
file=5 '/hydata/tablespaces/cmsservergy.dat'
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ckpt_30996.trc:
ORA-00221: error on write to control file
ORA-00206: error in writing (block 3, # blocks 1) of control file
ORA-00202: control file: ''/hydata/flash_recovery_area/orcl/control02.ctl''
ORA-27072: File I/O error
Linux-x86_64 Error: 30: Read-only file system
Additional information: 4
Additional information: 3
Additional information: -1
file=5 '/hydata/tablespaces/cmsservergy.dat'
Tue Feb 18 10:19:41 2014
KCF: read, write or open error, block=0x21593e online=1
error=27072 txt: 'Linux-x86_64 Error: 30: Read-only file system
CKPT (ospid: 30996): terminating the instance due to error 221
file=10 '/hydata/tablespaces/cmsservergy4.dat'
error=27072 txt: 'Linux-x86_64 Error: 30: Read-only file system
Additional information: 4
error=27072 txt: 'Linux-x86_64 Error: 30: Read-only file system
Additional information: 4
Tue Feb 18 10:19:41 2014
KCF: read, write or open error, block=0x153877 online=1
Additional information: 4
Additional information: 696634
file=10 '/hydata/tablespaces/cmsservergy4.dat'
Additional information: 2185534
Additional information: 679421
Additional information: -1'
error=27072 txt: 'Linux-x86_64 Error: 30: Read-only file system
Additional information: -1'
Additional information: -1'
Additional information: 4
Additional information: 1390711
Additional information: -1'
Tue Feb 18 10:19:41 2014
Some DDE async actions failed or were cancelled
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_lgwr_30992.trc:
ORA-00345: redo log write error block 193051 count 13
ORA-00312: online log 2 thread 1: '/hydata/orcl/redo02.log'
ORA-27072: File I/O error
Linux-x86_64 Error: 5: Input/output error
Additional information: 4
Additional information: 193051
Additional information: -1
Tue Feb 18 10:19:42 2014
opiodr aborting process unknown ospid (11121) as a result of ORA-1092
Tue Feb 18 10:19:42 2014
ORA-1092 : opitsk aborting process
Instance terminated by CKPT, pid = 30996
Operating system log:
Feb 18 10:19:05 gywsj kernel: INFO: task extract:32604 blocked for more than 120 seconds.
Feb 18 10:19:05 gywsj kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 18 10:19:05 gywsj kernel: extract D ffffffff80153806 0 32604 9994 32577 (NOTLB)
Feb 18 10:19:05 gywsj kernel: ffff8101d9957b78 0000000000000082 ffff810001059800 0000000000000000
Feb 18 10:19:05 gywsj kernel: ffffffff804d3480 000000000000000a ffff8102c9a5e080 ffff81087fffb080
Feb 18 10:19:05 gywsj kernel: 00067b698e9d15a4 000000000000322f ffff8102c9a5e268 0000001a00000000
Feb 18 10:19:05 gywsj kernel: Call Trace:
Feb 18 10:19:05 gywsj kernel: [] do_gettimeofday+0x40/0x90
Feb 18 10:19:05 gywsj kernel: [] sync_page+0x0/0x43
Feb 18 10:19:05 gywsj kernel: [] io_schedule+0x3f/0x67
Feb 18 10:19:05 gywsj kernel: [] sync_page+0x3e/0x43
Feb 18 10:19:05 gywsj kernel: [] __wait_on_bit+0x40/0x6e
Feb 18 10:19:05 gywsj kernel: [] wait_on_page_bit+0x6c/0x72
Feb 18 10:19:05 gywsj kernel: [] wake_bit_function+0x0/0x23
Feb 18 10:19:05 gywsj kernel: [] pagevec_lookup_tag+0x1a/0x21
Feb 18 10:19:05 gywsj kernel: [] wait_on_page_writeback_range+0x62/0x12e
Feb 18 10:19:05 gywsj kernel: [] do_writepages+0x29/0x2f
Feb 18 10:19:05 gywsj kernel: [] __filemap_fdatawrite_range+0x50/0x5b
Feb 18 10:19:05 gywsj kernel: [] filemap_write_and_wait+0x26/0x31
Feb 18 10:19:05 gywsj kernel: [] generic_file_direct_IO+0x81/0x122
Feb 18 10:19:05 gywsj kernel: [] __generic_file_aio_read+0xb8/0x198
Feb 18 10:19:05 gywsj kernel: [] generic_file_aio_read+0x34/0x39
Feb 18 10:19:05 gywsj kernel: [] do_sync_read+0xc7/0x104
Feb 18 10:19:05 gywsj kernel: [] autoremove_wake_function+0x0/0x2e
Feb 18 10:19:05 gywsj kernel: [] hrtimer_cancel+0xc/0x16
Feb 18 10:19:05 gywsj kernel: [] hrtimer_nanosleep+0x58/0x118
Feb 18 10:19:05 gywsj kernel: [] vfs_read+0xcb/0x171
Feb 18 10:19:05 gywsj kernel: [] sys_read+0x45/0x6e
Feb 18 10:19:05 gywsj kernel: [] tracesys+0xd5/0xe0
Feb 18 10:19:05 gywsj kernel:
Feb 18 10:19:41 gywsj kernel: sd 3:0:0:1: timing out command, waited 360s
Feb 18 10:19:41 gywsj kernel: sd 3:0:0:1: SCSI error: return code = 0x060d0000
Feb 18 10:19:41 gywsj kernel: end_request: I/O error, dev sdc, sector 662064886
Feb 18 10:19:41 gywsj kernel: Buffer I/O error on device sdc5, logical block 82758095
Feb 18 10:19:41 gywsj kernel: lost page write due to I/O error on sdc5
Feb 18 10:19:41 gywsj kernel: Buffer I/O error on device sdc5, logical block 82758096
Feb 18 10:19:41 gywsj kernel: lost page write due to I/O error on sdc5
Feb 18 10:19:41 gywsj kernel: Aborting journal on device sdc5.
Feb 18 10:19:41 gywsj kernel: ext3_abort called.
Feb 18 10:19:41 gywsj kernel: EXT3-fs error (device sdc5): ext3_journal_start_sb: Detected aborted journal
Feb 18 10:19:41 gywsj kernel: Remounting filesystem read-only
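The `return code = 0x060d0000` in the log above can be split into its four bytes. This is a minimal sketch, assuming the layout used by 2.6-era kernels (driver byte, host byte, message byte, status byte, from high to low); `decode_scsi_result` is a hypothetical helper name.

```shell
# Split a Linux SCSI result word into its four bytes
# (assumed layout: driver << 24 | host << 16 | msg << 8 | status).
decode_scsi_result() {
    code=$(( $1 ))
    printf 'driver=0x%02x host=0x%02x msg=0x%02x status=0x%02x\n' \
        $(( (code >> 24) & 0xff )) \
        $(( (code >> 16) & 0xff )) \
        $(( (code >> 8) & 0xff )) \
        $(( code & 0xff ))
}

decode_scsi_result 0x060d0000
# prints: driver=0x06 host=0x0d msg=0x00 status=0x00
```

As far as I can tell from the kernel's include/scsi/scsi.h, driver byte 0x06 is DRIVER_TIMEOUT and host byte 0x0d is DID_REQUEUE, which matches the "timing out command, waited 360s" line just before it.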
-- After remounting, files can be written again
Operating system log:
Feb 18 10:47:35 gywsj kernel: __journal_remove_journal_head: freeing b_frozen_data
Feb 18 10:47:35 gywsj last message repeated 2 times
Feb 18 10:47:35 gywsj kernel: ext3_abort called.
Feb 18 10:47:35 gywsj kernel: EXT3-fs error (device sdc5): ext3_put_super: Couldn't clean up the journal
Feb 18 10:47:51 gywsj kernel: kjournald starting. Commit interval 5 seconds
Feb 18 10:47:51 gywsj kernel: EXT3-fs warning (device sdc5): ext3_clear_journal_err: Filesystem error recorded from previous mount: IO failure
Feb 18 10:47:51 gywsj kernel: EXT3-fs warning (device sdc5): ext3_clear_journal_err: Marking fs in need of filesystem check.
Feb 18 10:47:51 gywsj kernel: EXT3-fs warning: mounting fs with errors, running e2fsck is recommended
Feb 18 10:47:51 gywsj kernel: EXT3 FS on sdc5, internal journal
Feb 18 10:47:51 gywsj kernel: EXT3-fs: recovery complete.
Feb 18 10:47:51 gywsj kernel: EXT3-fs: mounted filesystem with ordered data mode.