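ORA-609 is one of the more common errors in the alert log. It usually has little real impact, but seeing it appear again and again in the alert log is annoying. This article describes how to troubleshoot and resolve ORA-609.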
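1. Official definition of ORA-609
ORA-00609: could not attach to incoming connection
Cause: Oracle process could not answer incoming connection
Action: If the situation described in the next error on the stack can be corrected, do so; otherwise contact Oracle Support
Put simply, the server process could not complete the incoming client connection. To find out why, look at the error that follows ORA-609 on the error stack.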
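2. Scenario 1: ORA-609 + TNS-12641
2.1 Check the alert log first
The alert log typically shows ORA-609 on its own; it contains no TNS-related errors.
2.2 Check $ORACLE_HOME/network/log/sqlnet.log
sqlnet.log contains a large number of TNS-12641 errors: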
Fatal NI connect error 12641, connecting to:
(LOCAL=NO)
VERSION INFORMATION:
TNS for Linux: Version 11.2.0.4.0 - Production
Oracle Bequeath NT Protocol Adapter for Linux: Version 11.2.0.4.0 - Production
TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.4.0 - Production
Time: 27-APR-2024 22:14:00
Tracing not turned on.
Tns error struct:
ns main err code: 12641
TNS-12641: Authentication service failed to initialize
ns secondary err code: 0
nt main err code: 0
nt secondary err code: 0
nt OS err code: 0
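When sqlnet.log holds hundreds of entries like the one above, it helps to count them by error number and time before reading them one by one. The Python sketch below is a minimal helper. It assumes the entry layout shown above: a `Fatal NI connect error NNNNN` header line, later followed by a `Time: DD-MON-YYYY HH:MI:SS` line. `count_fatal_ni_errors` and `entry_times` are hypothetical names and are not part of any Oracle tool.

```python
import re
from collections import Counter
from datetime import datetime

# Header line of each entry, e.g. "Fatal NI connect error 12641, connecting to:"
ENTRY_RE = re.compile(r"Fatal NI connect error (\d+)")
# Timestamp line of each entry, e.g. "Time: 27-APR-2024 22:14:00" (indentation allowed)
TIME_RE = re.compile(
    r"^\s*Time: (\d{2}-[A-Za-z]{3}-\d{4} \d{2}:\d{2}:\d{2})", re.MULTILINE
)


def count_fatal_ni_errors(text: str) -> Counter:
    """Count 'Fatal NI connect error' entries by TNS error number."""
    return Counter(int(code) for code in ENTRY_RE.findall(text))


def entry_times(text: str) -> list[datetime]:
    """Return the timestamp of every entry, in file order."""
    # %b matches English month abbreviations case-insensitively (English/C locale assumed)
    return [datetime.strptime(t, "%d-%b-%Y %H:%M:%S") for t in TIME_RE.findall(text)]
```

For example, `count_fatal_ni_errors(text).most_common()` lists the dominant error first. A burst of 12641 that starts right after a sqlnet.ora change points to this scenario.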
The error points to authentication. The cause is the following setting, which had been added to $ORACLE_HOME/network/admin/sqlnet.ora:
SQLNET.AUTHENTICATION_SERVICES= (ALL)
This behavior is related to a bug:
BUG 23728771 - ABOUT AN ACTUAL BEHAVIOR SQLNET.AUTHENTICATION_SERVICES
When SQLNET.AUTHENTICATION_SERVICES is set to ALL, the server selects the authentication adapters KERBEROS5, RADIUS and KERBEROS5 by default. If sqlnet.authentication_kerberos5_service is not specified in the client or server sqlnet.ora, Kerberos authentication fails with ORA-12641.
2.3 Solution
Edit $ORACLE_HOME/network/admin/sqlnet.ora and change
SQLNET.AUTHENTICATION_SERVICES= (ALL)
to
SQLNET.AUTHENTICATION_SERVICES= (NONE)
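A quick check of which adapters sqlnet.ora actually lists, run before and after the change, confirms whether ALL is still in effect. This is a minimal Python sketch that rests on two assumptions. First, the whole parameter value sits on one line. Second, when the parameter appears more than once, the last occurrence is the one that counts.

Oracle Net configuration files expect a parameter to start in column 1 (an indented line is read as a continuation). For that reason, the regex only matches at the start of a line, and commented-out lines that begin with `#` are ignored.

```python
import re

# Matches e.g. "SQLNET.AUTHENTICATION_SERVICES= (ALL)" starting in column 1.
# Assumption: the whole value sits on one line.
AUTH_RE = re.compile(
    r"^SQLNET\.AUTHENTICATION_SERVICES[ \t]*=[ \t]*\(([^)]*)\)",
    re.IGNORECASE | re.MULTILINE,
)


def authentication_services(sqlnet_ora: str) -> list[str]:
    """Return the adapters named by the last matching line, upper-cased."""
    matches = AUTH_RE.findall(sqlnet_ora)
    if not matches:
        return []  # parameter not set in this file
    return [item.strip().upper() for item in matches[-1].split(",") if item.strip()]


def uses_all_adapters(sqlnet_ora: str) -> bool:
    """True if the file still requests every adapter (the scenario 1 trigger)."""
    return "ALL" in authentication_services(sqlnet_ora)
```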
3. Scenario 2: ORA-609 + ORA-12537
3.1 Check the alert log
The alert log typically contains both ORA-609 and ORA-12537:
ORA-00609: could not attach to incoming connection
ORA-12537: TNS:connection closed
ORA-609 : opiodr aborting process unknown ospid (xxxx)
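ORA-609 itself only says that the attach failed, so the useful information is the error on the next line. The following Python sketch counts which error follows each `ORA-00609:` line. It assumes the alert log layout shown above, with error lines starting at the beginning of a line and the follow-up error on the next line. The `ORA-609 : opiodr ...` summary line is deliberately not counted as a separate occurrence.

```python
import re
from collections import Counter

# "ORA-00609: could not attach ..." (the "ORA-609 : opiodr ..." line has a space
# before the colon, so it does not match and is not double-counted)
ORA609_RE = re.compile(r"^ORA-0*609:")
# The follow-up error, e.g. "ORA-12537: TNS:connection closed" or "TNS-12641: ..."
NEXT_ERR_RE = re.compile(r"^((?:ORA|TNS)-\d+)")


def errors_after_ora609(alert_log: str) -> Counter:
    """Count the error that immediately follows each ORA-00609 line."""
    lines = [line.strip() for line in alert_log.splitlines()]
    result = Counter()
    for i, line in enumerate(lines):
        if not ORA609_RE.match(line):
            continue
        following = lines[i + 1] if i + 1 < len(lines) else ""
        match = NEXT_ERR_RE.match(following)
        result[match.group(1) if match else "(none)"] += 1
    return result
```

A result dominated by `ORA-12537` points to this scenario. `(none)` means the follow-up error has to be found in sqlnet.log instead, as in scenario 1.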
3.2 Check $ORACLE_HOME/network/log/sqlnet.log
sqlnet.log contains TNS-12537:
Fatal NI connect error 12537, connecting to:
(LOCAL=NO)
VERSION INFORMATION:
TNS for 64-bit Windows: Version 11.1.0.7.0 - Production
Oracle Bequeath NT Protocol Adapter for 64-bit Windows: Version 11.1.0.7.0 - Production
Windows NT TCP/IP NT Protocol Adapter for 64-bit Windows: Version 11.1.0.7.0 - Production
Time: 12-OCT-2009 10:03:39
Tracing to file: E:\app\oracle\product\11.1.0\db_1\NETWORK\trace\svr1_7464.trc
Tns error struct:
ns main err code: 12537
TNS-12537: TNS:connection closed
ns secondary err code: 12560
nt main err code: 0
nt secondary err code: 0
nt OS err code: 0
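3.3 The listener log contains entries like the following
The log shows the connections being established without any obvious error. That is because the connection failed after the listener had handed it off to the server process.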
12-OCT-2009 10:03:39 * (CONNECT_DATA=(SID=ORCL)) * (ADDRESS=(PROTOCOL=tcp)(HOST=123.456.1.123)(PORT=3158)) * establish * ORCL * 0
12-OCT-2009 10:03:39 * (CONNECT_DATA=(SID=ORCL)) * (ADDRESS=(PROTOCOL=tcp)(HOST=123.456.1.123)(PORT=3159)) * establish * ORCL * 0
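The listener reports a return code of 0 (handoff succeeded), so the next step is to line these entries up with the `Time:` values in sqlnet.log. In the example above, both listener entries carry the same second (10:03:39) as the TNS-12537 entry.

The Python sketch below parses a single listener.log line in exactly the six-field `establish` format shown above. Other listener.log line types, and lines with extra fields, are assumed out of scope and return `None`. `ListenerEntry` and `parse_listener_line` are hypothetical names.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Client address inside the ADDRESS field, e.g. "(HOST=123.456.1.123)"
HOST_RE = re.compile(r"\(HOST=([^)]*)\)", re.IGNORECASE)


@dataclass
class ListenerEntry:
    time: datetime
    connect_data: str
    client_host: Optional[str]
    service: str
    return_code: int  # 0 means the listener handed the connection off successfully


def parse_listener_line(line: str) -> Optional[ListenerEntry]:
    """Parse a six-field 'establish' line; return None for anything else."""
    fields = [f.strip() for f in line.split(" * ")]
    if len(fields) != 6 or fields[3] != "establish":
        return None
    host = HOST_RE.search(fields[2])
    return ListenerEntry(
        time=datetime.strptime(fields[0], "%d-%b-%Y %H:%M:%S"),
        connect_data=fields[1],
        client_host=host.group(1) if host else None,
        service=fields[4],
        return_code=int(fields[5]),
    )
```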
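3.4 Solution
ORA-609 is raised when a client connection breaks off before the connection/authentication process has completed. In many cases the interruption is caused by a timeout. Starting with 10gR2, the default inbound connection timeout is 60 seconds, which is often not long enough for the whole connection process to finish. When the error is intermittent, it usually does not indicate a serious problem. It simply means the server process timed out before the connection process completed.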
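We have also found that ORA-609 occurs frequently on database installations monitored by DB Console and the Enterprise Manager agent (emagent). Once DB Console has been started, emagent routinely and repeatedly connects to the target instance, and listener.log shows these frequent emagent connections without any errors. Occasionally, however, a connection fails to complete at the database, which raises ORA-609. emagent simply retries and will likely succeed on a later attempt (provided there is no real failure in the listener or the database). These transient connection failures are not reported back to DB Console, and apart from ORA-609 there is no other sign that anything failed.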
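One rough way to tell whether a polling client such as emagent is behind the noise is to count `establish` entries per client address per minute in listener.log. This Python sketch assumes the timestamp and `* establish *` layout shown in section 3.3. When the CONNECT_DATA part also contains a `(HOST=...)` (the client machine name inside CID), the greedy match picks the last occurrence on the line, which is the client address in the ADDRESS part. `connections_per_host_minute` is a hypothetical helper.

```python
import re
from collections import Counter

# Timestamp truncated to the minute, then the last "(HOST=...)" on the line,
# then "* establish *".
ESTABLISH_RE = re.compile(
    r"^(\d{2}-[A-Za-z]{3}-\d{4} \d{2}:\d{2}):\d{2} \*.*\(HOST=([^)]*)\).*\* establish \*"
)


def connections_per_host_minute(listener_log: str) -> Counter:
    """Count 'establish' entries keyed by (client host, minute)."""
    counts = Counter()
    for line in listener_log.splitlines():
        match = ESTABLISH_RE.match(line)
        if match:
            minute, host = match.group(1), match.group(2)
            counts[(host, minute)] += 1
    return counts
```

Calling `.most_common(10)` on the result shows the busiest host/minute pairs. A host that connects at a steady rate every minute is a likely monitoring agent.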
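In summary, ORA-609 has many possible causes, and tracking down the root cause is relatively hard. There is, however, a fairly simple mitigation: lengthen the inbound connection timeout. Add the following parameter to $ORACLE_HOME/network/admin/sqlnet.ora in the database Oracle home (not the Grid home). Neither a database restart nor a listener reload is required: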
SQLNET.INBOUND_CONNECT_TIMEOUT=120
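When many servers need the same change, the edit can be scripted. The sketch below reads the effective SQLNET.INBOUND_CONNECT_TIMEOUT from sqlnet.ora text, falling back to the 60-second default stated above when the parameter is absent. It then sets a new value, either by rewriting an existing line or by appending one.

The sketch works on text only. Reading and writing the actual file, and backing it up first, are left to the caller. As with other Oracle Net parameters, the line is assumed to start in column 1, and Unix line endings are assumed. The function names are hypothetical.

```python
import re

PARAM = "SQLNET.INBOUND_CONNECT_TIMEOUT"
DEFAULT_SECONDS = 60  # default since 10gR2, as stated in the text above
# Parameter must start in column 1; trailing spaces/tabs allowed, newlines are not consumed.
PARAM_RE = re.compile(
    r"^SQLNET\.INBOUND_CONNECT_TIMEOUT[ \t]*=[ \t]*(\d+)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def inbound_timeout(sqlnet_ora: str) -> int:
    """Effective timeout in seconds: last explicit value, else the default."""
    values = PARAM_RE.findall(sqlnet_ora)
    return int(values[-1]) if values else DEFAULT_SECONDS


def set_inbound_timeout(sqlnet_ora: str, seconds: int) -> str:
    """Return new file text with the parameter set to `seconds`."""
    new_line = f"{PARAM}={seconds}"
    if PARAM_RE.search(sqlnet_ora):
        return PARAM_RE.sub(new_line, sqlnet_ora)  # rewrite every existing occurrence
    if sqlnet_ora and not sqlnet_ora.endswith("\n"):
        sqlnet_ora += "\n"
    return sqlnet_ora + new_line + "\n"
```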
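3.5 Finding the root cause
If ORA-609 still appears frequently after the timeout has been increased, more detailed troubleshooting is needed. Oracle's recommended approach is as follows:
A. Add the following trace parameters to the client's sqlnet.ora file: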
DIAG_ADR_ENABLED=off # Disable ADR if version 11g
TRACE_LEVEL_CLIENT = 16 # Enable level 16 trace
TRACE_TIMESTAMP_CLIENT = ON # Set timestamp in the trace files
TRACE_DIRECTORY_CLIENT = <DIRECTORY> # Control trace file location
TRACE_FILELEN_CLIENT =<n> # Size of each trace file in kilobytes, e.g. 20480
TRACE_FILENO_CLIENT =<n> # Number of trace files per process
If the client connects with the JDBC thin driver, trace the client with Javanet tracing instead. See Doc ID 793415.1, How to Perform the Equivalent of SQL*Net Client Tracing with Oracle JDBC Thin Driver.
For an 11.2 JDBC thin client, follow Doc ID 1050942.1, How to Trace the Network Packets Exchanged Between JDBC and the RDBMS in Release 11.2.
B. Set the following trace parameters in the server's sqlnet.ora:
DIAG_ADR_ENABLED=off # Disable ADR if version 11g
TRACE_LEVEL_SERVER = 16 # Enable level 16 trace
TRACE_TIMESTAMP_SERVER = ON # Set timestamp in the trace files
TRACE_DIRECTORY_SERVER = <DIRECTORY> # Control trace file location
TRACE_FILELEN_SERVER =<n> # Size of each trace file in kilobytes, e.g. 20480
TRACE_FILENO_SERVER =<n> # Number of trace files per process
Cyclic tracing lets you control the size and the number of the trace files produced.
The TRACE_FILELEN parameter sets the size of each trace file.
The TRACE_FILENO parameter is the number of traces per process.
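The client and server blocks differ only in the `_CLIENT`/`_SERVER` suffix, so both can be generated from one template. Cyclic tracing also makes the disk footprint easy to bound: at most TRACE_FILELEN * TRACE_FILENO kilobytes per process.

In this Python sketch, the directory `/tmp/net_trace` and the default of 10 files are hypothetical values, not Oracle recommendations. The function names are also hypothetical.

```python
def trace_parameters(side: str, directory: str, filelen_kb: int = 20480, fileno: int = 10) -> str:
    """Render the level-16 cyclic trace settings for side CLIENT or SERVER."""
    side = side.upper()
    if side not in ("CLIENT", "SERVER"):
        raise ValueError("side must be CLIENT or SERVER")
    lines = [
        "DIAG_ADR_ENABLED=off",  # only meaningful on 11g and later
        f"TRACE_LEVEL_{side}=16",
        f"TRACE_TIMESTAMP_{side}=ON",
        f"TRACE_DIRECTORY_{side}={directory}",
        f"TRACE_FILELEN_{side}={filelen_kb}",  # kilobytes per trace file
        f"TRACE_FILENO_{side}={fileno}",  # trace files per process
    ]
    return "\n".join(lines) + "\n"


def max_trace_kb_per_process(filelen_kb: int, fileno: int) -> int:
    """Upper bound on disk used by one process's cyclic trace set, in KB."""
    return filelen_kb * fileno
```

With these values, each traced process uses at most 20480 * 10 = 204800 KB (200 MB). Multiply that figure by the number of processes you expect to be traced when sizing the trace directory.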
C. Set an errorstack event to capture the failure. This is especially useful when capturing an Oracle Net client trace is not feasible.
SQL> alter system set events '609 errorstack(3)';
After a few traces have been collected while the error reproduces, turn the event off:
SQL> alter system set events '609 off';
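With the event turned off, the trace files written while it was active still have to be collected. The following Python sketch lists the `.trc` files under a given directory that mention `ORA-00609`, newest first. Which directory holds the traces depends on the installation, so the sketch takes it as an argument rather than assuming a path.

The sketch also assumes that the errorstack dump contains the literal string `ORA-00609`. Check that assumption against an actual trace before relying on it. `traces_mentioning` is a hypothetical helper.

```python
from pathlib import Path


def traces_mentioning(directory: str, pattern: str = "ORA-00609", suffix: str = ".trc") -> list[Path]:
    """Return files under `directory` ending in `suffix` that contain `pattern`, newest first."""
    hits = []
    for path in Path(directory).rglob(f"*{suffix}"):
        if not path.is_file():
            continue
        try:
            with path.open("r", errors="replace") as fh:
                if any(pattern in line for line in fh):
                    hits.append(path)
        except OSError:
            continue  # unreadable file: skip it rather than abort the whole scan
    return sorted(hits, key=lambda p: p.stat().st_mtime, reverse=True)
```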
Then open an SR and upload these logs to Oracle Support for further analysis.
References
ORA-609 TNS-12537 and TNS-12547 in 11g Alert.log (Doc ID 609.1)
11g and Newer: ORA-609 TNS-12537 and TNS-12547 or TNS-12170 in DB Alert.log (Doc ID 1116960.1)
Troubleshooting Guide ORA-609 : Opiodr aborting process unknown ospid (Doc ID 1121357.1)
Alert Log Errors: ORA-609 & TNS-12641 - Authentication Service Failed To Initialize (Doc ID 2426368.1)