I. The overall connection flow
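1. A. Client: (1) A-->B initiates the connection; (9) interacts with the Server Process, completing the connection
2. B. Listener process: (2) B-->C forks a child process and waits; (7) B-->D passes the client information
3. C. Listener child process 1: (3) C-->D forks a child process; (4) C-->B the child process exits
4. D. Child process 2 (Server Process): (5) D-->D execs oracle; (6) D-->B sends data to the listener; (8) D-->A interacts with the client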
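The step numbers above give the chronological order, but they are grouped by participant. A small Python sketch that re-sorts them into a single timeline; the STEPS table and the timeline function are illustrative only and not part of Oracle:

```python
# The nine numbered steps from the overview, grouped by participant as listed
# (A = client, B = listener, C = listener child 1, D = child 2 / Server Process).
STEPS = {
    "A": [(1, "A-->B initiate connection"), (9, "interact with Server Process")],
    "B": [(2, "B-->C fork child and wait"), (7, "B-->D pass client information")],
    "C": [(3, "C-->D fork child"), (4, "C-->B child exits")],
    "D": [(5, "D-->D exec oracle"), (6, "D-->B send data to listener"),
          (8, "D-->A interact with client")],
}


def timeline(steps):
    """Flatten the per-participant lists into one list ordered by step number."""
    events = [(num, who, what) for who, items in steps.items() for num, what in items]
    return sorted(events)


if __name__ == "__main__":
    for num, who, what in timeline(STEPS):
        print(f"({num}) {who}: {what}")
```

Read in order, the listener (B) appears only at steps 2 and 7, which matches the trace below: it forks, blocks, and later hands the client over to the Server Process.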
II. The listener handles a connection in the following steps.
Trace it with an OS tool:
strace -rf -o /gyj/lsnr.log -p 4913
1. The listener accepts the client's TCP connection and receives the TNS packet sent by the client
4926 0.000053 getsockname(8, {sa_family=AF_INET6, sin6_port=htons(1521),inet_pton(AF_INET6, "::", &sin6_addr), sin6_flowinfo=0,sin6_scope_id=0}, [9169787475114065948]) = 0
4926 0.000226 getpeername(8, 0x7fff2c68e5f8, [9169787475114065948]) = -1 ENOTCONN (Transport endpoint is not connected)
4926 0.000055 accept(8, {sa_family=AF_INET6, sin6_port=htons(42055),inet_pton(AF_INET6, "::ffff:192.168.0.103", &sin6_addr),sin6_flowinfo=0, sin6_scope_id=0}, [120259084316]) = 12
4926 0.000063 getsockname(12, {sa_family=AF_INET6, sin6_port=htons(1521),inet_pton(AF_INET6, "::ffff:192.168.0.103", &sin6_addr),sin6_flowinfo=0, sin6_scope_id=0}, [120259084316]) = 0
4926 0.000051 fcntl(12, F_SETFL,O_RDONLY|O_NONBLOCK) = 0
4926 0.000034 getsockopt(12, SOL_SOCKET, SO_SNDBUF, [3200064202492396996],[4]) = 0
4926 0.000033 getsockopt(12, SOL_SOCKET, SO_RCVBUF, [3200064202492433792],[4]) = 0
4926 0.000036 setsockopt(12, SOL_TCP, TCP_NODELAY, [1], 4) = 0
4926 0.000087 fcntl(12, F_SETFD, FD_CLOEXEC) = 0
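Each trace line produced by `strace -f -r` starts with the PID (because of `-f`) and the time since the previous syscall began (because of `-r`), followed by the syscall text. A minimal Python sketch of a parser for such lines, assuming that layout; `parse_line` is a hypothetical helper and only extracts the syscall name, it does not decode arguments:

```python
import re

# PID, relative timestamp, then the rest of the line (the syscall text).
LINE_RE = re.compile(r"^(?P<pid>\d+)\s+(?P<delta>\d+\.\d+)\s+(?P<call>.*)$")
# Syscall name, also for continuation lines such as "<... wait4 resumed>".
NAME_RE = re.compile(r"^(?:<\.\.\.\s*)?(?P<name>[a-z_0-9]+)")


def parse_line(line):
    """Return (pid, delta_seconds, syscall_name, raw_call), or None for non-trace lines.

    Signal lines such as "--- SIGCHLD ... ---" yield syscall_name None.
    """
    m = LINE_RE.match(line.strip())
    if not m:
        return None
    call = m.group("call")
    n = NAME_RE.match(call)
    name = n.group("name") if n else None
    return int(m.group("pid")), float(m.group("delta")), name, call
```

Grouping the parsed tuples by PID is a quick way to separate the listener (4926) from its children (10209, 10210) in a long log.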
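2. The listener opens pipes for communicating with its child and forks a child process, the one we called "listener child process 1" above; its PID here is 10209. The listener then waits until child 10209 exits.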
4926 0.000053 pipe([13, 14]) = 0
4926 0.000037 pipe([15, 16]) = 0
4926 0.000042 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2b3d814b1320) = 10209
4926 0.000765 wait4(10209, <unfinished ...>
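3. While the listener waits for 10209 to exit, child 10209 does something quite simple: it only forks another child, the one we called "child process 2" above, with PID 10210. As soon as 10209 has forked 10210, it exits: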
10209 0.000116 clone(child_stack=0,flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,child_tidptr=0x2b3d814b1320) = 10210
10209 0.001169 exit_group(0) = ?
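Steps 2 and 3 form the classic double-fork pattern. Because the intermediate child exits right away, the listener's wait4() returns almost immediately, and the grandchild (the future Server Process) is no longer the listener's child: it is re-parented, typically to init, so the listener never has to reap it. A minimal Python sketch of the pattern, assuming a POSIX system; the function name and the use of the pipe to report the grandchild's PID are illustrative, not what tnslsnr actually sends over its pipes:

```python
import os


def double_fork_with_pipe():
    """Fork a child that forks a grandchild and exits at once (POSIX only).

    Returns (intermediate_pid, intermediate_exit_status, grandchild_pid).
    """
    r, w = os.pipe()                      # like pipe([13, 14]) in the trace
    mid = os.fork()
    if mid == 0:                          # intermediate child ("listener child 1")
        if os.fork() == 0:                # grandchild ("child 2")
            os.close(r)
            os.write(w, str(os.getpid()).encode())
            os.close(w)
            os._exit(0)
        os._exit(0)                       # exit right after forking, like exit_group(0)
    os.close(w)                           # the parent only reads
    _, status = os.waitpid(mid, 0)        # like wait4(10209, ...): returns almost at once
    data = b""
    while True:
        chunk = os.read(r, 64)            # EOF once every write end is closed
        if not chunk:
            break
        data += chunk
    os.close(r)
    return mid, os.WEXITSTATUS(status), int(data)
```

The returned grandchild PID differs from both the caller and the intermediate child, just as 10210 differs from 4926 and 10209.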
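4. Back in the main listener process: once child 10209 has exited, the listener reads from the pipe. This is a blocking operation that returns only after data has been read from the pipe: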
4926 0.000567 <... wait4 resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 10209
4926 0.000046 --- SIGCHLD (Child exited) @ 0 (0) ---
4926 0.000040 close(13) = 0
4926 0.000055 close(16) = 0
4926 0.000063 fcntl(15, F_SETFD, FD_CLOEXEC) = 0
4926 0.000056 fcntl(14, F_SETFD, FD_CLOEXEC) = 0
4926 0.000127 fcntl(12, F_SETFD, FD_CLOEXEC) = 0
4926 0.000270 poll([{fd=8, events=POLLIN|POLLRDNORM}, {fd=11,events=POLLIN|POLLRDNORM}, {fd=15, events=POLLIN|POLLRDNORM}, {fd=14,events=0}], 4, -1 <unfinished ...>
10210 0.000197 close(14) = 0
10210 0.000073 close(15) = 0
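The poll() call in step 4 waits on the listening sockets and on the pipe read end (fd 15) with a timeout of -1, that is, with no timeout at all. A minimal Python sketch of waiting on a pipe read end the same way; `wait_for_pipe` is a hypothetical helper, and its timeout parameter is added only so it can also be used without blocking forever:

```python
import os
import select


def wait_for_pipe(read_fd, timeout_ms=-1):
    """Block in poll() until read_fd is readable, then return the data.

    Mirrors {fd=15, events=POLLIN|POLLRDNORM} with timeout -1 in the trace:
    a negative timeout waits forever. Returns None if the timeout expires.
    """
    p = select.poll()
    p.register(read_fd, select.POLLIN | select.POLLRDNORM)
    if not p.poll(timeout_ms):
        return None
    return os.read(read_fd, 4096)
```

With the default timeout the call simply sleeps until the other end of the pipe writes, which is exactly the state the listener is in while child 10210 turns itself into the Server Process.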
5. While the listener is blocked, "child process 2" (PID 10210) calls exec and becomes the Oracle Server Process:
10210 0.000319 setsid() = 10210
10210 0.000088 geteuid() = 500
10210 0.000112 setsid() = -1 EPERM (Operation not permitted)
10210 0.000169 execve("/u01/app/oracle/product/11g/bin/oracle",["oracleocp", "(LOCAL=NO)"], [/* 29 vars */]) = 0
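The two setsid() calls behave as POSIX specifies: the first makes the child a session leader and returns its PID (10210), and the second fails with EPERM because the caller is by then a process group leader. execve() then replaces the program while setting argv[0] to "oracleocp", which is why ps lists the server process as oracleocp (LOCAL=NO). A minimal Python sketch of the same sequence, assuming a POSIX system; /bin/sh stands in for the oracle binary and the argv values are illustrative:

```python
import os


def setsid_then_exec(program, argv):
    """Fork a child that calls setsid() twice, then execs `program` with `argv`.

    Returns (outcome of the second setsid as text, exit status of the exec'd program).
    """
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(r)
        os.setsid()                       # 1st call succeeds: child becomes session leader
        try:
            os.setsid()                   # 2nd call: already a process group leader
            result = b"ok"
        except PermissionError:           # Python raises this for errno EPERM
            result = b"EPERM"
        os.write(w, result)
        os.close(w)                       # don't leak the pipe into the new program
        try:
            os.execv(program, argv)       # argv[0] need not match the file name
        except OSError:
            os._exit(127)                 # reached only if execv itself fails
    os.close(w)
    report = os.read(r, 16).decode()
    os.close(r)
    _, status = os.waitpid(pid, 0)
    return report, os.WEXITSTATUS(status)
```

For example, `setsid_then_exec("/bin/sh", ["oracleocp", "-c", "exit 0"])` reports "EPERM" for the second call, as in the trace.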
6. The Server Process performs its initialization and then writes data into the pipe:
10210 0.000041 fstat(3, {st_mode=S_IFREG|0644, st_size=12755, ...}) = 0
10210 0.000043 mmap(NULL, 1053208, PROT_READ|PROT_EXEC,MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x2b9880bc3000
10210 0.000031 mprotect(0x2b9880bc5000, 1044480, PROT_NONE) = 0
10210 0.000030 mmap(0x2b9880cc4000, 4096, PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1000) = 0x2b9880cc4000
10210 0.000036 close(3) = 0
10210 0.000054 open("/u01/app/oracle/product/11g/lib/libocr11.so",O_RDONLY) = 3
10210 0.000040 read(3,"\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0`\302\0\0\0\0\0\0"...,832) = 832
10210 0.000039 fstat(3, {st_mode=S_IFREG|0644, st_size=1590995, ...}) = 0
10210 0.000043 mmap(NULL, 4096, PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b9880cc5000
10210 0.000046 mmap(NULL, 1743432, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,3, 0) = 0x2b9880cc6000
10210 0.000031 mprotect(0x2b9880d6d000, 1048576, PROT_NONE) = 0
10210 0.000031 mmap(0x2b9880e6d000, 12288, PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xa7000) = 0x2b9880e6d000
10210 0.000044 close(3) = 0
10210 0.000032 open("/u01/app/oracle/product/11g/lib/libocrb11.so",O_RDONLY) = 3
10210 0.000039 read(3,"\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\340{\0\0\0\0\0\0"...,832) = 832
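The 832-byte reads above are the dynamic loader reading the ELF header of each shared library it maps. The bytes shown decode as a 64-bit little-endian ELF shared object (e_type 3, ET_DYN) for x86-64 (e_machine 0x3e). A minimal Python sketch that decodes just those fields; it looks only at the first 20 bytes, and its name tables are deliberately incomplete:

```python
import struct

ELF_MAGIC = b"\x7fELF"
ET_NAMES = {1: "ET_REL", 2: "ET_EXEC", 3: "ET_DYN", 4: "ET_CORE"}
EM_NAMES = {3: "EM_386", 62: "EM_X86_64", 183: "EM_AARCH64"}


def describe_elf_header(head):
    """Decode class, byte order, e_type and e_machine from the start of an ELF header."""
    if len(head) < 20 or head[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    ei_class, ei_data = head[4], head[5]
    endian = "<" if ei_data == 1 else ">"            # EI_DATA: 1 = little-endian
    e_type, e_machine = struct.unpack(endian + "HH", head[16:20])
    return {
        "class": {1: "ELF32", 2: "ELF64"}.get(ei_class, "?"),
        "data": {1: "little-endian", 2: "big-endian"}.get(ei_data, "?"),
        "type": ET_NAMES.get(e_type, e_type),
        "machine": EM_NAMES.get(e_machine, e_machine),
    }
```

Feeding it the bytes from the libocr11.so read (`\177ELF\2\1\1`, nine zero bytes, then `\3\0>\0`) yields ELF64, little-endian, ET_DYN, EM_X86_64.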
7. Up to this point nothing looks abnormal. Reading further, however, we can see where the problem lies:
10210 0.000042 uname({sys="Linux", node="localhost.localdomain", ...}) = 0
10210 0.000112 open("/etc/resolv.conf", O_RDONLY) = 9
10210 0.000047 read(9, "search localdomain\nnameserver 10"..., 4096) = 43
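These calls show child 10210 obtaining the node name localhost.localdomain and then opening /etc/resolv.conf, the resolver configuration file. The following read(9, "search localdomain\nnameserver 10"..., 4096) = 43 reads that file; the truncated text starting with 10 is presumably the IP address of the name server, which means host names will be resolved through that server.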
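Only two kinds of resolv.conf entries matter here: `search` domains and `nameserver` addresses. A minimal Python sketch of extracting them, following the rules in the resolv.conf man page (lines beginning with `;` or `#` are comments, and the last `search`/`domain` line wins); `parse_resolv_conf` is a hypothetical helper, not what glibc does internally:

```python
def parse_resolv_conf(text):
    """Return (search_domains, nameservers) from resolv.conf-style text."""
    search, nameservers = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue                       # blank line or comment
        key, *values = line.split()
        if key == "nameserver" and values:
            nameservers.append(values[0])
        elif key in ("search", "domain"):
            search = values                # the last search/domain line wins
    return search, nameservers
```

Every address in the returned nameserver list is one the resolver may wait on; if none of them answers, each lookup costs the full timeout.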
Next:
10210 0.0000057 connect(9, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.54.170.70")}, 28) = 0
10210 0.0000056 poll([{fd=9, events=POLLIN}], 1, 5000 <unfinished ...>
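This shows child 10210 sending a request to 10.54.170.70 on UDP port 53, the DNS port, to resolve localhost.localdomain.
The poll() call waits for the reply with a timeout of 5000 ms, i.e. 5 s. Note the <unfinished ...>: the call has not returned yet, and as the following lines show, it comes back only when the full 5 s timeout expires, which means resolving localhost.localdomain is failing.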
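The connect()/poll() pair is an ordinary UDP DNS exchange. Connecting a UDP socket only records the peer address and sends nothing, so connect() returns 0 even if no DNS server is listening at 10.54.170.70; the failure shows up only as poll() timing out. A minimal Python sketch of one such query, assuming a standard RFC 1035 query packet; `build_dns_query` and `query_once` are hypothetical helpers, and the real glibc resolver additionally retries and may send several queries:

```python
import select
import socket
import struct


def build_dns_query(name, qid=0x1234, qtype=1):
    """Minimal DNS query: 12-byte header (RD flag set, one question) + QNAME/QTYPE/QCLASS."""
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    qname = b"".join(bytes([len(label)]) + label.encode("ascii")
                     for label in name.rstrip(".").split(".")) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)   # QTYPE (1 = A), QCLASS IN


def query_once(server, name, timeout_ms=5000, port=53):
    """Send one query over UDP and poll for the reply, like connect(...) + poll(..., 5000).

    Returns the raw reply, or None on timeout. An ICMP "port unreachable" would
    instead make recv() raise ConnectionRefusedError.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.connect((server, port))      # UDP connect: no packet is sent here
        sock.send(build_dns_query(name))
        p = select.poll()
        p.register(sock.fileno(), select.POLLIN)
        if not p.poll(timeout_ms):
            return None                   # "= 0 (Timeout)" in the trace
        return sock.recv(512)
    finally:
        sock.close()
```

Pointing `query_once` at an address where nothing answers reproduces the 5 s wait of a single poll.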
Then:
10210 4.055269 <... poll resumed> ) = 0 (Timeout)
10210 0.000119 poll([{fd=9, events=POLLIN}], 1, 5000 <unfinished ...>
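This shows that poll() in child 10210 timed out, after which it polls again.
Counting the calls in the trace, child 10210 polls 4 times in total and each poll times out after 5 s, so 10210 waits 20 s altogether.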
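The 20 s figure is simple arithmetic on the trace: number of timed-out polls times the poll timeout, 4 × 5000 ms = 20 s. A small Python sketch that does the same count over strace output for one PID, assuming strace's usual formatting (`poll([{fd=9, events=POLLIN}], 1, 5000 ...` and `<... poll resumed> ) = 0 (Timeout)`); `dns_wait_seconds` is a hypothetical helper:

```python
import re

POLL_RE = re.compile(r"poll\(\[\{fd=(\d+), events=POLLIN\}\], 1, (\d+)")
TIMEOUT_RE = re.compile(r"poll resumed>.*= 0 \(Timeout\)")


def dns_wait_seconds(lines, pid):
    """Estimate time lost by `pid` to timed-out polls: timeouts * poll timeout."""
    timeout_ms = None
    timeouts = 0
    for line in lines:
        fields = line.split(None, 2)                 # pid, delta, syscall text
        if len(fields) < 3 or fields[0] != str(pid):
            continue
        m = POLL_RE.search(fields[2])
        if m:
            timeout_ms = int(m.group(2))             # remember the poll timeout
        elif TIMEOUT_RE.search(fields[2]):
            timeouts += 1
    return timeouts * (timeout_ms or 0) / 1000.0
```

Running it over the whole lsnr.log for PID 10210 gives a quick estimate of the connection delay caused by name resolution alone.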
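This is the root cause of why every connection to this database, however it is made, takes 20 s to succeed! There is no need to analyze the rest of the listener's work, because we have found the answer.
Check the DNS settings. On an internal network that does not need Internet access, simply remove the DNS server entry from /etc/resolv.conf; if Internet access is needed, specify the IP address of a reachable DNS server.
After a correct DNS server was configured, the connection performance problem no longer occurred.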
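The trace above is essentially enough to locate the problem. Now let's reproduce the slow connection: only after reproducing it do you see how simple the problem really is.
Here we change only /etc/resolv.conf, deliberately entering a wrong DNS server IP address, and leave everything else unchanged.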
vi /etc/resolv.conf
; generated by /sbin/dhclient-script
search localdomain
nameserver 192.168.217.130
# Note: 192.168.217.130 here is not a real DNS server, just an arbitrary IP.
Now connect right away with sqlplus:
[oracle@ocm ~]$ sqlplus gyj/gyj@ocm
The connection is very slow: it waits roughly 10 s (be patient) before it finally connects. After that, everything works normally.
[oracle@ocm ~]$ date;sqlplus gyj/gyj@ocm <<EOF;date
> exit
> EOF
Mon Apr 29 21:54:45 CST 2013
SQL*Plus: Release 11.2.0.1.0 Production on Mon Apr 29 21:54:45 2013
Copyright (c) 1982, 2009, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
gyj@OCM> Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Mon Apr 29 21:54:56 CST 2013
Compare the two timestamps: they are 11 s apart.
Mon Apr 29 21:54:45 CST 2013
Mon Apr 29 21:54:56 CST 2013
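The 11 s is just the difference between the two `date` timestamps. A quick Python check, assuming the `date` output format shown above, with CST matched as a literal string since both times carry the same zone; `elapsed_seconds` is a hypothetical helper:

```python
from datetime import datetime

# Format of the `date` output above; "CST" is matched literally.
DATE_FMT = "%a %b %d %H:%M:%S CST %Y"


def elapsed_seconds(start, end):
    """Seconds between two `date` outputs (each has 1-second resolution)."""
    t0 = datetime.strptime(start, DATE_FMT)
    t1 = datetime.strptime(end, DATE_FMT)
    return (t1 - t0).total_seconds()
```

Because `date` only prints whole seconds, the real delay is known only to within about one second, and it also includes SQL*Plus's normal start-up and login time.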
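Fix: configure the correct DNS server IP in resolv.conf. If the database server has no external network access, simply delete the nameserver 192.168.217.131 line.
If the entry in resolv.conf is changed to 8.8.8.8, the connection time grows to 30 s, somewhat longer than before (for this test 8.8.8.8 must not be pingable from the host, so cut off the external network by whatever means necessary).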
****************************************************************************************************
What if you get a different error instead?
The error is as follows:
[oracle@ocm ~]$ sqlplus gyj/gyj@ocm
SQL*Plus: Release 11.2.0.1.0 Production on Mon Apr 29 20:14:30 2013
Copyright (c) 1982, 2009, Oracle. All rights reserved.
ERROR:
ORA-12545: Connect failed because target host or object does not exist
Enter user-name:
ERROR:
ORA-01017: invalid username/password; logon denied
Enter user-name:
ERROR:
ORA-01017: invalid username/password; logon denied
SP2-0157: unable to CONNECT to ORACLE after 3 attempts, exiting SQL*Plus
Check the following configuration files one by one:
1. Check /etc/nsswitch.conf
[root@ocm ~]# more /etc/nsswitch.conf
hosts: files dns
2. Check /etc/hosts
[root@ocm ~]# more /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
192.168.217.130 ocm.example.com ocm
3. Check /etc/resolv.conf
[root@ocm ~]# more /etc/resolv.conf
; generated by /sbin/dhclient-script
search localdomain
nameserver 192.168.217.131
4. Check the DNS zone file
[root@ocm named]# more /var/named/chroot/var/named/example.file
$TTL 86400
@ IN SOA server1.example.com. root (
42 ; serial (d. adams)
3H ; refresh
15M ; retry
1W ; expiry
1D ) ; minimum
IN NS server1.example.com.
server1 IN A 192.168.217.130
ocm IN A 192.168.217.130
ocp IN A 172.34.45.57
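The zone file maps short names to addresses relative to the zone origin. Assuming the zone is loaded as example.com (as the NS record server1.example.com. suggests), a minimal Python sketch that collects its A records; `a_records` is a hypothetical helper that only understands the simple `name IN A address` form used here, not full zone-file syntax:

```python
def a_records(zone_text, origin):
    """Return {fully-qualified name: address} for 'name IN A address' lines."""
    records = {}
    for raw in zone_text.splitlines():
        line = raw.split(";", 1)[0].strip()          # drop ';' comments
        parts = line.split()
        if len(parts) == 4 and parts[1] == "IN" and parts[2] == "A":
            name = parts[0]
            if not name.endswith("."):               # relative name: append origin
                name = f"{name}.{origin.rstrip('.')}."
            records[name] = parts[3]
    return records
```

Applied to the zone above, ocm.example.com. maps to 192.168.217.130, the same address /etc/hosts gives for ocm.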
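/etc/nsswitch.conf defines the order in which host name lookups are tried, but not every application follows the order it specifies.
/etc/hosts is, by default, the system's first resolution source.
/etc/resolv.conf defines the IP address of the DNS server the system uses.
The last file, example.file, is the zone file responsible for resolving the whole example.com domain.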
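With `hosts: files dns`, a lookup consults /etc/hosts first and goes to DNS only if the name is not found there. A minimal Python sketch of reading both files, using the contents from the checks above; `parse_hosts` and `hosts_order` are hypothetical helpers, and glibc's real parsers handle more cases (action items such as `[NOTFOUND=return]` are simply skipped here):

```python
def parse_hosts(text):
    """Map each name/alias in /etc/hosts-style text to its address (first match wins)."""
    table = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()          # drop comments
        if not line:
            continue
        addr, *names = line.split()
        for name in names:
            table.setdefault(name, addr)
    return table


def hosts_order(nsswitch_text):
    """Return the lookup sources from the 'hosts:' line of nsswitch.conf, in order."""
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if line.startswith("hosts:"):
            return [tok for tok in line[len("hosts:"):].split() if not tok.startswith("[")]
    return []
```

Since `files` comes first and /etc/hosts already lists ocm, a lookup of ocm that honors this order never needs the DNS server.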
**************************************************
Reproducing the slowness is easy (the goal is to force lookups through DNS):
1. Set up a DNS server
For details, see: http://blog.youkuaiyun.com/guoyjoe/article/details/16982179
[root@mydb named]# vi /var/named/chroot/var/named/example.file
$TTL 86400
@ IN SOA guoyjoe.example.com. root (
42 ; serial (d. adams)
3H ; refresh
15M ; retry
1W ; expiry
1D ) ; minimum
IN NS guoyjoe.example.com.
guoyjoe IN A 192.168.153.129
mydb IN A 192.168.153.129
2. /etc/nsswitch.conf
# Put DNS first (originally: hosts: files dns)
hosts: dns files
3. vi /etc/resolv.conf
; generated by /sbin/dhclient-script
search localdomain
# Deliberately wrong DNS server (the correct one is 192.168.153.129)
nameserver 192.168.153.130
4. vi /etc/hosts
192.168.153.129 mydb.example.com mydb
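The reproduction relies on the order change: with `hosts: dns files`, every lookup of mydb first goes to the wrong name server, waits for the resolver to time out, and only then falls back to /etc/hosts, so the connection still succeeds, just slowly. A toy Python model of that behavior; `resolve` is hypothetical, and the DNS delay is a parameter you supply (the first trace measured about 20 s), not something the model computes:

```python
def resolve(name, order, sources, dns_delay_s):
    """Walk the nsswitch order and return (address or None, seconds spent).

    sources maps a source name ("files", "dns") to {name: address}; an empty
    "dns" table models a wrong or unreachable name server that never answers.
    """
    spent = 0.0
    for src in order:
        table = sources.get(src, {})
        if src == "dns" and not table:
            spent += dns_delay_s          # every query times out, then we move on
            continue
        if name in table:
            return table[name], spent
    return None, spent
```

With `files` first the answer comes back immediately; with `dns` first the same answer arrives only after the full DNS delay, which is the slow login observed above.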
********** All content on this blog is original; if you repost it, please credit the author and the source! **********
Name: guoyJoe
QQ: 252803295
Email: oracledba_cn@hotmail.com
Blog: http://blog.youkuaiyun.com/guoyJoe
ITPUB: http://www.itpub.net/space-uid-28460966.html
OCM: http://education.oracle.com/education/otn/YGuo.HTM