2017-04-26 03:41:59,354 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception for BP-1687617170-10.8.211.11-1404135347814:blk_8843355098_1109841848870
java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.8.147.58:50010 remote=/10.8.144
.40:52022]
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
at java.io.DataInputStream.read(DataInputStream.java:149)
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:446)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:702)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:739)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:124)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
at java.lang.Thread.run(Thread.java:745)
2017-04-26 03:41:59,354 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: IOExc
2017-04-26 03:41:09,552 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.8.147.51, datanodeUuid=cd05df30-c9ce-4e3d-bbe3-f4ea6aecc1cc, infoPort=50075, ipcPort=50020, storageInfo=lv=-55;cid=CID-df8798d4-d28a-4aba-9446-8d17b04a16b7;nsid=423722359;c=1404135597320):Got exception while serving BP-1687617170-10.8.211.11-1404135347814:blk_8835640234_1109834110012 to /10.8.216.18:56644
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.8.147.51:50010 remote=/10.8.216.18:56644]
at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172)
at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220)
at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:547)
at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:716)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:506)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:110)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
at java.lang.Thread.run(Thread.java:745)
2017-04-26 03:41:09,552 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: dn0205:50010:DataXceiver error processing READ_BLOCK operation src: /10.8.216.18:56644 dst: /10.8.147.51:50010
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.8.147.51:50010 remote=/10.8.216.18:56644]
at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172)
at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220)
at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:547)
at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:716)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:506)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:110)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
at java.lang.Thread.run(Thread.java:745)
Also we can see SYN flooding cause the java task enter hung status .
Apr 26 02:43:18 dn0205 kernel: possible SYN flooding on port 13562. Sending cookies.
Apr 26 03:53:06 dn0205 kernel: INFO: task java:11671 blocked for more than 120 seconds.
Apr 26 03:53:06 dn0205 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 26 03:53:06 dn0205 kernel: INFO: task java:11671 blocked for more than 120 seconds.
Apr 26 03:53:06 dn0205 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 26 03:53:06 dn0205 kernel: java D 0000000000000013 0 11671 29690 0x00000080
Apr 26 03:53:06 dn0205 kernel: ffff88096dd9fe08 0000000000000082 0000000000000000 ffff88096dd9fde8
Apr 26 03:53:06 dn0205 kernel: 0000000000000000 00000000fffffffb ffff8803d83efb98 ffff88096dd9fde8
Apr 26 03:53:06 dn0205 kernel: ffff88098a8cf058 ffff88096dd9ffd8 000000000000fb88 ffff88098a8cf058
Apr 26 03:53:06 dn0205 kernel: Call Trace:
Apr 26 03:53:06 dn0205 kernel: [<ffffffff81090ece>] ? prepare_to_wait+0x4e/0x80
Apr 26 03:53:06 dn0205 kernel: [<ffffffffa01c9935>] log_wait_commit+0xc5/0x140 [jbd]
Apr 26 03:53:06 dn0205 kernel: [<ffffffff81090be0>] ? autoremove_wake_function+0x0/0x40
Apr 26 03:53:06 dn0205 kernel: [<ffffffffa01c9766>] ? __log_start_commit+0x36/0x40 [jbd]
Apr 26 03:53:06 dn0205 kernel: [<ffffffffa01da5e6>] ext3_sync_file+0x126/0x1a0 [ext3]
Apr 26 03:53:06 dn0205 kernel: [<ffffffff81110898>] ? filemap_write_and_wait_range+0x78/0x90
Apr 26 03:53:06 dn0205 kernel: [<ffffffff811a50e1>] vfs_fsync_range+0xa1/0xe0
Apr 26 03:53:06 dn0205 kernel: [<ffffffff811a518d>] vfs_fsync+0x1d/0x20
Apr 26 03:53:06 dn0205 kernel: [<ffffffff811a51ce>] do_fsync+0x3e/0x60
Apr 26 03:53:06 dn0205 kernel: [<ffffffff811a5220>] sys_fsync+0x10/0x20
Apr 26 03:53:06 dn0205 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Apr 26 03:53:06 dn0205 kernel: INFO: task java:14301 blocked for more than 120 seconds.
Apr 26 03:53:06 dn0205 kernel: java D 0000000000000013 0 11671 29690 0x00000080
Apr 26 03:53:06 dn0205 kernel: ffff88096dd9fe08 0000000000000082 0000000000000000 ffff88096dd9fde8
Apr 26 03:53:06 dn0205 kernel: 0000000000000000 00000000fffffffb ffff8803d83efb98 ffff88096dd9fde8
Apr 26 03:53:06 dn0205 kernel: ffff88098a8cf058 ffff88096dd9ffd8 000000000000fb88 ffff88098a8cf058
Apr 26 03:53:06 dn0205 kernel: Call Trace:
Apr 26 03:53:06 dn0205 kernel: [<ffffffff81090ece>] ? prepare_to_wait+0x4e/0x80
Apr 26 03:53:06 dn0205 kernel: [<ffffffffa01c9935>] log_wait_commit+0xc5/0x140 [jbd]
Apr 26 03:53:06 dn0205 kernel: [<ffffffff81090be0>] ? autoremove_wake_function+0x0/0x40
Apr 26 03:53:06 dn0205 kernel: [<ffffffffa01c9766>] ? __log_start_commit+0x36/0
We can see sync flooding on other 2 nodes as well.
Apr 26 02:25:35 dn0324 kernel: possible SYN flooding on port 13562. Sending cookies.
Apr 26 02:25:35 dn0324 kernel: possible SYN flooding on port 13562. Sending cookies.
Apr 26 03:26:43 dn0324 kernel: possible SYN flooding on port 13562. Sending cookies.
Apr 26 03:26:43 dn0324 kernel: possible SYN flooding on port 13562. Sending cookies.
Apr 26 07:24:03 dn0229 kernel: possible SYN flooding on port 13562. Sending cookies.
Apr 26 07:24:03 dn0229 kernel: possible SYN flooding on port 13562. Sending cookies.
Apr 26 07:25:52 dn0229 kernel: possible SYN flooding on port 13562. Sending cookies.
Apr 26 07:25:52 dn0229 kernel: possible SYN flooding on port 13562. Sending cookies.
Apr 26 07:29:25 dn0229 kernel: possible SYN flooding on port 13562. Sending cookies.
Apr 26 07:29:25 dn0229 kernel: possible SYN flooding on port 13562. Sending cookies.
Apr 26 11:16:26 dn0229 kernel: possible SYN flooding on port 13562. Sending cookies.
Apr 26 11:16:26 dn0229 kernel: possible SYN flooding on port 13562. Sending cookies.
Apr 26 11:23:02 dn0229 auditd[23920]: Audit daemon rotating log files
Apr 26 11:23:02 dn0229 auditd[23920]: Audit daemon rotating log files
Apr 26 13:29:08 dn0229 kernel: possible SYN flooding on port 13562. Sending cookies.
Apr 26 13:29:08 dn0229 kernel: possible SYN flooding on port 13562. Sending cookies.
Apr 26 14:55:32 dn0229 kernel: possible SYN flooding on port 13562. Sending cookies.
java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.8.147.58:50010 remote=/10.8.144
.40:52022]
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
at java.io.DataInputStream.read(DataInputStream.java:149)
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:446)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:702)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:739)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:124)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
at java.lang.Thread.run(Thread.java:745)
2017-04-26 03:41:59,354 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: IOExc
2017-04-26 03:41:09,552 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.8.147.51, datanodeUuid=cd05df30-c9ce-4e3d-bbe3-f4ea6aecc1cc, infoPort=50075, ipcPort=50020, storageInfo=lv=-55;cid=CID-df8798d4-d28a-4aba-9446-8d17b04a16b7;nsid=423722359;c=1404135597320):Got exception while serving BP-1687617170-10.8.211.11-1404135347814:blk_8835640234_1109834110012 to /10.8.216.18:56644
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.8.147.51:50010 remote=/10.8.216.18:56644]
at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172)
at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220)
at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:547)
at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:716)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:506)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:110)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
at java.lang.Thread.run(Thread.java:745)
2017-04-26 03:41:09,552 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: dn0205:50010:DataXceiver error processing READ_BLOCK operation src: /10.8.216.18:56644 dst: /10.8.147.51:50010
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.8.147.51:50010 remote=/10.8.216.18:56644]
at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172)
at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220)
at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:547)
at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:716)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:506)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:110)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:68)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:232)
at java.lang.Thread.run(Thread.java:745)
Also we can see SYN flooding cause the java task enter hung status .
Apr 26 02:43:18 dn0205 kernel: possible SYN flooding on port 13562. Sending cookies.
Apr 26 03:53:06 dn0205 kernel: INFO: task java:11671 blocked for more than 120 seconds.
Apr 26 03:53:06 dn0205 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 26 03:53:06 dn0205 kernel: INFO: task java:11671 blocked for more than 120 seconds.
Apr 26 03:53:06 dn0205 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 26 03:53:06 dn0205 kernel: java D 0000000000000013 0 11671 29690 0x00000080
Apr 26 03:53:06 dn0205 kernel: ffff88096dd9fe08 0000000000000082 0000000000000000 ffff88096dd9fde8
Apr 26 03:53:06 dn0205 kernel: 0000000000000000 00000000fffffffb ffff8803d83efb98 ffff88096dd9fde8
Apr 26 03:53:06 dn0205 kernel: ffff88098a8cf058 ffff88096dd9ffd8 000000000000fb88 ffff88098a8cf058
Apr 26 03:53:06 dn0205 kernel: Call Trace:
Apr 26 03:53:06 dn0205 kernel: [<ffffffff81090ece>] ? prepare_to_wait+0x4e/0x80
Apr 26 03:53:06 dn0205 kernel: [<ffffffffa01c9935>] log_wait_commit+0xc5/0x140 [jbd]
Apr 26 03:53:06 dn0205 kernel: [<ffffffff81090be0>] ? autoremove_wake_function+0x0/0x40
Apr 26 03:53:06 dn0205 kernel: [<ffffffffa01c9766>] ? __log_start_commit+0x36/0x40 [jbd]
Apr 26 03:53:06 dn0205 kernel: [<ffffffffa01da5e6>] ext3_sync_file+0x126/0x1a0 [ext3]
Apr 26 03:53:06 dn0205 kernel: [<ffffffff81110898>] ? filemap_write_and_wait_range+0x78/0x90
Apr 26 03:53:06 dn0205 kernel: [<ffffffff811a50e1>] vfs_fsync_range+0xa1/0xe0
Apr 26 03:53:06 dn0205 kernel: [<ffffffff811a518d>] vfs_fsync+0x1d/0x20
Apr 26 03:53:06 dn0205 kernel: [<ffffffff811a51ce>] do_fsync+0x3e/0x60
Apr 26 03:53:06 dn0205 kernel: [<ffffffff811a5220>] sys_fsync+0x10/0x20
Apr 26 03:53:06 dn0205 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Apr 26 03:53:06 dn0205 kernel: INFO: task java:14301 blocked for more than 120 seconds.
Apr 26 03:53:06 dn0205 kernel: java D 0000000000000013 0 11671 29690 0x00000080
Apr 26 03:53:06 dn0205 kernel: ffff88096dd9fe08 0000000000000082 0000000000000000 ffff88096dd9fde8
Apr 26 03:53:06 dn0205 kernel: 0000000000000000 00000000fffffffb ffff8803d83efb98 ffff88096dd9fde8
Apr 26 03:53:06 dn0205 kernel: ffff88098a8cf058 ffff88096dd9ffd8 000000000000fb88 ffff88098a8cf058
Apr 26 03:53:06 dn0205 kernel: Call Trace:
Apr 26 03:53:06 dn0205 kernel: [<ffffffff81090ece>] ? prepare_to_wait+0x4e/0x80
Apr 26 03:53:06 dn0205 kernel: [<ffffffffa01c9935>] log_wait_commit+0xc5/0x140 [jbd]
Apr 26 03:53:06 dn0205 kernel: [<ffffffff81090be0>] ? autoremove_wake_function+0x0/0x40
Apr 26 03:53:06 dn0205 kernel: [<ffffffffa01c9766>] ? __log_start_commit+0x36/0
We can see sync flooding on other 2 nodes as well.
Apr 26 02:25:35 dn0324 kernel: possible SYN flooding on port 13562. Sending cookies.
Apr 26 02:25:35 dn0324 kernel: possible SYN flooding on port 13562. Sending cookies.
Apr 26 03:26:43 dn0324 kernel: possible SYN flooding on port 13562. Sending cookies.
Apr 26 03:26:43 dn0324 kernel: possible SYN flooding on port 13562. Sending cookies.
Apr 26 07:24:03 dn0229 kernel: possible SYN flooding on port 13562. Sending cookies.
Apr 26 07:24:03 dn0229 kernel: possible SYN flooding on port 13562. Sending cookies.
Apr 26 07:25:52 dn0229 kernel: possible SYN flooding on port 13562. Sending cookies.
Apr 26 07:25:52 dn0229 kernel: possible SYN flooding on port 13562. Sending cookies.
Apr 26 07:29:25 dn0229 kernel: possible SYN flooding on port 13562. Sending cookies.
Apr 26 07:29:25 dn0229 kernel: possible SYN flooding on port 13562. Sending cookies.
Apr 26 11:16:26 dn0229 kernel: possible SYN flooding on port 13562. Sending cookies.
Apr 26 11:16:26 dn0229 kernel: possible SYN flooding on port 13562. Sending cookies.
Apr 26 11:23:02 dn0229 auditd[23920]: Audit daemon rotating log files
Apr 26 11:23:02 dn0229 auditd[23920]: Audit daemon rotating log files
Apr 26 13:29:08 dn0229 kernel: possible SYN flooding on port 13562. Sending cookies.
Apr 26 13:29:08 dn0229 kernel: possible SYN flooding on port 13562. Sending cookies.
Apr 26 14:55:32 dn0229 kernel: possible SYN flooding on port 13562. Sending cookies.
Apr 26 14:55:32 dn0229 kernel: possible SYN flooding on port 13562. Sending cookies.
increase net.core.somaxconn and net.ipv4.tcp_max_syn_backlog to handle this
我们的系统默认配置如下,明显太小了
[20:00]:[root@dn0324:~]# sysctl -a | grep somaxconn
net.core.somaxconn = 128
http://www.sjsjw.com/kf_cloud/article/315_32763_12365.asp