Common Errors in Linux Kernel Logs and Their Solutions

Article link: https://blog.youkuaiyun.com/xuyaqun/article/details/5453432

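This article explains common error messages found in Linux kernel logs, including UDP checksum errors, short UDP packets, invalid ICMP type data, and abnormal TCP window sizes, and offers ways to deal with them.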


root@dee:/var/log# more kern.log.1
Mar 27 18:25:01 db24 kernel: imklog 4.4.0, log source = /proc/kmsg started.
Mar 28 18:25:01 db24 kernel: imklog 4.4.0, log source = /proc/kmsg started.
Mar 29 18:25:01 db24 kernel: imklog 4.4.0, log source = /proc/kmsg started.
Mar 30 18:25:01 db24 kernel: imklog 4.4.0, log source = /proc/kmsg started.
Mar 31 03:00:21 db24 kernel: [21944411.117618] UDP: bad checksum. From 114.80.134.10:53 to 60.217.236.24:724 ulen 43
Mar 31 03:00:21 db24 kernel: [21944411.357004] UDP: bad checksum. From 114.80.134.10:53 to 60.217.236.24:724 ulen 43
Mar 31 03:00:23 db24 kernel: [21944413.170238] UDP: bad checksum. From 114.80.134.10:53 to 60.217.236.24:724 ulen 43
Mar 31 18:25:01 db24 kernel: imklog 4.4.0, log source = /proc/kmsg started.

Resolution:

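Typical messages of this kind:
UDP: bad checksum. From 221.200.X.X:50279 to 218.62.X.X:1155 ulen 24
UDP: short packet: 218.2.X.X:3072 3640/217 to 222.168.X.X:57596
218.26.131.X sent an invalid ICMP type 3, code 13 error to a broadcast: 0.1.0.4 on eth0
The server received malformed packets. Respectively: a UDP packet with a bad checksum, a UDP packet that is too short, and an ICMP error of an invalid type. Packets like these are usually produced by illegitimate traffic.
In most cases they are harmless and can simply be ignored.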

The message originates from the UDP (udp.c) part of the IPv4 TCP/IP protocol
suite in the Linux kernel.
[udp.c:printk(KERN_DEBUG "UDP: bad checksum. From %d.%d.%d.%d:%d to %d.%d.%d.%d:%d ulen %d\n",..]
Machine 192.168.1.200 is sending malformed UDP packets to IP address
10.1.0.3, port 62516. Size is 25 bytes.
CIPE does not check the UDP checksum, leaving it to the TCP/IP stack.

As far as I can see, port 62516 is not used by CIPE, so probably it is not
_directly_ related to CIPE. I also can't find the IP address 192.168.1.200
in your configs, so it seems that the originating address isn't related to
the CIPE setup either.

Possibly some network configuration error is causing this.
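To judge whether these messages come from one noisy peer or from many, it can help to tally them by source. The following is a minimal Python sketch, assuming the log line format shown in the kern.log excerpt above; the name `count_bad_checksums` is hypothetical and not part of any tool.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   UDP: bad checksum. From 114.80.134.10:53 to 60.217.236.24:724 ulen 43
BAD_CSUM = re.compile(
    r"UDP: bad checksum\. From (?P<src>[\d.]+):(?P<sport>\d+) "
    r"to (?P<dst>[\d.]+):(?P<dport>\d+) ulen (?P<ulen>\d+)"
)

def count_bad_checksums(lines):
    """Count 'UDP: bad checksum' messages per (source IP, source port)."""
    counts = Counter()
    for line in lines:
        m = BAD_CSUM.search(line)
        if m:
            counts[(m.group("src"), int(m.group("sport")))] += 1
    return counts

# The three matching lines from the excerpt, plus one unrelated line.
sample = [
    "Mar 30 18:25:01 db24 kernel: imklog 4.4.0, log source = /proc/kmsg started.",
    "Mar 31 03:00:21 db24 kernel: [21944411.117618] UDP: bad checksum. From 114.80.134.10:53 to 60.217.236.24:724 ulen 43",
    "Mar 31 03:00:21 db24 kernel: [21944411.357004] UDP: bad checksum. From 114.80.134.10:53 to 60.217.236.24:724 ulen 43",
    "Mar 31 03:00:23 db24 kernel: [21944413.170238] UDP: bad checksum. From 114.80.134.10:53 to 60.217.236.24:724 ulen 43",
]

for (src, sport), n in count_bad_checksums(sample).most_common():
    print(f"{src}:{sport} -> {n} bad checksum(s)")
```

In the excerpt, all three messages come from the same host with source port 53, the port normally used by DNS servers, so they may simply be damaged DNS replies rather than an attack.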

root@debian:/var/log# more kern.log.1
Mar 28 06:25:01 web23 kernel: imklog 4.4.0, log source = /proc/kmsg started.
Mar 29 06:25:01 web23 kernel: imklog 4.4.0, log source = /proc/kmsg started.
Mar 30 06:25:01 web23 kernel: imklog 4.4.0, log source = /proc/kmsg started.
Mar 31 06:25:02 web23 kernel: imklog 4.4.0, log source = /proc/kmsg started.
Apr 1 06:25:01 web23 kernel: imklog 4.4.0, log source = /proc/kmsg started.
Apr 2 06:25:03 web23 kernel: imklog 4.4.0, log source = /proc/kmsg started.
Apr 2 06:25:03 web23 kernel: Kernel logging (proc) stopped.

root@debian:/var/log# more kern.log
Apr 4 06:25:05 kin kernel: imklog 3.18.6, log source = /proc/kmsg started.
Apr 4 08:56:18 kin kernel: [11957718.533521] TCP: Treason uncloaked! Peer 123.234.68.29:2822/8080 shrinks window 1398217317:1398225373. Repaired.
Apr 4 09:00:34 kin kernel: [11957979.472506] TCP: Treason uncloaked! Peer 61.178.208.178:1977/8080 shrinks window 1020634706:1020655166. Repaired.
Apr 4 09:00:36 kin kernel: [11957982.168668] TCP: Treason uncloaked! Peer 61.178.208.178:1977/8080 shrinks window 1020634706:1020655166. Repaired.

Resolution:

> Our researches so far indicate the problem may be a buggy TCP stack
> in the client, that is in the DP301P+. But we still do not know
> exactly what caused the problem, nor how to prevent it happening
> again.

That comes from the kernel tcp code below.  Looks like the DLink has
returned information yielding a transmit window smaller than it
previously did; specifically it returned a window of zero plus an ack
of up to byte 3957222360, thus indicating that it can accept nothing
after that byte.  Previously it had sent some ack+wnd values
indicating that it would accept up to byte 3957222379.

The Linux side is now supposed to send a packet every now and then
forever until the returned window is nonzero.  It does.

However, the dlink is apparently not responding in a timely manner.
Any response would either open the window or update the rcv timestamp
such that the thing will retransmit forever.  It may be responding
very slowly, or just not responding at all.

The kernel prints the message after it expected but did not see a
response to the probe packet it sent to check for a nonzero window.
The kernel implements exponential backoff retransmissions until it
hasn't seen any response in 2m, then it will bail and close the
connection.  This is reasonable.  It's unclear from your report if the
connections are failing outright or just sometimes having to
retransmit a probe against a peer that shrank the window.
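The exponential backoff described in the reply can be illustrated with a toy model. This is only a sketch: the initial wait of 0.5 s and the doubling factor are assumptions chosen for illustration, the 120 s cutoff is taken from the "2m" mentioned above, and the real kernel timer logic is more involved.

```python
def backoff_schedule(initial=0.5, factor=2.0, give_up_after=120.0):
    """Return the waits (in seconds) between unanswered probes.

    Each wait is `factor` times the previous one. The schedule stops
    before the total time without a response would exceed
    `give_up_after` seconds. All defaults are illustrative assumptions.
    """
    waits = []
    elapsed = 0.0
    wait = initial
    while elapsed + wait <= give_up_after:
        waits.append(wait)
        elapsed += wait
        wait *= factor
    return waits

waits = backoff_schedule()
print(waits)
print(f"{len(waits)} probes over {sum(waits)} seconds")
```

Under these assumed parameters only a handful of probes fit into the two minutes, which is why a peer that stays silent is dropped fairly quickly while a merely slow peer keeps the connection alive.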

The remote host decided to shrink the TCP window size without negotiating such with your Linux box. The message is of the informational level, meaning Linux doesn't like what it is seeing but will cope with it and carry on.

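In other words: while the remote client was talking to the server, it shrank its advertised TCP window (the amount of data it is willing to receive) without negotiating with the server, and that triggers this message. The server still handles such connections, but throughput may be lower than before.

Put roughly: this is an informational message. The remote host shrank the TCP window size without the Linux host's "consent"; Linux dislikes this behavior but keeps handling the connection.

That is one interpretation, under which these messages are probably not dangerous.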

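When this happens, check your server's traffic monitoring for anomalies. If the bandwidth is saturated, your ISP (China Telecom or China Unicom) may be throttling you. The best remedy is to pay for more bandwidth. Alternatively, look for things to optimize on the web server, for example whether gzip is enabled and whether response times are reasonable. Such tuning is better than leaving users with a throttled, crawling connection, but it is a stopgap for those on a tight budget, not a final fix.

For this kind of problem, adding bandwidth is probably the best approach.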

Another explanation comes from a Debian mailing list. The responder first quotes the source code that produces the message:

From /usr/src/linux/net/ipv4/tcp_timer.c:

        if (tp->snd_wnd == 0 && !sk->dead &&
            !((1<<sk->state)&(TCPF_SYN_SENT|TCPF_SYN_RECV))) {
                /* Receiver dastardly shrinks window. Our retransmits
                 * become zero probes, but we should not timeout this
                 * connection. If the socket is an orphan, time it out,
                 * we cannot allow such beasts to hang infinitely.
                 */
#ifdef TCP_DEBUG
                if (net_ratelimit())
                        printk(KERN_DEBUG "TCP: Treason uncloaked! Peer "
                               "%u.%u.%u.%u:%u/%u shrinks window %u:%u. Repaired.\n",
                               NIPQUAD(sk->daddr), htons(sk->dport), sk->num,
                               tp->snd_una, tp->snd_nxt);
#endif

The explanation given is:

So it appears that someone is running some sort of "tar-pit" system that is
designed to keep sockets in a bad state and run you out of kernel memory.

I suspect that this ties in with the spam blocking things we recently
discussed. Maybe you should tell your ISP that they are to blame for such
actions being done to you and that they should "give you face" (I think that
was the term you used) by closing their open relays.

The author believes this may be related to a tar-pit attack and suggests contacting the ISP for a solution.
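According to the printk arguments in the tcp_timer.c excerpt quoted earlier, the fields of the message are the peer address, the peer port, the local port, and then `tp->snd_una` and `tp->snd_nxt`. The difference between the last two is the amount of data sent but not yet acknowledged. The following Python sketch, assuming the log format shown in kern.log above, extracts these fields; `parse_treason` is a hypothetical helper name.

```python
import re

# Field order follows the printk call: peer IP, peer port, local port,
# snd_una (oldest unacknowledged byte), snd_nxt (next byte to send).
TREASON = re.compile(
    r"Treason uncloaked! Peer (?P<peer>[\d.]+):(?P<pport>\d+)/(?P<lport>\d+) "
    r"shrinks window (?P<una>\d+):(?P<nxt>\d+)\. Repaired\."
)

def parse_treason(line):
    """Return a dict describing a 'Treason uncloaked' line, or None."""
    m = TREASON.search(line)
    if m is None:
        return None
    una, nxt = int(m.group("una")), int(m.group("nxt"))
    return {
        "peer": m.group("peer"),
        "peer_port": int(m.group("pport")),
        "local_port": int(m.group("lport")),
        # TCP sequence numbers are 32-bit and wrap around, hence the modulo.
        "unacked_bytes": (nxt - una) % 2**32,
    }

line = ("Apr 4 08:56:18 kin kernel: [11957718.533521] TCP: Treason uncloaked! "
        "Peer 123.234.68.29:2822/8080 shrinks window 1398217317:1398225373. Repaired.")
print(parse_treason(line))
```

For the first log line this gives 8056 unacknowledged bytes on a connection to local port 8080. Grouping the results by peer address would show whether a few clients account for most of the messages.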

There is also the following explanation:

The reason Linux is printing such messages is because your client guy is shrinking the TCP Window to 0, and the server has something to retransmit. There is something seriously wrong with your client's stack. Which Stack/OS are you using on the client side, and which browser?

That could explain your browser showing some html tags as the server fails to send the whole page across and based on what browser you are using it is failing to parse it out.

In other words, this error can also be caused by a faulty TCP stack on the client.

I believe the "TCP: Treason Uncloaked" messages were somehow involved with SYN flood attacks. I think that they were also responsible for causing the rapid depletion of entropy.

To review what was done:

Hardware:

Ethernet card replacement from Intel e1000 to Broadcom 1000.
Switch port change from 3 to 5.

OS:

Changed rngd settings to increase entropy
Made changes to sysctl kernel ipv4 stack as prescribed in Gentoo and WHT pages.
I think the network outages may be caused by a DoS worm or some such thing, if restarting the network is fixing the outage. So this calls for a smarter iptables script! Added SYN flood limit, ICMP limit, and fixed eth0->eth2 directives. SYN flood limit was a terrible idea. It DoS'ed right away. Duh. You can't reliably limit SYN floods via iptables.
Now trying to recompile the kernel with syncookies enabled. It looks like syn cookies may be the answer to all this garbage. Seems to be working pretty well! I THINK THIS FIXED THE PROBLEM!

http://cr.yp.to/syncookies.html

Also dropping pings which occur at a rate greater than one per second, it is logging them and there were indeed a bunch of pings coming in with large sizes.
While I'm at it, increasing the backlog:

sysctl -w net.ipv4.tcp_max_syn_backlog="2048"
(from: http://www.securityfocus.com/infocus/1729)

Also:
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_abort_on_overflow = 1
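Before changing any of these settings, it may be useful to record their current values. The following Python sketch reads them from `/proc/sys`, where Linux exposes sysctl keys as files. It assumes a Linux host; `net.ipv4.tcp_syncookies` is added here as the runtime switch for SYN cookies, and the helper names are hypothetical.

```python
from pathlib import Path

# Keys mentioned in the notes above, plus the SYN cookies switch.
SETTINGS = [
    "net.ipv4.tcp_syncookies",
    "net.ipv4.tcp_max_syn_backlog",
    "net.ipv4.tcp_synack_retries",
    "net.ipv4.tcp_abort_on_overflow",
]

def sysctl_path(name, root="/proc/sys"):
    """Map a dotted sysctl key to its file under /proc/sys.

    Simplification: keys whose components contain dots themselves
    (for example some interface names) are not handled.
    """
    return Path(root).joinpath(*name.split("."))

def read_sysctl(name, root="/proc/sys"):
    """Return the current value as a string, or None if unreadable."""
    try:
        return sysctl_path(name, root).read_text().strip()
    except OSError:
        return None

if __name__ == "__main__":
    for name in SETTINGS:
        print(f"{name} = {read_sysctl(name)}")
```

A value of `None` means the key does not exist on this kernel or the file cannot be read. Values set with `sysctl -w` are lost on reboot unless they are also written to the system's sysctl configuration.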

There is no definitive answer; an attack is the most likely cause. See http://www.informedbanking.com/wiki/TCP_Treason_Uncloaked

4. # more /var/log/nginx/error.log

2010/05/05 09:38:55 [error] 6233#0: *1572820 open() "/data/webroot/wow/js/jquery.js" failed (2: No such file or directory), client: 124.128.18.162, server: wow.china.com, request: "GET /js/jquery.js HTTP/1.1", host: "wow.china.com", referrer: "http://wow.china.com/"
2010/05/05 09:38:55 [error] 6233#0: *1572824 open() "/data/webroot/wow/js/jquery.floatfix.js" failed (2: No such file or directory), client: 124.128.18.162, server: wow.china.com, request: "GET /js/jquery.floatfix.js HTTP/1.1", host: "wow.china.com", referrer: "http://wow.china.com/"

Cause: the application referenced the wrong paths. After the unused code was removed, the errors stopped.
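To find every path the application references but the server cannot open, the error log can be summarized per file. This is a minimal Python sketch, assuming the nginx error format shown above; `missing_files` is a hypothetical helper name and the sample lines are shortened copies of the log entries.

```python
import re
from collections import Counter

# Matches: open() "<path>" failed (2: No such file or directory)
MISSING = re.compile(
    r'open\(\) "(?P<path>[^"]+)" failed \(2: No such file or directory\)'
)

def missing_files(lines):
    """Count 'No such file or directory' errors per requested path."""
    counts = Counter()
    for line in lines:
        m = MISSING.search(line)
        if m:
            counts[m.group("path")] += 1
    return counts

sample = [
    '2010/05/05 09:38:55 [error] 6233#0: *1572820 open() "/data/webroot/wow/js/jquery.js" failed (2: No such file or directory), client: 124.128.18.162',
    '2010/05/05 09:38:55 [error] 6233#0: *1572824 open() "/data/webroot/wow/js/jquery.floatfix.js" failed (2: No such file or directory), client: 124.128.18.162',
]

for path, n in missing_files(sample).most_common():
    print(n, path)
```

Paths with high counts are the first candidates to fix in the page templates or to remove if the script is no longer used.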

2010-5-10

5. # more /var/log/mail.log (mail server)

First occurrence: May 7 18:45:31

May 7 18:45:31 kin postfix/smtpd[5305]: connect from cnsmtpr2.tom.com[218.30.111.152]
May 7 18:45:31 kin postfix/smtpd[5305]: E959B4DC05: client=cnsmtpr2.tom.com[218.30.111.152]
May 7 18:45:32 kin postfix/cleanup[5309]: E959B4DC05: message-id=<4BE3EF4B.00008B.07181@cnapp61>
May 7 18:45:32 kin postfix/qmgr[6672]: E959B4DC05: from=<c111055abfdc@tom.com>, size=8480, nrcpt=1 (queue active)
May 7 18:45:32 kin postfix/smtpd[5305]: disconnect from cnsmtpr2.tom.com[218.30.111.152]
May 7 18:45:33 kin amavis[26583]: (26583-20) (!)ClamAV-clamd: Can't connect to UNIX socket /var/run/clamav/clamd.ctl: 111, retrying (2)
May 7 18:45:39 kin amavis[26583]: (26583-20) (!!)ClamAV-clamd av-scanner FAILED: run_av error: Too many retries to talk to /var/run/clamav/clamd.ctl (Can't connect to UNIX socket /var/run/clamav/clamd.ctl: Connection refused) at (eval 88) line 309.
May 7 18:45:39 kin amavis[26583]: (26583-20) (!!)WARN: all primary virus scanners failed, considering backups
May 7 18:45:39 kin amavis[26583]: (26583-20) (!!)ClamAV-clamscan av-scanner FAILED: /usr/bin/clamscan DIED on signal 11 (000b) at (eval 88) line 527.
May 7 18:45:39 kin amavis[26583]: (26583-20) (!!)TROUBLE in check_mail: virus_scan FAILED: virus_scan: ALL VIRUS SCANNERS FAILED: ClamAV-clamd av-scanner FAILED: run_av error: Too many retries to talk to /var/run/clamav/clamd.ctl (Can't connect to UNIX socket /var/run/clamav/clamd.ctl: Connection refused) at (eval 88) line 309.; ClamAV-clamscan av-scanner FAILED: /usr/bin/clamscan DIED on signal 11 (000b) at (eval 88) line 527.
May 7 18:45:39 kin amavis[26583]: (26583-20) (!)PRESERVING EVIDENCE in /var/lib/amavis/tmp/amavis-20100507T101533-26583
May 7 18:45:39 kin postfix/smtp[5310]: E959B4DC05: to=<hr@kin.com>, relay=127.0.0.1[127.0.0.1]:10024, delay=7.7, delays=0.47/0/0/7.3, dsn=4.5.0, status=deferred (host 127.0.0.1[127.0.0.1] said: 451 4.5.0 Error in processing, id=26583-20, virus_scan FAILED: virus_scan: ALL VIRUS SCANNERS FAILED: ClamAV-clamd av-scanner FAILED: run_av error: Too many retries to talk to /var/run/clamav/clamd.ctl (Can't connect to UNIX socket /var/run/clamav/clamd.ctl: Connection refused) at (eval 88) line 309.; ClamAV-clamscan av-scanner FAILED: /usr/bin/clamscan DIED on signal 11 (000b) at (eval 88) line 527. (in reply to end of DATA command))

Cause: other people online report running into this after updating the virus database.

Fix: restart clamav-daemon and clamav-freshclam first, then restart amavis.

Please try to restart the clamav daemon and afterwards the amavis daemon.
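After restarting the daemons, one way to confirm that clamd is accepting connections again is to try connecting to its UNIX socket. The following Python sketch assumes the socket path from the amavis log above; `clamd_socket_alive` is a hypothetical helper name, and a successful connect only shows that something is listening, not that scanning works.

```python
import socket

def clamd_socket_alive(path="/var/run/clamav/clamd.ctl", timeout=2.0):
    """Return True if something accepts connections on the UNIX socket."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect(path)
        return True
    except OSError:
        # Covers a missing socket file as well as "Connection refused"
        # (errno 111 on Linux, the code reported in the amavis log).
        return False
    finally:
        s.close()

if __name__ == "__main__":
    state = "up" if clamd_socket_alive() else "down"
    print(f"clamd socket is {state}")
```

The script must run as a user that is allowed to access the socket file; a permission error is also reported as "down".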

6. root@kiss:/var/log# more mail.err (mail server warning log)

May 16 10:19:10 kiss postfix/smtp[3713]: warning: numeric domain name in resource data of MX record for zhonghaolawfirm.com: 219.153.37.118
May 16 11:29:10 kiss postfix/smtp[4082]: warning: numeric domain name in resource data of MX record for zhonghaolawfirm.com: 219.153.37.118

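Cause: the MX record of this domain is set to an IP address, which violates the RFC. The correct setup is to point the MX record at a hostname that has an A record, and to point that A record at the IP address.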

The MX record on the domain is set to an IP address. This violates the
RFC for DNS RRsets. You need to create an A record for that IP address
and point the MX record to that A record. This will solve your problem.
If you do not have control over this domain, the domain administrators
need to learn how to set things up properly.
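The condition Postfix warns about can be checked for any MX target with the standard `ipaddress` module. This is a small Python sketch; `mx_target_is_ip` is a hypothetical helper name, and `mail.example.com` is a placeholder hostname, not one from the logs.

```python
import ipaddress

def mx_target_is_ip(target):
    """Return True if an MX target is a literal IP address, not a hostname."""
    try:
        # DNS tools often print names with a trailing dot; strip it first.
        ipaddress.ip_address(target.rstrip("."))
        return True
    except ValueError:
        return False

for target in ["219.153.37.118", "mail.example.com."]:
    verdict = "numeric (misconfigured)" if mx_target_is_ip(target) else "hostname (ok)"
    print(f"{target}: {verdict}")
```

The MX targets themselves would come from a DNS lookup, for example the output of `dig MX` for the domain, which this sketch does not perform.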