【网络编程】TCP连接connect几次syn之后一直返回EINVAL问题

最近遇到一个网络问题,一个客户端线程在connect的时候,发几次syn之后不发了,每次connect都返回EINVAL。

用strace追踪了,connect的第一次参数socketfd并未变动,而且地址和端口号也是正确的,第三个参数len更是用sizeof获得的肯定不会有问题。

还好问题比较好复现。

逐步加打印是在__inet_stream_connect函数中返回的EINVAL
https://elixir.bootlin.com/linux/v5.15.178/source/net/ipv4/af_inet.c#L649

	switch (sock->state) {
	default:
		err = -EINVAL; /* 后面connect系统调用一直返回-22,而不触发syn报文发送 */
		goto out;
	case SS_CONNECTED:
		err = -EISCONN;
		goto out;
	case SS_CONNECTING:
		if (inet_sk(sk)->defer_connect)
			err = is_sendmsg ? -EINPROGRESS : -EISCONN;
		else
			err = -EALREADY;
		/* Fall out of switch with err, set for this state */
		break;
	case SS_UNCONNECTED:
		err = -EISCONN;
		if (sk->sk_state != TCP_CLOSE)
			goto out;

		if (BPF_CGROUP_PRE_CONNECT_ENABLED(sk)) {
			err = sk->sk_prot->pre_connect(sk, uaddr, addr_len);
			if (err)
				goto out;
		}
... ...
		err = sk->sk_prot->connect(sk, uaddr, addr_len);
		if (err < 0)
			goto out;

		sock->state = SS_CONNECTING;
	/* Connection was closed by RST, timeout, ICMP error
	 * or another process disconnected us.
	 */
	if (sk->sk_state == TCP_CLOSE)
		goto sock_error;

	/* sk->sk_err may be not zero now, if RECVERR was ordered by user
	 * and error was received after socket entered established state.
	 * Hence, it is handled normally after connect() return successfully.
	 */

	sock->state = SS_CONNECTED;
	err = 0;
out:
	return err;

sock_error:
	err = sock_error(sk) ? : -ECONNABORTED;
	sock->state = SS_UNCONNECTED;
	if (sk->sk_prot->disconnect(sk, flags))
		sock->state = SS_DISCONNECTING; /* 注意这里是关键,最后一次syn之后超时,disconnect返回失败就把sock状态设置成disconnecting */
	goto out;
}

继续加打印为什么sk->sk_prot->disconnect会返回失败?返回值是EBUSY
就是这里:
https://elixir.bootlin.com/linux/v5.15.178/source/net/ipv4/tcp.c#L2989

int tcp_disconnect(struct sock *sk, int flags)
{
... ...
	/* Deny disconnect if other threads are blocked in sk_wait_event()
	 * or inet_wait_for_connect().
	 */
	if (sk->sk_wait_pending)
		return -EBUSY; /* 这里返回出错 */

那就是sk_wait_pending值不为0,那看sk_wait_pending修改的位置
https://elixir.bootlin.com/linux/v5.15.178/source/include/net/sock.h#L1128

#define sk_wait_event(__sk, __timeo, __condition, __wait)		\
	({	int __rc;						\
		__sk->sk_wait_pending++;				\
		release_sock(__sk);					\
		__rc = __condition;					\
		if (!__rc) {						\
			*(__timeo) = wait_woken(__wait,			\
						TASK_INTERRUPTIBLE,	\
						*(__timeo));		\
		}							\
		sched_annotate_sleep();					\
		lock_sock(__sk);					\
		__sk->sk_wait_pending--;				\
		__rc = __condition;					\
		__rc;							\
	})

而sk_wait_event是在
https://elixir.bootlin.com/linux/v5.15.178/source/net/core/stream.c#L75

/**
 * sk_stream_wait_connect - Wait for a socket to get into the connected state
 * @sk: sock to wait on
 * @timeo_p: for how long to wait
 *
 * Must be called with the socket locked.
 */
int sk_stream_wait_connect(struct sock *sk, long *timeo_p)
{
	DEFINE_WAIT_FUNC(wait, woken_wake_function);
	struct task_struct *tsk = current;
	int done;

	do {
		int err = sock_error(sk);
		if (err)
			return err;
		if ((1 << sk->sk_state) & ~(TCPF_SYN_SENT | TCPF_SYN_RECV))
			return -EPIPE;
		if (!*timeo_p)
			return -EAGAIN;
		if (signal_pending(tsk))
			return sock_intr_errno(*timeo_p);

		add_wait_queue(sk_sleep(sk), &wait);
		sk->sk_write_pending++;
		done = sk_wait_event(sk, timeo_p,
				     !READ_ONCE(sk->sk_err) &&
				     !((1 << READ_ONCE(sk->sk_state)) &
				       ~(TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)), &wait);
		remove_wait_queue(sk_sleep(sk), &wait);
		sk->sk_write_pending--;
	} while (!done);
	return 0;
}
EXPORT_SYMBOL(sk_stream_wait_connect);

sk_stream_wait_connect这个是在tcp send的时候调用的。
加打印可以看到connect线程和send线程在同时操作这个socketfd,根本原因是connect线程连接发送几个syn包后连接失败返回超时,内核会执行disconnect,而此时正好send线程走到wait for connect中,导致disconnect失败返回EBUSY,进而把sock状态设置成了disconnecting,后面每次connect系统调用就会直接返回EINVAL,不会触发syn报文的发送。

解决办法就是在send参数的flags中传递MSG_DONTWAIT,使得send线程不会去走wait for connect,如果未connect直接返回错误。这时connect线程每次调用都会触发syn报文。

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值