An epoll-based asynchronous connect implementation

iteye_4515

Tags: epoll, web crawler

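This article describes how to relieve a web crawler's concurrency problems under high load by making the socket non-blocking and using epoll to connect asynchronously. It covers timeout control, the error and success callbacks, and how the epoll_wait loop is organized.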

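These past few days I was writing a web crawler, and the server kept blocking in connect for long stretches, which caused intolerable concurrency problems. So I set out to add an asynchronous connect interface.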

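The conventional approach pairs connect with select to detect completion, but select performs poorly under high concurrency. Keeping the structure simple with a one-peer-one-thread design would instead cause excessive thread context switches and waste performance, so I dropped the idea of using select.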

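Since the server's network library is already built on epoll, this interface is built on epoll as well. After reading up on the topic, here is a summary of the flow: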

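1. Set the socket to non-blocking

2. Call connect

3. Add the socket to the epoll watch set

4. Watch for EPOLLOUT or EPOLLHUP and call back into the upper-layer handler to deal with the result. If the socket looks ready (only EPOLLOUT was reported), additionally call getsockopt and check that SO_ERROR is 0

5. Remove the fd from epoll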
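
The five steps above can be sketched as one function. This is a minimal illustration rather than the article's library code: `connect_with_epoll` is a hypothetical name, it assumes IPv4 on Linux, and it creates a private epoll instance and waits on a single socket, whereas a real server would register the socket with its shared epoll loop.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not from the article).
 * Returns a connected fd, or -1 on error or timeout. */
int connect_with_epoll(const char *ip, int port, int timeout_ms)
{
    assert(ip != NULL);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons((uint16_t)port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return -1;                      /* not a valid IPv4 literal */

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Step 1: make the socket non-blocking. */
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fd);
        return -1;
    }

    /* Step 2: start the connect. */
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0)
        return fd;                      /* connected immediately */
    if (errno != EINPROGRESS) {
        close(fd);
        return -1;
    }

    /* Step 3: register the socket for writability. */
    int epfd = epoll_create1(0);
    if (epfd < 0) {
        close(fd);
        return -1;
    }
    struct epoll_event ev;
    memset(&ev, 0, sizeof(ev));
    ev.events = EPOLLOUT;               /* EPOLLERR/EPOLLHUP are always reported */
    ev.data.fd = fd;

    int ok = 0;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == 0) {
        /* Step 4: wait, then ask the kernel for the connect result. */
        struct epoll_event out;
        if (epoll_wait(epfd, &out, 1, timeout_ms) == 1) {
            int err = 0;
            socklen_t len = sizeof(err);
            if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) == 0 && err == 0)
                ok = 1;
        }
        /* Step 5: remove the fd from epoll. */
        epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL);
    }
    close(epfd);

    if (!ok) {
        close(fd);                      /* error, timeout, or interrupted wait */
        return -1;
    }
    return fd;
}
```

Calling `connect_with_epoll("127.0.0.1", 8080, 2000)` would either return a connected, still non-blocking fd or give up after about two seconds. The sketch checks SO_ERROR whichever event fired, which covers both the EPOLLOUT-only case and the error case with one code path.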

OK, with the flow clear, let's get to work.

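Interface requirements:

1. A timeout can be set so that timing is easy to control; second-level precision is enough

2. Success, error, and timeout are all reported to the upper layer through a callback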

Data structures and interface:

    #include <stdint.h>

    #define LNET_CONN_SUCESS  0
    #define LNET_CONN_TIMEOUT 1
    #define LNET_CONN_ERROR   2

    /* Opaque user argument handed back to the callback. */
    typedef union {
        int      u32;
        uint64_t u64;
        void*    ptr;
    } conn_arg_t;

    /* Callback: ev is one of the LNET_CONN_* codes above. */
    typedef void (*pconn)(int fd, int ev, conn_arg_t arg);

    typedef struct conn_event {
        int        fd;
        int        timeout;
        int        is_block;
        int        conn_time;   // start connect time
        pconn      pc;
        conn_arg_t arg;
    } conn_ev;

Implementation:

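If connect succeeds immediately, there is no point in adding the socket to epoll and waiting. So a small optimization applies :) : the socket is added to epoll only when connect fails with errno set to EINPROGRESS.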

    // async connect
    // return:
    //   > 0 : connect succeeded, the returned fd can be used right away
    //   -1  : error
    //    0  : connect is in progress
    //
    // arg:
    //   timeout: >= 0 as normal, < 0 infinite
    int lnet_conn_a(char* ip, int port, int is_block, int timeout, pconn pfunc, conn_arg_t arg)
    {
        int sockfd = -1;
        int s = net_conn_a(ip, port, &sockfd);
        if( s == 0 ){
            // connect success
            return sockfd;
        }
        else if( s == -1 ){
            // connect error
            return -1;
        }
        else{
            // connect is in progress
            conn_ev* cev = (conn_ev*)malloc(sizeof(conn_ev));
            if( cev == NULL ){
                // out of memory: do not leak the half-open socket
                close(sockfd);
                return -1;
            }
            cev->fd = sockfd;
            cev->timeout = timeout;
            cev->is_block = is_block;
            cev->conn_time = time(NULL);
            cev->pc = pfunc;
            cev->arg = arg;   // copy the whole union

            // the event carries a virtual fd that maps back to cev
            int vfd = lnet_gen_vfd();
            hash_set_int(pnif->ev_pool, vfd, cev);
            uint64_t event_data = lnet_make_eventdata(LNET_EVENT_TYPE_CONN, vfd);
            net_epoll_add(pnif->c_epfd, sockfd, FREAD | FWRITE, (void*)event_data);
            return 0;
        }
    }

With the in-progress connection now registered with epoll and awaiting notification, let's look at how the epoll_wait side handles it:

    void* lnet_conn_base(void* arg)
    {
        int nfds = 0, i = 0;
        int epfd = pnif->c_epfd;
        int last_check_time = time(NULL);
        // sizeof(*events) is the element size; sizeof(events) would only be the pointer size
        struct epoll_event* events = (struct epoll_event*)malloc(sizeof(*events) * EPOLL_QUEUE_NUM);

        printf("start conn thread\n");
        while(1){
            nfds = epoll_wait(epfd, events, EPOLL_QUEUE_NUM, LNET_CONN_TIME_OUT);
            for (i=0; i<nfds; ++i){
                uint64_t data = events[i].data.u64;
                int vfd = lnet_get_event_fd((uint64_t)data);

                // connect failed or the peer hung up
                if( events[i].events & (EPOLLHUP | EPOLLERR) ){
                    uint64_t event_data = lnet_make_eventdata(LNET_EVENT_TYPE_CONN_ERROR, vfd);
                    lnet_push_queue(pnif->r_pool, vfd, (void*)event_data);
                }
                // else-if: an fd that reported an error must not also be queued as a success
                else if( events[i].events & EPOLLOUT ){
                    uint64_t event_data = lnet_make_eventdata(LNET_EVENT_TYPE_CONN_SUCESS, vfd);
                    lnet_push_queue(pnif->r_pool, vfd, (void*)event_data);
                }
            }

            // run the timeout scan at most once every LNET_CONN_CHECK_ALTER seconds,
            // even when epoll_wait keeps returning early under load
            if( time(NULL) - last_check_time >= LNET_CONN_CHECK_ALTER ){
                lnet_check_conn_timeout();
                last_check_time = time(NULL);
            }
        }

        free(events);
        return NULL;
    }

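The timeout handling in this loop is a bit crude. It does not use a time wheel (for that approach see Chen Shuo's blog post http://blog.youkuaiyun.com/Solstice/archive/2011/05/04/6395098.aspx) and instead polls by brute force. That is tolerable while the number of concurrent connections is small, but it needs improvement.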
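
For comparison, a time wheel can be sketched in a few dozen lines. This is a simplified illustration with one-second ticks and a fixed number of slots, not the design from the linked post and not part of the article's library. All names here are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define WHEEL_SLOTS 64          /* supports timeouts of 1..63 ticks */

typedef struct wheel_node {
    int id;                     /* caller's handle, e.g. a vfd */
    struct wheel_node *next;
} wheel_node;

typedef struct {
    wheel_node *slots[WHEEL_SLOTS];
    int cur;                    /* slot belonging to the current tick */
} time_wheel;

typedef void (*expire_cb)(int id, void *ctx);

void wheel_init(time_wheel *w)
{
    assert(w != NULL);
    for (int i = 0; i < WHEEL_SLOTS; ++i)
        w->slots[i] = NULL;
    w->cur = 0;
}

/* Schedule id to expire after `ticks` ticks. Returns 0 on success, -1 on error. */
int wheel_add(time_wheel *w, int id, int ticks)
{
    assert(w != NULL);
    if (ticks < 1 || ticks >= WHEEL_SLOTS)
        return -1;
    wheel_node *n = malloc(sizeof(*n));
    if (n == NULL)
        return -1;
    int slot = (w->cur + ticks) % WHEEL_SLOTS;
    n->id = id;
    n->next = w->slots[slot];
    w->slots[slot] = n;
    return 0;
}

/* Advance one tick and expire every entry in the new current slot.
 * Returns the number of expired entries. */
int wheel_tick(time_wheel *w, expire_cb cb, void *ctx)
{
    assert(w != NULL);
    w->cur = (w->cur + 1) % WHEEL_SLOTS;
    wheel_node *n = w->slots[w->cur];
    w->slots[w->cur] = NULL;    /* detach first so cb may call wheel_add safely */
    int count = 0;
    while (n != NULL) {
        wheel_node *next = n->next;
        if (cb != NULL)
            cb(n->id, ctx);
        free(n);
        n = next;
        ++count;
    }
    return count;
}

/* Example callback: remember the most recently expired id. */
void record_last(int id, void *ctx)
{
    *(int *)ctx = id;
}
```

Each tick touches only the entries due in that second, so the cost no longer grows with the total number of pending connections. The sketch has no cancel operation. One option is lazy cancellation: when an id expires, the callback looks it up in the pending table and ignores it if the connect already finished.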

Next come the concrete events to handle: success, error, and timeout.

Here is the handling for the success case:

    ... other unrelated event handling omitted above ...
    else if( event_type == LNET_EVENT_TYPE_CONN_SUCESS ){
        conn_ev* cev = hash_get_int(pnif->ev_pool, vfd);
        if( cev ){
            int error = 0;
            socklen_t len = sizeof(error);
            if (( 0 == getsockopt(cev->fd, SOL_SOCKET, SO_ERROR, &error, &len) )){
                if( 0 == error ){
                    cev->pc(cev->fd, LNET_CONN_SUCESS, cev->arg);
                }
                else{
                    printf("connect fd is not ready! fd=%d\n", cev->fd);
                    cev->pc(cev->fd, LNET_CONN_ERROR, cev->arg);
                }
                // remove the fd from epoll before freeing cev, since cev->fd is read here
                // (assuming hash_del_int returns the stored cev pointer)
                net_epoll_del(pnif->c_epfd, cev->fd);
                free( hash_del_int(pnif->ev_pool, vfd) );
            }
        }
        lnet_pop_queue(pnif->r_pool, vfd);
    }
    ...

This shows both the callback notification and the getsockopt check. With that, the asynchronous connect interface is complete.

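Of course, the timeout part still needs improvement, and parts of the code are a little messy. The main point is to understand the principle alongside the code. If anything here is wrong, corrections are welcome :)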