The usual way to implement a TCP server is "one thread/process per connection". But under high load this approach can be inefficient, so we need to use another pattern of connection handling. In this article I will describe how to implement a TCP server with asynchronous connection handling using the epoll() system call of the Linux 2.6 kernel.
epoll is a new system call introduced in Linux 2.6. It is designed to replace the deprecated select (and also poll). Unlike these earlier system calls, which are O(n), epoll is an O(1) algorithm: it scales well as the number of watched file descriptors increases. select performs a linear search through the list of watched file descriptors, which causes its O(n) behaviour, whereas epoll uses callbacks in the kernel file structure.
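To see where the O(n) comes from, compare the two polling loops: after select() returns, the application still has to test every watched descriptor with FD_ISSET() to find the ready ones, while epoll_wait() returns only the descriptors that are actually ready. A minimal sketch of the difference (watched_set and maxfd are hypothetical names; epfd, events, MAX_EPOLL_EVENTS_PER_RUN and handle_io_on_socket() are used as in the snippets later in this article):

// select: scan ALL watched descriptors on every iteration
fd_set readfds = watched_set;
if (select(maxfd + 1, &readfds, NULL, NULL, NULL) > 0)
    for (int fd = 0; fd <= maxfd; fd++)     // O(n) walk over everything
        if (FD_ISSET(fd, &readfds))
            handle_io_on_socket(fd);

// epoll: only the ready descriptors are returned
int nfds = epoll_wait(epfd, events, MAX_EPOLL_EVENTS_PER_RUN, -1);
for (int i = 0; i < nfds; i++)              // walks only ready sockets
    handle_io_on_socket(events[i].data.fd);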
Another fundamental difference of epoll is that it can be used in an edge-triggered, as opposed to level-triggered, fashion. This means that you receive "hints" when the kernel believes the file descriptor has become ready for I/O, as opposed to being told "I/O can be carried out on this file descriptor". This has a couple of minor advantages: kernel space doesn't need to keep track of the state of the file descriptor (although it might just push that problem into user space), and user space programs can be more flexible (e.g. a readiness-change notification can simply be ignored).
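For example, to use epoll in the edge-triggered fashion you register the descriptor with the EPOLLET flag and make the socket non-blocking, so that you can drain all available data on each notification. A minimal sketch (requires <fcntl.h> and <sys/epoll.h>), assuming epfd and client_sock are set up as in the snippets below:

// non-blocking mode is effectively mandatory with EPOLLET: once the
// buffered data is drained, a further read() must not block
int flags = fcntl(client_sock, F_GETFL, 0);
fcntl(client_sock, F_SETFL, flags | O_NONBLOCK);

struct epoll_event ev;
ev.events = EPOLLIN | EPOLLET;   // report readiness *changes* only
ev.data.fd = client_sock;
epoll_ctl(epfd, EPOLL_CTL_ADD, client_sock, &ev);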
To use the epoll method, you need to take the following steps in your application:
- Create a dedicated file descriptor for epoll calls:
epfd = epoll_create(EPOLL_QUEUE_LEN);
where EPOLL_QUEUE_LEN is the maximum number of connection descriptors you expect to manage at one time (since Linux 2.6.8 this size hint is ignored, but it must be greater than zero). The return value is a file descriptor that will be used in later epoll calls. This descriptor can be closed with close() when you no longer need it.
- After the first step you can add your descriptors to epoll with the following call:
static struct epoll_event ev;
int client_sock;
...
ev.events = EPOLLIN | EPOLLPRI | EPOLLERR | EPOLLHUP;
ev.data.fd = client_sock;
int res = epoll_ctl(epfd, EPOLL_CTL_ADD, client_sock, &ev);
where ev is the epoll event configuration structure and EPOLL_CTL_ADD is the predefined command constant for adding sockets to epoll. A detailed description of the epoll_ctl flags can be found in the epoll_ctl(2) man page. When the client_sock descriptor is closed, it is automatically removed from the epoll set.
- When all your descriptors have been added to epoll, your process can idle and wait for something to do on the epoll'ed sockets:
static struct epoll_event events[MAX_EPOLL_EVENTS_PER_RUN];

while (1) {
    // wait for something to do...
    int nfds = epoll_wait(epfd, events,
                          MAX_EPOLL_EVENTS_PER_RUN,
                          EPOLL_RUN_TIMEOUT);
    if (nfds < 0) die("Error in epoll_wait!");

    // for each ready socket
    for (int i = 0; i < nfds; i++) {
        int fd = events[i].data.fd;
        handle_io_on_socket(fd);
    }
}
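handle_io_on_socket() above is application-specific and not defined in this article. A minimal sketch of what it could look like for an echo-style server with the level-triggered setup shown above (the buffer size is an arbitrary assumption, and short-write handling is omitted):

#include <unistd.h>

void handle_io_on_socket(int fd)
{
    char buf[4096];
    ssize_t n = read(fd, buf, sizeof(buf));
    if (n <= 0) {
        // EOF or error: close() also removes the socket from epoll
        close(fd);
        return;
    }
    write(fd, buf, n);   // echo back; a real server would process the data
}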
The typical architecture of the networking part of your application is described below. This architecture allows almost unlimited scalability of your application on single- and multi-processor systems:
- Listener – a thread that performs the bind() and listen() calls and waits for incoming connections. When a new connection arrives, this thread can accept() it on the listening socket and send the accepted connection socket to one of the I/O workers.
- I/O-Worker(s) – one or more threads that receive connections from the listener and add them to epoll. The main loop of a generic I/O worker looks like the last step of the epoll usage pattern described above.
- Data Processing Worker(s) – one or more threads that receive data from and send data to the I/O workers and perform the data processing. A minimal sketch of the listener/worker hand-off follows this list.
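Here is that sketch, assuming a single I/O worker and using a pipe to pass accepted socket descriptors between the threads (all names here are hypothetical; error handling and the data-processing workers are omitted):

#include <pthread.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

void handle_io_on_socket(int fd);   // as sketched earlier

static int handoff[2];   // pipe: the listener writes fds, the worker reads them

static void *listener(void *arg)
{
    int listen_sock = *(int *)arg;   // already bound and listening
    for (;;) {
        int conn = accept(listen_sock, NULL, NULL);
        if (conn >= 0)
            write(handoff[1], &conn, sizeof(conn));   // hand off to the worker
    }
    return NULL;
}

static void *io_worker(void *arg)
{
    struct epoll_event ev, events[64];
    int epfd = epoll_create(64);

    // watch the hand-off pipe itself, so new connections wake the worker
    ev.events = EPOLLIN;
    ev.data.fd = handoff[0];
    epoll_ctl(epfd, EPOLL_CTL_ADD, handoff[0], &ev);

    for (;;) {
        int nfds = epoll_wait(epfd, events, 64, -1);
        for (int i = 0; i < nfds; i++) {
            int fd = events[i].data.fd;
            if (fd == handoff[0]) {          // a new connection arrived
                int conn;
                read(handoff[0], &conn, sizeof(conn));
                ev.events = EPOLLIN;
                ev.data.fd = conn;
                epoll_ctl(epfd, EPOLL_CTL_ADD, conn, &ev);
            } else {
                handle_io_on_socket(fd);     // as in the main loop above
            }
        }
    }
    return NULL;
}

Before starting the threads you would call pipe(handoff) and spawn the listener and the worker with pthread_create(). To scale to several I/O workers, give each worker its own pipe and let the listener distribute connections round-robin.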
As you can see, the epoll() API is very simple, but believe me, it is very powerful. Linear scalability allows you to manage huge numbers of parallel connections with a small number of worker processes compared to the classical one-thread-per-connection approach.
If you want to read more about epoll or look at some benchmarks, you can visit the epoll Scalability Web Page at Sourceforge. Other interesting resources are:
- The C10K problem: the best-known page about handling many connections, covering various I/O paradigms including epoll().
- libevent: a high-level event-handling library on top of epoll. This page contains some information about performance tests of epoll.