I have recently been learning socket programming and found that the epoll model is a pretty nice tool, so I studied it and wrote up my notes below.
Introduction
NAME
epoll - I/O event notification facility
SYNOPSIS
#include <sys/epoll.h>
DESCRIPTION
epoll is a variant of poll(2) that can be used either as Edge or Level Triggered
interface and scales well to large numbers of watched fds. Three system calls are
provided to set up and control an epoll set: epoll_create(2), epoll_ctl(2),
epoll_wait(2).
An epoll set is connected to a file descriptor created by epoll_create(2). Interest
for certain file descriptors is then registered via epoll_ctl(2). Finally,
the actual wait is started by epoll_wait(2).
NOTES
The epoll event distribution interface is able to behave both as Edge Triggered
(ET) and Level Triggered (LT). The difference between the ET and LT event
distribution mechanisms can be described as follows. Suppose that this scenario happens:
1 The file descriptor that represents the read side of a pipe (RFD) is
  added inside the epoll device.
2 Pipe writer writes 2 kB of data on the write side of the pipe.
3 A call to epoll_wait(2) is done that will return RFD as ready file descriptor.
4 The pipe reader reads 1 kB of data from RFD.
5 A call to epoll_wait(2) is done.
If the RFD file descriptor has been added to the epoll interface using the EPOLLET
flag, the call to epoll_wait(2) done in step 5 will probably hang despite the
available data still present in the file input buffer; meanwhile the remote peer
might be expecting a response based on the data it already sent. The reason for this is
that Edge Triggered event distribution delivers events only when changes occur on
the monitored file. So, in step 5 the caller might end up waiting for some data
that is already present inside the input buffer. In the above example, an event on
RFD will be generated because of the write done in 2 and the event is consumed in
3. Since the read operation done in 4 does not consume the whole buffer data, the
call to epoll_wait(2) done in step 5 might block indefinitely.
The epoll interface, when used with the EPOLLET flag (Edge Triggered), should use
non-blocking file descriptors to avoid having a blocking read or write starve the
task that is handling multiple file descriptors. The suggested way to use epoll as
an Edge Triggered (EPOLLET) interface is below, and possible pitfalls to avoid follow.
i with non-blocking file descriptors
ii by going to wait for an event only after read(2) or write(2) return EAGAIN
On the contrary, when used as a Level Triggered interface, epoll is by all means a
faster poll(2), and can be used wherever the latter is used since it shares the
same semantics. Since even with the Edge Triggered epoll multiple events can be
generated upon receipt of multiple chunks of data, the caller has the option to
specify the EPOLLONESHOT flag, to tell epoll to disable the associated file
descriptor after the receipt of an event with epoll_wait(2). When the EPOLLONESHOT
flag is specified, it is the caller's responsibility to rearm the file descriptor
using epoll_ctl(2) with EPOLL_CTL_MOD.
EXAMPLE FOR SUGGESTED USAGE
While the usage of epoll when employed like a Level Triggered interface does have
the same semantics of poll(2), an Edge Triggered usage requires more clarification
to avoid stalls in the application event loop. In this example, listener is a
non-blocking socket on which listen(2) has been called. The function do_use_fd() uses
the new ready file descriptor until EAGAIN is returned by either read(2) or
write(2). An event driven state machine application should, after having received
EAGAIN, record its current state so that at the next call to do_use_fd() it will
continue to read(2) or write(2) from where it stopped before.
struct epoll_event ev, *events;
for (;;) {
    nfds = epoll_wait(kdpfd, events, maxevents, -1);
    for (n = 0; n < nfds; ++n) {
        if (events[n].data.fd == listener) {
            client = accept(listener, (struct sockaddr *) &local, &addrlen);
            if (client < 0) {
                perror("accept");
                continue;
            }
            setnonblocking(client);
            ev.events = EPOLLIN | EPOLLET;
            ev.data.fd = client;
            if (epoll_ctl(kdpfd, EPOLL_CTL_ADD, client, &ev) < 0) {
                fprintf(stderr, "epoll set insertion error: fd=%d\n", client);
                return -1;
            }
        } else
            do_use_fd(events[n].data.fd);
    }
}
When used as an Edge triggered interface, for performance reasons, it is possible
to add the file descriptor inside the epoll interface (EPOLL_CTL_ADD) once by
specifying (EPOLLIN|EPOLLOUT). This allows you to avoid continuously switching
between EPOLLIN and EPOLLOUT calling epoll_ctl(2) with EPOLL_CTL_MOD.
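To make the "read until EAGAIN" rule a bit more concrete, here is a minimal sketch of what the read side of a do_use_fd()-style handler might look like. The names make_nonblocking and drain_fd and the 512-byte buffer are my own assumptions for illustration, not part of the man page, and a real handler would process the bytes instead of just counting them.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: switch fd to non-blocking mode via its file status flags. */
static int make_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Hypothetical helper: read from a non-blocking fd until nothing is left.
 * Returns the number of bytes consumed once read(2) reports EAGAIN or EOF,
 * or -1 on any other error.
 */
static long drain_fd(int fd)
{
    char buf[512];
    long total = 0;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n > 0) {
            total += n;          /* got data, keep going */
            continue;
        }
        if (n == 0)
            return total;        /* EOF: the writer closed its end */
        if (errno == EINTR)
            continue;            /* interrupted by a signal, retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return total;        /* buffer is empty, safe to epoll_wait again */
        return -1;               /* real error */
    }
}
```

Only after drain_fd() has returned is it safe to go back to epoll_wait(2) in edge-triggered mode. This sketch does not let the caller tell EOF apart from EAGAIN; a real handler would need to report that difference.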
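Advantages
Lets a single process keep a large number of socket descriptors open
I/O efficiency does not drop linearly as the number of FDs grows
Memory sharing
Kernel fine-tuning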
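Working modes
LT (level-triggered) is the default mode and supports both blocking and non-blocking sockets. In this mode the kernel tells you whether a file descriptor is ready, and you can then perform I/O on that ready fd. If you do nothing, the kernel keeps notifying you, so programs written this way are somewhat less error-prone. The traditional select/poll interfaces are typical examples of this model.
ET (edge-triggered) is the high-speed mode and should only be used with non-blocking sockets. In this mode the kernel notifies you through epoll when a descriptor goes from not ready to ready. It then assumes you know the descriptor is ready and sends no more readiness notifications for it until you do something that makes it not ready again (for example a send, a receive, or an accept, or a transfer of less than the requested amount, that ends in an EWOULDBLOCK error). Note that if you never perform I/O on the fd (so that it never becomes not ready again), the kernel sends no further notifications (only once). For TCP, though, how much ET really speeds things up still needs more benchmarks to confirm.
This is where ET and LT differ. With LT an event is never dropped: as long as the read buffer holds data the user can read, you keep being notified. With ET you are notified only at the moment the event occurs. The names map onto signal levels: LT fires for as long as an event remains unhandled, while ET fires only on a transition between low and high (that is, when the state goes from 1 to 0 or from 0 to 1).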
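To see the difference for myself, the pipe scenario from the man page excerpt (steps 1 to 5) can be replayed in a few lines. This is only a sketch under my own assumptions: the function name second_wait_result is made up, and step 5 uses a timeout of 0 so that the edge-triggered case returns instead of hanging.

```c
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Hypothetical helper: replay steps 1-5 with extra_flags added to EPOLLIN
 * (0 for level-triggered, EPOLLET for edge-triggered).
 * Returns what the second epoll_wait (step 5) reports, or -1 if setup failed.
 */
static int second_wait_result(unsigned int extra_flags)
{
    int p[2], epfd, first, second = -1;
    char buf[2048] = {0};
    struct epoll_event ev, out;

    if (pipe(p) == -1)
        return -1;
    epfd = epoll_create(1);
    if (epfd == -1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    ev.events = EPOLLIN | extra_flags;
    ev.data.fd = p[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == 0 &&   /* step 1 */
        write(p[1], buf, 2048) == 2048) {                    /* step 2 */
        first = epoll_wait(epfd, &out, 1, 0);                /* step 3 */
        if (first == 1 && read(p[0], buf, 1024) == 1024)     /* step 4 */
            second = epoll_wait(epfd, &out, 1, 0);           /* step 5 */
    }

    close(epfd);
    close(p[0]);
    close(p[1]);
    return second;
}
```

Calling second_wait_result(0) should give 1, because 1 kB is still unread and LT keeps reporting it, while second_wait_result(EPOLLET) should give 0, because no new write happened after the first notification.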
System calls
epoll_create
NAME
epoll_create - open an epoll file descriptor
SYNOPSIS
#include <sys/epoll.h>
int epoll_create(int size);
DESCRIPTION
Open an epoll file descriptor by requesting the kernel allocate an event backing
store dimensioned for size descriptors. The size is not the maximum size of
the backing store but just a hint to the kernel about how to dimension internal
structures. The returned file descriptor will be used for all the subsequent
calls to the epoll interface. The file descriptor returned by epoll_create(2)
must be closed by using close(2).
RETURN VALUE
When successful, epoll_create(2) returns a non-negative integer identifying the
descriptor. When an error occurs, epoll_create(2) returns -1 and errno is set
appropriately.
ERRORS
EINVAL size is not positive.
ENFILE The system limit on the total number of open files has been reached.
ENOMEM There was insufficient memory to create the kernel object.
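A small sketch of how the call and its error reporting might be wrapped. The name open_epoll is my own assumption. Note that on Linux 2.6.8 and later the size argument is ignored by the kernel, but it must still be greater than zero.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/epoll.h>

/*
 * Hypothetical wrapper: create an epoll instance with the given size hint.
 * Returns the epoll fd, or -1 with errno preserved after printing a message.
 */
static int open_epoll(int size_hint)
{
    int epfd = epoll_create(size_hint);
    if (epfd == -1) {
        int saved = errno;       /* perror() may modify errno */
        perror("epoll_create");
        errno = saved;
    }
    return epfd;
}
```

The caller is expected to close(2) the returned descriptor when it is done with it. Passing 0 or a negative hint fails with EINVAL, as listed under ERRORS above.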
epoll_wait
NAME
epoll_wait - wait for an I/O event on an epoll file descriptor
SYNOPSIS
#include <sys/epoll.h>
int epoll_wait(int epfd, struct epoll_event *events,
               int maxevents, int timeout);
DESCRIPTION
Wait for events on the epoll file descriptor epfd for a maximum time of timeout
milliseconds. The memory area pointed to by events will contain the events that
will be available for the caller. Up to maxevents are returned by
epoll_wait(2). The maxevents parameter must be greater than zero. Specifying a
timeout of -1 makes epoll_wait(2) wait indefinitely, while specifying a timeout
equal to zero makes epoll_wait(2) return immediately even if no events are
available (return code equal to zero). The struct epoll_event is defined as:
typedef union epoll_data {
    void *ptr;
    int fd;
    __uint32_t u32;
    __uint64_t u64;
} epoll_data_t;
struct epoll_event {
    __uint32_t events; /* Epoll events */
    epoll_data_t data; /* User data variable */
};
The data of each returned structure will contain the same data the user set
with an epoll_ctl(2) (EPOLL_CTL_ADD, EPOLL_CTL_MOD) while the events member will
contain the returned event bit field.
RETURN VALUE
When successful, epoll_wait(2) returns the number of file descriptors ready for
the requested I/O, or zero if no file descriptor became ready during the
requested timeout milliseconds. When an error occurs, epoll_wait(2) returns -1
and errno is set appropriately.
ERRORS
EBADF  epfd is not a valid file descriptor.
EFAULT The memory area pointed to by events is not accessible with write permissions.
EINTR  The call was interrupted by a signal handler before any of the requested
       events occurred or the timeout expired.
EINVAL epfd is not an epoll file descriptor, or maxevents is less than or equal
       to zero.
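Since EINTR is a normal outcome whenever the process handles signals, an event loop usually retries instead of treating it as fatal. Below is a minimal sketch of that idea. The name wait_retry is my own assumption, and the sketch simply restarts the full timeout on every retry, which a more careful version would shorten.

```c
#include <errno.h>
#include <sys/epoll.h>

/*
 * Hypothetical wrapper: call epoll_wait(2) and retry when a signal interrupts it.
 * Returns the number of ready descriptors, 0 on timeout, or -1 on other errors.
 */
static int wait_retry(int epfd, struct epoll_event *events,
                      int maxevents, int timeout_ms)
{
    int n;

    do {
        n = epoll_wait(epfd, events, maxevents, timeout_ms);
    } while (n == -1 && errno == EINTR);

    return n;
}
```

With an empty epoll set and a timeout of 0 this returns 0 right away. Passing maxevents as 0 fails with EINVAL and an invalid epfd fails with EBADF, matching the ERRORS list above.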
epoll_ctl
NAME
epoll_ctl - control interface for an epoll descriptor
SYNOPSIS
#include <sys/epoll.h>
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
DESCRIPTION
Control an epoll descriptor, epfd, by requesting that the operation op be performed
on the target file descriptor, fd. The event describes the object linked
to the file descriptor fd. The struct epoll_event is defined as:
typedef union epoll_data {
    void *ptr;
    int fd;
    __uint32_t u32;
    __uint64_t u64;
} epoll_data_t;
struct epoll_event {
    __uint32_t events; /* Epoll events */
    epoll_data_t data; /* User data variable */
};
The events member is a bit set composed using the following available event types:
EPOLLIN
    The associated file is available for read(2) operations.
EPOLLOUT
    The associated file is available for write(2) operations.
EPOLLRDHUP
    Stream socket peer closed connection, or shut down writing half of connection.
    (This flag is especially useful for writing simple code to detect
    peer shutdown when using Edge Triggered monitoring.)
EPOLLPRI
    There is urgent data available for read(2) operations.
EPOLLERR
    Error condition happened on the associated file descriptor. epoll_wait(2)
    will always wait for this event; it is not necessary to set it in events.
EPOLLHUP
    Hang up happened on the associated file descriptor. epoll_wait(2) will
    always wait for this event; it is not necessary to set it in events.
EPOLLET
    Sets the Edge Triggered behaviour for the associated file descriptor. The
    default behaviour for epoll is Level Triggered. See epoll(7) for more
    detailed information about Edge and Level Triggered event distribution
    architectures.
EPOLLONESHOT (since kernel 2.6.2)
    Sets the one-shot behaviour for the associated file descriptor. This means
    that after an event is pulled out with epoll_wait(2) the associated file
    descriptor is internally disabled and no other events will be reported by
    the epoll interface. The user must call epoll_ctl(2) with EPOLL_CTL_MOD to
    re-enable the file descriptor with a new event mask.
The epoll interface supports all file descriptors that support poll(2). Valid
values for the op parameter are:
EPOLL_CTL_ADD
    Add the target file descriptor fd to the epoll descriptor epfd and
    associate the event event with the internal file linked to fd.
EPOLL_CTL_MOD
    Change the event event associated with the target file descriptor fd.
EPOLL_CTL_DEL
    Remove the target file descriptor fd from the epoll file descriptor,
    epfd. The event is ignored and can be NULL (but see BUGS below).
RETURN VALUE
When successful, epoll_ctl(2) returns zero. When an error occurs, epoll_ctl(2)
returns -1 and errno is set appropriately.
ERRORS
EBADF  epfd or fd is not a valid file descriptor.
EEXIST op was EPOLL_CTL_ADD, and the supplied file descriptor fd is already in epfd.
EINVAL epfd is not an epoll file descriptor, or fd is the same as epfd, or the
       requested operation op is not supported by this interface.
ENOENT op was EPOLL_CTL_MOD or EPOLL_CTL_DEL, and fd is not in epfd.
ENOMEM There was insufficient memory to handle the requested op control operation.
EPERM  The target file fd does not support epoll.
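The three operations and two of the error codes above can be walked through with a pipe. This is just a sketch under my own assumptions: ctl_errno and ctl_lifecycle are made-up helper names, and the read end of a pipe stands in for a socket.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: run one epoll_ctl operation, return 0 or the errno value. */
static int ctl_errno(int epfd, int op, int fd, unsigned int events)
{
    struct epoll_event ev;

    ev.events = events;
    ev.data.fd = fd;
    return epoll_ctl(epfd, op, fd, &ev) == 0 ? 0 : errno;
}

/*
 * Hypothetical helper: add, re-add, modify, delete and re-delete one descriptor,
 * storing the outcome of each step in out[0..4]. Returns -1 if setup failed.
 */
static int ctl_lifecycle(int out[5])
{
    int p[2], epfd;

    if (pipe(p) == -1)
        return -1;
    epfd = epoll_create(1);
    if (epfd == -1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    out[0] = ctl_errno(epfd, EPOLL_CTL_ADD, p[0], EPOLLIN);           /* first add */
    out[1] = ctl_errno(epfd, EPOLL_CTL_ADD, p[0], EPOLLIN);           /* already registered */
    out[2] = ctl_errno(epfd, EPOLL_CTL_MOD, p[0], EPOLLIN | EPOLLET); /* switch to ET */
    out[3] = ctl_errno(epfd, EPOLL_CTL_DEL, p[0], 0);                 /* remove */
    out[4] = ctl_errno(epfd, EPOLL_CTL_DEL, p[0], 0);                 /* no longer registered */

    close(epfd);
    close(p[0]);
    close(p[1]);
    return 0;
}
```

After a successful ctl_lifecycle(out), the steps should come back as 0, EEXIST, 0, 0 and ENOENT in that order, matching the ERRORS list above.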
Example
server.c
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <strings.h>    /* bzero() */
#include <fcntl.h>
#include <sys/epoll.h>
#include <sys/time.h>
#include <sys/resource.h>
#define MAXBUF 1024
#define MAXEPOLLSIZE 10000
#define MYPORT 5000
#define LISTENQ 10
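/*
 * setnonblocking - put a descriptor into non-blocking mode
 */
int setnonblocking(int sockfd)
{
    /* F_GETFL reads the file status flags (F_GETFD would return the descriptor flags) */
    int flags = fcntl(sockfd, F_GETFL, 0);
    if (flags == -1 || fcntl(sockfd, F_SETFL, flags | O_NONBLOCK) == -1)
    {
        return -1;
    }
    return 0;
}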
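/*
 * handle_message - receive the pending messages on one socket
 * Returns 0 once the socket has been drained (recv reported EAGAIN),
 * or -1 after closing the socket (peer closed the connection or recv failed).
 */
int handle_message(int new_fd)
{
    char buf[MAXBUF + 1];
    int len;
    /* Edge-triggered mode: keep reading until no data is left on the socket */
    while (1)
    {
        bzero(buf, MAXBUF + 1);
        /* Receive a message from the client */
        len = recv(new_fd, buf, MAXBUF, 0);
        if (len > 0)
        {
            printf("%d receive msg succeed:%s,total %d Byte\n", new_fd, buf, len);
            continue;
        }
        if (len < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return 0;   /* drained, wait for the next event */
        if (len < 0 && errno == EINTR)
            continue;   /* interrupted by a signal, retry */
        if (len < 0)
            printf("receive msg failed %d,error msg is %s\n", errno, strerror(errno));
        /* len == 0 means the peer closed the connection */
        close(new_fd);
        return -1;
    }
}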
/************ About this file ****************************************
 * filename: epoll-server.c
 * purpose: demonstrate how to handle a huge number of socket connections with epoll
 *********************************************************************/
int main(int argc, char **argv)
{
    int listener, new_fd, kdpfd, nfds, n, ret, curfds;
    socklen_t len;
    struct sockaddr_in my_addr, their_addr;
    struct epoll_event ev;
    struct epoll_event events[MAXEPOLLSIZE];
    struct rlimit rt;
    /* Set the maximum number of files this process is allowed to open */
    rt.rlim_max = rt.rlim_cur = MAXEPOLLSIZE;
    if (setrlimit(RLIMIT_NOFILE, &rt) == -1)
    {
        perror("setrlimit");
        exit(1);
    }
    else
    {
        printf("set system resource succeed. \n");
    }
    /* Create the listening socket */
    if ((listener = socket(AF_INET, SOCK_STREAM, 0)) == -1)
    {
        perror("socket");
        exit(1);
    }
    else
    {
        printf("socket create succeed.\n");
    }
    setnonblocking(listener);
    bzero(&my_addr, sizeof(my_addr));
    my_addr.sin_family = AF_INET;
    my_addr.sin_port = htons(MYPORT);
    my_addr.sin_addr.s_addr = INADDR_ANY;
    if (bind(listener, (struct sockaddr *) &my_addr, sizeof(my_addr)) == -1)
    {
        perror("bind");
        exit(1);
    }
    else
    {
        printf("IP addr and port bind succeed\n");
    }
    if (listen(listener, LISTENQ) == -1)
    {
        perror("listen");
        exit(1);
    }
    else
    {
        printf("start service succeed. \n");
    }
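    len = sizeof(struct sockaddr_in);
    /* Create the epoll instance before registering any descriptor with it */
    kdpfd = epoll_create(MAXEPOLLSIZE);
    if (kdpfd == -1)
    {
        perror("epoll_create");
        exit(1);
    }
    ev.events = EPOLLIN | EPOLLET;
    ev.data.fd = listener;
    if (epoll_ctl(kdpfd, EPOLL_CTL_ADD, listener, &ev) < 0)
    {
        fprintf(stderr, "epoll set insertion error: fd=%d\n", listener);
        return -1;
    }
    else
    {
        printf("listen fd socket put into epoll succeed.\n");
    }
    curfds = 1;   /* number of descriptors currently registered */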
    while (1)
    {
        /* Wait for events */
        nfds = epoll_wait(kdpfd, events, MAXEPOLLSIZE, -1);
        if (nfds == -1)
        {
            if (errno == EINTR)
                continue;   /* interrupted by a signal, wait again */
            perror("epoll_wait");
            break;
        }
        /* Handle every reported event */
        for (n = 0; n < nfds; ++n)
        {
            if (events[n].data.fd == listener)
            {
                /* The listener is edge-triggered: accept until the queue is empty */
                while (1)
                {
                    len = sizeof(their_addr);
                    new_fd = accept(listener, (struct sockaddr *) &their_addr, &len);
                    if (new_fd < 0)
                    {
                        if (errno != EAGAIN && errno != EWOULDBLOCK)
                            perror("accept");
                        break;
                    }
                    printf("conn come from %s:%d, allocated socket is:%d\n",
                           inet_ntoa(their_addr.sin_addr), ntohs(their_addr.sin_port), new_fd);
                    setnonblocking(new_fd);
                    ev.events = EPOLLIN | EPOLLET;
                    ev.data.fd = new_fd;
                    if (epoll_ctl(kdpfd, EPOLL_CTL_ADD, new_fd, &ev) < 0)
                    {
                        fprintf(stderr, "put socket '%d' into epoll failed .%s\n",
                                new_fd, strerror(errno));
                        return -1;
                    }
                    curfds++;
                }
            }
            else
            {
                ret = handle_message(events[n].data.fd);
                if (ret < 0)
                {
                    /* handle_message() closed the socket; closing it also drops it from the epoll set */
                    curfds--;
                }
            }
        }
    }
    close(kdpfd);
    close(listener);
    return 0;
}
client.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>     /* memset() */
#include <strings.h>    /* bzero() */
#include <netdb.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h> /* struct sockaddr_in */
#include <arpa/inet.h>
#define BUFLEN 1024
int main(int argc, char *argv[])
{
    int connect_fd;
    int ret;
    char snd_buf[BUFLEN];
    int port;
    int len;
    static struct sockaddr_in srv_addr;
    if (argc != 3) {
        printf("Usage: %s server_ip_address port\n", argv[0]);
        return 1;
    }
    port = atoi(argv[2]);
    connect_fd = socket(AF_INET, SOCK_STREAM, 0);
    if (connect_fd < 0) {
        perror("cannot create communication socket");
        return 1;
    }
    memset(&srv_addr, 0, sizeof(srv_addr));
    srv_addr.sin_family = AF_INET;
    srv_addr.sin_addr.s_addr = inet_addr(argv[1]);
    srv_addr.sin_port = htons(port);
    ret = connect(connect_fd, (struct sockaddr *)&srv_addr, sizeof(srv_addr));
    if (ret == -1) {
        perror("cannot connect to the server");
        close(connect_fd);
        return 1;
    }
    memset(snd_buf, 0, BUFLEN);
    while (1) {
        write(STDOUT_FILENO, "input message:", 14);
        bzero(snd_buf, BUFLEN);
        len = read(STDIN_FILENO, snd_buf, BUFLEN);
        /* Stop on end of input, on a read error, or on a line starting with '@' */
        if (len <= 0 || snd_buf[0] == '@')
            break;
        send(connect_fd, snd_buf, len, 0);
        bzero(snd_buf, BUFLEN);
        /* Blocking receive; note that server.c above never sends a reply, so this
           call waits until the server closes the connection.
           BUFLEN - 1 keeps the buffer NUL-terminated for printf. */
        len = recv(connect_fd, snd_buf, BUFLEN - 1, 0);
        if (len > 0)
            printf("Message from server: %s\n", snd_buf);
    }
    close(connect_fd);
    return 0;
}
References:
2. http://blog.youkuaiyun.com/haoahua/article/details/2037704
3. Linux man pages