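Recently I have been working on full-duplex communication between kernel space and user space. Unicast worked, but multicast kept giving me trouble and I was stuck on it for a while... I found some good posts to read: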
[Part 1] http://blog.chinaunix.net/uid-23069658-id-3400761.html
[Part 2] http://blog.chinaunix.net/uid-23069658-id-3405954.html
[Part 3] http://blog.chinaunix.net/uid-23069658-id-3409786.html
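After a few days of experimenting I finally figured it out, so here is a summary.
In short, Netlink is a very powerful mechanism for passing data between Linux user space and kernel space. It is far more capable than ioctl, the proc filesystem, or other system calls. It supports full-duplex communication and both unicast and multicast, and multicast works in both directions: user to kernel as well as kernel to user.
Unicast is well covered online, with plenty of examples, so I won't go into it here.
For multicast, here are a few points:
1. Group number vs. group mask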
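struct sockaddr_nl saddr;
/* NL_PORT_IMAGE is the custom netlink protocol number shared with the kernel module */
nl_image_config.fd_nl = socket(AF_NETLINK, SOCK_RAW, NL_PORT_IMAGE);
memset(&saddr, 0, sizeof(saddr));
saddr.nl_family = AF_NETLINK;
saddr.nl_pid = 0;                        /* 0: let the kernel assign this socket's port ID */
saddr.nl_groups = netlink_group_mask(1); //here we use 1 as the group number (nl_groups takes its mask)
saddr.nl_pad = 0;
if (bind(nl_image_config.fd_nl, (struct sockaddr *)&saddr, sizeof(saddr)) < 0)
	perror("bind");                      /* e.g. group not supported by the kernel side */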
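In the code above, saddr.nl_groups is set to the mask of the group number, not the group number itself. This makes the socket a member of a multicast group. The socket still receives unicast messages as well, as described in detail below. For unicast only, set saddr.nl_groups to 0. Here group 1 is used for multicast. Another group number works too, as long as it lies between 1 and 32 (the width of this bitmask) and is within the number of groups the kernel side was created with.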
/* convert a 1-based group number to the bit mask used in nl_groups; 0 means "no group" */
static unsigned int netlink_group_mask(unsigned int group)
{
	return group ? 1 << (group - 1) : 0;
}
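The function above converts a group number into its mask, which makes the difference between the two clear.
2. User-to-kernel multicast vs. kernel-to-user multicast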
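To make the number/mask difference concrete, here is a small user-space sketch. It repeats the conversion above and adds an inverse plus a mask that covers several groups. `group_from_mask` is a hypothetical helper for illustration only and is not part of any netlink API. The `assert` reflects the fact that the `nl_groups` bitmask can only express groups 1 to 32.

```c
#include <assert.h>

/* group number (1-based) -> bit in sockaddr_nl.nl_groups; 0 means "no group" */
static unsigned int netlink_group_mask(unsigned int group)
{
	assert(group <= 32); /* nl_groups is a 32-bit mask */
	return group ? 1U << (group - 1) : 0;
}

/* hypothetical inverse: lowest group number set in a mask, or 0 if none */
static unsigned int group_from_mask(unsigned int mask)
{
	unsigned int group = 1;

	if (mask == 0)
		return 0;
	while (!(mask & 1U)) {
		mask >>= 1;
		group++;
	}
	return group;
}
```

For example, group 1 maps to mask 0x1 and group 3 maps to 0x4. To subscribe to groups 1 and 3 at bind time, OR the masks together to get 0x5.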
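What is the difference? In practice we don't always want everything to be multicast. For connection-related messages such as status commands, we want the user process and the kernel to talk over unicast, so that a one-to-one pairing between the kernel side and a user process can be established. For some control signals we want broadcast instead. For example, with a single kernel sender and several user processes, the kernel can multicast a message so that every user process receives it: one process acts as the handler and the others act as observers. That is the one-to-many kernel-to-user pattern I implemented here.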
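With unicast and multicast sharing one socket, a receiver has to tell which kind of message it got. The sketch below relies on the fact that `recvmsg` on a netlink socket fills the sender address, and that its `nl_groups` field holds the group mask for a multicast message and 0 for a unicast one. This assumes the kernel sender sets `NETLINK_CB(skb).dst_group`, as `nl_send_msg` below does. The helper names `nl_is_multicast` and `nl_recv_one` are hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <linux/netlink.h>

/* multicast messages carry the group mask in the source address, unicast carries 0 */
static int nl_is_multicast(const struct sockaddr_nl *src)
{
	assert(src != NULL);
	return src->nl_groups != 0;
}

/* receive one netlink datagram into buf; src receives the sender address.
 * Returns the number of bytes read, or -1 on error. */
static ssize_t nl_recv_one(int fd, void *buf, size_t len, struct sockaddr_nl *src)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	struct msghdr msg;

	memset(&msg, 0, sizeof(msg));
	memset(src, 0, sizeof(*src));
	msg.msg_name = src;
	msg.msg_namelen = sizeof(*src);
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	return recvmsg(fd, &msg, 0);
}
```

In the handler/observer setup, every process would call `nl_recv_one` in its loop. A unicast message (for example a session status reply) is always handled by the process it was addressed to. For a multicast message, only the designated handler acts on it and the others just record it.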
static int send_image_msg_to_kernel(struct nl_msg_data nl_msg)
{
	struct sockaddr_nl daddr;
	struct msghdr msg;
	struct nlmsghdr *nlhdr = NULL;
	struct iovec iov;

	/* destination: the kernel (nl_pid = 0), no multicast group */
	memset(&daddr, 0, sizeof(daddr));
	daddr.nl_family = AF_NETLINK;
	daddr.nl_pid = 0;
	daddr.nl_groups = 0; //here we use 0 to unicast
	daddr.nl_pad = 0;

	/* build the netlink header + payload in the preallocated send buffer */
	nlhdr = (struct nlmsghdr *)nl_image_config.nl_send_buf;
	memset(nlhdr, 0, sizeof(*nlhdr));   /* clear nlmsg_type/nlmsg_seq left over from earlier sends */
	nlhdr->nlmsg_pid = getpid();        /* assumes the kernel-assigned port ID equals the PID */
	nlhdr->nlmsg_len = NLMSG_LENGTH(sizeof(nl_msg));
	nlhdr->nlmsg_flags = 0;
	memcpy(NLMSG_DATA(nlhdr), &nl_msg, sizeof(nl_msg));

	memset(&msg, 0, sizeof(struct msghdr));
	iov.iov_base = (void *)nlhdr;
	iov.iov_len = nlhdr->nlmsg_len;
	msg.msg_name = (void *)&daddr;
	msg.msg_namelen = sizeof(daddr);
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	if (sendmsg(nl_image_config.fd_nl, &msg, 0) < 0) {
		perror("sendmsg");
		return -1;
	}
	return 0;
}
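This is the user-space side that sends to the kernel. The line daddr.nl_groups = 0; //here we use 0 to unicast means the message is sent to the kernel by unicast (nl_pid = 0 addresses the kernel). In this case it makes no practical difference, because the kernel is the only receiver. A nonzero nl_groups, however, would additionally deliver the message to every socket subscribed to those groups, so the distinction matters once there are several receivers.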
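The sender above fills `nlmsg_pid` with `getpid()`. That matches the socket's port ID only when the kernel auto-assigned it at bind time and no other netlink socket in the process already took that value. A more robust approach, sketched below, is to ask the kernel for the assigned port ID with `getsockname`. `nl_local_port` and `nl_fill_msg` are hypothetical helpers, and the buffer passed to `nl_fill_msg` is assumed to be suitably aligned for `struct nlmsghdr`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* query the port ID the kernel assigned to fd after bind(nl_pid = 0); 0 on success, -1 on error */
static int nl_local_port(int fd, uint32_t *port)
{
	struct sockaddr_nl local;
	socklen_t len = sizeof(local);

	assert(port != NULL);
	memset(&local, 0, sizeof(local));
	if (getsockname(fd, (struct sockaddr *)&local, &len) < 0)
		return -1;
	*port = local.nl_pid;
	return 0;
}

/* write a zeroed netlink header plus payload into buf; returns nlmsg_len, or 0 if buf is too small */
static size_t nl_fill_msg(void *buf, size_t buflen, uint32_t port,
			  const void *payload, size_t plen)
{
	struct nlmsghdr *nlh = buf;

	if (buflen < NLMSG_SPACE(plen))
		return 0;
	memset(nlh, 0, NLMSG_SPACE(plen));
	nlh->nlmsg_len = NLMSG_LENGTH(plen);   /* header (16 bytes) + payload */
	nlh->nlmsg_pid = port;                 /* sender port ID the kernel can reply to */
	memcpy(NLMSG_DATA(nlh), payload, plen);
	return nlh->nlmsg_len;
}
```

The kernel side would then store this port ID (as `nl_user_pid` in the code below) during the handshake and use it for unicast replies.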
static int nl_send_msg(struct iav_nl_obj *nl_obj, struct nl_msg_data *msg)
{
#define RETRY_TIMES (5)
#define TIMEOUT_JIFFY msecs_to_jiffies(1000)
	struct sk_buff *skb = NULL;
	struct nlmsghdr *nlhdr = NULL;
	struct ambarella_iav *iav = nl_obj->iav;
	struct iav_nl_request *nl_req = NULL;
	void *data;
	int retry = RETRY_TIMES;
	int rval = 0;
	int err;

	/* only session status and request commands may be sent to the app */
	if (msg->type == NL_MSG_TYPE_SESSION) {
		if (msg->dir != NL_MSG_DIR_STATUS) {
			iav_error("NETLINK ERR: IAV can only send session status to app!\n");
			return -1;
		}
	} else if (msg->type == NL_MSG_TYPE_REQUEST) {
		if (msg->dir != NL_MSG_DIR_CMD) {
			iav_error("NETLINK ERR: IAV can only send request cmd to app!\n");
			return -1;
		}
	} else {
		iav_error("NETLINK ERR: Unrecognized IAV msg type to app!\n");
		return -1;
	}

	skb = nlmsg_new(sizeof(*msg), GFP_KERNEL);
	if (!skb) {
		iav_error("NETLINK ERR: Function nlmsg_new failed!\n");
		return -1;
	}
	nlhdr = __nlmsg_put(skb, nl_obj->nl_user_pid, 0, NLMSG_NOOP,
		sizeof(*msg), 0);
	data = NLMSG_DATA(nlhdr);
	memcpy(data, (void *)msg, sizeof(*msg));

	/* unicast if multicast is disabled, or for session status replies */
	if (!(nl_obj->nl_groups) || (msg->type == NL_MSG_TYPE_SESSION && msg->dir == NL_MSG_DIR_STATUS)) {
		iav_error("unicast!!\n");
		/*From Kernel*/
		NETLINK_CB(skb).portid = 0;
		/*Multicast group number: 0 = not multicast*/
		NETLINK_CB(skb).dst_group = 0;
		/* netlink_unicast() consumes skb even when it fails */
		err = netlink_unicast(nl_obj->nl_sock, skb, nl_obj->nl_user_pid, 0);
		if (err < 0)
			iav_error("error during unicast to app %d: %d\n",
				nl_obj->nl_user_pid, err);
	} else {
		iav_error("broadcast!!\n");
		/*From Kernel*/
		NETLINK_CB(skb).portid = 0;
		/*Multicast group number*/
		NETLINK_CB(skb).dst_group = NL_MULTICAST_GROUP;
		/*Send message: port ID 0 is skipped (the kernel socket), group is a NUMBER*/
		err = netlink_broadcast(nl_obj->nl_sock, skb, 0, NL_MULTICAST_GROUP, 0);
		if (err < 0) {
			iav_error("error during broadcast!\n");
			if (err == -ESRCH)	/* -3: no socket in the group received it */
				iav_error("no such process!\n");
		}
	}

	if (msg->type == NL_MSG_TYPE_REQUEST) {
		nl_req = &nl_obj->nl_requests[msg->cmd];
		/* release the lock while waiting for the app's response */
		mutex_unlock(&iav->iav_mutex);
		// iav_debug("NETLINK DBG: Send request cmd %d to app %d.\n",
		// msg->cmd, nl_obj->nl_user_pid);
		iav_error("NETLINK DBG: Send request cmd %d to app %d.\n",
			msg->cmd, msg->pid);
		// wait for the response of the command
		while (retry > 0) {
			rval = wait_event_interruptible_timeout(nl_req->wq_request,
				nl_req->condition, TIMEOUT_JIFFY);
			if (rval > 0) {
				break;
			}
			iav_debug("NETLINK DBG: receive ACK of request cmd %d from app %d error,"
				" retry:%d\n", msg->cmd, nl_obj->nl_user_pid, retry);
			--retry;
		}
		mutex_lock(&iav->iav_mutex);
		if (retry <= 0) {
			iav_error("NETLINK ERR: send request cmd %d to app %d timeout\n",
				msg->cmd, nl_obj->nl_user_pid);
			return -1;
		}
	} else if (msg->type == NL_MSG_TYPE_SESSION) {
		iav_error("NETLINK DBG: Send session status %d to session cmd %d"
			" of app %d type %d dir %d.\n", msg->status, msg->cmd,
			nl_obj->nl_user_pid, msg->type, msg->dir);
	}
	return 0;
}
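The code above sends messages from the kernel to user processes. Unicast and multicast coexist here, and a condition selects between them. Unicast is straightforward: netlink_unicast(nl_obj->nl_sock, skb, nl_obj->nl_user_pid, 0) names the user process that should receive the message. That PID (strictly, the socket's netlink port ID) comes from a message the user process sent earlier, and it is mainly used for the handshake between the user process and the kernel. Multicast is implemented as well: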
err = netlink_broadcast(nl_obj->nl_sock, skb, 0, NL_MULTICAST_GROUP, 0);
if (err < 0) {
	iav_error("error during broadcast!\n");
	if (err == -ESRCH)	/* -3: no socket in the group received it */
		iav_error("no such process!\n");
}
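NL_MULTICAST_GROUP specifies the group to multicast to. Note that it is the group number, not the mask, and here it is also 1. Some people get a return value of -3, which is -ESRCH. netlink_broadcast returns it when no socket subscribed to the group received the message, for example because the receiving user-space process has already exited. That was the cause in my case.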
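The number/mask distinction also shows up in the other way of joining a group. After `bind()`, a user process can subscribe with `setsockopt(SOL_NETLINK, NETLINK_ADD_MEMBERSHIP)`, and that option takes the group number, the same value `netlink_broadcast` receives. The group must still be within the number of groups the kernel socket was created with. Once the last member drops out or exits, the kernel's broadcast finds no listener and gets -ESRCH. The wrapper names below are hypothetical.

```c
#include <assert.h>
#include <sys/socket.h>
#include <linux/netlink.h>

#ifndef SOL_NETLINK
#define SOL_NETLINK 270 /* value from <linux/socket.h>, for libc headers that lack it */
#endif

/* join a multicast group by NUMBER (not mask); returns 0 or -1 with errno set */
static int nl_join_group(int fd, unsigned int group)
{
	assert(group != 0);
	return setsockopt(fd, SOL_NETLINK, NETLINK_ADD_MEMBERSHIP,
			  &group, sizeof(group));
}

/* leave the group again; if no member remains, netlink_broadcast returns -ESRCH */
static int nl_leave_group(int fd, unsigned int group)
{
	assert(group != 0);
	return setsockopt(fd, SOL_NETLINK, NETLINK_DROP_MEMBERSHIP,
			  &group, sizeof(group));
}
```

Unlike the nl_groups bitmask, this option is not limited to groups 1 to 32.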
<pre name="code" class="cpp">static int init_nl_obj_image(struct ambarella_iav *iav)
{
	struct iav_nl_obj *nl_obj = &iav->nl_obj[NL_OBJ_IMAGE];
	struct netlink_kernel_cfg cfg;
	int i;

	/* cfg lives on the stack: zero it so unset fields (bind, cb_mutex, ...) are not garbage */
	memset(&cfg, 0, sizeof(cfg));
	nl_obj->iav = iav;
	nl_obj->nl_connected = 0;
	nl_obj->nl_init = 0;
	nl_obj->nl_port = NL_PORT_IMAGE;
	nl_obj->nl_user_pid = -1;
	nl_obj->nl_session_count = NL_SESS_CMD_NUM;
	nl_obj->nl_request_count = NL_REQ_IMG_NUM;
	nl_obj->nl_groups = NL_MULTICAST_GROUP; //decide unicast or broadcast
	nl_obj->nl_ref_count = 0;
	INIT_LIST_HEAD(&netlink_pid_queue.list);
	for (i = NL_REQ_IMG_FIRST; i < NL_REQ_IMG_LAST; ++i) {
		nl_obj->nl_requests[i].request_id = i;
		init_waitqueue_head(&nl_obj->nl_requests[i].wq_request);
	}
	cfg.groups = nl_obj->nl_groups;   /* number of multicast groups this protocol supports */
	cfg.input = nl_recv_msg_handler;  /* called for messages sent from user space */
	cfg.bind = NULL;                  /* already 0 after memset, kept explicit */
	nl_obj->nl_sock = netlink_kernel_create(&init_net,
		nl_obj->nl_port, &cfg);
	if (!nl_obj->nl_sock) {
		iav_error("NETLINK ERR: netlink_kernel_create failed!\n");
		return -ENOMEM;
	}
	nl_obj->nl_init = 1;
	return 0;
}
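In the kernel init routine, the statement cfg.groups = nl_obj->nl_groups is what enables multicast: it tells netlink_kernel_create how many multicast groups this protocol supports. A word about cfg.bind = NULL;. Reading the kernel source, I found that netlink_kernel_create stores the bind pointer from cfg as-is without validating it. Because cfg is a stack variable, any field left unset holds garbage, so set bind to NULL explicitly, or better, zero the whole structure first. Also note that netlink_kernel_create is called differently across kernel versions. My kernel is 3.7.10, which is fairly recent. Much of the code online targets 2.6-era kernels, where the group count, input callback, mutex and module were passed as separate arguments instead of through struct netlink_kernel_cfg, so it differs a lot from the current implementation. Some kernel code uses macros to distinguish versions, so keep an eye on that.
That's about it. Corrections are welcome!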