Detailed analysis of the TCP/IP stack code: http://blog.youkuaiyun.com/cz_hyf/archive/2006/02/19/602802.aspx
Netfilter architecture and packet flow
A packet traverses the Netfilter system as shown in the diagram below:
--->[1]--->[ROUTE]--->[3]--->[4]--->
              |            ^
              |            |
            local          |
              |         [ROUTE]
              v            |
             [2]          [5]
              |            ^
              |            |
              v            |
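The diagram shows that IPv4 defines five hooks:
1 NF_IP_PRE_ROUTING
2 NF_IP_LOCAL_IN
3 NF_IP_FORWARD
4 NF_IP_POST_ROUTING
5 NF_IP_LOCAL_OUT
A packet enters the system from the left. After the IP sanity checks it passes the first hook, NF_IP_PRE_ROUTING [1], and then enters the routing code, which decides whether the packet must be forwarded or is destined for the local host. If it is destined for the local host, it passes the NF_IP_LOCAL_IN [2] hook and is then handed to the upper-layer protocol. If it must be forwarded, it passes the NF_IP_FORWARD [3] hook; a forwarded packet then passes the final hook, NF_IP_POST_ROUTING [4], before it is put on the wire.
Locally generated packets pass the NF_IP_LOCAL_OUT [5] hook, go through routing, and then pass NF_IP_POST_ROUTING [4] before being sent to the network. (In the ip_queue_xmit() listing later in this article the route is actually looked up before the NF_IP_LOCAL_OUT hook is called.)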
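The three paths described above can be written down as a small lookup. This is only a user-space sketch: the names `enum hook`, `enum path` and `hooks_for_path` are made up for illustration, and the hook values are the numbers used in the diagram, not the kernel's own constants.

```c
#include <assert.h>
#include <stddef.h>

/* Hook numbers as drawn in the diagram (illustrative, not the kernel values). */
enum hook {
    HOOK_PRE_ROUTING  = 1,
    HOOK_LOCAL_IN     = 2,
    HOOK_FORWARD      = 3,
    HOOK_POST_ROUTING = 4,
    HOOK_LOCAL_OUT    = 5
};

/* The three ways a packet can cross the stack. */
enum path { PATH_TO_LOCAL_HOST, PATH_FORWARDED, PATH_FROM_LOCAL_HOST };

/* Writes the hooks a packet on path p passes, in order; returns how many. */
size_t hooks_for_path(enum path p, enum hook out[3])
{
    assert(out != NULL);
    switch (p) {
    case PATH_TO_LOCAL_HOST:
        out[0] = HOOK_PRE_ROUTING;
        out[1] = HOOK_LOCAL_IN;
        return 2;
    case PATH_FORWARDED:
        out[0] = HOOK_PRE_ROUTING;
        out[1] = HOOK_FORWARD;
        out[2] = HOOK_POST_ROUTING;
        return 3;
    case PATH_FROM_LOCAL_HOST:
        out[0] = HOOK_LOCAL_OUT;
        out[1] = HOOK_POST_ROUTING;
        return 2;
    }
    return 0;
}
```

Note that no path passes more than three of the five hooks, and every packet that leaves the machine passes NF_IP_POST_ROUTING last.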
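Kernel modules can register to listen at one or more of these hooks and are called whenever a packet passes them. A module may then modify the packet, and it must return one of the following verdicts to netfilter:
NF_ACCEPT continue traversal as normal
NF_DROP drop the packet; do not continue traversal
NF_STOLEN the module has taken over the packet; do not continue traversal
NF_QUEUE queue the packet (usually so that a userspace process can handle it)
NF_REPEAT call this hook again
A kernel module can also register a new rule table and ask for packets to traverse that table. This packet selection mechanism is used to implement packet filtering (the filter table), network address translation (the nat table) and packet mangling (the mangle table).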
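Packet filtering
The filter table never alters packets; it only filters them. One advantage of iptables over ipchains is that it is smaller and faster. The filter table hooks into the netfilter framework at NF_IP_LOCAL_IN, NF_IP_FORWARD and NF_IP_LOCAL_OUT, so for any given packet there is exactly one place where it is filtered.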
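NAT
The nat table listens on three Netfilter hooks: NF_IP_PRE_ROUTING, NF_IP_POST_ROUTING and NF_IP_LOCAL_OUT. For packets that are not generated locally, NF_IP_PRE_ROUTING is where the destination address is rewritten (before the routing decision) and NF_IP_POST_ROUTING is where the source address is rewritten. Destination rewriting for locally generated packets is done at NF_IP_LOCAL_OUT.
The nat table differs from the filter table in that only the first packet of a new connection traverses the table; the result of that traversal is then applied to all later packets of the same connection.
The nat table is used for source NAT, destination NAT, masquerading (a special case of source NAT) and transparent proxying (a special case of destination NAT).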
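The "first packet decides, later packets reuse the result" behaviour can be illustrated with a tiny connection cache. This is a user-space sketch under loud assumptions: `struct conn_key`, `conn_table`, `traverse_nat_rules` and `snat_source` are invented names, the rule traversal is a stub that always picks 192.0.2.1, and real connection tracking is far more involved.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CONNS 16

/* Hypothetical connection identifier (addresses as plain integers). */
struct conn_key {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
    uint8_t  proto;
};

struct conn_entry {
    struct conn_key key;
    uint32_t new_saddr;   /* translation chosen by the first packet */
    int used;
};

static struct conn_entry conn_table[MAX_CONNS];
static int rule_traversals;   /* how often the "nat table" was walked */

/* Stub for walking the nat rules: always selects 192.0.2.1 (assumption). */
static uint32_t traverse_nat_rules(const struct conn_key *k)
{
    (void)k;
    rule_traversals++;
    return 0xC0000201u;
}

static int key_equal(const struct conn_key *a, const struct conn_key *b)
{
    return a->saddr == b->saddr && a->daddr == b->daddr &&
           a->sport == b->sport && a->dport == b->dport &&
           a->proto == b->proto;
}

/* Returns the translated source address, or 0 if the table is full. */
uint32_t snat_source(const struct conn_key *k)
{
    int free_slot = -1;

    assert(k != 0);
    for (int i = 0; i < MAX_CONNS; i++) {
        if (conn_table[i].used) {
            if (key_equal(&conn_table[i].key, k))
                return conn_table[i].new_saddr;   /* known connection: reuse */
        } else if (free_slot < 0) {
            free_slot = i;
        }
    }
    if (free_slot < 0)
        return 0;

    /* New connection: only now are the rules consulted. */
    conn_table[free_slot].key = *k;
    conn_table[free_slot].new_saddr = traverse_nat_rules(k);
    conn_table[free_slot].used = 1;
    return conn_table[free_slot].new_saddr;
}
```

Calling `snat_source()` repeatedly with the same key walks the rules once; a packet whose key differs in any field counts as a new connection and walks them again.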
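Packet mangling
The mangle table registers at the NF_IP_PRE_ROUTING and NF_IP_LOCAL_OUT hooks. With the mangle table, packets can be modified or have out-of-band data attached to them. Currently the mangle table supports altering the TOS bits and setting the skb's nfmark field.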
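The hook sets of the three tables, as listed in the text above, fit naturally into bit masks. The sketch below is a user-space illustration; the names `HOOK_BIT`, `FILTER_HOOKS`, `NAT_HOOKS`, `MANGLE_HOOKS` and `count_visits` are assumptions made for this example. It checks the claim that the filter table sees each packet exactly once on every path.

```c
#include <assert.h>
#include <stddef.h>

enum hook {
    HOOK_PRE_ROUTING,
    HOOK_LOCAL_IN,
    HOOK_FORWARD,
    HOOK_LOCAL_OUT,
    HOOK_POST_ROUTING
};

#define HOOK_BIT(h) (1u << (h))

/* Hook sets exactly as described in the text. */
#define FILTER_HOOKS (HOOK_BIT(HOOK_LOCAL_IN) | HOOK_BIT(HOOK_FORWARD) | \
                      HOOK_BIT(HOOK_LOCAL_OUT))
#define NAT_HOOKS    (HOOK_BIT(HOOK_PRE_ROUTING) | HOOK_BIT(HOOK_POST_ROUTING) | \
                      HOOK_BIT(HOOK_LOCAL_OUT))
#define MANGLE_HOOKS (HOOK_BIT(HOOK_PRE_ROUTING) | HOOK_BIT(HOOK_LOCAL_OUT))

/* Counts how many hooks on the given path belong to the table's mask. */
size_t count_visits(unsigned table_mask, const enum hook *path, size_t n)
{
    size_t visits = 0;

    assert(path != NULL);
    for (size_t i = 0; i < n; i++)
        if (table_mask & HOOK_BIT(path[i]))
            visits++;
    return visits;
}
```

For a forwarded packet (PRE_ROUTING, FORWARD, POST_ROUTING) the filter mask matches once and the nat mask twice, which corresponds to one filtering point plus one destination and one source rewriting point.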
Reference links:
http://www.sudu.cn/info/html/edu/20080425/301334.html
http://www.sudu.cn/info/html/edu/20060102/298005.html
http://leejingui.com/blog/2010/05/netfilter/
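Functions that invoke the Netfilter hooks in IPv4
1.1 ip_input.c: the ip_rcv() function
The receive path is used as the example here; ip_forward (ip_forward.c) and ip_output (ip_output.c) follow the same pattern.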
int ip_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt, struct net_device *orig_dev)
{
    struct iphdr *iph;    // IP header of the received packet
    u32 len;

    if (skb->pkt_type == PACKET_OTHERHOST)
        goto drop;    // the packet is not addressed to us

    IP_INC_STATS_BH(IPSTATS_MIB_INRECEIVES);    // count one more received packet

    if ((skb = skb_share_check(skb, GFP_ATOMIC)) == NULL)
    {
        /* If the skb is shared, a private clone is made that is no longer tied to the socket; NULL means the clone could not be allocated */
        IP_INC_STATS_BH(IPSTATS_MIB_INDISCARDS);
        goto out;
    }

    if (!pskb_may_pull(skb, sizeof(struct iphdr)))
        goto inhdr_error;    // make sure a minimal IP header is present

    iph = skb->nh.iph;    // locate the IP header

    if (iph->ihl < 5 || iph->version != 4)    // wrong version or header length
        goto inhdr_error;    // ihl counts 4-byte words, so 5 means 20 bytes

    if (!pskb_may_pull(skb, iph->ihl*4))
        goto inhdr_error;

    if (unlikely(ip_fast_csum((u8 *)iph, iph->ihl)))
        goto inhdr_error;    // verify the header checksum

    len = ntohs(iph->tot_len);
    if (skb->len < len || len < (iph->ihl*4))
        goto inhdr_error;    // the total length cannot be smaller than the header length

    if (pskb_trim_rcsum(skb, len))
    {    // trim the skb to tot_len, dropping any padding the link layer added
        IP_INC_STATS_BH(IPSTATS_MIB_INDISCARDS);
        goto drop;
    }

    return NF_HOOK(PF_INET, NF_IP_PRE_ROUTING, skb, dev, NULL,
                   ip_rcv_finish);    // run the PRE_ROUTING hooks; ip_rcv_finish is the continuation

inhdr_error:
    IP_INC_STATS_BH(IPSTATS_MIB_INHDRERRORS);
drop:
    kfree_skb(skb);    // discard the packet
out:
    return NET_RX_DROP;
}
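The header checks that ip_rcv() performs before it calls the hook can be tried out in user space. The following is a sketch, not kernel code: `header_csum` and `ip_header_ok` are hypothetical helpers that work on a raw byte buffer and apply the same tests in the same order (version and ihl, header checksum, total length).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One's-complement checksum over len bytes read as big-endian 16-bit words.
 * Over a header whose checksum field is correct the result is 0. */
static uint16_t header_csum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)p[i] << 8) | p[i + 1];
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Returns 1 if the buffer starts with a plausible IPv4 header, else 0. */
int ip_header_ok(const uint8_t *pkt, size_t pkt_len)
{
    assert(pkt != NULL);
    if (pkt_len < 20)                    /* no room for a minimal header */
        return 0;

    unsigned version = pkt[0] >> 4;
    unsigned ihl = pkt[0] & 0x0f;        /* header length in 4-byte words */
    if (ihl < 5 || version != 4)
        return 0;

    size_t hlen = (size_t)ihl * 4;
    if (pkt_len < hlen)                  /* options must be present too */
        return 0;
    if (header_csum(pkt, hlen) != 0)     /* checksum must verify */
        return 0;

    size_t tot_len = ((size_t)pkt[2] << 8) | pkt[3];
    if (pkt_len < tot_len || tot_len < hlen)
        return 0;
    return 1;
}
```

A caller would pass the bytes that follow the link-layer header together with their count; any buffer longer than tot_len would then be trimmed, as pskb_trim_rcsum() does in the listing.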
File ip_input.c:
int ip_local_deliver(struct sk_buff *skb)
{
    /*
     * Reassemble IP fragments.
     */
    if (skb->nh.iph->frag_off & htons(IP_MF|IP_OFFSET)) {
        skb = ip_defrag(skb, IP_DEFRAG_LOCAL_DELIVER);
        if (!skb)
            return 0;
    }

    return NF_HOOK(PF_INET, NF_IP_LOCAL_IN, skb, skb->dev, NULL,
                   ip_local_deliver_finish);
}
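The test `frag_off & htons(IP_MF|IP_OFFSET)` in ip_local_deliver() is true for every fragment: the first and middle fragments have the "more fragments" bit set, and the last one has a non-zero offset. A minimal user-space sketch follows; the macro names and `is_fragment` are chosen for this example, while the bit values are those of the IPv4 header layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FLAG_DF      0x4000u   /* don't fragment */
#define FLAG_MF      0x2000u   /* more fragments follow */
#define OFFSET_MASK  0x1FFFu   /* fragment offset in 8-byte units */

/* iphdr points at the first byte of an IPv4 header (at least 8 bytes). */
int is_fragment(const uint8_t *iphdr)
{
    assert(iphdr != NULL);
    /* Bytes 6 and 7 hold flags and offset in network (big-endian) order. */
    unsigned frag_off = ((unsigned)iphdr[6] << 8) | iphdr[7];

    return (frag_off & (FLAG_MF | OFFSET_MASK)) != 0;
}
```

A packet with only the DF bit set is not a fragment, which is why FLAG_DF is deliberately left out of the mask.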
File ip_output.c:
int ip_queue_xmit(struct sk_buff *skb, int ipfragok)
{
    struct sock *sk = skb->sk;
    struct inet_sock *inet = inet_sk(sk);
    struct ip_options *opt = inet->opt;
    struct rtable *rt;
    struct iphdr *iph;

    /* Skip all of this if the packet is already routed,
     * f.e. by something like SCTP.
     */
    rt = (struct rtable *) skb->dst;
    if (rt != NULL)
        goto packet_routed;

    /* Make sure we can route this packet. */
    rt = (struct rtable *)__sk_dst_check(sk, 0);
    if (rt == NULL) {
        u32 daddr;

        /* Use correct destination address if we have options. */
        daddr = inet->daddr;
        if(opt && opt->srr)
            daddr = opt->faddr;

        {
            struct flowi fl = { .oif = sk->sk_bound_dev_if,
                                .nl_u = { .ip4_u =
                                          { .daddr = daddr,
                                            .saddr = inet->saddr,
                                            .tos = RT_CONN_FLAGS(sk) } },
                                .proto = sk->sk_protocol,
                                .uli_u = { .ports =
                                           { .sport = inet->sport,
                                             .dport = inet->dport } } };

            /* If this fails, retransmit mechanism of transport layer will
             * keep trying until route appears or the connection times
             * itself out.
             */
            if (ip_route_output_flow(&rt, &fl, sk, 0))
                goto no_route;
        }
        sk_setup_caps(sk, &rt->u.dst);
    }
    skb->dst = dst_clone(&rt->u.dst);

packet_routed:
    if (opt && opt->is_strictroute && rt->rt_dst != rt->rt_gateway)
        goto no_route;

    /* OK, we know where to send it, allocate and build IP header. */
    iph = (struct iphdr *) skb_push(skb, sizeof(struct iphdr) + (opt ? opt->optlen : 0));
    *((__u16 *)iph) = htons((4 << 12) | (5 << 8) | (inet->tos & 0xff));
    iph->tot_len = htons(skb->len);
    if (ip_dont_fragment(sk, &rt->u.dst) && !ipfragok)
        iph->frag_off = htons(IP_DF);
    else
        iph->frag_off = 0;
    iph->ttl = ip_select_ttl(inet, &rt->u.dst);
    iph->protocol = sk->sk_protocol;
    iph->saddr = rt->rt_src;
    iph->daddr = rt->rt_dst;
    skb->nh.iph = iph;
    /* Transport layer set skb->h.foo itself. */

    if (opt && opt->optlen) {
        iph->ihl += opt->optlen >> 2;
        ip_options_build(skb, opt, inet->daddr, rt, 0);
    }

    ip_select_ident_more(iph, &rt->u.dst, sk, skb_shinfo(skb)->tso_segs);

    /* Add an IP checksum. */
    ip_send_check(iph);

    skb->priority = sk->sk_priority;

    return NF_HOOK(PF_INET, NF_IP_LOCAL_OUT, skb, NULL, rt->u.dst.dev,
                   dst_output);

no_route:
    IP_INC_STATS(IPSTATS_MIB_OUTNOROUTES);
    kfree_skb(skb);
    return -EHOSTUNREACH;
}
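One line of ip_queue_xmit() packs three header fields into a single 16-bit store: `htons((4 << 12) | (5 << 8) | (inet->tos & 0xff))` sets version 4, a header length of 5 words and the TOS byte, and `iph->ihl += opt->optlen >> 2` later adds the option words. The sketch below replays that arithmetic on a byte array; `build_first_word` is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/* Writes the first two bytes of an IPv4 header in wire order.
 * optlen is the length of the IP options in bytes (a multiple of 4, max 40). */
void build_first_word(uint8_t out[2], uint8_t tos, unsigned optlen)
{
    assert(optlen % 4 == 0 && optlen <= 40);

    /* version 4 in bits 15..12, ihl 5 in bits 11..8, tos in bits 7..0 */
    unsigned word = (4u << 12) | (5u << 8) | tos;

    /* each 4 bytes of options add one word to ihl */
    word += (optlen >> 2) << 8;

    out[0] = (uint8_t)(word >> 8);     /* what htons() guarantees on the wire */
    out[1] = (uint8_t)(word & 0xff);
}
```

Without options the first byte is always 0x45, the value seen at the start of most IPv4 packets in a capture.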
File ip_forward.c:
int ip_forward(struct sk_buff *skb)
{
    struct iphdr *iph;    /* Our header */
    struct rtable *rt;    /* Route we use */
    struct ip_options * opt = &(IPCB(skb)->opt);

    if (!xfrm4_policy_check(NULL, XFRM_POLICY_FWD, skb))
        goto drop;

    if (IPCB(skb)->opt.router_alert && ip_call_ra_chain(skb))
        return NET_RX_SUCCESS;

    if (skb->pkt_type != PACKET_HOST)
        goto drop;

    skb->ip_summed = CHECKSUM_NONE;

    /*
     * According to the RFC, we must first decrease the TTL field. If
     * that reaches zero, we must reply an ICMP control message telling
     * that the packet's lifetime expired.
     */
    if (skb->nh.iph->ttl <= 1)
        goto too_many_hops;

    if (!xfrm4_route_forward(skb))
        goto drop;

    rt = (struct rtable*)skb->dst;

    if (opt->is_strictroute && rt->rt_dst != rt->rt_gateway)
        goto sr_failed;

    /* We are about to mangle packet. Copy it! */
    if (skb_cow(skb, LL_RESERVED_SPACE(rt->u.dst.dev)+rt->u.dst.header_len))
        goto drop;
    iph = skb->nh.iph;

    /* Decrease ttl after skb cow done */
    ip_decrease_ttl(iph);

    /*
     * We now generate an ICMP HOST REDIRECT giving the route
     * we calculated.
     */
    if (rt->rt_flags&RTCF_DOREDIRECT && !opt->srr)
        ip_rt_send_redirect(skb);

    skb->priority = rt_tos2priority(iph->tos);

    return NF_HOOK(PF_INET, NF_IP_FORWARD, skb, skb->dev, rt->u.dst.dev,
                   ip_forward_finish);

sr_failed:
    /*
     * Strict routing permits no gatewaying
     */
    icmp_send(skb, ICMP_DEST_UNREACH, ICMP_SR_FAILED, 0);
    goto drop;

too_many_hops:
    /* Tell the sender its packet died... */
    icmp_send(skb, ICMP_TIME_EXCEEDED, ICMP_EXC_TTL, 0);
drop:
    kfree_skb(skb);
    return NET_RX_DROP;
}
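ip_forward() calls ip_decrease_ttl(), whose body is not shown in this article. Lowering the TTL changes the header, so the header checksum has to change with it; this can be done incrementally rather than by summing the whole header again. The user-space sketch below works on a byte array and is modelled on that idea; `decrease_ttl` and `header_checksum_ok` are names invented here, and the code should be read as an illustration rather than as the kernel's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the one's-complement sum over the header (checksum field
 * included) verifies, i.e. the stored checksum is correct. */
int header_checksum_ok(const uint8_t *hdr, size_t hlen)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < hlen; i += 2)
        sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return sum == 0xffff;
}

/* Decrements the TTL (byte 8) and patches the checksum (bytes 10..11).
 * The TTL is the high byte of its 16-bit word, so the word shrinks by
 * 0x0100 and the checksum must grow by 0x0100, with end-around carry. */
int decrease_ttl(uint8_t *hdr)
{
    assert(hdr != NULL && hdr[8] > 0);

    uint32_t check = ((uint32_t)hdr[10] << 8) | hdr[11];
    check += 0x0100;
    check += (check >= 0xFFFF);          /* fold the carry back in */
    hdr[10] = (uint8_t)((check >> 8) & 0xff);
    hdr[11] = (uint8_t)(check & 0xff);

    return --hdr[8];
}
```

The caller is expected to have ruled out a TTL of 0 or 1 beforehand, as ip_forward() does with its `ttl <= 1` test.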
File ip_output.c:
static inline int ip_finish_output(struct sk_buff *skb)
{
    struct net_device *dev = skb->dst->dev;

    skb->dev = dev;
    skb->protocol = htons(ETH_P_IP);

    return NF_HOOK(PF_INET, NF_IP_POST_ROUTING, skb, NULL, dev,
                   ip_finish_output2);
}
1.2 include/linux/netfilter.h: the NF_HOOK macro
#ifdef CONFIG_NETFILTER_DEBUG
#define NF_HOOK(pf, hook, skb, indev, outdev, okfn) \
    nf_hook_slow((pf), (hook), (skb), (indev), (outdev), (okfn), INT_MIN)
#define NF_HOOK_THRESH nf_hook_slow
#else
#define NF_HOOK(pf, hook, skb, indev, outdev, okfn) \
    (list_empty(&nf_hooks[(pf)][(hook)]) \
     ? (okfn)(skb) \
     : nf_hook_slow((pf), (hook), (skb), (indev), (outdev), (okfn), INT_MIN))
#define NF_HOOK_THRESH(pf, hook, skb, indev, outdev, okfn, thresh) \
    (list_empty(&nf_hooks[(pf)][(hook)]) \
     ? (okfn)(skb) \
     : nf_hook_slow((pf), (hook), (skb), (indev), (outdev), (okfn), (thresh)))
#endif
/* If the list at nf_hooks[pf][hook] (for example nf_hooks[PF_INET][NF_IP_FORWARD]) is empty, i.e. no handler is registered at that hook, okfn is called directly; otherwise net/core/netfilter.c::nf_hook_slow() is called and the packet enters Netfilter processing. */
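The point of the non-debug NF_HOOK variant is the fast path: when nothing is registered, the continuation runs immediately and the hook machinery costs one list check. The user-space sketch below imitates only that shape; `RUN_HOOK`, `hook_slow`, `deliver` and the two structs are placeholders invented for the example, and the slow path is a stub that pretends every hook accepted the packet.

```c
#include <assert.h>
#include <stddef.h>

struct packet { int id; };
typedef int (*okfn_t)(struct packet *);

/* Hypothetical hook list node; an empty list is a NULL head. */
struct hook_entry { struct hook_entry *next; };

static int slow_path_calls;   /* how often the slow path was taken */
static int delivered;         /* how often the continuation ran */

/* Continuation, playing the role of okfn. */
static int deliver(struct packet *p)
{
    (void)p;
    delivered++;
    return 0;
}

/* Stand-in for the slow path: counts the call and accepts the packet. */
static int hook_slow(struct hook_entry *head, struct packet *p, okfn_t okfn)
{
    (void)head;
    slow_path_calls++;
    return okfn(p);
}

/* Same ternary shape as the macro above: empty list -> call okfn directly. */
#define RUN_HOOK(head, p, okfn) \
    ((head) == NULL ? (okfn)(p) : hook_slow((head), (p), (okfn)))
```

Because the macro is an expression, it can be used directly in a `return` statement, which is how the kernel listings in this article use NF_HOOK.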
1.3 net/core/netfilter.c: the nf_hook_slow() function
Note: this listing takes a struct sk_buff ** and returns 1 instead of calling okfn itself, so it appears to come from a different kernel version than the NF_HOOK macro in 1.2, which passes skb directly and returns the result unchanged.
int nf_hook_slow(int pf, unsigned int hook, struct sk_buff **pskb,
                 struct net_device *indev,
                 struct net_device *outdev,
                 int (*okfn)(struct sk_buff *),
                 int hook_thresh)
{
    struct list_head *elem;
    unsigned int verdict;
    int ret = 0;

    rcu_read_lock();

    /* Start at the head of the list for this protocol family and hook */
    elem = &nf_hooks[pf][hook];
next_hook:
    /* Call the registered hook functions */
    verdict = nf_iterate(&nf_hooks[pf][hook], pskb, hook, indev,
                         outdev, &elem, okfn, hook_thresh);
    /* Act on the verdict */
    if (verdict == NF_ACCEPT || verdict == NF_STOP) {
        ret = 1;    /* 1 tells the caller to go on and call okfn */
        goto unlock;
    } else if (verdict == NF_DROP) {
        kfree_skb(*pskb);    /* the packet is dropped, so the skb must be freed */
        ret = -EPERM;
    } else if (verdict == NF_QUEUE) {
        NFDEBUG("nf_hook: Verdict = QUEUE.\n");
        if (!nf_queue(*pskb, elem, pf, hook, indev, outdev, okfn))
            goto next_hook;
    }
unlock:
    rcu_read_unlock();
    return ret;
}
1.4 net/core/netfilter.c: the nf_iterate() function
static unsigned int nf_iterate(struct list_head *head,
                               struct sk_buff **skb,
                               int hook,
                               const struct net_device *indev,
                               const struct net_device *outdev,
                               struct list_head **i,
                               int (*okfn)(struct sk_buff *),
                               int hook_thresh)
{
    /*
     * The caller must not block between calls to this
     * function because of risk of continuing from deleted element.
     */
    /* Call the (*hook) function of every nf_hook_ops registered at this hook point in turn; some were registered by the filter table, some by the mangle table, and so on.
       list_for_each_continue_rcu is a macro that expands to a for loop. After each hook function returns, its verdict decides what happens next: NF_QUEUE, NF_STOLEN and NF_DROP are returned to the caller at once; NF_REPEAT moves *i back to the previous node, so the loop's next step runs the same hook again; any other value lets the loop move on to the next node. If the whole list is walked without one of those verdicts, NF_ACCEPT is returned. */
    list_for_each_continue_rcu(*i, head) {
        struct nf_hook_ops *elem = (struct nf_hook_ops *)*i;

        /* Skip hooks whose priority is below the threshold */
        if (hook_thresh > elem->priority)
            continue;

        switch (elem->hook(hook, skb, indev, outdev, okfn)) {
        case NF_QUEUE:
            return NF_QUEUE;

        case NF_STOLEN:
            return NF_STOLEN;

        case NF_DROP:
            return NF_DROP;

        case NF_REPEAT:
            *i = (*i)->prev;
            break;
        }
    }
    return NF_ACCEPT;
}
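The iteration logic of nf_iterate() can be replayed in user space with an array instead of an RCU list. This is a sketch under stated assumptions: `enum verdict`, `struct hook_ops`, `iterate` and the two sample hooks are invented for the example and only mirror the control flow shown above (priority threshold, early return on a final verdict, re-running a hook on REPEAT).

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

enum verdict { V_DROP, V_ACCEPT, V_STOLEN, V_QUEUE, V_REPEAT };

struct packet { int mark; };

typedef enum verdict (*hookfn_t)(struct packet *);

/* One registered hook: a function plus its priority (lower runs first). */
struct hook_ops {
    hookfn_t hook;
    int priority;
};

/* Walks the chain in order; hooks with priority below thresh are skipped. */
enum verdict iterate(const struct hook_ops *ops, size_t n,
                     struct packet *p, int thresh)
{
    size_t i = 0;

    assert(ops != NULL && p != NULL);
    while (i < n) {
        if (thresh > ops[i].priority) {
            i++;
            continue;
        }
        switch (ops[i].hook(p)) {
        case V_QUEUE:
            return V_QUEUE;
        case V_STOLEN:
            return V_STOLEN;
        case V_DROP:
            return V_DROP;
        case V_REPEAT:
            continue;      /* same index again: the hook is re-run */
        default:
            i++;           /* accepted: move on to the next hook */
            break;
        }
    }
    return V_ACCEPT;
}

/* Sample hook: asks to be repeated until it has marked the packet twice. */
enum verdict hook_mark_twice(struct packet *p)
{
    if (p->mark < 2) {
        p->mark++;
        return V_REPEAT;
    }
    return V_ACCEPT;
}

/* Sample hook: drops packets that carry a mark of 2 or more. */
enum verdict hook_drop_marked(struct packet *p)
{
    return p->mark >= 2 ? V_DROP : V_ACCEPT;
}
```

With a chain of `hook_mark_twice` at priority -100 and `hook_drop_marked` at priority 0, a threshold of INT_MIN runs both hooks, while a threshold of 0 skips the first one, which is the kind of partial traversal NF_HOOK_THRESH makes possible.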
Linux netfilter source code analysis: http://alexanderlaw.blog.hexun.com/8960896_d.html
Understanding Netfilter: http://leejingui.com/blog/2010/05/netfilter/
Netfilter source code analysis: http://linux.chinaunix.net/bbs/viewthread.php?tid=670248
Netfilter/IPTables analysis: http://blog.chinaunix.net/u/24896/showart_211278.html
Kernel Packet Traveling Diagram (diagram): http://www.cublog.cn/u/311/showart.php?id=70642
Netfilter example programs:
1. Example code for a Netfilter hook function that drops all incoming packets: http://edu.codepub.com/2009/1027/16942_2.php
Example code 1: registering a Netfilter hook
/*
 * Example code that installs a Netfilter hook function which drops all incoming packets
 */
#define __KERNEL__
#define MODULE
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/netfilter.h>
#include <linux/netfilter_ipv4.h>

/* Structure used to register our function */
static struct nf_hook_ops nfho;

/* The hook function itself */
unsigned int hook_func(unsigned int hooknum,
                       struct sk_buff **skb,
                       const struct net_device *in,
                       const struct net_device *out,
                       int (*okfn)(struct sk_buff *))
{
    return NF_DROP;    /* drop every packet */
}

/* Initialisation routine */
int init_module(void)
{
    /* Fill in our hook structure */
    nfho.hook = hook_func;    /* handler function */
    nfho.hooknum = NF_IP_PRE_ROUTING;    /* first hook for IPv4 */
    nfho.pf = PF_INET;
    nfho.priority = NF_IP_PRI_FIRST;    /* make our function run first */
    nf_register_hook(&nfho);
    return 0;
}

/* Cleanup routine */
void cleanup_module(void)
{
    nf_unregister_hook(&nfho);
}
2. Netfilter hook function that extracts a packet's source address and protocol
#define __KERNEL__
#define MODULE
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/netfilter.h>
#include <linux/skbuff.h>
#include <linux/ip.h>
#include <linux/netdevice.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>
#include <net/tcp.h>
#include <linux/netfilter_ipv4.h>

static struct nf_hook_ops nfho;

unsigned int hook_func(unsigned int hooknum,
                       struct sk_buff **skb,
                       const struct net_device *in,
                       const struct net_device *out,
                       int (*okfn)(struct sk_buff *))
{
    struct sk_buff *sb = *skb;
    unsigned char src_ip[4];

    /* Copy the 4-byte source address into the array in one store */
    *(unsigned int *)src_ip = sb->nh.iph->saddr;
    printk("A packet from:%d.%d.%d.%d Detected!",
           src_ip[0],src_ip[1],src_ip[2],src_ip[3]);

    /* Report the transport protocol carried by the packet */
    switch(sb->nh.iph->protocol)
    {
    case IPPROTO_TCP:
        printk("It's a TCP PACKET\n");break;
    case IPPROTO_ICMP:
        printk("It's a ICMP PACKET\n");break;
    case IPPROTO_UDP:
        printk("It's a UDP PACKET\n");break;
    default:
        printk("\n");break;    /* still terminate the log line */
    }
    return NF_ACCEPT;
}

int init_module(void)
{
    nfho.hook = hook_func;
    nfho.hooknum = NF_IP_PRE_ROUTING;
    nfho.pf = PF_INET;
    nfho.priority = NF_IP_PRI_FIRST;
    nf_register_hook(&nfho);
    return 0;
}

void cleanup_module(void)
{
    nf_unregister_hook(&nfho);
}
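This is really just a combination of the small programs from the previous articles: it inspects two fields reached through the sk_buff structure and so obtains the source address and the protocol. One statement above may be hard going for readers who are not fluent in C:
*(unsigned int *)src_ip = sb->nh.iph->saddr;
A brief explanation: an IPv4 source address is a 4-byte integer, so I declared a 4-byte array, src_ip, in which each byte holds one number of the dotted-decimal form. To do the assignment in one step, src_ip is cast to an unsigned int pointer so that all four bytes are written at once. This gives the right octet order on any host, because saddr is stored in network byte order and its first byte in memory is therefore already the first octet.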
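The same byte-wise view of the address can be tried in user space. The sketch below uses memcpy() instead of the pointer cast, which sidesteps alignment and aliasing concerns; `format_saddr` is a hypothetical helper, and its argument is assumed to be in network byte order, as iphdr->saddr is.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Formats an IPv4 address held in network byte order as dotted decimal.
 * Returns the number of characters snprintf() would have written. */
int format_saddr(uint32_t saddr_net, char *buf, size_t buflen)
{
    unsigned char octets[4];

    assert(buf != NULL && buflen > 0);
    /* The bytes are copied exactly as they sit in memory, so octets[0]
     * is the first octet on the wire regardless of host endianness. */
    memcpy(octets, &saddr_net, sizeof octets);

    return snprintf(buf, buflen, "%u.%u.%u.%u",
                    octets[0], octets[1], octets[2], octets[3]);
}
```

A 16-byte buffer is enough for any IPv4 address, since the longest form, 255.255.255.255, needs 15 characters plus the terminator.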
The output of a test run follows:
A packet from:210.43.107.130 Detected!It's a TCP PACKET
A packet from:210.43.107.130 Detected!It's a TCP PACKET
A packet from:210.43.107.130 Detected!It's a TCP PACKET
A packet from:210.43.107.130 Detected!It's a TCP PACKET
A packet from:210.43.107.130 Detected!It's a TCP PACKET
A packet from:210.43.107.130 Detected!It's a TCP PACKET
A packet from:210.43.107.130 Detected!It's a TCP PACKET
A packet from:210.43.106.210 Detected!It's a UDP PACKET
A packet from:210.43.107.130 Detected!It's a TCP PACKET
A packet from:210.43.107.8 Detected!It's a UDP PACKET
A packet from:210.43.106.214 Detected!It's a UDP PACKET
Netfilter programming references:
Hacking the Linux Kernel Network Stack (translation): http://linux.chinaunix.net/bbs/thread-758787-1-91.html
A simplified lwfw (lightweight firewall) for 2.6: http://linux.chinaunix.net/bbs/viewthread.php?tid=978607