Open vSwitch 2.10.0 (OVS) source code analysis: the kernel flow subsystem

License: CC 4.0 BY-SA


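In OVS, the kernel module datapath is responsible for processing and forwarding packets. When it receives a packet on an ingress port (vport), it extracts fields from the packet and looks them up in the flow table. If the packet matches one of the flows, the action specified by that flow is executed, for example forwarding the packet out of another vport; this is the Fast Path shown in the figure above. If no flow matches, the packet is sent up to user space, which is the Slow Path in the figure above. This article focuses on the flow-related operations in the Fast Path.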

Data structures

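The flow-related data structures are shown below (only the more important fields are listed). You may need to refer back to this diagram repeatedly while reading the code.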

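flow_table: the flow table structure. Each datapath has exactly one flow table.
table_instance: a flow table instance. Its buckets hold the actual flow entries; for the storage layout, see FlexArray.
sw_flow: a flow entry. Its key describes packet characteristics: during matching, a key is extracted from the received packet and compared against the key of each flow entry.
sw_flow_key: the packet characteristics (flow key). When a packet is parsed, fields are extracted from every protocol layer.
mask_array: the set of masks used by the flow table. Older OVS versions supported only exact-match flows, i.e. a packet matched only if its key was identical to the one in the flow. Newer versions also support wildcarded flows, where a mask is applied to the fields of a flow. The most common example is a flow entry that specifies a source IP together with an IP mask: any packet whose IP, after the mask is applied, falls within that subnet is considered a match.
sw_flow_mask: a mask entry. Its ref_count records how many flows currently reference it.
sw_flow_match: a helper structure used during matching; it ties a key to its mask.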
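The wildcard rule described for mask_array boils down to one comparison: a packet matches when (packet field & mask) equals the flow's key, which is stored already masked. The sketch below shows this for a single field; demo_key, demo_flow, ip4 and demo_flow_matches are hypothetical names, and the one-field key is a simplifying assumption (the real sw_flow_key covers many L2-L4 fields).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical one-field key: an IPv4 source address in host byte order.
 * The real struct sw_flow_key holds many L2-L4 fields. */
struct demo_key {
    uint32_t ipv4_src;
};

/* A wildcarded flow: key is stored already ANDed with mask. */
struct demo_flow {
    struct demo_key key;   /* masked key */
    struct demo_key mask;  /* 1 bits must match, 0 bits are wildcards */
};

/* Build the IPv4 address a.b.c.d in host byte order. */
static uint32_t ip4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) | ((uint32_t)c << 8) | d;
}

/* A packet matches when its field, after masking, equals the flow's key. */
static bool demo_flow_matches(const struct demo_flow *flow,
                              const struct demo_key *pkt)
{
    /* Invariant: the stored flow key has no bits set outside its mask. */
    assert((flow->key.ipv4_src & ~flow->mask.ipv4_src) == 0);
    return (pkt->ipv4_src & flow->mask.ipv4_src) == flow->key.ipv4_src;
}
```

With a flow for 192.168.1.0/24, a packet from 192.168.1.7 matches while one from 192.168.2.7 does not.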

Creating a flow

A user can create a new flow with the command ovs-dpctl add-flow [DP] FLOW ACTIONS. The kernel function that handles this command is ovs_flow_cmd_new:

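static int ovs_flow_cmd_new(struct sk_buff *skb, struct genl_info *info)
{
   struct net *net = sock_net(skb->sk);
   struct nlattr **a = info->attrs;
   struct ovs_header *ovs_header = info->userhdr;
   struct sw_flow_mask mask;
   struct sw_flow_key key;
   struct sw_flow_match match;
   struct datapath *dp;
   bool log = !a[OVS_FLOW_ATTR_PROBE];
   int error;

   /* Allocate a new flow entry */
   struct sw_flow *new_flow = ovs_flow_alloc();

   /* Initialize a match structure and tie it to the (still empty) key and mask */
   ovs_match_init(&match, &key, &mask);

   /* Parse the user's request. It arrives as netlink attributes:
      one attribute sequence for the key and one for the mask */
   error = ovs_nla_get_match(net, &match, a[OVS_FLOW_ATTR_KEY], a[OVS_FLOW_ATTR_MASK], log);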

ovs_nla_get_match parses the attributes one by one and fills in the sw_flow_match structure:

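int ovs_nla_get_match(struct net *net, struct sw_flow_match *match,  /* output parameter */
		      const struct nlattr *nla_key,  /* OVS_KEY_ATTR_* attribute sequence */
		      const struct nlattr *nla_mask, /* OVS_KEY_ATTR_* attribute sequence */
		      bool log)
{
	const struct nlattr *a[OVS_KEY_ATTR_MAX + 1];
	u64 key_attrs = 0;     /* bitmap of the key attributes present */
	u64 mask_attrs = 0;    /* bitmap of the mask attributes present */
	int err;

	/* Parse nla_key: store the nested attrs in array a, the bitmap in key_attrs */
	err = parse_flow_nlattrs(nla_key, a, &key_attrs, log);
	if (err)
		return err;

	/* Use bitmap key_attrs and attribute array a to fill in match->key */
	err = ovs_key_from_nlattrs(net, match, key_attrs, a, false, log);
	if (err)
		return err;

	/* Parse nla_mask: store the nested attrs in a, the bitmap in mask_attrs */
	err = parse_flow_mask_nlattrs(nla_mask, a, &mask_attrs, log);
	if (err)
		return err;

	/* Use bitmap mask_attrs and attribute array a to fill in match->mask->key */
	err = ovs_key_from_nlattrs(net, match, mask_attrs, a, true, log);
	if (err)
		return err;

	/* (abridged: the real function also handles a missing mask and validates the match) */
	return 0;
}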
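The "attribute array plus bitmap" pattern used by parse_flow_nlattrs can be illustrated in user space. This is a simplified sketch, not the kernel code: demo_attr stands in for a netlink attribute, the DEMO_ATTR_* types are hypothetical placeholders for OVS_KEY_ATTR_*, and rejecting unknown and duplicate types is this sketch's own policy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical attribute types standing in for OVS_KEY_ATTR_*. */
enum { DEMO_ATTR_UNSPEC, DEMO_ATTR_IN_PORT, DEMO_ATTR_ETHERNET,
       DEMO_ATTR_IPV4, DEMO_ATTR_TCP, DEMO_ATTR_MAX = DEMO_ATTR_TCP };

/* Simplified stand-in for a netlink attribute: a type plus its payload. */
struct demo_attr {
    uint16_t type;
    const void *payload;
};

/* Index attrs[0..n) by type into a[] and set one bit per type seen in
 * *attrs_bitmap. Returns 0 on success, -1 on an unknown or duplicate type. */
static int demo_parse_attrs(const struct demo_attr *attrs, size_t n,
                            const struct demo_attr *a[DEMO_ATTR_MAX + 1],
                            uint64_t *attrs_bitmap)
{
    assert(DEMO_ATTR_MAX < 64);  /* one bitmap bit per attribute type */
    for (int t = 0; t <= DEMO_ATTR_MAX; t++)
        a[t] = NULL;
    *attrs_bitmap = 0;

    for (size_t i = 0; i < n; i++) {
        uint16_t type = attrs[i].type;
        if (type > DEMO_ATTR_MAX)
            return -1;                          /* unknown attribute */
        if (*attrs_bitmap & (UINT64_C(1) << type))
            return -1;                          /* duplicate attribute */
        *attrs_bitmap |= UINT64_C(1) << type;
        a[type] = &attrs[i];
    }
    return 0;
}
```

The bitmap lets later code check cheaply which fields were supplied, while a[] gives direct access to each attribute by type.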

Continuing in ovs_flow_cmd_new:

static int ovs_flow_cmd_new(struct sk_buff *skb, struct genl_info *info)
{
   /* ... continued from the excerpt above ... */

   /* Store the masked key (key & mask) in new_flow->key,
      e.g. 192.168.1.101 & 255.255.255.0 = 192.168.1.0
    */
   ovs_flow_mask_key(&new_flow->key, &key, true, &mask);

   dp = get_dp(net, ovs_header->dp_ifindex);

   /* Insert the new flow into the datapath's flow table */
   error = ovs_flow_tbl_insert(&dp->table, new_flow, &mask);

   /* Build the netlink reply message for user space */
}
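The masking step done by ovs_flow_mask_key can be pictured as a byte-wise AND over the range of the key that the mask covers. In this sketch the flat 16-byte key, demo_mask and demo_mask_key are hypothetical; the real function operates on struct sw_flow_key in long-sized words using the mask's range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flat key of DEMO_KEY_LEN bytes; the real struct sw_flow_key
 * is much larger and is processed in long-sized words. */
#define DEMO_KEY_LEN 16

struct demo_mask {
    size_t start, end;            /* significant byte range [start, end) */
    uint8_t bytes[DEMO_KEY_LEN];  /* 0xff = must match, 0x00 = wildcard */
};

/* dst = src & mask within the mask's range. With full == true the whole of
 * dst is initialized, and bytes outside the range end up zero. */
static void demo_mask_key(uint8_t dst[DEMO_KEY_LEN],
                          const uint8_t src[DEMO_KEY_LEN],
                          bool full, const struct demo_mask *mask)
{
    assert(mask->start <= mask->end && mask->end <= DEMO_KEY_LEN);
    if (full)
        memset(dst, 0, DEMO_KEY_LEN);
    for (size_t i = mask->start; i < mask->end; i++)
        dst[i] = src[i] & mask->bytes[i];
}
```

Applied to 192.168.1.101 with a 255.255.255.0 mask, the stored key becomes 192.168.1.0, which is exactly the example in the comment above.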

Next, ovs_flow_tbl_insert:

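int ovs_flow_tbl_insert(struct flow_table *table, struct sw_flow *flow,
			const struct sw_flow_mask *mask)
{
	int err;

	/* Insert the mask into the mask array (or reuse an identical one) */
	err = flow_mask_insert(table, flow, mask);
	if (err)
		return err;

	/* Insert the flow into the flow hash table (table->ti) */
	flow_key_insert(table, flow);

	return 0;
}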

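In flow_mask_insert, if table->mask_array already contains an identical mask (i.e. several flows use the same mask), that entry's reference count is incremented. Otherwise a new mask entry is created and added to table->mask_array, and the array is grown dynamically if it has no free space.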
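The reuse-or-append logic of flow_mask_insert can be sketched with a small growable array of reference-counted masks. All names here (demo_mask, demo_mask_array, demo_mask_insert) are hypothetical, the mask is reduced to one 32-bit value, and the growth policy (start at 16 slots, then double) is this sketch's own choice rather than the kernel's.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical simplified mask: a single 32-bit mask value. */
struct demo_mask {
    uint32_t bits;
    int ref_count;          /* number of flows using this mask */
};

struct demo_mask_array {
    struct demo_mask **masks;
    int count;              /* slots in use */
    int max;                /* slots allocated */
};

/* Return the mask the new flow should use: reuse an identical entry (bumping
 * its ref_count) or append a new one, growing the array when it is full.
 * Returns NULL on allocation failure. */
static struct demo_mask *demo_mask_insert(struct demo_mask_array *ma, uint32_t bits)
{
    assert(ma->count <= ma->max);

    for (int i = 0; i < ma->count; i++) {
        if (ma->masks[i]->bits == bits) {
            ma->masks[i]->ref_count++;
            return ma->masks[i];
        }
    }

    if (ma->count == ma->max) {
        int new_max = ma->max ? ma->max * 2 : 16;
        struct demo_mask **grown = realloc(ma->masks, new_max * sizeof(*grown));
        if (!grown)
            return NULL;
        ma->masks = grown;
        ma->max = new_max;
    }

    struct demo_mask *m = malloc(sizeof(*m));
    if (!m)
        return NULL;
    m->bits = bits;
    m->ref_count = 1;
    ma->masks[ma->count++] = m;
    return m;
}
```

Sharing masks matters because lookup cost grows with the number of distinct masks, not the number of flows.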

In flow_key_insert, the hash of flow->key is computed, and the flow is inserted into the bucket of table->ti selected by that hash.
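A minimal sketch of hash-bucket insertion and lookup, assuming a power-of-two bucket count so the bucket index is just hash & (n_buckets - 1). The names are hypothetical, the key is a single uint32_t, a murmur3 finalizer replaces the jhash functions the kernel code uses, and a plain singly linked chain replaces the RCU-protected lists.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_N_BUCKETS 1024
_Static_assert((DEMO_N_BUCKETS & (DEMO_N_BUCKETS - 1)) == 0,
               "bucket count must be a power of two");

struct demo_flow {
    uint32_t masked_key;        /* stand-in for the masked sw_flow_key */
    uint32_t hash;              /* cached hash of masked_key */
    struct demo_flow *next;     /* bucket chain */
};

struct demo_table {
    struct demo_flow *buckets[DEMO_N_BUCKETS];
};

/* 32-bit mixing hash (murmur3 finalizer), used here instead of jhash. */
static uint32_t demo_hash(uint32_t x)
{
    x ^= x >> 16;
    x *= 0x85ebca6bu;
    x ^= x >> 13;
    x *= 0xc2b2ae35u;
    x ^= x >> 16;
    return x;
}

static struct demo_flow **demo_find_bucket(struct demo_table *t, uint32_t hash)
{
    return &t->buckets[hash & (DEMO_N_BUCKETS - 1)];
}

static void demo_flow_insert(struct demo_table *t, struct demo_flow *flow)
{
    flow->hash = demo_hash(flow->masked_key);
    struct demo_flow **head = demo_find_bucket(t, flow->hash);
    flow->next = *head;         /* push onto the front of the chain */
    *head = flow;
}

static struct demo_flow *demo_flow_find(struct demo_table *t, uint32_t masked_key)
{
    uint32_t hash = demo_hash(masked_key);
    for (struct demo_flow *f = *demo_find_bucket(t, hash); f; f = f->next)
        if (f->hash == hash && f->masked_key == masked_key)
            return f;
    return NULL;
}
```

Caching the hash in each flow lets the lookup reject most chain entries with a single integer compare before comparing keys.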

Looking up a flow

When a vport of the datapath receives a packet, the packet is processed along this call chain:

ovs_vport_receive => ovs_dp_process_packet => ovs_flow_tbl_lookup_stats

/*
  @tbl         the flow table
  @key         the flow key extracted from the skb
  @skb_hash    the packet's hash
  @n_mask_hit  output parameter: number of masks tried during matching
 */
struct sw_flow *ovs_flow_tbl_lookup_stats(struct flow_table *tbl,
					  const struct sw_flow_key *key,
					  u32 skb_hash,
					  u32 *n_mask_hit)
{
    struct mask_array *ma = rcu_dereference(tbl->mask_array);
    struct table_instance *ti = rcu_dereference(tbl->ti);
    struct sw_flow *flow;
    struct mask_cache_entry *entries, *ce;
    u32 hash;
    int seg;

    *n_mask_hit = 0;
    if (unlikely(!skb_hash)) {
        /* No packet hash was computed: do a full flow table lookup */
        u32 mask_index = 0;
        return flow_lookup(tbl, ti, ma, key, n_mask_hit, &mask_index);
    }

    /* First try the (per-CPU) mask cache */
    ce = NULL;
    hash = skb_hash;
    entries = this_cpu_ptr(tbl->mask_cache);
    for (seg = 0; seg < MC_HASH_SEGS; seg++) {
        int index = hash & (MC_HASH_ENTRIES - 1);
        struct mask_cache_entry *e;

        e = &entries[index];
        if (e->skb_hash == skb_hash) {
            /* Entry e recorded the same hash as this packet:
               start the lookup from the cached mask index */
            flow = flow_lookup(tbl, ti, ma, key, n_mask_hit, &e->mask_index);
            if (!flow)
                e->skb_hash = 0;  /* stale entry: invalidate it */
            return flow;
        }

        if (!ce || e->skb_hash < ce->skb_hash)
            ce = e;  /* A better replacement cache candidate. */
        hash >>= MC_HASH_SHIFT;
    }

    /* Cache miss: do a full lookup */
    flow = flow_lookup(tbl, ti, ma, key, n_mask_hit, &ce->mask_index);
    if (flow)
        ce->skb_hash = skb_hash;  /* a flow was found, so update the cache */
    return flow;
}

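As we can see, every flow table lookup ends up in flow_lookup; the calls differ only in the last argument, which is the mask index at which the lookup starts. As seen earlier when creating a flow, the mask of each new flow is recorded in mask_array, which holds the mask entries used by all flows. The cache here, by contrast, records the masks hit by recent lookups.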

struct mask_cache_entry {
   u32 skb_hash;     /* hash of the received packet */
   u32 mask_index;   /* index of the mask used by the matched flow */
};

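The cache entry structure is shown above: skb_hash is the hash computed from the received packet, and mask_index is the position in mask_array of the mask used by the flow that the packet matched.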

Now on to flow_lookup:

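static struct sw_flow *flow_lookup(struct flow_table *tbl,
				   struct table_instance *ti,
				   const struct mask_array *ma,
				   const struct sw_flow_key *key,
				   u32 *n_mask_hit,
				   u32 *index)
{
	struct sw_flow_mask *mask;
	struct sw_flow *flow;
	int i;

	/* First try the mask at the starting index (e.g. taken from the mask cache) */
	if (*index < ma->max) {
		mask = rcu_dereference_ovsl(ma->masks[*index]);
		if (mask) {
			flow = masked_flow_lookup(ti, key, mask, n_mask_hit);
			if (flow)
				return flow;
		}
	}

	for (i = 0; i < ma->max; i++) {
		if (i == *index)
			continue;	/* already tried above */

		/* Walk through every flow mask, skipping empty slots */
		mask = rcu_dereference_ovsl(ma->masks[i]);
		if (!mask)
			continue;

		/* Apply the mask to the key and look for a matching flow */
		flow = masked_flow_lookup(ti, key, mask, n_mask_hit);
		if (flow) { /* Found */
			*index = i;
			return flow;
		}
	}

	return NULL;
}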
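The search order of flow_lookup (the starting mask index first, then every other mask, recording the winning index so the caller can cache it) can be modeled in user space. This is a sketch with hypothetical names: the key is a single uint32_t, and a linear scan over a flow array stands in for the hash-bucket search that masked_flow_lookup performs in the masked-key table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flow: a masked single-field key and the mask it uses. */
struct demo_flow {
    uint32_t masked_key;
    int mask_idx;           /* index into masks[] */
};

/* Find a flow matching pkt_key under masks[mask_idx]. A linear scan stands
 * in for masked_flow_lookup()'s hash-bucket search. */
static const struct demo_flow *
demo_masked_lookup(const struct demo_flow *flows, size_t n_flows,
                   const uint32_t *masks, int mask_idx, uint32_t pkt_key)
{
    uint32_t masked = pkt_key & masks[mask_idx];
    for (size_t i = 0; i < n_flows; i++)
        if (flows[i].mask_idx == mask_idx && flows[i].masked_key == masked)
            return &flows[i];
    return NULL;
}

/* Try the hinted mask (*index) first, then every other mask in order.
 * On success *index holds the matching mask; *n_mask_hit counts masks tried. */
static const struct demo_flow *
demo_flow_lookup(const struct demo_flow *flows, size_t n_flows,
                 const uint32_t *masks, int n_masks,
                 uint32_t pkt_key, int *index, int *n_mask_hit)
{
    const struct demo_flow *flow;

    *n_mask_hit = 0;
    if (*index >= 0 && *index < n_masks) {
        (*n_mask_hit)++;
        flow = demo_masked_lookup(flows, n_flows, masks, *index, pkt_key);
        if (flow)
            return flow;
    }
    for (int i = 0; i < n_masks; i++) {
        if (i == *index)
            continue;       /* already tried */
        (*n_mask_hit)++;
        flow = demo_masked_lookup(flows, n_flows, masks, i, pkt_key);
        if (flow) {
            *index = i;
            return flow;
        }
    }
    return NULL;
}
```

When the cached index is right, the lookup costs a single mask probe; otherwise it degrades to trying every mask, which is why keeping the number of distinct masks small matters.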