DPDK LPM Lookup Implementation

榕易

Last modified 2024-07-04 10:46:49


First published 2024-04-27 14:35:36

Article link: https://blog.youkuaiyun.com/lovelylichqq/article/details/138244471


LPM requirements
LPM data structure design
Route lookup
Review
References

LPM requirements

Typical commands for adding a route:

ip route add to 192.168.1.0/24 dev eth0
ip route add to 192.168.1.0/24  via 192.168.1.1 dev eth0

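DPDK LPM implements fast lookup for the first form. The second form is handled by adding an external next hop table.
In other words, DPDK LPM maps a destination to a next hop id, and that id can be interpreted in two ways:

If the next hop id is a port id, it corresponds to the first kind of route and the packet goes straight out of that port.
If the next hop id is an index into a separate next hop table, the second kind of route rule can be implemented.

For this reason, the meaning of the result field of an LPM entry, uint32_t next_hop :24, is left for the application to define.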
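
The second interpretation can be sketched as follows. This is only an illustration: struct app_next_hop, app_nh_table, and app_resolve_next_hop are hypothetical application-side names, not part of DPDK, and the table contents are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical application-side next hop record (not a DPDK type). */
struct app_next_hop {
	uint32_t gateway_ip; /* 0 means the destination is directly connected */
	uint16_t port_id;    /* egress port */
};

#define APP_NH_TABLE_SIZE 2u

/* Example contents only: id 0 is on-link, id 1 goes via 192.168.1.1. */
static const struct app_next_hop app_nh_table[APP_NH_TABLE_SIZE] = {
	{ 0x00000000u, 0 },
	{ 0xC0A80101u, 0 },
};

/* Treat the 24-bit LPM result as an index into the external table. */
static const struct app_next_hop *
app_resolve_next_hop(uint32_t next_hop_id)
{
	if (next_hop_id >= APP_NH_TABLE_SIZE)
		return NULL; /* id not known to the application */
	return &app_nh_table[next_hop_id];
}
```

With the first interpretation, no such table is needed: the application would use the 24-bit value directly as the port id.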

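LPM data structure design

DPDK's LPM design resembles the page tables used in memory management and is built as a two-level structure:
The first level is a single directly indexed table, lpm->tbl24; the high 24 bits of the destination IP are the index.
The second level consists of many independent 256-entry tables, all stored in lpm->tbl8. Which one to use is given by the next_hop field of the tbl24 entry found at the first level: the lookup goes to lpm->tbl8[tbl8_index], where tbl8_index is that next_hop value, and uses the low 8 bits of the destination IP as the index within the table.
The entry structure is: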

/** @internal Tbl24 entry structure. */
__extension__
struct rte_lpm_tbl_entry {
	/**
	 * Stores Next hop (tbl8 or tbl24 when valid_group is not set) or
	 * a group index pointing to a tbl8 structure (tbl24 only, when
	 * valid_group is set)
	 */
	uint32_t next_hop    :24;
	/* Using single uint8_t to store 3 values. */
	uint32_t valid       :1;   /**< Validation flag. */
	/**
	 * For tbl24:
	 *  - valid_group == 0: entry stores a next hop
	 *  - valid_group == 1: entry stores a group_index pointing to a tbl8
	 * For tbl8:
	 *  - valid_group indicates whether the current tbl8 is in use or not
	 */
	uint32_t valid_group :1;
	uint32_t depth       :6; /**< Rule depth. */
};
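
To make the layout concrete, here is a minimal sketch that packs the same four fields into one 32-bit word using explicit masks. It assumes the little-endian bit-field order, where next_hop occupies the low 24 bits; the TOY_ names and helper functions are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Assumed little-endian layout of one 32-bit entry. */
#define TOY_NEXT_HOP_MASK   0x00FFFFFFu /* bits 0..23  */
#define TOY_VALID_BIT       (1u << 24)  /* bit 24      */
#define TOY_VALID_GROUP_BIT (1u << 25)  /* bit 25      */
#define TOY_DEPTH_SHIFT     26          /* bits 26..31 */

/* Build an entry word from its four fields. */
static uint32_t
toy_entry_pack(uint32_t next_hop, int valid, int valid_group, uint8_t depth)
{
	return (next_hop & TOY_NEXT_HOP_MASK) |
	       (valid ? TOY_VALID_BIT : 0u) |
	       (valid_group ? TOY_VALID_GROUP_BIT : 0u) |
	       ((uint32_t)(depth & 0x3Fu) << TOY_DEPTH_SHIFT);
}

/* Extract the 24-bit next hop (or tbl8 group index). */
static uint32_t
toy_entry_next_hop(uint32_t entry)
{
	return entry & TOY_NEXT_HOP_MASK;
}

/* Extract the 6-bit rule depth. */
static uint8_t
toy_entry_depth(uint32_t entry)
{
	return (uint8_t)(entry >> TOY_DEPTH_SHIFT);
}
```

Six bits are enough for the depth field because a prefix length never exceeds 32.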

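Two considerations motivate this design:

It reduces the storage needed for tbl8.
tbl8 is rarely needed anyway, because most prefix lengths are 24 or less.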

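For this reason, most implementations simply set config_ipv4.number_tbl8s = 256 by default, which amounts to supporting 256 tbl8 tables. That takes far less space than a full table covering every address, which would need (2^24) * (2^8) entries. In the figure, tbl24 entries 1 and 192 each have a corresponding tbl8, while entries 2 and 193 do not, simply because no route rule added so far covers them. (Question: if tbl24 entry 2 later needs a tbl8, how is the table extended?)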
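
The space saving can be put into numbers with a small sketch. It assumes each entry takes 4 bytes, as the 32-bit entry structure above suggests; the function names are hypothetical.

```c
#include <stdint.h>

#define TOY_ENTRY_BYTES 4u /* one 32-bit entry */

/* tbl24: one entry for every possible 24-bit prefix. */
static uint64_t
toy_tbl24_bytes(void)
{
	return (UINT64_C(1) << 24) * TOY_ENTRY_BYTES;
}

/* tbl8: number_tbl8s groups of 256 entries each. */
static uint64_t
toy_tbl8_bytes(uint32_t number_tbl8s)
{
	return (uint64_t)number_tbl8s * 256u * TOY_ENTRY_BYTES;
}

/* A flat table with one entry per IPv4 address: (2^24) * (2^8) entries. */
static uint64_t
toy_flat_bytes(void)
{
	return (UINT64_C(1) << 32) * TOY_ENTRY_BYTES;
}
```

Under that assumption, tbl24 takes 64 MiB and 256 tbl8 groups take 256 KiB, against 16 GiB for the flat table.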

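Note how the tbl8 tables are implemented: they all live in the single array rte_lpm->tbl8. Each tbl24 entry maps either to exactly 256 tbl8 entries, with its next hop id giving the position of the first one (tbl24_entry->next_hop * 256), or to none at all.
Figure: DPDK LPM data structure design
In addition, LPM keeps an overall management table, rte_lpm->rules_tbl: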

The other main data structure is a table containing the main information about the rules (IP and next hop). This is a higher level table, used for different things:

Check whether a rule already exists or not, prior to addition or deletion, without having to actually perform a lookup.
When deleting, to check whether there is a rule containing the one that is to be deleted. This is important, since the main data structure will have to be updated accordingly.

This table manages all route rules and is used mainly when adding (rte_lpm_add) and deleting (rte_lpm_delete) routes. Lookups (rte_lpm_lookup) still rely on rte_lpm->tbl24 and rte_lpm->tbl8 for speed.
Its entry structure is:

/** @internal Rule structure. */
struct rte_lpm_rule {
	uint32_t ip; /**< Rule IP address. */
	uint32_t next_hop; /**< Rule next hop. */
};

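On add and delete, the ip/mask of the route is enough to determine directly whether the matching route entry already exists:

Use the prefix length to find, in rule_info, the index range within rules_tbl.
Then scan that range of rules_tbl linearly with ip_masked to locate the matching route entry: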

/*
 * Finds a rule in rule table.
 * NOTE: Valid range for depth parameter is 1 .. 32 inclusive.
 */
static int32_t
rule_find(struct __rte_lpm *i_lpm, uint32_t ip_masked, uint8_t depth)
{
	uint32_t rule_gindex, last_rule, rule_index;
	/* Use the depth (prefix length) to find the index range. */
	rule_gindex = i_lpm->rule_info[depth - 1].first_rule;
	last_rule = rule_gindex + i_lpm->rule_info[depth - 1].used_rules;

	/* Scan rules_tbl linearly for a rule matching ip_masked. */
	/* Scan used rules at given depth to find rule. */
	for (rule_index = rule_gindex; rule_index < last_rule; rule_index++) {
		/* If rule is found return the rule index. */
		if (i_lpm->rules_tbl[rule_index].ip == ip_masked)
			return rule_index;
	}

	/* If rule is not found return -EINVAL. */
	return -EINVAL;
}
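
The ip_masked argument is the route address with its host bits cleared. A minimal, portable sketch of that step is shown below; toy_depth_to_mask and toy_ip_masked are hypothetical names for this illustration, and the shift-based mask is one way to compute it, not necessarily how the library does.

```c
#include <stdint.h>

/*
 * Convert a prefix length to a network mask.
 * Valid for depth in 1..32, so the shift count stays within 0..31.
 */
static uint32_t
toy_depth_to_mask(uint8_t depth)
{
	return UINT32_MAX << (32 - depth);
}

/* Clear the host bits of ip for the given prefix length. */
static uint32_t
toy_ip_masked(uint32_t ip, uint8_t depth)
{
	return ip & toy_depth_to_mask(depth);
}
```

For example, 192.168.1.77 (0xC0A8014D) with depth 24 becomes 192.168.1.0 (0xC0A80100), which is the value compared against rules_tbl[rule_index].ip.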

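Route lookup

In both tbl24 and tbl8, the essential field of each entry is next_hop, which determines the device the packet leaves through.

A lookup first uses the high 24 bits of the destination IP. If a valid entry is found there, it yields the egress device id: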

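ip4_lookup_node_process_scalar
- Get the destination IP address of the packet: rte_be_to_cpu_32(ipv4_hdr->dst_addr)
- rte_lpm_lookup // query the LPM table with the destination IP address as input
  - Use the high 24 bits of the destination IP as the tbl24 index: tbl24_index = (ip >> 8), which gives the tbl_entry in tbl24
  - Check tbl_entry to decide whether tbl8 also has to be consulted: (tbl_entry & RTE_LPM_VALID_EXT_ENTRY_BITMASK)
    - If tbl8 has to be consulted (valid_group is 1), use next_hop as the tbl8 group index to select the tbl8 table: (tbl_entry & 0x00FFFFFF) * RTE_LPM_TBL8_GROUP_NUM_ENTRIES
    - Use the low 8 bits of ip as the offset within the group to select the final tbl_entry
  - Return next_hop from the entry as the egress device id or the index into the next hop table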
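
The steps above can be sketched as a standalone function over two plain arrays. This is a simplified illustration rather than the library code: the TOY_ names are hypothetical, and the mask values are assumptions chosen to match a little-endian layout of the entry bit-fields (valid in bit 24, valid_group in bit 25).

```c
#include <errno.h>
#include <stdint.h>

#define TOY_LOOKUP_SUCCESS          0x01000000u /* valid bit */
#define TOY_VALID_EXT_ENTRY_BITMASK 0x03000000u /* valid + valid_group */
#define TOY_TBL8_GROUP_NUM_ENTRIES  256u

/*
 * Two-level lookup: tbl24 must hold 2^24 entries, tbl8 must hold
 * 256 entries per allocated group. Returns 0 on a hit, -ENOENT on a miss.
 */
static int
toy_lpm_lookup(const uint32_t *tbl24, const uint32_t *tbl8,
	       uint32_t ip, uint32_t *next_hop)
{
	/* Level 1: the high 24 bits select the tbl24 entry. */
	uint32_t tbl_entry = tbl24[ip >> 8];

	/* Level 2: only when the entry is valid and points to a tbl8 group. */
	if ((tbl_entry & TOY_VALID_EXT_ENTRY_BITMASK) ==
	    TOY_VALID_EXT_ENTRY_BITMASK) {
		uint32_t group = tbl_entry & 0x00FFFFFFu;
		uint32_t tbl8_index =
			group * TOY_TBL8_GROUP_NUM_ENTRIES + (uint8_t)ip;

		tbl_entry = tbl8[tbl8_index];
	}

	*next_hop = tbl_entry & 0x00FFFFFFu;
	return (tbl_entry & TOY_LOOKUP_SUCCESS) ? 0 : -ENOENT;
}
```

The common case costs a single array read; the second read happens only for destinations covered by a prefix longer than 24 bits.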

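Review

Note that when the added rule has a 20-bit prefix, tbl24 is still indexed by a fixed 24 bits, so a rule such as 192.168.1.0/20 (whose network address is 192.168.0.0) covers 2^(24-20) = 16 tbl24 entries. All of them must be written with the same entry content so that every lookup in the range returns the same next hop. In other words, merging several fixed /24 entries produces the effect of a /20 prefix.
Likewise, for a 28-bit prefix, 2^(32-28) = 16 entries in the corresponding tbl8 table are written the same way, so that merging several /32 entries produces the effect of a /28 prefix.
When a tbl8 or tbl24 entry is deleted, a possible replacement entry has to be found. Because this is longest prefix match, once the more specific rule is gone, a less specific rule that still exists may cover the range being deleted. Its entry must then be written into that range so that future lookups of those IPs return the less specific rule's result.
Limitation: with 256 tbl8 tables, at most 256 mutually incompatible rules with a prefix longer than 24 are allowed; beyond that, adding a rule fails. Mutually incompatible means rules such as 172.168.7.0/25 and 192.168.2.0/25, whose first 24 bits differ and whose prefix lengths are both greater than 24.
Question: if tbl24 entry 2 needs a tbl8, how is the table extended? Answer: no extension is needed. lpm->tbl8 is scanned in groups of RTE_LPM_TBL8_GROUP_NUM_ENTRIES (256) until a free group is found, and that group becomes the extension group of the new tbl24 entry. In essence, tbl8 is allocated by group id on a first come, first served basis: a later tbl24 entry simply takes the next free group, until none are left.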
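
Two of the points above, prefix expansion and tbl8 group allocation, can be sketched in a few lines. This is a simplified illustration under stated assumptions: all toy_ names are hypothetical, valid_group is assumed to sit in bit 25 of the entry word, and a group counts as free when the valid_group bit of its first entry is clear.

```c
#include <stdint.h>

#define TOY_TBL8_GROUP_NUM_ENTRIES 256u
#define TOY_NUM_TBL8S              256u
#define TOY_VALID_GROUP_BIT        (1u << 25)

/* Range of table slots that one rule has to fill. */
struct toy_range {
	uint32_t first; /* tbl24 index, or offset inside a tbl8 group */
	uint32_t count; /* number of consecutive slots */
};

/* depth in 1..24: the rule spans 2^(24 - depth) tbl24 entries. */
static struct toy_range
toy_tbl24_range(uint32_t ip_masked, uint8_t depth)
{
	struct toy_range r = { ip_masked >> 8, 1u << (24 - depth) };
	return r;
}

/* depth in 25..32: the rule spans 2^(32 - depth) entries of one group. */
static struct toy_range
toy_tbl8_range(uint32_t ip_masked, uint8_t depth)
{
	struct toy_range r = { ip_masked & 0xFFu, 1u << (32 - depth) };
	return r;
}

/*
 * First-fit group allocation over a tbl8 array of
 * TOY_NUM_TBL8S * 256 entries. Returns the group index, or -1 when
 * every group is already in use.
 */
static int32_t
toy_tbl8_alloc(uint32_t *tbl8)
{
	for (uint32_t g = 0; g < TOY_NUM_TBL8S; g++) {
		uint32_t *first = &tbl8[g * TOY_TBL8_GROUP_NUM_ENTRIES];

		if (!(*first & TOY_VALID_GROUP_BIT)) {
			*first |= TOY_VALID_GROUP_BIT; /* mark group as used */
			return (int32_t)g;
		}
	}
	return -1;
}
```

For 192.168.0.0/20 the first function yields tbl24 index 0xC0A800 with 16 slots, and the allocator hands out groups 0, 1, 2, and so on in order, which is the first come, first served behaviour described in the answer above.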