OVS source code analysis: pmd_thread_main

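This article describes how PMD threads in OVS-DPDK poll for received packets and process them. It covers how packets are classified through the EMC, the dpcls and the ofproto classifier, and it walks through key functions such as pmd_thread_main and dp_netdev_process_rxq_port.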


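A PMD thread continuously polls the input ports in its poll list. It receives up to 32 packets at a time on each port (NETDEV_MAX_BURST), and each received packet is classified against the active flow rules. Classification finds the flow that tells the datapath how to handle the packet. Packets are then grouped by flow, and each group executes the actions of its flow.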


A flow is defined by the dp_netdev_flow data structure and is stored in the flow_table hash table. A flow mainly contains the following information:


Rule, Action, Statistics, Batch, Thread ID, Reference count

Note: the words "flows" and "rules" are often used interchangeably. However, the rule is only one part of the flow.



pmd_thread_main is the entry point of the OVS userspace receive path, where PMD threads poll for packets:

```c
static void *
pmd_thread_main(void *f_)
{
    struct dp_netdev_pmd_thread *pmd = f_;
    unsigned int lc = 0;
    struct polled_queue *poll_list;
    bool exiting;
    int poll_cnt;
    int i;

    poll_list = NULL;

    /* ... */

    /* Copy pmd->poll_list into poll_list and return the number of
     * polled queues. */
    poll_cnt = pmd_load_queues_and_ports(pmd, &poll_list);
reload:
    emc_cache_init(&pmd->flow_cache);

    /* ... */

    for (;;) {
        /* Poll every rx queue assigned to this PMD thread. */
        for (i = 0; i < poll_cnt; i++) {
            dp_netdev_process_rxq_port(pmd, poll_list[i].rx,
                                       poll_list[i].port_no);
        }

        /* ... */
    }

    poll_cnt = pmd_load_queues_and_ports(pmd, &poll_list);
    /* If pmd->exit_latch is set, the pmd thread terminates. */
    exiting = latch_is_set(&pmd->exit_latch);
    /* Signal here to make sure the pmd finishes
     * reloading the updated configuration. */
    dp_netdev_pmd_reload_done(pmd);

    emc_cache_uninit(&pmd->flow_cache);

    if (!exiting) {
        goto reload;
    }

    free(poll_list);
    pmd_free_cached_ports(pmd);
    return NULL;
}
```
pmd_thread_main receives packets from each polled rx queue by calling dp_netdev_process_rxq_port:

```c
static void
dp_netdev_process_rxq_port(struct dp_netdev_pmd_thread *pmd,
                           struct netdev_rxq *rx,
                           odp_port_t port_no)
{
    struct dp_packet_batch batch;
    int error;

    dp_packet_batch_init(&batch);
    cycles_count_start(pmd);
    /* Receive packets from rx into batch via netdev_class->rxq_recv. */
    error = netdev_rxq_recv(rx, &batch);
    cycles_count_end(pmd, PMD_CYCLES_POLLING);
    if (!error) {
        *recirc_depth_get() = 0;

        cycles_count_start(pmd);
        /* Hand the packets in batch to the datapath for processing. */
        dp_netdev_input(pmd, &batch, port_no);
        cycles_count_end(pmd, PMD_CYCLES_PROCESSING);
    }
    /* ... */
}
```
Instances of netdev_class include NETDEV_DPDK_CLASS, NETDEV_DUMMY_CLASS, NETDEV_BSD_CLASS and NETDEV_LINUX_CLASS. For a DPDK physical port, the rxq_recv callback is netdev_dpdk_rxq_recv.
```c
static int
netdev_dpdk_rxq_recv(struct netdev_rxq *rxq, struct dp_packet_batch *batch)
{
    struct netdev_rxq_dpdk *rx = netdev_rxq_dpdk_cast(rxq);
    struct netdev_dpdk *dev = netdev_dpdk_cast(rxq->netdev);
    struct ingress_policer *policer = netdev_dpdk_get_ingress_policer(dev);
    int nb_rx;
    int dropped = 0;

    if (OVS_UNLIKELY(!(dev->flags & NETDEV_UP))) {
        return EAGAIN;
    }

    /* Receive with the DPDK API rte_eth_rx_burst, at most
     * NETDEV_MAX_BURST (32) packets per call. */
    nb_rx = rte_eth_rx_burst(rx->port_id, rxq->queue_id,
                             (struct rte_mbuf **) batch->packets,
                             NETDEV_MAX_BURST);
    if (!nb_rx) {
        return EAGAIN;
    }

    /* If an ingress policer is configured, each dp_packet in the batch is
     * passed to netdev_dpdk_policer_pkt_handle; the return value is the
     * number of packets that passed the meter. */
    if (policer) {
        dropped = nb_rx;
        nb_rx = ingress_policer_run(policer,
                                    (struct rte_mbuf **) batch->packets,
                                    nb_rx);
        dropped -= nb_rx;
    }

    /* Update stats to reflect dropped packets */
    if (OVS_UNLIKELY(dropped)) {
        rte_spinlock_lock(&dev->stats_lock);
        dev->stats.rx_dropped += dropped;
        rte_spinlock_unlock(&dev->stats_lock);
    }

    batch->count = nb_rx;

    return 0;
}
```

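After a packet enters OVS-DPDK from a physical or virtual interface, a unique identifier or hash is computed from its header fields. This identifier is matched against an entry in one of three switching tables: the exact match cache (EMC), the datapath classifier (dpcls) and the ofproto classifier. The packet traverses the three tables in that order until it finds a matching entry. It then executes all the actions specified by that entry and is forwarded.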


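The EMC provides fast processing for a limited number of entries. In the EMC, the packet identifier must exactly match an entry, down to the IP 5-tuple. If the EMC misses, the packet goes to the dpcls. The dpcls uses multiple subtables to hold many more entries and can match the packet identifier with wildcards. When a packet matches in the dpcls, a flow entry is installed in the EMC, so later packets with the same identifier are handled quickly by the EMC. If the dpcls also misses, the packet goes to the ofproto classifier and is processed according to the OpenFlow controller. If a matching entry is found in the ofproto classifier, it is pushed down to the fast switching tables, so later packets of the same flow are processed quickly. (Adapted from: https://software.intel.com/en-us/articles/open-vswitch-with-dpdk-overview)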


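Note: the EMC is scoped per PMD thread, so each PMD has its own EMC. The dpcls is scoped per port (within each PMD thread, one dpcls per ingress port). The ofproto classifier is scoped per bridge, so each bridge has its own ofproto classifier.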


dp_netdev_input, called at the end of dp_netdev_process_rxq_port, hands the batch to dp_netdev_input__ with md_is_valid set to false. dp_netdev_input__ first looks up every packet in the EMC and sends the misses to fast_path_processing:

```c
static void
dp_netdev_input__(struct dp_netdev_pmd_thread *pmd,
                  struct dp_packet_batch *packets,
                  bool md_is_valid, odp_port_t port_no)
{
    int cnt = packets->count;
#if !defined(__CHECKER__) && !defined(_WIN32)
    const size_t PKT_ARRAY_SIZE = cnt;
#else
    /* Sparse or MSVC doesn't like variable length array. */
    enum { PKT_ARRAY_SIZE = NETDEV_MAX_BURST };
#endif
    OVS_ALIGNED_VAR(CACHE_LINE_SIZE) struct netdev_flow_key keys[PKT_ARRAY_SIZE];
    struct packet_batch_per_flow batches[PKT_ARRAY_SIZE];
    long long now = time_msec();
    size_t newcnt, n_batches, i;
    odp_port_t in_port;

    n_batches = 0;
    /* Run every packet in the dp_packet_batch through the EMC
     * (pmd->flow_cache) and return the number of packets that still
     * need fast_path_processing. If md_is_valid is false, this function
     * also initializes each packet's metadata from port_no. */
    newcnt = emc_processing(pmd, packets, keys, batches, &n_batches,
                            md_is_valid, port_no);
    if (OVS_UNLIKELY(newcnt)) {
        packets->count = newcnt;
        /* Get ingress port from first packet's metadata. */
        in_port = packets->packets[0]->md.in_port.odp_port;
        fast_path_processing(pmd, packets, keys, batches, &n_batches, in_port, now);
    }

    /* All the flow batches need to be reset before any call to
     * packet_batch_per_flow_execute() as it could potentially trigger
     * recirculation. When a packet matching flow 'j' happens to be
     * recirculated, the nested call to dp_netdev_input__() could potentially
     * classify the packet as matching another flow - say 'k'. It could happen
     * that in the previous call to dp_netdev_input__() that same flow 'k' had
     * already its own batches[k] still waiting to be served.  So if its
     * 'batch' member is not reset, the recirculated packet would be wrongly
     * appended to batches[k] of the 1st call to dp_netdev_input__(). */
    for (i = 0; i < n_batches; i++) {
        batches[i].flow->batch = NULL;
    }

    for (i = 0; i < n_batches; i++) {
        packet_batch_per_flow_execute(&batches[i], pmd, now);
    }
}
```



