P2P non-blocking

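This article introduces the basic concepts of non-blocking communication in MPI and its advantages in parallel computing, covers how to use the non-blocking send and receive functions, and uses an example program to show how to overlap computation with communication for better efficiency.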


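1. Non-blocking communication

(1) Non-blocking sender

(a) The call returns almost immediately, so the process can go on to other work. It returns regardless of whether the data has been copied from the application buffer to the system buffer, or has reached the receiver's application buffer or system buffer.

(b) The MPI library completes the send operation at a time it chooses.

(c) The sender should not immediately modify the memory that holds the data it has just sent (the send buffer), because that is unsafe. It may freely operate on other memory.

(d) Before operating on the send buffer again, the sender must confirm that the send has completed by calling a *wait* function.

(e) A non-blocking send makes it possible to overlap computation with communication and exploit possible performance gains.

(2) Non-blocking receiver (works on the same principle as the non-blocking sender)

2. Functions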

MPI_Isend (&buf,count,datatype,dest,tag,comm,&request)

Identifies an area in memory to serve as a send buffer. Processing continues immediately without waiting for the message to be copied out from the application buffer. A communication request handle is returned for handling the pending message status. The program should not modify the application buffer until subsequent calls to MPI_Wait or MPI_Test indicate that the non-blocking send has completed.

MPI_Irecv (&buf,count,datatype,source,tag,comm,&request)

Identifies an area in memory to serve as a receive buffer. Processing continues immediately without actually waiting for the message to be received and copied into the application buffer. A communication request handle is returned for handling the pending message status. The program must use calls to MPI_Wait or MPI_Test to determine when the non-blocking receive operation completes and the requested message is available in the application buffer.

3. Example

#include"mpi.h"
#include<stdio.h>

int main(int argc, char *argv[]){
        int totalNumTasks, rankID;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &totalNumTasks);
        MPI_Comm_rank(MPI_COMM_WORLD, &rankID);
        //get the host where this process is running
        int  nameLength;
        char processor_name[MPI_MAX_PROCESSOR_NAME];
        MPI_Get_processor_name(processor_name,&nameLength);

        int prevRankID = rankID - 1;
        int nextRankID = rankID + 1;
        if(rankID == 0)  prevRankID = totalNumTasks - 1;
        if(rankID == (totalNumTasks - 1)) nextRankID = 0;

        int count = 1;
        MPI_Request request[4];

        char recvBuf1;
        char sendBuf1 = 'R';
        int tag1 = 1;
        MPI_Irecv(&recvBuf1, count, MPI_CHAR, prevRankID, tag1, MPI_COMM_WORLD, &request[0]);
        MPI_Isend(&sendBuf1, count, MPI_CHAR, nextRankID, tag1, MPI_COMM_WORLD, &request[1]);

        char recvBuf2;
        char sendBuf2 = 'L';
        int tag2 = 2;
        MPI_Irecv(&recvBuf2, count, MPI_CHAR, nextRankID, tag2, MPI_COMM_WORLD, &request[2]);
        MPI_Isend(&sendBuf2, count, MPI_CHAR, prevRankID, tag2, MPI_COMM_WORLD, &request[3]);
        //After the non-blocking sends and receives are posted, the process can do other work,
        //as long as it does not touch the application buffers
        //(here: recvBuf1, sendBuf1, recvBuf2, sendBuf2).
        //This is where communication overlaps with computation; any other memory is free to use.
        printf("My rankID = %d on Processor = %s, I can do something here.....\n", rankID, processor_name);
        //MPI_Waitall blocks until all four requests complete; after it returns, the application buffers are safe to reuse
        MPI_Status status[4];
        MPI_Waitall(4, request, status);
        printf("My rankID = %d, recvBuf1 = %c && source = %d && tag = %d\n", 
                rankID, recvBuf1, status[0].MPI_SOURCE, status[0].MPI_TAG);

        printf("My rankID = %d, recvBuf2 = %c && source = %d && tag = %d\n", 
                rankID, recvBuf2, status[2].MPI_SOURCE, status[2].MPI_TAG);
        printf("My rankID = %d, Now, my application buffer is safe to reuse.\n", rankID);
        MPI_Finalize();
        return 0;
}
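
The two if statements that wrap prevRankID and nextRankID around the ends of the ring can also be expressed with modular arithmetic. A minimal sketch, where ring_prev and ring_next are hypothetical helper names that do not appear in the program above:

```c
/* Rank of the previous neighbor on a ring of n processes.
 * Adding n before subtracting keeps the left operand of %
 * non-negative, so rank 0 wraps around to n - 1. */
int ring_prev(int rank, int n)
{
    return (rank + n - 1) % n;
}

/* Rank of the next neighbor on a ring of n processes;
 * rank n - 1 wraps around to 0. */
int ring_next(int rank, int n)
{
    return (rank + 1) % n;
}
```

With 3 processes, ring_prev(0, 3) gives 2 and ring_next(2, 3) gives 0, matching the values the if statements produce.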

4. Compile and run

[amao@amao991 mpi-study]$ mpicc p2pNonBlockingOnWhichProcessor.c 
[amao@amao991 mpi-study]$ mpiexec -n 3 -f machinefile ./a.out 
My rankID = 0 on Processor = amao991, I can do something here.....
My rankID = 2 on Processor = amao992, I can do something here.....
My rankID = 1 on Processor = amao991, I can do something here.....
My rankID = 0, recvBuf1 = R && source = 2 && tag = 1
My rankID = 0, recvBuf2 = L && source = 1 && tag = 2
My rankID = 0, Now, my application buffer is safe to reuse.
My rankID = 1, recvBuf1 = R && source = 0 && tag = 1
My rankID = 1, recvBuf2 = L && source = 2 && tag = 2
My rankID = 1, Now, my application buffer is safe to reuse.
My rankID = 2, recvBuf1 = R && source = 1 && tag = 1
My rankID = 2, recvBuf2 = L && source = 0 && tag = 2
My rankID = 2, Now, my application buffer is safe to reuse.

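5. Summary

(1) In this example the 3 processes form a bidirectional ring: each process sends 'R' to its next neighbor and 'L' to its previous neighbor, and receives one message from each of them.

(2) Because the sends and receives are non-blocking, the process can continue with other statements as soon as the calls return (a single printf stands in for that work here), as long as it does not touch the send/receive buffers.

(3) Finally, before reading the receive buffers, the process must call MPI_Waitall to make sure the receives have completed.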

