Using nodes: slurm-gb200-217-[027,047]
# nThread 1 nGpus 1 minBytes 536870912 maxBytes 8589934592 step: 2(factor) warmup iters: 5 iters: 20 agg iters: 1 validation: 1 graph: 0
#
# Using devices
# Rank 0 Group 0 Pid 234738 on slurm-gb200-217-027 device 0 [0x01] NVIDIA GB200
# Rank 1 Group 0 Pid 234739 on slurm-gb200-217-027 device 1 [0x01] NVIDIA GB200
# Rank 2 Group 0 Pid 234740 on slurm-gb200-217-027 device 2 [0x01] NVIDIA GB200
# Rank 3 Group 0 Pid 234741 on slurm-gb200-217-027 device 3 [0x01] NVIDIA GB200
# Rank 4 Group 0 Pid 237289 on slurm-gb200-217-047 device 0 [0x01] NVIDIA GB200
# Rank 5 Group 0 Pid 237290 on slurm-gb200-217-047 device 1 [0x01] NVIDIA GB200
# Rank 6 Group 0 Pid 237291 on slurm-gb200-217-047 device 2 [0x01] NVIDIA GB200
# Rank 7 Group 0 Pid 237292 on slurm-gb200-217-047 device 3 [0x01] NVIDIA GB200
slurm-gb200-217-027:234738:234738 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
slurm-gb200-217-027:234738:234738 [0] NCCL INFO Bootstrap: Using eth0:10.0.4.227<0>
slurm-gb200-217-027:234738:234738 [0] NCCL INFO cudaDriverVersion 12080
slurm-gb200-217-027:234738:234738 [0] NCCL INFO NCCL version 2.25.1+cuda12.8
slurm-gb200-217-027:234740:234740 [2] NCCL INFO cudaDriverVersion 12080
slurm-gb200-217-027:234740:234740 [2] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
slurm-gb200-217-027:234740:234740 [2] NCCL INFO Bootstrap: Using eth0:10.0.4.227<0>
slurm-gb200-217-027:234740:234740 [2] NCCL INFO NCCL version 2.25.1+cuda12.8
slurm-gb200-217-027:234741:234741 [3] NCCL INFO cudaDriverVersion 12080
slurm-gb200-217-027:234741:234741 [3] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
slurm-gb200-217-027:234741:234741 [3] NCCL INFO Bootstrap: Using eth0:10.0.4.227<0>
slurm-gb200-217-027:234741:234741 [3] NCCL INFO NCCL version 2.25.1+cuda12.8
slurm-gb200-217-027:234739:234739 [1] NCCL INFO cudaDriverVersion 12080
slurm-gb200-217-027:234739:234739 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
slurm-gb200-217-027:234739:234739 [1] NCCL INFO Bootstrap: Using eth0:10.0.4.227<0>
slurm-gb200-217-027:234739:234739 [1] NCCL INFO NCCL version 2.25.1+cuda12.8
slurm-gb200-217-047:237289:237289 [0] NCCL INFO cudaDriverVersion 12080
slurm-gb200-217-047:237289:237289 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
slurm-gb200-217-047:237289:237289 [0] NCCL INFO Bootstrap: Using eth0:10.0.5.220<0>
slurm-gb200-217-047:237289:237289 [0] NCCL INFO NCCL version 2.25.1+cuda12.8
slurm-gb200-217-047:237292:237292 [3] NCCL INFO cudaDriverVersion 12080
slurm-gb200-217-047:237292:237292 [3] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
slurm-gb200-217-047:237292:237292 [3] NCCL INFO Bootstrap: Using eth0:10.0.5.220<0>
slurm-gb200-217-047:237292:237292 [3] NCCL INFO NCCL version 2.25.1+cuda12.8
slurm-gb200-217-047:237290:237290 [1] NCCL INFO cudaDriverVersion 12080
slurm-gb200-217-047:237290:237290 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
slurm-gb200-217-047:237290:237290 [1] NCCL INFO Bootstrap: Using eth0:10.0.5.220<0>
slurm-gb200-217-047:237290:237290 [1] NCCL INFO NCCL version 2.25.1+cuda12.8
slurm-gb200-217-027:234738:235129 [0] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v9 symbol.
slurm-gb200-217-027:234738:235129 [0] NCCL INFO NET/Plugin: Loaded net plugin NCCL RDMA Plugin v8 (v8)
slurm-gb200-217-027:234738:235129 [0] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v9 symbol.
slurm-gb200-217-027:234738:235129 [0] NCCL INFO NET/Plugin: Loaded collnet plugin SHARP (v8)
slurm-gb200-217-027:234738:235129 [0] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
slurm-gb200-217-027:234738:235129 [0] NCCL INFO P2P plugin v8 IBext_v8
slurm-gb200-217-027:234738:235129 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
slurm-gb200-217-047:237291:237291 [2] NCCL INFO cudaDriverVersion 12080
slurm-gb200-217-047:237291:237291 [2] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
slurm-gb200-217-047:237291:237291 [2] NCCL INFO Bootstrap: Using eth0:10.0.5.220<0>
slurm-gb200-217-047:237291:237291 [2] NCCL INFO NCCL version 2.25.1+cuda12.8
slurm-gb200-217-027:234738:235129 [0] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 1.
slurm-gb200-217-027:234738:235129 [0] NCCL INFO NET/IB : Using [0]ibp0:1/IB/SHARP [1]ibp1:1/IB/SHARP [2]ibp2:1/IB/SHARP [3]ibp3:1/IB/SHARP [RO]; OOB eth0:10.0.4.227<0>
slurm-gb200-217-027:234738:235129 [0] NCCL INFO PROFILER/Plugin: Could not find: libnccl-profiler.so.
slurm-gb200-217-027:234738:235129 [0] NCCL INFO Using network IBext_v8
slurm-gb200-217-027:234740:235130 [2] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v9 symbol.
slurm-gb200-217-027:234740:235130 [2] NCCL INFO NET/Plugin: Loaded net plugin NCCL RDMA Plugin v8 (v8)
slurm-gb200-217-027:234741:235131 [3] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v9 symbol.
slurm-gb200-217-027:234741:235131 [3] NCCL INFO NET/Plugin: Loaded net plugin NCCL RDMA Plugin v8 (v8)
slurm-gb200-217-027:234740:235130 [2] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v9 symbol.
slurm-gb200-217-027:234740:235130 [2] NCCL INFO NET/Plugin: Loaded collnet plugin SHARP (v8)
slurm-gb200-217-027:234741:235131 [3] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v9 symbol.
slurm-gb200-217-027:234741:235131 [3] NCCL INFO NET/Plugin: Loaded collnet plugin SHARP (v8)
slurm-gb200-217-027:234740:235130 [2] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
slurm-gb200-217-027:234740:235130 [2] NCCL INFO P2P plugin v8 IBext_v8
slurm-gb200-217-027:234741:235131 [3] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
slurm-gb200-217-027:234741:235131 [3] NCCL INFO P2P plugin v8 IBext_v8
slurm-gb200-217-027:234740:235130 [2] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
slurm-gb200-217-027:234741:235131 [3] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
slurm-gb200-217-027:234739:235132 [1] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v9 symbol.
slurm-gb200-217-027:234739:235132 [1] NCCL INFO NET/Plugin: Loaded net plugin NCCL RDMA Plugin v8 (v8)
slurm-gb200-217-027:234739:235132 [1] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v9 symbol.
slurm-gb200-217-027:234739:235132 [1] NCCL INFO NET/Plugin: Loaded collnet plugin SHARP (v8)
slurm-gb200-217-027:234739:235132 [1] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
slurm-gb200-217-027:234739:235132 [1] NCCL INFO P2P plugin v8 IBext_v8
slurm-gb200-217-027:234739:235132 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
slurm-gb200-217-027:234741:235131 [3] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 1.
slurm-gb200-217-027:234741:235131 [3] NCCL INFO NET/IB : Using [0]ibp0:1/IB/SHARP [1]ibp1:1/IB/SHARP [2]ibp2:1/IB/SHARP [3]ibp3:1/IB/SHARP [RO]; OOB eth0:10.0.4.227<0>
slurm-gb200-217-027:234740:235130 [2] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 1.
slurm-gb200-217-027:234740:235130 [2] NCCL INFO NET/IB : Using [0]ibp0:1/IB/SHARP [1]ibp1:1/IB/SHARP [2]ibp2:1/IB/SHARP [3]ibp3:1/IB/SHARP [RO]; OOB eth0:10.0.4.227<0>
slurm-gb200-217-027:234741:235131 [3] NCCL INFO PROFILER/Plugin: Could not find: libnccl-profiler.so.
slurm-gb200-217-027:234741:235131 [3] NCCL INFO Using network IBext_v8
slurm-gb200-217-027:234740:235130 [2] NCCL INFO PROFILER/Plugin: Could not find: libnccl-profiler.so.
slurm-gb200-217-027:234740:235130 [2] NCCL INFO Using network IBext_v8
slurm-gb200-217-027:234739:235132 [1] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 1.
slurm-gb200-217-027:234739:235132 [1] NCCL INFO NET/IB : Using [0]ibp0:1/IB/SHARP [1]ibp1:1/IB/SHARP [2]ibp2:1/IB/SHARP [3]ibp3:1/IB/SHARP [RO]; OOB eth0:10.0.4.227<0>
slurm-gb200-217-027:234739:235132 [1] NCCL INFO PROFILER/Plugin: Could not find: libnccl-profiler.so.
slurm-gb200-217-027:234739:235132 [1] NCCL INFO Using network IBext_v8
slurm-gb200-217-047:237289:237675 [0] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v9 symbol.
slurm-gb200-217-047:237289:237675 [0] NCCL INFO NET/Plugin: Loaded net plugin NCCL RDMA Plugin v8 (v8)
slurm-gb200-217-047:237289:237675 [0] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v9 symbol.
slurm-gb200-217-047:237289:237675 [0] NCCL INFO NET/Plugin: Loaded collnet plugin SHARP (v8)
slurm-gb200-217-047:237289:237675 [0] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
slurm-gb200-217-047:237289:237675 [0] NCCL INFO P2P plugin v8 IBext_v8
slurm-gb200-217-047:237289:237675 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
slurm-gb200-217-047:237292:237676 [3] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v9 symbol.
slurm-gb200-217-047:237292:237676 [3] NCCL INFO NET/Plugin: Loaded net plugin NCCL RDMA Plugin v8 (v8)
slurm-gb200-217-047:237292:237676 [3] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v9 symbol.
slurm-gb200-217-047:237292:237676 [3] NCCL INFO NET/Plugin: Loaded collnet plugin SHARP (v8)
slurm-gb200-217-047:237292:237676 [3] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
slurm-gb200-217-047:237292:237676 [3] NCCL INFO P2P plugin v8 IBext_v8
slurm-gb200-217-047:237292:237676 [3] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
slurm-gb200-217-047:237290:237677 [1] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v9 symbol.
slurm-gb200-217-047:237290:237677 [1] NCCL INFO NET/Plugin: Loaded net plugin NCCL RDMA Plugin v8 (v8)
slurm-gb200-217-047:237290:237677 [1] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v9 symbol.
slurm-gb200-217-047:237290:237677 [1] NCCL INFO NET/Plugin: Loaded collnet plugin SHARP (v8)
slurm-gb200-217-047:237290:237677 [1] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
slurm-gb200-217-047:237290:237677 [1] NCCL INFO P2P plugin v8 IBext_v8
slurm-gb200-217-047:237290:237677 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
slurm-gb200-217-027:234738:235129 [0] NCCL INFO DMA-BUF is available on GPU device 0
slurm-gb200-217-027:234738:235129 [0] NCCL INFO ncclCommInitRank comm 0xc4b2133f8d20 rank 0 nranks 8 cudaDev 0 nvmlDev 0 busId 801000 commId 0xee13c4f1e7e030dc - Init START
slurm-gb200-217-047:237289:237675 [0] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 1.
slurm-gb200-217-047:237289:237675 [0] NCCL INFO NET/IB : Using [0]ibp0:1/IB/SHARP [1]ibp1:1/IB/SHARP [2]ibp2:1/IB/SHARP [3]ibp3:1/IB/SHARP [RO]; OOB eth0:10.0.5.220<0>
slurm-gb200-217-047:237292:237676 [3] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 1.
slurm-gb200-217-047:237292:237676 [3] NCCL INFO NET/IB : Using [0]ibp0:1/IB/SHARP [1]ibp1:1/IB/SHARP [2]ibp2:1/IB/SHARP [3]ibp3:1/IB/SHARP [RO]; OOB eth0:10.0.5.220<0>
slurm-gb200-217-047:237290:237677 [1] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 1.
slurm-gb200-217-047:237290:237677 [1] NCCL INFO NET/IB : Using [0]ibp0:1/IB/SHARP [1]ibp1:1/IB/SHARP [2]ibp2:1/IB/SHARP [3]ibp3:1/IB/SHARP [RO]; OOB eth0:10.0.5.220<0>
slurm-gb200-217-047:237289:237675 [0] NCCL INFO PROFILER/Plugin: Could not find: libnccl-profiler.so.
slurm-gb200-217-047:237289:237675 [0] NCCL INFO Using network IBext_v8
slurm-gb200-217-047:237292:237676 [3] NCCL INFO PROFILER/Plugin: Could not find: libnccl-profiler.so.
slurm-gb200-217-047:237292:237676 [3] NCCL INFO Using network IBext_v8
slurm-gb200-217-047:237290:237677 [1] NCCL INFO PROFILER/Plugin: Could not find: libnccl-profiler.so.
slurm-gb200-217-047:237290:237677 [1] NCCL INFO Using network IBext_v8
slurm-gb200-217-047:237291:237678 [2] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v9 symbol.
slurm-gb200-217-047:237291:237678 [2] NCCL INFO NET/Plugin: Loaded net plugin NCCL RDMA Plugin v8 (v8)
slurm-gb200-217-047:237291:237678 [2] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v9 symbol.
slurm-gb200-217-047:237291:237678 [2] NCCL INFO NET/Plugin: Loaded collnet plugin SHARP (v8)
slurm-gb200-217-047:237291:237678 [2] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
slurm-gb200-217-047:237291:237678 [2] NCCL INFO P2P plugin v8 IBext_v8
slurm-gb200-217-047:237291:237678 [2] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0
slurm-gb200-217-047:237291:237678 [2] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 1.
slurm-gb200-217-047:237291:237678 [2] NCCL INFO NET/IB : Using [0]ibp0:1/IB/SHARP [1]ibp1:1/IB/SHARP [2]ibp2:1/IB/SHARP [3]ibp3:1/IB/SHARP [RO]; OOB eth0:10.0.5.220<0>
slurm-gb200-217-047:237291:237678 [2] NCCL INFO PROFILER/Plugin: Could not find: libnccl-profiler.so.
slurm-gb200-217-047:237291:237678 [2] NCCL INFO Using network IBext_v8
slurm-gb200-217-027:234741:235131 [3] NCCL INFO DMA-BUF is available on GPU device 3
slurm-gb200-217-027:234740:235130 [2] NCCL INFO DMA-BUF is available on GPU device 2
slurm-gb200-217-027:234741:235131 [3] NCCL INFO ncclCommInitRank comm 0xb4ab52367d50 rank 3 nranks 8 cudaDev 3 nvmlDev 3 busId 1901000 commId 0xee13c4f1e7e030dc - Init START
slurm-gb200-217-027:234739:235132 [1] NCCL INFO DMA-BUF is available on GPU device 1
slurm-gb200-217-027:234740:235130 [2] NCCL INFO ncclCommInitRank comm 0xabb25a0df930 rank 2 nranks 8 cudaDev 2 nvmlDev 2 busId 1801000 commId 0xee13c4f1e7e030dc - Init START
slurm-gb200-217-027:234739:235132 [1] NCCL INFO ncclCommInitRank comm 0xbf66c0b18250 rank 1 nranks 8 cudaDev 1 nvmlDev 1 busId 901000 commId 0xee13c4f1e7e030dc - Init START
slurm-gb200-217-027:234739:235132 [1] NCCL INFO RAS client listening socket at ::1<28028>
slurm-gb200-217-027:234740:235130 [2] NCCL INFO RAS client listening socket at ::1<28028>
slurm-gb200-217-047:237290:237677 [1] NCCL INFO DMA-BUF is available on GPU device 1
slurm-gb200-217-047:237289:237675 [0] NCCL INFO DMA-BUF is available on GPU device 0
slurm-gb200-217-047:237290:237677 [1] NCCL INFO ncclCommInitRank comm 0xb82cf5a83890 rank 5 nranks 8 cudaDev 1 nvmlDev 1 busId 901000 commId 0xee13c4f1e7e030dc - Init START
slurm-gb200-217-047:237289:237675 [0] NCCL INFO ncclCommInitRank comm 0xbdcbb5305fc0 rank 4 nranks 8 cudaDev 0 nvmlDev 0 busId 801000 commId 0xee13c4f1e7e030dc - Init START
slurm-gb200-217-027:234741:235131 [3] NCCL INFO RAS client listening socket at ::1<28028>
slurm-gb200-217-047:237289:237675 [0] NCCL INFO RAS client listening socket at ::1<28028>
slurm-gb200-217-047:237292:237676 [3] NCCL INFO DMA-BUF is available on GPU device 3
slurm-gb200-217-047:237291:237678 [2] NCCL INFO DMA-BUF is available on GPU device 2
slurm-gb200-217-047:237292:237676 [3] NCCL INFO ncclCommInitRank comm 0xbcbd5fdbded0 rank 7 nranks 8 cudaDev 3 nvmlDev 3 busId 1901000 commId 0xee13c4f1e7e030dc - Init START
slurm-gb200-217-047:237291:237678 [2] NCCL INFO ncclCommInitRank comm 0xc73c5bd41e40 rank 6 nranks 8 cudaDev 2 nvmlDev 2 busId 1801000 commId 0xee13c4f1e7e030dc - Init START
slurm-gb200-217-027:234738:235129 [0] NCCL INFO RAS client listening socket at ::1<28028>
slurm-gb200-217-047:237290:237677 [1] NCCL INFO RAS client listening socket at ::1<28028>
slurm-gb200-217-047:237291:237678 [2] NCCL INFO RAS client listening socket at ::1<28028>
slurm-gb200-217-047:237292:237676 [3] NCCL INFO RAS client listening socket at ::1<28028>
slurm-gb200-217-047:237291:237678 [2] NCCL INFO Bootstrap timings total 0.001874 (create 0.000049, send 0.000236, recv 0.000857, ring 0.000277, delay 0.000001)
slurm-gb200-217-047:237292:237676 [3] NCCL INFO Bootstrap timings total 0.001987 (create 0.000058, send 0.000308, recv 0.000486, ring 0.000298, delay 0.000001)
slurm-gb200-217-027:234738:235129 [0] NCCL INFO Bootstrap timings total 0.150981 (create 0.000092, send 0.000167, recv 0.099449, ring 0.000395, delay 0.000001)
slurm-gb200-217-027:234739:235132 [1] NCCL INFO Bootstrap timings total 0.051531 (create 0.000051, send 0.000094, recv 0.000164, ring 0.050388, delay 0.000001)
slurm-gb200-217-027:234740:235130 [2] NCCL INFO Bootstrap timings total 0.051642 (create 0.000049, send 0.000100, recv 0.000104, ring 0.050388, delay 0.000001)
slurm-gb200-217-027:234741:235131 [3] NCCL INFO Bootstrap timings total 0.052601 (create 0.000062, send 0.000132, recv 0.037772, ring 0.013873, delay 0.000001)
slurm-gb200-217-047:237290:237677 [1] NCCL INFO Bootstrap timings total 0.018151 (create 0.000077, send 0.000318, recv 0.016776, ring 0.000449, delay 0.000001)
slurm-gb200-217-047:237289:237675 [0] NCCL INFO Bootstrap timings total 0.015634 (create 0.000060, send 0.000705, recv 0.000611, ring 0.013624, delay 0.000001)
slurm-gb200-217-047:237292:237676 [3] NCCL INFO MNNVL busId 0x1901000 fabric UUID 254db2329a6da67a.4641fa3fdb4d7484 cliqueId 0x7ffe state 3 healthMask 0xaa
slurm-gb200-217-047:237289:237675 [0] NCCL INFO MNNVL busId 0x801000 fabric UUID 254db2329a6da67a.4641fa3fdb4d7484 cliqueId 0x7ffe state 3 healthMask 0xaa
slurm-gb200-217-027:234741:235131 [3] NCCL INFO MNNVL busId 0x1901000 fabric UUID 254db2329a6da67a.4641fa3fdb4d7484 cliqueId 0x7ffe state 3 healthMask 0xaa
slurm-gb200-217-027:234740:235130 [2] NCCL INFO MNNVL busId 0x1801000 fabric UUID 254db2329a6da67a.4641fa3fdb4d7484 cliqueId 0x7ffe state 3 healthMask 0xaa
slurm-gb200-217-027:234738:235129 [0] NCCL INFO MNNVL busId 0x801000 fabric UUID 254db2329a6da67a.4641fa3fdb4d7484 cliqueId 0x7ffe state 3 healthMask 0xaa
slurm-gb200-217-047:237290:237677 [1] NCCL INFO MNNVL busId 0x901000 fabric UUID 254db2329a6da67a.4641fa3fdb4d7484 cliqueId 0x7ffe state 3 healthMask 0xaa
slurm-gb200-217-027:234739:235132 [1] NCCL INFO MNNVL busId 0x901000 fabric UUID 254db2329a6da67a.4641fa3fdb4d7484 cliqueId 0x7ffe state 3 healthMask 0xaa
slurm-gb200-217-047:237291:237678 [2] NCCL INFO MNNVL busId 0x1801000 fabric UUID 254db2329a6da67a.4641fa3fdb4d7484 cliqueId 0x7ffe state 3 healthMask 0xaa
slurm-gb200-217-047:237291:237678 [2] NCCL INFO MNNVL 1 cliqueId 7ffe cliqueSize 8 cliqueRank 6
slurm-gb200-217-047:237289:237675 [0] NCCL INFO MNNVL 1 cliqueId 7ffe cliqueSize 8 cliqueRank 4
slurm-gb200-217-047:237292:237676 [3] NCCL INFO MNNVL 1 cliqueId 7ffe cliqueSize 8 cliqueRank 7
slurm-gb200-217-047:237290:237677 [1] NCCL INFO MNNVL 1 cliqueId 7ffe cliqueSize 8 cliqueRank 5
slurm-gb200-217-027:234741:235131 [3] NCCL INFO MNNVL 1 cliqueId 7ffe cliqueSize 8 cliqueRank 3
slurm-gb200-217-027:234739:235132 [1] NCCL INFO MNNVL 1 cliqueId 7ffe cliqueSize 8 cliqueRank 1
slurm-gb200-217-027:234740:235130 [2] NCCL INFO MNNVL 1 cliqueId 7ffe cliqueSize 8 cliqueRank 2
slurm-gb200-217-027:234738:235129 [0] NCCL INFO MNNVL 1 cliqueId 7ffe cliqueSize 8 cliqueRank 0
slurm-gb200-217-047:237292:237676 [3] NCCL INFO Setting affinity for GPU 3 to ffff,ffffffff,ffffff00,00000000,00000000
slurm-gb200-217-047:237292:237676 [3] NCCL INFO NCCL_COLLNET_ENABLE set by environment to 0.
slurm-gb200-217-047:237292:237676 [3] NCCL INFO NCCL_NVLS_ENABLE set by environment to 0.
slurm-gb200-217-047:237291:237678 [2] NCCL INFO Setting affinity for GPU 2 to ffff,ffffffff,ffffff00,00000000,00000000
slurm-gb200-217-047:237291:237678 [2] NCCL INFO NCCL_COLLNET_ENABLE set by environment to 0.
slurm-gb200-217-047:237291:237678 [2] NCCL INFO NCCL_NVLS_ENABLE set by environment to 0.
slurm-gb200-217-027:234740:235130 [2] NCCL INFO Setting affinity for GPU 2 to ffff,ffffffff,ffffff00,00000000,00000000
slurm-gb200-217-027:234740:235130 [2] NCCL INFO NCCL_COLLNET_ENABLE set by environment to 0.
slurm-gb200-217-027:234740:235130 [2] NCCL INFO NCCL_NVLS_ENABLE set by environment to 0.
slurm-gb200-217-027:234741:235131 [3] NCCL INFO Setting affinity for GPU 3 to ffff,ffffffff,ffffff00,00000000,00000000
slurm-gb200-217-027:234741:235131 [3] NCCL INFO NCCL_COLLNET_ENABLE set by environment to 0.
slurm-gb200-217-047:237290:237677 [1] NCCL INFO Setting affinity for GPU 1 to ff,ffffffff,ffffffff
slurm-gb200-217-027:234741:235131 [3] NCCL INFO NCCL_NVLS_ENABLE set by environment to 0.
slurm-gb200-217-047:237290:237677 [1] NCCL INFO NCCL_COLLNET_ENABLE set by environment to 0.
slurm-gb200-217-047:237290:237677 [1] NCCL INFO NCCL_NVLS_ENABLE set by environment to 0.
slurm-gb200-217-027:234739:235132 [1] NCCL INFO Setting affinity for GPU 1 to ff,ffffffff,ffffffff
slurm-gb200-217-027:234738:235129 [0] NCCL INFO Setting affinity for GPU 0 to ff,ffffffff,ffffffff
slurm-gb200-217-027:234739:235132 [1] NCCL INFO NCCL_COLLNET_ENABLE set by environment to 0.
slurm-gb200-217-027:234739:235132 [1] NCCL INFO NCCL_NVLS_ENABLE set by environment to 0.
slurm-gb200-217-047:237289:237675 [0] NCCL INFO Setting affinity for GPU 0 to ff,ffffffff,ffffffff
slurm-gb200-217-027:234738:235129 [0] NCCL INFO NCCL_COLLNET_ENABLE set by environment to 0.
slurm-gb200-217-027:234738:235129 [0] NCCL INFO NCCL_NVLS_ENABLE set by environment to 0.
slurm-gb200-217-047:237289:237675 [0] NCCL INFO NCCL_COLLNET_ENABLE set by environment to 0.
slurm-gb200-217-047:237289:237675 [0] NCCL INFO NCCL_NVLS_ENABLE set by environment to 0.
slurm-gb200-217-047:237292:237676 [3] NCCL INFO comm 0xbcbd5fdbded0 rank 7 nRanks 8 nNodes 1 localRanks 8 localRank 7 MNNVL 1
slurm-gb200-217-027:234740:235130 [2] NCCL INFO comm 0xabb25a0df930 rank 2 nRanks 8 nNodes 1 localRanks 8 localRank 2 MNNVL 1
slurm-gb200-217-027:234741:235131 [3] NCCL INFO comm 0xb4ab52367d50 rank 3 nRanks 8 nNodes 1 localRanks 8 localRank 3 MNNVL 1
slurm-gb200-217-047:237291:237678 [2] NCCL INFO comm 0xc73c5bd41e40 rank 6 nRanks 8 nNodes 1 localRanks 8 localRank 6 MNNVL 1
slurm-gb200-217-047:237292:237676 [3] NCCL INFO Trees [0] -1/-1/-1->7->6 [1] -1/-1/-1->7->6 [2] -1/-1/-1->7->6 [3] -1/-1/-1->7->6 [4] -1/-1/-1->7->6 [5] -1/-1/-1->7->6 [6] -1/-1/-1->7->6 [7] -1/-1/-1->7->6 [8] -1/-1/-1->7->6 [9] -1/-1/-1->7->6 [10] -1/-1/-1->7->6 [11] -1/-1/-1->7->6 [12] -1/-1/-1->7->6 [13] -1/-1/-1->7->6 [14] -1/-1/-1->7->6 [15] -1/-1/-1->7->6 [16] -1/-1/-1->7->6 [17] -1/-1/-1->7->6 [18] -1/-1/-1->7->6 [19] -1/-1/-1->7->6 [20] -1/-1/-1->7->6 [21] -1/-1/-1->7->6 [22] -1/-1/-1->7->6 [23] -1/-1/-1->7->6 [24] -1/-1/-1->7->6 [25] -1/-1/-1->7->6 [26] -1/-1/-1->7->6 [27] -1/-1/-1->7->6 [28] -1/-1/-1->7->6 [29] -1/-1/-1->7->6 [30] -1/-1/-1->7->6 [31] -1/-1/-1->7->6
slurm-gb200-217-047:237292:237676 [3] NCCL INFO P2P Chunksize set to 524288
slurm-gb200-217-027:234738:235129 [0] NCCL INFO comm 0xc4b2133f8d20 rank 0 nRanks 8 nNodes 1 localRanks 8 localRank 0 MNNVL 1
slurm-gb200-217-027:234740:235130 [2] NCCL INFO Trees [0] 3/-1/-1->2->1 [1] 3/-1/-1->2->1 [2] 3/-1/-1->2->1 [3] 3/-1/-1->2->1 [4] 3/-1/-1->2->1 [5] 3/-1/-1->2->1 [6] 3/-1/-1->2->1 [7] 3/-1/-1->2->1 [8] 3/-1/-1->2->1 [9] 3/-1/-1->2->1 [10] 3/-1/-1->2->1 [11] 3/-1/-1->2->1 [12] 3/-1/-1->2->1 [13] 3/-1/-1->2->1 [14] 3/-1/-1->2->1 [15] 3/-1/-1->2->1 [16] 3/-1/-1->2->1 [17] 3/-1/-1->2->1 [18] 3/-1/-1->2->1 [19] 3/-1/-1->2->1 [20] 3/-1/-1->2->1 [21] 3/-1/-1->2->1 [22] 3/-1/-1->2->1 [23] 3/-1/-1->2->1 [24] 3/-1/-1->2->1 [25] 3/-1/-1->2->1 [26] 3/-1/-1->2->1 [27] 3/-1/-1->2->1 [28] 3/-1/-1->2->1 [29] 3/-1/-1->2->1 [30] 3/-1/-1->2->1 [31] 3/-1/-1->2->1
slurm-gb200-217-027:234740:235130 [2] NCCL INFO P2P Chunksize set to 524288
slurm-gb200-217-027:234741:235131 [3] NCCL INFO Trees [0] 4/-1/-1->3->2 [1] 4/-1/-1->3->2 [2] 4/-1/-1->3->2 [3] 4/-1/-1->3->2 [4] 4/-1/-1->3->2 [5] 4/-1/-1->3->2 [6] 4/-1/-1->3->2 [7] 4/-1/-1->3->2 [8] 4/-1/-1->3->2 [9] 4/-1/-1->3->2 [10] 4/-1/-1->3->2 [11] 4/-1/-1->3->2 [12] 4/-1/-1->3->2 [13] 4/-1/-1->3->2 [14] 4/-1/-1->3->2 [15] 4/-1/-1->3->2 [16] 4/-1/-1->3->2 [17] 4/-1/-1->3->2 [18] 4/-1/-1->3->2 [19] 4/-1/-1->3->2 [20] 4/-1/-1->3->2 [21] 4/-1/-1->3->2 [22] 4/-1/-1->3->2 [23] 4/-1/-1->3->2 [24] 4/-1/-1->3->2 [25] 4/-1/-1->3->2 [26] 4/-1/-1->3->2 [27] 4/-1/-1->3->2 [28] 4/-1/-1->3->2 [29] 4/-1/-1->3->2 [30] 4/-1/-1->3->2 [31] 4/-1/-1->3->2
slurm-gb200-217-027:234741:235131 [3] NCCL INFO P2P Chunksize set to 524288
slurm-gb200-217-027:234738:235129 [0] NCCL INFO Channel 00/32 : 0 1 2 3 4 5 6 7
slurm-gb200-217-027:234738:235129 [0] NCCL INFO Channel 01/32 : 0 1 2 3 4 5 6 7
slurm-gb200-217-047:237291:237678 [2] NCCL INFO Trees [0] 7/-1/-1->6->5 [1] 7/-1/-1->6->5 [2] 7/-1/-1->6->5 [3] 7/-1/-1->6->5 [4] 7/-1/-1->6->5 [5] 7/-1/-1->6->5 [6] 7/-1/-1->6->5 [7] 7/-1/-1->6->5 [8] 7/-1/-1->6->5 [9] 7/-1/-1->6->5 [10] 7/-1/-1->6->5 [11] 7/-1/-1->6->5 [12] 7/-1/-1->6->5 [13] 7/-1/-1->6->5 [14] 7/-1/-1->6->5 [15] 7/-1/-1->6->5 [16] 7/-1/-1->6->5 [17] 7/-1/-1->6->5 [18] 7/-1/-1->6->5 [19] 7/-1/-1->6->5 [20] 7/-1/-1->6->5 [21] 7/-1/-1->6->5 [22] 7/-1/-1->6->5 [23] 7/-1/-1->6->5 [24] 7/-1/-1->6->5 [25] 7/-1/-1->6->5 [26] 7/-1/-1->6->5 [27] 7/-1/-1->6->5 [28] 7/-1/-1->6->5 [29] 7/-1/-1->6->5 [30] 7/-1/-1->6->5 [31] 7/-1/-1->6->5
slurm-gb200-217-047:237291:237678 [2] NCCL INFO P2P Chunksize set to 524288
slurm-gb200-217-027:234739:235132 [1] NCCL INFO comm 0xbf66c0b18250 rank 1 nRanks 8 nNodes 1 localRanks 8 localRank 1 MNNVL 1
slurm-gb200-217-047:237290:237677 [1] NCCL INFO comm 0xb82cf5a83890 rank 5 nRanks 8 nNodes 1 localRanks 8 localRank 5 MNNVL 1
slurm-gb200-217-027:234738:235129 [0] NCCL INFO Channel 02/32 : 0 1 2 3 4 5 6 7
slurm-gb200-217-027:234738:235129 [0] NCCL INFO Channel 03/32 : 0 1 2 3 4 5 6 7
slurm-gb200-217-027:234738:235129 [0] NCCL INFO Channel 04/32 : 0 1 2 3 4 5 6 7
slurm-gb200-217-027:234738:235129 [0] NCCL INFO Channel 05/32 : 0 1 2 3 4 5 6 7
slurm-gb200-217-027:234738:235129 [0] NCCL INFO Channel 06/32 : 0 1 2 3 4 5 6 7
slurm-gb200-217-027:234738:235129 [0] NCCL INFO Channel 07/32 : 0 1 2 3 4 5 6 7
slurm-gb200-217-047:237290:237677 [1] NCCL INFO Trees [0] 6/-1/-1->5->4 [1] 6/-1/-1->5->4 [2] 6/-1/-1->5->4 [3] 6/-1/-1->5->4 [4] 6/-1/-1->5->4 [5] 6/-1/-1->5->4 [6] 6/-1/-1->5->4 [7] 6/-1/-1->5->4 [8] 6/-1/-1->5->4 [9] 6/-1/-1->5->4 [10] 6/-1/-1->5->4 [11] 6/-1/-1->5->4 [12] 6/-1/-1->5->4 [13] 6/-1/-1->5->4 [14] 6/-1/-1->5->4 [15] 6/-1/-1->5->4 [16] 6/-1/-1->5->4 [17] 6/-1/-1->5->4 [18] 6/-1/-1->5->4 [19] 6/-1/-1->5->4 [20] 6/-1/-1->5->4 [21] 6/-1/-1->5->4 [22] 6/-1/-1->5->4 [23] 6/-1/-1->5->4 [24] 6/-1/-1->5->4 [25] 6/-1/-1->5->4 [26] 6/-1/-1->5->4 [27] 6/-1/-1->5->4 [28] 6/-1/-1->5->4 [29] 6/-1/-1->5->4 [30] 6/-1/-1->5->4 [31] 6/-1/-1->5->4
slurm-gb200-217-047:237290:237677 [1] NCCL INFO P2P Chunksize set to 524288
slurm-gb200-217-027:234738:235129 [0] NCCL INFO Channel 08/32 : 0 1 2 3 4 5 6 7
slurm-gb200-217-027:234738:235129 [0] NCCL INFO Channel 09/32 : 0 1 2 3 4 5 6 7
slurm-gb200-217-027:234738:235129 [0] NCCL INFO Channel 10/32 : 0 1 2 3 4 5 6 7
slurm-gb200-217-027:234738:235129 [0] NCCL INFO Channel 11/32 : 0 1 2 3 4 5 6 7
slurm-gb200-217-027:234738:235129 [0] NCCL INFO Channel 12/32 : 0 1 2 3 4 5 6 7
slurm-gb200-217-027:234738:235129 [0] NCCL INFO Channel 13/32 : 0 1 2 3 4 5 6 7
slurm-gb200-217-047:237289:237675 [0] NCCL INFO comm 0xbdcbb5305fc0 rank 4 nRanks 8 nNodes 1 localRanks 8 localRank 4 MNNVL 1
slurm-gb200-217-027:234739:235132 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0 [2] 2/-1/-1->1->0 [3] 2/-1/-1->1->0 [4] 2/-1/-1->1->0 [5] 2/-1/-1->1->0 [6] 2/-1/-1->1->0 [7] 2/-1/-1->1->0 [8] 2/-1/-1->1->0 [9] 2/-1/-1->1->0 [10] 2/-1/-1->1->0 [11] 2/-1/-1->1->0 [12] 2/-1/-1->1->0 [13] 2/-1/-1->1->0 [14] 2/-1/-1->1->0 [15] 2/-1/-1->1->0 [16] 2/-1/-1->1->0 [17] 2/-1/-1->1->0 [18] 2/-1/-1->1->0 [19] 2/-1/-1->1->0 [20] 2/-1/-1->1->0 [21] 2/-1/-1->1->0 [22] 2/-1/-1->1->0 [23] 2/-1/-1->1->0 [24] 2/-1/-1->1->0 [25] 2/-1/-1->1->0 [26] 2/-1/-1->1->0 [27] 2/-1/-1->1->0 [28] 2/-1/-1->1->0 [29] 2/-1/-1->1->0 [30] 2/-1/-1->1->0 [31] 2/-1/-1->1->0
slurm-gb200-217-027:234739:235132 [1] NCCL INFO P2P Chunksize set to 524288
slurm-gb200-217-047:237289:237675 [0] NCCL INFO Trees [0] 5/-1/-1->4->3 [1] 5/-1/-1->4->3 [2] 5/-1/-1->4->3 [3] 5/-1/-1->4->3 [4] 5/-1/-1->4->3 [5] 5/-1/-1->4->3 [6] 5/-1/-1->4->3 [7] 5/-1/-1->4->3 [8] 5/-1/-1->4->3 [9] 5/-1/-1->4->3 [10] 5/-1/-1->4->3 [11] 5/-1/-1->4->3 [12] 5/-1/-1->4->3 [13] 5/-1/-1->4->3 [14] 5/-1/-1->4->3 [15] 5/-1/-1->4->3 [16] 5/-1/-1->4->3 [17] 5/-1/-1->4->3 [18] 5/-1/-1->4->3 [19] 5/-1/-1->4->3 [20] 5/-1/-1->4->3 [21] 5/-1/-1->4->3 [22] 5/-1/-1->4->3 [23] 5/-1/-1->4->3 [24] 5/-1/-1->4->3 [25] 5/-1/-1->4->3 [26] 5/-1/-1->4->3 [27] 5/-1/-1->4->3 [28] 5/-1/-1->4->3 [29] 5/-1/-1->4->3 [30] 5/-1/-1->4->3 [31] 5/-1/-1->4->3
slurm-gb200-217-047:237289:237675 [0] NCCL INFO P2P Chunksize set to 524288
slurm-gb200-217-027:234738:235129 [0] NCCL INFO Channel 14/32 : 0 1 2 3 4 5 6 7
slurm-gb200-217-027:234738:235129 [0] NCCL INFO Channel 15/32 : 0 1 2 3 4 5 6 7
slurm-gb200-217-027:234738:235129 [0] NCCL INFO Channel 16/32 : 0 1 2 3 4 5 6 7
slurm-gb200-217-027:234738:235129 [0] NCCL INFO Channel 17/32 : 0 1 2 3 4 5 6 7
slurm-gb200-217-027:234738:235129 [0] NCCL INFO Channel 18/32 : 0 1 2 3 4 5 6 7
slurm-gb200-217-027:234738:235129 [0] NCCL INFO Channel 19/32 : 0 1 2 3 4 5 6 7
slurm-gb200-217-027:234738:235129 [0] NCCL INFO Channel 20/32 : 0 1 2 3 4 5 6 7
slurm-gb200-217-027:234738:235129 [0] NCCL INFO Channel 21/32 : 0 1 2 3 4 5 6 7
slurm-gb200-217-027:234738:235129 [0] NCCL INFO Channel 22/32 : 0 1 2 3 4 5 6 7
slurm-gb200-217-047:237291:237700 [2] NCCL INFO [Proxy Service] Device 2 CPU core 140
slurm-gb200-217-027:234738:235129 [0] NCCL INFO Channel 23/32 : 0 1 2 3 4 5 6 7
slurm-gb200-217-027:234738:235129 [0] NCCL INFO Channel 24/32 : 0 1 2 3 4 5 6 7
slurm-gb200-217-027:234738:235129 [0] NCCL INFO Channel 25/32 : 0 1 2 3 4 5 6 7
slurm-gb200-217-027:234738:235129 [0] NCCL INFO Channel 26/32 : 0 1 2 3 4 5 6 7
slurm-gb200-217-027:234738:235129 [0] NCCL INFO Channel 27/32 : 0 1 2 3 4 5 6 7
slurm-gb200-217-027:234738:235129 [0] NCCL INFO Channel 28/32 : 0 1 2 3 4 5 6 7
slurm-gb200-217-047:237292:237699 [3] NCCL INFO [Proxy Service] Device 3 CPU core 79
slurm-gb200-217-027:234738:235129 [0] NCCL INFO Channel 29/32 : 0 1 2 3 4 5 6 7
slurm-gb200-217-027:234738:235129 [0] NCCL INFO Channel 30/32 : 0 1 2 3 4 5 6 7
slurm-gb200-217-027:234738:235129 [0] NCCL INFO Channel 31/32 : 0 1 2 3 4 5 6 7
slurm-gb200-217-047:237291:237702 [2] NCCL INFO [Proxy Service UDS] Device 2 CPU core 73
slurm-gb200-217-027:234738:235129 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1 [4] 1/-1/-1->0->-1 [5] 1/-1/-1->0->-1 [6] 1/-1/-1->0->-1 [7] 1/-1/-1->0->-1 [8] 1/-1/-1->0->-1 [9] 1/-1/-1->0->-1 [10] 1/-1/-1->0->-1 [11] 1/-1/-1->0->-1 [12] 1/-1/-1->0->-1 [13] 1/-1/-1->0->-1 [14] 1/-1/-1->0->-1 [15] 1/-1/-1->0->-1 [16] 1/-1/-1->0->-1 [17] 1/-1/-1->0->-1 [18] 1/-1/-1->0->-1 [19] 1/-1/-1->0->-1 [20] 1/-1/-1->0->-1 [21] 1/-1/-1->0->-1 [22] 1/-1/-1->0->-1 [23] 1/-1/-1->0->-1 [24] 1/-1/-1->0->-1 [25] 1/-1/-1->0->-1 [26] 1/-1/-1->0->-1 [27] 1/-1/-1->0->-1 [28] 1/-1/-1->0->-1 [29] 1/-1/-1->0->-1 [30] 1/-1/-1->0->-1 [31] 1/-1/-1->0->-1
slurm-gb200-217-027:234738:235129 [0] NCCL INFO P2P Chunksize set to 524288
slurm-gb200-217-047:237292:237701 [3] NCCL INFO [Proxy Service UDS] Device 3 CPU core 80
slurm-gb200-217-027:234738:235129 [0] NCCL INFO Check P2P Type intraNodeP2pSupport 1 directMode 0
slurm-gb200-217-027:234740:235153 [2] NCCL INFO [Proxy Service] Device 2 CPU core 112
slurm-gb200-217-027:234741:235154 [3] NCCL INFO [Proxy Service] Device 3 CPU core 126
slurm-gb200-217-027:234741:235155 [3] NCCL INFO [Proxy Service UDS] Device 3 CPU core 129
slurm-gb200-217-027:234740:235156 [2] NCCL INFO [Proxy Service UDS] Device 2 CPU core 114
slurm-gb200-217-047:237289:237704 [0] NCCL INFO [Proxy Service] Device 0 CPU core 4
slurm-gb200-217-047:237290:237706 [1] NCCL INFO [Proxy Service UDS] Device 1 CPU core 60
slurm-gb200-217-027:234739:235157 [1] NCCL INFO [Proxy Service] Device 1 CPU core 8
slurm-gb200-217-047:237290:237703 [1] NCCL INFO [Proxy Service] Device 1 CPU core 2
slurm-gb200-217-027:234738:235158 [0] NCCL INFO [Proxy Service] Device 0 CPU core 8
slurm-gb200-217-047:237289:237705 [0] NCCL INFO [Proxy Service UDS] Device 0 CPU core 6
slurm-gb200-217-027:234739:235159 [1] NCCL INFO [Proxy Service UDS] Device 1 CPU core 45
slurm-gb200-217-027:234738:235160 [0] NCCL INFO [Proxy Service UDS] Device 0 CPU core 45
slurm-gb200-217-027:234741:235131 [3] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
slurm-gb200-217-027:234741:235131 [3] NCCL INFO 32 coll channels, 32 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer
slurm-gb200-217-047:237291:237678 [2] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
slurm-gb200-217-047:237291:237678 [2] NCCL INFO 32 coll channels, 32 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer
slurm-gb200-217-027:234739:235132 [1] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
slurm-gb200-217-027:234739:235132 [1] NCCL INFO 32 coll channels, 32 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer
slurm-gb200-217-047:237292:237676 [3] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
slurm-gb200-217-047:237292:237676 [3] NCCL INFO 32 coll channels, 32 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer
slurm-gb200-217-027:234740:235130 [2] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
slurm-gb200-217-027:234740:235130 [2] NCCL INFO 32 coll channels, 32 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer
slurm-gb200-217-047:237290:237677 [1] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
slurm-gb200-217-047:237290:237677 [1] NCCL INFO 32 coll channels, 32 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer
slurm-gb200-217-047:237289:237675 [0] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
slurm-gb200-217-047:237289:237675 [0] NCCL INFO 32 coll channels, 32 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer
slurm-gb200-217-027:234738:235129 [0] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
slurm-gb200-217-027:234738:235129 [0] NCCL INFO 32 coll channels, 32 collnet channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer
slurm-gb200-217-027:234738:235129 [0] NCCL INFO CC Off, workFifoBytes 1048576
slurm-gb200-217-027:234740:235130 [2] NCCL INFO TUNER/Plugin: Failed to find ncclTunerPlugin_v4 symbol.
slurm-gb200-217-027:234740:235130 [2] NCCL INFO TUNER/Plugin: Failed to find ncclTunerPlugin_v3 symbol.
slurm-gb200-217-027:234740:235130 [2] NCCL INFO TUNER/Plugin: Failed to find ncclTunerPlugin_v2 symbol, using internal tuner instead.
slurm-gb200-217-027:234740:235130 [2] NCCL INFO ncclCommInitRank comm 0xabb25a0df930 rank 2 nranks 8 cudaDev 2 nvmlDev 2 busId 1801000 commId 0xee13c4f1e7e030dc - Init COMPLETE
slurm-gb200-217-027:234740:235130 [2] NCCL INFO Init timings - ncclCommInitRank: rank 2 nranks 8 total 0.94 (kernels 0.09, alloc 0.16, bootstrap 0.05, allgathers 0.00, topo 0.52, graphs 0.01, connections 0.09, rest 0.01)
slurm-gb200-217-027:234741:235131 [3] NCCL INFO TUNER/Plugin: Failed to find ncclTunerPlugin_v4 symbol.
slurm-gb200-217-027:234741:235131 [3] NCCL INFO TUNER/Plugin: Failed to find ncclTunerPlugin_v3 symbol.
slurm-gb200-217-027:234741:235131 [3] NCCL INFO TUNER/Plugin: Failed to find ncclTunerPlugin_v2 symbol, using internal tuner instead.
slurm-gb200-217-027:234741:235131 [3] NCCL INFO ncclCommInitRank comm 0xb4ab52367d50 rank 3 nranks 8 cudaDev 3 nvmlDev 3 busId 1901000 commId 0xee13c4f1e7e030dc - Init COMPLETE
slurm-gb200-217-027:234741:235131 [3] NCCL INFO Init timings - ncclCommInitRank: rank 3 nranks 8 total 0.93 (kernels 0.09, alloc 0.16, bootstrap 0.05, allgathers 0.00, topo 0.52, graphs 0.01, connections 0.08, rest 0.01)
slurm-gb200-217-047:237291:237678 [2] NCCL INFO TUNER/Plugin: Failed to find ncclTunerPlugin_v4 symbol.
slurm-gb200-217-047:237291:237678 [2] NCCL INFO TUNER/Plugin: Failed to find ncclTunerPlugin_v3 symbol.
slurm-gb200-217-047:237292:237676 [3] NCCL INFO TUNER/Plugin: Failed to find ncclTunerPlugin_v4 symbol.
slurm-gb200-217-047:237292:237676 [3] NCCL INFO TUNER/Plugin: Failed to find ncclTunerPlugin_v3 symbol.
slurm-gb200-217-047:237292:237676 [3] NCCL INFO TUNER/Plugin: Failed to find ncclTunerPlugin_v2 symbol, using internal tuner instead.
slurm-gb200-217-047:237291:237678 [2] NCCL INFO TUNER/Plugin: Failed to find ncclTunerPlugin_v2 symbol, using internal tuner instead.
slurm-gb200-217-047:237291:237678 [2] NCCL INFO ncclCommInitRank comm 0xc73c5bd41e40 rank 6 nranks 8 cudaDev 2 nvmlDev 2 busId 1801000 commId 0xee13c4f1e7e030dc - Init COMPLETE
slurm-gb200-217-047:237291:237678 [2] NCCL INFO Init timings - ncclCommInitRank: rank 6 nranks 8 total 0.88 (kernels 0.10, alloc 0.14, bootstrap 0.00, allgathers 0.00, topo 0.52, graphs 0.01, connections 0.09, rest 0.01)
slurm-gb200-217-027:234739:235132 [1] NCCL INFO TUNER/Plugin: Failed to find ncclTunerPlugin_v4 symbol.
slurm-gb200-217-027:234739:235132 [1] NCCL INFO TUNER/Plugin: Failed to find ncclTunerPlugin_v3 symbol.
slurm-gb200-217-027:234739:235132 [1] NCCL INFO TUNER/Plugin: Failed to find ncclTunerPlugin_v2 symbol, using internal tuner instead.
slurm-gb200-217-047:237292:237676 [3] NCCL INFO ncclCommInitRank comm 0xbcbd5fdbded0 rank 7 nranks 8 cudaDev 3 nvmlDev 3 busId 1901000 commId 0xee13c4f1e7e030dc - Init COMPLETE
slurm-gb200-217-047:237292:237676 [3] NCCL INFO Init timings - ncclCommInitRank: rank 7 nranks 8 total 0.88 (kernels 0.09, alloc 0.16, bootstrap 0.00, allgathers 0.00, topo 0.52, graphs 0.01, connections 0.09, rest 0.01)
slurm-gb200-217-027:234739:235132 [1] NCCL INFO ncclCommInitRank comm 0xbf66c0b18250 rank 1 nranks 8 cudaDev 1 nvmlDev 1 busId 901000 commId 0xee13c4f1e7e030dc - Init COMPLETE
slurm-gb200-217-027:234739:235132 [1] NCCL INFO Init timings - ncclCommInitRank: rank 1 nranks 8 total 0.93 (kernels 0.09, alloc 0.15, bootstrap 0.05, allgathers 0.00, topo 0.52, graphs 0.01, connections 0.09, rest 0.01)
slurm-gb200-217-047:237289:237675 [0] NCCL INFO TUNER/Plugin: Failed to find ncclTunerPlugin_v4 symbol.
slurm-gb200-217-027:234738:235129 [0] NCCL INFO TUNER/Plugin: Failed to find ncclTunerPlugin_v4 symbol.
slurm-gb200-217-027:234738:235129 [0] NCCL INFO TUNER/Plugin: Failed to find ncclTunerPlugin_v3 symbol.
slurm-gb200-217-027:234738:235129 [0] NCCL INFO TUNER/Plugin: Failed to find ncclTunerPlugin_v2 symbol, using internal tuner instead.
slurm-gb200-217-047:237290:237677 [1] NCCL INFO TUNER/Plugin: Failed to find ncclTunerPlugin_v4 symbol.
slurm-gb200-217-047:237290:237677 [1] NCCL INFO TUNER/Plugin: Failed to find ncclTunerPlugin_v3 symbol.
slurm-gb200-217-027:234738:235129 [0] NCCL INFO ncclCommInitRank comm 0xc4b2133f8d20 rank 0 nranks 8 cudaDev 0 nvmlDev 0 busId 801000 commId 0xee13c4f1e7e030dc - Init COMPLETE
slurm-gb200-217-047:237289:237675 [0] NCCL INFO TUNER/Plugin: Failed to find ncclTunerPlugin_v3 symbol.
slurm-gb200-217-047:237289:237675 [0] NCCL INFO TUNER/Plugin: Failed to find ncclTunerPlugin_v2 symbol, using internal tuner instead.
slurm-gb200-217-047:237289:237675 [0] NCCL INFO ncclCommInitRank comm 0xbdcbb5305fc0 rank 4 nranks 8 cudaDev 0 nvmlDev 0 busId 801000 commId 0xee13c4f1e7e030dc - Init COMPLETE
slurm-gb200-217-027:234738:235129 [0] NCCL INFO Init timings - ncclCommInitRank: rank 0 nranks 8 total 0.97 (kernels 0.09, alloc 0.10, bootstrap 0.15, allgathers 0.00, topo 0.52, graphs 0.01, connections 0.09, rest 0.00)
slurm-gb200-217-047:237290:237677 [1] NCCL INFO TUNER/Plugin: Failed to find ncclTunerPlugin_v2 symbol, using internal tuner instead.
slurm-gb200-217-047:237290:237677 [1] NCCL INFO ncclCommInitRank comm 0xb82cf5a83890 rank 5 nranks 8 cudaDev 1 nvmlDev 1 busId 901000 commId 0xee13c4f1e7e030dc - Init COMPLETE
slurm-gb200-217-047:237290:237677 [1] NCCL INFO Init timings - ncclCommInitRank: rank 5 nranks 8 total 0.88 (kernels 0.09, alloc 0.14, bootstrap 0.02, allgathers 0.00, topo 0.52, graphs 0.01, connections 0.09, rest 0.01)
slurm-gb200-217-047:237289:237675 [0] NCCL INFO Init timings - ncclCommInitRank: rank 4 nranks 8 total 0.89 (kernels 0.09, alloc 0.15, bootstrap 0.02, allgathers 0.00, topo 0.52, graphs 0.01, connections 0.09, rest 0.00)
#
#                                                              out-of-place                       in-place
#       size         count      type   redop    root     time   algbw   busbw #wrong     time   algbw   busbw #wrong
#        (B)    (elements)                               (us)  (GB/s)  (GB/s)            (us)  (GB/s)  (GB/s)
slurm-gb200-217-027:234741:235161 [3] NCCL INFO Channel 00/0 : 3[3] -> 4[0] via P2P/MNNVL
slurm-gb200-217-027:234740:235162 [2] NCCL INFO Channel 00/0 : 2[2] -> 3[3] via P2P/MNNVL
slurm-gb200-217-027:234740:235162 [2] NCCL INFO Channel 01/0 : 2[2] -> 3[3] via P2P/MNNVL
slurm-gb200-217-027:234741:235161 [3] NCCL INFO Channel 01/0 : 3[3] -> 4[0] via P2P/MNNVL
slurm-gb200-217-027:234741:235161 [3] NCCL INFO Channel 02/0 : 3[3] -> 4[0] via P2P/MNNVL
slurm-gb200-217-027:234740:235162 [2] NCCL INFO Channel 02/0 : 2[2] -> 3[3] via P2P/MNNVL
slurm-gb200-217-027:234741:235161 [3] NCCL INFO Channel 03/0 : 3[3] -> 4[0] via P2P/MNNVL
slurm-gb200-217-027:234740:235162 [2] NCCL INFO Channel 03/0 : 2[2] -> 3[3] via P2P/MNNVL
slurm-gb200-217-027:234741:235161 [3] NCCL INFO Channel 04/0 : 3[3] -> 4[0] via P2P/MNNVL
slurm-gb200-217-027:234740:235162 [2] NCCL INFO Channel 04/0 : 2[2] -> 3[3] via P2P/MNNVL
slurm-gb200-217-027:234741:235161 [3] NCCL INFO Channel 05/0 : 3[3] -> 4[0] via P2P/MNNVL
slurm-gb200-217-027:234740:235162 [2] NCCL INFO Channel 05/0 : 2[2] -> 3[3] via P2P/MNNVL
slurm-gb200-217-027:234741:235161 [3] NCCL INFO Channel 06/0 : 3[3] -> 4[0] via P2P/MNNVL
slurm-gb200-217-027:234740:235162 [2] NCCL INFO Channel 06/0 : 2[2] -> 3[3] via P2P/MNNVL
slurm-gb200-217-027:234741:235161 [3] NCCL INFO Channel 07/0 : 3[3] -> 4[0] via P2P/MNNVL
slurm-gb200-217-027:234740:235162 [2] NCCL INFO Channel 07/0 : 2[2] -> 3[3] via P2P/MNNVL
slurm-gb200-217-027:234740:235162 [2] NCCL INFO Channel 08/0 : 2[2] -> 3[3] via P2P/MNNVL
slurm-gb200-217-027:234741:235161 [3] NCCL INFO Channel 08/0 : 3[3] -> 4[0] via P2P/MNNVL
slurm-gb200-217-027:234741:235161 [3] NCCL INFO Channel 09/0 : 3[3] -> 4[0] via P2P/MNNVL
slurm-gb200-217-027:234740:235162 [2] NCCL INFO Channel 09/0 : 2[2] -> 3[3] via P2P/MNNVL
slurm-gb200-217-027:234741:235161 [3] NCCL INFO Channel 10/0 : 3[3] -> 4[0] via P2P/MNNVL
slurm-gb200-217-027:234740:235162 [2] NCCL INFO Channel 10/0 : 2[2] -> 3[3] via P2P/MNNVL
slurm-gb200-217-027:234741:235161 [3] NCCL INFO Channel 11/0 : 3[3] -> 4[0] via P2P/MNNVL
slurm-gb200-217-027:234740:235162 [2] NCCL INFO Channel 11/0 : 2[2] -> 3[3] via P2P/MNNVL
slurm-gb200-217-027:234741:235161 [3] NCCL INFO Channel 12/0 : 3[3] -> 4[0] via P2P/MNNVL
slurm-gb200-217-027:234740:235162 [2] NCCL INFO Channel 12/0 : 2[2] -> 3[3] via P2P/MNNVL
slurm-gb200-217-027:234741:235161 [3] NCCL INFO Channel 13/0 : 3[3] -> 4[0] via P2P/MNNVL
slurm-gb200-217-027:234740:235162 [2] NCCL INFO Channel 13/0 : 2[2] -> 3[3] via P2P/MNNVL
slurm-gb200-217-027:234741:235161 [3] NCCL INFO Channel 14/0 : 3[3] -> 4[0] via P2P/MNNVL
slurm-gb200-217-027:234740:235162 [2] NCCL INFO Channel 14/0 : 2[2] -> 3[3] via P2P/MNNVL
slurm-gb200-217-027:234741:235161 [3] NCCL INFO Channel 15/0 : 3[3] -> 4[0] via P2P/MNNVL
slurm-gb200-217-027:234740:235162 [2] NCCL INFO Channel 15/0 : 2[2] -> 3[3] via P2P/MNNVL
slurm-gb200-217-027:234741:235161 [3] NCCL INFO Channel 16/0 : 3[3] -> 4[0] via P2P/MNNVL
slurm-gb200-217-027:234740:235162 [2] NCCL INFO Channel 16/0 : 2[2] -> 3[3] via P2P/MNNVL
slurm-gb200-217-027:234741:235161 [3] NCCL INFO Channel 17/0 : 3[3] -> 4[0] via P2P/MNNVL
slurm-gb200-217-027:234740:235162 [2] NCCL INFO Channel 17/0 : 2[2] -> 3[3] via P2P/MNNVL
slurm-gb200-217-027:234741:235161 [3] NCCL INFO Channel 18/0 : 3[3] -> 4[0] via P2P/MNNVL
slurm-gb200-217-027:234740:235162 [2] NCCL INFO Channel 18/0 : 2[2] -> 3[3] via P2P/MNNVL
slurm-gb200-217-027:234741:235161 [3] NCCL INFO Channel 19/0 : 3[3] -> 4[0] via P2P/MNNVL
slurm-gb200-217-027:234740:235162 [2] NCCL INFO Channel 19/0 : 2[2] -> 3[3] via P2P/MNNVL
slurm-gb200-217-027:234741:235161 [3] NCCL INFO Channel 20/0 : 3[3] -> 4[0] via P2P/MNNVL
slurm-gb200-217-027:234740:235162 [2] NCCL INFO Channel 20/0 : 2[2] -> 3[3] via P2P/MNNVL
slurm-gb200-217-027:234741:235161 [3] NCCL INFO Channel 21/0 : 3[3] -> 4[0] via P2P/MNNVL
slurm-gb200-217-027:234740:235162 [2] NCCL INFO Channel 21/0 : 2[2] -> 3[3] via P2P/MNNVL
slurm-gb200-217-027:234741:235161 [3] NCCL INFO Channel 22/0 : 3[3] -> 4[0] via P2P/MNNVL
slurm-gb200-217-027:234740:235162 [2] NCCL INFO Channel 22/0 : 2[2] -> 3[3] via P2P/MNNVL
slurm-gb200-217-027:234741:235161 [3] NCCL INFO Channel 23/0 : 3[3] -> 4[0] via P2P/MNNVL
slurm-gb200-217-027:234740:235162 [2] NCCL INFO Channel 23/0 : 2[2] -> 3[3] via P2P/MNNVL
slurm-gb200-217-027:234741:235161 [3] NCCL INFO Channel 24/0 : 3[3] -> 4[0] via P2P/MNNVL
slurm-gb200-217-027:234740:235162 [2] NCCL INFO Channel 24/0 : 2[2] -> 3[3] via P2P/MNNVL
slurm-gb200-217-027:234741:235161 [3] NCCL INFO Channel 25/0 : 3[3] -> 4[0] via P2P/MNNVL
slurm-gb200-217-027:234740:235162 [2] NCCL INFO Channel 25/0 : 2[2] -> 3[3] via P2P/MNNVL
slurm-gb200-217-027:234740:235162 [2] NCCL INFO Channel 26/0 : 2[2] -> 3[3] via P2P/MNNVL
slurm-gb200-217-027:234741:235161 [3] NCCL INFO Channel 26/0 : 3[3] -> 4[0] via P2P/MNNVL
slurm-gb200-217-027:234740:235162 [2] NCCL INFO Channel 27/0 : 2[2] -> 3[3] via P2P/MNNVL
slurm-gb200-217-027:234741:235161 [3] NCCL INFO Channel 27/0 : 3[3] -> 4[0] via P2P/MNNVL
slurm-gb200-217-027:234740:235162 [2] NCCL INFO Channel 28/0 : 2[2] -> 3[3] via P2P/MNNVL
slurm-gb200-217-027:234741:235161 [3] NCCL INFO Channel 28/0 : 3[3] -> 4[0] via P2P/MNNVL
slurm-gb200-217-027:234740:235162 [2] NCCL INFO Channel 29/0 : 2[2] -> 3[3] via P2P/MNNVL
slurm-gb200-217-027:234741:235161 [3] NCCL INFO Channel 29/0 : 3[3] -> 4[0] via P2P/MNNVL
slurm-gb200-217-027:234740:235162 [2] NCCL INFO Channel 30/0 : 2[2] -> 3[3] via P2P/MNNVL
slurm-gb200-217-027:234741:235161 [3] NCCL INFO Channel 30/0 : 3[3] -> 4[0] via P2P/MNNVL
slurm-gb200-217-027:234740:235162 [2] NCCL INFO Channel 31/0 : 2[2] -> 3[3] via P2P/MNNVL
slurm-gb200-217-027:234741:235161 [3] NCCL INFO Channel 31/0 : 3[3] -> 4[0] via P2P/MNNVL
slurm-gb200-217-047:237292:237707 [3] NCCL INFO Channel 00/0 : 7[3] -> 0[0] via P2P/MNNVL
slurm-gb200-217-047:237289:237709 [0] NCCL INFO Channel 00/0 : 4[0] -> 5[1] via P2P/MNNVL
slurm-gb200-217-047:237290:237710 [1] NCCL INFO Channel 00/0 : 5[1] -> 6[2] via P2P/MNNVL
slurm-gb200-217-047:237292:237707 [3] NCCL INFO Channel 01/0 : 7[3] -> 0[0] via P2P/MNNVL
slurm-gb200-217-047:237289:237709 [0] NCCL INFO Channel 01/0 : 4[0] -> 5[1] via P2P/MNNVL
slurm-gb200-217-047:237290:237710 [1] NCCL INFO Channel 01/0 : 5[1] -> 6[2] via P2P/MNNVL
slurm-gb200-217-047:237292:237707 [3] NCCL INFO Channel 02/0 : 7[3] -> 0[0] via P2P/MNNVL
slurm-gb200-217-047:237289:237709 [0] NCCL INFO Channel 02/0 : 4[0] -> 5[1] via P2P/MNNVL
slurm-gb200-217-047:237292:237707 [3] NCCL INFO Channel 03/0 : 7[3] -> 0[0] via P2P/MNNVL
slurm-gb200-217-047:237289:237709 [0] NCCL INFO Channel 03/0 : 4[0] -> 5[1] via P2P/MNNVL
slurm-gb200-217-047:237290:237710 [1] NCCL INFO Channel 02/0 : 5[1] -> 6[2] via P2P/MNNVL
slurm-gb200-217-047:237291:237708 [2] NCCL INFO Channel 00/0 : 6[2] -> 7[3] via P2P/MNNVL
slurm-gb200-217-047:237291:237708 [2] NCCL INFO Channel 01/0 : 6[2] -> 7[3] via P2P/MNNVL
slurm-gb200-217-047:237289:237709 [0] NCCL INFO Channel 04/0 : 4[0] -> 5[1] via P2P/MNNVL
slurm-gb200-217-047:237292:237707 [3] NCCL INFO Channel 04/0 : 7[3] -> 0[0] via P2P/MNNVL
slurm-gb200-217-047:237290:237710 [1] NCCL INFO Channel 03/0 : 5[1] -> 6[2] via P2P/MNNVL
slurm-gb200-217-047:237291:237708 [2] NCCL INFO Channel 02/0 : 6[2] -> 7[3] via P2P/MNNVL
slurm-gb200-217-047:237292:237707 [3] NCCL INFO Channel 05/0 : 7[3] -> 0[0] via P2P/MNNVL
slurm-gb200-217-047:237290:237710 [1] NCCL INFO Channel 04/0 : 5[1] -> 6[2] via P2P/MNNVL
slurm-gb200-217-047:237289:237709 [0] NCCL INFO Channel 05/0 : 4[0] -> 5[1] via P2P/MNNVL
slurm-gb200-217-047:237291:237708 [2] NCCL INFO Channel 03/0 : 6[2] -> 7[3] via P2P/MNNVL
slurm-gb200-217-047:237290:237710 [1] NCCL INFO Channel 05/0 : 5[1] -> 6[2] via P2P/MNNVL
slurm-gb200-217-047:237289:237709 [0] NCCL INFO Channel 06/0 : 4[0] -> 5[1] via P2P/MNNVL
slurm-gb200-217-047:237292:237707 [3] NCCL INFO Channel 06/0 : 7[3] -> 0[0] via P2P/MNNVL
slurm-gb200-217-047:237291:237708 [2] NCCL INFO Channel 04/0 : 6[2] -> 7[3] via P2P/MNNVL
slurm-gb200-217-047:237292:237707 [3] NCCL INFO Channel 07/0 : 7[3] -> 0[0] via P2P/MNNVL
slurm-gb200-217-047:237289:237709 [0] NCCL INFO Channel 07/0 : 4[0] -> 5[1] via P2P/MNNVL
slurm-gb200-217-047:237290:237710 [1] NCCL INFO Channel 06/0 : 5[1] -> 6[2] via P2P/MNNVL
slurm-gb200-217-047:237291:237708 [2] NCCL INFO Channel 05/0 : 6[2] -> 7[3] via P2P/MNNVL
slurm-gb200-217-047:237290:237710 [1] NCCL INFO Channel 07/0 : 5[1] -> 6[2] via P2P/MNNVL
slurm-gb200-217-047:237289:237709 [0] NCCL INFO Channel 08/0 : 4[0] -> 5[1] via P2P/MNNVL
slurm-gb200-217-047:237292:237707 [3] NCCL INFO Channel 08/0 : 7[3] -> 0[0] via P2P/MNNVL
slurm-gb200-217-047:237291:237708 [2] NCCL INFO Channel 06/0 : 6[2] -> 7[3] via P2P/MNNVL
slurm-gb200-217-047:237289:237709 [0] NCCL INFO Channel 09/0 : 4[0] -> 5[1] via P2P/MNNVL
slurm-gb200-217-047:237292:237707 [3] NCCL INFO Channel 09/0 : 7[3] -> 0[0] via P2P/MNNVL
slurm-gb200-217-047:237291:237708 [2] NCCL INFO Channel 07/0 : 6[2] -> 7[3] via P2P/MNNVL
slurm-gb200-217-047:237290:237710 [1] NCCL INFO Channel 08/0 : 5[1] -> 6[2] via P2P/MNNVL
slurm-gb200-217-047:237289:237709 [0] NCCL INFO Channel 10/0 : 4[0] -> 5[1] via P2P/MNNVL
slurm-gb200-217-047:237290:237710 [1] NCCL INFO Channel 09/0 : 5[1] -> 6[2] via P2P/MNNVL
slurm-gb200-217-047:237291:237708 [2] NCCL INFO Channel 08/0 : 6[2] -> 7[3] via P2P/MNNVL
slurm-gb200-217-047:237292:237707 [3] NCCL INFO Channel 10/0 : 7[3] -> 0[0] via P2P/MNNVL
slurm-gb200-217-047:237290:237710 [1] NCCL INFO Channel 10/0 : 5[1] -> 6[2] via P2P/MNNVL
slurm-gb200-217-047:237291:237708 [2] NCCL INFO Channel 09/0 : 6[2] -> 7[3] via P2P/MNNVL
slurm-gb200-217-047:237289:237709 [0] NCCL INFO Channel 11/0 : 4[0] -> 5[1] via P2P/MNNVL
slurm-gb200-217-047:237292:237707 [3] NCCL INFO Channel 11/0 : 7[3] -> 0[0] via P2P/MNNVL
slurm-gb200-217-047:237289:237709 [0] NCCL INFO Channel 12/0 : 4[0] -> 5[1] via P2P/MNNVL
slurm-gb200-217-047:237291:237708 [2] NCCL INFO Channel 10/0 : 6[2] -> 7[3] via P2P/MNNVL
slurm-gb200-217-047:237290:237710 [1] NCCL INFO Channel 11/0 : 5[1] -> 6[2] via P2P/MNNVL
slurm-gb200-217-047:237292:237707 [3] NCCL INFO Channel 12/0 : 7[3] -> 0[0] via P2P/MNNVL
slurm-gb200-217-047:237291:237708 [2] NCCL INFO Channel 11/0 : 6[2] -> 7[3] via P2P/MNNVL
slurm-gb200-217-047:237289:237709 [0] NCCL INFO Channel 13/0 : 4[0] -> 5[1] via P2P/MNNVL
slurm-gb200-217-047:237290:237710 [1] NCCL INFO Channel 12/0 : 5[1] -> 6[2] via P2P/MNNVL
slurm-gb200-217-047:237291:237708 [2] NCCL INFO Channel 12/0 : 6[2] -> 7[3] via P2P/MNNVL
slurm-gb200-217-047:237292:237707 [3] NCCL INFO Channel 13/0 : 7[3] -> 0[0] via P2P/MNNVL
slurm-gb200-217-047:237289:237709 [0] NCCL INFO Channel 14/0 : 4[0] -> 5[1] via P2P/MNNVL
slurm-gb200-217-047:237290:237710 [1] NCCL INFO Channel 13/0 : 5[1] -> 6[2] via P2P/MNNVL
slurm-gb200-217-047:237289:237709 [0] NCCL INFO Channel 15/0 : 4[0] -> 5[1] via P2P/MNNVL
slurm-gb200-217-047:237292:237707 [3] NCCL INFO Channel 14/0 : 7[3] -> 0[0] via P2P/MNNVL
slurm-gb200-217-047:237290:237710 [1] NCCL INFO Channel 14/0 : 5[1] -> 6[2] via P2P/MNNVL
slurm-gb200-217-047:237291:237708 [2] NCCL INFO Channel 13/0 : 6[2] -> 7[3] via P2P/MNNVL
slurm-gb200-217-047:237292:237707 [3] NCCL INFO Channel 15/0 : 7[3] -> 0[0] via P2P/MNNVL
slurm-gb200-217-047:237289:237709 [0] NCCL INFO Channel 16/0 : 4[0] -> 5[1] via P2P/MNNVL
slurm-gb200-217-047:237291:237708 [2] NCCL INFO Channel 14/0 : 6[2] -> 7[3] via P2P/MNNVL
slurm-gb200-217-047:237290:237710 [1] NCCL INFO Channel 15/0 : 5[1] -> 6[2] via P2P/MNNVL
slurm-gb200-217-047:237292:237707 [3] NCCL INFO Channel 16/0 : 7[3] -> 0[0] via P2P/MNNVL
slurm-gb200-217-047:237289:237709 [0] NCCL INFO Channel 17/0 : 4[0] -> 5[1] via P2P/MNNVL
slurm-gb200-217-047:237290:237710 [1] NCCL INFO Channel 16/0 : 5[1] -> 6[2] via P2P/MNNVL
slurm-gb200-217-047:237291:237708 [2] NCCL INFO Channel 15/0 : 6[2] -> 7[3] via P2P/MNNVL
slurm-gb200-217-047:237292:237707 [3] NCCL INFO Channel 17/0 : 7[3] -> 0[0] via P2P/MNNVL
slurm-gb200-217-047:237290:237710 [1] NCCL INFO Channel 17/0 : 5[1] -> 6[2] via P2P/MNNVL
slurm-gb200-217-047:237289:237709 [0] NCCL INFO Channel 18/0 : 4[0] -> 5[1] via P2P/MNNVL
slurm-gb200-217-047:237291:237708 [2] NCCL INFO Channel 16/0 : 6[2] -> 7[3] via P2P/MNNVL
slurm-gb200-217-047:237292:237707 [3] NCCL INFO Channel 18/0 : 7[3] -> 0[0] via P2P/MNNVL
slurm-gb200-217-047:237290:237710 [1] NCCL INFO Channel 18/0 : 5[1] -> 6[2] via P2P/MNNVL
slurm-gb200-217-047:237292:237707 [3] NCCL INFO Channel 19/0 : 7[3] -> 0[0] via P2P/MNNVL
slurm-gb200-217-047:237289:237709 [0] NCCL INFO Channel 19/0 : 4[0] -> 5[1] via P2P/MNNVL
slurm-gb200-217-047:237291:237708 [2] NCCL INFO Channel 17/0 : 6[2] -> 7[3] via P2P/MNNVL
slurm-gb200-217-047:237290:237710 [1] NCCL INFO Channel 19/0 : 5[1] -> 6[2] via P2P/MNNVL
slurm-gb200-217-047:237291:237708 [2] NCCL INFO Channel 18/0 : 6[2] -> 7[3] via P2P/MNNVL
slurm-gb200-217-047:237289:237709 [0] NCCL INFO Channel 20/0 : 4[0] -> 5[1] via P2P/MNNVL
slurm-gb200-217-047:237292:237707 [3] NCCL INFO Channel 20/0 : 7[3] -> 0[0] via P2P/MNNVL
slurm-gb200-217-047:237290:237710 [1] NCCL INFO Channel 20/0 : 5[1] -> 6[2] via P2P/MNNVL
slurm-gb200-217-047:237289:237709 [0] NCCL INFO Channel 21/0 : 4[0] -> 5[1] via P2P/MNNVL
slurm-gb200-217-047:237291:237708 [2] NCCL INFO Channel 19/0 : 6[2] -> 7[3] via P2P/MNNVL
slurm-gb200-217-047:237290:237710 [1] NCCL INFO Channel 21/0 : 5[1] -> 6[2] via P2P/MNNVL
slurm-gb200-217-047:237292:237707 [3] NCCL INFO Channel 21/0 : 7[3] -> 0[0] via P2P/MNNVL
slurm-gb200-217-047:237291:237708 [2] NCCL INFO Channel 20/0 : 6[2] -> 7[3] via P2P/MNNVL
slurm-gb200-217-047:237289:237709 [0] NCCL INFO Channel 22/0 : 4[0] -> 5[1] via P2P/MNNVL
slurm-gb200-217-047:237292:237707 [3] NCCL INFO Channel 22/0 : 7[3] -> 0[0] via P2P/MNNVL
slurm-gb200-217-047:237289:237709 [0] NCCL INFO Channel 23/0 : 4[0] -> 5[1] via P2P/MNNVL
slurm-gb200-217-047:237290:237710 [1] NCCL INFO Channel 22/0 : 5[1] -> 6[2] via P2P/MNNVL
slurm-gb200-217-047:237291:237708 [2] NCCL INFO Channel 21/0 : 6[2] -> 7[3] via P2P/MNNVL
slurm-gb200-217-047:237292:237707 [3] NCCL INFO Channel 23/0 : 7[3] -> 0[0] via P2P/MNNVL
slurm-gb200-217-047:237290:237710 [1] NCCL INFO Channel 23/0 : 5[1] -> 6[2] via P2P/MNNVL
slurm-gb200-217-047:237289:237709 [0] NCCL INFO Channel 24/0 : 4[0] -> 5[1] via P2P/MNNVL
slurm-gb200-217-047:237291:237708 [2] NCCL INFO Channel 22/0 : 6[2] -> 7[3] via P2P/MNNVL
slurm-gb200-217-047:237292:237707 [3] NCCL INFO Channel 24/0 : 7[3] -> 0[0] via P2P/MNNVL
slurm-gb200-217-047:237289:237709 [0] NCCL INFO Channel 25/0 : 4[0] -> 5[1] via P2P/MNNVL
slurm-gb200-217-047:237290:237710 [1] NCCL INFO Channel 24/0 : 5[1] -> 6[2] via P2P/MNNVL
slurm-gb200-217-047:237291:237708 [2] NCCL INFO Channel 23/0 : 6[2] -> 7[3] via P2P/MNNVL
slurm-gb200-217-047:237292:237707 [3] NCCL INFO Channel 25/0 : 7[3] -> 0[0] via P2P/MNNVL
slurm-gb200-217-047:237291:237708 [2] NCCL INFO Channel 24/0 : 6[2] -> 7[3] via P2P/MNNVL
slurm-gb200-217-047:237290:237710 [1] NCCL INFO Channel 25/0 : 5[1] -> 6[2] via P2P/MNNVL
slurm-gb200-217-047:237292:237707 [3] NCCL INFO Channel 26/0 : 7[3] -> 0[0] via P2P/MNNVL
slurm-gb200-217-047:237289:237709 [0] NCCL INFO Channel 26/0 : 4[0] -> 5[1] via P2P/MNNVL
slurm-gb200-217-047:237291:237708 [2] NCCL INFO Channel 25/0 : 6[2] -> 7[3] via P2P/MNNVL
slurm-gb200-217-047:237290:237710 [1] NCCL INFO Channel 26/0 : 5[1] -> 6[2] via P2P/MNNVL
slurm-gb200-217-047:237289:237709 [0] NCCL INFO Channel 27/0 : 4[0] -> 5[1] via P2P/MNNVL
slurm-gb200-217-047:237292:237707 [3] NCCL INFO Channel 27/0 : 7[3] -> 0[0] via P2P/MNNVL
slurm-gb200-217-047:237291:237708 [2] NCCL INFO Channel 26/0 : 6[2] -> 7[3] via P2P/MNNVL
slurm-gb200-217-047:237290:237710 [1] NCCL INFO Channel 27/0 : 5[1] -> 6[2] via P2P/MNNVL
slurm-gb200-217-047:237289:237709 [0] NCCL INFO Channel 28/0 : 4[0] -> 5[1] via P2P/MNNVL
slurm-gb200-217-047:237291:237708 [2] NCCL INFO Channel 27/0 : 6[2] -> 7[3] via P2P/MNNVL
slurm-gb200-217-047:237292:237707 [3] NCCL INFO Channel 28/0 : 7[3] -> 0[0] via P2P/MNNVL
slurm-gb200-217-047:237292:237707 [3] NCCL INFO Channel 29/0 : 7[3] -> 0[0] via P2P/MNNVL
slurm-gb200-217-047:237290:237710 [1] NCCL INFO Channel 28/0 : 5[1] -> 6[2] via P2P/MNNVL
slurm-gb200-217-047:237289:237709 [0] NCCL INFO Channel 29/0 : 4[0] -> 5[1] via P2P/MNNVL
slurm-gb200-217-047:237291:237708 [2] NCCL INFO Channel 28/0 : 6[2] -> 7[3] via P2P/MNNVL
slurm-gb200-217-047:237290:237710 [1] NCCL INFO Channel 29/0 : 5[1] -> 6[2] via P2P/MNNVL
slurm-gb200-217-047:237292:237707 [3] NCCL INFO Channel 30/0 : 7[3] -> 0[0] via P2P/MNNVL
slurm-gb200-217-047:237291:237708 [2] NCCL INFO Channel 29/0 : 6[2] -> 7[3] via P2P/MNNVL
slurm-gb200-217-047:237289:237709 [0] NCCL INFO Channel 30/0 : 4[0] -> 5[1] via P2P/MNNVL
slurm-gb200-217-047:237290:237710 [1] NCCL INFO Channel 30/0 : 5[1] -> 6[2] via P2P/MNNVL
slurm-gb200-217-047:237291:237708 [2] NCCL INFO Channel 30/0 : 6[2] -> 7[3] via P2P/MNNVL
slurm-gb200-217-047:237292:237707 [3] NCCL INFO Channel 31/0 : 7[3] -> 0[0] via P2P/MNNVL
slurm-gb200-217-047:237290:237710 [1] NCCL INFO Channel 31/0 : 5[1] -> 6[2] via P2P/MNNVL
slurm-gb200-217-047:237289:237709 [0] NCCL INFO Channel 31/0 : 4[0] -> 5[1] via P2P/MNNVL
slurm-gb200-217-047:237291:237708 [2] NCCL INFO Channel 31/0 : 6[2] -> 7[3] via P2P/MNNVL
slurm-gb200-217-027:234738:235163 [0] NCCL INFO Channel 00/0 : 0[0] -> 1[1] via P2P/MNNVL
slurm-gb200-217-027:234738:235163 [0] NCCL INFO Channel 01/0 : 0[0] -> 1[1] via P2P/MNNVL
slurm-gb200-217-027:234738:235163 [0] NCCL INFO Channel 02/0 : 0[0] -> 1[1] via P2P/MNNVL
slurm-gb200-217-027:234738:235163 [0] NCCL INFO Channel 03/0 : 0[0] -> 1[1] via P2P/MNNVL
slurm-gb200-217-027:234738:235163 [0] NCCL INFO Channel 04/0 : 0[0] -> 1[1] via P2P/MNNVL
slurm-gb200-217-027:234739:235164 [1] NCCL INFO Channel 00/0 : 1[1] -> 2[2] via P2P/MNNVL
slurm-gb200-217-027:234738:235163 [0] NCCL INFO Channel 05/0 : 0[0] -> 1[1] via P2P/MNNVL
slurm-gb200-217-027:234739:235164 [1] NCCL INFO Channel 01/0 : 1[1] -> 2[2] via P2P/MNNVL
slurm-gb200-217-027:234738:235163 [0] NCCL INFO Channel 06/0 : 0[0] -> 1[1] via P2P/MNNVL
slurm-gb200-217-027:234739:235164 [1] NCCL INFO Channel 02/0 : 1[1] -> 2[2] via P2P/MNNVL
slurm-gb200-217-027:234738:235163 [0] NCCL INFO Channel 07/0 : 0[0] -> 1[1] via P2P/MNNVL
slurm-gb200-217-027:234739:235164 [1] NCCL INFO Channel 03/0 : 1[1] -> 2[2] via P2P/MNNVL
slurm-gb200-217-027:234738:235163 [0] NCCL INFO Channel 08/0 : 0[0] -> 1[1] via P2P/MNNVL
slurm-gb200-217-027:234739:235164 [1] NCCL INFO Channel 04/0 : 1[1] -> 2[2] via P2P/MNNVL
slurm-gb200-217-027:234738:235163 [0] NCCL INFO Channel 09/0 : 0[0] -> 1[1] via P2P/MNNVL
slurm-gb200-217-027:234739:235164 [1] NCCL INFO Channel 05/0 : 1[1] -> 2[2] via P2P/MNNVL
slurm-gb200-217-027:234738:235163 [0] NCCL INFO Channel 10/0 : 0[0] -> 1[1] via P2P/MNNVL
slurm-gb200-217-027:234739:235164 [1] NCCL INFO Channel 06/0 : 1[1] -> 2[2] via P2P/MNNVL
slurm-gb200-217-027:234738:235163 [0] NCCL INFO Channel 11/0 : 0[0] -> 1[1] via P2P/MNNVL
slurm-gb200-217-027:234739:235164 [1] NCCL INFO Channel 07/0 : 1[1] -> 2[2] via P2P/MNNVL
slurm-gb200-217-027:234738:235163 [0] NCCL INFO Channel 12/0 : 0[0] -> 1[1] via P2P/MNNVL
slurm-gb200-217-027:234739:235164 [1] NCCL INFO Channel 08/0 : 1[1] -> 2[2] via P2P/MNNVL
slurm-gb200-217-027:234738:235163 [0] NCCL INFO Channel 13/0 : 0[0] -> 1[1] via P2P/MNNVL
slurm-gb200-217-027:234739:235164 [1] NCCL INFO Channel 09/0 : 1[1] -> 2[2] via P2P/MNNVL
slurm-gb200-217-027:234738:235163 [0] NCCL INFO Channel 14/0 : 0[0] -> 1[1] via P2P/MNNVL
slurm-gb200-217-027:234739:235164 [1] NCCL INFO Channel 10/0 : 1[1] -> 2[2] via P2P/MNNVL
slurm-gb200-217-027:234738:235163 [0] NCCL INFO Channel 15/0 : 0[0] -> 1[1] via P2P/MNNVL
slurm-gb200-217-027:234739:235164 [1] NCCL INFO Channel 11/0 : 1[1] -> 2[2] via P2P/MNNVL
slurm-gb200-217-027:234738:235163 [0] NCCL INFO Channel 16/0 : 0[0] -> 1[1] via P2P/MNNVL
slurm-gb200-217-027:234739:235164 [1] NCCL INFO Channel 12/0 : 1[1] -> 2[2] via P2P/MNNVL
slurm-gb200-217-027:234738:235163 [0] NCCL INFO Channel 17/0 : 0[0] -> 1[1] via P2P/MNNVL
slurm-gb200-217-027:234739:235164 [1] NCCL INFO Channel 13/0 : 1[1] -> 2[2] via P2P/MNNVL
slurm-gb200-217-027:234738:235163 [0] NCCL INFO Channel 18/0 : 0[0] -> 1[1] via P2P/MNNVL
slurm-gb200-217-027:234739:235164 [1] NCCL INFO Channel 14/0 : 1[1] -> 2[2] via P2P/MNNVL
slurm-gb200-217-027:234739:235164 [1] NCCL INFO Channel 15/0 : 1[1] -> 2[2] via P2P/MNNVL
slurm-gb200-217-027:234738:235163 [0] NCCL INFO Channel 19/0 : 0[0] -> 1[1] via P2P/MNNVL
slurm-gb200-217-027:234739:235164 [1] NCCL INFO Channel 16/0 : 1[1] -> 2[2] via P2P/MNNVL
slurm-gb200-217-027:234738:235163 [0] NCCL INFO Channel 20/0 : 0[0] -> 1[1] via P2P/MNNVL
slurm-gb200-217-027:234739:235164 [1] NCCL INFO Channel 17/0 : 1[1] -> 2[2] via P2P/MNNVL
slurm-gb200-217-027:234738:235163 [0] NCCL INFO Channel 21/0 : 0[0] -> 1[1] via P2P/MNNVL
slurm-gb200-217-027:234739:235164 [1] NCCL INFO Channel 18/0 : 1[1] -> 2[2] via P2P/MNNVL
slurm-gb200-217-027:234738:235163 [0] NCCL INFO Channel 22/0 : 0[0] -> 1[1] via P2P/MNNVL
slurm-gb200-217-027:234739:235164 [1] NCCL INFO Channel 19/0 : 1[1] -> 2[2] via P2P/MNNVL
slurm-gb200-217-027:234738:235163 [0] NCCL INFO Channel 23/0 : 0[0] -> 1[1] via P2P/MNNVL
slurm-gb200-217-027:234739:235164 [1] NCCL INFO Channel 20/0 : 1[1] -> 2[2] via P2P/MNNVL
slurm-gb200-217-027:234738:235163 [0] NCCL INFO Channel 24/0 : 0[0] -> 1[1] via P2P/MNNVL
slurm-gb200-217-027:234739:235164 [1] NCCL INFO Channel 21/0 : 1[1] -> 2[2] via P2P/MNNVL
slurm-gb200-217-027:234738:235163 [0] NCCL INFO Channel 25/0 : 0[0] -> 1[1] via P2P/MNNVL
slurm-gb200-217-027:234739:235164 [1] NCCL INFO Channel 22/0 : 1[1] -> 2[2] via P2P/MNNVL
slurm-gb200-217-027:234738:235163 [0] NCCL INFO Channel 26/0 : 0[0] -> 1[1] via P2P/MNNVL
slurm-gb200-217-027:234739:235164 [1] NCCL INFO Channel 23/0 : 1[1] -> 2[2] via P2P/MNNVL
slurm-gb200-217-027:234738:235163 [0] NCCL INFO Channel 27/0 : 0[0] -> 1[1] via P2P/MNNVL
slurm-gb200-217-027:234739:235164 [1] NCCL INFO Channel 24/0 : 1[1] -> 2[2] via P2P/MNNVL
slurm-gb200-217-027:234738:235163 [0] NCCL INFO Channel 28/0 : 0[0] -> 1[1] via P2P/MNNVL
slurm-gb200-217-027:234739:235164 [1] NCCL INFO Channel 25/0 : 1[1] -> 2[2] via P2P/MNNVL
slurm-gb200-217-027:234738:235163 [0] NCCL INFO Channel 29/0 : 0[0] -> 1[1] via P2P/MNNVL
slurm-gb200-217-027:234739:235164 [1] NCCL INFO Channel 26/0 : 1[1] -> 2[2] via P2P/MNNVL
slurm-gb200-217-027:234739:235164 [1] NCCL INFO Channel 27/0 : 1[1] -> 2[2] via P2P/MNNVL
slurm-gb200-217-027:234738:235163 [0] NCCL INFO Channel 30/0 : 0[0] -> 1[1] via P2P/MNNVL
slurm-gb200-217-027:234739:235164 [1] NCCL INFO Channel 28/0 : 1[1] -> 2[2] via P2P/MNNVL
slurm-gb200-217-027:234738:235163 [0] NCCL INFO Channel 31/0 : 0[0] -> 1[1] via P2P/MNNVL
slurm-gb200-217-027:234739:235164 [1] NCCL INFO Channel 29/0 : 1[1] -> 2[2] via P2P/MNNVL
slurm-gb200-217-027:234739:235164 [1] NCCL INFO Channel 30/0 : 1[1] -> 2[2] via P2P/MNNVL
slurm-gb200-217-027:234739:235164 [1] NCCL INFO Channel 31/0 : 1[1] -> 2[2] via P2P/MNNVL
slurm-gb200-217-027:234741:235161 [3] transport/p2p.cc:277 NCCL WARN Cuda failure 400 'invalid resource handle'
slurm-gb200-217-027:234741:235161 [3] NCCL INFO transport/p2p.cc:352 -> 1
slurm-gb200-217-027:234741:235161 [3] NCCL INFO transport/p2p.cc:487 -> 1
slurm-gb200-217-027:234741:235161 [3] NCCL INFO transport.cc:194 -> 1
slurm-gb200-217-027:234741:235161 [3] NCCL INFO transport/generic.cc:19 -> 1
slurm-gb200-217-027:234741:235161 [3] NCCL INFO group.cc:148 -> 1
slurm-gb200-217-027:234741:235161 [3] NCCL INFO group.cc:75 -> 1 [Async thread]
slurm-gb200-217-027:234741:234741 [3] NCCL INFO group.cc:454 -> 1
slurm-gb200-217-027:234741:234741 [3] NCCL INFO group.cc:573 -> 1
slurm-gb200-217-027:234741:234741 [3] NCCL INFO enqueue.cc:2229 -> 1
slurm-gb200-217-027: Test NCCL failure all_reduce.cu:44 'unhandled cuda error (run with NCCL_DEBUG=INFO for details) / '
.. slurm-gb200-217-027 pid 234741: Test failure common.cu:377
.. slurm-gb200-217-027 pid 234741: Test failure common.cu:584
.. slurm-gb200-217-027 pid 234741: Test failure all_reduce.cu:90
.. slurm-gb200-217-027 pid 234741: Test failure common.cu:613
.. slurm-gb200-217-027 pid 234741: Test failure common.cu:1016
.. slurm-gb200-217-027 pid 234741: Test failure common.cu:842
srun: error: slurm-gb200-217-027: task 3: Exited with exit code 3
srun: Terminating StepId=1742.0
slurmstepd: error: *** STEP 1742.0 ON slurm-gb200-217-027 CANCELLED AT 2025-07-14T20:26:11 ***
slurmstepd: error: mpi/pmix_v4: _errhandler: slurm-gb200-217-027 [0]: pmixp_client_v2.c:211: Error handler invoked: status = -61, source = [slurm.pmix.1742.0:3]
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
slurmstepd: error: mpi/pmix_v4: _errhandler: slurm-gb200-217-047 [1]: pmixp_client_v2.c:211: Error handler invoked: status = -61, source = [slurm.pmix.1742.0:6]
srun: error: slurm-gb200-217-027: tasks 0-2: Terminated
srun: error: slurm-gb200-217-047: tasks 4-7: Terminated