Analyzing and Fixing a Device Type Error in the OverLoCK Project

Article link: https://blog.youkuaiyun.com/gitblog_07523/article/details/148647536

OverLoCK [CVPR 2025] OverLoCK: An Overview-first-Look-Closely-next ConvNet with Context-Mixing Dynamic Kernels. Project repository: https://gitcode.com/gh_mirrors/ove/OverLoCK

Background

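When running distributed training with the OverLoCK project, a user hit a typical PyTorch device type error. Launching the distributed training script raised AttributeError: 'int' object has no attribute 'type', which indicates a type mismatch in how the device argument is handled.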

Root Cause Analysis

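The error is raised in the _functions.py file of the MMCV library, in the forward method of the Scatter class. The core issue is that PyTorch changed how the device argument is handled between versions: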

Before PyTorch 2.1.0: the device argument is a plain integer GPU index
PyTorch 2.1.0 and later: the device argument must be a torch.device object

When the code tries to read the type attribute of an integer argument, Python raises an AttributeError, because int has no such attribute.
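The failure mode can be reproduced without a GPU or even PyTorch installed. The following is a minimal sketch, not the PyTorch or MMCV source: `Device` is a hypothetical stand-in for torch.device, and `get_stream_new_style` is a hypothetical helper that, like the newer stream helper described above, reads `device.type` from its argument.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """Hypothetical stand-in for torch.device (illustration only)."""
    type: str
    index: int


def get_stream_new_style(device):
    """Hypothetical helper that expects a device object and reads device.type."""
    if device.type == "cpu":
        return None
    return f"stream:{device.type}:{device.index}"


def reproduce_error():
    """Call the helper with a bare GPU index, as an older call site would."""
    try:
        get_stream_new_style(0)
    except AttributeError as exc:
        return str(exc)
    return None


message = reproduce_error()
print(message)  # 'int' object has no attribute 'type'

# Wrapping the index in a device object avoids the error.
print(get_stream_new_style(Device("cuda", 0)))  # stream:cuda:0
```

The message matches the one in the traceback. The call site still passes an integer while the callee now expects an object with a type attribute, so the fix belongs at the call site.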

Solution in Detail

The OverLoCK project provides an explicit fix for this problem: patch the _functions.py file in the MMCV library. The change goes in the forward method of the Scatter class:

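import torch
from packaging import version
from torch import Tensor
from torch.nn.parallel._functions import _get_stream
from typing import List, Union

class Scatter:
    @staticmethod
    def forward(target_gpus: List[int], input: Union[List, Tensor]) -> tuple:
        # get_input_device is a helper defined in the same _functions.py
        input_device = get_input_device(input)
        streams = None
        if input_device == -1 and target_gpus != [-1]:
            # Choose the device representation according to the PyTorch version
            if version.parse(torch.__version__) >= version.parse('2.1.0'):
                # Newer versions: wrap each GPU index in a torch.device
                streams = [_get_stream(torch.device("cuda", device)) for device in target_gpus]
            else:
                # Older versions: pass the integer GPU index directly
                streams = [_get_stream(device) for device in target_gpus]
        # (the remainder of forward is not shown in this excerpt)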

The core idea behind this change is: