Shuffling Data with shuffle in PyTorch

永远的小白虾

Category: Pytorch. Tags: Pytorch, deep learning, python

Article link: https://blog.youkuaiyun.com/qq_41487299/article/details/107424432

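Consider this a write-up of how shuffle burned me!
First, a warning about tensors in PyTorch: shuffling one directly with random.shuffle, or with a hand-written swap loop like the one below, does not work.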

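    def Shuffle(self, x, y, random=None, int=int):
        # WARNING: this version is broken for tensors. It is shown only to
        # illustrate the problem.
        if random is None:
            random = self.random
        for i in range(len(x)):
            j = int(random() * (i + 1))  # 0 <= j <= i
            # x[i] and x[j] are views into the same memory, so this tuple
            # swap duplicates one value instead of exchanging the two.
            x[i], x[j] = x[j], x[i]
            y[i], y[j] = y[j], y[i]
        return x, y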

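What you get is a pile of corrupted data, because swapping elements of a tensor this way duplicates values. Indexing a tensor with x[i] returns a view of the same memory rather than a copy, so x[i], x[j] = x[j], x[i] first overwrites x[i] with x[j] and then writes that same value back into x[j].
For example, say my y holds [0, 1, 0, 1, 0, 1].
After a few rounds of shuffle, it turns into [1, 1, 1, 1, 1, 1].
The data is instantly a mess.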
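To make the failure concrete, here is a minimal sketch that applies a single swap to the label tensor from the example. The helper names `naive_swap` and `safe_swap` are made up for this illustration and are not part of PyTorch. `safe_swap` uses list indexing, which copies the right-hand side before assigning.

```python
import torch

def naive_swap(t, i, j):
    # t[i] and t[j] are views into t's storage, so the right-hand tuple
    # holds two views, not two saved values.
    t[i], t[j] = t[j], t[i]
    return t

def safe_swap(t, i, j):
    # Indexing with a list returns a copy, so both old values are
    # captured before anything is overwritten.
    t[[i, j]] = t[[j, i]]
    return t

y = torch.tensor([0, 1, 0, 1, 0, 1])

broken = naive_swap(y.clone(), 0, 1)
fixed = safe_swap(y.clone(), 0, 1)

print(broken.tolist())  # [1, 1, 0, 1, 0, 1] -> the 0 at index 0 is lost
print(fixed.tolist())   # [1, 0, 0, 1, 0, 1] -> a real swap
```

One swap already turns three 1s into four. Repeat that across a whole shuffle, and over several epochs, and the labels can collapse to a single value.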
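The correct approach is to convert to numpy first and then swap the data. Indexing a 1-D numpy array with an integer returns a scalar copy rather than a view, so the swap behaves as expected; rows of a multi-dimensional array are still views and need a copy-based swap.
For example: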

    # Requires `import torch` at module level. `self.random` and
    # `self.use_cuda` are attributes of the enclosing class.
    def Shuffle(self, x, y, random=None, int=int):
        """Shuffle x and y with the same permutation; return (x, y) as tensors.

        Optional arg random is a 0-argument function returning a random
        float in [0.0, 1.0); by default, self.random (e.g. random.random).
        """
        if random is None:
            random = self.random  # e.g. random.random

        # Convert to numpy; .cpu() is a no-op for tensors already on the CPU.
        if torch.is_tensor(x):
            x = x.cpu().numpy()
        if torch.is_tensor(y):
            y = y.cpu().numpy()

        # Random permutation by pairwise swaps.
        for i in range(len(x)):
            j = int(random() * (i + 1))  # 0 <= j <= i, always a valid index
            # List indexing copies the right-hand side first, so the swap is
            # also safe when x[i] is a row of a multi-dimensional array.
            x[[i, j]] = x[[j, i]]
            y[[i, j]] = y[[j, i]]

        # Convert back to tensors.
        x = torch.from_numpy(x)
        y = torch.from_numpy(y)
        if self.use_cuda:
            x = x.cuda()
            y = y.cuda()
        return x, y
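If a round trip through numpy is not needed, a different technique (not the method described in this post) is to draw one random permutation with `torch.randperm` and index both tensors with it. The sketch below assumes x and y have the same length along dimension 0; the function name `shuffle_together` and its `generator` parameter are made up for this illustration.

```python
import torch

def shuffle_together(x, y, generator=None):
    """Return shuffled copies of x and y that share one permutation."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    # One random permutation of 0..N-1, created on the CPU.
    perm = torch.randperm(len(x), generator=generator)
    # Indexing with an index tensor returns a new tensor, so the inputs
    # are left untouched and no view aliasing can occur.
    return x[perm.to(x.device)], y[perm.to(y.device)]

# Toy data: row i of x is [2*i, 2*i + 1] and its label is i.
x = torch.arange(12).reshape(6, 2)
y = torch.arange(6)

g = torch.Generator().manual_seed(0)  # seeded only to make runs repeatable
xs, ys = shuffle_together(x, y, generator=g)

# Every row still sits next to its own label after shuffling.
print(torch.equal(xs[:, 0], ys * 2))
```

Because indexing returns new tensors, the original x and y keep their order, which makes it easy to reshuffle from the same source every epoch.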