Fixing "TypeError: can't pickle Environment objects" when a PyTorch DataLoader uses LMDB data with num_workers > 0
Noting this down so I don't forget. Original issue: https://github.com/pytorch/vision/issues/689#issuecomment-787215916
Solution:
- Do not call lmdb.open in the __init__ method;
- Open the LMDB environment when the first item is loaded.
The code is as follows:

import lmdb
import torch

class DataLoader(torch.utils.data.Dataset):
    def __init__(self, lmdb_dir):
        self.lmdb_dir = lmdb_dir
        # do not open lmdb here!!

    def open_lmdb(self):
        self.env = lmdb.open(self.lmdb_dir, readonly=True, create=False)
        self.txn = self.env.begin(buffers=True)

    def __getitem__(self, item: int):
        if not hasattr(self, 'txn'):
            self.open_lmdb()
        # Then do anything you want with env/txn here.
Explanation (quoted verbatim from the original issue):
The multi-processing actually happens when you create the data iterator (e.g., when calling for datum in dataloader:):
https://github.com/pytorch/pytorch/blob/461014d54b3981c8fa6617f90ff7b7df51ab1e85/torch/utils/data/dataloader.py#L712-L720
In short, it would create multiple processes which “copy” the state of the current process. This copy involves a pickle of the LMDB’s Env thus causes an issue. In our solution, we open it at the first data iteration and the opened lmdb file object would be dedicated to each subprocess.
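The pickling failure described above can be reproduced without LMDB or PyTorch at all: any dataset object that holds a non-picklable handle in `__init__` fails when worker processes try to copy it, while the lazy-open version pickles cleanly because the handle does not exist yet. The sketch below uses `threading.Lock` as a stand-in for lmdb's `Environment` (both raise `TypeError` under `pickle`); the class names are illustrative, not from the original post.

```python
import pickle
import threading

class EagerDataset:
    """Mirrors the broken pattern: opens the handle in __init__."""
    def __init__(self):
        # threading.Lock stands in for lmdb.Environment; neither can be pickled
        self.env = threading.Lock()

class LazyDataset:
    """Mirrors the fix: defer opening until the first item is fetched."""
    def __init__(self):
        pass  # do not open the handle here

    def _open(self):
        self.env = threading.Lock()

    def __getitem__(self, item):
        if not hasattr(self, 'env'):
            self._open()  # runs inside each worker, after the fork/spawn copy
        return item

# Pickling the eager version fails, exactly as DataLoader workers do:
try:
    pickle.dumps(EagerDataset())
    eager_ok = True
except TypeError:
    eager_ok = False

# The lazy version pickles fine before any item has been fetched:
try:
    pickle.dumps(LazyDataset())
    lazy_ok = True
except TypeError:
    lazy_ok = False

print(eager_ok, lazy_ok)  # False True
```

After the copy, each worker calls `__getitem__`, so every subprocess ends up with its own freshly opened handle, which is exactly the behavior the fix relies on.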