Paper: "Learning a Text-Video Embedding from Incomplete and Heterogeneous Data"

All data is saved as .npy files and then read back with np.load:

class LSMDC(Dataset):
    def __init__(self, clip_path, text_features, audio_features, flow_path, face_path, **):
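
To make the .npy-then-np.load pattern concrete, here is a minimal sketch. It is not the paper's actual LSMDC class: the class name NpyFeatureDataset, the modality names, and the array shapes are all made up for illustration, and it assumes each .npy file holds one array whose first axis indexes clips. It leaves out PyTorch so that it runs with NumPy alone; a torch.utils.data.Dataset subclass would implement the same two methods, __len__ and __getitem__.

```python
import os
import tempfile

import numpy as np


class NpyFeatureDataset:
    """Hypothetical map-style dataset over per-modality .npy feature arrays."""

    def __init__(self, **feature_paths):
        # Each keyword maps an illustrative modality name to a .npy file path.
        self.features = {name: np.load(path) for name, path in feature_paths.items()}
        # Assumption: axis 0 of every array indexes the same set of clips.
        lengths = {arr.shape[0] for arr in self.features.values()}
        if len(lengths) != 1:
            raise ValueError("all modalities must cover the same number of clips")
        self.num_clips = lengths.pop()

    def __len__(self):
        return self.num_clips

    def __getitem__(self, idx):
        # One sample is a dict holding the idx-th row of every modality.
        return {name: arr[idx] for name, arr in self.features.items()}


def make_demo_dataset():
    """Write small random arrays to a temp dir, then load them back."""
    tmp_dir = tempfile.mkdtemp()
    rng = np.random.default_rng(0)
    paths = {}
    for name, dim in [("clip", 8), ("text", 5), ("audio", 4)]:
        path = os.path.join(tmp_dir, name + ".npy")
        np.save(path, rng.standard_normal((3, dim)).astype(np.float32))
        paths[name] = path
    return NpyFeatureDataset(**paths)


dataset = make_demo_dataset()
sample = dataset[0]
print(len(dataset), {name: row.shape for name, row in sample.items()})
```

Loading every array up front in __init__ keeps __getitem__ to a plain index operation, which is reasonable when the features fit in memory. For larger files, np.load(path, mmap_mode="r") is one way to avoid reading the whole array at once.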