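[Project Walkthrough] AUTOVC Code Analysis: conversion.py

This article walks through the conversion.py file of the AUTOVC project: how the pad_seq function pads the input data, and how the main routine uses a trained model to perform voice conversion and save the results.

AUTOVC Code Analysis: conversion.py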

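  Introduction

       This is a voice conversion project based on the AUTOVC model, implemented in PyTorch (project link).
       
        AUTOVC follows the autoencoder framework and is trained only on the autoencoder (reconstruction) loss, but it introduces carefully tuned dimension reduction and temporal downsampling to constrain the information flow. This simple scheme yields a significant performance improvement (see the detailed introduction to AUTOVC).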
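
        To make the idea of temporal downsampling more concrete, here is a minimal NumPy sketch. It is a simplified illustration and not the project's code: the helper names `downsample_codes` and `upsample_codes`, the choice of keeping the first frame of each block, and the array shapes are all assumptions made for this example.

```python
import numpy as np


def downsample_codes(codes, freq=32):
    """Hypothetical helper: keep one frame out of every `freq` frames."""
    return codes[::freq]


def upsample_codes(codes_ds, freq=32):
    """Hypothetical helper: repeat each kept frame `freq` times."""
    return np.repeat(codes_ds, freq, axis=0)


# 128 frames of a 64-dimensional content code (illustrative shapes)
codes = np.random.rand(128, 64).astype(np.float32)
codes_ds = downsample_codes(codes)   # only 4 frames survive the bottleneck
codes_up = upsample_codes(codes_ds)  # back to 128 frames, piecewise constant in time
print(codes_ds.shape, codes_up.shape)
```

        Because only one frame per block of 32 passes through, the sequence length has to be a multiple of 32 for the restored length to match the input, which is the reason for the padding step in pad_seq.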
       
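       The AUTOVC project is fairly large and contains a lot of code. To make it easier to study and organize, the files are introduced one by one, following the project's file structure.
       
       This article covers the conversion.py file: the script that performs voice (timbre) conversion.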
       

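  Function Analysis

    pad_seq

          Purpose: split the data x into groups of base elements along its first axis; if the last group is incomplete, pad it with zeros so that the length becomes a multiple of base

          Input parameters:

		x		:	data to process
		base	:	number of elements per group

          Output parameters:

		np.pad(x, ((0,len_pad),(0,0)), 'constant'), len_pad	:	the padded data and the number of padded elements

          Code walkthrough: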

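		from math import ceil

		import numpy as np

		def pad_seq(x, base=32):
		    # ceil rounds up: compute the padded length, i.e. the smallest multiple of base that is >= len(x)
		    len_out = int(base * ceil(float(x.shape[0])/base))
		    # Number of elements that need to be padded
		    len_pad = len_out - x.shape[0]
		    # Sanity check: raises AssertionError if the padding length is negative
		    assert len_pad >= 0
		    # Return the zero-padded data (padding is appended along the first axis) and the padding length
		    return np.pad(x, ((0,len_pad),(0,0)), 'constant'), len_pad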
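
          As a quick check of the padding arithmetic, the sketch below runs pad_seq on two dummy arrays and then removes the padding again. The `unpad` helper and the array shapes (100 or 64 frames, 80 mel bins) are assumptions made for illustration only; they are not part of conversion.py.

```python
from math import ceil

import numpy as np


def pad_seq(x, base=32):
    # Same logic as the function above: pad the first axis to a multiple of base
    len_out = int(base * ceil(float(x.shape[0]) / base))
    len_pad = len_out - x.shape[0]
    assert len_pad >= 0
    return np.pad(x, ((0, len_pad), (0, 0)), 'constant'), len_pad


def unpad(x, len_pad):
    """Hypothetical helper: drop the padded frames again."""
    # x[:-0] would be an empty array, so the zero case is handled separately
    return x if len_pad == 0 else x[:-len_pad]


mel = np.ones((100, 80), dtype=np.float32)   # 100 frames, not a multiple of 32
padded, len_pad = pad_seq(mel)
print(padded.shape, len_pad)                 # (128, 80) 28

exact = np.ones((64, 80), dtype=np.float32)  # already a multiple of 32
padded_exact, len_pad_exact = pad_seq(exact)
print(padded_exact.shape, len_pad_exact)     # (64, 80) 0

print(unpad(padded, len_pad).shape)          # (100, 80)
print(exact[:-0].shape)                      # (0, 80)
```

          The last line shows why the conversion script treats len_pad == 0 as a separate case when trimming: slicing with `:-0` returns an empty array instead of the full sequence.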

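    main

          Purpose: iterate over the conversion paths (source speaker to target speaker), perform voice conversion, and save the results

          Input parameters: none

          Output parameters: none (the results are written to results.pkl)

          Code walkthrough: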

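		# Imports used by this script (Generator is defined in the project's model_vc.py)
		import pickle

		import numpy as np
		import torch
		from model_vc import Generator

		# Use the first GPU as the device
		device = 'cuda:0'
		# Build the generator: content code (bottleneck) dimension 32, speaker embedding dimension 256,
		# decoder hidden dimension 512, downsampling factor 32; switch to eval mode and move it to device
		G = Generator(32, 256, 512, 32).eval().to(device)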
		
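		# Load the trained checkpoint into g_checkpoint
		g_checkpoint = torch.load('autovc.ckpt')
		# Load the trained weights into the generator
		G.load_state_dict(g_checkpoint['model'])
		
		# Load the prepared speaker metadata (each entry holds the speaker name, speaker embedding, and mel-spectrogram)
		metadata = pickle.load(open('metadata.pkl', "rb"))
		
		# List that collects (conversion name, converted mel-spectrogram) pairs
		spect_vc = []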
		
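		# Iterate over the metadata; sbmt_i is the source speaker entry
		for sbmt_i in metadata:
		
		    # Take the source speaker's mel-spectrogram
		    x_org = sbmt_i[2]
		    # Pad the mel-spectrogram; returns the padded data and the padding length
		    x_org, len_pad = pad_seq(x_org)
		    # Add a leading batch dimension, convert to a Tensor, and move to device: the source utterance uttr_org
		    uttr_org = torch.from_numpy(x_org[np.newaxis, :, :]).to(device)
		    # sbmt_i[1] is the speaker embedding; add a leading batch dimension,
		    # convert to a Tensor, and move to device: the source speaker embedding emb_org
		    emb_org = torch.from_numpy(sbmt_i[1][np.newaxis, :]).to(device)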
		
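		    # Iterate over the metadata again; sbmt_j is the target speaker entry
		    for sbmt_j in metadata:
		
		        # sbmt_j[1] is the speaker embedding; add a leading batch dimension,
		        # convert to a Tensor, and move to device: the target speaker embedding emb_trg
		        emb_trg = torch.from_numpy(sbmt_j[1][np.newaxis, :]).to(device)
		
		        # torch.no_grad is a context manager; operations inside it do not track gradients
		        with torch.no_grad():
		            # Feed the source utterance uttr_org, the source embedding emb_org, and the target embedding emb_trg; keep the second output as the final conversion result x_identic_psnt
		            _, x_identic_psnt, _ = G(uttr_org, emb_org, emb_trg)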
		
		        # If no padding was added (len_pad is 0)
		        if len_pad == 0:
		            # Take the whole conversion result, move it to the CPU, and convert it to a NumPy array: uttr_trg
		            uttr_trg = x_identic_psnt[0, 0, :, :].cpu().numpy()
		        # If padding was added (len_pad is not 0)
		        else:
		            # Cut off the padded frames, move the result to the CPU, and convert it to a NumPy array: uttr_trg
		            uttr_trg = x_identic_psnt[0, 0, :-len_pad, :].cpu().numpy()
		
		        # Pair the conversion name built from sbmt_i[0] and sbmt_j[0] with the converted mel-spectrogram uttr_trg and append it to spect_vc
		        spect_vc.append(('{}x{}'.format(sbmt_i[0], sbmt_j[0]), uttr_trg))
		
		# Open the output file results.pkl for writing
		with open('results.pkl', 'wb') as handle:
		    # Write the list of (conversion name, mel-spectrogram) pairs to the file
		    pickle.dump(spect_vc, handle)
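
          Once results.pkl has been written, it can be read back to check what was saved. The sketch below assumes the list-of-tuples layout built by the script above; the helper name `load_results`, the temporary file, and the dummy speaker names and data are only illustrative.

```python
import os
import pickle
import tempfile

import numpy as np


def load_results(path):
    """Hypothetical helper: map each 'SOURCExTARGET' name to its mel-spectrogram."""
    with open(path, 'rb') as handle:
        spect_vc = pickle.load(handle)
    return {name: mel for name, mel in spect_vc}


# Build a small stand-in file with the same layout (dummy names and zero-filled data)
dummy = [('spkAxspkB', np.zeros((100, 80), dtype=np.float32)),
         ('spkBxspkA', np.zeros((90, 80), dtype=np.float32))]
tmp_path = os.path.join(tempfile.mkdtemp(), 'results.pkl')
with open(tmp_path, 'wb') as handle:
    pickle.dump(dummy, handle)

results = load_results(tmp_path)
for name, mel in results.items():
    # Each entry is one conversion path and its converted mel-spectrogram
    print(name, mel.shape)
```

          The saved entries are mel-spectrograms, not waveforms, so a separate vocoder step is still needed to turn them into audio.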