Using RVC for Voice Conversion

Latest recommended article published 2026-01-09 17:19:19


License: CC 4.0 BY-SA

Tags:

#speech recognition #audio #audio/video #algorithms


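RVC (Retrieval-based Voice Conversion) is a very handy tool when you want to replace the voice in a dub, and in this setting we do not have to worry about copyright. We use RVC here for voice cloning and training.
First, set up the environment. On Windows, inside conda (conda must be installed beforehand), create a new environment named py38: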

conda create -n py38 python=3.8

Activate py38:

conda activate py38

Install PyTorch (the CUDA 11.8 build):

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

Wait for the installation to finish.
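Once the install has finished, a quick check from inside py38 can confirm which Python is active and whether torch can see a GPU. This is a minimal sketch; the helper name `describe_environment` is made up for illustration, and the script deliberately still runs when torch is not installed.

```python
import importlib.util
import sys


def describe_environment():
    """Collect the Python version and, if torch is importable, its CUDA status."""
    info = {
        "python": "{}.{}".format(*sys.version_info[:2]),
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "torch_version": None,
        "cuda_available": None,
    }
    if info["torch_installed"]:
        import torch  # imported lazily so the script still runs without torch

        info["torch_version"] = torch.__version__
        info["cuda_available"] = torch.cuda.is_available()
    return info


print(describe_environment())
```

If `cuda_available` comes back as False on a machine with an NVIDIA GPU, the installed wheel is probably a CPU-only build or the GPU driver is too old.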

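Next, install CUDA for Windows:
https://developer.nvidia.com/cuda-11-8-0-download-archive?target_os=Windows&target_arch=x86_64&target_version=11&target_type=exe_local
CUDA 11.8 is the version to install here.
Download the local exe installer, run it, and choose the extraction location. The actual installer then opens.

It asks for an install location a second time; preferably pick a drive other than C:.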

Once the installation finishes, open a cmd window and check:

nvcc -V

If the command prints the CUDA compiler version information, the installation succeeded and CUDA is ready to use.
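The same check can be scripted. The sketch below runs `nvcc -V` when it is on PATH and pulls the release number out of the text. The helper names are hypothetical, and the sample line is only an illustration of the usual output shape, not output captured for this article.

```python
import re
import shutil
import subprocess


def read_nvcc_output():
    """Return the text printed by `nvcc -V`, or None if nvcc is not on PATH."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(["nvcc", "-V"], capture_output=True, text=True)
    return result.stdout


def parse_cuda_release(nvcc_output):
    """Extract (major, minor) from a 'release X.Y' fragment, or None if absent."""
    match = re.search(r"release\s+(\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Illustrative sample line (an assumption about the usual output format).
sample = "Cuda compilation tools, release 11.8, V11.8.89"
print(parse_cuda_release(sample))

real = read_nvcc_output()
print(parse_cuda_release(real) if real is not None else "nvcc not found on PATH")
```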
Then clone the source code with git:
https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI

git clone https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI.git

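If git is not installed, install it first from https://git-scm.com/install/windows, accepting the defaults throughout.

With the project checked out, install the dependencies from inside the project directory: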

pip install -r requirements.txt

Note that you may run into a problem at this point:

WARNING: Ignoring version 2.0.6 of omegaconf since it has invalid metadata:
Requested omegaconf<2.1 from https://files.pythonhosted.org/packages/d0/eb/9d63ce09dd8aa85767c65668d5414958ea29648a0eec80a4a7d311ec2684/omegaconf-2.0.6-py3-none-any.whl (from fairseq==0.12.2->-r requirements.txt (line 8)) has invalid metadata: .* suffix can only be used with `==` or `!=` operators
    PyYAML (>=5.1.*)
            ~~~~~~^
Please use pip<24.1 if you need to use this version.
  Using cached omegaconf-2.0.5-py3-none-any.whl.metadata (3.0 kB)
WARNING: Ignoring version 2.0.5 of omegaconf since it has invalid metadata:
Requested omegaconf<2.1 from https://files.pythonhosted.org/packages/e5/f6/043b6d255dd6fbf2025110cea35b87f4c5100a181681d8eab496269f0d5b/omegaconf-2.0.5-py3-none-any.whl (from fairseq==0.12.2->-r requirements.txt (line 8)) has invalid metadata: .* suffix can only be used with `==` or `!=` operators
    PyYAML (>=5.1.*)
            ~~~~~~^
Please use pip<24.1 if you need to use this version.
INFO: pip is looking at multiple versions of hydra-core to determine which version is compatible with other requirements. This could take a while.
ERROR: Cannot install -r requirements.txt (line 8) and fairseq because these package versions have conflicting dependencies.

The conflict is caused by:
    fairseq 0.12.2 depends on omegaconf<2.1
    hydra-core 1.0.7 depends on omegaconf<2.1 and >=2.0.5

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip to attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts

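As the log itself says, this omegaconf version needs pip<24.1, so we downgrade pip (invoked through python -m so that Windows allows pip to replace itself):

python -m pip install pip==24.0

Then run pip install -r requirements.txt again.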
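To see whether the downgrade is needed before installing, the pip version can be compared against the 24.1 threshold named in the log. This is a sketch with hypothetical helper names; it only inspects the version string and changes nothing.

```python
import re
from importlib import metadata  # part of the standard library since Python 3.8


def needs_downgrade(pip_version):
    """True when pip is 24.1 or newer, the range the log says to avoid."""
    match = re.match(r"(\d+)\.(\d+)", pip_version)
    if match is None:
        raise ValueError("unrecognized pip version: {!r}".format(pip_version))
    return (int(match.group(1)), int(match.group(2))) >= (24, 1)


def installed_pip_version():
    """Return the version of pip in the current environment, or None."""
    try:
        return metadata.version("pip")
    except metadata.PackageNotFoundError:
        return None


version = installed_pip_version()
if version is None:
    print("pip not found in this environment")
else:
    print(version, "-> downgrade needed:", needs_downgrade(version))
```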
Next, download the pretrained models from Hugging Face.

Place them under the matching assets paths:
./assets/hubert/hubert_base.pt
./assets/pretrained
./assets/uvr5_weights
To use the v2 models, additionally download:
./assets/pretrained_v2
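Before launching, it can help to confirm that the paths listed above are actually populated. The sketch below checks them relative to the project root. The helper name and the rule that a folder only counts once it holds at least one non-hidden entry are assumptions made for illustration.

```python
from pathlib import Path

# Paths taken from the list above, relative to the project root.
REQUIRED = [
    "assets/hubert/hubert_base.pt",
    "assets/pretrained",
    "assets/uvr5_weights",
]
V2_EXTRA = ["assets/pretrained_v2"]


def missing_assets(project_root, want_v2=False):
    """Return the listed paths that are absent or are empty folders."""
    root = Path(project_root)
    wanted = REQUIRED + (V2_EXTRA if want_v2 else [])
    missing = []
    for rel in wanted:
        target = root / rel
        if target.is_dir():
            # Assumption: hidden files such as .gitignore do not count as models.
            present = any(not p.name.startswith(".") for p in target.iterdir())
        else:
            present = target.is_file()
        if not present:
            missing.append(rel)
    return missing


print(missing_assets(".", want_v2=True))
```

Run it from the cloned project directory; an empty list means every listed path has content.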
Then download a few exe files, including ffmpeg:
https://huggingface.co/lj1995/VoiceConversionWebUI/blob/main/ffmpeg.exe

https://huggingface.co/lj1995/VoiceConversionWebUI/blob/main/ffprobe.exe
Once downloaded, place them in the corresponding directory.
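Whether the two tools can be found can be checked with `shutil.which`. In this sketch the helper `find_tool` is hypothetical, and searching the current directory in addition to PATH is an assumption; pass whichever folder you placed the exe files in.

```python
import os
import shutil


def find_tool(name, extra_dirs=()):
    """Look for an executable in extra_dirs first, then on the regular PATH."""
    search_path = os.pathsep.join(list(extra_dirs) + [os.environ.get("PATH", "")])
    return shutil.which(name, path=search_path)


for tool in ("ffmpeg", "ffprobe"):
    # "." is only an example location, not a statement about where RVC looks.
    print(tool, "->", find_tool(tool, extra_dirs=["."]))
```

A result of None means the tool was not found in any of the searched folders.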

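Next, configure how go-web.bat launches the app. Open it in an editor such as VS Code.

Replace the python exe path in it with the path of the Python interpreter from the environment you created, and it will run normally.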
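The interpreter path to put into go-web.bat can be printed from inside the activated py38 environment. This is a small sketch: `launch_command` is a hypothetical helper, and the script name `infer-web.py` is an assumption about what the batch file starts, so check it against your copy of go-web.bat.

```python
import sys


def launch_command(script="infer-web.py"):
    """Build the command that starts `script` with the current interpreter."""
    # sys.executable is the full path of the Python running this snippet,
    # which is the py38 interpreter when the environment is activated.
    return [sys.executable, script]


print("Interpreter path:", sys.executable)
print("Command:", " ".join(launch_command()))
```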

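Fill in the voice model and the path of the audio file correctly. Note that voice model files must be placed in the weights folder under assets.

The index file field may be left empty; every other field needs a value. Then click Convert.
A missing model file can cause an error at this point.
In our case the rmvpe model was missing; download it from Hugging Face. Note that rmvpe_inputs.pth there is an empty file and can be ignored.

After the conversion finishes you can listen to the result and, if it sounds fine, download it.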

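Besides this, RVC also offers real-time voice changing, started by running gui_v1.py (install its dependencies with pip install FreeSimpleGUI sounddevice, or through the various requirements.txt files).
Running this script brings up its interface.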
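If gui_v1.py fails at startup, a likely cause is a missing dependency. The sketch below reports which of the two packages named above cannot be found, without importing them; `missing_modules` is a hypothetical helper name.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that cannot be located in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names of the two packages mentioned for gui_v1.py.
print(missing_modules(["FreeSimpleGUI", "sounddevice"]))
```

An empty list means both packages are installed in the active environment.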

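If you are not familiar with Python and the like, you can download the complete package directly:
https://huggingface.co/lj1995/VoiceConversionWebUI/tree/main

It ships with a complete Python environment and preconfigured models; just double-click go-web.bat to run it.
Later we will look at how to train your own voice and at the real-time voice changer. Here we only tried the voice replacement feature, which is very fast and sounds quite good.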