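OuteTTS has released a new version. Version 0.2 is trained with Qwen-2.5-0.5B as its base model. OuteTTS uses WavTokenizer for audio and aligns text tokens one-to-one with audio tokens. It also supports custom voices through speaker profiles.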
OuteTTS text-to-speech
# Install dependencies
pip install outetts
import outetts

# Configure the model
model_config = outetts.HFModelConfig_v1(
    model_path="OuteAI/OuteTTS-0.2-500M",
    language="zh",  # Supported languages in v0.2: en, zh, ja, ko
)

# Initialize the interface
interface = outetts.InterfaceHF(model_version="0.2", cfg=model_config)

# The commented-out code below sets up a custom voice; just change the paths to your own files
# Optional: Create a speaker profile (use a 10-15 second audio clip)
# speaker = interface.create_speaker(
#     audio_path="path/to/audio/file",
#     transcript="Transcription of the audio file."
# )

# Optional: Save and load speaker profiles
# interface.save_speaker(speaker, "speaker.json")
# speaker = interface.load_speaker("speaker.json")

# Optional: Load speaker from default presets
interface.print_default_speakers()
speaker = interface.load_default_speaker(name="female_1")

output = interface.generate(
    text="""黄花又名忘忧草,既能食用,也能药用。""",
    # Lower temperature values may result in a more stable tone,
    # while higher values can introduce varied and expressive speech
    temperature=0.1,
    repetition_penalty=1.1,
    max_length=4096,
    # Optional: Use a speaker profile for consistent voice characteristics
    # Without a speaker profile, the model will generate a voice with random characteristics
    speaker=speaker,
)

# Save the synthesized speech to a file
output.save("output.wav")

# Optional: Play the synthesized speech
# output.play()
Summary
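OuteTTS does not handle Chinese as well as F5-TTS, though its English output is quite good. It also reads numbers poorly, so they need to be converted to words first. The maximum length is 4096, so longer text has to be split manually before synthesis.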
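The two workarounds mentioned above (converting numbers and splitting long text) could be sketched roughly as follows. This is only an illustration and not part of outetts: `digits_to_chinese` and `split_text` are hypothetical helper names, digits are read one by one (so "2024" becomes "二零二四", which may not suit quantities), and the `max_chars` value of 100 is an assumed, conservative character budget per chunk, not a figure derived from the 4096 limit.

```python
import re

# Chinese numerals indexed by digit value (0-9).
DIGITS = "零一二三四五六七八九"


def digits_to_chinese(text: str) -> str:
    """Replace every ASCII digit with its Chinese numeral, digit by digit.

    Assumption: digit-by-digit reading is acceptable for the input;
    it does not produce place values such as 一百二十三.
    """
    return re.sub(r"[0-9]", lambda m: DIGITS[int(m.group())], text)


def split_text(text: str, max_chars: int = 100) -> list[str]:
    """Pack whole sentences into chunks of at most max_chars characters.

    Sentences are cut after Chinese or English sentence-ending punctuation.
    A single sentence longer than max_chars is kept as one oversized chunk.
    """
    # The lookbehind splits after the punctuation mark, so it stays attached.
    sentences = [s for s in re.split(r"(?<=[。!?;!?;])", text) if s.strip()]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and len(current) + len(sentence) > max_chars:
            chunks.append(current.strip())
            current = ""
        current += sentence
    if current.strip():
        chunks.append(current.strip())
    return chunks


if __name__ == "__main__":
    sample = "黄花又名忘忧草。它在2024年很受欢迎!既能食用,也能药用。"
    for chunk in split_text(digits_to_chinese(sample), max_chars=16):
        print(chunk)
```

Each chunk can then be passed as `text` to `interface.generate(...)` exactly as in the example above and saved to its own file, for instance `output_0.wav`, `output_1.wav`, and so on. The right chunk size depends on the speaker profile and language, so it is worth tuning by listening to the results.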