Notes: https://flowus.cn/share/1683b50b-1469-4d57-bef0-7631d39ac8f0
【FlowUs 息流】FastSpeech2
Paper: FastSpeech 2: Fast and High-Quality End-to-End Text to Speech, https://arxiv.org/abs/2006.04558
Abstract:
Tacotron → FastSpeech introduced knowledge distillation to alleviate the one-to-many problem in TTS. Problems with the teacher-student distillation pipeline: 1) it is complicated and slow; 2) it is not accurate enough; 3) the student model learns from the teacher model's outputs rather than directly from the mel-spectrograms, which causes information loss.
FastSpeech 2's solutions: 1) train directly on the ground truth; 2) introduce more conditional inputs: pitch, energy, and more accurate duration. Concretely: extract duration, pitch, and energy from the speech waveform, take them directly as conditional inputs in training, and use predicted values in inference.
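As a rough sketch of what "extract energy from the speech waveform" means, the paper computes frame-level energy as the L2-norm of each STFT frame's amplitude. The frame/hop sizes and window below are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

def frame_energy(wav, frame_length=1024, hop_length=256):
    """Frame-level energy: L2-norm of each STFT frame's amplitude spectrum.
    frame_length/hop_length are assumed values, not the paper's exact setup."""
    n_frames = 1 + max(0, len(wav) - frame_length) // hop_length
    window = np.hanning(frame_length)
    energies = []
    for i in range(n_frames):
        frame = wav[i * hop_length : i * hop_length + frame_length] * window
        spec = np.abs(np.fft.rfft(frame))       # amplitude spectrum of the frame
        energies.append(np.linalg.norm(spec))   # L2-norm = frame energy
    return np.array(energies)

# toy usage: a 1-second 440 Hz sine at 16 kHz
sr = 16000
t = np.arange(sr) / sr
wav = np.sin(2 * np.pi * 440 * t).astype(np.float32)
e = frame_energy(wav)
print(e.shape)  # one scalar per frame
```

During training this sequence would be quantized and fed as a conditional input; at inference an energy predictor supplies it instead.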
1. Introduction:
Improvements in FastSpeech 2:
1. Train the FastSpeech 2 model directly with the ground truth.
2. To alleviate the one-to-many problem, introduce more speech conditions: during training, first extract pitch, energy, and duration from the target speech waveform, then feed them in as conditional inputs.
3. Pitch is hard to predict yet important. Approach: "we convert the pitch contour into pitch spectrogram using continuous wavelet transform and predict the pitch in the frequency domain, which can improve the accuracy of predicted pitch."
4. FastSpeech 2s: skip the mel-spectrogram and generate the speech waveform directly from text.
Contributions:
- FastSpeech 2 achieves a 3x training speed-up over FastSpeech by simplifying the training pipeline.
- FastSpeech 2 alleviates the one-to-many mapping problem in TTS and achieves better voice quality.
- FastSpeech 2s further simplifies the inference pipeline for speech synthesis while maintaining high voice quality, by directly generating speech waveform from text.
2. FastSpeech 2 and 2s
2.1 Motivation
Address the one-to-many problem of autoregressive models, and the complexity, information loss, and inaccuracy of FastSpeech's teacher-student pipeline.