FastSpeech 2: TTS Paper Reading Notes

Notes ([FlowUs] FastSpeech2): https://flowus.cn/share/1683b50b-1469-4d57-bef0-7631d39ac8f0

Paper: FastSpeech 2: Fast and High-Quality End-to-End Text to Speech (https://arxiv.org/abs/2006.04558)

Abstract:

Tacotron → FastSpeech: FastSpeech introduces knowledge distillation to alleviate the one-to-many problem in TTS (the same text can correspond to many valid speech realizations). Problems with the teacher-student distillation pipeline: 1) it is complicated and slow to train; 2) the durations extracted from the teacher's attention are not accurate enough; 3) the student learns from mel-spectrograms generated by the teacher instead of the ground-truth mel-spectrograms, which loses information.

FastSpeech 2's solution: 1) train directly on the ground-truth mel-spectrograms; 2) introduce more conditional inputs: pitch, energy, and more accurate duration. Concretely: extract duration, pitch, and energy from the speech waveform, take them directly as conditional inputs during training, and use the predicted values at inference.
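A rough sketch of the frame-level pitch and energy extraction is below. This is only an illustration under assumptions: the file name, the STFT settings, and the use of librosa/pyin are my choices rather than the paper's exact toolchain (the paper additionally extracts phoneme durations with a forced aligner, which is omitted here); energy is taken as the L2-norm of each STFT frame's magnitude.

```python
# Illustrative sketch (not the paper's exact toolchain): extract frame-level
# pitch and energy from a waveform so they can be used as conditioning
# signals during training. File name and STFT settings are assumptions.
import numpy as np
import librosa

wav, sr = librosa.load("sample.wav", sr=22050)   # hypothetical input file

hop_length, n_fft = 256, 1024                    # assumed frame settings

# Energy: L2-norm of the magnitude of each STFT frame.
spec = np.abs(librosa.stft(wav, n_fft=n_fft, hop_length=hop_length))
energy = np.linalg.norm(spec, axis=0)            # shape: (n_frames,)

# Pitch (F0) contour; unvoiced frames come back as NaN and are zeroed here.
f0, voiced_flag, _ = librosa.pyin(
    wav,
    fmin=librosa.note_to_hz("C2"),
    fmax=librosa.note_to_hz("C7"),
    sr=sr,
    frame_length=n_fft,
    hop_length=hop_length,
)
pitch = np.nan_to_num(f0)

print(energy.shape, pitch.shape)  # two frame-level sequences of equal length
```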

1. Introduction:

Improvements in FastSpeech 2:

1. Train the FastSpeech 2 model directly on ground-truth mel-spectrograms (no teacher-student distillation).

2. To alleviate the one-to-many problem, introduce more speech conditions: during training, first extract pitch, energy, and accurate duration from the target speech waveform, then feed them in as conditional inputs; at inference, the values predicted by the model are used instead (see the first sketch after this list).

3. Pitch has large variation, is important, and is hard to predict; the approach: convert the pitch contour into a pitch spectrogram using the continuous wavelet transform and predict the pitch in the frequency domain, which improves the accuracy of the predicted pitch (see the CWT sketch after this list).

4. FastSpeech 2s: drop the mel-spectrogram as an intermediate representation and generate the speech waveform directly from text.
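The train/inference switch in item 2 (condition on ground-truth values during training, on the predictor's own output at inference) can be sketched as below. This is my own minimal PyTorch simplification, not the official variance adaptor: the layer sizes, bin count, and value range are assumed, and duration prediction / length regulation is left out.

```python
# Minimal PyTorch sketch (a simplification, not the official model) of
# "ground truth as condition in training, predicted value in inference".
import torch
import torch.nn as nn

class VariancePredictor(nn.Module):
    """Predicts a frame-level scalar (e.g. pitch or energy) from hidden states."""
    def __init__(self, d_model=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_model), nn.ReLU(),
            nn.Linear(d_model, 1),
        )

    def forward(self, h):                 # h: (batch, frames, d_model)
        return self.net(h).squeeze(-1)    # (batch, frames)

class VarianceAdaptorSketch(nn.Module):
    def __init__(self, d_model=256, n_bins=256, v_min=0.0, v_max=800.0):
        super().__init__()
        self.predictor = VariancePredictor(d_model)
        self.embedding = nn.Embedding(n_bins, d_model)
        self.register_buffer("bins", torch.linspace(v_min, v_max, n_bins - 1))

    def forward(self, h, target=None):
        pred = self.predictor(h)                       # always predicted (for the loss)
        value = target if target is not None else pred
        # Quantize the (ground-truth or predicted) value and add its embedding
        # back to the hidden sequence as a conditioning signal.
        h = h + self.embedding(torch.bucketize(value, self.bins))
        return h, pred

# Training step: condition on ground-truth pitch, supervise the predictor.
adaptor = VarianceAdaptorSketch()
h = torch.randn(2, 100, 256)               # fake encoder output
gt_pitch = torch.rand(2, 100) * 400        # fake ground-truth pitch (Hz)
h_cond, pred = adaptor(h, target=gt_pitch)
loss = nn.functional.mse_loss(pred, gt_pitch)

# Inference: no ground truth available, so the predicted value is used.
h_cond, _ = adaptor(h)
```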
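For item 3, here is a rough illustration of turning the 1-D pitch contour into a 2-D "pitch spectrogram" with a continuous wavelet transform; the model then predicts this 2-D target, and the contour is recovered with the inverse CWT at inference. The Ricker (Mexican-hat) wavelet, the ten dyadic scales, and the log/normalization steps below are my assumptions for illustration, not the paper's exact recipe.

```python
# Sketch: decompose a pitch contour into a multi-scale "pitch spectrogram"
# via a continuous wavelet transform implemented with plain numpy.
import numpy as np

def ricker(points, a):
    """Mexican-hat (Ricker) wavelet of width `a`, sampled at `points` positions."""
    t = np.arange(points) - (points - 1) / 2.0
    return (2 / (np.sqrt(3 * a) * np.pi ** 0.25)) * (1 - (t / a) ** 2) * np.exp(-t ** 2 / (2 * a ** 2))

def cwt(signal, scales):
    """Continuous wavelet transform: one convolution per scale; rows = scales."""
    out = np.empty((len(scales), len(signal)))
    for i, a in enumerate(scales):
        wavelet = ricker(min(10 * a, len(signal)), a)
        out[i] = np.convolve(signal, wavelet, mode="same")
    return out

# Fake frame-level F0 contour (Hz); in practice it comes from the extractor above.
f0 = 200 + 30 * np.sin(np.linspace(0, 8 * np.pi, 400))

# Work on the normalized log-F0 contour, then decompose it into several scales.
log_f0 = np.log(f0)
log_f0 = (log_f0 - log_f0.mean()) / log_f0.std()
pitch_spec = cwt(log_f0, scales=[2 ** k for k in range(10)])   # (10, n_frames)

print(pitch_spec.shape)  # the model predicts this 2-D target instead of raw F0
```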

Contributions:

  • FastSpeech 2 achieves a 3x training speed-up over FastSpeech by simplifying the training pipeline.
  • FastSpeech 2 alleviates the one-to-many mapping problem in TTS and achieves better voice quality.
  • FastSpeech 2s further simplifies the inference pipeline for speech synthesis while maintaining high voice quality, by directly generating speech waveform from text.

2. FastSpeech 2 and 2s

2.1 Motivation

Solve the one-to-many problem of autoregressive models, and fix the issues of FastSpeech's teacher-student pipeline: it is complicated, loses information, and is not accurate enough.

2.2 Model Overview
