2021年语音合成论文统计（1~2月）

最新推荐文章于 2023-01-08 21:08:02 发布

原创最新推荐文章于 2023-01-08 21:08:02 发布 · 387 阅读

0 ·

CC 4.0 BY-SA版权

语音合成综述专栏收录该内容

44 篇文章

订阅专栏

部署运行你感兴趣的模型镜像

论文统计每月第一周更新一次，主要跟踪语音合成的发展状况(很多文章都是在会议后才发出，但不影响统计。统计过程难免存在疏漏，因此统计结果仅供参考。读者有什么建议可以直接向我发消息，我将不断修改该统计。历年文章统计可访问 http://yqli.tech/page/tts_paper.html）。如有转载，请注明出处。欢迎关注微信公众号：低调奋进。

语音合成文章情况表（单位：篇）

		1月	2月
前端	多音字，韵律，g2p等等。	1	0
声学模型	语言特征转声学特征，attention工作以及双重学习	1	7
声码器	波形生成	1	3
个性化	少数据，脏数据应用等	1	1
多语言	多语言多说话人模型	0	0
歌唱合成	歌唱和音乐合成	0	1
情感	风格和情感	2	2
多模态	talking head等等	2	1
声音转换	基于GAN方案和特征解耦方案	4	2
其它	基于EEG合成，数据，MOS评测以及语音合成的应用	1	1

文章列表：

1月

		类型
1	Supervised and Unsupervised Approaches for Controlling Narrow Lexical Focus in Sequence-to-Sequence Speech Synthesis	am
2	Polyphone Disambiguition in Mandarin Chinese with Semi-Supervised Learning	frontend
3	Generating coherent spontaneous speech and gesture from text	multimodality
4	Creating Song From Lip and Tongue Videos With a Convolutional Vocoder	multimodality
5	On Interfacing the Brain with Quantum Computers: An Approach to Listen to the Logic of the Mind	other
6	Whispered and Lombard Neural Speech Synthesis	expression
7	Expressive Neural Voice Cloning	expression/ personalization
8	High-Quality Vocoding Design with Signal Processing for Speech Synthesis and Voice Conversion	vc
9	EmoCat: Language-agnostic Emotional Voice Conversion	vc
10	Hierarchical disentangled representation learning for singing voice conversion	vc
11	Adversarially learning disentangled speech representations for robust multi-factor voice conversion	vc
12	Improved parallel WaveGAN vocoder with perceptually weighted spectrogram loss	vocoder

2月

1	Triple M: A Practical Neural Text-to-speech System With Multi-guidance Attention And Multi-band Multi-time Lpcnet	am
2	Mixture Density Network for Phone-Level Prosody Modelling in Speech Synthesis	am
3	VARA-TTS: Non-Autoregressive Text-to-Speech Synthesis based on Very Deep VAE with Residual Attention	am
4	Alternate Endings: Improving Prosody for Incremental Neural TTS with Predicted Future Text Input	am
5	Bidirectional Variational Inference for Non-Autoregressive Text-to-Speech	am
6	LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search	am
7	Data-Efficient Training Strategies for Neural TTS Systemsmatch	am
8	Model architectures to extrapolate emotional expressions in DNN-based text-to-speech	expression
9	Model architectures to extrapolate emotional expressions in DNN-based text-to-speech	expression
10	SPEAK WITH YOUR HANDS Using Continuous Hand Gestures to control Articulatory Speech Synthesizer	modal
11	MBNet: MOS Prediction for Synthesized Speech with Mean-Bias Network	other
12	Voice Cloning: a Multi-Speaker Text-to-Speech Synthesis Approach based on Transfer Learning	personalization
13	Anyone GAN Sing	sing
14	Towards Natural and Controllable Cross-Lingual Voice Conversion Based on Neural TTS Model and Phonetic Posteriorgram	vc
15	Investigating Deep Neural Structures and their Interpretability in the Domain of Voice Conversion	vc
16	Universal Neural Vocoding with Parallel WaveNet	vocoder
17	LVCNet: Efficient Condition-Dependent Modeling Network for Waveform Generation	vocoder
18	High-Quality Vocoding Design with Signal Processing for Speech Synthesis and Voice Conversion	vocoder