Selected Speech Synthesis (TTS) Papers: Cross-lingual Multispeaker Text-to-Speech under Limited-Data Scenario

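Note: the Selected Speech Synthesis (TTS) Papers series shares papers without translating them directly. What I write is mainly my own summary of each paper and my personal opinion of it. If you repost, please credit the source. You are welcome to follow the WeChat official account: 低调奋进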

Cross-lingual Multispeaker Text-to-Speech under Limited-Data Scenario

This paper works on cross-lingual TTS. It was published by Duke University and last updated on 2020.05.21. Paper link: http://yqli.tech/pdf/tts_paper/2020%20Cross%20lingual%20Multispeaker%20Text%20to%20Speech%20under%20Limited%20Data%20Scenario.pdf

1 Research background

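Globalization has led to frequent mixing of languages, which poses a challenge for TTS: a model must not only support multiple languages but also switch between them naturally. However, most companies hold corpora in which different speakers speak different languages, and collecting recordings of the same speaker in several languages is expensive. This paper designs a TTS system that supports multiple speakers, multiple languages, and language switching, and it only requires the training corpora in different languages that are already at hand.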

2 Detailed system design

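Designing a multilingual model within a TTS framework requires compatibility across languages, for example in the design of the input set for English and Chinese. As anyone working on TTS knows, the main input schemes today are characters, phonemes, or bytes. Earlier papers have shown experimentally that phonemes work best. To support multiple languages and language switching, this paper adds a sequence of language tokens that is in one-to-one correspondence with the phoneme sequence. At input time, the language tokens and the phoneme sequence are concatenated and fed to the Tacotron 2 encoder. To support multiple speakers, speaker information also needs to be concatenated. The rest, namely the decoder, is unchanged. The detailed system design is shown in Figure 1. (In my own experience, a DAT module should also be added to disentangle speaker information from language information, which makes the learned result better.)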

[Figure 1: system design]
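The input construction described in this section can be sketched in a few lines. This is only a minimal illustration under my own assumptions, not the paper's implementation: the phoneme inventory, the language list, the embedding sizes, and the helper names (`build_token_sequences`, `build_encoder_input`, `attach_speaker`) are all hypothetical, and the random tables stand in for learned embeddings. The post also does not say at which layer the speaker information is attached, so the last step simply appends it to every frame.

```python
import random

# Hypothetical symbol inventories for illustration; the paper's real phoneme
# set and language list are not reproduced here.
PHONEME_IDS = {"HH": 0, "AH0": 1, "L": 2, "OW1": 3, "n": 4, "i3": 5, "h": 6, "ao3": 7}
LANGUAGE_IDS = {"en": 0, "zh": 1}

# Assumed embedding sizes, chosen small so the result is easy to inspect.
PHONEME_DIM, LANGUAGE_DIM, SPEAKER_DIM = 4, 2, 3


def make_embedding_table(num_rows, dim, seed):
    """Return a randomly initialised lookup table (stand-in for a learned one)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(num_rows)]


phoneme_table = make_embedding_table(len(PHONEME_IDS), PHONEME_DIM, seed=0)
language_table = make_embedding_table(len(LANGUAGE_IDS), LANGUAGE_DIM, seed=1)


def build_token_sequences(segments):
    """Flatten (language, phonemes) segments into two aligned id sequences."""
    phoneme_ids, language_ids = [], []
    for language, phonemes in segments:
        for phoneme in phonemes:
            phoneme_ids.append(PHONEME_IDS[phoneme])
            # One language token per phoneme keeps the sequences one-to-one.
            language_ids.append(LANGUAGE_IDS[language])
    return phoneme_ids, language_ids


def build_encoder_input(phoneme_ids, language_ids):
    """Concatenate phoneme and language embeddings at every time step."""
    return [phoneme_table[p] + language_table[l]
            for p, l in zip(phoneme_ids, language_ids)]


def attach_speaker(frames, speaker_embedding):
    """Append the same speaker embedding to every frame (assumed placement)."""
    return [frame + speaker_embedding for frame in frames]


# A code-switched utterance: an English word followed by a Mandarin one.
segments = [("en", ["HH", "AH0", "L", "OW1"]), ("zh", ["n", "i3", "h", "ao3"])]
phoneme_ids, language_ids = build_token_sequences(segments)
encoder_input = build_encoder_input(phoneme_ids, language_ids)

speaker_embedding = [0.1] * SPEAKER_DIM  # placeholder for a real speaker vector
conditioned = attach_speaker(encoder_input, speaker_embedding)

print(language_ids)
print(len(encoder_input), len(encoder_input[0]), len(conditioned[0]))
```

Because the language id changes partway through the sequence, the same model can in principle be asked to switch languages inside one utterance by changing only the language tokens.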

3 Experimental results

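I will not go through the paper's experimental settings; if you read papers regularly you can guess the general parameters. The paper evaluates the speech on naturalness, similarity, intelligibility, and other aspects. The results (Tables 2, 3, and 4) show that a speaker recorded in only one language can be made to speak multiple languages and to switch between them. In addition, the alignment plots in Figure 2 show that language switching works better when code-mixed training data is available.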


4 Summary

By adding language tokens, this paper makes TTS support multilingual synthesis and code-switching. It is a good design, but I feel that adding DAT (domain adversarial training) would work even better.
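For readers unfamiliar with DAT, one common way to write the objective is given below. This is the standard domain adversarial formulation applied to this setting as an assumption on my part, not something taken from the paper: $\theta_e$ and $\theta_d$ denote the encoder and decoder parameters, $\theta_c$ the parameters of a hypothetical speaker classifier that reads the encoder output, $\mathcal{L}_{\text{tts}}$ the usual synthesis loss, $\mathcal{L}_{\text{spk}}$ the classifier's cross-entropy, and $\lambda$ a weighting factor.

```latex
\min_{\theta_e,\,\theta_d}\; \mathcal{L}_{\text{tts}}(\theta_e,\theta_d) \;-\; \lambda\,\mathcal{L}_{\text{spk}}(\theta_e,\theta_c),
\qquad
\min_{\theta_c}\; \mathcal{L}_{\text{spk}}(\theta_e,\theta_c)
```

The classifier tries to predict the speaker from the text encoding, while the encoder is pushed to make that prediction fail, so the text and language representation carries less speaker information. In practice this is usually implemented with a gradient reversal layer between the encoder and the classifier.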
