Coqui (1): from phonemes to sequence

Original article · last modified 2022-03-21 12:56:07 · 834 views

License: CC 4.0 BY-SA

First published 2022-03-14 18:30:24

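This post walks through grapheme-to-phoneme (g2p) conversion for English and Chinese: English sentences are phonemized with the gruut library, while Chinese text is segmented with jieba and converted to pinyin. It then shows how the phonemes are turned into an ID sequence with phoneme_to_sequence, and mentions the clean_text step used for text preprocessing. The overall flow is text → phonemes → sequence, which forms the text front end of a speech synthesis pipeline.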

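English g2p

The text2phone function

gruut.sentences(text, lang=language, espeak=use_espeak_phonemes) splits the sentence into words and converts each word into a phoneme sequence.

The result is then turned into a single string of phonemes separated by |.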
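
The two steps above can be sketched without gruut installed. The small lexicon below is a hypothetical stand-in for the per-word phonemes that gruut.sentences would return, and the function name toy_text2phone is made up for illustration. How the real text2phone marks word boundaries is not shown here, so the single space symbol between words is an assumption.

```python
from typing import Dict, List

# Hypothetical toy lexicon standing in for gruut's per-word phonemes.
TOY_LEXICON: Dict[str, List[str]] = {
    "hello": ["h", "ə", "l", "oʊ"],
    "world": ["w", "ɜ", "l", "d"],
}


def toy_text2phone(text: str) -> str:
    """Turn a sentence into one '|'-separated phoneme string."""
    symbols: List[str] = []
    for index, word in enumerate(text.lower().split()):
        if index > 0:
            # Assumption: words are separated by a single space symbol.
            symbols.append(" ")
        symbols.extend(TOY_LEXICON[word])
    return "|".join(symbols)


phoneme_string = toy_text2phone("hello world")
print(phoneme_string)
```

With a real phonemizer only the lexicon lookup would change; the joining step stays the same.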

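phoneme_to_sequence

The phoneme string above is split, each phoneme is mapped to its index, and the indices are appended to the sequence list.

The index of each phoneme is looked up with _phoneme_to_sequence.

Phoneme-to-ID mapping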

make_symbols

The phoneme characters are defined in the characters config file (the comma , is included there as well).

_phonemes_to_id = {s: i for i, s in enumerate(_phonemes)}  # mapping from phoneme to index

textcleaner

clean_text = _clean_text(text, cleaner_names)  # cleans up newlines and similar symbols
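
A cleaner pipeline of this kind can be sketched as a list of named functions applied in order. The registry, the two cleaners, and the name clean_text_sketch below are illustrative assumptions and not the library's actual cleaner set.

```python
import re
from typing import Callable, Dict, List

_whitespace_re = re.compile(r"\s+")


def collapse_whitespace(text: str) -> str:
    """Replace runs of whitespace (including newlines) with one space."""
    return _whitespace_re.sub(" ", text).strip()


def lowercase(text: str) -> str:
    return text.lower()


# Hypothetical registry mapping cleaner names to functions.
CLEANERS: Dict[str, Callable[[str], str]] = {
    "collapse_whitespace": collapse_whitespace,
    "lowercase": lowercase,
}


def clean_text_sketch(text: str, cleaner_names: List[str]) -> str:
    """Apply the named cleaners to the text one after another."""
    for name in cleaner_names:
        cleaner = CLEANERS.get(name)
        if cleaner is None:
            raise ValueError(f"Unknown cleaner: {name}")
        text = cleaner(text)
    return text


cleaned = clean_text_sketch("Hello,\n  World\n", ["lowercase", "collapse_whitespace"])
print(cleaned)
```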

Chinese g2p

Converting Chinese text to phonemes
chinese_text_to_phonemes(text: str) -> str

text 卡尔普陪外孙玩滑梯。

tokenized_text = jieba.cut(text, HMM=False)
tokenized_text = " ".join(tokenized_text)

tokenized_text 卡尔普陪外孙玩滑梯。

pinyined_text: List[str] = _chinese_character_to_pinyin(tokenized_text)

pinyined_text ['ka3', 'er3', 'pu3', ' ', 'pei2', ' ', 'wai4', 'sun1', ' ', 'wan2', ' ', 'hua2', 'ti1', ' 。']

results: List[str] = []
for token in pinyined_text:
    if token[-1] in "12345":  # TODO transform to is_pinyin()
        print("token",token)
        pinyin_phonemes = _chinese_pinyin_to_phoneme(token)  # convert pinyin to IPA phonemes
        print("pinyin_phonemes", pinyin_phonemes)
        results += list(pinyin_phonemes)
    else:  # punctuation or other separator: append it as-is
        results += list(token)
print("results",results)

Example: 老虎幼崽与宠物犬玩耍。

results ['l', 'a', 'ʌ', '3', 'x', 'u', '3', ' ', 'i', 'o', '4', ' ', 'd', 'z', 'a', 'i', '3', ' ', 'y', '3', ' ', 'ʈ', 'ʂ', 'o', 'ŋ', '3', 'w', 'u', '4', 't', 'ɕ', 'y', 'ɛ', 'n', '3', ' ', 'w', 'a', 'n', '2', 'ʂ', 'u', 'a', '3', ' ', '。']

return "|".join(results)

Joined with |: l|a|ʌ|3|x|u|3| |i|o|4| |d|z|a|i|3| |y|3| |ʈ|ʂ|o|ŋ|3|w|u|4|t|ɕ|y|ɛ|n|3| |w|a|n|2|ʂ|u|a|3| |。
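
The loop above can be tried in isolation with a small stand-in for the pinyin-to-IPA table. TOY_PINYIN_TO_IPA, is_pinyin, and pinyin_tokens_to_phoneme_string are hypothetical names. The two table entries are read off the tail of the example output above, and is_pinyin is one possible shape for the helper mentioned in the TODO comment.

```python
from typing import Dict, List

# Hypothetical two-entry table; the real mapping covers all pinyin syllables.
TOY_PINYIN_TO_IPA: Dict[str, str] = {"wan": "wan", "shua": "ʂua"}


def is_pinyin(token: str) -> bool:
    """True for tokens such as 'wan2': letters followed by a tone digit 1-5."""
    return len(token) > 1 and token[-1] in "12345" and token[:-1].isalpha()


def pinyin_tokens_to_phoneme_string(tokens: List[str]) -> str:
    results: List[str] = []
    for token in tokens:
        if is_pinyin(token):
            base, tone = token[:-1], token[-1]
            # One list entry per IPA character, with the tone digit kept.
            results += list(TOY_PINYIN_TO_IPA[base] + tone)
        else:
            # Punctuation or other separators are appended character by character.
            results += list(token)
    return "|".join(results)


phoneme_string = pinyin_tokens_to_phoneme_string(["wan2", "shua3", " 。"])
print(phoneme_string)
```

Unlike the bare token[-1] check, this helper also handles empty tokens without raising an IndexError.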

Phonemes to sequence
phoneme_to_sequence

custom_symbols is an attribute of TTSDataset.

custom_symbols ['_', '!', "'", '(', ')', ',', '-', '.', ':', ';', '?', ' ', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '
_phonemes_to_id custom_symbols{'_': 0, '!': 64, "'": 65, '(': 66, ')': 67, '

clean_text 老虎幼崽与宠物犬玩耍。
to_phonemes l|a|ʌ|3|x|u|3| |i|o|4| |d|z|a|i|3| |y|3| |ʈ|ʂ|o|ŋ|3|w|u|4|t|ɕ|y|ɛ|n|3| |w|a|n|2|ʂ|u|a|3| |。

sequence [169, 99, 95, 154, 80, 74, 75, 90, 74, 117, 147, 99, 75, 74, 76, 74, 118, 150, 90, 128, 178, 80, 116, 183, 76, 91, 131, 74, 178, 99, 131, 150, 80, 99, 74]

blank_sequence [191, 169, 191, 99, 191, 95, 191, 154, 191, 80, 191, 74, 191, 75, 191, 90, 191, 74, 191, 117, 191, 147, 191, 99, 191, 75, 191, 74, 191, 76, 191, 74, 191, 118, 191, 150, 191, 90, 191, 128, 191, 178, 191...

-> array (the result is finally converted to an array)
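
Comparing sequence with blank_sequence, the latter looks like the former with the ID 191 placed before every element. A minimal sketch of that interleaving follows. The blank ID 191 is taken from the output shown above, the function name intersperse is illustrative, and the trailing blank at the end is an assumption because the printed list is truncated.

```python
from typing import List


def intersperse(sequence: List[int], blank_id: int) -> List[int]:
    """Place blank_id before, between, and after the elements of sequence."""
    result = [blank_id] * (len(sequence) * 2 + 1)
    result[1::2] = sequence  # odd positions hold the original IDs
    return result


blank_sequence = intersperse([169, 99, 95], 191)
print(blank_sequence)
```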
