[Open-Source Project] Flow Matching Speech Synthesis


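Conditional flow matching (CFM) is a recent technique that has been shown to improve on diffusion models. Meta's Voicebox model brought CFM to speech synthesis. The figure below shows the Voicebox workflow.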

Matcha-TTS is the first open-source speech synthesis project built on conditional flow matching. It provides models pretrained on the LJSpeech and VCTK datasets for evaluation.

Matcha-TTS makes two main contributions, along with several other design choices:

1. We propose an improved encoder-decoder TTS architecture that uses a combination of 1D CNNs and Transformers in the decoder. This reduces memory consumption and is fast to evaluate, improving synthesis speed.

Compared with the Grad-TTS decoder, the 2D CNNs are replaced with 1D CNNs, and Transformer blocks are added.

2. We train these models using optimal-transport conditional flow matching (OT-CFM), which is a new method to learn ODEs that sample from a data distribution. Compared to conventional CNFs and score-matching probability flow ODEs, OT-CFM defines simpler paths from source to target, enabling accurate synthesis in fewer steps than DPMs.

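Flow matching is used here to speed up synthesis.

3. Rotational position embeddings (RoPE) are used, which reduces memory use.

4. Monotonic alignment search (MAS) is used for alignment.

5. The snake beta activation function is used.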
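To make contribution 2 more concrete, here is a minimal NumPy sketch of the OT-CFM training target and a fixed-step Euler sampler. It is only an illustration under stated assumptions, not the Matcha-TTS code: the function names, the `SIGMA_MIN` value, and the use of a scalar `t` are all hypothetical choices made for this example. The straight-line path `x_t = (1 - (1 - sigma_min) * t) * x0 + t * x1` and its constant target velocity `x1 - (1 - sigma_min) * x0` are what "simpler paths from source to target" refers to.

```python
import numpy as np

SIGMA_MIN = 1e-4  # assumed small constant; the value is chosen for illustration only


def ot_cfm_pair(x0, x1, t, sigma_min=SIGMA_MIN):
    """Return the point x_t on the OT-CFM path and its target velocity u_t.

    x0: noise sample, x1: data sample (same shape), t: scalar in [0, 1].
    """
    x_t = (1.0 - (1.0 - sigma_min) * t) * x0 + t * x1
    u_t = x1 - (1.0 - sigma_min) * x0  # constant along the straight path
    return x_t, u_t


def ot_cfm_loss(vector_field, x0, x1, t, sigma_min=SIGMA_MIN):
    """Mean squared error between the predicted and the target velocity."""
    x_t, u_t = ot_cfm_pair(x0, x1, t, sigma_min)
    return float(np.mean((vector_field(x_t, t) - u_t) ** 2))


def euler_sample(vector_field, x0, n_steps):
    """Integrate dx/dt = vector_field(x, t) from t=0 to t=1 with Euler steps."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / n_steps
    for i in range(n_steps):
        x = x + dt * vector_field(x, i * dt)
    return x
```

In a real system `vector_field` would be the trained decoder network, conditioned on the text encoder output. Because the target paths are straight lines, a well-trained field can be integrated with only a handful of Euler steps, which is where the speed-up over diffusion-style samplers comes from.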

Source code:

https://github.com/shivammehta25/Matcha-TTS

Project demo page:

https://shivammehta25.github.io/Matcha-TTS/

Online inference:

https://huggingface.co/spaces/shivammehta25/Matcha-TTS

Chinese implementation:

https://github.com/PlayVoice/Grad-TTS-Chinese

(Grad-TTS-CFM; the other optimizations have not been integrated yet)

Model architecture:

Performance metrics:

Inference interface:

Chinese test sentence:

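时光仿佛有穿越到了从前,在你诗情画意的眼波中,在你舒适浪漫的暇思里,我如风中的思绪徜徉广阔天际,仿佛一片沾染了快乐的羽毛,在云环影绕颤动里浸润着风的呼吸,风的诗韵,那清新的耳语,那婉约的甜蜜,那恬淡的温馨,将一腔情澜染得愈发的缠绵。

(Synthesized with Grad-TTS-CFM and the BigVGAN universal vocoder. Optimizations 1, 3, and 5 are not yet integrated, and there are still obvious pronunciation errors.)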

