Diffusion-VITS: Fusing VITS with Grad-TTS

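The core idea taken from Grad-TTS: treat diffusion as a postnet (or plug-in) for feature enhancement. Used this way, it becomes a general-purpose module that can be attached to any network, typically as a post-processing module for FastSpeech2. Here the author uses VITS in an SVC (singing voice conversion) setting as the example and provides both the idea and the code for integrating Grad-TTS into VITS.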

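The idea:

1. Train the original VITS model

Implementation details are omitted here.

2. Train the plug-in diffusion model

1) Freeze all parameters of the original VITS model

2) Train the diffusion model to learn the noise between the flow inference result and Z, the posterior encoding of the waveform

3. Diffusion can narrow the gap between the flow inference result and the ground truth, which can mitigate the over-smoothing problem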
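
To make step 2 more concrete, here is a minimal sketch of a Grad-TTS-style forward diffusion and noise loss in plain Python. It is an illustration under assumptions, not the repository's code: it assumes the posterior latent Z plays the role of the clean sample `x0`, the flow inference result plays the role of the prior mean `mu`, and the noise schedule is linear with `beta_min=0.05` and `beta_max=20.0`. The function names and toy values are hypothetical. In the real setup the prediction would come from the diffusion estimator, and only that estimator would receive gradients because VITS is frozen.

```python
import math
import random


def cumulative_noise(t, beta_min=0.05, beta_max=20.0):
    """Integral from 0 to t of a linear schedule beta(s), with t in [0, 1]."""
    return beta_min * t + 0.5 * (beta_max - beta_min) * t * t


def forward_diffusion(x0, mu, t, rng):
    """Draw x_t by pulling x0 toward mu and adding Gaussian noise."""
    cum = cumulative_noise(t)
    decay = math.exp(-0.5 * cum)            # weight left on the clean latent
    std = math.sqrt(1.0 - math.exp(-cum))   # noise scale at time t
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [a * decay + m * (1.0 - decay) + std * e
          for a, m, e in zip(x0, mu, eps)]
    return xt, eps, std


def noise_loss(pred, eps, std):
    """Mean squared error between std * pred and -eps."""
    return sum((p * std + e) ** 2 for p, e in zip(pred, eps)) / len(eps)


rng = random.Random(0)
z_posterior = [0.8, -1.2, 0.3]   # hypothetical posterior latent (x0)
z_flow = [0.5, -1.0, 0.0]        # hypothetical flow output (mu)
xt, eps, std = forward_diffusion(z_posterior, z_flow, 0.5, rng)
```

At `t = 0` the sample equals `x0` exactly, and as `t` approaches 1 it approaches `mu` plus unit Gaussian noise, so the reverse process would start from the flow output and move toward the posterior latent.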

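Code:

Code implementing Plug-In-Diffusion for VITS singing voice conversion (equally applicable to speech synthesis):

https://github.com/PlayVoice/so-vits-svc-5.0/tree/plug-in-diffusion

The code may be used without restriction, subject to the MIT license.

The architecture diagrams and operating steps follow.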

Plug-in diffusion based on Grad-TTS from HUAWEI Noah's Ark Lab

[Figure: Base framework]

[Figure: Plug-In-Diffusion]

Notices

It may look useless, but it appears to be somewhat useful.

Training

Complete the training of the bigvgan-mix-v2 master model.

Create a working directory and pull this branch's code, which differs from bigvgan-mix-v2.

Install the additional dependency for diffusion:

pip install einops

Copy the bigvgan-mix-v2 training data, data_svc and files, into the current working directory; the training data is the same as for bigvgan-mix-v2.

Specify the master model path in configs/base.yaml:

pretrain: "bigvgan-mix-v2/chkpt/sovits5.0/sovits5.0_0500.pt"

Start training:

python svc_trainer.py --config configs/base.yaml --name plug

Check the log to confirm that the master model has been loaded:


python svc_trainer.py --config configs/base.yaml --name plug
Batch size per GPU : 8
----------10----------
2023-09-06 06:31:23,136 - INFO - Start from 32k pretrain model: sovits5.0_1100.pt
plug.estimator.spk_mlp.0.weight is not in the checkpoint
plug.estimator.spk_mlp.0.bias is not in the checkpoint
plug.estimator.spk_mlp.2.weight is not in the checkpoint
plug.estimator.spk_mlp.2.bias is not in the checkpoint
plug.estimator.mlp.0.weight is not in the checkpoint
plug.estimator.mlp.0.bias is not in the checkpoint
plug.estimator.mlp.2.weight is not in the checkpoint
plug.estimator.mlp.2.bias is not in the checkpoint
plug.estimator.downs.0.0.mlp.1.weight is not in the checkpoint
plug.estimator.downs.0.0.mlp.1.bias is not in the checkpoint
plug.estimator.downs.0.0.block1.block.0.weight is not in the checkpoint
plug.estimator.downs.0.0.block1.block.0.bias is not in the checkpoint
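
The "is not in the checkpoint" lines are presumably expected here: every reported key starts with `plug.`, and the master model was trained without the diffusion plug-in, so those weights have nothing to load from. The following is a minimal sketch of that kind of partial loading, using plain dicts as stand-ins for state dicts. The function name, keys, and values are hypothetical and this is not the repository's actual loader.

```python
def load_partial(model_state, checkpoint_state, log=print):
    """Merge checkpoint values into the model state, key by key.

    Keys absent from the checkpoint keep the model's own (freshly
    initialised) values and are reported to the log.
    """
    merged = {}
    missing = []
    for key, value in model_state.items():
        if key in checkpoint_state:
            merged[key] = checkpoint_state[key]
        else:
            log(f"{key} is not in the checkpoint")
            merged[key] = value
            missing.append(key)
    return merged, missing


# Toy stand-ins for real state dicts (hypothetical keys and values).
model_state = {
    "enc.weight": 0.0,
    "plug.estimator.mlp.0.weight": 0.1,
}
checkpoint_state = {"enc.weight": 1.5}

merged, missing = load_partial(model_state, checkpoint_state)
```

If a key outside the `plug.` namespace showed up in such a list, that would be a reason to double-check the `pretrain` path in configs/base.yaml.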

Inference

python svc_inference.py --config configs/base.yaml --model chkpt/plug/plug_***.pt --spk ./data_svc/singer/your_singer.spk.npy --wave test.wav

svc_inference.py differs slightly from the bigvgan-mix-v2 version.
