From 05645a6593582cc766321c0ba51946e64f83470b Mon Sep 17 00:00:00 2001
From: RVC-Boss <129054828+RVC-Boss@users.noreply.github.com>
Date: Tue, 11 Feb 2025 21:25:43 +0800
Subject: [PATCH] gpt_sovits_v3

gpt_sovits_v3
---
 README.md | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 1fa36a7..e118ece 100644
--- a/README.md
+++ b/README.md
@@ -269,12 +269,12 @@ Use v2 from v1 environment:
 - [ ] **Features:**
   - [x] Zero-shot voice conversion (5s) / few-shot voice conversion (1min).
   - [x] TTS speaking speed control.
-  - [ ] ~~Enhanced TTS emotion control.~~
+  - [ ] ~~Enhanced TTS emotion control.~~ Maybe use pretrained finetuned preset GPT models for better emotion.
   - [ ] Experiment with changing SoVITS token inputs to probability distribution of GPT vocabs (transformer latent).
   - [x] Improve English and Japanese text frontend.
   - [ ] Develop tiny and larger-sized TTS models.
   - [x] Colab scripts.
-  - [ ] Try expand training dataset (2k hours -> 10k hours).
+  - [x] Try expand training dataset (2k hours -> 10k hours).
   - [x] better sovits base model (enhanced audio quality)
   - [ ] model mix
 
@@ -321,9 +321,11 @@ Special thanks to the following projects and contributors:
 - [contentvec](https://github.com/auspicious3000/contentvec/)
 - [hifi-gan](https://github.com/jik876/hifi-gan)
 - [fish-speech](https://github.com/fishaudio/fish-speech/blob/main/tools/llama/generate.py#L41)
+- [f5-TTS](https://github.com/SWivid/F5-TTS/blob/main/src/f5_tts/model/backbones/dit.py)
 ### Pretrained Models
 - [Chinese Speech Pretrain](https://github.com/TencentGameMate/chinese_speech_pretrain)
 - [Chinese-Roberta-WWM-Ext-Large](https://huggingface.co/hfl/chinese-roberta-wwm-ext-large)
+- [BigVGAN](https://github.com/NVIDIA/BigVGAN)
 ### Text Frontend for Inference
 - [paddlespeech zh_normalization](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/paddlespeech/t2s/frontend/zh_normalization)
 - [LangSegment](https://github.com/juntaosun/LangSegment)