From 05645a6593582cc766321c0ba51946e64f83470b Mon Sep 17 00:00:00 2001
From: RVC-Boss <129054828+RVC-Boss@users.noreply.github.com>
Date: Tue, 11 Feb 2025 21:25:43 +0800
Subject: [PATCH] gpt_sovits_v3

gpt_sovits_v3
---
 README.md | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 1fa36a7..e118ece 100644
--- a/README.md
+++ b/README.md
@@ -269,12 +269,12 @@ Use v2 from v1 environment:
 - [ ] **Features:**
   - [x] Zero-shot voice conversion (5s) / few-shot voice conversion (1min).
   - [x] TTS speaking speed control.
-  - [ ] ~~Enhanced TTS emotion control.~~
+  - [ ] ~~Enhanced TTS emotion control.~~ Maybe use pretrained, fine-tuned preset GPT models for better emotion control.
   - [ ] Experiment with changing SoVITS token inputs to probability distribution of GPT vocabs (transformer latent).
   - [x] Improve English and Japanese text frontend.
   - [ ] Develop tiny and larger-sized TTS models.
   - [x] Colab scripts.
-  - [ ] Try expand training dataset (2k hours -> 10k hours).
+  - [x] Try expanding the training dataset (2k hours -> 10k hours).
   - [x] better sovits base model (enhanced audio quality)
   - [ ] model mix
 
@@ -321,9 +321,11 @@ Special thanks to the following projects and contributors:
 - [contentvec](https://github.com/auspicious3000/contentvec/)
 - [hifi-gan](https://github.com/jik876/hifi-gan)
 - [fish-speech](https://github.com/fishaudio/fish-speech/blob/main/tools/llama/generate.py#L41)
+- [F5-TTS](https://github.com/SWivid/F5-TTS/blob/main/src/f5_tts/model/backbones/dit.py)
 ### Pretrained Models
 - [Chinese Speech Pretrain](https://github.com/TencentGameMate/chinese_speech_pretrain)
 - [Chinese-Roberta-WWM-Ext-Large](https://huggingface.co/hfl/chinese-roberta-wwm-ext-large)
+- [BigVGAN](https://github.com/NVIDIA/BigVGAN)
 ### Text Frontend for Inference
 - [paddlespeech zh_normalization](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/paddlespeech/t2s/frontend/zh_normalization)
 - [LangSegment](https://github.com/juntaosun/LangSegment)