Update README.md

2026-07-04 13:01:09 +08:00 · 2024-01-16 22:29:50 +08:00 · 2024-01-16 22:29:50 +08:00 · 23b5979889
commit 23b5979889
parent 0e2467ace4
1 changed files with 69 additions and 3 deletions
--- a/README.md
+++ b/README.md
@@ -4,11 +4,31 @@ I am organizing and uploading the codes. It will be public in one day.

 https://www.bilibili.com/video/BV12g4y1m7Uw/

-todo
+features:
+
+1. Zero-shot TTS: synthesize speech from a 5-second vocal sample.
+
+2. Few-shot TTS: fine-tune the model on a 1-minute training dataset. Compared with zero-shot, the fine-tuned model gives significantly better similarity and realism in the speaker's voice.
+
+3. Cross-lingual inference: synthesize speech in a language different from that of the training dataset. English, Japanese, and Chinese are currently supported.
+
+4. WebUI tools: the WebUI integrates vocal/accompaniment separation, automatic training-set segmentation, Chinese ASR, text labeling, and more, to help beginners quickly create their own training datasets and GPT/SoVITS models.

 # todolist

-todo
+1. Zero-shot voice conversion (5s) / few-shot voice conversion (1min)
+
+2. TTS speaking speed control
+
+3. More TTS emotion control
+
+4. Experiment with changing the SoVITS token inputs to a probability distribution over the vocabulary
+
+5. Better English and Japanese text frontend
+
+6. Tiny and larger-sized TTS models
+
+7. Colab scripts

 # Requirements (How to install)

@@ -77,8 +97,54 @@ to

 tools/uvr5/uvr5_weights

+# dataset format
+
+The format of the TTS annotation .list file:
+
+vocal_path|speaker_name|language|text
+
+e.g. D:\GPT-SoVITS\xxx/xxx.wav|xxx|en|I like playing Genshin.
+
+language dictionary:
+
+    'zh': Chinese
+
+    'ja': Japanese
+
+    'en': English
+    
+
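The annotation format above can be read with a few lines of Python. The following is a minimal sketch, not the project's own loader: the names `Annotation`, `parse_list_line`, and `load_list_file` are hypothetical, and UTF-8 file encoding and a text field that may itself contain `|` are assumptions not stated in the README.

```python
from typing import List, NamedTuple

# Language codes listed in the language dictionary above.
LANGUAGES = {"zh": "Chinese", "ja": "Japanese", "en": "English"}


class Annotation(NamedTuple):
    """One line of a .list file (hypothetical container, for illustration)."""
    vocal_path: str
    speaker_name: str
    language: str
    text: str


def parse_list_line(line: str) -> Annotation:
    """Split one 'vocal_path|speaker_name|language|text' line into fields."""
    # Split at most 3 times so that any '|' inside the text field survives
    # (assumption: only the first three separators are structural).
    parts = line.rstrip("\r\n").split("|", 3)
    if len(parts) != 4:
        raise ValueError(f"expected 4 '|'-separated fields, got {len(parts)}: {line!r}")
    vocal_path, speaker_name, language, text = parts
    if language not in LANGUAGES:
        raise ValueError(f"unknown language code {language!r}")
    return Annotation(vocal_path, speaker_name, language, text)


def load_list_file(path: str) -> List[Annotation]:
    """Parse every non-blank line of a .list file (assumes UTF-8 encoding)."""
    with open(path, encoding="utf-8") as f:
        return [parse_list_line(line) for line in f if line.strip()]


# The example line from the README, parsed into its four fields.
example = parse_list_line(r"D:\GPT-SoVITS\xxx/xxx.wav|xxx|en|I like playing Genshin.")
print(example.language, example.text)
```

Validating the language code while parsing surfaces a mislabeled line before training starts rather than partway through it.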
+
 # Credits

-todo
+https://github.com/innnky/ar-vits
+
+https://github.com/yangdongchao/SoundStorm/tree/master/soundstorm/s1/AR
+
+https://github.com/jaywalnut310/vits
+
+https://github.com/hcy71o/TransferTTS/blob/master/models.py#L556
+
+https://github.com/TencentGameMate/chinese_speech_pretrain
+
+https://github.com/auspicious3000/contentvec/
+
+https://github.com/jik876/hifi-gan
+
+https://huggingface.co/hfl/chinese-roberta-wwm-ext-large
+
+https://github.com/fishaudio/fish-speech/blob/main/tools/llama/generate.py#L41
+
+https://github.com/Anjok07/ultimatevocalremovergui
+
+https://github.com/openvpi/audio-slicer
+
+https://github.com/cronrpc/SubFix
+
+https://modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch
+
+https://github.com/FFmpeg/FFmpeg
+
+https://github.com/gradio-app/gradio