diff --git a/README.md b/README.md
index cbd16b7..82d1aa0 100644
--- a/README.md
+++ b/README.md
@@ -4,11 +4,31 @@ I am organizing and uploading the codes. It will be public in one day.

 https://www.bilibili.com/video/BV12g4y1m7Uw/

-todo
+Features:
+
+1、Zero-shot TTS: synthesize speech from a 5-second vocal sample.
+
+2、Few-shot TTS: fine-tune the model on a 1-minute training dataset. The fine-tuned model sounds significantly more similar to the speaker and more realistic than zero-shot output.
+
+3、Cross-lingual inference: synthesize speech in a language different from that of the training dataset. English, Japanese, and Chinese are currently supported.
+
+4、Integrated WebUI tools: voice/accompaniment separation, automatic training-set segmentation, Chinese ASR, and text labeling, to help beginners quickly create their own training datasets and GPT/SoVITS models.

 # todolist

-todo
+1、zero-shot voice conversion (5s) / few-shot voice conversion (1min)
+
+2、TTS speaking speed control
+
+3、more TTS emotion control
+
+4、experiment with changing the SoVITS token inputs to a probability distribution over the vocabulary
+
+5、better English and Japanese text frontend
+
+6、tiny and larger-sized TTS models
+
+7、Colab scripts

 # Requirments (How to install)

@@ -77,8 +97,54 @@ to tools/uvr5/uvr5_weights



+# dataset format
+
+The format of the TTS annotation .list file:
+
+vocal_path|speaker_name|language|text
+
+e.g. D:\GPT-SoVITS\xxx/xxx.wav|xxx|en|I like playing Genshin.
+
+language dictionary:
+
+'zh': Chinese
+
+'ja': Japanese
+
+'en': English
+
+
+
 # Credits

-todo
+https://github.com/innnky/ar-vits
+
+https://github.com/yangdongchao/SoundStorm/tree/master/soundstorm/s1/AR
+
+https://github.com/jaywalnut310/vits
+
+https://github.com/hcy71o/TransferTTS/blob/master/models.py#L556
+
+https://github.com/TencentGameMate/chinese_speech_pretrain
+
+https://github.com/auspicious3000/contentvec/
+
+https://github.com/jik876/hifi-gan
+
+https://huggingface.co/hfl/chinese-roberta-wwm-ext-large
+
+https://github.com/fishaudio/fish-speech/blob/main/tools/llama/generate.py#L41
+
+https://github.com/Anjok07/ultimatevocalremovergui
+
+https://github.com/openvpi/audio-slicer
+
+https://github.com/cronrpc/SubFix
+
+https://modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch
+
+https://github.com/FFmpeg/FFmpeg
+
+https://github.com/gradio-app/gradio
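
Note on the dataset format added in this diff: the four pipe-separated fields of a `.list` annotation line could be read with a small parser like the sketch below. This is not code from the repository. The names `Utterance`, `parse_list_line`, and `LANGUAGES` are hypothetical, and two behaviors are assumptions: splitting on only the first three `|` characters (so a `|` inside the text survives) and rejecting language codes outside the README's language dictionary.

```python
from dataclasses import dataclass

# Language codes taken from the README's "language dictionary".
LANGUAGES = {"zh": "Chinese", "ja": "Japanese", "en": "English"}


@dataclass
class Utterance:
    """One annotation line (hypothetical container, not a project class)."""
    vocal_path: str
    speaker_name: str
    language: str
    text: str


def parse_list_line(line: str) -> Utterance:
    """Parse 'vocal_path|speaker_name|language|text' into an Utterance."""
    # Assumption: split on the first three '|' only, keeping any '|' in the text.
    parts = line.rstrip("\r\n").split("|", 3)
    if len(parts) != 4:
        raise ValueError(f"expected 4 '|'-separated fields, got {len(parts)}: {line!r}")
    vocal_path, speaker_name, language, text = parts
    # Assumption: codes outside the README's dictionary are treated as errors.
    if language not in LANGUAGES:
        raise ValueError(f"unknown language code: {language!r}")
    return Utterance(vocal_path, speaker_name, language, text.strip())


# The example line from the README, written as a raw string to keep the backslashes.
example = r"D:\GPT-SoVITS\xxx/xxx.wav|xxx|en|I like playing Genshin."
utt = parse_list_line(example)
print(utt.speaker_name, LANGUAGES[utt.language], utt.text)
```

Validating each line before training is a design choice of this sketch: it surfaces a malformed annotation early instead of letting it fail later in the pipeline.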