Mirror of https://github.com/RVC-Boss/GPT-SoVITS.git (synced 2025-04-06 03:57:44 +08:00)
Update README.md
parent 0e2467ace4
commit 23b5979889
README.md: 72 lines changed
@@ -4,11 +4,31 @@ I am organizing and uploading the codes. It will be public in one day.
 https://www.bilibili.com/video/BV12g4y1m7Uw/
-todo
+features:
+1. Zero-shot TTS: input a 5-second vocal sample.
+2. Few-shot TTS: fine-tune on a 1-minute training dataset. A model trained with few-shot techniques gives significantly better similarity and realism in the speaker's voice than zero-shot.
+3. Cross-lingual inference: synthesize speech in a language different from that of the training dataset. English, Japanese, and Chinese are currently supported.
+4. The WebUI integrates tools such as vocal/accompaniment separation, automatic segmentation of training sets, Chinese ASR, and text labeling, to help beginners quickly create their own training datasets and GPT/SoVITS models.
 # todolist
-todo
+1. Zero-shot voice conversion (5 s) / few-shot voice conversion (1 min)
+2. TTS speaking-speed control
+3. More TTS emotion control
+4. Experiment with changing the SoVITS token inputs to a probability distribution over the vocabulary
+5. Better English and Japanese text frontends
+6. Tiny and larger-sized TTS models
+7. Colab scripts
 # Requirements (How to install)
@@ -77,8 +97,54 @@ to
 tools/uvr5/uvr5_weights
+# dataset format
+The format of the TTS annotation .list file:
+vocal path|speaker_name|language|text
+e.g. D:\GPT-SoVITS\xxx/xxx.wav|xxx|en|I like playing Genshin.
+language dictionary:
+'zh': Chinese
+'ja': Japanese
+'en': English
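The `.list` annotation format described above (`vocal path|speaker_name|language|text`) can be read with a few lines of Python. The following is a minimal sketch, not the project's own loader: the names `Utterance`, `parse_list_line`, and `parse_list_text` are hypothetical, and splitting on only the first three pipes (so that a `|` inside the text survives) is an assumption about how such files should be tolerated.

```python
from dataclasses import dataclass
from typing import List

# Language codes taken from the README's language dictionary.
LANGUAGES = {"zh": "Chinese", "ja": "Japanese", "en": "English"}


@dataclass
class Utterance:
    """One annotated clip (hypothetical container, not part of GPT-SoVITS)."""
    vocal_path: str
    speaker_name: str
    language: str
    text: str


def parse_list_line(line: str) -> Utterance:
    """Parse a single 'vocal path|speaker_name|language|text' line."""
    # Assumption: split on the first three pipes only, keeping any later
    # "|" characters as part of the text field.
    parts = line.rstrip("\r\n").split("|", 3)
    if len(parts) != 4:
        raise ValueError(f"expected 4 '|'-separated fields, got {len(parts)}: {line!r}")
    vocal_path, speaker_name, language, text = (p.strip() for p in parts)
    if language not in LANGUAGES:
        raise ValueError(f"unknown language code {language!r}; expected one of {sorted(LANGUAGES)}")
    return Utterance(vocal_path, speaker_name, language, text)


def parse_list_text(content: str) -> List[Utterance]:
    """Parse the full contents of a .list file, skipping blank lines."""
    return [parse_list_line(line) for line in content.splitlines() if line.strip()]


if __name__ == "__main__":
    example = r"D:\GPT-SoVITS\xxx/xxx.wav|xxx|en|I like playing Genshin."
    print(parse_list_line(example))
```

Rejecting unknown language codes early is a design choice of this sketch; it catches typos such as `jp` for `ja` before any audio is processed.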
 # Credits
-todo
+https://github.com/innnky/ar-vits
+https://github.com/yangdongchao/SoundStorm/tree/master/soundstorm/s1/AR
+https://github.com/jaywalnut310/vits
+https://github.com/hcy71o/TransferTTS/blob/master/models.py#L556
+https://github.com/TencentGameMate/chinese_speech_pretrain
+https://github.com/auspicious3000/contentvec/
+https://github.com/jik876/hifi-gan
+https://huggingface.co/hfl/chinese-roberta-wwm-ext-large
+https://github.com/fishaudio/fish-speech/blob/main/tools/llama/generate.py#L41
+https://github.com/Anjok07/ultimatevocalremovergui
+https://github.com/openvpi/audio-slicer
+https://github.com/cronrpc/SubFix
+https://modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch
+https://github.com/FFmpeg/FFmpeg
+https://github.com/gradio-app/gradio