Mirror of https://github.com/RVC-Boss/GPT-SoVITS.git (synced 2025-06-01 13:19:17 +08:00)
gpt_sovits_v3
Parent: 0db482e87d
Commit: 05645a6593
@ -269,12 +269,12 @@ Use v2 from v1 environment:
- [ ] **Features:**
- [x] Zero-shot voice conversion (5s) / few-shot voice conversion (1min).
- [x] TTS speaking speed control.
- [ ] ~~Enhanced TTS emotion control.~~
- [ ] ~~Enhanced TTS emotion control.~~ Maybe use pre-trained, fine-tuned preset GPT models for better emotion.
- [ ] Experiment with changing SoVITS token inputs to probability distribution of GPT vocabs (transformer latent).
- [x] Improve English and Japanese text frontend.
- [ ] Develop tiny and larger-sized TTS models.
- [x] Colab scripts.
- [ ] Try to expand the training dataset (2k hours -> 10k hours).
- [x] Try to expand the training dataset (2k hours -> 10k hours).
- [x] Better SoVITS base model (enhanced audio quality).
- [ ] Model mixing.
@ -321,9 +321,11 @@ Special thanks to the following projects and contributors:
- [contentvec](https://github.com/auspicious3000/contentvec/)
- [hifi-gan](https://github.com/jik876/hifi-gan)
- [fish-speech](https://github.com/fishaudio/fish-speech/blob/main/tools/llama/generate.py#L41)
- [f5-TTS](https://github.com/SWivid/F5-TTS/blob/main/src/f5_tts/model/backbones/dit.py)
### Pretrained Models
- [Chinese Speech Pretrain](https://github.com/TencentGameMate/chinese_speech_pretrain)
- [Chinese-Roberta-WWM-Ext-Large](https://huggingface.co/hfl/chinese-roberta-wwm-ext-large)
- [BigVGAN](https://github.com/NVIDIA/BigVGAN)
### Text Frontend for Inference
- [paddlespeech zh_normalization](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/paddlespeech/t2s/frontend/zh_normalization)
- [LangSegment](https://github.com/juntaosun/LangSegment)