Jarod's NOTE
I'm working on turning this into a package. Right now, the package can be installed and the API works for making requests.
Quick Install and Usage
Ideally, do all of this inside a venv for package isolation.
- Install by doing:
pip install git+https://github.com/JarodMica/GPT-SoVITS-Package.git
- Make sure torch is installed with CUDA enabled. I recommend running
pip uninstall torch
to uninstall torch, then reinstall with the following. I chose 2.4.0+cu121:
pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu121
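To confirm that the reinstall picked up a CUDA build, a quick check along these lines may help. This is a minimal sketch: `describe_torch_install` is a hypothetical helper name, not part of this package, and it only relies on `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`.

```python
def describe_torch_install():
    """Report whether torch is importable and whether it can see a CUDA device."""
    try:
        import torch
    except ImportError:
        # torch is not installed in this environment
        return {
            "installed": False,
            "version": None,
            "cuda_build": None,
            "cuda_available": False,
        }
    return {
        "installed": True,
        "version": torch.__version__,        # e.g. "2.4.0+cu121" for the build above
        "cuda_build": torch.version.cuda,    # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


print(describe_torch_install())
```

If `cuda_build` is `None` or `cuda_available` is `False`, the CPU-only wheel is probably still installed, or the GPU driver is not visible from this environment.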
So far I've only tested it with api_v2.py. Provided the install above went fine, you should now be able to run:
gpt_sovits_api
This boots up a local server that you can make requests to. Check out test.py
and test_streaming.py
to get an idea of how you might use the API.
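As a rough illustration of what a request might look like, here is a sketch using only the standard library. It assumes the server keeps the upstream GPT-SoVITS api_v2.py conventions (address `127.0.0.1:9880`, a `/tts` endpoint accepting a JSON POST, and the field names shown); none of that is confirmed for this package, so treat test.py as the authority. The helper names and the reference audio path are hypothetical.

```python
import json
import urllib.request

# Assumption: default address used by upstream api_v2.py; adjust if yours differs.
BASE_URL = "http://127.0.0.1:9880"


def build_tts_request(text, ref_audio_path, prompt_text,
                      text_lang="en", prompt_lang="en", base_url=BASE_URL):
    """Build (but do not send) a POST request for the assumed /tts endpoint."""
    payload = {
        "text": text,                      # text to synthesize
        "text_lang": text_lang,            # language of `text`
        "ref_audio_path": ref_audio_path,  # reference audio, as seen by the server
        "prompt_text": prompt_text,        # transcript of the reference audio
        "prompt_lang": prompt_lang,        # language of the transcript
        "media_type": "wav",
        "streaming_mode": False,
    }
    return urllib.request.Request(
        url=f"{base_url}/tts",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(request, out_path="output.wav", timeout=120):
    """Send the request to a running server and save the returned audio bytes."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path


# Example usage (requires the server started by gpt_sovits_api to be running):
# req = build_tts_request("Hello there.", "reference.wav", "Transcript of the reference clip.")
# synthesize_to_file(req)
```

Keeping request construction separate from sending makes it easy to inspect the payload before pointing it at a live server.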
Pretrained Models
You probably don't need to follow the instructions below; they are kept here for reference for now.
- Download pretrained models from GPT-SoVITS Models and place them in GPT_SoVITS/pretrained_models.
- Download G2PW models from G2PWModel_1.1.zip, unzip and rename to G2PWModel, and then place them in GPT_SoVITS/text. (Chinese TTS only)
- For UVR5 (Vocals/Accompaniment Separation & Reverberation Removal, additionally), download models from UVR5 Weights and place them in tools/uvr5/uvr5_weights.
- For Chinese ASR (additionally), download models from Damo ASR Model, Damo VAD Model, and Damo Punc Model and place them in tools/asr/models.
- For English or Japanese ASR (additionally), download models from Faster Whisper Large V3 and place them in tools/asr/models. Other models may have a similar effect with a smaller disk footprint.
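If you do end up placing models manually, a small check like the following can show which of the directories listed above are still missing or empty. It is only a sketch: `find_missing_model_dirs` is a hypothetical helper, and it assumes the paths are relative to the repository root exactly as written in the list.

```python
from pathlib import Path

# Paths taken from the list above, relative to the repository root.
EXPECTED_MODEL_DIRS = {
    "pretrained models": "GPT_SoVITS/pretrained_models",
    "G2PW (Chinese TTS only)": "GPT_SoVITS/text/G2PWModel",
    "UVR5 weights": "tools/uvr5/uvr5_weights",
    "ASR models": "tools/asr/models",
}


def find_missing_model_dirs(root="."):
    """Return the labels of model directories that are absent or empty under `root`."""
    root = Path(root)
    missing = []
    for label, relative_path in EXPECTED_MODEL_DIRS.items():
        path = root / relative_path
        # A directory counts as present only if it exists and contains something.
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(label)
    return missing


print(find_missing_model_dirs("."))
```

Only the first entry is needed for basic use; the others are optional extras, so a non-empty result is not necessarily a problem.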
Credits
Special thanks to RVC-Boss for getting this wonderful tool up and running, as well as to all of the other attributed projects used to build it:
Original Repo: https://github.com/RVC-Boss/GPT-SoVITS