
Jarod's NOTE

I'm working on turning this into a package. Right now, the package can be installed and the API works for making requests.

Quick Install and Usage

Ideally, do all of this inside a venv for package isolation.

  1. Install by running:
pip install git+https://github.com/JarodMica/GPT-SoVITS-Package.git
  2. Make sure torch is installed with CUDA enabled. I recommend running pip uninstall torch first, then reinstalling with the command below. I chose 2.4.0+cu121:
pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu121
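
To confirm that the reinstall picked up a CUDA-enabled build, a quick check like the following can help. This is a minimal sketch that only uses standard torch calls; the helper name cuda_status is made up for illustration and is not part of this package.

```python
import importlib.util


def cuda_status() -> str:
    """Describe whether the installed torch build can see a CUDA device.

    cuda_status is a hypothetical helper name, not something this package ships.
    """
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"
    try:
        import torch
    except ImportError as exc:  # e.g. a broken or partial install
        return f"torch failed to import: {exc}"
    if not torch.cuda.is_available():
        return f"torch {torch.__version__} is installed, but CUDA is not available"
    device = torch.cuda.get_device_name(0)
    return f"torch {torch.__version__} built for CUDA {torch.version.cuda}, using {device}"


if __name__ == "__main__":
    print(cuda_status())
```

If the message says CUDA is not available, the CPU-only wheel is most likely still installed, and repeating the uninstall and reinstall step is worth trying.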

So far I've only tested it with api_v2.py. If the install above went fine, you should now be able to run:

gpt_sovits_api

This boots up a local server that you can make requests to. Check out test.py and test_streaming.py to get an idea of how you might use the API.
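
As a rough illustration of what a request might look like, here is a minimal sketch using only the Python standard library. Everything specific in it is an assumption: the address http://127.0.0.1:9880, the /tts endpoint, and the JSON field names follow the upstream api_v2.py as I understand it, and the helpers build_tts_payload and synthesize are hypothetical names. Treat test.py as the authoritative example.

```python
import json
import urllib.request

# ASSUMPTION: host, port, and endpoint as used by the upstream api_v2.py.
API_URL = "http://127.0.0.1:9880/tts"


def build_tts_payload(text, ref_audio_path, prompt_text="",
                      text_lang="en", prompt_lang="en"):
    """Build the JSON body for a TTS request (field names are assumed)."""
    return {
        "text": text,                      # text to synthesize
        "text_lang": text_lang,            # language of `text`
        "ref_audio_path": ref_audio_path,  # reference audio for the target voice
        "prompt_text": prompt_text,        # transcript of the reference audio
        "prompt_lang": prompt_lang,        # language of the transcript
        "media_type": "wav",
        "streaming_mode": False,
    }


def synthesize(payload, out_path="output.wav", url=API_URL, timeout=300):
    """POST the payload to the server and save the returned audio bytes."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path


# Example (requires the server to be running; "reference.wav" is a placeholder):
# synthesize(build_tts_payload("Hello there.", "reference.wav", "Reference transcript."))
```

Keeping the payload builder separate from the network call makes it easy to inspect or adjust the request before sending it.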

Pretrained Models

You probably don't need to follow the instructions below; they are kept here for reference for now.

  1. Download pretrained models from GPT-SoVITS Models and place them in GPT_SoVITS/pretrained_models.

  2. Download G2PW models from G2PWModel_1.1.zip, unzip and rename the folder to G2PWModel, and then place it in GPT_SoVITS/text. (Chinese TTS only)

  3. For UVR5 (Vocals/Accompaniment Separation & Reverberation Removal, optional), download models from UVR5 Weights and place them in tools/uvr5/uvr5_weights.

  4. For Chinese ASR (optional), download models from Damo ASR Model, Damo VAD Model, and Damo Punc Model and place them in tools/asr/models.

  5. For English or Japanese ASR (optional), download models from Faster Whisper Large V3 and place them in tools/asr/models. Other models may give similar results with a smaller disk footprint.
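
If you do place models manually, a small script can report which of the directories listed above exist and contain files. This is only a sketch: the paths are copied from the steps above, and the helper name check_model_dirs and the idea of running it from the repository root are assumptions of mine.

```python
from pathlib import Path

# Directories named in the steps above, relative to the repository root.
MODEL_DIRS = {
    "pretrained models": "GPT_SoVITS/pretrained_models",
    "G2PW (Chinese TTS only)": "GPT_SoVITS/text/G2PWModel",
    "UVR5 weights": "tools/uvr5/uvr5_weights",
    "ASR models": "tools/asr/models",
}


def check_model_dirs(repo_root="."):
    """Map each label to True if its directory exists and is non-empty.

    check_model_dirs is a hypothetical helper, not part of this package.
    """
    root = Path(repo_root)
    status = {}
    for label, relative in MODEL_DIRS.items():
        directory = root / relative
        status[label] = directory.is_dir() and any(directory.iterdir())
    return status


if __name__ == "__main__":
    for label, present in check_model_dirs().items():
        print(f"{'found  ' if present else 'missing'}  {label}")
```

A "missing" entry is not necessarily a problem, since most of these models are optional.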

Credits

Special thanks to RVC-Boss for getting this wonderful tool up and running, and to all of the other projects credited in building it:

Original Repo: https://github.com/RVC-Boss/GPT-SoVITS

Description
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)