GPT-SoVITS/README.md

# Jarod's NOTE
Working on turning this into a package.  Right now, the API *does in fact* work to make requests to and this can be installed.

## Quick Install and Usage
Ideally, do this all inside of a venv for package isolation
1. Install by doing:
  ```
  pip install git+https://github.com/JarodMica/GPT-SoVITS-Package.git
  ```
2. Make sure torch is installed with CUDA enabled.  Reccomend to run `pip uninstall torch` to uninstall torch, then reinstall with the following.  I chose 2.4.0+cu121:
  ```
  pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu121
  ```

Now to use it, so far I've only tested it with the api_v2.py.  Given that the install above went fine, you should now be able to run:
```
gpt_sovits_api
```
Which will bootup local server that you can make requests to.  Checkout `test.py` and `test_streaming.py` to get an idea for how you might be able to use the API.

## Pretrained Models
Probably don't need to follow the instructions for the below, these are just kept here for reference for now.

1. Download pretrained models from [GPT-SoVITS Models](https://huggingface.co/lj1995/GPT-SoVITS) and place them in `GPT_SoVITS/pretrained_models`.

2. Download G2PW models from [G2PWModel_1.1.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/g2p/G2PWModel_1.1.zip), unzip and rename to `G2PWModel`, and then place them in `GPT_SoVITS/text`.(Chinese TTS Only)

3. For UVR5 (Vocals/Accompaniment Separation & Reverberation Removal, additionally), download models from [UVR5 Weights](https://huggingface.co/lj1995/VoiceConversionWebUI/tree/main/uvr5_weights) and place them in `tools/uvr5/uvr5_weights`.

4. For Chinese ASR (additionally), download models from [Damo ASR Model](https://modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/files), [Damo VAD Model](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/files), and [Damo Punc Model](https://modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/files) and place them in `tools/asr/models`.

5. For English or Japanese ASR (additionally), download models from [Faster Whisper Large V3](https://huggingface.co/Systran/faster-whisper-large-v3) and place them in `tools/asr/models`. Also, [other models](https://huggingface.co/Systran) may have the similar effect with smaller disk footprint.

## Credits

Special thanks to the RVC-Boss for getting this wonderful tool up and going, as well as all of the other attributions used to build it:

**Original Repo:** https://github.com/RVC-Boss/GPT-SoVITS