GPT-SoVITS/README.md
2024-11-16 02:57:56 -08:00

38 lines
2.6 KiB
Markdown

# Jarod's NOTE
Working on turning this into a package. Right now, the API *does in fact* work to make requests to and this can be installed.
## Quick Install and Usage
Ideally, do this all inside of a venv for package isolation
1. Install by doing:
```
pip install git+https://github.com/JarodMica/GPT-SoVITS-Package.git
```
2. Make sure torch is installed with CUDA enabled. Reccomend to run `pip uninstall torch` to uninstall torch, then reinstall with the following. I chose 2.4.0+cu121:
```
pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu121
```
Now to use it, so far I've only tested it with the api_v2.py. Given that the install above went fine, you should now be able to run:
```
gpt_sovits_api
```
Which will bootup local server that you can make requests to. Checkout `test.py` and `test_streaming.py` to get an idea for how you might be able to use the API.
## Pretrained Models
Probably don't need to follow the instructions for the below, these are just kept here for reference for now.
1. Download pretrained models from [GPT-SoVITS Models](https://huggingface.co/lj1995/GPT-SoVITS) and place them in `GPT_SoVITS/pretrained_models`.
2. Download G2PW models from [G2PWModel_1.1.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/g2p/G2PWModel_1.1.zip), unzip and rename to `G2PWModel`, and then place them in `GPT_SoVITS/text`.(Chinese TTS Only)
3. For UVR5 (Vocals/Accompaniment Separation & Reverberation Removal, additionally), download models from [UVR5 Weights](https://huggingface.co/lj1995/VoiceConversionWebUI/tree/main/uvr5_weights) and place them in `tools/uvr5/uvr5_weights`.
4. For Chinese ASR (additionally), download models from [Damo ASR Model](https://modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/files), [Damo VAD Model](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/files), and [Damo Punc Model](https://modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/files) and place them in `tools/asr/models`.
5. For English or Japanese ASR (additionally), download models from [Faster Whisper Large V3](https://huggingface.co/Systran/faster-whisper-large-v3) and place them in `tools/asr/models`. Also, [other models](https://huggingface.co/Systran) may have the similar effect with smaller disk footprint.
## Credits
Special thanks to the RVC-Boss for getting this wonderful tool up and going, as well as all of the other attributions used to build it:
**Original Repo:** https://github.com/RVC-Boss/GPT-SoVITS