mirror of https://github.com/RVC-Boss/GPT-SoVITS.git
synced 2025-04-06 03:57:44 +08:00
Update README.md
parent 5c2841f5bd
commit 5afe04cf3a
49 README.md
@@ -23,6 +23,8 @@ A Powerful Few-shot Voice Conversion and Text-to-Speech WebUI.<br><br>
 
 > Check out our [demo video](https://www.bilibili.com/video/BV12g4y1m7Uw) here!
 
+Unseen speakers few-shot fine-tuning demo:
+
 https://github.com/RVC-Boss/GPT-SoVITS/assets/129054828/05bee1fa-bdd8-4d85-9350-80c060ab47fb
 
 For users in China region, you can use AutoDL Cloud Docker to experience the full functionality online: https://www.codewithgpu.com/i/RVC-Boss/GPT-SoVITS/GPT-SoVITS-Official
@@ -48,27 +50,6 @@ If you are a Windows user (tested with win>=10) you can install directly via the
 
 _Note: numba==0.56.4 require py<3.11_
 
-### For Mac Users
-If you are a Mac user, make sure you meet the following conditions for training and inferencing with GPU:
-- Mac computers with Apple silicon or AMD GPUs
-- macOS 12.3 or later
-- Xcode command-line tools installed by running `xcode-select --install`
-
-_Other Macs can do inference with CPU only._
-
-Then install by using the following commands:
-#### Create Environment
-```bash
-conda create -n GPTSoVits python=3.9
-conda activate GPTSoVits
-```
-#### Install Requirements
-```bash
-pip install -r requirements.txt
-pip uninstall torch torchaudio
-pip3 install --pre torch torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu
-```
-_Note: For preprocessing with UVR5, it is recommended to [download the original project GUI](https://github.com/Anjok07/ultimatevocalremovergui) and select "GPU Conversion". Additionally, there might be memory leak issues, especially during inference. Restarting the inference webUI can help._
 ### Quick Install with Conda
 
 ```bash
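The context line in this hunk notes that numba==0.56.4 needs a Python version below 3.11. A minimal sketch of how that constraint could be checked before installing is shown here; the function name `python_supported_for_numba` is hypothetical and is not part of GPT-SoVITS.

```python
import sys


def python_supported_for_numba(version_info=sys.version_info) -> bool:
    """Return True when the interpreter is older than 3.11 (per the README note)."""
    # Compare only (major, minor); patch releases do not affect the constraint.
    return tuple(version_info[:2]) < (3, 11)


if __name__ == "__main__":
    if python_supported_for_numba():
        print("Python version is compatible with numba==0.56.4")
    else:
        print("Python %d.%d is too new for numba==0.56.4; use 3.10 or older"
              % tuple(sys.version_info[:2]))
```

Running it inside the freshly created conda environment (the README uses `python=3.9`) should report a compatible interpreter.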
@@ -119,6 +100,26 @@ For Chinese ASR (additionally), download models from [Damo ASR Model](https://mo
 
 For UVR5 (Vocals/Accompaniment Separation & Reverberation Removal, additionally), download models from [UVR5 Weights](https://huggingface.co/lj1995/VoiceConversionWebUI/tree/main/uvr5_weights) and place them in `tools/uvr5/uvr5_weights`.
 
+### For Mac Users
+If you are a Mac user, make sure you meet the following conditions for training and inferencing with GPU:
+- Mac computers with Apple silicon or AMD GPUs
+- macOS 12.3 or later
+- Xcode command-line tools installed by running `xcode-select --install`
+
+_Other Macs can do inference with CPU only._
+
+Then install by using the following commands:
+#### Create Environment
+```bash
+conda create -n GPTSoVits python=3.9
+conda activate GPTSoVits
+```
+#### Install Requirements
+```bash
+pip install -r requirements.txt
+pip uninstall torch torchaudio
+pip3 install --pre torch torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu
+```
 
 ### Using Docker
 
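The Mac section moved by this hunk lists a macOS 12.3 minimum and distinguishes GPU-capable Macs from CPU-only ones. The sketch below shows one way those conditions could be checked from Python after installation. The helper names (`version_tuple`, `macos_meets_minimum`, `pick_device`) are hypothetical, and whether GPT-SoVITS selects its device this way is an assumption; only `platform.mac_ver()` and `torch.backends.mps.is_available()` are existing library calls.

```python
import platform


def version_tuple(version: str) -> tuple:
    """Turn a dotted version such as '12.3.1' into (12, 3, 1)."""
    return tuple(int(part) for part in version.split(".") if part.isdigit())


def macos_meets_minimum(mac_version: str, minimum: tuple = (12, 3)) -> bool:
    """True when mac_version is non-empty and at least `minimum`."""
    return bool(mac_version) and version_tuple(mac_version) >= minimum


def pick_device() -> str:
    """Return 'mps' if PyTorch reports a usable Metal backend, else 'cpu'."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed yet
    mps = getattr(torch.backends, "mps", None)  # absent in old PyTorch builds
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    mac_version = platform.mac_ver()[0]  # empty string on non-macOS systems
    print("macOS >= 12.3:", macos_meets_minimum(mac_version))
    print("device:", pick_device())
```

If the reported device is `cpu`, that matches the README's statement that other Macs can run inference on the CPU only.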
@@ -168,9 +169,9 @@ D:\GPT-SoVITS\xxx/xxx.wav|xxx|en|I like playing Genshin.
 ## Todo List
 
 - [ ] **High Priority:**
-- [ ] Localization in Japanese and English.
+- [x] Localization in Japanese and English.
 - [ ] User guide.
-- [ ] Japanese and English dataset fine tune training.
+- [x] Japanese and English dataset fine tune training.
 
 - [ ] **Features:**
 - [ ] Zero-shot voice conversion (5s) / few-shot voice conversion (1min).
@@ -179,7 +180,7 @@ D:\GPT-SoVITS\xxx/xxx.wav|xxx|en|I like playing Genshin.
 - [ ] Experiment with changing SoVITS token inputs to probability distribution of vocabs.
 - [ ] Improve English and Japanese text frontend.
 - [ ] Develop tiny and larger-sized TTS models.
-- [ ] Colab scripts.
+- [x] Colab scripts.
 - [ ] Try expand training dataset (2k hours -> 10k hours).
 - [ ] better sovits base model (enhanced audio quality)
 - [ ] model mix