XXXXRT666 2024-08-07 03:30:08 +08:00
parent a5c4920a2a
commit 442c2a5fc0
2 changed files with 95 additions and 17 deletions


@@ -79,12 +79,6 @@ pip install -r requirements.txt
### Install Manually
#### Install Dependencies
```bash
pip install -r requirements.txt
```
#### Install FFmpeg
##### Conda Users
@@ -105,11 +99,19 @@ conda install -c conda-forge 'ffmpeg<7'
Download and place [ffmpeg.exe](https://huggingface.co/lj1995/VoiceConversionWebUI/blob/main/ffmpeg.exe) and [ffprobe.exe](https://huggingface.co/lj1995/VoiceConversionWebUI/blob/main/ffprobe.exe) in the GPT-SoVITS root.
Install [Visual Studio 2022](https://visualstudio.microsoft.com/downloads/) (Korean TTS only)
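For reference, a minimal sketch of scripting the download from a shell (assumes `curl` is available; note that the `/resolve/` endpoint serves the raw binaries, while the `/blob/` links above are web pages):
```bash
# Sketch: fetch the FFmpeg binaries into the GPT-SoVITS root (assumes curl)
cd GPT-SoVITS
curl -L -o ffmpeg.exe https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/ffmpeg.exe
curl -L -o ffprobe.exe https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/ffprobe.exe
```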
##### MacOS Users
```bash
brew install ffmpeg
```
#### Install Dependencies
```bash
pip install -r requirements.txt
```
### Using Docker
#### docker-compose.yaml configuration
@@ -141,16 +143,22 @@ docker run --rm -it --gpus=all --env=is_half=False --volume=G:\GPT-SoVITS-Docker
Download pretrained models from [GPT-SoVITS Models](https://huggingface.co/lj1995/GPT-SoVITS) and place them in `GPT_SoVITS/pretrained_models`.
Download G2PW models from [G2PWModel_1.1.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/g2p/G2PWModel_1.1.zip), unzip and rename to `G2PWModel`, and then place them in `GPT_SoVITS\text`. (Chinese TTS only)
For UVR5 (Vocals/Accompaniment Separation & Reverberation Removal, optional), download models from [UVR5 Weights](https://huggingface.co/lj1995/VoiceConversionWebUI/tree/main/uvr5_weights) and place them in `tools/uvr5/uvr5_weights`.
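As a rough sketch, the downloads above can also be scripted; this assumes `huggingface-cli` (from the `huggingface_hub` package), `wget`, and `unzip` are available, and the name of the folder extracted from the G2PW zip may need adjusting:
```bash
# Sketch only: run from the GPT-SoVITS root.
# Pretrained models -> GPT_SoVITS/pretrained_models
huggingface-cli download lj1995/GPT-SoVITS --local-dir GPT_SoVITS/pretrained_models

# G2PW models -> GPT_SoVITS/text/G2PWModel (Chinese TTS only);
# adjust the extracted folder name if it differs
wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/g2p/G2PWModel_1.1.zip
unzip G2PWModel_1.1.zip
mv G2PWModel_1.1 GPT_SoVITS/text/G2PWModel

# UVR5 weights -> tools/uvr5/uvr5_weights
huggingface-cli download lj1995/VoiceConversionWebUI --include "uvr5_weights/*" --local-dir tools/uvr5
```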
Users in the China region can download these two models by entering the links below and clicking "Download a copy". (Log out if you encounter errors while downloading.)
- [GPT-SoVITS Models](https://www.icloud.com/iclouddrive/044boFMiOHHt22SNr-c-tirbA#pretrained_models)
- [UVR5 Weights](https://www.icloud.com.cn/iclouddrive/0bekRKDiJXboFhbfm3lM2fVbA#UVR5_Weights)
- [G2PWModel_1.1.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/g2p/G2PWModel_1.1.zip) Download G2PW models, unzip and rename to `G2PWModel`, and then place them in `GPT_SoVITS\text`.
For Chinese ASR (optional), download models from [Damo ASR Model](https://modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/files), [Damo VAD Model](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/files), and [Damo Punc Model](https://modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/files) and place them in `tools/asr/models`.
Or download the FunASR models from [FunASR Model](https://www.icloud.com/iclouddrive/0b52_7SQWYr75kHkPoPXgpeQA#models), unzip, and replace `tools/asr/models`. (Log out if you encounter errors while downloading.)
For English or Japanese ASR (optional), download models from [Faster Whisper Large V3](https://huggingface.co/Systran/faster-whisper-large-v3) and place them in `tools/asr/models`. Also, [other models](https://huggingface.co/Systran) may have a similar effect with a smaller disk footprint.
Users in the China region can download this model by entering the links below
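As a sketch, these ASR models can also be fetched from the command line; this assumes the ModelScope repositories can be cloned with git (git-lfs is required for the weights) and that `huggingface-cli` is installed:
```bash
# Sketch only: run from the GPT-SoVITS root.
cd tools/asr/models

# Chinese ASR (Damo models, hosted on ModelScope)
git clone https://www.modelscope.cn/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch.git
git clone https://www.modelscope.cn/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch.git
git clone https://www.modelscope.cn/damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch.git

# English / Japanese ASR (Faster Whisper Large V3)
huggingface-cli download Systran/faster-whisper-large-v3 --local-dir faster-whisper-large-v3
```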
@@ -181,6 +189,72 @@ Example:
D:\GPT-SoVITS\xxx/xxx.wav|xxx|en|I like playing Genshin.
```
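Since every annotation line must have exactly four `|`-separated fields, a quick check can catch malformed entries before training; the list path below is hypothetical:
```bash
# Flag any line that does not have exactly 4 |-separated fields
# (output/asr_opt/slicer_opt.list is a hypothetical path; substitute your own)
awk -F'|' 'NF != 4 { printf "line %d has %d fields: %s\n", NR, NF, $0 }' output/asr_opt/slicer_opt.list
```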
## Finetuning and Inference
### Open WebUI
#### Integrated Package Users
Double-click `go-webui.bat` or use `go-webui.ps`
If you want to switch to V1, then double-click `go-webui-v1.bat` or use `go-webui-v1.ps`
#### Others
```bash
python webui.py <language(optional)>
```
If you want to switch to V1, then
```bash
python webui.py v1 <language(optional)>
```
Or manually switch the version in the WebUI
### Finetuning
#### Path Auto-filling is now supported
1. Fill in the audio path
2. Slice the audio into small chunks
3. Denoise (optional)
4. Run ASR
5. Proofread the ASR transcriptions
6. Go to the next tab, then finetune the model
### Open Inference WebUI
#### Integrated Package Users
Double-click `go-webui-v2.bat` or use `go-webui-v2.ps`, then open the inference WebUI at `1-GPT-SoVITS-TTS/1C-inference`
#### Others
```bash
python GPT_SoVITS/inference_webui.py <language(optional)>
```
OR
```bash
python webui.py
```
Then open the inference WebUI at `1-GPT-SoVITS-TTS/1C-inference`
## V2 Release Notes
New Features:
1. Support for Korean and Cantonese
2. An optimized text frontend
3. Pre-trained model extended from 2k hours to 5k hours
## Todo List
- [ ] **High Priority:**
@@ -206,10 +280,10 @@ Use the command line to open the WebUI for UVR5
```
python tools/uvr5/webui.py "<infer_device>" <is_half> <webui_port_uvr5>
```
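For example, an illustrative invocation (the argument values here are assumptions: `"cuda"` selects the GPU device, `True` enables half precision, and `9873` is a port for the UVR5 WebUI):
```bash
# Illustrative values only; adjust device, precision, and port for your setup
python tools/uvr5/webui.py "cuda" True 9873
```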
<!-- If you can't open a browser, follow the format below for UVR processing. This uses mdxnet for audio processing.
```
python mdxnet.py --model --input_root --output_vocal --output_ins --agg_level --format --device --is_half_precision
``` -->
This is how audio segmentation of the dataset is done using the command line:
```
python audio_slicer.py \
@@ -250,6 +324,9 @@ Special thanks to the following projects and contributors:
### Text Frontend for Inference
- [paddlespeech zh_normalization](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/paddlespeech/t2s/frontend/zh_normalization)
- [LangSegment](https://github.com/juntaosun/LangSegment)
- [g2pW](https://github.com/GitYCC/g2pW)
- [pypinyin-g2pW](https://github.com/mozillazg/pypinyin-g2pW)
- [paddlespeech g2pw](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/paddlespeech/t2s/frontend/g2pw)
### WebUI Tools
- [ultimatevocalremovergui](https://github.com/Anjok07/ultimatevocalremovergui)
- [audio-slicer](https://github.com/openvpi/audio-slicer)


@@ -51,7 +51,7 @@ _Note: numba==0.56.4 requires python<3.11_
### Windows
If you are a Windows user (tested on win>=10), you can download the [integrated package](https://huggingface.co/lj1995/GPT-SoVITS-windows-package/resolve/main/GPT-SoVITS-beta.7z?download=true), extract it, and double-click go-webui.bat to launch GPT-SoVITS-WebUI.
Users in the China region can download the [integrated package](https://www.icloud.com.cn/iclouddrive/030K8WjGJ9xMXhpzJVIMEWPzQ#GPT-SoVITS-beta0706fix1) by clicking the link and selecting "Download a copy". (Log out if you encounter errors while downloading.)
@@ -99,7 +99,7 @@ conda install -c conda-forge 'ffmpeg<7'
Download and place [ffmpeg.exe](https://huggingface.co/lj1995/VoiceConversionWebUI/blob/main/ffmpeg.exe) and [ffprobe.exe](https://huggingface.co/lj1995/VoiceConversionWebUI/blob/main/ffprobe.exe) in the GPT-SoVITS root directory.
Install the [Visual Studio 2022](https://visualstudio.microsoft.com/zh-hans/downloads/) environment (Korean TTS only)
##### MacOS Users
```bash
@@ -111,6 +111,7 @@ brew install ffmpeg
```bash
pip install -r requirements.txt
```
### Using Docker
#### docker-compose.yaml configuration
@@ -142,7 +143,7 @@ docker run --rm -it --gpus=all --env=is_half=False --volume=G:\GPT-SoVITS-Docker
Download pretrained models from [GPT-SoVITS Models](https://huggingface.co/lj1995/GPT-SoVITS) and place them in `GPT_SoVITS\pretrained_models`.
Download G2PW models from [G2PWModel_1.1.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/g2p/G2PWModel_1.1.zip), unzip and rename to `G2PWModel`, and place them in `GPT_SoVITS\text`. (Chinese TTS only)
For UVR5 (Vocals/Accompaniment Separation & Reverberation Removal, optional), download models from [UVR5 Weights](https://huggingface.co/lj1995/VoiceConversionWebUI/tree/main/uvr5_weights) and place them in `tools/uvr5/uvr5_weights`.
@@ -179,8 +180,8 @@ vocal_path|speaker_name|language|text
- 'zh': Chinese
- 'ja': Japanese
- 'en': English
- 'ko': Korean
- 'yue': Cantonese
Example:
@@ -188,7 +189,7 @@ vocal_path|speaker_name|language|text
D:\GPT-SoVITS\xxx/xxx.wav|xxx|zh|我爱玩原神。
```
## Finetuning and Inference
### Open WebUI
@@ -210,7 +211,7 @@ python webui.py v1 <language(optional)>
```
Or switch the version dynamically in the WebUI
### Finetuning
#### Path Auto-filling is now supported
@@ -252,7 +253,7 @@ python webui.py
2. A better text frontend
3. The base model has been extended from 2k hours to 5k hours
## Todo List