diff --git a/README.md b/README.md
index 22ba5bb2..12ce861c 100644
--- a/README.md
+++ b/README.md
@@ -79,12 +79,6 @@ pip install -r requirements.txt
 
 ### Install Manually
 
-#### Install Dependences
-
-```bash
-pip install -r requirements.txt
-```
-
 #### Install FFmpeg
 
 ##### Conda Users
@@ -105,11 +99,19 @@ conda install -c conda-forge 'ffmpeg<7'
 
 Download and place [ffmpeg.exe](https://huggingface.co/lj1995/VoiceConversionWebUI/blob/main/ffmpeg.exe) and [ffprobe.exe](https://huggingface.co/lj1995/VoiceConversionWebUI/blob/main/ffprobe.exe) in the GPT-SoVITS root.
 
+Install [Visual Studio 2022](https://visualstudio.microsoft.com/downloads/) (Korean TTS Only)
+
 ##### MacOS Users
 ```bash
 brew install ffmpeg
 ```
 
+#### Install Dependences
+
+```bash
+pip install -r requirements.txt
+```
+
 ### Using Docker
 
 #### docker-compose.yaml configuration
@@ -141,16 +143,22 @@ docker run --rm -it --gpus=all --env=is_half=False --volume=G:\GPT-SoVITS-Docker
 
 Download pretrained models from [GPT-SoVITS Models](https://huggingface.co/lj1995/GPT-SoVITS) and place them in `GPT_SoVITS/pretrained_models`.
 
+Download G2PW models from [G2PWModel-v2-onnx.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/g2p/G2PWModel_1.1.zip), unzip and rename to `G2PWModel`, and then place them in `GPT_SoVITS\text`.(Chinese TTS Only)
+
 For UVR5 (Vocals/Accompaniment Separation & Reverberation Removal, additionally), download models from [UVR5 Weights](https://huggingface.co/lj1995/VoiceConversionWebUI/tree/main/uvr5_weights) and place them in `tools/uvr5/uvr5_weights`.
 
-Users in the China region can download these two models by entering the links below and clicking "Download a copy"(Log out if you encounter errors while downloading.)
+Users in the China region can download these two models by entering the links below and clicking "Download a copy" (Log out if you encounter errors while downloading.)
 
-- [GPT-SoVITS Models](https://www.icloud.com.cn/iclouddrive/056y_Xog_HXpALuVUjscIwTtg#GPT-SoVITS_Models)
+- [GPT-SoVITS Models](https://www.icloud.com/iclouddrive/044boFMiOHHt22SNr-c-tirbA#pretrained_models)
 
 - [UVR5 Weights](https://www.icloud.com.cn/iclouddrive/0bekRKDiJXboFhbfm3lM2fVbA#UVR5_Weights)
 
+- [G2PWModel_1.1.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/g2p/G2PWModel_1.1.zip)(Download G2PW models, unzip and rename to `G2PWModel`, and then place them in `GPT_SoVITS\text`.
+
 For Chinese ASR (additionally), download models from [Damo ASR Model](https://modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/files), [Damo VAD Model](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/files), and [Damo Punc Model](https://modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/files) and place them in `tools/asr/models`.
 
+Or Download FunASR Model from [FunASR Model](https://www.icloud.com/iclouddrive/0b52_7SQWYr75kHkPoPXgpeQA#models), unzip and replace `tools/asr/models`.(Log out if you encounter errors while downloading.)
+
 For English or Japanese ASR (additionally), download models from [Faster Whisper Large V3](https://huggingface.co/Systran/faster-whisper-large-v3) and place them in `tools/asr/models`. Also, [other models](https://huggingface.co/Systran) may have the similar effect with smaller disk footprint.
 
 Users in the China region can download this model by entering the links below
@@ -181,6 +189,72 @@ Example:
 D:\GPT-SoVITS\xxx/xxx.wav|xxx|en|I like playing Genshin.
 ```
 
+## Finetune and inference
+
+ ### Open WebUI
+
+ #### Integrated Package Users
+
+ Double-click `go-webui.bat`or use `go-webui.ps`
+ if you want to switch to V1,then double-click`go-webui-v1.bat` or use `go-webui-v1.ps`
+
+ #### Others
+
+ ```bash
+ python webui.py
+ ```
+
+ if you want to switch to V1,then
+
+ ```bash
+ python webui.py v1
+ ```
+Or maunally switch version in WebUI
+
+ ### Finetune
+
+ #### Path Auto-filling is now supported
+
+ 1.Fill in the audio path
+
+ 2.Slice the audio into small chunks
+
+ 3.Denoise(optinal)
+
+ 4.ASR
+
+ 5.Proofreading ASR transcriptions
+
+ 6.Go to the next Tab, then finetune the model
+
+ ### Open Inference WebUI
+
+ #### Integrated Package Users
+
+ Double-click `go-webui-v2.bat` or use `go-webui-v2.ps` ,then open the inference webui at `1-GPT-SoVITS-TTS/1C-inference`
+
+ #### Others
+
+ ```bash
+ python GPT_SoVITS/inference_webui.py
+ ```
+ OR
+
+ ```bash
+ python webui.py
+ ```
+then open the inference webui at `1-GPT-SoVITS-TTS/1C-inference`
+
+ ## V2 Release Notes
+
+New Features:
+
+ 1.Support Korean and Cantonese
+
+ 2.An optimized text frontend
+
+ 3.Pre-trained model extended from 2k hours to 5k hours
+
 ## Todo List
 
 - [ ] **High Priority:**
@@ -206,10 +280,10 @@ Use the command line to open the WebUI for UVR5
 ```
 python tools/uvr5/webui.py ""
 ```
-If you can't open a browser, follow the format below for UVR processing,This is using mdxnet for audio processing
+
 This is how the audio segmentation of the dataset is done using the command line
 ```
 python audio_slicer.py \
@@ -250,6 +324,9 @@ Special thanks to the following projects and contributors:
 ### Text Frontend for Inference
 - [paddlespeech zh_normalization](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/paddlespeech/t2s/frontend/zh_normalization)
 - [LangSegment](https://github.com/juntaosun/LangSegment)
+- [g2pW](https://github.com/GitYCC/g2pW)
+- [pypinyin-g2pW](https://github.com/mozillazg/pypinyin-g2pW)
+- [paddlespeech g2pw](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/paddlespeech/t2s/frontend/g2pw)
 ### WebUI Tools
 - [ultimatevocalremovergui](https://github.com/Anjok07/ultimatevocalremovergui)
 - [audio-slicer](https://github.com/openvpi/audio-slicer)
diff --git a/docs/cn/README.md b/docs/cn/README.md
index e8b63f46..b3341068 100644
--- a/docs/cn/README.md
+++ b/docs/cn/README.md
@@ -51,7 +51,7 @@ _注: numba==0.56.4 需要 python<3.11_
 
 ### Windows
 
-如果你是 Windows 用户(已在 win>=10 上测试),可以下载[下载整合包](https://huggingface.co/lj1995/GPT-SoVITS-windows-package/resolve/main/GPT-SoVITS-beta.7z?download=true),解压后双击 go-webui-v2.bat 即可启动 GPT-SoVITS-WebUI。
+如果你是 Windows 用户(已在 win>=10 上测试),可以下载[下载整合包](https://huggingface.co/lj1995/GPT-SoVITS-windows-package/resolve/main/GPT-SoVITS-beta.7z?download=true),解压后双击 go-webui.bat 即可启动 GPT-SoVITS-WebUI。
 
 中国地区用户可以通过点击链接并选择“下载副本”[下载整合包](https://www.icloud.com.cn/iclouddrive/030K8WjGJ9xMXhpzJVIMEWPzQ#GPT-SoVITS-beta0706fix1)。(如果下载时遇到错误,请退出登录)
 
@@ -99,7 +99,7 @@ conda install -c conda-forge 'ffmpeg<7'
 
 下载并将 [ffmpeg.exe](https://huggingface.co/lj1995/VoiceConversionWebUI/blob/main/ffmpeg.exe) 和 [ffprobe.exe](https://huggingface.co/lj1995/VoiceConversionWebUI/blob/main/ffprobe.exe) 放置在 GPT-SoVITS 根目录下。
 
-安装 [Visual Studio 2022](https://visualstudio.microsoft.com/zh-hans/downloads/) 环境(仅限韩语)
+安装 [Visual Studio 2022](https://visualstudio.microsoft.com/zh-hans/downloads/) 环境(仅限韩语TTS)
 
 ##### MacOS 用户
 ```bash
@@ -111,6 +111,7 @@ brew install ffmpeg
 ```bash
 pip install -r requirements.txt
 ```
+
 ### 在 Docker 中使用
 
 #### docker-compose.yaml 设置
@@ -142,7 +143,7 @@ docker run --rm -it --gpus=all --env=is_half=False --volume=G:\GPT-SoVITS-Docker
 
 从 [GPT-SoVITS Models](https://huggingface.co/lj1995/GPT-SoVITS) 下载预训练模型,并将它们放置在 `GPT_SoVITS\pretrained_models` 中。
 
-从 [G2PWModel-v2-onnx.zip](https://storage.googleapis.com/esun-ai/g2pW/G2PWModel-v2-onnx.zip) 下载G2PW模型,并将它们解压重命名为`G2PWModel` 后放置在 `GPT_SoVITS\text` 中。
+从 [G2PWModel-v2-onnx.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/g2p/G2PWModel_1.1.zip) 下载G2PW模型,并将它们解压重命名为`G2PWModel` 后放置在 `GPT_SoVITS\text` 中。(仅限中文TTS)
 
 对于 UVR5(人声/伴奏分离和混响移除,附加),从 [UVR5 Weights](https://huggingface.co/lj1995/VoiceConversionWebUI/tree/main/uvr5_weights) 下载模型,并将它们放置在 `tools/uvr5/uvr5_weights` 中。
 
@@ -179,8 +180,8 @@ vocal_path|speaker_name|language|text
 - 'zh': 中文
 - 'ja': 日语
 - 'en': 英语
-- 'yue': 粤语
 - 'ko': 韩语
+- 'yue': 粤语
 
 示例:
 
@@ -188,7 +189,7 @@ vocal_path|speaker_name|language|text
 D:\GPT-SoVITS\xxx/xxx.wav|xxx|zh|我爱玩原神。
 ```
 
-## 使用
+## 微调与推理
 
 ### 打开WebUI
 
@@ -210,7 +211,7 @@ python webui.py v1
 ```
 或者在webUI内动态切换
 
-### 训练
+### 微调
 
 #### 现已支持自动填充路径
 
@@ -252,7 +253,7 @@ python webui.py
 
  2.更好的文本前端
 
- 3.底膜由2k小时扩展至5k小时
+ 3.底模由2k小时扩展至5k小时
 
 ## 待办事项清单