* README (#1412)

* README
2025-10-15 05:13:16 +08:00 · 2024-08-07 14:54:49 +08:00 · 2024-08-07 14:54:49 +08:00 · e685299077
commit e685299077
parent 21f05ee471
19 changed files with 251 additions and 56 deletions
--- a/GPT_SoVITS/download.py
+++ b/GPT_SoVITS/download.py
@ -0,0 +1,5 @@
+import os, sys
+now_dir = os.getcwd()
+sys.path.insert(0, now_dir)
+from .text.g2pw import G2PWPinyin
+g2pw = G2PWPinyin(model_dir="GPT_SoVITS/text/G2PWModel",model_source="GPT_SoVITS/pretrained_models/chinese-roberta-wwm-ext-large",v_to_u=False, neutral_tone_with_five=True)
--- a/GPT_SoVITS/inference_webui.py
+++ b/GPT_SoVITS/inference_webui.py
@ -87,12 +87,9 @@ from module.mel_processing import spectrogram_torch
 from tools.my_utils import load_audio
 from tools.i18n.i18n import I18nAuto, scan_language_list
-language=os.environ.get("language","auto")
+language=os.environ.get("language","Auto")
 language=sys.argv[-1] if sys.argv[-1] in scan_language_list() else language
-if language != 'auto':
-    i18n = I18nAuto(language=language)
-else:
-    i18n = I18nAuto()
+i18n = I18nAuto(language=language)
 # os.environ['PYTORCH_ENABLE_MPS_FALLBACK'] = '1'  # Ensure this can also be set when the inference UI is launched directly.
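The hunk above drops the special case for `'auto'` and always passes the resolved language to `I18nAuto`. The resolution order it implies can be sketched as follows: the last command-line argument wins if it names a known locale, otherwise the `language` environment variable is used, otherwise `"Auto"`. This is only an illustration: `resolve_language` is a hypothetical helper, and `scan_language_list` is stubbed here with a made-up list rather than the real function from `tools.i18n.i18n`.

```python
# Hypothetical stand-in for tools.i18n.i18n.scan_language_list();
# the list of locales below is an assumption for illustration only.
def scan_language_list():
    return ["en_US", "ja_JP", "zh_CN"]


def resolve_language(argv, environ):
    """Return the UI language: valid last CLI argument, else env var, else "Auto"."""
    language = environ.get("language", "Auto")
    if argv and argv[-1] in scan_language_list():
        language = argv[-1]
    return language


if __name__ == "__main__":
    # CLI argument overrides the environment variable.
    print(resolve_language(["inference_webui.py", "ja_JP"], {"language": "en_US"}))
    # No valid CLI argument: fall back to the environment variable.
    print(resolve_language(["inference_webui.py"], {"language": "en_US"}))
    # Neither is set: fall back to "Auto".
    print(resolve_language(["inference_webui.py"], {}))
```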
--- a/README.md
+++ b/README.md
@ -74,18 +74,11 @@ bash install.sh
 ```bash
 conda create -n GPTSoVits python=3.9
 conda activate GPTSoVits
 pip install -r requirements.txt
 ```
 ### Install Manually
 #### Install Dependencies
 ```bash
 pip install -r requirements.txt
 ```
 #### Install FFmpeg
 ##### Conda Users
@ -106,11 +99,19 @@ conda install -c conda-forge 'ffmpeg<7'
 Download and place [ffmpeg.exe](https://huggingface.co/lj1995/VoiceConversionWebUI/blob/main/ffmpeg.exe) and [ffprobe.exe](https://huggingface.co/lj1995/VoiceConversionWebUI/blob/main/ffprobe.exe) in the GPT-SoVITS root.
 Install [Visual Studio 2022](https://visualstudio.microsoft.com/downloads/) (Korean TTS Only)
 ##### MacOS Users
 ```bash
 brew install ffmpeg
 ```
 #### Install Dependencies
 ```bash
 pip install -r requirements.txt
 ```
 ### Using Docker
 #### docker-compose.yaml configuration
@ -142,16 +143,22 @@ docker run --rm -it --gpus=all --env=is_half=False --volume=G:\GPT-SoVITS-Docker
 Download pretrained models from [GPT-SoVITS Models](https://huggingface.co/lj1995/GPT-SoVITS) and place them in `GPT_SoVITS/pretrained_models`.
 Download G2PW models from [G2PWModel-v2-onnx.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/g2p/G2PWModel_1.1.zip), unzip and rename to `G2PWModel`, and then place them in `GPT_SoVITS/text`. (Chinese TTS only)
 For UVR5 (Vocals/Accompaniment Separation & Reverberation Removal, additionally), download models from [UVR5 Weights](https://huggingface.co/lj1995/VoiceConversionWebUI/tree/main/uvr5_weights) and place them in `tools/uvr5/uvr5_weights`.
-Users in the China region can download these two models by entering the links below and clicking "Download a copy"(Log out if you encounter errors while downloading.)
+Users in the China region can download these two models by entering the links below and clicking "Download a copy" (Log out if you encounter errors while downloading.)
- [GPT-SoVITS Models](https://www.icloud.com.cn/iclouddrive/056y_Xog_HXpALuVUjscIwTtg#GPT-SoVITS_Models)
+- [GPT-SoVITS Models](https://www.icloud.com/iclouddrive/044boFMiOHHt22SNr-c-tirbA#pretrained_models)
 - [UVR5 Weights](https://www.icloud.com.cn/iclouddrive/0bekRKDiJXboFhbfm3lM2fVbA#UVR5_Weights)
 - [G2PWModel_1.1.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/g2p/G2PWModel_1.1.zip) (Download the G2PW models, unzip and rename to `G2PWModel`, then place them in `GPT_SoVITS/text`.)
 For Chinese ASR (additionally), download models from [Damo ASR Model](https://modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/files), [Damo VAD Model](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/files), and [Damo Punc Model](https://modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/files) and place them in `tools/asr/models`.
 Or download the FunASR model from [FunASR Model](https://www.icloud.com/iclouddrive/0b52_7SQWYr75kHkPoPXgpeQA#models), unzip it, and replace `tools/asr/models`. (Log out if you encounter errors while downloading.)
 For English or Japanese ASR (additionally), download models from [Faster Whisper Large V3](https://huggingface.co/Systran/faster-whisper-large-v3) and place them in `tools/asr/models`. [Other models](https://huggingface.co/Systran) may have a similar effect with a smaller disk footprint.
 Users in the China region can download this model by entering the links below
@ -182,6 +189,72 @@ Example:
 D:\GPT-SoVITS\xxx/xxx.wav|xxx|en|I like playing Genshin.
 ```
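The example line above follows the `vocal_path|speaker_name|language|text` layout. A minimal sketch of parsing and validating such a line is shown below. `Annotation`, `parse_annotation`, and the `LANGUAGES` set are hypothetical names, not part of the repository; the language codes are the ones listed in the README's language dictionary (`zh`, `ja`, `en`, `ko`, `yue`), and treating any other code as an error is an assumption.

```python
from typing import NamedTuple

# Language codes taken from the README's language dictionary.
LANGUAGES = {"zh", "ja", "en", "ko", "yue"}


class Annotation(NamedTuple):
    vocal_path: str
    speaker_name: str
    language: str
    text: str


def parse_annotation(line: str) -> Annotation:
    """Parse one 'vocal_path|speaker_name|language|text' line."""
    # Split on the first three separators only, so the text itself may contain '|'.
    parts = line.rstrip("\n").split("|", 3)
    if len(parts) != 4:
        raise ValueError(f"expected 4 '|'-separated fields, got {len(parts)}")
    vocal_path, speaker_name, language, text = parts
    if language not in LANGUAGES:
        raise ValueError(f"unknown language code: {language!r}")
    return Annotation(vocal_path, speaker_name, language, text)


if __name__ == "__main__":
    example = r"D:\GPT-SoVITS\xxx/xxx.wav|xxx|en|I like playing Genshin."
    print(parse_annotation(example))
```

Splitting with a maximum of three separators keeps any `|` characters inside the transcript intact instead of silently truncating the text.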
 ## Finetune and inference
 ### Open WebUI
 #### Integrated Package Users
 Double-click `go-webui.bat` or use `go-webui.ps`.
 If you want to switch to V1, double-click `go-webui-v1.bat` or use `go-webui-v1.ps`.
 #### Others
 ```bash
 python webui.py <language(optional)>
 ```
 If you want to switch to V1, run:
 ```bash
 python webui.py v1 <language(optional)>
 ```
 Or manually switch the version in the WebUI.
 ### Finetune
 #### Path Auto-filling is now supported
     1. Fill in the audio path
     2. Slice the audio into small chunks
     3. Denoise (optional)
     4. ASR
     5. Proofread the ASR transcriptions
     6. Go to the next tab, then finetune the model
 ### Open Inference WebUI
 #### Integrated Package Users
 Double-click `go-webui-v2.bat` or use `go-webui-v2.ps`, then open the inference WebUI at `1-GPT-SoVITS-TTS/1C-inference`.
 #### Others
 ```bash
 python GPT_SoVITS/inference_webui.py <language(optional)>
 ```
 OR
 ```bash
 python webui.py
 ```
 Then open the inference WebUI at `1-GPT-SoVITS-TTS/1C-inference`.
 ## V2 Release Notes
 New Features:
     1. Support for Korean and Cantonese
     2. An optimized text frontend
     3. Pre-trained model extended from 2k hours to 5k hours
 ## Todo List
 - [ ] **High Priority:**
@ -207,10 +280,10 @@ Use the command line to open the WebUI for UVR5
 ```
 python tools/uvr5/webui.py "<infer_device>" <is_half> <webui_port_uvr5>
 ```
-If you can't open a browser, follow the format below for UVR processing,This is using mdxnet for audio processing
+<!-- If you can't open a browser, follow the format below for UVR processing,This is using mdxnet for audio processing
 ```
 python mdxnet.py --model --input_root --output_vocal --output_ins --agg_level --format --device --is_half_precision 
-```
+``` -->
 This is how the audio segmentation of the dataset is done using the command line
 ```
 python audio_slicer.py \
@ -251,6 +324,9 @@ Special thanks to the following projects and contributors:
 ### Text Frontend for Inference
 - [paddlespeech zh_normalization](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/paddlespeech/t2s/frontend/zh_normalization)
 - [LangSegment](https://github.com/juntaosun/LangSegment)
 - [g2pW](https://github.com/GitYCC/g2pW)
 - [pypinyin-g2pW](https://github.com/mozillazg/pypinyin-g2pW)
 - [paddlespeech g2pw](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/paddlespeech/t2s/frontend/g2pw)
 ### WebUI Tools
 - [ultimatevocalremovergui](https://github.com/Anjok07/ultimatevocalremovergui)
 - [audio-slicer](https://github.com/openvpi/audio-slicer)
--- a/docs/cn/README.md
+++ b/docs/cn/README.md
@ -74,18 +74,11 @@ bash install.sh
 ```bash
 conda create -n GPTSoVits python=3.9
 conda activate GPTSoVits
 pip install -r requirements.txt
 ```
 ### 手动安装
 #### 安装依赖
 ```bash
 pip install -r requirements.txt
 ```
 #### 安装 FFmpeg
 ##### Conda 用户
@ -106,11 +99,19 @@ conda install -c conda-forge 'ffmpeg<7'
 下载并将 [ffmpeg.exe](https://huggingface.co/lj1995/VoiceConversionWebUI/blob/main/ffmpeg.exe) 和 [ffprobe.exe](https://huggingface.co/lj1995/VoiceConversionWebUI/blob/main/ffprobe.exe) 放置在 GPT-SoVITS 根目录下。
 安装 [Visual Studio 2022](https://visualstudio.microsoft.com/zh-hans/downloads/) 环境(仅限韩语TTS)
 ##### MacOS 用户
 ```bash
 brew install ffmpeg
 ```
 #### 安装依赖
 ```bash
 pip install -r requirements.txt
 ```
 ### 在 Docker 中使用
 #### docker-compose.yaml 设置
@ -142,22 +143,28 @@ docker run --rm -it --gpus=all --env=is_half=False --volume=G:\GPT-SoVITS-Docker
 从 [GPT-SoVITS Models](https://huggingface.co/lj1995/GPT-SoVITS) 下载预训练模型，并将它们放置在 `GPT_SoVITS\pretrained_models` 中。
 从 [G2PWModel-v2-onnx.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/g2p/G2PWModel_1.1.zip) 下载G2PW模型,并将它们解压重命名为`G2PWModel` 后放置在 `GPT_SoVITS\text` 中。（仅限中文TTS）
 对于 UVR5（人声/伴奏分离和混响移除，附加），从 [UVR5 Weights](https://huggingface.co/lj1995/VoiceConversionWebUI/tree/main/uvr5_weights) 下载模型，并将它们放置在 `tools/uvr5/uvr5_weights` 中。
 中国地区用户可以进入以下链接并点击“下载副本”下载以上两个模型（如果下载时遇到错误，请退出登录）：
- [GPT-SoVITS Models](https://www.icloud.com.cn/iclouddrive/056y_Xog_HXpALuVUjscIwTtg#GPT-SoVITS_Models)
+- [GPT-SoVITS Models](https://www.icloud.com/iclouddrive/044boFMiOHHt22SNr-c-tirbA#pretrained_models)
 - [UVR5 Weights](https://www.icloud.com.cn/iclouddrive/0bekRKDiJXboFhbfm3lM2fVbA#UVR5_Weights)
 - [G2PWModel_1.1.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/g2p/G2PWModel_1.1.zip)（下载G2PW模型,并将它们解压重命名为 `G2PWModel` 后放置在 `GPT_SoVITS\text` 中）
 对于中文自动语音识别（附加），从 [Damo ASR Model](https://modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/files), [Damo VAD Model](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/files), 和 [Damo Punc Model](https://modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/files) 下载模型，并将它们放置在 `tools/asr/models` 中。
 或者从[FunASR模型链接](https://www.icloud.com/iclouddrive/0b52_7SQWYr75kHkPoPXgpeQA#models)下载模型，并将它们解压后替换 `tools/asr/models` 。（点击“下载副本”，如果下载时遇到错误，请退出登录）
 对于英语与日语自动语音识别（附加）,从 [Faster Whisper Large V3](https://huggingface.co/Systran/faster-whisper-large-v3) 下载模型，并将它们放置在 `tools/asr/models` 中。 此外，[其他模型](https://huggingface.co/Systran)可能具有类似效果，但占用更小的磁盘空间。
 中国地区用户可以通过以下链接下载：
- [Faster Whisper Large V3](https://www.icloud.com/iclouddrive/00bUEp9_mcjMq_dhHu_vrAFDQ#faster-whisper-large-v3)(点击“下载副本”，如果下载时遇到错误，请退出登录）
+- [Faster Whisper Large V3](https://www.icloud.com/iclouddrive/00bUEp9_mcjMq_dhHu_vrAFDQ#faster-whisper-large-v3)（点击“下载副本”，如果下载时遇到错误，请退出登录）
- [Faster Whisper Large V3](https://hf-mirror.com/Systran/faster-whisper-large-v3)(Hugging Face镜像站)
+- [Faster Whisper Large V3](https://hf-mirror.com/Systran/faster-whisper-large-v3)（Hugging Face镜像站）
 ## 数据集格式
@ -170,16 +177,84 @@ vocal_path|speaker_name|language|text
 语言字典：
- 'zh': Chinese
+- 'zh': 中文
- 'ja': Japanese
+- 'ja': 日语
- 'en': English
+- 'en': 英语
 - 'ko': 韩语
 - 'yue': 粤语
 示例：
 ```
-D:\GPT-SoVITS\xxx/xxx.wav|xxx|en|I like playing Genshin.
+D:\GPT-SoVITS\xxx/xxx.wav|xxx|zh|我爱玩原神。
 ```
 ## 微调与推理
 ### 打开WebUI
 #### 整合包用户
 双击`go-webui.bat`或者使用`go-webui.ps`
 若想使用V1,则双击`go-webui-v1.bat`或者使用`go-webui-v1.ps`
 #### 其他
 ```bash
 python webui.py <language(optional)>
 ```
 若想使用V1,则
 ```bash
 python webui.py v1 <language(optional)>
 ```
 或者在webUI内动态切换
 ### 微调
 #### 现已支持自动填充路径
    1.填入训练音频路径
    2.切割音频
    3.进行降噪(可选)
    4.进行ASR
    5.校对标注
    6.前往下一个窗口,点击训练
 ### 打开推理WebUI
 #### 整合包用户
 双击 `go-webui.bat` 或者使用 `go-webui.ps` ,然后在 `1-GPT-SoVITS-TTS/1C-推理` 中打开推理webUI
 #### 其他
 ```bash
 python GPT_SoVITS/inference_webui.py <language(optional)>
 ```
 或者
 ```bash
 python webui.py
 ```
 然后在 `1-GPT-SoVITS-TTS/1C-推理` 中打开推理webUI
 ## V2发布说明
 新特性:
    1.支持韩语及粤语
    2.更好的文本前端
    3.底模由2k小时扩展至5k小时
 ## 待办事项清单
 - [ ] **高优先级：**
@ -205,10 +280,10 @@ D:\GPT-SoVITS\xxx/xxx.wav|xxx|en|I like playing Genshin.
 ````
 python tools/uvr5/webui.py "<infer_device>" <is_half> <webui_port_uvr5>
 ````
-如果打不开浏览器，请按照下面的格式进行UVR处理，这是使用mdxnet进行音频处理的方式
+<!-- 如果打不开浏览器，请按照下面的格式进行UVR处理，这是使用mdxnet进行音频处理的方式
 ````
 python mdxnet.py --model --input_root --output_vocal --output_ins --agg_level --format --device --is_half_precision 
-````
+```` -->
 这是使用命令行完成数据集的音频切分的方式
 ````
 python audio_slicer.py \
@ -249,6 +324,9 @@ python ./tools/asr/fasterwhisper_asr.py -i <input> -o <output> -l <language> -p
 ### 推理用文本前端
 - [paddlespeech zh_normalization](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/paddlespeech/t2s/frontend/zh_normalization)
 - [LangSegment](https://github.com/juntaosun/LangSegment)
 - [g2pW](https://github.com/GitYCC/g2pW)
 - [pypinyin-g2pW](https://github.com/mozillazg/pypinyin-g2pW)
 - [paddlespeech g2pw](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/paddlespeech/t2s/frontend/g2pw)
 ### WebUI 工具
 - [ultimatevocalremovergui](https://github.com/Anjok07/ultimatevocalremovergui)
 - [audio-slicer](https://github.com/openvpi/audio-slicer)
--- a/requirements.txt
+++ b/requirements.txt
@ -33,5 +33,4 @@ g2pk2
 ko_pron
 opencc; sys_platform != 'linux'
 opencc==1.1.1; sys_platform == 'linux'
 eunjeon; sys_platform == 'win32'
 python_mecab_ko; sys_platform != 'win32'
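The `sys_platform` environment markers above make pip choose different packages per operating system. Since the `sys_platform` marker corresponds to Python's `sys.platform`, the selection can be sketched in plain Python as below. The two helper functions are hypothetical and exist only to illustrate the markers; pip evaluates the markers itself and does not run anything like this.

```python
import sys


def korean_tokenizer_requirement(platform: str = sys.platform) -> str:
    """Mirror 'eunjeon; sys_platform == win32' vs 'python_mecab_ko; sys_platform != win32'."""
    return "eunjeon" if platform == "win32" else "python_mecab_ko"


def opencc_requirement(platform: str = sys.platform) -> str:
    """Mirror 'opencc==1.1.1; sys_platform == linux' vs 'opencc; sys_platform != linux'."""
    return "opencc==1.1.1" if platform == "linux" else "opencc"


if __name__ == "__main__":
    for name in ("win32", "linux", "darwin"):
        print(name, korean_tokenizer_requirement(name), opencc_requirement(name))
```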
--- a/tools/i18n/locale/en_US.json
+++ b/tools/i18n/locale/en_US.json
@ -68,6 +68,7 @@
    "也可批量输入音频文件, 二选一, 优先读文件夹": "Multiple audio files can also be imported. If a folder path exists, this input is ignored.",
    "人声伴奏分离批量处理， 使用UVR5模型。": "Batch processing for vocal and instrumental separation, using the UVR5 model.",
    "人声提取激进程度": "Vocal extraction aggressiveness",
    "以下文件或文件夹不存在:": "No Such File or Folder:",
    "以下模型不存在:": "No Such Model:",
    "伴奏人声分离&去混响&去回声": "Vocals/Accompaniment Separation & Reverberation Removal",
    "使用无参考文本模式时建议使用微调的GPT，听不清参考音频说的啥(不晓得写啥)可以开。<br>开启后无视填写的参考文本。": "When using the no-reference text mode, it is recommended to use a fine-tuned GPT. If the reference audio is unclear and you don't know what to write, you can enable this feature, which will ignore the reference text you've entered.",
@ -151,6 +152,8 @@
    "语音切割进程输出信息": "Audio slicer output log",
    "语音降噪进程输出信息": "Voice Denoiser Process Output Information",
    "请上传3~10秒内参考音频，超过会报错！": "Please upload a reference audio within the 3-10 second range; if it exceeds this duration, it will raise errors.",
    "请上传参考音频": "Please Upload the Reference Audio",
    "请填入推理文本": "Please Fill in the Target Text",
    "请输入有效文本": "Please enter valid text.",
    "转换": "Convert",
    "输入待处理音频文件夹路径": "Enter the path of the audio folder to be processed:",
--- a/tools/i18n/locale/es_ES.json
+++ b/tools/i18n/locale/es_ES.json
@ -68,6 +68,7 @@
    "也可批量输入音频文件, 二选一, 优先读文件夹": "También se pueden ingresar archivos de audio por lotes, seleccionar uno, prioridad para leer carpetas",
    "人声伴奏分离批量处理， 使用UVR5模型。": "Procesamiento por lotes de separación de voz y acompañamiento utilizando el modelo UVR5",
    "人声提取激进程度": "Nivel de agresividad en la extracción de voz",
    "以下文件或文件夹不存在:": "No Existe Tal Archivo o Carpeta:",
    "以下模型不存在:": "No Existe tal Modelo:",
    "伴奏人声分离&去混响&去回声": "Separación de acompañamiento y voz principal y eliminación de reverberación y eco",
    "使用无参考文本模式时建议使用微调的GPT，听不清参考音频说的啥(不晓得写啥)可以开。<br>开启后无视填写的参考文本。": "Se recomienda usar un GPT ajustado en modo sin texto de referencia; habilítelo si no puede entender el audio de referencia (si no sabe qué escribir). Una vez habilitado, ignorará el texto de referencia ingresado.",
@ -151,6 +152,8 @@
    "语音切割进程输出信息": "Información de salida del proceso de división de voz",
    "语音降噪进程输出信息": "Información de salida del proceso de reducción de ruido de voz",
    "请上传3~10秒内参考音频，超过会报错！": "Por favor, suba un audio de referencia de entre 3 y 10 segundos, ¡más de eso causará un error!",
    "请上传参考音频": "Por Favor, Suba el Audio de Referencia",
    "请填入推理文本": "Por Favor, Ingrese el Texto Objetivo",
    "请输入有效文本": "Por favor, introduzca un texto válido",
    "转换": "Convertir",
    "输入待处理音频文件夹路径": "Ingrese la ruta de la carpeta de audio a procesar",
--- a/tools/i18n/locale/fr_FR.json
+++ b/tools/i18n/locale/fr_FR.json
@ -68,6 +68,7 @@
    "也可批量输入音频文件, 二选一, 优先读文件夹": "Également possible d'entrer en lot des fichiers audio, au choix, privilégiez la lecture du dossier",
    "人声伴奏分离批量处理， 使用UVR5模型。": "Traitement par lot de séparation voix-accompagnement en utilisant le modèle UVR5.",
    "人声提取激进程度": "Degré d'extraction des voix",
    "以下文件或文件夹不存在:": "Aucun fichier ou dossier de ce type:",
    "以下模型不存在:": "Aucun Modèle de ce Type:",
    "伴奏人声分离&去混响&去回声": "Séparation de la voix et de l'accompagnement, suppression de la réverbération et de l'écho",
    "使用无参考文本模式时建议使用微调的GPT，听不清参考音频说的啥(不晓得写啥)可以开。<br>开启后无视填写的参考文本。": "Il est recommandé d'utiliser GPT finement ajusté en mode sans texte de référence. Si vous ne comprenez pas ce que dit l'audio de référence (vous ne savez pas quoi écrire), vous pouvez l'activer ; une fois activé, ignorez le texte de référence saisi.",
@ -151,6 +152,8 @@
    "语音切割进程输出信息": "Informations de processus de découpage vocal",
    "语音降噪进程输出信息": "Informations de sortie du processus de réduction du bruit vocal",
    "请上传3~10秒内参考音频，超过会报错！": "Veuillez télécharger une référence audio de 3 à 10 secondes ; les fichiers plus longs généreront une erreur!",
    "请上传参考音频": "Veuillez télécharger l'audio de référence",
    "请填入推理文本": "Veuillez remplir le texte cible",
    "请输入有效文本": "Veuillez entrer un texte valide",
    "转换": "Conversion",
    "输入待处理音频文件夹路径": "Entrez le chemin du dossier audio à traiter",
--- a/tools/i18n/locale/it_IT.json
+++ b/tools/i18n/locale/it_IT.json
@ -68,6 +68,7 @@
    "也可批量输入音频文件, 二选一, 优先读文件夹": "È possibile anche inserire file audio in batch, una delle due opzioni, con priorità alla lettura della cartella",
    "人声伴奏分离批量处理， 使用UVR5模型。": "Separazione voce-accompagnamento in batch, utilizza il modello UVR5.",
    "人声提取激进程度": "Grado di aggressività dell'estrazione vocale",
    "以下文件或文件夹不存在:": "Nessun file o cartella trovati:",
    "以下模型不存在:": "Nessun Modello del Genere:",
    "伴奏人声分离&去混响&去回声": "Separazione tra accompagnamento e voce & Rimozione del riverbero & Rimozione dell'eco",
    "使用无参考文本模式时建议使用微调的GPT，听不清参考音频说的啥(不晓得写啥)可以开。<br>开启后无视填写的参考文本。": "Si consiglia di utilizzare GPT fine-tuned quando si utilizza la modalità senza testo di riferimento. Se non si riesce a capire cosa dice l'audio di riferimento (e non si sa cosa scrivere), è possibile abilitare questa opzione, ignorando il testo di riferimento inserito.",
@ -151,6 +152,8 @@
    "语音切割进程输出信息": "Informazioni sull'output del processo di segmentazione vocale",
    "语音降噪进程输出信息": "Informazioni sull'output del processo di riduzione del rumore vocale",
    "请上传3~10秒内参考音频，超过会报错！": "Carica un audio di riferimento della durata compresa tra 3 e 10 secondi. Superiore a questo, verrà generato un errore!",
    "请上传参考音频": "Si prega di caricare l'audio di riferimento",
    "请填入推理文本": "Si prega di inserire il testo di destinazione",
    "请输入有效文本": "Inserisci un testo valido",
    "转换": "Converti",
    "输入待处理音频文件夹路径": "Inserisci il percorso della cartella dei file audio da elaborare",
--- a/tools/i18n/locale/ja_JP.json
+++ b/tools/i18n/locale/ja_JP.json
@ -68,6 +68,7 @@
    "也可批量输入音频文件, 二选一, 优先读文件夹": "複数のオーディオファイルもインポートできます。フォルダパスが存在する場合、この入力は無視されます。",
    "人声伴奏分离批量处理， 使用UVR5模型。": "人声と伴奏の分離をバッチ処理で行い、UVR5モデルを使用します。",
    "人声提取激进程度": "人声抽出の積極性",
    "以下文件或文件夹不存在:": "そのようなファイルやフォルダは存在しません:",
    "以下模型不存在:": "モデルが存在しません:",
    "伴奏人声分离&去混响&去回声": "ボーカル/伴奏の分離と残響の除去",
    "使用无参考文本模式时建议使用微调的GPT，听不清参考音频说的啥(不晓得写啥)可以开。<br>开启后无视填写的参考文本。": "参考テキストなしモードを使用する場合は、微調整されたGPTの使用をお勧めします。参考音声が聞き取れない場合（何を書けば良いかわからない場合）は、有効にすると、入力した参考テキストを無視します。",
@ -151,6 +152,8 @@
    "语音切割进程输出信息": "音声分割プロセスの出力情報",
    "语音降噪进程输出信息": "音声ノイズ除去プロセスの出力情報",
    "请上传3~10秒内参考音频，超过会报错！": "3～10秒以内の参照音声をアップロードしてください。それを超えるとエラーが発生します！",
    "请上传参考音频": "リファレンスオーディオをアップロードしてください",
    "请填入推理文本": "ターゲットテキストを入力してください",
    "请输入有效文本": "有効なテキストを入力してください",
    "转换": "変換",
    "输入待处理音频文件夹路径": "処理するオーディオフォルダのパスを入力してください:",
--- a/tools/i18n/locale/ko_KR.json
+++ b/tools/i18n/locale/ko_KR.json
@ -68,6 +68,7 @@
    "也可批量输入音频文件, 二选一, 优先读文件夹": "오디오 파일을 일괄로 입력할 수도 있습니다. 둘 중 하나를 선택하고 폴더를 읽기를 우선합니다.",
    "人声伴奏分离批量处理， 使用UVR5模型。": "보컬과 반주 분리 배치 처리, UVR5 모델 사용.",
    "人声提取激进程度": "보컬 추출의 공격성",
    "以下文件或文件夹不存在:": "해당 파일 또는 폴더가 존재하지 않습니다:",
    "以下模型不存在:": "해당 모델이 존재하지 않습니다:",
    "伴奏人声分离&去混响&去回声": "반주 및 보컬 분리 & 리버브 제거 & 에코 제거",
    "使用无参考文本模式时建议使用微调的GPT，听不清参考音频说的啥(不晓得写啥)可以开。<br>开启后无视填写的参考文本。": "참고 텍스트가 없을 때는 미세 조정된 GPT를 사용하는 것이 좋습니다. 참고 오디오에서 무엇을 말하는지 잘 들리지 않으면 이 모드를 켜서 입력한 참고 텍스트를 무시할 수 있습니다.",
@ -151,6 +152,8 @@
    "语音切割进程输出信息": "음성 분리 프로세스 출력 정보",
    "语音降噪进程输出信息": "음성 노이즈 제거 프로세스 출력 정보",
    "请上传3~10秒内参考音频，超过会报错！": "3~10초 이내의 참고 오디오를 업로드하십시오. 초과하면 오류가 발생합니다!",
    "请上传参考音频": "참고 오디오를 업로드하세요",
    "请填入推理文本": "목표 텍스트를 입력하세요",
    "请输入有效文本": "유효한 텍스트를 입력하세요",
    "转换": "변환",
    "输入待处理音频文件夹路径": "처리 대기 중인 오디오 폴더 경로 입력",
--- a/tools/i18n/locale/pt_BR.json
+++ b/tools/i18n/locale/pt_BR.json
@ -68,6 +68,7 @@
    "也可批量输入音频文件, 二选一, 优先读文件夹": "Também é possível inserir arquivos de áudio em lote; escolha uma opção, preferencialmente leia a pasta.",
    "人声伴奏分离批量处理， 使用UVR5模型。": "Processamento em lote de separação de voz e acompanhamento, usando o modelo UVR5.",
    "人声提取激进程度": "Grau de agressividade da extração de voz",
    "以下文件或文件夹不存在:": "Nenhum Arquivo ou Pasta Encontrado:",
    "以下模型不存在:": "Nenhum Modelo Tal:",
    "伴奏人声分离&去混响&去回声": "Separação de acompanhamento e voz & remoção de reverberação & remoção de eco",
    "使用无参考文本模式时建议使用微调的GPT，听不清参考音频说的啥(不晓得写啥)可以开。<br>开启后无视填写的参考文本。": "Ao usar o modo sem texto de referência, recomenda-se usar um GPT ajustado. Se não conseguir ouvir claramente o áudio de referência (não sabe o que escrever), você pode ativar o modo e ignorar o texto de referência fornecido.",
@ -151,6 +152,8 @@
    "语音切割进程输出信息": "Informações de saída do processo de corte de voz",
    "语音降噪进程输出信息": "Informações de saída do processo de redução de ruído de voz",
    "请上传3~10秒内参考音频，超过会报错！": "Por favor, faça upload de um áudio de referência com duração entre 3 e 10 segundos. Áudios fora dessa faixa causarão erro!",
    "请上传参考音频": "Por Favor, Carregue o Áudio de Referência",
    "请填入推理文本": "Por Favor, Preencha o Texto de Inferência",
    "请输入有效文本": "Por favor, insira um texto válido",
    "转换": "Converter",
    "输入待处理音频文件夹路径": "Caminho da pasta de arquivos de áudio a ser processados",
--- a/tools/i18n/locale/ru_RU.json
+++ b/tools/i18n/locale/ru_RU.json
@ -68,6 +68,7 @@
    "也可批量输入音频文件, 二选一, 优先读文件夹": "Можно также импортировать несколько аудиофайлов. Если путь к папке существует, то этот ввод игнорируется.",
    "人声伴奏分离批量处理， 使用UVR5模型。": "Обработка разделения вокала и аккомпанемента пакетно с использованием модели UVR5.",
    "人声提取激进程度": "Степень агрессивности извлечения вокала",
    "以下文件或文件夹不存在:": "Нет такого файла или папки:",
    "以下模型不存在:": "Эта модель не существует:",
    "伴奏人声分离&去混响&去回声": "Разделение вокала/аккомпанемента и удаление эхо",
    "使用无参考文本模式时建议使用微调的GPT，听不清参考音频说的啥(不晓得写啥)可以开。<br>开启后无视填写的参考文本。": "При использовании режима без референсного текста рекомендуется использовать настроенную модель GPT. Если не удается разобрать, что говорит референсное аудио (не знаете, что писать), можете включить этот режим, и он проигнорирует введенный референсный текст.",
@ -151,6 +152,8 @@
    "语音切割进程输出信息": "Информация о процессе разрезания речи",
    "语音降噪进程输出信息": "Информация о процессе шумоподавления",
    "请上传3~10秒内参考音频，超过会报错！": "Пожалуйста, загрузите референтное аудио длительностью от 3 до 10 секунд, иначе будет ошибка!",
    "请上传参考音频": "Пожалуйста, загрузите эталонное аудио",
    "请填入推理文本": "Пожалуйста, введите целевой текст",
    "请输入有效文本": "Введите действительный текст",
    "转换": "Преобразовать",
    "输入待处理音频文件夹路径": "Путь к папке с аудиофайлами для обработки:",
--- a/tools/i18n/locale/tr_TR.json
+++ b/tools/i18n/locale/tr_TR.json
@ -68,6 +68,7 @@
    "也可批量输入音频文件, 二选一, 优先读文件夹": "Ses dosyaları ayrıca toplu olarak, iki seçimle, öncelikli okuma klasörüyle içe aktarılabilir",
    "人声伴奏分离批量处理， 使用UVR5模型。": "Vokal ve akor ayırma toplu işleme, UVR5 modelini kullanarak.",
    "人声提取激进程度": "Vokal çıkarma agresiflik derecesi",
    "以下文件或文件夹不存在:": "Böyle Bir Dosya veya Klasör Yok:",
    "以下模型不存在:": "Böyle bir model yok:",
    "伴奏人声分离&去混响&去回声": "Vokal/Müzik Ayrıştırma ve Yankı Giderme",
    "使用无参考文本模式时建议使用微调的GPT，听不清参考音频说的啥(不晓得写啥)可以开。<br>开启后无视填写的参考文本。": "Referans metin modu olmadan kullanıldığında, referans sesi net duyulmadığında (ne yazılacağı bilinmiyorsa) açık bırakılması önerilir, bu durumda girilen referans metni göz ardı edilir.",
@ -151,6 +152,8 @@
    "语音切割进程输出信息": "Ses kesim işlemi çıktı bilgisi",
    "语音降噪进程输出信息": "Gürültü azaltma işlemi çıktı bilgisi",
    "请上传3~10秒内参考音频，超过会报错！": "Lütfen 3~10 saniye arasında bir referans ses dosyası yükleyin, aşım durumunda hata verilecektir!",
    "请上传参考音频": "Lütfen Referans Sesi Yükleyin",
    "请填入推理文本": "Lütfen Hedef Metni Girin",
    "请输入有效文本": "Geçerli metin girin",
    "转换": "Dönüştür",
    "输入待处理音频文件夹路径": "İşlenecek ses klasörünün yolunu girin:",
--- a/tools/i18n/locale/zh_CN.json
+++ b/tools/i18n/locale/zh_CN.json
@ -68,6 +68,7 @@
    "也可批量输入音频文件, 二选一, 优先读文件夹": "也可批量输入音频文件, 二选一, 优先读文件夹",
    "人声伴奏分离批量处理， 使用UVR5模型。": "人声伴奏分离批量处理， 使用UVR5模型。",
    "人声提取激进程度": "人声提取激进程度",
    "以下文件或文件夹不存在:": "以下文件或文件夹不存在:",
    "以下模型不存在:": "以下模型不存在:",
    "伴奏人声分离&去混响&去回声": "伴奏人声分离&去混响&去回声",
    "使用无参考文本模式时建议使用微调的GPT，听不清参考音频说的啥(不晓得写啥)可以开。<br>开启后无视填写的参考文本。": "使用无参考文本模式时建议使用微调的GPT，听不清参考音频说的啥(不晓得写啥)可以开。<br>开启后无视填写的参考文本。",
@ -151,6 +152,8 @@
    "语音切割进程输出信息": "语音切割进程输出信息",
    "语音降噪进程输出信息": "语音降噪进程输出信息",
    "请上传3~10秒内参考音频，超过会报错！": "请上传3~10秒内参考音频，超过会报错！",
    "请上传参考音频": "请上传参考音频",
    "请填入推理文本": "请填入推理文本",
    "请输入有效文本": "请输入有效文本",
    "转换": "转换",
    "输入待处理音频文件夹路径": "输入待处理音频文件夹路径",
--- a/tools/i18n/locale/zh_HK.json
+++ b/tools/i18n/locale/zh_HK.json
@ -68,6 +68,7 @@
    "也可批量输入音频文件, 二选一, 优先读文件夹": "也可批量输入音频文件, 二选一, 优先读文件夹",
    "人声伴奏分离批量处理， 使用UVR5模型。": "人聲伴奏分離批量處理， 使用UVR5模型。",
    "人声提取激进程度": "人聲提取激進程度",
    "以下文件或文件夹不存在:": "沒有這樣的檔案或文件夾:",
    "以下模型不存在:": "以下模型不存在:",
    "伴奏人声分离&去混响&去回声": "伴奏人聲分離&去混響&去回聲",
    "使用无参考文本模式时建议使用微调的GPT，听不清参考音频说的啥(不晓得写啥)可以开。<br>开启后无视填写的参考文本。": "使用無參考文本模式時建議使用微調的GPT，聽不清參考音頻說的是啥（不知道寫啥）可以開啟，開啟後無視填寫的參考文本。",
@ -151,6 +152,8 @@
    "语音切割进程输出信息": "語音切割進程輸出信息",
    "语音降噪进程输出信息": "語音降噪進程輸出信息",
    "请上传3~10秒内参考音频，超过会报错！": "請上傳3~10秒內參考音頻，超過會報錯！",
    "请上传参考音频": "請上傳參考音頻",
    "请填入推理文本": "請填入推理文本",
    "请输入有效文本": "請輸入有效文本",
    "转换": "轉換",
    "输入待处理音频文件夹路径": "輸入待處理音頻資料夾路徑",
--- a/tools/i18n/locale/zh_SG.json
+++ b/tools/i18n/locale/zh_SG.json
@ -68,6 +68,7 @@
    "也可批量输入音频文件, 二选一, 优先读文件夹": "也可批量输入音频文件, 二选一, 优先读文件夹",
    "人声伴奏分离批量处理， 使用UVR5模型。": "人聲伴奏分離批量處理， 使用UVR5模型。",
    "人声提取激进程度": "人聲提取激進程度",
    "以下文件或文件夹不存在:": "沒有這樣的檔案或文件夾:",
    "以下模型不存在:": "以下模型不存在",
    "伴奏人声分离&去混响&去回声": "伴奏人聲分離&去混響&去回聲",
    "使用无参考文本模式时建议使用微调的GPT，听不清参考音频说的啥(不晓得写啥)可以开。<br>开启后无视填写的参考文本。": "使用無參考文本模式時建議使用微調的GPT，聽不清參考音頻說的啥(不曉得寫啥)可以開，開啟後無視填寫的參考文本。",
@ -151,6 +152,8 @@
    "语音切割进程输出信息": "語音切割進程輸出資訊",
    "语音降噪进程输出信息": "語音降噪進程輸出資訊",
    "请上传3~10秒内参考音频，超过会报错！": "請上傳3~10秒內參考音頻，超過會報錯！",
    "请上传参考音频": "請上傳參考音頻",
    "请填入推理文本": "請填入推理文本",
    "请输入有效文本": "請輸入有效文本",
    "转换": "轉換",
    "输入待处理音频文件夹路径": "輸入待處理音頻資料夾路徑",
--- a/tools/i18n/locale/zh_TW.json
+++ b/tools/i18n/locale/zh_TW.json
@ -68,6 +68,7 @@
    "也可批量输入音频文件, 二选一, 优先读文件夹": "也可批量输入音频文件, 二选一, 优先读文件夹",
    "人声伴奏分离批量处理， 使用UVR5模型。": "人聲伴奏分離批量處理， 使用UVR5模型。",
    "人声提取激进程度": "人聲提取激進程度",
    "以下文件或文件夹不存在:": "沒有這樣的檔案或文件夾:",
    "以下模型不存在:": "以下模型不存在:",
    "伴奏人声分离&去混响&去回声": "伴奏人聲分離&去混響&去回聲",
    "使用无参考文本模式时建议使用微调的GPT，听不清参考音频说的啥(不晓得写啥)可以开。<br>开启后无视填写的参考文本。": "使用無參考文本模式時建議使用微調的GPT，聽不清參考音頻說的啥(不曉得寫啥)可以開，開啟後無視填寫的參考文本。",
@ -151,6 +152,8 @@
    "语音切割进程输出信息": "語音切割進程輸出資訊",
    "语音降噪进程输出信息": "語音降噪進程輸出資訊",
    "请上传3~10秒内参考音频，超过会报错！": "請上傳3~10秒內參考音頻，超過會報錯！",
    "请上传参考音频": "請上傳參考音頻",
    "请填入推理文本": "請填入推理文本",
    "请输入有效文本": "請輸入有效文本",
    "转换": "轉換",
    "输入待处理音频文件夹路径": "輸入待處理音頻資料夾路徑",
--- a/webui.py
+++ b/webui.py
@ -52,12 +52,9 @@ from subprocess import Popen
 import signal
 from config import python_exec,infer_device,is_half,exp_root,webui_port_main,webui_port_infer_tts,webui_port_uvr5,webui_port_subfix,is_share
 from tools.i18n.i18n import I18nAuto, scan_language_list
-language=sys.argv[-1] if sys.argv[-1] in scan_language_list() else "auto"
+language=sys.argv[-1] if sys.argv[-1] in scan_language_list() else "Auto"
 os.environ["language"]=language
-if language != 'auto':
-    i18n = I18nAuto(language=language)
-else:
-    i18n = I18nAuto()
+i18n = I18nAuto(language=language)
 from scipy.io import wavfile
 from tools.my_utils import load_audio
 from multiprocessing import cpu_count
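The locale files touched in this commit map each Chinese source string to its translation, and `i18n(...)` calls in the WebUI look strings up by that key. A minimal sketch of such a lookup is given below. It is not the repository's `I18nAuto` class: `make_translator` is a hypothetical helper, the sample table contains two entries copied from `en_US.json`, and falling back to the untranslated key when an entry is missing is an assumption about the desired behavior.

```python
import json


def make_translator(locale_json: str):
    """Build a lookup function from the text of a locale JSON file."""
    table = json.loads(locale_json)

    def translate(key: str) -> str:
        # Assumed behavior: return the key itself when no translation exists.
        return table.get(key, key)

    return translate


if __name__ == "__main__":
    sample = '{"转换": "Convert", "请输入有效文本": "Please enter valid text."}'
    i18n = make_translator(sample)
    print(i18n("转换"))
    print(i18n("以下模型不存在:"))  # not in the sample table, so the key comes back
```

With a key fallback like this, a string that is missing from one locale file still renders in the source language instead of raising an error, which is why newly added keys should be added to every locale file in the same commit.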
@ -440,7 +437,7 @@ def open1a(inp_text,inp_wav_dir,exp_name,gpu_numbers,bert_pretrained_dir):
    global ps1a
    inp_text = my_utils.clean_path(inp_text)
    inp_wav_dir = my_utils.clean_path(inp_wav_dir)
-    check_for_exists([inp_text,inp_wav_dir])
+    check_for_exists([inp_text,inp_wav_dir], is_dataset_processing=True)
    if (ps1a == []):
        opt_dir="%s/%s"%(exp_root,exp_name)
        config={
@ -502,7 +499,7 @@ def open1b(inp_text,inp_wav_dir,exp_name,gpu_numbers,ssl_pretrained_dir):
    global ps1b
    inp_text = my_utils.clean_path(inp_text)
    inp_wav_dir = my_utils.clean_path(inp_wav_dir)
-    check_for_exists([inp_text,inp_wav_dir])
+    check_for_exists([inp_text,inp_wav_dir], is_dataset_processing=True)
    if (ps1b == []):
        config={
            "inp_text":inp_text,
@ -550,7 +547,7 @@ ps1c=[]
 def open1c(inp_text,exp_name,gpu_numbers,pretrained_s2G_path):
    global ps1c
    inp_text = my_utils.clean_path(inp_text)
-    check_for_exists([inp_text])
+    check_for_exists([inp_text,''], is_dataset_processing=True)
    if (ps1c == []):
        opt_dir="%s/%s"%(exp_root,exp_name)
        config={
@ -746,8 +743,8 @@ def switch_version(version_):
        gr.Warning(i18n(f'未下载{version.upper()}模型'))
    return  {'__type__':'update', 'value':pretrained_sovits_name[-int(version[-1])+2]}, {'__type__':'update', 'value':pretrained_sovits_name[-int(version[-1])+2].replace("s2G","s2D")}, {'__type__':'update', 'value':pretrained_gpt_name[-int(version[-1])+2]}, {'__type__':'update', 'value':pretrained_gpt_name[-int(version[-1])+2]}, {'__type__':'update', 'value':pretrained_sovits_name[-int(version[-1])+2]}
-def check_for_exists(file_list=[],is_train=False):
+def check_for_exists(file_list=None,is_train=False,is_dataset_processing=False):
-    _=[]
+    missing_files=[]
    if is_train == True and file_list:
        file_list.append(os.path.join(file_list[0],'2-name2text.txt'))
        file_list.append(os.path.join(file_list[0],'3-bert'))
@ -756,24 +753,28 @@ def check_for_exists(file_list=[],is_train=False):
        file_list.append(os.path.join(file_list[0],'6-name2semantic.tsv'))
    for file in file_list:
        if os.path.exists(file):pass
-        else:_.append(file)
+        else:missing_files.append(file)
-    if _:
+    if missing_files:
        if is_train:
-            for i in _:
+            for missing_file in missing_files:
-                if i != '':
+                if missing_file != '':
-                    gr.Warning(i)
+                    gr.Warning(missing_file)
            gr.Warning(i18n('以下文件或文件夹不存在:'))
        else:
-            if len(_) == 1:
+            for missing_file in missing_files:
-                if _[0]:
+                if missing_file != '':
-                    gr.Warning(i)
+                    gr.Warning(missing_file)
-                gr.Warning(i18n('文件或文件夹不存在:'))
+            if file_list[-1]==[''] and is_dataset_processing:
+                pass
             else:
-                for i in _:
-                    if i != '':
-                        gr.Warning(i)
                 gr.Warning(i18n('以下文件或文件夹不存在:'))
+if os.path.exists('GPT_SoVITS/text/G2PWModel'):...
+else:
+    cmd = '"%s" GPT_SoVITS/download.py'%python_exec
+    p = Popen(cmd, shell=True)
+    p.wait()
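Two details of the `check_for_exists` change above may deserve a second look. With `file_list=None` as the default, the later `for file in file_list` loop would fail if the function were ever called without a list. Also, `file_list[-1]==['']` compares a path string with a list, which is never equal in Python, so comparing against `''` may be what was intended. The sketch below shows one way to collect missing paths without mutating the caller's list. `find_missing` is a hypothetical helper, it leaves out the Gradio warnings entirely, and skipping empty entries is an assumption based on the `missing_file != ''` checks in the diff.

```python
import os


def find_missing(paths=None, skip_empty=True):
    """Return the entries of `paths` that do not exist on disk."""
    # Copy the input so neither a shared default nor the caller's list is mutated.
    paths = list(paths or [])
    missing = []
    for path in paths:
        if skip_empty and path == "":
            # An empty field means "not provided", so it is not reported.
            continue
        if not os.path.exists(path):
            missing.append(path)
    return missing


if __name__ == "__main__":
    # The current directory exists; the second path is assumed not to.
    print(find_missing([os.getcwd(), "no/such/dir/for/this/sketch", ""]))
```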
 with gr.Blocks(title="GPT-SoVITS WebUI") as app:
    gr.Markdown(
        value=