GPT-SoVITS/tools/asr/fasterwhisper_asr.py
ChasonJiang 29f22115fb
[fast_inference] 回退策略,减少padding影响,开放选项,同步代码 (#986)
* Update README

* Optimize-English-G2P

* docs: change akward expression

* docs: update Changelog_KO.md

* Fix CN punc in EN,add 's match

* Adjust normalize and g2p logic

* Update zh_CN.json

* Update README (#827)

Update README.md
Update some outdated file paths and commands

* 修复英文多音字,调整字典热加载,新增姓名匹配 (#869)

* Fix homograph dict

* Add JSON in dict

* Adjust hot dict to hot reload

* Add English name dict

* Adjust get name dict logic

* Make API Great Again (#894)

* Add zh/jp/en mix

* Optimize code readability and formatted output.

* Try OGG streaming

* Add stream mode arg

* Add media type arg

* Add cut punc arg

* Eliminate punc risk

* Update README (#895)

* Update README

* Update README

* update README

* update README

* fix typo s/Licence /License (#904)

* fix reformat cmd (#917)

Co-authored-by: starylan <starylan@outlook.com>

* Update README.md

* Normalize chinese arithmetic operations (#947)

* 改变训练和推理时的mask策略,以修复当batch_size>1时,产生的复读现象

* 同步main分支代码,增加“保持随机”选项

* 在colab中运行colab_webui.ipynb发生的uvr5模型缺失问题 (#968)

在colab中使用git下载uvr5模型时报错:
fatal: destination path 'uvr5_weights' already exists and is not an empty directory.
通过在下载前将原本从本仓库下载的uvr5_weights文件夹删除可以解决问题。

* [ASR] 修复FasterWhisper遍历输入路径失败 (#956)

* remove glob

* rename

* reset mirror pos

* 回退mask策略;
回退pad策略;
在T2SBlock中添加padding_mask,以减少pad的影响;
开放repetition_penalty参数,让用户自行调整重复惩罚的强度;
增加parallel_infer参数,用于开启或关闭并行推理,关闭时与0307版本保持一致;
在webui中增加“保持随机”选项;
同步main分支代码。

* 删除无用注释

---------

Co-authored-by: Lion <drain.daters.0p@icloud.com>
Co-authored-by: RVC-Boss <129054828+RVC-Boss@users.noreply.github.com>
Co-authored-by: KamioRinn <snowsdream@live.com>
Co-authored-by: Pengoose <pengoose_dev@naver.com>
Co-authored-by: Yuan-Man <68322456+Yuan-ManX@users.noreply.github.com>
Co-authored-by: XXXXRT666 <157766680+XXXXRT666@users.noreply.github.com>
Co-authored-by: KamioRinn <63162909+KamioRinn@users.noreply.github.com>
Co-authored-by: Lion-Wu <130235128+Lion-Wu@users.noreply.github.com>
Co-authored-by: digger yu <digger-yu@outlook.com>
Co-authored-by: SapphireLab <36986837+SapphireLab@users.noreply.github.com>
Co-authored-by: starylan <starylan@outlook.com>
Co-authored-by: shadow01a <141255649+shadow01a@users.noreply.github.com>
2024-04-19 14:35:28 +08:00

115 lines
4.3 KiB
Python

import argparse
import os
import traceback
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"
import torch
from faster_whisper import WhisperModel
from tqdm import tqdm
from tools.asr.config import check_fw_local_models
language_code_list = [
"af", "am", "ar", "as", "az",
"ba", "be", "bg", "bn", "bo",
"br", "bs", "ca", "cs", "cy",
"da", "de", "el", "en", "es",
"et", "eu", "fa", "fi", "fo",
"fr", "gl", "gu", "ha", "haw",
"he", "hi", "hr", "ht", "hu",
"hy", "id", "is", "it", "ja",
"jw", "ka", "kk", "km", "kn",
"ko", "la", "lb", "ln", "lo",
"lt", "lv", "mg", "mi", "mk",
"ml", "mn", "mr", "ms", "mt",
"my", "ne", "nl", "nn", "no",
"oc", "pa", "pl", "ps", "pt",
"ro", "ru", "sa", "sd", "si",
"sk", "sl", "sn", "so", "sq",
"sr", "su", "sv", "sw", "ta",
"te", "tg", "th", "tk", "tl",
"tr", "tt", "uk", "ur", "uz",
"vi", "yi", "yo", "zh", "yue",
"auto"]
def execute_asr(input_folder, output_folder, model_size, language, precision):
if '-local' in model_size:
model_size = model_size[:-6]
model_path = f'tools/asr/models/faster-whisper-{model_size}'
else:
model_path = model_size
if language == 'auto':
language = None #不设置语种由模型自动输出概率最高的语种
print("loading faster whisper model:",model_size,model_path)
device = 'cuda' if torch.cuda.is_available() else 'cpu'
try:
model = WhisperModel(model_path, device=device, compute_type=precision)
except:
return print(traceback.format_exc())
input_file_names = os.listdir(input_folder)
input_file_names.sort()
output = []
output_file_name = os.path.basename(input_folder)
for file_name in tqdm(input_file_names):
try:
file_path = os.path.join(input_folder, file_name)
segments, info = model.transcribe(
audio = file_path,
beam_size = 5,
vad_filter = True,
vad_parameters = dict(min_silence_duration_ms=700),
language = language)
text = ''
if info.language == "zh":
print("检测为中文文本, 转 FunASR 处理")
if("only_asr"not in globals()):
from tools.asr.funasr_asr import \
only_asr # #如果用英文就不需要导入下载模型
text = only_asr(file_path)
if text == '':
for segment in segments:
text += segment.text
output.append(f"{file_path}|{output_file_name}|{info.language.upper()}|{text}")
except:
return print(traceback.format_exc())
output_folder = output_folder or "output/asr_opt"
os.makedirs(output_folder, exist_ok=True)
output_file_path = os.path.abspath(f'{output_folder}/{output_file_name}.list')
with open(output_file_path, "w", encoding="utf-8") as f:
f.write("\n".join(output))
print(f"ASR 任务完成->标注文件路径: {output_file_path}\n")
return output_file_path
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument("-i", "--input_folder", type=str, required=True,
help="Path to the folder containing WAV files.")
parser.add_argument("-o", "--output_folder", type=str, required=True,
help="Output folder to store transcriptions.")
parser.add_argument("-s", "--model_size", type=str, default='large-v3',
choices=check_fw_local_models(),
help="Model Size of Faster Whisper")
parser.add_argument("-l", "--language", type=str, default='ja',
choices=language_code_list,
help="Language of the audio files.")
parser.add_argument("-p", "--precision", type=str, default='float16', choices=['float16','float32'],
help="fp16 or fp32")
cmd = parser.parse_args()
output_file_path = execute_asr(
input_folder = cmd.input_folder,
output_folder = cmd.output_folder,
model_size = cmd.model_size,
language = cmd.language,
precision = cmd.precision,
)