Merge pull request #400 from ShadowLoveElysia/main

Added Faster Whisper-based multilingual automatic labeling; added README instructions for running preprocessing from the command line only
RVC-Boss 2024-02-07 16:44:17 +08:00 committed by GitHub
commit 7e0789c292
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
3 changed files with 76 additions and 1 deletions

View File

@ -197,8 +197,40 @@ D:\GPT-SoVITS\xxx/xxx.wav|xxx|en|I like playing Genshin.
- [ ] better sovits base model (enhanced audio quality)
- [ ] model mix
## (Optional) Command Line Operation Mode
Use the command line to open the WebUI for UVR5
```
python tools/uvr5/webui.py "<infer_device>" <is_half> <webui_port_uvr5>
```
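For example, a run on a CUDA GPU might look like this (the device name, half-precision flag, and port are illustrative; 9873 is only assumed here as the UVR5 port):
```
python tools/uvr5/webui.py "cuda" True 9873
```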
If you can't open a browser, follow the format below for UVR processing. This uses mdxnet for audio processing:
```
python mdxnet.py --model --input_root --output_vocal --output_ins --agg_level --format --device --is_half_precision
```
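A hypothetical invocation, with every value below illustrative rather than prescribed (the model path, directories, and settings depend on your setup):
```
python mdxnet.py --model ./uvr5_weights/my_mdxnet_model.onnx --input_root ./raw_audio \
    --output_vocal ./vocals --output_ins ./instrumental --agg_level 10 \
    --format wav --device cuda --is_half_precision True
```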
This is how audio segmentation of the dataset is done using the command line:
```
python audio_slicer.py \
--input_path "<path_to_original_audio_file_or_directory>" \
--output_root "<directory_where_subdivided_audio_clips_will_be_saved>" \
--threshold <volume_threshold> \
--min_length <minimum_duration_of_each_subclip> \
--min_interval <shortest_time_gap_between_adjacent_subclips> \
--hop_size <step_size_for_computing_volume_curve>
```
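An illustrative run; the numeric values below are assumed reasonable starting points, not required settings:
```
python audio_slicer.py \
    --input_path ./raw_audio/speaker.wav \
    --output_root ./sliced \
    --threshold -34 \
    --min_length 4000 \
    --min_interval 300 \
    --hop_size 10
```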
This is how dataset ASR processing is done using the command line (Chinese only):
```
python tools/damo_asr/cmd-asr.py "<Path to the directory containing input audio files>"
```
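For example (the directory path is hypothetical):
```
python tools/damo_asr/cmd-asr.py "./sliced"
```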
ASR processing is performed through Faster Whisper (ASR labeling for languages other than Chinese).
(No progress bars are shown; processing time depends on GPU performance.)
```
python ./tools/damo_asr/WhisperASR.py -i <input> -o <output> -f <file_name.list> -l <language>
```
A custom list save path is supported via the `-o` and `-f` options.
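A sample invocation (all paths and the file name are illustrative):
```
python ./tools/damo_asr/WhisperASR.py -i ./sliced -o ./asr_output -f transcriptions.list -l en
```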
## Credits
Special thanks to the following projects and contributors:
- [ar-vits](https://github.com/innnky/ar-vits)

View File

@ -9,7 +9,7 @@ gradio_client==0.8.1
ffmpeg-python
onnxruntime
tqdm
funasr>=1.0.0
funasr==1.0.0
cn2an
pypinyin
pyopenjtalk
@ -24,3 +24,4 @@ psutil
jieba_fast
jieba
LangSegment
faster-whisper

View File

@ -0,0 +1,42 @@
import os
import argparse
from glob import glob

# Work around duplicate OpenMP runtime crashes (common with PyTorch + CUDA stacks).
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"

from faster_whisper import WhisperModel


def main(input_folder, output_folder, output_filename, language):
    # Load the large-v3 model on the GPU in half precision.
    model = WhisperModel("large-v3", device="cuda", compute_type="float16")
    output_file = os.path.join(output_folder, output_filename)
    if not os.path.exists(output_folder):
        os.makedirs(output_folder)
    with open(output_file, 'w', encoding='utf-8') as f:
        # Recursively transcribe every WAV file under the input folder.
        for file in glob(os.path.join(input_folder, '**/*.wav'), recursive=True):
            segments, _ = model.transcribe(file, beam_size=10, vad_filter=True,
                                           vad_parameters=dict(min_silence_duration_ms=700),
                                           language=language)
            # Join all segments so long files are not truncated to the first segment.
            text = ''.join(segment.text for segment in segments)
            filename = os.path.basename(file).replace('.wav', '')
            # One line per file in the .list format shown in the README: path|name|LANGUAGE|text
            f.write(f"{file}|{filename}|{language.upper()}|{text}\n")


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument("-i", "--input_folder", type=str, required=True,
                        help="Path to the folder containing WAV files.")
    parser.add_argument("-o", "--output_folder", type=str, required=True,
                        help="Output folder to store transcriptions.")
    parser.add_argument("-f", "--output_filename", type=str, default="transcriptions.txt",
                        help="Name of the output text file.")
    parser.add_argument("-l", "--language", type=str, default='zh',
                        help="Language code of the audio files (e.g. 'zh', 'en', 'ja').")
    cmd = parser.parse_args()
    main(cmd.input_folder, cmd.output_folder, cmd.output_filename, cmd.language)