54 Commits

Author SHA1 Message Date
baicai-1145
c94de2f2cb Enhance TTS audio processing with improved resampling and profiling metrics
Refactor the audio preparation workflow to utilize torchaudio for resampling, replacing librosa for better performance. Introduce a caching mechanism for resampling transforms and update the PrepareRefSemanticBatchWorker to include detailed timing metrics for profiling. Additionally, implement a new CPU limiter for managing resource allocation during audio processing. These changes improve the efficiency and maintainability of the TTS system.
2026-03-13 16:45:00 +08:00
baicai-1145
bc1f3f32de Enhance audio processing in TTS framework with resampling and profiling improvements
Add resampling capabilities using torchaudio to prepare reference audio at 16kHz, replacing librosa for better performance. Introduce a caching mechanism for resampling transforms to optimize resource usage. Update batch processing methods to include timing metrics for profiling, enhancing the ability to monitor and improve performance in the TTS system. This update improves the maintainability and efficiency of audio preparation workflows.
2026-03-13 02:03:25 +08:00
baicai-1145
17cb2e5acf Implement G2PW processing enhancements in TTS framework
Add support for G2PW processing in the TTS system by introducing new methods and classes for handling G2PW segments. Update PrepareCoordinator to manage G2PW worker threads and integrate G2PW profiling into the existing framework. Enhance text preprocessing to identify segments requiring G2PW and streamline the resolution of these segments. This update improves the overall performance and maintainability of the TTS system by optimizing the handling of Chinese text processing.
2026-03-12 23:04:39 +08:00
baicai-1145
5cf68a91d3 Add g2pw submodule and enhance TTS processing with AsyncStageGate
Introduce a new submodule for g2pw and implement AsyncStageGate in PrepareCoordinator to manage concurrent task inflight limits. Update PrepareTextCpuWorker and PrepareRefSemanticBatchWorker to support asynchronous task submission and completion notifications. Enhance profiling capabilities in TTS to track g2pw processing times, improving overall performance and maintainability of the TTS system.
2026-03-12 23:03:33 +08:00
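A gate that caps concurrent in-flight tasks, as the commit above describes for AsyncStageGate, might be sketched with an `asyncio.Semaphore`. The class name `InflightGate`, its `run` method, and the `peak` counter are assumptions made for illustration; the actual AsyncStageGate interface is not shown in this log.

```python
import asyncio


class InflightGate:
    """Hypothetical gate that caps how many tasks run at once."""

    def __init__(self, max_inflight: int) -> None:
        self._sem = asyncio.Semaphore(max_inflight)
        self.inflight = 0  # tasks currently inside the gate
        self.peak = 0      # highest concurrency observed

    async def run(self, coro_fn, *args):
        # Wait for a free slot, run the task, then release the slot.
        async with self._sem:
            self.inflight += 1
            self.peak = max(self.peak, self.inflight)
            try:
                return await coro_fn(*args)
            finally:
                self.inflight -= 1


async def _work(i: int) -> int:
    await asyncio.sleep(0.01)  # simulate a stage that yields to the loop
    return i * 2


async def demo(n_tasks: int = 8, max_inflight: int = 3):
    gate = InflightGate(max_inflight)
    results = await asyncio.gather(*(gate.run(_work, i) for i in range(n_tasks)))
    return results, gate.peak
```

Calling `asyncio.run(demo())` submits all tasks at once, while the gate keeps at most `max_inflight` of them running at any moment.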
baicai-1145
6a822b28c3 Enhance TTS API with improved request handling and asynchronous processing
Refactor api_v2.py and api_v3.py to update sampling parameters and weight paths for better clarity and support for v3/v4 vocoders. Introduce new methods in PrepareCoordinator for handling empty text features and improve profiling capabilities. Additionally, update unified engine components to streamline audio processing and state management, enhancing overall performance and maintainability of the TTS system.
2026-03-12 01:27:19 +08:00
baicai-1145
d453a8e47c Add unified engine stage components for TTS processing orchestration
Introduce new modules including EngineDecodeStageMixin, EngineDispatchStageMixin, EngineFinalizeStageMixin, EnginePrepareStageMixin, and EngineStageFutureMixin. These components enhance the TTS framework by providing structured methods for managing engine stages, including decoding, dispatching, finalizing, and preparing tasks. The new architecture supports improved state management and asynchronous operations, significantly enhancing the maintainability and performance of the TTS system.
2026-03-11 21:15:19 +08:00
baicai-1145
a3a5aad157 Add unified engine components for TTS processing and state management
Introduce new modules including unified_engine_component_models, unified_engine_component_policy, unified_engine_component_registry, unified_engine_component_runtime, unified_engine_worker_completion, and unified_engine_worker_decode. These additions enhance the TTS framework by providing structured models for request handling, engine policies, and worker execution, significantly improving the architecture and maintainability of the system. The new components support asynchronous operations and optimize overall performance through better state management and processing capabilities.
2026-03-11 20:49:41 +08:00
baicai-1145
3fd4f48651 Add unified engine API modules for direct and scheduler-based TTS processing
Introduce new modules including unified_engine_api_direct, unified_engine_api_profile, unified_engine_api_request, and unified_engine_api_scheduler. These additions enhance the TTS system by providing structured interfaces for direct TTS execution and scheduler-based processing. The new components support improved request handling, profiling, and state management, significantly enhancing the architecture and maintainability of the TTS framework.
2026-03-11 18:36:24 +08:00
baicai-1145
b046a093d3 Add unified engine delegates and orchestration components for enhanced TTS processing
Introduce new modules including EngineApiDelegates, EngineBridgeDelegates, EngineRegistryBridgeFacade, EngineRuntimeBridgeFacade, EngineStageBridgeFacade, and EngineStageOrchestrator. These additions provide a structured approach to managing TTS requests, engine states, and orchestration, significantly improving the architecture and maintainability of the TTS system. The new components support asynchronous operations and enhance overall performance through better request handling and processing capabilities.
2026-03-11 18:35:47 +08:00
baicai-1145
800f01790e Refactor EngineApiFacade and EngineApiDelegates for improved method naming and structure
Rename several methods in EngineApiFacade to follow a consistent private naming convention, enhancing code clarity. Update EngineApiDelegates to remove redundant method definitions, streamlining the interface. Introduce EnginePublicInterface to encapsulate public API methods, improving organization and maintainability of the TTS system. Additionally, update the EngineCompositionBuilder to use the new scheduler worker state retrieval method.
2026-03-11 17:58:20 +08:00
baicai-1145
d1ec7d9e54 Add unified engine components and API for enhanced TTS processing
Introduce multiple new modules including unified_engine_api, unified_engine_audio, unified_engine_bridge, unified_engine_builder, unified_engine_components, unified_engine_delegates, and unified_engine_runtime. These additions provide a comprehensive framework for managing TTS requests, audio packing, and engine state management, significantly improving the architecture and maintainability of the TTS system. The new structure supports asynchronous operations and enhances overall performance through better request handling and processing capabilities.
2026-03-11 08:32:56 +08:00
baicai-1145
06d6b67f73 Add PreparedCpuStage data class and refactor prepare_cpu_stage_profiled_async method in PrepareCoordinator for improved CPU profiling. Introduce prepare_gpu_stage_profiled_async method to streamline GPU stage preparation using the new data class, enhancing overall performance and maintainability. 2026-03-11 05:29:30 +08:00
baicai-1145
6a427b4f54 Update TTS API to support asynchronous execution by replacing synchronous TTS calls with asynchronous counterparts in both api_v2.py and api_v3.py. Introduce new data classes in unified_engine.py for enhanced request handling and state management, improving overall system performance and maintainability. 2026-03-10 21:25:14 +08:00
baicai-1145
69ac7f9027 Integrate UnifiedTTSEngine into TTS API for improved audio processing and control. Refactor tts_handle and control endpoints to utilize the new engine, enhancing error handling and response management. Update set_refer_audio and set_gpt_weights endpoints to return payloads from the engine, streamlining audio configuration processes. 2026-03-10 06:59:28 +08:00
baicai-1145
827d6ea47c Refactor TTS and scheduler components to enhance text processing and batching capabilities. Introduce PrepareCoordinator for managing text feature preparation asynchronously, and update SchedulerDebugWorker to support new finalize task management. Implement batch processing in PrepareBertBatchWorker with improved admission control and profiling metrics. Add text CPU preprocessing utilities for better text segmentation and normalization. 2026-03-10 06:58:53 +08:00
baicai-1145
a45e171ff5 Enhance sampling functions in TTS by adding support for previous token masks in logits_to_probs. Implement batch processing for sampling with padded token sequences and contiguous sampling groups. Refactor sampling logic in T2S scheduler to utilize new functionalities, improving efficiency and flexibility in token generation. 2026-03-09 21:24:16 +08:00
baicai-1145
845b181360 Implement batch processing for BERT and reference semantic tasks in TTS. Introduce StageLimiter for managing concurrent processing and enhance the TTS class with new methods for handling audio and semantic extraction. Update profiling metrics for better performance tracking during inference. 2026-03-09 05:19:28 +08:00
baicai-1145
d245eb169c Refactor T2S scheduler and inference handling to improve attention mask management and memory tracking. Update T2SRunningRequest and T2SActiveBatch classes to include optional key padding masks. Introduce new benchmarking tools for API performance and memory usage analysis, enhancing overall system efficiency. 2026-03-09 01:42:04 +08:00
baicai-1145
dc37b0b9ef Add WebAPI documentation and implement TTS API with endpoints for text-to-speech inference, control commands, and model switching. Enhance TTS class with methods for extracting prompt semantics and reference audio specifications. Introduce a scheduler prototype for managing T2S requests. 2026-03-09 00:22:59 +08:00
baicai-1145
30a4557d8d Implement last inference statistics tracking in Text2SemanticDecoder and enhance TTS class with prompt semantic extraction. This includes methods for setting and retrieving inference stats, as well as improvements to audio processing and feature extraction in TTS. 2026-03-08 23:08:27 +08:00
ChasonJiang
abe984395c
Align GPT top-k default sampling parameters (#2696) 2025-12-19 16:05:36 +08:00
ChasonJiang
36b3231c6f
bug fix (#2689) 2025-12-15 14:23:06 +08:00
ChasonJiang
92ab59c553
Finer-grained streaming inference mode (#2671)
* Better streaming inference mode

* Clean up unused code

* modified:   GPT_SoVITS/AR/models/t2s_model.py
	modified:   GPT_SoVITS/TTS_infer_pack/TTS.py
	modified:   GPT_SoVITS/module/models.py

* modified:   GPT_SoVITS/TTS_infer_pack/TTS.py

* modified:   .gitignore
	modified:   GPT_SoVITS/AR/models/t2s_model.py
	modified:   GPT_SoVITS/TTS_infer_pack/TTS.py
	modified:   GPT_SoVITS/module/models.py

* modified:   GPT_SoVITS/AR/models/t2s_model.py
	modified:   GPT_SoVITS/TTS_infer_pack/TTS.py
	modified:   GPT_SoVITS/module/models.py
	modified:   api_v2.py

* modified:   GPT_SoVITS/TTS_infer_pack/TTS.py

* Fix spelling errors

* Support streaming inference with a fixed chunk length; optimize the SOLA algorithm

* Fix the OGG format transmission issue in api_v2
2025-11-28 21:12:41 +08:00
ChasonJiang
b9211657d8
Optimize the TTS_Config code logic (#2536)
* Optimize the TTS_Config code logic

* Save tts_config after loading the VITS weights
2025-07-18 11:54:40 +08:00
RVC-Boss
4d8ebf8523
Update TTS.py 2025-07-16 15:43:26 +08:00
jiangsier-xyz
e476b01f30
Fix TTS.py failing to recognize the actually supported versions v2Pro and v2ProPlus (#2490)
Also update the default configuration.

Co-authored-by: jiangsier-xyz <jiangsier131@gmail.com>
2025-07-16 15:42:36 +08:00
KamioRinn
6df61f58e4
Language segmentation and formatting optimizations (#2488)
* better LangSegmenter

* add version num2str

* better version num2str

* sync fast infer

* sync api

* remove duplicate spaces

* remove unnecessary code

---------

Co-authored-by: RVC-Boss <129054828+RVC-Boss@users.noreply.github.com>
2025-06-27 11:58:41 +08:00
XXXXRT666
6fdc67ca83
Fix bugs in install.sh, reduce log noise, and improve error reporting (#2464)
* Update Install.sh

* Format Code

* Delete dev null

* Update README, Support Dark Mode in CSS/JS
2025-06-17 15:21:36 +08:00
RVC-Boss
cd6de7398e
Merge pull request #2449 from KamioRinn/maga
support v4 v2Pro v2ProPlus for api & optimize LangSegmenter
2025-06-11 10:29:39 +08:00
YYuX-1145
dd2b9253aa
Update TTS.py (#2450) 2025-06-11 10:28:42 +08:00
KamioRinn
746cb536c6 Fix LangSegmenter 2025-06-10 19:18:05 +08:00
Jialiang Zhu
035dcbad03
Fix AttributeError when prompt_cache['refer_spec'][0] is a tuple (#2428)
Co-authored-by: tzrain <tz_rain@foxmail.com>
2025-06-05 10:55:21 +08:00
RVC-Boss
584fcae9a5
support sovits v2Pro v2ProPlus
support sovits v2Pro v2ProPlus
2025-06-04 15:25:52 +08:00
RVC-Boss
92819d0b31
support sovits v2Pro v2ProPlus
support sovits v2Pro v2ProPlus
2025-06-04 15:19:20 +08:00
XXXXRT666
d5e479dad6
Introduce Docker and Windows CI Workflow, Pre-commit Formatting, and Language Resource Auto-Download (#2351)
* Docker Auto-Build Workflow

* Rename

* Update

* Fix Bugs

* Disable Progress Bar When workflows triggered

* Fix Wget

* Fix Bugs

* Fix Bugs

* Update Wget

* Update Workflows

* Accelerate Docker Image Building

* Fix Install.sh

* Add Skip-Check For Action Runner

* Fix Dockerfile

* .

* .

* .

* .

* Delete File in Runner

* Add Sort

* Delete More Files

* Delete More

* .

* .

* .

* Add Pre-Commit Hook
Update Docker

* Add Code Spell Check

* [pre-commit.ci] trigger

* [pre-commit.ci] trigger

* [pre-commit.ci] trigger

* Fix Bugs

* .

* Disable Progress Bar and Logs while using GitHub Actions

* .

* .

* Fix Bugs

* update conda

* fix bugs

* Fix Bugs

* fix bugs

* .

* .

* Quiet Installation

* fix bugs

* .

* fix bug

* .

* Fix pre-commit.ci and Docker

* fix bugs

* .

* Update Docker & Pre-Commit

* fix  bugs

* Update Req

* Update Req

* Update OpenCC

* update precommit

* .

* Update .pre-commit-config.yaml

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update Docs and fix bugs

* Fix \

* Fix MacOS

* .

* test

* .

* Add Tag Alias

* .

* fix bugs

* fix bugs

* make image smaller

* update pre-commit config

* .

* .

* fix bugs

* use miniconda

* Fix Wrong Path

* .

* debug

* debug

* revert

* Fix Bugs

* Update Docs, Add Dict Auto Download in install.sh

* update docker_build

* Update Docs for Install.sh

* update docker docs about architecture

* Add Xcode-Commandline-Tool Installation

* Update Docs

1. Add Missing VC17
2. Modified the Order of FFmpeg Installation and Requirements Installation
3. Remove Duplicate FFmpeg

* Fix Wrong Cuda Version

* Update TESTED ENV

* Add PYTHONNOUSERSITE(-s)

* Fix Wrapper

* Update install.sh For Robustness

* Ignore .git

* Preload CUDNN For Ctranslate2

* Remove Gradio Warnings

* Update Colab

* Fix OpenCC Problems

* Update Win DLL Strategy

* Fix Onnxruntime-gpu NVRTC Error

* Fix Path Problems

* Add Windows Packages Workflow

* WIP

* WIP

* WIP

* WIP

* WIP

* WIP

* .

* WIP

* WIP

* WIP

* WIP

* WIP

* WIP

* WIP

* WIP

* WIP

* WIP

* WIP

* WIP

* WIP

* WIP

* WIP

* WIP

* WIP

* WIP

* WIP

* WIP

* WIP

* WIP

* WIP

* WIP

* WIP

* WIP

* WIP

* WIP

* WIP

* WIP

* WIP

* WIP

* WIP

* WIP

* WIP

* WIP

* WIP

* Fix Path

* Fix Path

* Enable Logging

* Set 7-Zip compression level to maximum (-mx=9)

* Use Multithread in ONNX Session

* Fix Tag Bugs

* Add Time

* Add Time

* Add Time

* Compress More

* Copy DLL to Solve VC Runtime DLL Missing Issues

* Expose FFmpeg Errors, Copy Only Part of Visual C++ Runtime

* Update build_windows_packages.ps1

* Update build_windows_packages.ps1

* Update build_windows_packages.ps1

* Update build_windows_packages.ps1

* WIP

* WIP

* WIP

* Update build_windows_packages.ps1

* Update install.sh

* Update build_windows_packages.ps1

* Update docker-publish.yaml

* Update install.sh

* Update Dockerfile

* Update docker_build.sh

* Update miniconda_install.sh

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update Colab-WebUI.ipynb

* Update Colab-Inference.ipynb

* Update docker-compose.yaml

* Update build_windows_packages.ps1

* Update install.sh

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-05-26 10:45:14 +08:00
ChasonJiang
a19f49604f
Fix v3 parameter passing (#2309) 2025-04-22 10:10:44 +08:00
RVC-Boss
590c83d766
Fix v3 inference parameter-passing issue 2025-04-22 00:20:33 +08:00
ChasonJiang
e0f2818df7
Adapt the parallel inference version for v4 (#2307)
* Adapt for v4

* Adapt for v4

* modified:   GPT_SoVITS/inference_webui_fast.py

* Merge the main branch

* fallback config

* modified:   GPT_SoVITS/TTS_infer_pack/TTS.py

* fix bug

* modified:   GPT_SoVITS/TTS_infer_pack/TTS.py

* modified:   GPT_SoVITS/inference_webui_fast.py
2025-04-21 23:20:20 +08:00
RVC-Boss
839ff9ce5b
Adapt v4 parallel inference (not finished yet) 2025-04-21 22:43:46 +08:00
XXXXRT666
53cac93589
Refactor: Format Code with Ruff and Update Deprecated G2PW Link (#2255)
* ruff check --fix

* ruff format --line-length 120 --target-version py39

* Change the link for G2PW Model

* update pytorch version and colab
2025-04-07 16:42:47 +08:00
RVC-Boss
7abae557fb
Remove the missing enc_q warning when loading v3 SoVITS models
Remove the missing enc_q warning when loading v3 SoVITS models
2025-04-01 16:31:15 +08:00
ChasonJiang
03b662a769
Adapt parallel inference for sovits_v3 (#2241)
* Adapt parallel inference for sovits_v3

* Clean up unused code
2025-03-31 11:56:05 +08:00
lishq
fef65d40fe
fix: prevent concurrent access to BERT model with thread lock (#2165)
Added thread lock to protect get_phones_and_bert method from potential race conditions during concurrent access. This addresses issue #1844 where multiple threads accessing the BERT model simultaneously could cause data inconsistency or crashes.

Co-authored-by: RVC-Boss <129054828+RVC-Boss@users.noreply.github.com>
2025-03-26 15:03:36 +08:00
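The locking pattern named in the commit above can be illustrated with a plain `threading.Lock`. The class below is a hypothetical stand-in, not the project's code: its `get_phones_and_bert` only borrows the method name from the commit message and simulates model work with a short sleep.

```python
import threading
import time


class LockedFeatureExtractor:
    """Hypothetical wrapper that serializes access to a shared model."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._active = 0
        self.peak_active = 0  # highest number of simultaneous callers seen

    def get_phones_and_bert(self, text: str) -> int:
        # Only one thread at a time may enter the protected section.
        with self._lock:
            self._active += 1
            self.peak_active = max(self.peak_active, self._active)
            time.sleep(0.001)  # simulate model work
            self._active -= 1
            return len(text)  # placeholder result


def run_concurrently(extractor: LockedFeatureExtractor, texts: list) -> list:
    results = [None] * len(texts)

    def worker(index: int, text: str) -> None:
        results[index] = extractor.get_phones_and_bert(text)

    threads = [
        threading.Thread(target=worker, args=(i, t)) for i, t in enumerate(texts)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

With the lock in place, `peak_active` stays at 1 no matter how many threads call the method at once.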
ChasonJiang
7394dc7b0c
Adapt api_v2 and inference_webui_fast for V3 (#2188)
* modified:   GPT_SoVITS/TTS_infer_pack/TTS.py
	modified:   GPT_SoVITS/TTS_infer_pack/TextPreprocessor.py
	modified:   GPT_SoVITS/inference_webui_fast.py

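* Adapt for V3

* v3 adaptation for api_v2.py and inference_webui_fast.py

* Fix an ancient bug and add friendlier prompt messages

* Optimize the WebUI

* Change to the correct path

* Fix loading of v3 LoRA models

* Fix the encoding mismatch when reading tts_infer.yaml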
2025-03-26 14:34:51 +08:00
ChasonJiang
165882d64f
Fix a bug caused by a redundant comment (#2158) 2025-03-05 18:22:01 +08:00
ChasonJiang
053a356ffe
Fix the GPT padding mask issue (#2153)
* Fix the GPT padding mask issue

* rollback tts_config
2025-03-05 17:14:43 +08:00
ChasonJiang
6dd2f72090
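Change the mask strategy for GPT parallel inference to left padding (#2144)
* Change the mask strategy for GPT parallel inference to left padding, bringing batch_infer closer to naive_infer
Reduce redundant operations and use torch_sdpa to improve inference speed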

* rollback tts_infer.yaml
2025-03-04 16:45:37 +08:00
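Left padding, as mentioned in the commit above, can be shown with plain lists. The helper name `left_pad_batch`, the `pad_id` default, and the mask convention (`True` marks a padded position) are assumptions for illustration, not the project's actual code.

```python
def left_pad_batch(sequences: list, pad_id: int = 0):
    """Pad variable-length token lists on the left to a common length.

    Returns (tokens, mask), where mask[i][j] is True for padded positions.
    """
    max_len = max(len(seq) for seq in sequences)
    tokens, mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        # Padding goes first, so real tokens are right-aligned.
        tokens.append([pad_id] * n_pad + list(seq))
        mask.append([True] * n_pad + [False] * len(seq))
    return tokens, mask
```

Because real tokens are right-aligned, the last column of every row holds that sequence's newest token, so a batched decoder can read the next-step position at index -1 for all rows, just as it would for a single unpadded sequence.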
RVC-Boss
a68e3c4354
Update TTS.py 2025-02-27 22:14:51 +08:00
KamioRinn
c17dd642c7
Add en_normalization and fix LangSegmenter (#2062) 2025-02-17 18:41:30 +08:00
StaryLan
15cbd1b673
Update Documentation (#2032)
* add exception handling

* Fill in the missing content

* remove GB code

* Simplify i18n text and remove trailing spaces

* ignore v3 model dir

* Update Changelog

* Fix encoding
2025-02-12 11:27:35 +08:00