415 Commits

Author SHA1 Message Date
baicai-1145
c94de2f2cb Enhance TTS audio processing with improved resampling and profiling metrics
Refactor the audio preparation workflow to utilize torchaudio for resampling, replacing librosa for better performance. Introduce a caching mechanism for resampling transforms and update the PrepareRefSemanticBatchWorker to include detailed timing metrics for profiling. Additionally, implement a new CPU limiter for managing resource allocation during audio processing. These changes improve the efficiency and maintainability of the TTS system.
2026-03-13 16:45:00 +08:00
baicai-1145
bc1f3f32de Enhance audio processing in TTS framework with resampling and profiling improvements
Add resampling capabilities using torchaudio to prepare reference audio at 16kHz, replacing librosa for better performance. Introduce a caching mechanism for resampling transforms to optimize resource usage. Update batch processing methods to include timing metrics for profiling, enhancing the ability to monitor and improve performance in the TTS system. This update improves the maintainability and efficiency of audio preparation workflows.
2026-03-13 02:03:25 +08:00
baicai-1145
17cb2e5acf Implement G2PW processing enhancements in TTS framework
Add support for G2PW processing in the TTS system by introducing new methods and classes for handling G2PW segments. Update PrepareCoordinator to manage G2PW worker threads and integrate G2PW profiling into the existing framework. Enhance text preprocessing to identify segments requiring G2PW and streamline the resolution of these segments. This update improves the overall performance and maintainability of the TTS system by optimizing the handling of Chinese text processing.
2026-03-12 23:04:39 +08:00
baicai-1145
5cf68a91d3 Add g2pw submodule and enhance TTS processing with AsyncStageGate
Introduce a new submodule for g2pw and implement AsyncStageGate in PrepareCoordinator to manage concurrent task inflight limits. Update PrepareTextCpuWorker and PrepareRefSemanticBatchWorker to support asynchronous task submission and completion notifications. Enhance profiling capabilities in TTS to track g2pw processing times, improving overall performance and maintainability of the TTS system.
2026-03-12 23:03:33 +08:00
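The AsyncStageGate described above limits how many tasks a stage may have in flight at once. This log does not include its code, so what follows is only a minimal sketch of that kind of inflight limiter, assuming an asyncio-based pipeline. The class name `InflightGate` and the `demo` helper are hypothetical and do not reflect the repository's actual API.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class InflightGate:
    """Hypothetical stand-in for an AsyncStageGate-style limiter.

    Caps how many tasks of one pipeline stage may be in flight at once and
    records the peak concurrency that was actually observed.
    """

    def __init__(self, max_inflight: int) -> None:
        if max_inflight < 1:
            raise ValueError("max_inflight must be >= 1")
        self._sem = asyncio.Semaphore(max_inflight)
        self.inflight = 0
        self.peak_inflight = 0

    async def run(self, fn: Callable[[], Awaitable[T]]) -> T:
        # Wait for a free slot before starting the task.
        async with self._sem:
            self.inflight += 1
            self.peak_inflight = max(self.peak_inflight, self.inflight)
            try:
                return await fn()
            finally:
                # The semaphore frees the slot on exit; keep the counter in sync.
                self.inflight -= 1


async def demo() -> tuple[list[int], int]:
    gate = InflightGate(max_inflight=2)

    async def fake_stage(i: int) -> int:
        await asyncio.sleep(0.01)  # simulate CPU/GPU stage work
        return i * i

    results = await asyncio.gather(*(gate.run(lambda i=i: fake_stage(i)) for i in range(6)))
    return list(results), gate.peak_inflight


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Submitters call `run` and simply wait while the stage is saturated. This provides backpressure without a separate queue.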
baicai-1145
6a822b28c3 Enhance TTS API with improved request handling and asynchronous processing
Refactor api_v2.py and api_v3.py to update sampling parameters and weight paths for better clarity and support for v3/v4 vocoders. Introduce new methods in PrepareCoordinator for handling empty text features and improve profiling capabilities. Additionally, update unified engine components to streamline audio processing and state management, enhancing overall performance and maintainability of the TTS system.
2026-03-12 01:27:19 +08:00
baicai-1145
d453a8e47c Add unified engine stage components for TTS processing orchestration
Introduce new modules including EngineDecodeStageMixin, EngineDispatchStageMixin, EngineFinalizeStageMixin, EnginePrepareStageMixin, and EngineStageFutureMixin. These components enhance the TTS framework by providing structured methods for managing engine stages, including decoding, dispatching, finalizing, and preparing tasks. The new architecture supports improved state management and asynchronous operations, significantly enhancing the maintainability and performance of the TTS system.
2026-03-11 21:15:19 +08:00
baicai-1145
a3a5aad157 Add unified engine components for TTS processing and state management
Introduce new modules including unified_engine_component_models, unified_engine_component_policy, unified_engine_component_registry, unified_engine_component_runtime, unified_engine_worker_completion, and unified_engine_worker_decode. These additions enhance the TTS framework by providing structured models for request handling, engine policies, and worker execution, significantly improving the architecture and maintainability of the system. The new components support asynchronous operations and optimize overall performance through better state management and processing capabilities.
2026-03-11 20:49:41 +08:00
baicai-1145
3fd4f48651 Add unified engine API modules for direct and scheduler-based TTS processing
Introduce new modules including unified_engine_api_direct, unified_engine_api_profile, unified_engine_api_request, and unified_engine_api_scheduler. These additions enhance the TTS system by providing structured interfaces for direct TTS execution and scheduler-based processing. The new components support improved request handling, profiling, and state management, significantly enhancing the architecture and maintainability of the TTS framework.
2026-03-11 18:36:24 +08:00
baicai-1145
b046a093d3 Add unified engine delegates and orchestration components for enhanced TTS processing
Introduce new modules including EngineApiDelegates, EngineBridgeDelegates, EngineRegistryBridgeFacade, EngineRuntimeBridgeFacade, EngineStageBridgeFacade, and EngineStageOrchestrator. These additions provide a structured approach to managing TTS requests, engine states, and orchestration, significantly improving the architecture and maintainability of the TTS system. The new components support asynchronous operations and enhance overall performance through better request handling and processing capabilities.
2026-03-11 18:35:47 +08:00
baicai-1145
800f01790e Refactor EngineApiFacade and EngineApiDelegates for improved method naming and structure
Rename several methods in EngineApiFacade to follow a consistent private naming convention, enhancing code clarity. Update EngineApiDelegates to remove redundant method definitions, streamlining the interface. Introduce EnginePublicInterface to encapsulate public API methods, improving organization and maintainability of the TTS system. Additionally, update the EngineCompositionBuilder to use the new scheduler worker state retrieval method.
2026-03-11 17:58:20 +08:00
baicai-1145
d1ec7d9e54 Add unified engine components and API for enhanced TTS processing
Introduce multiple new modules including unified_engine_api, unified_engine_audio, unified_engine_bridge, unified_engine_builder, unified_engine_components, unified_engine_delegates, and unified_engine_runtime. These additions provide a comprehensive framework for managing TTS requests, audio packing, and engine state management, significantly improving the architecture and maintainability of the TTS system. The new structure supports asynchronous operations and enhances overall performance through better request handling and processing capabilities.
2026-03-11 08:32:56 +08:00
baicai-1145
06d6b67f73 Add PreparedCpuStage data class and refactor prepare_cpu_stage_profiled_async method in PrepareCoordinator for improved CPU profiling. Introduce prepare_gpu_stage_profiled_async method to streamline GPU stage preparation using the new data class, enhancing overall performance and maintainability. 2026-03-11 05:29:30 +08:00
baicai-1145
6a427b4f54 Update TTS API to support asynchronous execution by replacing synchronous TTS calls with asynchronous counterparts in both api_v2.py and api_v3.py. Introduce new data classes in unified_engine.py for enhanced request handling and state management, improving overall system performance and maintainability. 2026-03-10 21:25:14 +08:00
baicai-1145
69ac7f9027 Integrate UnifiedTTSEngine into TTS API for improved audio processing and control. Refactor tts_handle and control endpoints to utilize the new engine, enhancing error handling and response management. Update set_refer_audio and set_gpt_weights endpoints to return payloads from the engine, streamlining audio configuration processes. 2026-03-10 06:59:28 +08:00
baicai-1145
827d6ea47c Refactor TTS and scheduler components to enhance text processing and batching capabilities. Introduce PrepareCoordinator for managing text feature preparation asynchronously, and update SchedulerDebugWorker to support new finalize task management. Implement batch processing in PrepareBertBatchWorker with improved admission control and profiling metrics. Add text CPU preprocessing utilities for better text segmentation and normalization. 2026-03-10 06:58:53 +08:00
baicai-1145
a45e171ff5 Enhance sampling functions in TTS by adding support for previous token masks in logits_to_probs. Implement batch processing for sampling with padded token sequences and contiguous sampling groups. Refactor sampling logic in T2S scheduler to utilize new functionalities, improving efficiency and flexibility in token generation. 2026-03-09 21:24:16 +08:00
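The commit above adds previous-token masks to logits_to_probs so that batched, padded token histories can be sampled correctly. Below is a minimal pure-Python sketch of the idea. It assumes a CTRL-style repetition penalty (positive scores are divided, negative scores multiplied) that is applied only at unmasked history positions. `masked_repetition_probs` is a hypothetical helper, not the repository's function, and it omits top-k/top-p filtering.

```python
import math
from typing import Sequence


def masked_repetition_probs(
    logits: Sequence[float],
    previous_tokens: Sequence[int],
    previous_mask: Sequence[bool],
    repetition_penalty: float,
    temperature: float = 1.0,
) -> list[float]:
    """Penalize only valid (unmasked) previous tokens, then apply a softmax."""
    scores = list(logits)
    # Collect the token ids that really occurred; padded positions are ignored.
    seen = {tok for tok, valid in zip(previous_tokens, previous_mask) if valid}
    for tok in seen:
        s = scores[tok]
        # Make the token less likely regardless of the sign of its score.
        scores[tok] = s * repetition_penalty if s < 0 else s / repetition_penalty
    scaled = [s / temperature for s in scores]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

The mask matters because a pad id can coincide with a real token id. Without the mask, padding would wrongly penalize that token.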
baicai-1145
845b181360 Implement batch processing for BERT and reference semantic tasks in TTS. Introduce StageLimiter for managing concurrent processing and enhance the TTS class with new methods for handling audio and semantic extraction. Update profiling metrics for better performance tracking during inference. 2026-03-09 05:19:28 +08:00
baicai-1145
d245eb169c Refactor T2S scheduler and inference handling to improve attention mask management and memory tracking. Update T2SRunningRequest and T2SActiveBatch classes to include optional key padding masks. Introduce new benchmarking tools for API performance and memory usage analysis, enhancing overall system efficiency. 2026-03-09 01:42:04 +08:00
baicai-1145
dc37b0b9ef Add WebAPI documentation and implement TTS API with endpoints for text-to-speech inference, control commands, and model switching. Enhance TTS class with methods for extracting prompt semantics and reference audio specifications. Introduce a scheduler prototype for managing T2S requests. 2026-03-09 00:22:59 +08:00
baicai-1145
30a4557d8d Implement last inference statistics tracking in Text2SemanticDecoder and enhance TTS class with prompt semantic extraction. This includes methods for setting and retrieving inference stats, as well as improvements to audio processing and feature extraction in TTS. 2026-03-08 23:08:27 +08:00
baicai-1145
b250e62402 Enhance G2PW model input handling by introducing polyphonic context character support and updating the data preparation method to return additional query IDs. This improves the processing of polyphonic characters in sentences. 2026-03-08 03:01:20 +08:00
baicai-1145
800acd45ff Enhance G2P processing by implementing batch input handling in the _g2p function, improving efficiency. Update prepare_onnx_input to utilize caching for tokenization and add optional parameters for character ID mapping and phoneme masks. Refactor G2PWOnnxConverter to streamline model loading and configuration management. 2026-03-07 05:47:22 +08:00
ChasonJiang
c767f0b83b
Fix bug (#2704)
* Fix bug

* fallback and bug fix
2025-12-30 16:00:21 +08:00
ChasonJiang
9080a967d5
Fix sampling error (#2703) 2025-12-30 15:21:03 +08:00
ChasonJiang
bfca0f6b2d
Align the decoding strategy with naive_infer to prevent dropped sentences (#2697) 2025-12-19 17:37:19 +08:00
ChasonJiang
abe984395c
Align the default GPT top-k sampling parameters (#2696) 2025-12-19 16:05:36 +08:00
ChasonJiang
36b3231c6f
bug fix (#2689) 2025-12-15 14:23:06 +08:00
RVC-Boss
cb00840c4e
Add files via upload 2025-11-28 22:02:03 +08:00
wzy3650
60a4a214af
vq distributed training support (#2577)
Co-authored-by: wangzeyuan <wangzeyuan@agora.io>
2025-11-28 21:57:13 +08:00
zzz
6375bbe316
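Experiment with stream infer (#2469)
* Experiment with stream infer

* Plot the generated audio in the stream_infer script

* stream_infer: add the export part.

* stream_infer: plots that make patterns easier to spot

* stream_infer: run a correlation search when concatenating audio, reducing F0 (pitch) discontinuities caused by splicing

* stream_infer: export `find_best_audio_offset_fast`

* stream_infer: improve the waveform display for easier comparison

* stream_v2pro.py: read parameters from the command line

* stream_v2pro.py: reduce the length of the text used for export

* stream_v2pro: fix the issue where half-precision input to spectrogram_torch overflowed the spec, resulting in no sound

* stream_v2pro: add a --lang parameter that specifies the language of the reference text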
2025-11-28 21:36:57 +08:00
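The correlation search mentioned above chooses where to splice the next chunk so that the waveform, and therefore F0, stays continuous across the joint. This log does not show the body of `find_best_audio_offset_fast`. The following is a minimal sketch of the general technique: normalized cross-correlation over a bounded shift range, written in plain Python for clarity. A real implementation would likely vectorize this or use FFT-based correlation. The function name `best_splice_offset` is hypothetical.

```python
from typing import Sequence


def best_splice_offset(prev_tail: Sequence[float], next_chunk: Sequence[float], max_shift: int) -> int:
    """Return the shift into next_chunk whose window best matches prev_tail.

    Each candidate shift is scored with normalized cross-correlation, so
    loudness differences between chunks do not dominate the choice.
    """
    n = len(prev_tail)
    if n == 0 or len(next_chunk) < n + max_shift:
        raise ValueError("next_chunk must hold len(prev_tail) + max_shift samples")
    tail_energy = sum(x * x for x in prev_tail) ** 0.5
    best_shift, best_score = 0, float("-inf")
    for shift in range(max_shift + 1):
        window = next_chunk[shift : shift + n]
        dot = sum(a * b for a, b in zip(prev_tail, window))
        win_energy = sum(x * x for x in window) ** 0.5
        # The small epsilon avoids division by zero on silent windows.
        score = dot / (tail_energy * win_energy + 1e-12)
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift
```

For example, if `next_chunk` is `prev_tail` preceded by 7 samples of silence (with enough trailing samples to cover `max_shift`), the function returns 7.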
KamioRinn
e00ca92140
Fix ASMD (#2636) 2025-11-28 21:22:43 +08:00
ChasonJiang
92ab59c553
Finer-grained streaming inference mode (#2671)
* Better streaming inference mode

* Clean up unused code

* modified:   GPT_SoVITS/AR/models/t2s_model.py
	modified:   GPT_SoVITS/TTS_infer_pack/TTS.py
	modified:   GPT_SoVITS/module/models.py

* modified:   GPT_SoVITS/TTS_infer_pack/TTS.py

* modified:   .gitignore
	modified:   GPT_SoVITS/AR/models/t2s_model.py
	modified:   GPT_SoVITS/TTS_infer_pack/TTS.py
	modified:   GPT_SoVITS/module/models.py

* modified:   GPT_SoVITS/AR/models/t2s_model.py
	modified:   GPT_SoVITS/TTS_infer_pack/TTS.py
	modified:   GPT_SoVITS/module/models.py
	modified:   api_v2.py

* modified:   GPT_SoVITS/TTS_infer_pack/TTS.py

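* Correct spelling errors

* Support streaming inference with a fixed chunk length; optimize the SOLA algorithm

* Fix the ogg format transmission issue in api_v2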
2025-11-28 21:12:41 +08:00
ChasonJiang
b5a67e6247
Fix the GPT loss calculation (#2537)
* Fix the GPT loss calculation

* fallback tts config
2025-07-18 14:59:59 +08:00
ChasonJiang
b9211657d8
Optimize the code logic of TTS_Config (#2536)
* Optimize the code logic of TTS_Config

* Save tts_config after loading the VITS weights
2025-07-18 11:54:40 +08:00
RVC-Boss
2d09bbe63a
Update tts_infer.yaml 2025-07-16 15:44:04 +08:00
RVC-Boss
4d8ebf8523
Update TTS.py 2025-07-16 15:43:26 +08:00
jiangsier-xyz
e476b01f30
Fix TTS.py failing to recognize v2Pro and v2ProPlus as actually supported versions (#2490)
Also update the default configuration.

Co-authored-by: jiangsier-xyz <jiangsier131@gmail.com>
2025-07-16 15:42:36 +08:00
RVC-Boss
426e1a2bb4
Raise the priority of the inference process
RVC-Boss
4e3c69043c
Update inference_webui.py 2025-07-10 18:16:24 +08:00
Yixiao Chen
8c579d46dd
Update export_torch_script.py (#2494)
Avoid dtype inconsistency when exporting
2025-07-02 22:48:28 +08:00
KamioRinn
6df61f58e4
Optimize language segmentation and formatting (#2488)
* better LangSegmenter

* add version num2str

* better version num2str

* sync fast infer

* sync api

* remove duplicate spaces

* remove unnecessary code

---------

Co-authored-by: RVC-Boss <129054828+RVC-Boss@users.noreply.github.com>
2025-06-27 11:58:41 +08:00
KamioRinn
90ebefa78f
make sure ort providers available (#2489) 2025-06-27 10:41:52 +08:00
XXXXRT666
6fdc67ca83
Fix bugs in install.sh, reduce log noise, and improve error reporting (#2464)
* Update Install.sh

* Format Code

* Delete dev null

* Update README, Support Dark Mode in CSS/JS
2025-06-17 15:21:36 +08:00
zzz
7dec5f5bb0
Merge pull request #2460 from L-jasmine/export_v2pro
Optimize torch_script model export
2025-06-13 22:10:11 +08:00
csh
5c91e66d2e export_torch_script.py: support v2Pro & v2ProPlus 2025-06-12 21:53:14 +08:00
RVC-Boss
ed89a02337
Fix the training explosion that the earlier "fix ge.sum values possibly exploding" change could cause
2025-06-11 23:14:52 +08:00
RVC-Boss
cd6de7398e
Merge pull request #2449 from KamioRinn/maga
support v4 v2Pro v2ProPlus for api & optimize LangSegmenter
2025-06-11 10:29:39 +08:00
YYuX-1145
dd2b9253aa
Update TTS.py (#2450) 2025-06-11 10:28:42 +08:00
KamioRinn
746cb536c6 Fix LangSegmenter 2025-06-10 19:18:05 +08:00
Emmanuel Ferdman
0d2f273402
Resolve Python Logger warnings (#2379)
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
2025-06-10 18:03:23 +08:00