GPT-SoVITS

mirror of https://github.com/RVC-Boss/GPT-SoVITS.git synced 2026-07-14 12:13:30 +08:00

Author	SHA1	Message	Date
baicai-1145	c94de2f2cb	Enhance TTS audio processing with improved resampling and profiling metrics Refactor the audio preparation workflow to utilize torchaudio for resampling, replacing librosa for better performance. Introduce a caching mechanism for resampling transforms and update the PrepareRefSemanticBatchWorker to include detailed timing metrics for profiling. Additionally, implement a new CPU limiter for managing resource allocation during audio processing. These changes improve the efficiency and maintainability of the TTS system.	2026-03-13 16:45:00 +08:00
baicai-1145	bc1f3f32de	Enhance audio processing in TTS framework with resampling and profiling improvements Add resampling capabilities using torchaudio to prepare reference audio at 16kHz, replacing librosa for better performance. Introduce a caching mechanism for resampling transforms to optimize resource usage. Update batch processing methods to include timing metrics for profiling, enhancing the ability to monitor and improve performance in the TTS system. This update improves the maintainability and efficiency of audio preparation workflows.	2026-03-13 02:03:25 +08:00
baicai-1145	17cb2e5acf	Implement G2PW processing enhancements in TTS framework Add support for G2PW processing in the TTS system by introducing new methods and classes for handling G2PW segments. Update PrepareCoordinator to manage G2PW worker threads and integrate G2PW profiling into the existing framework. Enhance text preprocessing to identify segments requiring G2PW and streamline the resolution of these segments. This update improves the overall performance and maintainability of the TTS system by optimizing the handling of Chinese text processing.	2026-03-12 23:04:39 +08:00
baicai-1145	5cf68a91d3	Add g2pw submodule and enhance TTS processing with AsyncStageGate Introduce a new submodule for g2pw and implement AsyncStageGate in PrepareCoordinator to manage concurrent task inflight limits. Update PrepareTextCpuWorker and PrepareRefSemanticBatchWorker to support asynchronous task submission and completion notifications. Enhance profiling capabilities in TTS to track g2pw processing times, improving overall performance and maintainability of the TTS system.	2026-03-12 23:03:33 +08:00
baicai-1145	6a822b28c3	Enhance TTS API with improved request handling and asynchronous processing Refactor api_v2.py and api_v3.py to update sampling parameters and weight paths for better clarity and support for v3/v4 vocoders. Introduce new methods in PrepareCoordinator for handling empty text features and improve profiling capabilities. Additionally, update unified engine components to streamline audio processing and state management, enhancing overall performance and maintainability of the TTS system.	2026-03-12 01:27:19 +08:00
baicai-1145	d453a8e47c	Add unified engine stage components for TTS processing orchestration Introduce new modules including EngineDecodeStageMixin, EngineDispatchStageMixin, EngineFinalizeStageMixin, EnginePrepareStageMixin, and EngineStageFutureMixin. These components enhance the TTS framework by providing structured methods for managing engine stages, including decoding, dispatching, finalizing, and preparing tasks. The new architecture supports improved state management and asynchronous operations, significantly enhancing the maintainability and performance of the TTS system.	2026-03-11 21:15:19 +08:00
baicai-1145	a3a5aad157	Add unified engine components for TTS processing and state management Introduce new modules including unified_engine_component_models, unified_engine_component_policy, unified_engine_component_registry, unified_engine_component_runtime, unified_engine_worker_completion, and unified_engine_worker_decode. These additions enhance the TTS framework by providing structured models for request handling, engine policies, and worker execution, significantly improving the architecture and maintainability of the system. The new components support asynchronous operations and optimize overall performance through better state management and processing capabilities.	2026-03-11 20:49:41 +08:00
baicai-1145	3fd4f48651	Add unified engine API modules for direct and scheduler-based TTS processing Introduce new modules including unified_engine_api_direct, unified_engine_api_profile, unified_engine_api_request, and unified_engine_api_scheduler. These additions enhance the TTS system by providing structured interfaces for direct TTS execution and scheduler-based processing. The new components support improved request handling, profiling, and state management, significantly enhancing the architecture and maintainability of the TTS framework.	2026-03-11 18:36:24 +08:00
baicai-1145	b046a093d3	Add unified engine delegates and orchestration components for enhanced TTS processing Introduce new modules including EngineApiDelegates, EngineBridgeDelegates, EngineRegistryBridgeFacade, EngineRuntimeBridgeFacade, EngineStageBridgeFacade, and EngineStageOrchestrator. These additions provide a structured approach to managing TTS requests, engine states, and orchestration, significantly improving the architecture and maintainability of the TTS system. The new components support asynchronous operations and enhance overall performance through better request handling and processing capabilities.	2026-03-11 18:35:47 +08:00
baicai-1145	800f01790e	Refactor EngineApiFacade and EngineApiDelegates for improved method naming and structure Rename several methods in EngineApiFacade to follow a consistent private naming convention, enhancing code clarity. Update EngineApiDelegates to remove redundant method definitions, streamlining the interface. Introduce EnginePublicInterface to encapsulate public API methods, improving organization and maintainability of the TTS system. Additionally, update the EngineCompositionBuilder to use the new scheduler worker state retrieval method.	2026-03-11 17:58:20 +08:00
baicai-1145	d1ec7d9e54	Add unified engine components and API for enhanced TTS processing Introduce multiple new modules including unified_engine_api, unified_engine_audio, unified_engine_bridge, unified_engine_builder, unified_engine_components, unified_engine_delegates, and unified_engine_runtime. These additions provide a comprehensive framework for managing TTS requests, audio packing, and engine state management, significantly improving the architecture and maintainability of the TTS system. The new structure supports asynchronous operations and enhances overall performance through better request handling and processing capabilities.	2026-03-11 08:32:56 +08:00
baicai-1145	06d6b67f73	Add PreparedCpuStage data class and refactor prepare_cpu_stage_profiled_async method in PrepareCoordinator for improved CPU profiling. Introduce prepare_gpu_stage_profiled_async method to streamline GPU stage preparation using the new data class, enhancing overall performance and maintainability.	2026-03-11 05:29:30 +08:00
baicai-1145	6a427b4f54	Update TTS API to support asynchronous execution by replacing synchronous TTS calls with asynchronous counterparts in both api_v2.py and api_v3.py. Introduce new data classes in unified_engine.py for enhanced request handling and state management, improving overall system performance and maintainability.	2026-03-10 21:25:14 +08:00
baicai-1145	69ac7f9027	Integrate UnifiedTTSEngine into TTS API for improved audio processing and control. Refactor tts_handle and control endpoints to utilize the new engine, enhancing error handling and response management. Update set_refer_audio and set_gpt_weights endpoints to return payloads from the engine, streamlining audio configuration processes.	2026-03-10 06:59:28 +08:00
baicai-1145	827d6ea47c	Refactor TTS and scheduler components to enhance text processing and batching capabilities. Introduce PrepareCoordinator for managing text feature preparation asynchronously, and update SchedulerDebugWorker to support new finalize task management. Implement batch processing in PrepareBertBatchWorker with improved admission control and profiling metrics. Add text CPU preprocessing utilities for better text segmentation and normalization.	2026-03-10 06:58:53 +08:00
baicai-1145	a45e171ff5	Enhance sampling functions in TTS by adding support for previous token masks in logits_to_probs. Implement batch processing for sampling with padded token sequences and contiguous sampling groups. Refactor sampling logic in T2S scheduler to utilize new functionalities, improving efficiency and flexibility in token generation.	2026-03-09 21:24:16 +08:00
baicai-1145	845b181360	Implement batch processing for BERT and reference semantic tasks in TTS. Introduce StageLimiter for managing concurrent processing and enhance the TTS class with new methods for handling audio and semantic extraction. Update profiling metrics for better performance tracking during inference.	2026-03-09 05:19:28 +08:00
baicai-1145	d245eb169c	Refactor T2S scheduler and inference handling to improve attention mask management and memory tracking. Update T2SRunningRequest and T2SActiveBatch classes to include optional key padding masks. Introduce new benchmarking tools for API performance and memory usage analysis, enhancing overall system efficiency.	2026-03-09 01:42:04 +08:00
baicai-1145	dc37b0b9ef	Add WebAPI documentation and implement TTS API with endpoints for text-to-speech inference, control commands, and model switching. Enhance TTS class with methods for extracting prompt semantics and reference audio specifications. Introduce a scheduler prototype for managing T2S requests.	2026-03-09 00:22:59 +08:00
baicai-1145	30a4557d8d	Implement last inference statistics tracking in Text2SemanticDecoder and enhance TTS class with prompt semantic extraction. This includes methods for setting and retrieving inference stats, as well as improvements to audio processing and feature extraction in TTS.	2026-03-08 23:08:27 +08:00
ChasonJiang	abe984395c	Align the default GPT top-k sampling parameter (#2696)	2025-12-19 16:05:36 +08:00
ChasonJiang	36b3231c6f	bug fix (#2689)	2025-12-15 14:23:06 +08:00
ChasonJiang	92ab59c553	Finer-grained streaming inference mode (#2671) * Better streaming inference mode * Clean up unused code * modified: GPT_SoVITS/AR/models/t2s_model.py modified: GPT_SoVITS/TTS_infer_pack/TTS.py modified: GPT_SoVITS/module/models.py * modified: GPT_SoVITS/TTS_infer_pack/TTS.py * modified: .gitignore modified: GPT_SoVITS/AR/models/t2s_model.py modified: GPT_SoVITS/TTS_infer_pack/TTS.py modified: GPT_SoVITS/module/models.py * modified: GPT_SoVITS/AR/models/t2s_model.py modified: GPT_SoVITS/TTS_infer_pack/TTS.py modified: GPT_SoVITS/module/models.py modified: api_v2.py * modified: GPT_SoVITS/TTS_infer_pack/TTS.py * Fix spelling errors * Support streaming inference with a fixed chunk length; optimize the SOLA algorithm * Fix the OGG format transmission issue in api_v2	2025-11-28 21:12:41 +08:00
ChasonJiang	b9211657d8	Optimize the TTS_Config code logic (#2536) * Optimize the TTS_Config code logic * Save tts_config after loading the VITS weights	2025-07-18 11:54:40 +08:00
RVC-Boss	4d8ebf8523	Update TTS.py	2025-07-16 15:43:26 +08:00
jiangsier-xyz	e476b01f30	Fix TTS.py failing to recognize the actually supported versions v2Pro and v2ProPlus (#2490) Also update the default configuration. Co-authored-by: jiangsier-xyz <jiangsier131@gmail.com>	2025-07-16 15:42:36 +08:00
KamioRinn	6df61f58e4	Language segmentation and formatting optimizations (#2488) * better LangSegmenter * add version num2str * better version num2str * sync fast infer * sync api * remove duplicate spaces * remove unnecessary code --------- Co-authored-by: RVC-Boss <129054828+RVC-Boss@users.noreply.github.com>	2025-06-27 11:58:41 +08:00
XXXXRT666	6fdc67ca83	Fix bugs in `install.sh`, reduce log noise, and improve error reporting (#2464) * Update Install.sh * Format Code * Delete dev null * Update README, Support Dark Mode in CSS/JS	2025-06-17 15:21:36 +08:00
RVC-Boss	cd6de7398e	Merge pull request #2449 from KamioRinn/maga support v4 v2Pro v2ProPlus for api & optimize LangSegmenter	2025-06-11 10:29:39 +08:00
YYuX-1145	dd2b9253aa	Update TTS.py (#2450)	2025-06-11 10:28:42 +08:00
KamioRinn	746cb536c6	Fix LangSegmenter	2025-06-10 19:18:05 +08:00
Jialiang Zhu	035dcbad03	Fix AttributeError when prompt_cache['refer_spec'][0] is a tuple (#2428) Co-authored-by: tzrain <tz_rain@foxmail.com>	2025-06-05 10:55:21 +08:00
RVC-Boss	584fcae9a5	support sovits v2Pro v2ProPlus	2025-06-04 15:25:52 +08:00
RVC-Boss	92819d0b31	support sovits v2Pro v2ProPlus	2025-06-04 15:19:20 +08:00
XXXXRT666	d5e479dad6	Introduce Docker and Windows CI Workflow, Pre-commit Formatting, and Language Resource Auto-Download (#2351) * Docker Auto-Build Workflow * Rename * Update * Fix Bugs * Disable Progress Bar When workflows triggered * Fix Wget * Fix Bugs * Fix Bugs * Update Wget * Update Workflows * Accelerate Docker Image Building * Fix Install.sh * Add Skip-Check For Action Runner * Fix Dockerfile * . * . * . * . * Delete File in Runner * Add Sort * Delete More Files * Delete More * . * . * . * Add Pre-Commit Hook Update Docker * Add Code Spell Check * [pre-commit.ci] trigger * [pre-commit.ci] trigger * [pre-commit.ci] trigger * Fix Bugs * . * Disable Progress Bar and Logs while using GitHub Actions * . * . * Fix Bugs * update conda * fix bugs * Fix Bugs * fix bugs * . * . * Quiet Installation * fix bugs * . * fix bug * . * Fix pre-commit.ci and Docker * fix bugs * . * Update Docker & Pre-Commit * fix bugs * Update Req * Update Req * Update OpenCC * update precommit * . * Update .pre-commit-config.yaml * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update Docs and fix bugs * Fix \ * Fix MacOS * . * test * . * Add Tag Alias * . * fix bugs * fix bugs * make image smaller * update pre-commit config * . * . * fix bugs * use miniconda * Fix Wrong Path * . * debug * debug * revert * Fix Bugs * Update Docs, Add Dict Auto Download in install.sh * update docker_build * Update Docs for Install.sh * update docker docs about architecture * Add Xcode-Commandline-Tool Installation * Update Docs 1. Add Missing VC17 2. Modified the Order of FFmpeg Installation and Requirements Installation 3. Remove Duplicate FFmpeg * Fix Wrong Cuda Version * Update TESTED ENV * Add PYTHONNOUSERSITE(-s) * Fix Wrapper * Update install.sh For Robustness * Ignore .git * Preload CUDNN For Ctranslate2 * Remove Gradio Warnings * Update Colab * Fix OpenCC Problems * Update Win DLL Strategy * Fix Onnxruntime-gpu NVRTC Error * Fix Path Problems * Add Windows Packages Workflow * WIP * WIP * WIP * WIP * WIP * WIP * . * WIP * WIP * WIP * WIP * WIP * WIP * WIP * WIP * WIP * WIP * WIP * WIP * WIP * WIP * WIP * WIP * WIP * WIP * WIP * WIP * WIP * WIP * WIP * WIP * WIP * WIP * WIP * WIP * WIP * WIP * WIP * WIP * WIP * WIP * WIP * WIP * WIP * WIP * WIP * WIP * WIP * WIP * WIP * WIP * WIP * Fix Path * Fix Path * Enable Logging * Set 7-Zip compression level to maximum (-mx=9) * Use Multithread in ONNX Session * Fix Tag Bugs * Add Time * Add Time * Add Time * Compress More * Copy DLL to Solve VC Runtime DLL Missing Issues * Expose FFmpeg Errors, Copy Only Part of Visual C++ Runtime * Update build_windows_packages.ps1 * Update build_windows_packages.ps1 * Update build_windows_packages.ps1 * Update build_windows_packages.ps1 * WIP * WIP * WIP * Update build_windows_packages.ps1 * Update install.sh * Update build_windows_packages.ps1 * Update docker-publish.yaml * Update install.sh * Update Dockerfile * Update docker_build.sh * Update miniconda_install.sh * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update Colab-WebUI.ipynb * Update Colab-Inference.ipynb * Update docker-compose.yaml * Update build_windows_packages.ps1 * Update install.sh --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2025-05-26 10:45:14 +08:00
ChasonJiang	a19f49604f	Fix v3 parameter passing (#2309)	2025-04-22 10:10:44 +08:00
RVC-Boss	590c83d766	Fix the v3 inference parameter-passing issue	2025-04-22 00:20:33 +08:00
ChasonJiang	e0f2818df7	Adapt the parallel inference version for v4 (#2307) * Adapt to v4 * Adapt to v4 * modified: GPT_SoVITS/inference_webui_fast.py * Merge the main branch * fallback config * modified: GPT_SoVITS/TTS_infer_pack/TTS.py * fix bug * modified: GPT_SoVITS/TTS_infer_pack/TTS.py * modified: GPT_SoVITS/inference_webui_fast.py	2025-04-21 23:20:20 +08:00
RVC-Boss	839ff9ce5b	Adapt v4 parallel inference (not finished yet)	2025-04-21 22:43:46 +08:00
XXXXRT666	53cac93589	Refactor: Format Code with Ruff and Update Deprecated G2PW Link (#2255) * ruff check --fix * ruff format --line-length 120 --target-version py39 * Change the link for G2PW Model * update pytorch version and colab	2025-04-07 16:42:47 +08:00
RVC-Boss	7abae557fb	Remove the missing-enc_q warning when loading v3 SoVITS models	2025-04-01 16:31:15 +08:00
ChasonJiang	03b662a769	Adapt parallel inference for sovits_v3 (#2241) * Adapt parallel inference for sovits_v3 * Clean up unused code	2025-03-31 11:56:05 +08:00
lishq	fef65d40fe	fix: prevent concurrent access to BERT model with thread lock (#2165) Added thread lock to protect get_phones_and_bert method from potential race conditions during concurrent access. This addresses issue #1844 where multiple threads accessing the BERT model simultaneously could cause data inconsistency or crashes. Co-authored-by: RVC-Boss <129054828+RVC-Boss@users.noreply.github.com>	2025-03-26 15:03:36 +08:00
ChasonJiang	7394dc7b0c	Adapt api_v2 and inference_webui_fast for V3 (#2188) * modified: GPT_SoVITS/TTS_infer_pack/TTS.py modified: GPT_SoVITS/TTS_infer_pack/TextPreprocessor.py modified: GPT_SoVITS/inference_webui_fast.py * Adapt to V3 * v3 adaptation for api_v2.py and inference_webui_fast.py * Fix a long-standing bug and add friendlier messages * Optimize the webui * Change to the correct path * Fix loading of v3 LoRA models * Fix the encoding mismatch encountered when reading tts_infer.yaml	2025-03-26 14:34:51 +08:00
ChasonJiang	165882d64f	Fix a bug caused by a redundant comment (#2158)	2025-03-05 18:22:01 +08:00
ChasonJiang	053a356ffe	Fix the GPT padding mask issue (#2153) * Fix the GPT padding mask issue * rollback tts_config	2025-03-05 17:14:43 +08:00
ChasonJiang	6dd2f72090	Change the mask strategy for GPT parallel inference to left padding (#2144) * Change the mask strategy for GPT parallel inference to left padding so that batch_infer behaves more like naive_infer; reduce redundant operations and use torch_sdpa to speed up inference * rollback tts_infer.yaml	2025-03-04 16:45:37 +08:00
RVC-Boss	a68e3c4354	Update TTS.py	2025-02-27 22:14:51 +08:00
KamioRinn	c17dd642c7	Add en_normalization and fix LangSegmenter (#2062)	2025-02-17 18:41:30 +08:00
StaryLan	15cbd1b673	Update Documentation (#2032) * add exception handling * Fill in the missing content * remove GB code * Simplify i18n text and remove trailing spaces * ignore v3 model dir * Update Changelog * Fix encoding	2025-02-12 11:27:35 +08:00

54 Commits
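
Note on fef65d40fe: the message describes guarding get_phones_and_bert with a thread lock so that only one thread touches the BERT model at a time. The sketch below illustrates that general pattern only and is not the repository's code. FakeBertModel, TextPreprocessor as written here, bert_lock, and run_concurrently are hypothetical names introduced for the illustration; only the method name get_phones_and_bert comes from the commit message.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeBertModel:
    """Hypothetical stand-in for a model that must not be called concurrently."""

    def __init__(self):
        self.active_calls = 0
        self.max_active_calls = 0
        self._stats_lock = threading.Lock()  # protects the counters only

    def encode(self, text):
        # Record how many callers are inside encode() at the same time.
        with self._stats_lock:
            self.active_calls += 1
            self.max_active_calls = max(self.max_active_calls, self.active_calls)
        features = [ord(ch) for ch in text]  # placeholder for real feature extraction
        with self._stats_lock:
            self.active_calls -= 1
        return features


class TextPreprocessor:
    """Hypothetical wrapper; the lock serializes every call into the model."""

    def __init__(self, model):
        self.model = model
        self.bert_lock = threading.Lock()

    def get_phones_and_bert(self, text):
        # Only one thread at a time may run the model.
        with self.bert_lock:
            return self.model.encode(text)


def run_concurrently(texts, workers=8):
    """Call get_phones_and_bert from several threads; return results and peak overlap."""
    model = FakeBertModel()
    preprocessor = TextPreprocessor(model)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(preprocessor.get_phones_and_bert, texts))
    return results, model.max_active_calls
```

Because every call passes through bert_lock, the peak number of simultaneous encode() calls stays at one regardless of how many worker threads submit requests. The trade-off is that the guarded section becomes a serial bottleneck, which is one plausible reason the later commits in this log move toward batched workers.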