GPT-SoVITS

mirror of https://github.com/RVC-Boss/GPT-SoVITS.git synced 2026-05-12 04:48:10 +08:00

Author	SHA1	Message	Date
baicai-1145	c94de2f2cb	Enhance TTS audio processing with improved resampling and profiling metrics Refactor the audio preparation workflow to utilize torchaudio for resampling, replacing librosa for better performance. Introduce a caching mechanism for resampling transforms and update the PrepareRefSemanticBatchWorker to include detailed timing metrics for profiling. Additionally, implement a new CPU limiter for managing resource allocation during audio processing. These changes improve the efficiency and maintainability of the TTS system.	2026-03-13 16:45:00 +08:00
baicai-1145	bc1f3f32de	Enhance audio processing in TTS framework with resampling and profiling improvements Add resampling capabilities using torchaudio to prepare reference audio at 16kHz, replacing librosa for better performance. Introduce a caching mechanism for resampling transforms to optimize resource usage. Update batch processing methods to include timing metrics for profiling, enhancing the ability to monitor and improve performance in the TTS system. This update improves the maintainability and efficiency of audio preparation workflows.	2026-03-13 02:03:25 +08:00
baicai-1145	17cb2e5acf	Implement G2PW processing enhancements in TTS framework Add support for G2PW processing in the TTS system by introducing new methods and classes for handling G2PW segments. Update PrepareCoordinator to manage G2PW worker threads and integrate G2PW profiling into the existing framework. Enhance text preprocessing to identify segments requiring G2PW and streamline the resolution of these segments. This update improves the overall performance and maintainability of the TTS system by optimizing the handling of Chinese text processing.	2026-03-12 23:04:39 +08:00
baicai-1145	5cf68a91d3	Add g2pw submodule and enhance TTS processing with AsyncStageGate Introduce a new submodule for g2pw and implement AsyncStageGate in PrepareCoordinator to manage concurrent task inflight limits. Update PrepareTextCpuWorker and PrepareRefSemanticBatchWorker to support asynchronous task submission and completion notifications. Enhance profiling capabilities in TTS to track g2pw processing times, improving overall performance and maintainability of the TTS system.	2026-03-12 23:03:33 +08:00
baicai-1145	6a822b28c3	Enhance TTS API with improved request handling and asynchronous processing Refactor api_v2.py and api_v3.py to update sampling parameters and weight paths for better clarity and support for v3/v4 vocoders. Introduce new methods in PrepareCoordinator for handling empty text features and improve profiling capabilities. Additionally, update unified engine components to streamline audio processing and state management, enhancing overall performance and maintainability of the TTS system.	2026-03-12 01:27:19 +08:00
baicai-1145	d453a8e47c	Add unified engine stage components for TTS processing orchestration Introduce new modules including EngineDecodeStageMixin, EngineDispatchStageMixin, EngineFinalizeStageMixin, EnginePrepareStageMixin, and EngineStageFutureMixin. These components enhance the TTS framework by providing structured methods for managing engine stages, including decoding, dispatching, finalizing, and preparing tasks. The new architecture supports improved state management and asynchronous operations, significantly enhancing the maintainability and performance of the TTS system.	2026-03-11 21:15:19 +08:00
baicai-1145	a3a5aad157	Add unified engine components for TTS processing and state management Introduce new modules including unified_engine_component_models, unified_engine_component_policy, unified_engine_component_registry, unified_engine_component_runtime, unified_engine_worker_completion, and unified_engine_worker_decode. These additions enhance the TTS framework by providing structured models for request handling, engine policies, and worker execution, significantly improving the architecture and maintainability of the system. The new components support asynchronous operations and optimize overall performance through better state management and processing capabilities.	2026-03-11 20:49:41 +08:00
baicai-1145	3fd4f48651	Add unified engine API modules for direct and scheduler-based TTS processing Introduce new modules including unified_engine_api_direct, unified_engine_api_profile, unified_engine_api_request, and unified_engine_api_scheduler. These additions enhance the TTS system by providing structured interfaces for direct TTS execution and scheduler-based processing. The new components support improved request handling, profiling, and state management, significantly enhancing the architecture and maintainability of the TTS framework.	2026-03-11 18:36:24 +08:00
baicai-1145	b046a093d3	Add unified engine delegates and orchestration components for enhanced TTS processing Introduce new modules including EngineApiDelegates, EngineBridgeDelegates, EngineRegistryBridgeFacade, EngineRuntimeBridgeFacade, EngineStageBridgeFacade, and EngineStageOrchestrator. These additions provide a structured approach to managing TTS requests, engine states, and orchestration, significantly improving the architecture and maintainability of the TTS system. The new components support asynchronous operations and enhance overall performance through better request handling and processing capabilities.	2026-03-11 18:35:47 +08:00
baicai-1145	800f01790e	Refactor EngineApiFacade and EngineApiDelegates for improved method naming and structure Rename several methods in EngineApiFacade to follow a consistent private naming convention, enhancing code clarity. Update EngineApiDelegates to remove redundant method definitions, streamlining the interface. Introduce EnginePublicInterface to encapsulate public API methods, improving organization and maintainability of the TTS system. Additionally, update the EngineCompositionBuilder to use the new scheduler worker state retrieval method.	2026-03-11 17:58:20 +08:00
baicai-1145	d1ec7d9e54	Add unified engine components and API for enhanced TTS processing Introduce multiple new modules including unified_engine_api, unified_engine_audio, unified_engine_bridge, unified_engine_builder, unified_engine_components, unified_engine_delegates, and unified_engine_runtime. These additions provide a comprehensive framework for managing TTS requests, audio packing, and engine state management, significantly improving the architecture and maintainability of the TTS system. The new structure supports asynchronous operations and enhances overall performance through better request handling and processing capabilities.	2026-03-11 08:32:56 +08:00
baicai-1145	06d6b67f73	Add PreparedCpuStage data class and refactor prepare_cpu_stage_profiled_async method in PrepareCoordinator for improved CPU profiling. Introduce prepare_gpu_stage_profiled_async method to streamline GPU stage preparation using the new data class, enhancing overall performance and maintainability.	2026-03-11 05:29:30 +08:00
baicai-1145	6a427b4f54	Update TTS API to support asynchronous execution by replacing synchronous TTS calls with asynchronous counterparts in both api_v2.py and api_v3.py. Introduce new data classes in unified_engine.py for enhanced request handling and state management, improving overall system performance and maintainability.	2026-03-10 21:25:14 +08:00
baicai-1145	d1a97fd04d	Refactor TTS API to streamline audio processing by removing unused packing functions and optimizing the tts_handle method for asynchronous execution. Update type hints and clean up imports for improved code clarity and maintainability.	2026-03-10 20:46:14 +08:00
baicai-1145	69ac7f9027	Integrate UnifiedTTSEngine into TTS API for improved audio processing and control. Refactor tts_handle and control endpoints to utilize the new engine, enhancing error handling and response management. Update set_refer_audio and set_gpt_weights endpoints to return payloads from the engine, streamlining audio configuration processes.	2026-03-10 06:59:28 +08:00
baicai-1145	827d6ea47c	Refactor TTS and scheduler components to enhance text processing and batching capabilities. Introduce PrepareCoordinator for managing text feature preparation asynchronously, and update SchedulerDebugWorker to support new finalize task management. Implement batch processing in PrepareBertBatchWorker with improved admission control and profiling metrics. Add text CPU preprocessing utilities for better text segmentation and normalization.	2026-03-10 06:58:53 +08:00
baicai-1145	a45e171ff5	Enhance sampling functions in TTS by adding support for previous token masks in logits_to_probs. Implement batch processing for sampling with padded token sequences and contiguous sampling groups. Refactor sampling logic in T2S scheduler to utilize new functionalities, improving efficiency and flexibility in token generation.	2026-03-09 21:24:16 +08:00
baicai-1145	845b181360	Implement batch processing for BERT and reference semantic tasks in TTS. Introduce StageLimiter for managing concurrent processing and enhance the TTS class with new methods for handling audio and semantic extraction. Update profiling metrics for better performance tracking during inference.	2026-03-09 05:19:28 +08:00
baicai-1145	d245eb169c	Refactor T2S scheduler and inference handling to improve attention mask management and memory tracking. Update T2SRunningRequest and T2SActiveBatch classes to include optional key padding masks. Introduce new benchmarking tools for API performance and memory usage analysis, enhancing overall system efficiency.	2026-03-09 01:42:04 +08:00
baicai-1145	dc37b0b9ef	Add WebAPI documentation and implement TTS API with endpoints for text-to-speech inference, control commands, and model switching. Enhance TTS class with methods for extracting prompt semantics and reference audio specifications. Introduce a scheduler prototype for managing T2S requests.	2026-03-09 00:22:59 +08:00
baicai-1145	30a4557d8d	Implement last inference statistics tracking in Text2SemanticDecoder and enhance TTS class with prompt semantic extraction. This includes methods for setting and retrieving inference stats, as well as improvements to audio processing and feature extraction in TTS.	2026-03-08 23:08:27 +08:00
baicai-1145	b250e62402	Enhance G2PW model input handling by introducing polyphonic context character support and updating the data preparation method to return additional query IDs. This improves the processing of polyphonic characters in sentences.	2026-03-08 03:01:20 +08:00
baicai-1145	800acd45ff	Enhance G2P processing by implementing batch input handling in _g2p function, improving efficiency. Update prepare_onnx_input to utilize caching for tokenization and add optional parameters for character ID mapping and phoneme masks. Refactor G2PWOnnxConverter to streamline model loading and configuration management.	2026-03-07 05:47:22 +08:00
XXXXRT666	2d9193b0d3	Migrate to miniforge, add missing dependencies, update docker file, remove deprecated files (#2732) * Migrate to miniforge, add missing dependencies, update docker file, remove deprecated files * Add Env Vars and Secrets	2026-02-09 15:05:25 +08:00
Oarora	9986880b3f	Fix build failure caused by unaccepted Conda terms (#2727)	2026-02-08 23:52:04 +08:00
ChasonJiang	c767f0b83b	Fix bug (#2704) * Fix bug * Fallback and bug fix	2025-12-30 16:00:21 +08:00
ChasonJiang	9080a967d5	Fix sampling error (#2703)	2025-12-30 15:21:03 +08:00
sushistack	51df9f7384	Fix model file name in README instructions (#2700)	2025-12-25 16:44:21 +08:00
ChasonJiang	bfca0f6b2d	Align decoding strategy with naive_infer to prevent dropped sentences (#2697)	2025-12-19 17:37:19 +08:00
ChasonJiang	abe984395c	Align default GPT top-k sampling parameters (#2696)	2025-12-19 16:05:36 +08:00
RVC-Boss	cc89c3660e	Update requirements.txt	2025-12-19 15:54:54 +08:00
ChasonJiang	36b3231c6f	bug fix (#2689)	2025-12-15 14:23:06 +08:00
RVC-Boss	9ec3a60f30	Update config.py	2025-12-01 20:23:49 +08:00
RVC-Boss	fc533b6fb7	Update fasterwhisper_asr.py	2025-12-01 11:38:37 +08:00
XXXXRT666	857799276c	Fix Modelscope (#2679)	2025-12-01 11:13:15 +08:00
Spr_Aachen	92d2d337fd	Fix training error caused by float type of default_batch_size parameter (#2662)	2025-11-28 22:53:43 +08:00
ChasonJiang	6fb441f65e	More user-friendly streaming mode options (#2678)	2025-11-28 22:13:48 +08:00
XXXXRT666	c85c54eca9	Add ModelScope Snapshot Download For ASR (#2627) * Add ModelScope Snapshot Download For ASR * Typo Fix * Remove YUE in whisper * Remove HF ENDPOINT * Add FunASR Download	2025-11-28 22:10:49 +08:00
RVC-Boss	cb00840c4e	Add files via upload	2025-11-28 22:02:03 +08:00
wzy3650	60a4a214af	vq distributed training support (#2577) Co-authored-by: wangzeyuan <wangzeyuan@agora.io>	2025-11-28 21:57:13 +08:00
zzz	6375bbe316	Try stream infer (#2469) * Try stream infer * Plot the generated audio in the stream_infer script * stream_infer: add an export section * stream_infer: plots that make patterns easier to spot * stream_infer: run a correlation search when concatenating audio to reduce fundamental-frequency breaks caused by splicing * stream_infer: export `find_best_audio_offset_fast` * stream_infer: improve waveform display for easier comparison * stream_v2pro.py: read arguments from the command line * stream_v2pro.py: shorten the text used for export * stream_v2pro: fix silent output caused by spec overflow when the spectrogram_torch input is half precision * stream_v2pro: add --lang argument to indicate the language of the reference text	2025-11-28 21:36:57 +08:00
KamioRinn	e00ca92140	Fix ASMD (#2636)	2025-11-28 21:22:43 +08:00
ChasonJiang	92ab59c553	Finer-grained streaming inference mode (#2671) * Better streaming inference mode * Clean up unused code * modified: GPT_SoVITS/AR/models/t2s_model.py modified: GPT_SoVITS/TTS_infer_pack/TTS.py modified: GPT_SoVITS/module/models.py * modified: GPT_SoVITS/TTS_infer_pack/TTS.py * modified: .gitignore modified: GPT_SoVITS/AR/models/t2s_model.py modified: GPT_SoVITS/TTS_infer_pack/TTS.py modified: GPT_SoVITS/module/models.py * modified: GPT_SoVITS/AR/models/t2s_model.py modified: GPT_SoVITS/TTS_infer_pack/TTS.py modified: GPT_SoVITS/module/models.py modified: api_v2.py * modified: GPT_SoVITS/TTS_infer_pack/TTS.py * Fix typos * Support streaming inference with a fixed chunk length; optimize the SOLA algorithm * Fix OGG format transmission issue in api_v2	2025-11-28 21:12:41 +08:00
RVC-Boss	11aa78bd9b	Fix issue where environment variables may not be str	2025-09-10 15:01:04 +08:00
XXXXRT666	fdf794e31d	Update WSL Rocm (#2561)	2025-08-02 17:47:15 +08:00
多玩幻灵qwq	0be59c8043	fix: correct links (#2539)	2025-07-19 00:29:48 +08:00
ChasonJiang	b5a67e6247	Fix GPT loss calculation (#2537) * Fix GPT loss calculation * fallback tts config	2025-07-18 14:59:59 +08:00
ChasonJiang	b9211657d8	Optimize TTS_Config code logic (#2536) * Optimize TTS_Config code logic * Save tts_config after loading VITS weights	2025-07-18 11:54:40 +08:00
XXXXRT666	cefafee32c	Add Distil (#2531)	2025-07-17 20:28:25 +08:00
RVC-Boss	2d09bbe63a	Update tts_infer.yaml	2025-07-16 15:44:04 +08:00

1055 Commits