1085 Commits

Author SHA1 Message Date
zpeng11
32d650b709
Merge 8858492f56326dce521db2c4a4b3a7323e786596 into 08d627c3338173c3229286d8787060d6559fe0f8 2026-04-30 16:52:30 +08:00
RVC-Boss
08d627c333
Add CUDA graph support; inference speed in normal inference mode doubles, with output quality unchanged. 2
2026-04-30 15:01:45 +08:00
RVC-Boss
6d95b559e8
Add CUDA graph support; inference speed in normal inference mode doubles, with output quality unchanged. 1
2026-04-30 15:01:11 +08:00
RVC-Boss
ea2d2a8166
Update README.md 2026-04-19 21:02:57 +08:00
SapphireLab
d9f03dad3e
Update Documentation (#2768)
* Adjust log format

* docs: Update other languages' changelogs
2026-04-18 22:33:55 +08:00
RVC-Boss
647935357a
Update Changelog_CN.md 2026-04-18 19:01:11 +08:00
RVC-Boss
02425ea256
Fixed issues such as missing imports for types like `Optional`.
2026-04-18 17:33:53 +08:00
Harikrishna KP
938f05fce8
fix: correct torch.randint upper bound to include both values (#2733) 2026-04-18 17:19:55 +08:00
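For context on the fix above: `torch.randint(low, high, size)` draws from the half-open range [low, high), so `high` itself is never returned. Python's standard-library `random.randrange` follows the same convention, and the sketch below uses it to show the off-by-one without requiring PyTorch. The bounds 0 and 1 and the helper name `sample_values` are illustrative assumptions; the commit title does not say which values were involved.

```python
import random


def sample_values(low, high, draws=1000, seed=0):
    """Return the set of distinct values drawn from the half-open range [low, high)."""
    rng = random.Random(seed)
    return {rng.randrange(low, high) for _ in range(draws)}


# With an exclusive upper bound of 1, only 0 can ever be drawn.
only_low = sample_values(0, 1)

# Raising the upper bound by one makes both 0 and 1 reachable.
both = sample_values(0, 2)
```

The same reasoning applies to any exclusive upper bound: to include a value `k`, pass `k + 1`.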
huang yutong
445d18ccce
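fix: fix multiple defects in TTS audio post-processing (#2753)
1. Fix integer overflow caused by double int16 conversion during audio super-sampling (CRITICAL)
   - In audio_postprocess, `audio = (audio * 32768).astype(np.int16)` sat outside the
     if/else block and ran unconditionally. When super_sampling=True the audio had already been
     converted to int16 inside the branch, so multiplying by 32768 again overflowed and completely distorted the audio
   - Also fix the AttributeError raised when super_sampling=True but the super-resolution model is missing and
     .astype() is called on a torch.Tensor

2. Fix audio loss in batched vocoder inference when padding_len=0 (HIGH)
   - When padding_len is exactly 0, `-0 * upsample_rate == 0`, so the slice
     `audio[x:0]` returns an empty tensor and the whole audio segment is lost

3. Fix FileExistsError being raised incorrectly when a file does not exist (LOW)
   - It should be FileNotFoundError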
Made-with: Cursor
2026-04-18 17:16:24 +08:00
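The padding_len=0 defect in item 2 above comes from how negative slice bounds behave, and it can be reproduced with a plain Python list. The sketch below is an illustration only: the function names and the `start` parameter are hypothetical, and the actual fix in the repository may be written differently.

```python
def trim_padding_buggy(samples, start, padding_len, upsample_rate):
    # When padding_len == 0 the end bound is -0 == 0, so the slice is empty.
    return samples[start:-padding_len * upsample_rate]


def trim_padding_fixed(samples, start, padding_len, upsample_rate):
    # Compute an explicit, non-negative end index instead of a negative bound.
    end = len(samples) - padding_len * upsample_rate
    return samples[start:end]


samples = list(range(10))

# No padding: the buggy version drops everything, the fixed one keeps the tail.
lost = trim_padding_buggy(samples, 2, 0, 4)
kept = trim_padding_fixed(samples, 2, 0, 4)

# With padding, both versions agree.
same_buggy = trim_padding_buggy(samples, 2, 1, 4)
same_fixed = trim_padding_fixed(samples, 2, 1, 4)
```

Using an explicit end index avoids special-casing zero, which is why it is often preferred over a negative slice bound that may evaluate to 0.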
Mushroomcowisheggs
00ce973412
feat: add error-handling hints for datasets (#2758)
Co-authored-by: moomushroom <107208254+moomushroom@users.noreply.github.com>
2026-04-18 17:13:30 +08:00
huang yutong
14191901cd
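fix: fix independent bugs in several modules (#2755)
1. Fix sync_buffer dividing by a function object instead of its call result (distrib.py)
   - In `buffer.data /= world_size`, world_size is a function and the () is missing,
     so a TypeError makes buffer synchronization fail in distributed training

2. Fix missing return statement in the istft function (spec_utils.py)
   - The function computed the result but did not return it, so callers always got None

3. Fix cut0 returning the literal "/n" instead of the newline "\n" (text_segmentation_method.py)
   - The subsequent text.split("\n") could not split correctly, and the literal /n was treated as text content

4. Fix the Cantonese ASR vad/punc model_revision being overwritten unconditionally (funasr_asr.py)
   - The Cantonese branch sets vad_model_revision to empty (because it does not use the VAD/punctuation models),
     but an assignment outside the if/else overwrote it with "v2.0.4", passing a wrong revision argument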
Made-with: Cursor
2026-04-18 17:10:56 +08:00
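Items 1 and 3 of the commit above are easy to reproduce in isolation. The following is a minimal sketch, not the repository's code: `world_size` here is a hypothetical stand-in that returns a fixed number, and the helper names are invented for illustration.

```python
def world_size():
    """Hypothetical stand-in for a helper that reports the number of workers."""
    return 4


def average_buggy(total):
    # Missing parentheses: divides by the function object and raises TypeError.
    return total / world_size


def average_fixed(total):
    # Call the function first, then divide by its result.
    return total / world_size()


def split_lines(text):
    # Splits on the real newline character; a literal "/n" is left untouched.
    return text.split("\n")


try:
    average_buggy(8.0)
    buggy_raised = False
except TypeError:
    buggy_raised = True

fixed_result = average_fixed(8.0)
unsplit = split_lines("a/nb")   # literal slash-n stays part of the text
split = split_lines("a\nb")     # a real newline separates the two parts
```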
东云
780383d5bd
[codex] Improve Windows single-GPU v3 LoRA training workflow (#2767)
* Improve Windows single-GPU v3 LoRA training

* Drop unrelated checkpoint helper change from PR

* Tighten PR scope to single-GPU training path fixes
2026-04-18 16:54:26 +08:00
白菜工厂1145号员工
ba8de9b760
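Optimize G2PW inference input construction and polyphonic character handling, reducing repeated computation and lowering inference overhead for long sentences (#2763)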
* Enhance G2P processing by implementing batch input handling in _g2p function, improving efficiency. Update prepare_onnx_input to utilize caching for tokenization and add optional parameters for character ID mapping and phoneme masks. Refactor G2PWOnnxConverter to streamline model loading and configuration management.

* Enhance G2PW model input handling by introducing polyphonic context character support and updating the data preparation method to return additional query IDs. This improves the processing of polyphonic characters in sentences.
2026-04-18 16:52:32 +08:00
XXXXRT666
2d9193b0d3
Migrate to miniforge, add missing dependencies, update docker file, remove deprecated files (#2732)
* Migrate to miniforge, add missing dependencies, update docker file, remove deprecated files

* Add Env Vars and Secrets
2026-02-09 15:05:25 +08:00
Oarora
9986880b3f
fix build failure caused by unaccepted Conda terms (#2727) 2026-02-08 23:52:04 +08:00
ChasonJiang
c767f0b83b
Fix bug (#2704)
* Fix bug

* fallback and bug fix
2025-12-30 16:00:21 +08:00
ChasonJiang
9080a967d5
Fix sampling error (#2703) 2025-12-30 15:21:03 +08:00
sushistack
51df9f7384
Fix model file name in README instructions (#2700) 2025-12-25 16:44:21 +08:00
ChasonJiang
bfca0f6b2d
Align the decoding strategy with naive_infer to prevent dropped sentences (#2697) 2025-12-19 17:37:19 +08:00
ChasonJiang
abe984395c
Align default GPT top-k sampling parameters (#2696) 2025-12-19 16:05:36 +08:00
RVC-Boss
cc89c3660e
Update requirements.txt 2025-12-19 15:54:54 +08:00
ChasonJiang
36b3231c6f
bug fix (#2689) 2025-12-15 14:23:06 +08:00
RVC-Boss
9ec3a60f30
Update config.py 2025-12-01 20:23:49 +08:00
RVC-Boss
fc533b6fb7
Update fasterwhisper_asr.py 2025-12-01 11:38:37 +08:00
XXXXRT666
857799276c
Fix Modelscope (#2679) 2025-12-01 11:13:15 +08:00
Spr_Aachen
92d2d337fd
Fix training error caused by float type of default_batch_size parameter (#2662) 2025-11-28 22:53:43 +08:00
ChasonJiang
6fb441f65e
More user-friendly streaming mode options (#2678) 2025-11-28 22:13:48 +08:00
XXXXRT666
c85c54eca9
Add ModelScope Snapshot Download For ASR (#2627)
* Add ModelScope Snapshot Download For ASR

* Typo Fix

* Remove YUE in whisper

* Remove HF ENDPOINT

* Add FunASR Download
2025-11-28 22:10:49 +08:00
RVC-Boss
cb00840c4e
Add files via upload 2025-11-28 22:02:03 +08:00
wzy3650
60a4a214af
vq distributed training support (#2577)
Co-authored-by: wangzeyuan <wangzeyuan@agora.io>
2025-11-28 21:57:13 +08:00
zzz
6375bbe316
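Try stream infer (#2469)
* Try stream infer

* Plot the generated audio in the stream_infer script

* stream_infer: add an export section

* stream_infer: plots that make patterns easier to spot

* stream_infer: run a correlation search when splicing audio to reduce fundamental-frequency breaks caused by splicing

* stream_infer: export `find_best_audio_offset_fast`

* stream_infer: improve the waveform display for easier comparison

* stream_v2pro.py: read arguments from the command line

* stream_v2pro.py: reduce the text length used for export

* stream_v2pro: fix spec overflow, and the resulting silent output, caused by half-precision input to spectrogram_torch

* stream_v2pro: add a --lang argument to indicate the language of the reference text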
2025-11-28 21:36:57 +08:00
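The correlation search mentioned in the commit above can be sketched in a few lines. This is a naive illustration under stated assumptions, not the repository's `find_best_audio_offset_fast`: the function name `find_best_offset`, its parameters, and the use of normalized correlation over plain Python lists are all hypothetical choices made for the example.

```python
import math


def find_best_offset(prev_tail, next_chunk, max_offset):
    """Return the offset into next_chunk whose window best matches prev_tail.

    Similarity is the normalized correlation between prev_tail and each
    candidate window, so the search favors phase-aligned splice points.
    """
    n = len(prev_tail)
    tail_energy = sum(a * a for a in prev_tail)
    best_offset, best_score = 0, -math.inf
    last_offset = min(max_offset, len(next_chunk) - n)
    for offset in range(last_offset + 1):
        window = next_chunk[offset:offset + n]
        dot = sum(a * b for a, b in zip(prev_tail, window))
        norm = math.sqrt(tail_energy * sum(b * b for b in window))
        score = dot / norm if norm > 0 else 0.0
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset


# A sine wave with a 64-sample period stands in for voiced audio.
signal = [math.sin(2 * math.pi * i / 64) for i in range(300)]
prev_tail = signal[100:150]    # end of the previous chunk
next_chunk = signal[90:250]    # next chunk, overlapping by 10 samples
offset = find_best_offset(prev_tail, next_chunk, max_offset=40)
```

Splicing at the returned offset keeps the waveform phase continuous across the joint. A production version would likely vectorize the search rather than loop in Python.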
KamioRinn
e00ca92140
Fix ASMD (#2636) 2025-11-28 21:22:43 +08:00
ChasonJiang
92ab59c553
Finer-grained streaming inference mode (#2671)
* Better streaming inference mode

* Clean up unused code

* modified:   GPT_SoVITS/AR/models/t2s_model.py
	modified:   GPT_SoVITS/TTS_infer_pack/TTS.py
	modified:   GPT_SoVITS/module/models.py

* modified:   GPT_SoVITS/TTS_infer_pack/TTS.py

* modified:   .gitignore
	modified:   GPT_SoVITS/AR/models/t2s_model.py
	modified:   GPT_SoVITS/TTS_infer_pack/TTS.py
	modified:   GPT_SoVITS/module/models.py

* modified:   GPT_SoVITS/AR/models/t2s_model.py
	modified:   GPT_SoVITS/TTS_infer_pack/TTS.py
	modified:   GPT_SoVITS/module/models.py
	modified:   api_v2.py

* modified:   GPT_SoVITS/TTS_infer_pack/TTS.py

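* Fix spelling errors

* Support streaming inference with a fixed chunk length and optimize the SOLA algorithm

* Fix the ogg-format transmission issue in api_v2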
2025-11-28 21:12:41 +08:00
RVC-Boss
11aa78bd9b
Fix issue where environment variables might not be str
2025-09-10 15:01:04 +08:00
zpeng11
8858492f56 feat:enable speed control for v1v2 2025-08-31 00:14:07 -04:00
zpeng11
337da7454e feat:remove prints 2025-08-26 17:09:36 -04:00
zpeng11
3e63595f0e feat:update kv cache to [len, head, dim] to allow linear size increase 2025-08-26 17:03:29 -04:00
zpeng11
fa84e262ae feat:remove unneed for main 2025-08-25 22:53:15 -04:00
zpeng11
968ac4c264 feat: solved problem, export works 2025-08-25 22:37:52 -04:00
zpeng11
419909b443 failed , testing expand y 2025-08-25 21:57:36 -04:00
zpeng11
c85ee3d521 feat:successfully unified first step and following step 2025-08-25 17:57:04 -04:00
zpeng11
d413a4f5b1 run time working 2025-08-25 17:07:38 -04:00
zpeng11
26228402e3 feat:solve unified kv cache shape handling, todo: clean up upper level to unify first and following step 2025-08-25 12:06:26 -04:00
zpeng11
0c5f61f98c feat:rename and features to onnx export 2025-08-25 01:46:53 -04:00
zpeng11
633e478b24 feat:clean up playground 2025-08-24 02:37:34 -04:00
zpeng11
942caa888e feat:supporting half export 2025-08-24 02:29:33 -04:00
zpeng11
72c5d3224e utility updates 2025-08-24 02:11:47 -04:00
zpeng11
48d52778ce feat:clean up export logics and add notes 2025-08-24 02:00:32 -04:00
zpeng11
e4d1894a8f feat:experiments with for onnx with attention, but does not work well todo:clean code and try v3v4 2025-08-24 00:46:29 -04:00
zpeng11
5982080939 feat:updated fsdecode and decoder interface 2025-08-23 17:35:21 -04:00