12 Commits

Author SHA1 Message Date
lsh
551d3dc281 Fix s1_train DDP crash on Windows single-GPU (sm_120 / Blackwell)
On Windows with a single GPU running CUDA 12.8 + PyTorch 2.7+ on Blackwell
(sm_120) hardware, s1_train.py crashes with an access violation (exit code
3221225477) shortly after pytorch_lightning's Trainer initialization, before
the first batch runs.

Root cause: DDPStrategy with the gloo backend is forced on Windows even
when there's only one GPU. The gloo + sm_120 + CUDA 12.8 combination has a
known incompatibility (see PyTorch forum "[Solved] RTX 5090 sm_120 Training
Segfault - DDP Was the Cause") that produces a native crash inside the
Lightning training loop.

Two changes, scoped to Windows + CUDA only:

  * GPT_SoVITS/s1_train.py: on Windows, use Lightning's "auto" strategy,
    which picks `single_device` for one GPU and skips DDP entirely. Also
    pin devices=1 on Windows so multi-GPU users don't accidentally enable
    DDP. Non-Windows behaviour is unchanged (NCCL DDP, all available GPUs).
  * GPT_SoVITS/AR/data/bucket_sampler.py: when the distributed process
    group isn't initialized (i.e. running under single_device strategy),
    fall back to a single-replica configuration instead of crashing in
    dist.get_world_size(). Defensive change — behaviour is unchanged when
    DDP is properly initialized.
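The two changes above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the code from the commit. `choose_trainer_config`, `_initialized_dist`, and `resolve_replicas` are hypothetical names. The Trainer settings are returned as a plain dict instead of being passed to `pytorch_lightning.Trainer`. The `torch.distributed` calls (`is_available`, `is_initialized`, `get_world_size`, `get_rank`) are real PyTorch functions. The import is guarded so the sketch also runs where torch is not installed.

```python
import platform
from typing import Optional


def choose_trainer_config(system: Optional[str] = None) -> dict:
    """Hypothetical helper: pick Lightning Trainer settings by OS.

    Windows: "auto" lets Lightning choose single_device for one GPU,
    and devices=1 keeps multi-GPU machines from ending up in DDP.
    Elsewhere: NCCL-backed DDP on all visible GPUs (devices=-1).
    """
    system = system or platform.system()
    if system == "Windows":
        return {"strategy": "auto", "devices": 1}
    return {"strategy": "ddp", "process_group_backend": "nccl", "devices": -1}


def _initialized_dist():
    """Return torch.distributed if a process group is live, else None."""
    try:
        import torch.distributed as dist
    except ImportError:  # torch not installed: treat as a single process
        return None
    if dist.is_available() and dist.is_initialized():
        return dist
    return None


def resolve_replicas(num_replicas: Optional[int] = None, rank: Optional[int] = None):
    """Fill in sampler sharding parameters without assuming DDP is running."""
    dist = _initialized_dist()
    if num_replicas is None:
        # Under single_device there is no process group, so use one replica.
        num_replicas = dist.get_world_size() if dist is not None else 1
    if rank is None:
        rank = dist.get_rank() if dist is not None else 0
    return num_replicas, rank


if __name__ == "__main__":
    print(choose_trainer_config("Windows"))  # {'strategy': 'auto', 'devices': 1}
    print(resolve_replicas())  # (1, 0)
```

When wiring this into a real Trainer, the non-Windows branch corresponds to passing `DDPStrategy(process_group_backend="nccl")` from `pytorch_lightning.strategies`, and `devices=-1` asks Lightning for all available GPUs.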

Tested on:
  * Windows 11 + RTX 5090 (sm_120) + CUDA 12.8 + PyTorch 2.11+cu128
    15-epoch s1 training completes cleanly, weights saved as expected.

Closes #2626.
2026-05-16 19:10:20 -07:00
Mushroomcowisheggs
00ce973412
feat: add error-handling hints for datasets (#2758)
Co-authored-by: moomushroom <107208254+moomushroom@users.noreply.github.com>
2026-04-18 17:13:30 +08:00
XXXXRT666
53cac93589
Refactor: Format Code with Ruff and Update Deprecated G2PW Link (#2255)
* ruff check --fix

* ruff format --line-length 120 --target-version py39

* Change the link for G2PW Model

* update pytorch version and colab
2025-04-07 16:42:47 +08:00
XXXXRT666
a3f5fb9614
v1v2 Version Switching (#1391)
2024-08-06 12:00:51 +08:00
RVC-Boss
99f09c8bdc
Fix major bug: fine-tuning did not read the BERT files 2024-06-06 16:39:36 +08:00
root
4496426896 Modify code references, stay calm 2024-02-28 17:31:19 +08:00
RVC-Boss
9b5231a317
Make DPO an experimental, optional checkbox instead of mandatory. When it is checked, the batch size is automatically halved.
2024-02-15 20:17:33 +08:00
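The batch-size rule in this commit can be pictured with a small helper. This is a hypothetical sketch: `effective_batch_size` and `use_dpo` are illustrative names, not the repository's. Using integer division and a floor of 1 for odd or unit batch sizes are assumptions.

```python
def effective_batch_size(batch_size: int, use_dpo: bool) -> int:
    """Hypothetical helper: halve the batch size when the DPO box is ticked."""
    if use_dpo:
        # Halve when DPO is enabled; the floor of 1 is an assumption.
        return max(1, batch_size // 2)
    return batch_size


print(effective_batch_size(8, True))  # 4
print(effective_batch_size(8, False))  # 8
```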
RVC-Boss
ecb4b23fc3
Update data_module.py 2024-01-28 19:19:57 +08:00
Wu Zichen
07a5339691 mps support 2024-01-24 19:37:47 +08:00
spicysama
cc632b985d
Update dataset.py
The pandas CSV file doesn't have keys named "item_name" or "sematic_text", so switch to the "iloc" method, which is more accurate.
2024-01-17 19:43:32 +08:00
Blaise
0d3d47f3c3 more code refactor 2024-01-16 17:14:18 +01:00
RVC-Boss
41ca6028d6
Add files via upload 2024-01-16 17:38:48 +08:00