GPT-SoVITS

mirror of https://github.com/RVC-Boss/GPT-SoVITS.git synced 2026-06-04 05:01:27 +08:00


lsh 551d3dc281 Fix s1_train DDP crash on Windows single-GPU (sm_120 / Blackwell)

On Windows with a single GPU running CUDA 12.8 + PyTorch 2.7+ on Blackwell
(sm_120) hardware, s1_train.py crashes with an access violation (exit code
3221225477) shortly after pytorch_lightning's Trainer initialization, before
the first batch runs.
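The exit code is easier to recognize in hex. 3221225477 is 0xC0000005, the Windows NTSTATUS code STATUS_ACCESS_VIOLATION. Some shells report the same value as the signed 32-bit integer -1073741819. Below is a minimal sketch of the decoding. The helper name `describe_exit_code` and its one-entry lookup table are illustrative only and are not part of GPT-SoVITS.

```python
# Hypothetical helper (not from GPT-SoVITS): name a Windows exit code.
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
}


def describe_exit_code(code: int) -> str:
    """Return 'hex (name)' for a Windows exit code given signed or unsigned."""
    # Reinterpret as an unsigned 32-bit DWORD so that -1073741819 and
    # 3221225477 map to the same value.
    unsigned = code & 0xFFFFFFFF
    name = KNOWN_NTSTATUS.get(unsigned, "unknown")
    return f"0x{unsigned:08X} ({name})"


if __name__ == "__main__":
    print(describe_exit_code(3221225477))   # 0xC0000005 (STATUS_ACCESS_VIOLATION)
    print(describe_exit_code(-1073741819))  # 0xC0000005 (STATUS_ACCESS_VIOLATION)
```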

Root cause: DDPStrategy with the gloo backend is forced on Windows even
when there's only one GPU. The gloo + sm_120 + CUDA 12.8 combination has a
known incompatibility (see PyTorch forum "[Solved] RTX 5090 sm_120 Training
Segfault - DDP Was the Cause") that produces a native crash inside the
Lightning training loop.

Two changes, scoped to Windows + CUDA only:

  * GPT_SoVITS/s1_train.py: on Windows, use Lightning's "auto" strategy,
    which picks `single_device` for one GPU and skips DDP entirely. Also
    pin devices=1 on Windows so multi-GPU users don't accidentally enable
    DDP. Non-Windows behaviour is unchanged (NCCL DDP, all available GPUs).
  * GPT_SoVITS/AR/data/bucket_sampler.py: when the distributed process
    group isn't initialized (i.e. running under single_device strategy),
    fall back to a single-replica configuration instead of crashing in
    dist.get_world_size(). Defensive change — behaviour is unchanged when
    DDP is properly initialized.
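The platform-dependent strategy choice in the s1_train.py change can be sketched as a pure function, so it can be checked without a GPU. This is a sketch under assumptions, not the patch itself. The helper names `select_parallel_kwargs` and `to_trainer_kwargs` are hypothetical, and so is the CPU branch, which this commit does not cover. `DDPStrategy(process_group_backend=...)` is Lightning's own class, and it is imported only when DDP is actually requested.

```python
import platform
from typing import Any, Dict, Optional


def select_parallel_kwargs(cuda_available: bool, system: Optional[str] = None) -> Dict[str, Any]:
    """Hypothetical helper: pick Trainer strategy/devices as the commit describes."""
    system = system or platform.system()
    if not cuda_available:
        # Assumed CPU path (outside this commit's scope): no DDP at all.
        return {"accelerator": "cpu", "strategy": "auto", "devices": 1}
    if system == "Windows":
        # "auto" with devices=1 resolves to a single-device strategy,
        # so no gloo process group is created.
        return {"accelerator": "gpu", "strategy": "auto", "devices": 1}
    # Other platforms: NCCL-backed DDP on all visible GPUs (unchanged).
    return {"accelerator": "gpu", "process_group_backend": "nccl", "devices": -1}


def to_trainer_kwargs(spec: Dict[str, Any]) -> Dict[str, Any]:
    """Build Trainer keyword arguments, importing Lightning only if DDP is needed."""
    kwargs = dict(spec)
    backend = kwargs.pop("process_group_backend", None)
    if backend is not None:
        from pytorch_lightning.strategies import DDPStrategy
        kwargs["strategy"] = DDPStrategy(process_group_backend=backend)
    return kwargs
```

In a training script the result would be unpacked into the Trainer, for example `Trainer(max_epochs=15, **to_trainer_kwargs(select_parallel_kwargs(torch.cuda.is_available())))`. Keeping the decision in a pure function makes the Windows branch testable on any machine.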
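The bucket_sampler.py fallback amounts to asking `torch.distributed` whether a process group exists before calling `get_world_size()` or `get_rank()`. Below is a minimal sketch of that check. `resolve_replicas` and `shard` are hypothetical names, the interleaved split is only illustrative, and the real sampler's bucketing logic is not reproduced. The `try`/`except` lets the sketch run even where torch is not installed.

```python
from typing import List, Optional, Sequence, Tuple

try:
    import torch.distributed as dist
except ImportError:  # torch not installed: treat as non-distributed
    dist = None


def _dist_ready() -> bool:
    # is_available() is False on builds without distributed support;
    # is_initialized() is False until init_process_group() has run.
    return dist is not None and dist.is_available() and dist.is_initialized()


def resolve_replicas(num_replicas: Optional[int] = None, rank: Optional[int] = None) -> Tuple[int, int]:
    """Hypothetical helper: fall back to one replica when no process group exists."""
    if num_replicas is None:
        num_replicas = dist.get_world_size() if _dist_ready() else 1
    if rank is None:
        rank = dist.get_rank() if _dist_ready() else 0
    if not 0 <= rank < num_replicas:
        raise ValueError(f"rank {rank} out of range for {num_replicas} replicas")
    return num_replicas, rank


def shard(indices: Sequence[int], num_replicas: int, rank: int) -> List[int]:
    # Illustrative interleaved split: with a single replica, rank 0 keeps every index.
    return list(indices[rank::num_replicas])
```

Under the single_device strategy no process group is initialized, so `resolve_replicas()` yields `(1, 0)` and the one process sees the whole dataset. When DDP has initialized the group, the real world size and rank are used as before.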

Tested on:
  * Windows 11 + RTX 5090 (sm_120) + CUDA 12.8 + PyTorch 2.11+cu128
    A 15-epoch s1 training run completed cleanly and the weights were saved as expected.

Closes #2626.

2026-05-16 19:10:20 -07:00


BigVGAN

Refactor: Format Code with Ruff and Update Deprecated G2PW Link (#2255)

2025-04-07 16:42:47 +08:00

configs

Optimize the code logic of TTS_Config (#2536)

2025-07-18 11:54:40 +08:00

eres2net

Fix bugs in install.sh, reduce log noise, and improve error reporting (#2464)

2025-06-17 15:21:36 +08:00