mirror of
https://github.com/RVC-Boss/GPT-SoVITS.git
synced 2025-04-05 19:41:56 +08:00
parent
0565c3e072
commit
f042030cca
172
docs/en/Changelog_EN.md
Normal file
172
docs/en/Changelog_EN.md
Normal file
@ -0,0 +1,172 @@
|
||||
### 20240121 Update
|
||||
|
||||
1. Added `is_share` to the `config`. In scenarios like Colab, this can be set to `True` to map the WebUI to the public network.
|
||||
2. Added English system translation support to WebUI.
|
||||
3. The `cmd-asr` automatically detects if the FunASR model is included; if not found in the default directory, it will be downloaded from ModelScope.
|
||||
4. Attempted to fix the SoVITS training ZeroDivisionError reported in [Issue 79](https://github.com/RVC-Boss/GPT-SoVITS/issues/79) by filtering samples with zero length, etc.
|
||||
5. Cleaned up cached audio files and other files in the `TEMP` folder.
|
||||
6. Significantly reduced the issue of synthesized audio containing the end of the reference audio.
|
||||
|
||||
### 20240122 Update
|
||||
|
||||
1. Fixed the issue where excessively short output files resulted in repeating the reference audio.
|
||||
2. Tested native support for English and Japanese training (Japanese training requires the root directory to be free of non-English special characters).
|
||||
3. Improved audio path checking. If an attempt is made to read from an incorrect input path, it will report that the path does not exist instead of an ffmpeg error.
|
||||
|
||||
### 20240123 Update
|
||||
|
||||
1. Resolved the issue where Hubert extraction caused NaN errors, leading to SoVITS/GPT training ZeroDivisionError.
|
||||
2. Added support for quick model switching in the inference WebUI.
|
||||
3. Optimized the model file sorting logic.
|
||||
4. Replaced `jieba` with `jieba_fast` for Chinese word segmentation.
|
||||
|
||||
### 20240126 Update
|
||||
|
||||
1. Added support for Chinese-English mixed and Japanese-English mixed output texts.
|
||||
2. Added an optional segmentation mode for output.
|
||||
3. Fixed the issue of UVR5 reading and automatically jumping out of directories.
|
||||
4. Fixed multiple newline issues causing inference errors.
|
||||
5. Removed redundant logs in the inference WebUI.
|
||||
6. Supported training and inference on Mac.
|
||||
7. Automatically forced single precision for GPU that do not support half precision; enforced single precision under CPU inference.
|
||||
|
||||
### 20240128 Update
|
||||
|
||||
1. Fixed the issue with the pronunciation of numbers converting to Chinese characters.
|
||||
2. Fixed the issue of swallowing a few characters at the beginning of sentences.
|
||||
3. Excluded unreasonable reference audio lengths by setting restrictions.
|
||||
4. Fixed the issue where GPT training did not save checkpoints.
|
||||
5. Completed model downloading process in the Dockerfile.
|
||||
|
||||
### 20240129 Update
|
||||
|
||||
1. Changed training configurations to single precision for GPUs like the 16 series, which have issues with half precision training.
|
||||
2. Tested and updated the available Colab version.
|
||||
3. Fixed the issue of git cloning the ModelScope FunASR repository with older versions of FunASR causing interface misalignment errors.
|
||||
|
||||
### 20240130 Update
|
||||
|
||||
1. Automatically removed double quotes from all path-related entries to prevent errors from novice users copying paths with double quotes.
|
||||
2. Fixed issues with splitting Chinese and English punctuation and added punctuation at the beginning and end of sentences.
|
||||
3. Added splitting by punctuation.
|
||||
|
||||
### 20240201 Update
|
||||
|
||||
1. Fixed the UVR5 format reading error causing separation failures.
|
||||
2. Supported automatic segmentation and language recognition for mixed Chinese-Japanese-English texts.
|
||||
|
||||
### 20240202 Update
|
||||
|
||||
1. Fixed the issue where an ASR path ending with `/` caused an error in saving the filename.
|
||||
2. [PR 377](https://github.com/RVC-Boss/GPT-SoVITS/pull/377) introduced PaddleSpeech's Normalizer to fix issues like reading "xx.xx%" (percent symbols) and "元/吨" being read as "元吨" instead of "元每吨", and fixed underscore errors.
|
||||
|
||||
### 20240207 Update
|
||||
|
||||
1. Corrected language parameter confusion causing decreased Chinese inference quality reported in [Issue 391](https://github.com/RVC-Boss/GPT-SoVITS/issues/391).
|
||||
2. [PR 403](https://github.com/RVC-Boss/GPT-SoVITS/pull/403) adapted UVR5 to higher versions of librosa.
|
||||
3. [Commit 14a2851](https://github.com/RVC-Boss/GPT-SoVITS/commit/14a285109a521679f8846589c22da8f656a46ad8) fixed UVR5 inf everywhere error caused by `is_half` parameter not converting to boolean, resulting in constant half precision inference, which caused `inf` on 16 series GPUs.
|
||||
4. Optimized English text frontend.
|
||||
5. Fixed Gradio dependencies.
|
||||
6. Supported automatic reading of `.list` full paths if the root directory is left blank during dataset preparation.
|
||||
7. Integrated Faster Whisper ASR for Japanese and English.
|
||||
|
||||
### 20240208 Update
|
||||
|
||||
1. [Commit 59f35ad](https://github.com/RVC-Boss/GPT-SoVITS/commit/59f35adad85815df27e9c6b33d420f5ebfd8376b) attempted to fix GPT training hang on Windows 10 1909 and [Issue 232](https://github.com/RVC-Boss/GPT-SoVITS/issues/232) (Traditional Chinese System Language).
|
||||
|
||||
### 20240212 Update
|
||||
|
||||
1. Optimized logic for Faster Whisper and FunASR, switching Faster Whisper to mirror downloads to avoid issues with Hugging Face connections.
|
||||
2. [PR 457](https://github.com/RVC-Boss/GPT-SoVITS/pull/457) enabled experimental DPO Loss training option to mitigate GPT repetition and missing characters by constructing negative samples during training and made several inference parameters available in the inference WebUI.
|
||||
|
||||
### 20240214 Update
|
||||
|
||||
1. Supported Chinese experiment names in training (previously caused errors).
|
||||
2. Made DPO training an optional feature instead of mandatory. If selected, the batch size is automatically halved. Fixed issues with new parameters not being passed in the inference WebUI.
|
||||
|
||||
### 20240216 Update
|
||||
|
||||
1. Supported input without reference text.
|
||||
2. Fixed bugs in Chinese frontend reported in [Issue 475](https://github.com/RVC-Boss/GPT-SoVITS/issues/475).
|
||||
|
||||
### 20240221 Update
|
||||
|
||||
1. Added a noise reduction option during data processing (noise reduction leaves only 16kHz sampling rate; use only if the background noise is significant).
|
||||
2. [PR 559](https://github.com/RVC-Boss/GPT-SoVITS/pull/559), [PR 556](https://github.com/RVC-Boss/GPT-SoVITS/pull/556), [PR 532](https://github.com/RVC-Boss/GPT-SoVITS/pull/532), [PR 507](https://github.com/RVC-Boss/GPT-SoVITS/pull/507), [PR 509](https://github.com/RVC-Boss/GPT-SoVITS/pull/509) optimized Chinese and Japanese frontend processing.
|
||||
3. Switched Mac CPU inference to use CPU instead of MPS for faster performance.
|
||||
4. Fixed Colab public URL issue.
|
||||
|
||||
### 20240306 Update
|
||||
|
||||
1. [PR 672](https://github.com/RVC-Boss/GPT-SoVITS/pull/672) accelerated inference by 50% (tested on RTX3090 + PyTorch 2.2.1 + CU11.8 + Win10 + Py39) .
|
||||
2. No longer requires downloading the Chinese FunASR model first when using Faster Whisper non-Chinese ASR.
|
||||
3. [PR 610](https://github.com/RVC-Boss/GPT-SoVITS/pull/610) fixed UVR5 reverb removal model where the setting was reversed.
|
||||
4. [PR 675](https://github.com/RVC-Boss/GPT-SoVITS/pull/675) enabled automatic CPU inference for Faster Whisper if no CUDA is available.
|
||||
5. [PR 573](https://github.com/RVC-Boss/GPT-SoVITS/pull/573) modified `is_half` check to ensure proper CPU inference on Mac.
|
||||
|
||||
### 202403/202404/202405 Update
|
||||
|
||||
#### Minor Fixes:
|
||||
|
||||
1. Fixed issues with the no-reference text mode.
|
||||
2. Optimized the Chinese and English text frontend.
|
||||
3. Improved API format.
|
||||
4. Fixed CMD format issues.
|
||||
5. Added error prompts for unsupported languages during training data processing.
|
||||
6. Fixed the bug in Hubert extraction.
|
||||
|
||||
#### Major Fixes:
|
||||
|
||||
1. Fixed the issue of SoVITS training without freezing VQ (which could cause quality degradation).
|
||||
2. Added a quick inference branch.
|
||||
|
||||
### 20240610 Update
|
||||
|
||||
#### Minor Fixes:
|
||||
|
||||
1. [PR 1168](https://github.com/RVC-Boss/GPT-SoVITS/pull/1168) & [PR 1169](https://github.com/RVC-Boss/GPT-SoVITS/pull/1169) improved the logic for pure punctuation and multi-punctuation text input.
|
||||
2. [Commit 501a74a](https://github.com/RVC-Boss/GPT-SoVITS/commit/501a74ae96789a26b48932babed5eb4e9483a232) fixed CMD format for MDXNet de-reverb in UVR5, supporting paths with spaces.
|
||||
3. [PR 1159](https://github.com/RVC-Boss/GPT-SoVITS/pull/1159) fixed progress bar logic for SoVITS training in `s2_train.py`.
|
||||
|
||||
#### Major Fixes:
|
||||
|
||||
4. [Commit 99f09c8](https://github.com/RVC-Boss/GPT-SoVITS/commit/99f09c8bdc155c1f4272b511940717705509582a) fixed the issue of WebUI's GPT fine-tuning not reading BERT feature of Chinese input texts, causing inconsistency with inference and potential quality degradation.
|
||||
**Caution: If you have previously fine-tuned with a large amount of data, it is recommended to retune the model to improve quality.**
|
||||
|
||||
### 20240706 Update
|
||||
|
||||
#### Minor Fixes:
|
||||
|
||||
1. [Commit 1250670](https://github.com/RVC-Boss/GPT-SoVITS/commit/db50670598f0236613eefa6f2d5a23a271d82041) fixed default batch size decimal issue in CPU inference.
|
||||
2. [PR 1258](https://github.com/RVC-Boss/GPT-SoVITS/pull/1258), [PR 1265](https://github.com/RVC-Boss/GPT-SoVITS/pull/1265), [PR 1267](https://github.com/RVC-Boss/GPT-SoVITS/pull/1267) fixed issues where denoising or ASR encountering exceptions would exit all pending audio files.
|
||||
3. [PR 1253](https://github.com/RVC-Boss/GPT-SoVITS/pull/1253) fixed the issue of splitting decimals when splitting by punctuation.
|
||||
4. [Commit a208698](https://github.com/RVC-Boss/GPT-SoVITS/commit/a208698e775155efc95b187b746d153d0f2847ca) fixed multi-process save logic for multi-GPU training.
|
||||
5. [PR 1251](https://github.com/RVC-Boss/GPT-SoVITS/pull/1251) removed redundant `my_utils`.
|
||||
|
||||
#### Major Fixes:
|
||||
|
||||
6. The accelerated inference code from [PR 672](https://github.com/RVC-Boss/GPT-SoVITS/pull/672) has been validated and merged into the main branch, ensuring consistent inference effects with the base.
|
||||
It also supports accelerated inference in no-reference text mode.
|
||||
|
||||
**Future updates will continue to verify the consistency of changes in the `fast_inference` branch**.
|
||||
|
||||
### 20240727 Update
|
||||
|
||||
#### Minor Fixes:
|
||||
|
||||
1. [PR 1298](https://github.com/RVC-Boss/GPT-SoVITS/pull/1298) cleaned up redundant i18n code.
|
||||
2. [PR 1299](https://github.com/RVC-Boss/GPT-SoVITS/pull/1299) fixed issues where trailing slashes in user file paths caused command line errors.
|
||||
3. [PR 756](https://github.com/RVC-Boss/GPT-SoVITS/pull/756) fixed the step calculation logic in GPT training.
|
||||
|
||||
#### Major Fixes:
|
||||
|
||||
4. [Commit 9588a3c](https://github.com/RVC-Boss/GPT-SoVITS/commit/9588a3c52d9ebdb20b3c5d74f647d12e7c1171c2) supported speech rate adjustment for synthesis.
|
||||
Enabled freezing randomness while only adjusting the speech rate.
|
||||
|
||||
### TODO list:
|
||||
|
||||
1. Optimize inference for Chinese polyphonic characters.
|
||||
(Seeking testers, feel free to comment your results in the [PR 488](https://github.com/RVC-Boss/GPT-SoVITS/pull/488))
|
||||
**Caution: This PR have been merged in v2 base model and will be included in the next release**.
|
||||
2. Working on solving the issue of low-quality reference audio causing poor audio quality.
|
||||
**Caution: Resolved in July 2024, scheduled for August 2024 release**.
|
Loading…
x
Reference in New Issue
Block a user