Merge pull request #28 from ricecakey06/main

Update README.md
2026-06-05 05:48:14 +08:00 · 2024-01-17 18:10:48 +08:00 · ff95f5b48a
commit ff95f5b48a
parent 318ebe5d9f 021a650643
1 changed file with 49 additions and 37 deletions
--- a/README.md
+++ b/README.md
@@ -1,13 +1,29 @@
-# GPT-SoVITS - Voice Conversion and Text-to-Speech WebUI
+<div align="center">

-## Demo Video and Features
+<h1>GPT-SoVITS</h1>
+A Powerful Few-shot Voice Conversion and Text-to-Speech WebUI based on VITS.<br><br>

-Check out our demo video in Chinese: [Bilibili Demo](https://www.bilibili.com/video/BV12g4y1m7Uw/)
+[![madewithlove](https://img.shields.io/badge/made_with-%E2%9D%A4-red?style=for-the-badge&labelColor=orange
+)](https://github.com/RVC-Boss/GPT-SoVITS)
+
+<img src="https://counter.seku.su/cmoe?name=gptsovits&theme=r34" /><br>
+
+[![Licence](https://img.shields.io/badge/LICENSE-MIT-green.svg?style=for-the-badge)](https://github.com/RVC-Boss/GPT-SoVITS/blob/main/LICENSE)
+[![Huggingface](https://img.shields.io/badge/🤗%20-Spaces-yellow.svg?style=for-the-badge)](https://huggingface.co/lj1995/GPT-SoVITS/tree/main)
+
+[**English**](./README.md) | [**中文简体**](./docs/cn/README.md)
+
+</div>
+
+------
+
+
+
+> Check out our [demo video](https://www.bilibili.com/video/BV12g4y1m7Uw) here!

 https://github.com/RVC-Boss/GPT-SoVITS/assets/129054828/05bee1fa-bdd8-4d85-9350-80c060ab47fb

-### Features:
-
+## Features:
 1. **Zero-shot TTS:** Input a 5-second vocal sample and experience instant text-to-speech conversion.

 2. **Few-shot TTS:** Fine-tune the model with just 1 minute of training data for improved voice similarity and realism.
@@ -16,23 +32,7 @@ https://github.com/RVC-Boss/GPT-SoVITS/assets/129054828/05bee1fa-bdd8-4d85-9350-80c060ab47fb

 4. **WebUI Tools:** Integrated tools include voice accompaniment separation, automatic training set segmentation, Chinese ASR, and text labeling, assisting beginners in creating training datasets and GPT/SoVITS models.

-## Todo List
-
-0. **High Priority:**
-   - Localization in Japanese and English.
-   - User guide.
-
-1. **Features:**
-   - Zero-shot voice conversion (5s) / few-shot voice conversion (1min).
-   - TTS speaking speed control.
-   - Enhanced TTS emotion control.
-   - Experiment with changing SoVITS token inputs to probability distribution of vocabs.
-   - Improve English and Japanese text frontend.
-   - Develop tiny and larger-sized TTS models.
-   - Colab scripts.
-   - Expand training dataset (2k -> 10k).
-
-## Requirements (How to Install)
+## Environment Preparation

 ### Python and PyTorch Version

@@ -85,14 +85,6 @@ brew install ffmpeg

 Download and place [ffmpeg.exe](https://huggingface.co/lj1995/VoiceConversionWebUI/blob/main/ffmpeg.exe) and [ffprobe.exe](https://huggingface.co/lj1995/VoiceConversionWebUI/blob/main/ffprobe.exe) in the GPT-SoVITS root.

-### Pretrained Models
-
-Download pretrained models from [GPT-SoVITS Models](https://huggingface.co/lj1995/GPT-SoVITS) and place them in `GPT_SoVITS\pretrained_models`.
-
-For Chinese ASR, download models from [Damo ASR Models](https://modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/files) and place them in `tools/damo_asr/models`.
-
-For UVR5 (Vocals/Accompaniment Separation & Reverberation Removal), download models from [UVR5 Weights](https://huggingface.co/lj1995/VoiceConversionWebUI/tree/main/uvr5_weights) and place them in `tools/uvr5/uvr5_weights`.
-
 ## Dataset Format

 The TTS annotation .list file format:
@@ -101,18 +93,33 @@ The TTS annotation .list file format:
 vocal_path|speaker_name|language|text
 ```

-Example:
-
-```
-D:\GPT-SoVITS\xxx/xxx.wav|xxx|en|I like playing Genshin.
-```
-
 Language dictionary:

 - 'zh': Chinese
 - 'ja': Japanese
 - 'en': English

+Example:
+
+```
+D:\GPT-SoVITS\xxx/xxx.wav|xxx|en|I like playing Genshin.
+```
+## Todo List
+
+- [ ] **High Priority:**
+   - [ ] Localization in Japanese and English.
+   - [ ] User guide.
+
+- [ ] **Features:**
+   - [ ] Zero-shot voice conversion (5s) / few-shot voice conversion (1min).
+   - [ ] TTS speaking speed control.
+   - [ ] Enhanced TTS emotion control.
+   - [ ] Experiment with changing SoVITS token inputs to probability distribution of vocabs.
+   - [ ] Improve English and Japanese text frontend.
+   - [ ] Develop tiny and larger-sized TTS models.
+   - [ ] Colab scripts.
+   - [ ] Expand training dataset (2k -> 10k).
+   
 ## Credits

 Special thanks to the following projects and contributors:
@@ -131,3 +138,8 @@ Special thanks to the following projects and contributors:
 - [SubFix](https://github.com/cronrpc/SubFix)
 - [FFmpeg](https://github.com/FFmpeg/FFmpeg)
 - [gradio](https://github.com/gradio-app/gradio)
+
+## Thanks to all contributors for their efforts
+<a href="https://github.com/RVC-Boss/GPT-SoVITS/graphs/contributors" target="_blank">
+  <img src="https://contrib.rocks/image?repo=RVC-Boss/GPT-SoVITS" />
+</a>
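
Note on the Dataset Format hunk: the README describes each `.list` line as `vocal_path|speaker_name|language|text` with the language codes `zh`, `ja`, and `en`. A minimal sketch of reading one such line might look like the following. The `Annotation` type and `parse_list_line` function are hypothetical names used for illustration and are not part of GPT-SoVITS. Splitting on only the first three pipes, and rejecting unknown language codes, are assumptions rather than documented behavior.

```python
from typing import NamedTuple

# Language codes taken from the README's language dictionary.
LANGUAGES = {"zh": "Chinese", "ja": "Japanese", "en": "English"}


class Annotation(NamedTuple):
    """Hypothetical container for one .list entry (not a GPT-SoVITS type)."""
    vocal_path: str
    speaker_name: str
    language: str
    text: str


def parse_list_line(line: str) -> Annotation:
    """Parse one `vocal_path|speaker_name|language|text` line."""
    # Assumption: split on the first three pipes only, so a pipe that
    # happens to appear inside the text field is preserved.
    fields = line.rstrip("\r\n").split("|", 3)
    if len(fields) != 4:
        raise ValueError(f"expected 4 '|'-separated fields, got {len(fields)}")
    entry = Annotation(*fields)
    # Assumption: treat any code outside the README's list as an error.
    if entry.language not in LANGUAGES:
        raise ValueError(f"unknown language code: {entry.language!r}")
    return entry


# The example line from the README, written as a raw string so the
# Windows backslashes are kept literally.
example = parse_list_line(r"D:\GPT-SoVITS\xxx/xxx.wav|xxx|en|I like playing Genshin.")
print(example.speaker_name, LANGUAGES[example.language], example.text)
```

Checking lines this way before training would surface malformed entries early, although the project's own loader may be more or less permissive than this sketch.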