## Overview

This document describes how to use our text-to-speech (TTS) API, including how to send requests with the GET and POST methods. The API converts text into speech in the voice of a specified character and supports multiple languages and emotions.

## Character and Emotion List

To get the supported characters and their emotions, send a request to the following URL:

- URL: `http://127.0.0.1:5000/character_list`
- Returns: A JSON object that maps each character name to a list of its emotions
- Method: `GET`

Example response:

```json
{
    "Hanabi": [
        "default",
        "Normal",
        "Yandere"
    ],
    "Hutao": [
        "default"
    ]
}
```

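As a hedged illustration, the Python sketch below fetches this list with the standard library and checks whether a character supports an emotion before you send a TTS request. It assumes the server is running at the default address shown above. `fetch_character_list` and `pick_emotion` are hypothetical helper names. The fallback to `"default"` is also an assumption: every character in the example response has a `default` entry.

```python
import json
from urllib.request import urlopen

CHARACTER_LIST_URL = "http://127.0.0.1:5000/character_list"


def fetch_character_list(url: str = CHARACTER_LIST_URL) -> dict[str, list[str]]:
    """GET /character_list and decode the character -> emotions mapping."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def pick_emotion(characters: dict[str, list[str]], character: str, wanted: str) -> str:
    """Return `wanted` if `character` supports it, otherwise "default".

    Raises KeyError for an unknown character name. The "default" fallback
    is an ASSUMPTION based on the example response.
    """
    if character not in characters:
        raise KeyError(f"unknown character: {character!r}")
    return wanted if wanted in characters[character] else "default"


# The example response as a Python dict (no server needed).
sample = {"Hanabi": ["default", "Normal", "Yandere"], "Hutao": ["default"]}
```

With the sample data, `pick_emotion(sample, "Hanabi", "Yandere")` returns `"Yandere"`. The same call for `"Hutao"` returns `"default"`.
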
## Regarding Aliases

Starting with version 2.2.4, an alias system is available. The full list of accepted aliases is in `Inference/params_config.json`.

## Text-to-Speech

- URL: `http://127.0.0.1:5000/tts`
- Returns: The generated audio on success, or an error message on failure
- Method: `GET` or `POST`

### GET Method

#### Format

```
http://127.0.0.1:5000/tts?character={{characterName}}&text={{text}}
```

- Parameters:
  - `character`: The name of the character folder. It must match exactly, including letter case, full-width vs. half-width characters, and language (Chinese/English).
  - `text`: The text to synthesize. URL-encoding it is recommended.
  - Optional parameters: `text_language`, `format`, `top_k`, `top_p`, `batch_size`, `speed`, `temperature`, `emotion`, `save_temp`, and `stream`. They are explained in the POST section below.
  - Starting with version 2.2.4, aliases are also accepted; the allowed aliases are listed in `Inference/params_config.json`.

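The following Python sketch shows one way to build such a GET URL with the standard library. The library URL-encodes the text and any optional parameters for you. The base address is the default used throughout this document. `build_tts_get_url` and `fetch_tts_get` are hypothetical helper names. Optional parameters are passed as plain strings or numbers, because this document does not say how the server expects boolean values to be spelled in a query string.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:5000"  # default address used in this document


def build_tts_get_url(character: str, text: str, **optional) -> str:
    """Build a GET /tts URL with every parameter percent-encoded.

    Extra keyword arguments (e.g. text_language="en") are appended in order.
    """
    params = {"character": character, "text": text, **optional}
    # quote_via=quote encodes spaces as %20 instead of "+".
    return f"{BASE_URL}/tts?{urlencode(params, quote_via=quote)}"


def fetch_tts_get(character: str, text: str, **optional) -> bytes:
    """Send the GET request and return the raw response body (the audio)."""
    url = build_tts_get_url(character, text, **optional)
    with urlopen(url, timeout=120) as resp:
        return resp.read()
```

For example, `build_tts_get_url("Hanabi", "Hello, world!")` produces `http://127.0.0.1:5000/tts?character=Hanabi&text=Hello%2C%20world%21`. Non-ASCII text such as Chinese is encoded as UTF-8 percent escapes.
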
### POST Method

#### JSON Request Format

##### All Parameters

```
{
    "method": "POST",
    "body": {
        "character": "${chaName}",
        "emotion": "${Emotion}",
        "text": "${speakText}",
        "text_language": "${textLanguage}",
        "batch_size": ${batch_size},
        "speed": ${speed},
        "top_k": ${topK},
        "top_p": ${topP},
        "temperature": ${temperature},
        "stream": "${stream}",
        "format": "${Format}",
        "save_temp": "${saveTemp}"
    }
}
```

You can omit any of these fields. Starting with version 2.2.4, aliases are also accepted; the allowed aliases are listed in `Inference/params_config.json`.

##### Minimal Data

```
{
    "method": "POST",
    "body": {
        "text": "${speakText}"
    }
}
```

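The POST templates wrap the payload in a `method`/`body` object, which resembles the options object passed to JavaScript's `fetch`. The Python sketch below reads the templates that way: it sends only the `body` object as the JSON request body, with a `Content-Type: application/json` header. This reading is an assumption. `build_tts_post_request` and `post_tts` are hypothetical helper names. Numeric fields are sent as JSON numbers, following the unquoted placeholders in the full template.

```python
import json
from urllib.request import Request, urlopen

TTS_URL = "http://127.0.0.1:5000/tts"  # default address used in this document


def build_tts_post_request(body: dict) -> Request:
    """Wrap a TTS parameter dict in a POST request with a JSON body.

    ASSUMPTION: the server reads the template's `body` object
    as the JSON request body.
    """
    data = json.dumps(body, ensure_ascii=False).encode("utf-8")
    return Request(
        TTS_URL,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_tts(body: dict) -> bytes:
    """Send the request and return the raw response body (the audio)."""
    with urlopen(build_tts_post_request(body), timeout=120) as resp:
        return resp.read()


# Minimal payload: only `text`, as in the minimal template.
minimal = {"text": "Hello, world!"}

# A fuller payload; batch_size and speed use the documented defaults.
full = {
    "character": "Hanabi",
    "emotion": "default",
    "text": "Hello, world!",
    "text_language": "en",
    "batch_size": 1,
    "speed": 1.0,
}
```

Against a running server, `post_tts(minimal)` should return the audio bytes. You can write them to a file opened in binary mode.
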
##### Parameter Explanation

- **text**: The text to synthesize. When it is sent as a GET query parameter, URL-encoding is recommended.
- **character**: The name of the character folder. It must match exactly, including letter case, full-width vs. half-width characters, and language.
- **emotion**: The character's emotion. It must be an emotion the character actually supports; otherwise, the default emotion is used.
- **text_language**: The language of the text (`auto` / `zh` / `en` / `ja`). Defaults to mixed multilingual text.
- **top_k**, **top_p**, **temperature**: Sampling parameters for the GPT model. If you are unfamiliar with them, leave them unchanged.
- **batch_size**: The inference batch size (integer, default 1). On a powerful computer, increasing it can speed up processing.
- **speed**: Speech rate (default 1.0).
- **save_temp**: Whether to save temporary files. When true, the backend saves the generated audio, and subsequent identical requests return that saved audio directly. Default is false.
- **stream**: Whether to stream the response. When true, audio is returned sentence by sentence. Default is false.
- **format**: The output audio format: WAV (default), MP3, or OGG.
