

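GPT-SoVITS Simplified API Documentation

This project adds simple_api.py as a middle layer that wraps the GPT-SoVITS inference engine and exposes a simpler calling interface.

Quick start

# Install dependencies
python -m pip install -r requirements.txt

# Start the server
python simple_api.py -c simple_api.yaml

# Open in a browser
Swagger UI:     http://127.0.0.1:9881/docs
ReDoc:          http://127.0.0.1:9881/redoc
Test frontend:  http://127.0.0.1:9881/test/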

Endpoint overview

Method	Path	Description	Tag
GET	`/health`	Health check (includes GPU info)	System
GET	`/voices`	List voice profiles	System
POST	`/api/tts`	Core TTS endpoint (MVP)	MVP
GET	`/speak`	Voice profile TTS (GET)	Profile
POST	`/speak`	Voice profile TTS (POST)	Profile
POST	`/v1/tts`	OpenAI-compatible TTS	Profile
POST	`/speak/base64`	Returns Base64 audio	Profile
POST	`/admin/reload-config`	Hot-reload the configuration	Admin
POST	`/admin/weights`	Switch model weights	Admin

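1. POST /api/tts: core TTS endpoint

This is the recommended endpoint. Upload a reference audio clip and the text to synthesize; the response is the generated audio.

Request format

Content-Type: multipart/form-data

Fields

Field	Type	Required	Default	Description
`text`	string	Yes	-	Text to synthesize
`ref_audio`	file	Yes	-	Main reference audio, 3-10 seconds (wav/flac/ogg/mp3/m4a/aac supported)
`aux_ref_audio`	file[]	No	-	Auxiliary reference audio; multiple files may be uploaded
`prompt_text`	string	No	`""`	Transcript of the main reference audio (may be empty for v2; required for v3/v4)
`text_lang`	string	No	`zh`	Language of the text to synthesize: zh/en/ja/ko/yue/auto
`prompt_lang`	string	No	`zh`	Language of the reference audio: zh/en/ja/ko/yue/auto
`format`	string	No	`wav`	Response format: wav/ogg/aac/raw
`emotion`	string	No	`neutral`	Emotion preset: neutral/happy/calm/sad/angry
`speed`	float	No	-	Speaking rate (0.5-2.0); overrides the speed set by the emotion preset
`seed`	int	No	`-1`	Random seed; -1 picks a random seed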

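Emotion preset parameter mapping

Emotion	temperature	top_p	top_k	speed_factor	repetition_penalty
neutral	-	-	-	-	-
happy	1.1	0.95	-	-	-
calm	0.8	0.85	-	0.92	-
sad	0.75	0.85	-	0.9	-
angry	1.2	-	20	-	1.25

A dash means the preset does not set that parameter. Passing speed explicitly overrides the preset's speed_factor.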
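The override rule can be sketched in a few lines of Python. This is an illustration only, not the code in simple_api.py: `resolve_params` is a hypothetical helper, the preset values are copied from the mapping table, and applying the server defaults first is an assumption.

```python
from typing import Optional

# Preset values copied from the mapping table; a key that a preset
# omits is simply left at whatever the defaults provide.
EMOTION_PRESETS = {
    "neutral": {},
    "happy": {"temperature": 1.1, "top_p": 0.95},
    "calm": {"temperature": 0.8, "top_p": 0.85, "speed_factor": 0.92},
    "sad": {"temperature": 0.75, "top_p": 0.85, "speed_factor": 0.9},
    "angry": {"temperature": 1.2, "top_k": 20, "repetition_penalty": 1.25},
}


def resolve_params(defaults: dict, emotion: str = "neutral",
                   speed: Optional[float] = None) -> dict:
    """Hypothetical helper: defaults, then preset, then explicit speed."""
    params = dict(defaults)                  # start from the defaults (assumed order)
    params.update(EMOTION_PRESETS[emotion])  # the preset overrides the defaults
    if speed is not None:                    # an explicit speed wins over the preset
        params["speed_factor"] = speed
    return params
```

For example, `resolve_params({"speed_factor": 1.0}, "calm")` yields a `speed_factor` of 0.92, while adding `speed=1.3` replaces it with 1.3.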

curl examples

Basic call (Windows PowerShell):

curl.exe -X POST http://127.0.0.1:9881/api/tts `
  -F "text=你好，欢迎使用这个声音。" `
  -F "ref_audio=@D:\audio\ref.wav" `
  --output output.wav

With auxiliary reference audio and an emotion preset:

curl.exe -X POST http://127.0.0.1:9881/api/tts `
  -F "text=你好，欢迎使用这个声音。" `
  -F "ref_audio=@D:\audio\ref.wav" `
  -F "aux_ref_audio=@D:\audio\aux1.wav" `
  -F "emotion=happy" `
  -F "speed=1.1" `
  --output output.wav

Linux/macOS:

curl -X POST http://127.0.0.1:9881/api/tts \
  -F "text=你好，欢迎使用这个声音。" \
  -F "ref_audio=@/path/to/ref.wav" \
  -F "emotion=calm" \
  --output output.wav
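The same call can be made from Python. The sketch below uses only the standard library and builds the multipart/form-data request without sending it; `build_tts_request` is a hypothetical helper name, and it assumes an ASCII file name for the reference audio.

```python
import mimetypes
import urllib.request
import uuid


def build_tts_request(base_url, text, ref_name, ref_bytes, **fields):
    """Build (without sending) a multipart/form-data POST to /api/tts."""
    boundary = uuid.uuid4().hex
    parts = []
    # Plain form fields: text plus optional ones such as emotion or speed.
    for name, value in {"text": text, **fields}.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    # The main reference audio is sent as a file part named ref_audio.
    ctype = mimetypes.guess_type(ref_name)[0] or "application/octet-stream"
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="ref_audio"; '
        f'filename="{ref_name}"\r\n'
        f'Content-Type: {ctype}\r\n\r\n'.encode("utf-8")
        + ref_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/tts",
        data=b"".join(parts),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

To send the request, pass the result to `urllib.request.urlopen` and write the response body to a file such as output.wav.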

Response

Success: the audio as a binary stream (Content-Type: audio/wav or similar).
Failure: a JSON error message:

{"message": "tts failed", "exception": "..."}

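Common errors

HTTP status code	Cause
400	text is empty / ref_audio is missing / audio duration is outside 3-10 seconds / unsupported format / prompt_text is empty with v3/v4
404	Voice profile does not exist (/speak endpoints only)
503	TTS pipeline is not ready (model not loaded)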
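A client might turn these status codes into readable messages along the following lines. This is a sketch: `describe_error` and the hint wording are hypothetical, and it assumes error bodies carry the `message` and `exception` keys shown in the failure example.

```python
import json

# Hints paraphrased from the error table; the wording is this sketch's own.
STATUS_HINTS = {
    400: "invalid request (text, ref_audio, duration, format, or prompt_text)",
    404: "voice profile not found",
    503: "TTS pipeline not ready",
}


def describe_error(status: int, body: bytes) -> str:
    """Combine the status hint with the server's JSON error message."""
    hint = STATUS_HINTS.get(status, "unexpected error")
    try:
        payload = json.loads(body.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        payload = None
    if not isinstance(payload, dict):
        # Body was not a JSON object; fall back to the hint alone.
        return f"HTTP {status}: {hint}"
    message = payload.get("message", "")
    detail = payload.get("exception", "")
    return f"HTTP {status}: {hint} ({message}: {detail})"
```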

2. GET /health: health check

curl http://127.0.0.1:9881/health

Example response:

{
  "status": "ok",
  "tts_config": "GPT_SoVITS/configs/tts_infer.yaml",
  "version": "v2",
  "languages": ["auto", "en", "zh"],
  "pid": 12345,
  "memory_mb": 2048.5,
  "gpu": {
    "name": "NVIDIA GeForce RTX 3080",
    "memory_used_mb": 4096.2,
    "memory_total_mb": 10240.0
  }
}
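A monitoring script could read this payload as sketched below. The helper names are hypothetical, and the sketch assumes the `gpu` block may be missing (for instance when no GPU information is available), in which case it does not block readiness.

```python
def gpu_free_mb(health: dict):
    """Return free GPU memory in MB, or None when no GPU info is reported."""
    gpu = health.get("gpu")
    if not gpu:
        return None
    return gpu["memory_total_mb"] - gpu["memory_used_mb"]


def is_ready(health: dict, min_free_mb: float = 0.0) -> bool:
    """True when status is "ok" and the GPU (if reported) has enough headroom."""
    if health.get("status") != "ok":
        return False
    free = gpu_free_mb(health)
    return free is None or free >= min_free_mb
```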

3. GET /voices: list voice profiles

curl http://127.0.0.1:9881/voices

Example response:

{
  "default_voice": "default",
  "voices": [
    {
      "name": "default",
      "description": "Replace this profile with your reference voice.",
      "text_lang": "zh",
      "prompt_lang": "zh",
      "ref_audio_path": "reference.wav",
      "ready": true
    }
  ]
}
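A client can use this response to choose a profile before calling /speak. The sketch below is hypothetical and assumes that `ready: true` marks a profile that can be used for synthesis.

```python
def ready_voice_names(voices_response: dict) -> list:
    """Names of the profiles whose ready flag is set."""
    return [v["name"] for v in voices_response.get("voices", []) if v.get("ready")]


def pick_voice(voices_response: dict, preferred=None):
    """Prefer the requested voice, fall back to default_voice, else None."""
    names = ready_voice_names(voices_response)
    if preferred in names:
        return preferred
    default = voices_response.get("default_voice")
    return default if default in names else None
```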

4. POST /speak: voice profile TTS

Calls TTS using a voice profile configured in simple_api.yaml.

Request body (JSON)

{
  "text": "hello world",
  "voice": "default",
  "text_lang": "zh",
  "format": "wav",
  "speed": 1.0
}

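Field	Type	Required	Description
`text`	string	Yes	Text to synthesize
`voice`	string	No	Voice profile name; the default profile is used when omitted
`text_lang`	string	No	Language of the text to synthesize
`format`	string	No	Response format
`stream`	bool	No	Whether to stream the response
`speed`	float	No	Speaking rate

curl example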

curl -X POST http://127.0.0.1:9881/speak \
  -H "Content-Type: application/json" \
  -d '{"text":"你好世界","voice":"default"}' \
  --output output.wav
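An equivalent request can be prepared in Python with the standard library. `build_speak_request` is a hypothetical helper; it only constructs the request, leaving the actual send to `urllib.request.urlopen`.

```python
import json
import urllib.request


def build_speak_request(base_url, text, voice=None, **options):
    """Build (without sending) a JSON POST to /speak."""
    payload = {"text": text, **options}
    if voice is not None:
        payload["voice"] = voice  # omitted means the server's default profile
    # ensure_ascii=False keeps non-ASCII text readable in the UTF-8 body.
    data = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/speak",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```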

5. GET /speak: voice profile TTS (GET)

Same as POST /speak, but the parameters are passed in the URL query string.

GET /speak?text=hello&voice=default&format=wav
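Because the text travels in the query string, non-ASCII input has to be percent-encoded. A small sketch of building such a URL with the standard library follows; `speak_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def speak_url(base_url: str, text: str, **params) -> str:
    """Build a GET /speak URL with percent-encoded query parameters."""
    query = urlencode({"text": text, **params})
    return f"{base_url.rstrip('/')}/speak?{query}"
```

With these inputs, `speak_url("http://127.0.0.1:9881", "hello", voice="default", format="wav")` reproduces the URL form shown above.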

6. POST /speak/base64: returns Base64 audio

Returns the audio Base64-encoded, which suits direct use in a web frontend.

curl -X POST http://127.0.0.1:9881/speak/base64 \
  -H "Content-Type: application/json" \
  -d '{"text":"hello","voice":"default"}'

{
  "media_type": "audio/wav",
  "audio_base64": "UklGRi..."
}
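On the client side the payload can be decoded back to bytes or wrapped in a data URI for an audio element. The helpers below are hypothetical, and the media types other than audio/wav in the extension map are assumptions.

```python
import base64

# Only audio/wav appears in the example response; the other entries are assumed.
EXTENSIONS = {"audio/wav": ".wav", "audio/ogg": ".ogg", "audio/aac": ".aac"}


def decode_audio(response: dict):
    """Return (audio_bytes, file_extension) for a /speak/base64 response."""
    audio = base64.b64decode(response["audio_base64"])
    ext = EXTENSIONS.get(response.get("media_type", ""), ".bin")
    return audio, ext


def to_data_uri(response: dict) -> str:
    """Wrap the Base64 payload in a data URI usable as an audio src."""
    return f"data:{response['media_type']};base64,{response['audio_base64']}"
```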

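7. POST /v1/tts: OpenAI-compatible format

The request format is the same as POST /speak; the path follows the style of the OpenAI TTS API.

8. POST /admin/reload-config: hot-reload the configuration

Reloads simple_api.yaml without restarting the service.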

curl -X POST http://127.0.0.1:9881/admin/reload-config

Returns: {"message": "success", "default_voice": "default"}

9. POST /admin/weights: switch model weights

Switches the GPT-SoVITS model weight files at runtime.

curl -X POST http://127.0.0.1:9881/admin/weights \
  -H "Content-Type: application/json" \
  -d '{"gpt_weights_path":"path/to/gpt.pt","sovits_weights_path":"path/to/sovits.pt"}'

Configuration file

simple_api.yaml:

server:
  host: 127.0.0.1
  port: 9881
  tts_config: GPT_SoVITS/configs/tts_infer.yaml

cors_allow_origins:
  - "*"

upload:
  dir: runtime/uploads
  min_ref_seconds: 3
  max_ref_seconds: 10
  max_upload_mb: 80

defaults:
  text_lang: zh
  prompt_lang: zh
  media_type: wav
  text_split_method: cut5
  batch_size: 1
  speed_factor: 1.0
  seed: -1

emotion_presets:
  neutral: {}
  happy:
    temperature: 1.1
    top_p: 0.95
  calm:
    temperature: 0.8
    top_p: 0.85
    speed_factor: 0.92
  sad:
    temperature: 0.75
    top_p: 0.85
    speed_factor: 0.9
  angry:
    temperature: 1.2
    top_k: 20
    repetition_penalty: 1.25

voices:
  default:
    description: Replace this profile with your reference voice.
    ref_audio_path: reference.wav
    prompt_text: Replace this with the exact text spoken in reference.wav.
    prompt_lang: zh
    text_lang: zh

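Configuration reference

Key	Description
`server.host`	Listen address
`server.port`	Listen port
`server.tts_config`	Path to the GPT-SoVITS inference configuration file
`upload.dir`	Temporary upload directory
`upload.min_ref_seconds`	Minimum length of the main reference audio, in seconds
`upload.max_ref_seconds`	Maximum length of the main reference audio, in seconds
`upload.max_upload_mb`	Maximum size of a single uploaded file (MB)
`defaults.*`	Default parameters for all endpoints
`emotion_presets.*`	Emotion preset parameter mapping
`voices.*`	Fixed voice profiles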
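To check a reference clip against `upload.min_ref_seconds` and `upload.max_ref_seconds` before uploading, a client could measure its duration locally. The sketch below handles WAV input only, via the standard `wave` module; the helper names are hypothetical, and treating both bounds as inclusive is an assumption about the server's check.

```python
import io
import wave


def wav_duration_seconds(data: bytes) -> float:
    """Duration of a WAV file held in memory: frame count / sample rate."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def ref_audio_ok(data: bytes, min_seconds: float = 3.0,
                 max_seconds: float = 10.0) -> bool:
    """True when the clip length falls inside the configured window."""
    return min_seconds <= wav_duration_seconds(data) <= max_seconds
```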

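Adding a custom voice

Edit simple_api.yaml and add an entry under voices:

voices:
  narrator:
    description: "Male narrator"
    ref_audio_path: voices/narrator.wav
    prompt_text: "Verbatim transcript of the narrator reference audio"
    prompt_lang: zh
    text_lang: zh

Then hot-reload the configuration: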

curl -X POST http://127.0.0.1:9881/admin/reload-config

Testing

Contract tests (no GPU required)

python -m unittest tests.test_simple_api_contract -v

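Covers:

- /api/tts route registration
- Parameter construction for the upload endpoint
- 3-10 second validation of the main reference audio
- Empty prompt_text allowed for v2, rejected for v3/v4
- Cleanup of the temporary upload directory
- Emotion preset application and the speed override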
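As an illustration of the kind of check covered here, the prompt_text rule might be expressed as follows. `prompt_text_allowed` and the test class are hypothetical and are not taken from tests/test_simple_api_contract.py; treating every version other than v3/v4 like v2 is an assumption.

```python
import unittest

# Versions that require a non-empty prompt_text, per the field table.
REQUIRES_PROMPT_TEXT = {"v3", "v4"}


def prompt_text_allowed(version: str, prompt_text: str) -> bool:
    """Hypothetical rule: an empty prompt_text is rejected only for v3/v4."""
    if prompt_text.strip():
        return True
    return version not in REQUIRES_PROMPT_TEXT


class PromptTextRuleTest(unittest.TestCase):
    def test_v2_allows_empty_prompt_text(self):
        self.assertTrue(prompt_text_allowed("v2", ""))

    def test_v3_and_v4_reject_empty_prompt_text(self):
        for version in ("v3", "v4"):
            self.assertFalse(prompt_text_allowed(version, ""))

    def test_non_empty_prompt_text_is_always_allowed(self):
        self.assertTrue(prompt_text_allowed("v4", "reference transcript"))
```

Saved as a module, the test case runs with `python -m unittest` and needs no GPU, since nothing in it touches the model.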

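Frontend test

1. Start the backend
2. Open http://127.0.0.1:9881/test/
3. Upload an audio or video file (audio is extracted from video automatically)
4. Use the waveform trimming tool to select a 3-10 second segment
5. Enter the text, then choose an emotion and speed
6. Click Generate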

Startup scripts

Script	Platform	Description
`go-simple-api.ps1`	Windows PowerShell	Automatically detects runtime\python.exe
`go-simple-api.bat`	Windows CMD	Same as above
`open-test-frontend.ps1`	Windows PowerShell	Opens the test frontend HTML directly
