From 43eabf21da367a9e29167dbfbffe89012de92509 Mon Sep 17 00:00:00 2001
From: RVC-Boss <129054828+RVC-Boss@users.noreply.github.com>
Date: Tue, 11 Feb 2025 21:08:33 +0800
Subject: [PATCH] gpt_sovits_v3

gpt_sovits_v3
---
 GPT_SoVITS/f5_tts/model/backbones/README.md | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)
 create mode 100644 GPT_SoVITS/f5_tts/model/backbones/README.md

diff --git a/GPT_SoVITS/f5_tts/model/backbones/README.md b/GPT_SoVITS/f5_tts/model/backbones/README.md
new file mode 100644
index 0000000..155671e
--- /dev/null
+++ b/GPT_SoVITS/f5_tts/model/backbones/README.md
@@ -0,0 +1,20 @@
+## Backbones quick introduction
+
+
+### unett.py
+- flat unet transformer
+- structure same as in e2-tts & voicebox paper except using rotary pos emb
+- update: allow possible abs pos emb & convnextv2 blocks for embedded text before concat
+
+### dit.py
+- adaln-zero dit
+- embedded timestep as condition
+- concatted noised_input + masked_cond + embedded_text, linear proj in
+- possible abs pos emb & convnextv2 blocks for embedded text before concat
+- possible long skip connection (first layer to last layer)
+
+### mmdit.py
+- sd3 structure
+- timestep as condition
+- left stream: text embedded and applied a abs pos emb
+- right stream: masked_cond & noised_input concatted and with same conv pos emb as unett
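
Reviewer sketch for the `dit.py` notes in this README: a minimal, dependency-free illustration of two of its bullets, the "concatted noised_input + masked_cond + embedded_text, linear proj in" step and the adaln-zero conditioning on the embedded timestep. It is not taken from `dit.py`. The helper names (`embed_input`, `adaln_zero_block`), the toy sizes, the per-frame concatenation along the feature axis, the assumption that all three streams already have the same number of frames, and the (shift, scale, gate) chunk order are all assumptions made for illustration.

```python
import math
import random


def layer_norm(x, eps=1e-6):
    """Normalize one feature vector to zero mean, unit variance (no learned affine)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def linear(x, weight, bias):
    """y = W x + b for one feature vector; weight is laid out as [out][in]."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def embed_input(noised_input, masked_cond, embedded_text, weight, bias):
    """Concatenate the three streams frame by frame, then apply one linear projection."""
    return [
        linear(x + c + t, weight, bias)  # list '+' concatenates the feature vectors
        for x, c, t in zip(noised_input, masked_cond, embedded_text)
    ]


def adaln_zero_block(x, cond, mod_weight, mod_bias, inner):
    """Residual branch: x + gate * inner(norm(x) * (1 + scale) + shift)."""
    dim = len(x)
    mod = linear(cond, mod_weight, mod_bias)  # 3 * dim values derived from the condition
    shift, scale, gate = mod[:dim], mod[dim:2 * dim], mod[2 * dim:]
    h = [n * (1 + s) + b for n, s, b in zip(layer_norm(x), scale, shift)]
    h = inner(h)
    return [xi + g * hi for xi, g, hi in zip(x, gate, h)]


def rand_matrix(rows, cols):
    """Toy random data standing in for real features and weights."""
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
MEL_DIM, TEXT_DIM, MODEL_DIM, N_FRAMES = 4, 3, 5, 6  # hypothetical toy sizes

noised_input = rand_matrix(N_FRAMES, MEL_DIM)
masked_cond = rand_matrix(N_FRAMES, MEL_DIM)
embedded_text = rand_matrix(N_FRAMES, TEXT_DIM)

proj_w = rand_matrix(MODEL_DIM, 2 * MEL_DIM + TEXT_DIM)
proj_b = [0.0] * MODEL_DIM
hidden = embed_input(noised_input, masked_cond, embedded_text, proj_w, proj_b)

# adaLN-zero: the modulation layer starts at zero, so gate == 0 and the
# block is the identity on x before any training has happened.
time_emb = [random.uniform(-1, 1) for _ in range(MODEL_DIM)]
zero_w = [[0.0] * MODEL_DIM for _ in range(3 * MODEL_DIM)]
zero_b = [0.0] * (3 * MODEL_DIM)
out = adaln_zero_block(hidden[0], time_emb, zero_w, zero_b,
                       inner=lambda h: [2.0 * v for v in h])
```

With the zero-initialized modulation weights, `out` equals `hidden[0]`, which is the property the "zero" in adaln-zero refers to. The `inner` callable stands in for whatever attention or feed-forward sublayer the real block wraps.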