zzz b0e465eb72
feat: add script for exporting v3 (#2208)
* feat: add script for exporting v3

* Fix: due to changes in export_torch_script_v3, v2 now needs top_k passed in
2025-03-26 14:50:55 +08:00

Backbones quick introduction

unett.py

  • flat unet transformer
  • same structure as in the e2-tts & voicebox papers, except it uses rotary pos emb
  • update: optional abs pos emb & convnextv2 blocks for the embedded text before concat
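
The "flat unet" wiring can be illustrated with a small sketch. It assumes the usual pattern where the first half of the layers save their inputs and the second half consume them in reverse order; `flat_unet_forward`, `layers` and `combine` are hypothetical names for illustration, not identifiers from unett.py, and how the skip is merged (add, concat plus projection, ...) is left to the caller.

```python
def flat_unet_forward(x, layers, combine):
    """Run x through `layers` with U-Net style skips but no down/up-sampling.

    Hypothetical sketch: layers in the first half stash their input,
    layers in the second half merge the most recently stashed input
    (last in, first out) before running.
    """
    depth = len(layers)
    if depth % 2 != 0:
        raise ValueError("depth must be even so every early layer has a partner")

    skips = []   # stack of (source layer index, saved activation)
    pairs = []   # records which early layer feeds which late layer
    for idx, layer in enumerate(layers):
        if idx < depth // 2:
            skips.append((idx, x))          # early half: save the input
        else:
            src, skip = skips.pop()         # late half: take the newest skip
            pairs.append((src, idx))
            x = combine(x, skip)
        x = layer(x)
    return x, pairs


if __name__ == "__main__":
    toy_layers = [lambda v: v + 1] * 4      # stand-ins for transformer blocks
    out, pairs = flat_unet_forward(0, toy_layers, lambda a, b: a + b)
    print(out, pairs)
```

With four layers the innermost pair is connected first and the outermost last, which gives the U shape even though the sequence length never changes.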

dit.py

  • adaln-zero dit
  • embedded timestep as condition
  • noised_input + masked_cond + embedded_text concatenated, then linear proj in
  • optional abs pos emb & convnextv2 blocks for the embedded text before concat
  • optional long skip connection (first layer to last layer)
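
As a rough illustration of adaln-zero conditioning, here is a minimal pure-Python sketch on a single token vector. It assumes the standard formulation: a layer norm without learned affine parameters, a shift and scale applied before the sublayer, and a gate applied to its output. In a real model, shift, scale and gate would come from a zero-initialised projection of the timestep embedding; here they are passed in directly, and `adaln_zero_block` and `sublayer` are hypothetical names rather than identifiers from dit.py.

```python
import math


def layer_norm(x, eps=1e-6):
    """Normalise one token vector to zero mean and unit variance (no affine)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def adaln_zero_block(x, shift, scale, gate, sublayer):
    """x + gate * sublayer(norm(x) * (1 + scale) + shift), element-wise.

    shift / scale / gate are assumed to be produced from the timestep
    embedding; when that projection is zero-initialised they are all 0
    and the block reduces to the identity.
    """
    modulated = [n * (1 + s) + b for n, s, b in zip(layer_norm(x), scale, shift)]
    h = sublayer(modulated)
    return [xi + g * hi for xi, g, hi in zip(x, gate, h)]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0]
    zeros = [0.0, 0.0, 0.0]
    double = lambda h: [2 * v for v in h]   # stand-in for attention or MLP
    print(adaln_zero_block(x, zeros, zeros, zeros, double))  # same values as x
```

The zero gate is the point of the "-zero" suffix: at initialisation every block passes its input through unchanged, so training starts from an identity mapping.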

mmdit.py

  • sd3 structure
  • timestep as condition
  • left stream: text is embedded and given an abs pos emb
  • right stream: masked_cond & noised_input concatenated, with the same conv pos emb as unett
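
For readers unfamiliar with the sd3 design, the two streams are typically tied together by joint attention: each stream keeps its own projections, but attention runs over the two sequences joined along the sequence axis, and the result is split back per stream. The sketch below shows only that join-attend-split step in pure Python, single head, with q/k/v passed in already projected. `joint_attention` is a hypothetical name, and whether mmdit.py follows this exact layout is an assumption.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Single-head scaled dot-product attention on lists of token vectors."""
    dim = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(dim) for kj in k]
        weights = softmax(scores)
        out.append([sum(w * vj[d] for w, vj in zip(weights, v))
                    for d in range(len(v[0]))])
    return out


def joint_attention(text_qkv, audio_qkv):
    """Attend over both streams at once, then split the output back.

    Each argument is a (q, k, v) tuple for one stream (hypothetical layout).
    """
    tq, tk, tv = text_qkv
    aq, ak, av = audio_qkv
    out = attention(tq + aq, tk + ak, tv + av)   # join along the sequence axis
    return out[:len(tq)], out[len(tq):]          # left stream, right stream


if __name__ == "__main__":
    text = ([[1.0, 0.0]] * 2, [[0.0, 0.0]] * 2, [[1.0, 0.0], [2.0, 0.0]])
    audio = ([[0.0, 1.0]] * 3, [[0.0, 0.0]] * 3,
             [[3.0, 0.0], [4.0, 0.0], [5.0, 0.0]])
    left, right = joint_attention(text, audio)
    print(len(left), len(right))
```

Because every token attends to the tokens of both streams, text and audio can exchange information in each block while keeping separate weights.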