- Remove redundant comments and debug information
- Adjust default parameters in training scripts
- Clean up code in lora_trainer and trainer implementations
- Add DeepSpeed ZeRO-3 configuration support (an example config is sketched below)
- Optimize memory usage during training
- Rename training scripts to reflect ZeRO usage
- Update related configuration files and trainers
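For reference, a ZeRO-3 configuration of the kind this change adds typically looks like the following. This is a minimal sketch: the file path `configs/zero3.json` and every field value are illustrative assumptions, not the repository's actual settings.

```python
import json

# Illustrative ZeRO-3 settings; values are assumptions, not the repo's actual config.
zero3_config = {
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu"},  # drop for the no-offload variant
        "offload_param": {"device": "cpu"},      # drop for the no-offload variant
        "overlap_comm": True,
        "contiguous_gradients": True,
        "stage3_gather_16bit_weights_on_model_save": True,
    },
    "bf16": {"enabled": True},
    "gradient_accumulation_steps": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_clipping": "auto",
}

with open("configs/zero3.json", "w") as f:
    json.dump(zero3_config, f, indent=2)
```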
- Add SFT (Supervised Fine-Tuning) trainers for all model variants:
  - CogVideoX I2V and T2V
  - CogVideoX-1.5 I2V and T2V
- Add DeepSpeed ZeRO configuration files:
  - ZeRO-2 with and without CPU offload
  - ZeRO-3 with and without CPU offload
- Add base accelerate config for distributed training
- Update trainer.py to support SFT training mode
This enables full-parameter fine-tuning with memory-efficient distributed training using DeepSpeed ZeRO optimization; a minimal wiring sketch follows below.
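As a rough illustration of how such a ZeRO config can plug into full-parameter SFT, a trainer may hand the file to Accelerate's DeepSpeed plugin and prepare the model with all parameters trainable. The config path, learning rate, and helper name below are placeholder assumptions, not the project's actual values.

```python
import torch
from accelerate import Accelerator, DeepSpeedPlugin

# Placeholder config path; the real scripts may point elsewhere.
ds_plugin = DeepSpeedPlugin(hf_ds_config="configs/zero3.json")
accelerator = Accelerator(deepspeed_plugin=ds_plugin)

def prepare_for_sft(transformer, train_dataloader, lr=1e-5):
    # SFT mode: all transformer parameters are trainable (no LoRA adapters).
    transformer.requires_grad_(True)
    optimizer = torch.optim.AdamW(transformer.parameters(), lr=lr)
    # ZeRO shards parameters, gradients, and optimizer state across ranks.
    return accelerator.prepare(transformer, optimizer, train_dataloader)
```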
This change enables caching of prompt embeddings in the CogVideoX text-to-video
LoRA trainer, which can improve training efficiency by avoiding redundant text
encoding operations.
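A minimal sketch of what such caching can look like, assuming a hypothetical `encode_prompt_cached` helper and a T5-style tokenizer and text encoder already loaded by the trainer; the helper name, cache directory, and sequence length are illustrative, not the trainer's actual API.

```python
import hashlib
from pathlib import Path

import torch

def encode_prompt_cached(prompt, tokenizer, text_encoder, cache_dir="prompt_embeddings"):
    # Hypothetical helper: store each prompt's embedding on disk so the text
    # encoder runs only once per unique prompt across the whole training run.
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    path = cache / f"{key}.pt"
    if path.exists():
        return torch.load(path, map_location="cpu")
    tokens = tokenizer(
        prompt, padding="max_length", max_length=226,  # assumed max sequence length
        truncation=True, return_tensors="pt",
    )
    with torch.no_grad():
        embedding = text_encoder(tokens.input_ids.to(text_encoder.device))[0].cpu()
    torch.save(embedding, path)
    return embedding
```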
Add and refactor trainers for CogVideoX model variants:
- Implement CogVideoXT2VLoraTrainer for text-to-video generation
- Refactor CogVideoXI2VLoraTrainer for image-to-video generation
Both trainers support LoRA fine-tuning with proper handling of:
- Model components loading and initialization
- Video encoding and batch collation
- Loss computation with noise prediction (see the sketch after this list)
- Validation step for generation
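The noise-prediction loss follows the usual diffusion-training pattern: add noise to the video latents at a sampled timestep, predict it with the transformer, and take an MSE against the true noise. The sketch below is an epsilon-prediction variant with an assumed transformer call signature, so the trainers' actual target (for example v-prediction) and arguments may differ.

```python
import torch
import torch.nn.functional as F

def compute_loss(transformer, scheduler, latents, prompt_embeds):
    # Sample Gaussian noise and a random training timestep per sample.
    noise = torch.randn_like(latents)
    timesteps = torch.randint(
        0, scheduler.config.num_train_timesteps, (latents.shape[0],),
        device=latents.device,
    )
    # Corrupt the VAE latents according to the scheduler's noise schedule.
    noisy_latents = scheduler.add_noise(latents, noise, timesteps)
    model_pred = transformer(
        hidden_states=noisy_latents,
        encoder_hidden_states=prompt_embeds,
        timestep=timesteps,
    ).sample  # assumed call signature, for illustration only
    # Epsilon-prediction objective: regress the model output onto the noise.
    return F.mse_loss(model_pred.float(), noise.float())
```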