This commit introduces a knowledge distillation module to enhance logo generation in the CogVideoX-2B text-to-video model.
The key changes include:
- A new `KDTrainer` class that inherits from `CogVideoXT2VLoraTrainer`. This trainer loads a teacher model and computes a knowledge distillation loss to guide the student model.
- The teacher model loading logic has been updated to support a VGG16-based Faster R-CNN model, to be compatible with user-provided weights. This includes a custom construction of the Faster R-CNN model with a VGG16 backbone and appropriate RoI heads.
- The `kd` training type is now supported, allowing users to select it from the command line.
- New command-line arguments (`teacher_model_path`, `teacher_model_num_classes`, `kd_loss_weight`) have been added to configure the knowledge distillation process.
- A new configuration file (`cogvideox_2b_kd.yaml`) is provided as an example for running a `kd` training session.
This commit introduces a knowledge distillation module to enhance logo generation in the CogVideoX-2B text-to-video model.
The key changes include:
- A new `KDTrainer` class that inherits from `CogVideoXT2VLoraTrainer`. This trainer loads a teacher model (OpenLogo Faster R-CNN) and computes a knowledge distillation loss to guide the student model.
- The `kd` training type is now supported, allowing users to select it from the command line.
- New command-line arguments (`teacher_model_path`, `teacher_model_num_classes`, `kd_loss_weight`) have been added to configure the knowledge distillation process.
- A new configuration file (`cogvideox_2b_kd.yaml`) is provided as an example for running a `kd` training session.
- Remove redundant comments and debug information
- Adjust default parameters in training scripts
- Clean up code in lora_trainer and trainer implementations
- Add DeepSpeed ZeRO-3 configuration support
- Optimize memory usage during training
- Rename training scripts to reflect ZeRO usage
- Update related configuration files and trainers
- Add SFT (Supervised Fine-Tuning) trainers for all model variants:
- CogVideoX I2V and T2V
- CogVideoX-1.5 I2V and T2V
- Add DeepSpeed ZeRO configuration files:
- ZeRO-2 with and without CPU offload
- ZeRO-3 with and without CPU offload
- Add base accelerate config for distributed training
- Update trainer.py to support SFT training mode
This enables full-parameter fine-tuning with memory-efficient distributed training using DeepSpeed ZeRO optimization.
This change enables caching of prompt embeddings in the CogVideoX text-to-video
LoRA trainer, which can improve training efficiency by avoiding redundant text
encoding operations.
Add and refactor trainers for CogVideoX model variants:
- Implement CogVideoXT2VLoraTrainer for text-to-video generation
- Refactor CogVideoXI2VLoraTrainer for image-to-video generation
Both trainers support LoRA fine-tuning with proper handling of:
- Model components loading and initialization
- Video encoding and batch collation
- Loss computation with noise prediction
- Validation step for generation