CogVideo

mirror of https://github.com/THUDM/CogVideo.git synced 2026-05-19 15:40:31 +08:00

Author	SHA1	Message	Date
google-labs-jules[bot]	ebc9d39c02	feat: Add knowledge distillation for logo generation with VGG teacher. This commit introduces a knowledge distillation module to enhance logo generation in the CogVideoX-2B text-to-video model. The key changes include: a new `KDTrainer` class that inherits from `CogVideoXT2VLoraTrainer`, loads a teacher model, and computes a knowledge distillation loss to guide the student model; updated teacher model loading logic that supports a VGG16-based Faster R-CNN model for compatibility with user-provided weights, including a custom construction of the Faster R-CNN model with a VGG16 backbone and appropriate RoI heads; support for the `kd` training type, selectable from the command line; new command-line arguments (`teacher_model_path`, `teacher_model_num_classes`, `kd_loss_weight`) to configure the knowledge distillation process; and a new configuration file (`cogvideox_2b_kd.yaml`) provided as an example for running a `kd` training session.	2025-08-21 10:13:06 +00:00
google-labs-jules[bot]	193b1f4dcb	feat: Add knowledge distillation for logo generation. This commit introduces a knowledge distillation module to enhance logo generation in the CogVideoX-2B text-to-video model. The key changes include: a new `KDTrainer` class that inherits from `CogVideoXT2VLoraTrainer`, loads a teacher model (OpenLogo Faster R-CNN), and computes a knowledge distillation loss to guide the student model; support for the `kd` training type, selectable from the command line; new command-line arguments (`teacher_model_path`, `teacher_model_num_classes`, `kd_loss_weight`) to configure the knowledge distillation process; and a new configuration file (`cogvideox_2b_kd.yaml`) provided as an example for running a `kd` training session.	2025-08-21 09:14:51 +00:00
Yuxuan Zhang	39c6562dc8	format	2025-03-22 15:14:06 +08:00
OleehyO	455b44a7b5	chore: code cleanup and parameter optimization. Remove redundant comments and debug information; adjust default parameters in training scripts; clean up code in lora_trainer and trainer implementations.	2025-01-13 11:56:28 +00:00
zR	1534bf33eb	add pipeline	2025-01-12 19:27:21 +08:00
OleehyO	fdb9820949	feat: support DeepSpeed ZeRO-3 and optimize peak memory usage. Add DeepSpeed ZeRO-3 configuration support; optimize memory usage during training; rename training scripts to reflect ZeRO usage; update related configuration files and trainers.	2025-01-12 05:33:56 +00:00
OleehyO	caa24bdc36	feat: add SFT support with ZeRO optimization strategies. Add SFT (Supervised Fine-Tuning) trainers for all model variants (CogVideoX I2V and T2V; CogVideoX-1.5 I2V and T2V); add DeepSpeed ZeRO configuration files (ZeRO-2 with and without CPU offload; ZeRO-3 with and without CPU offload); add base accelerate config for distributed training; update trainer.py to support SFT training mode. This enables full-parameter fine-tuning with memory-efficient distributed training using DeepSpeed ZeRO optimization.	2025-01-11 02:13:32 +00:00
OleehyO	e213b6c083	fix: pad latent frames to match patch_size_t requirements	2025-01-11 02:08:07 +00:00
OleehyO	36427274d6	style: format import statements across finetune module	2025-01-07 05:54:52 +00:00
zR	1789f07256	format and check fp16 for cogvideox2b	2025-01-07 13:16:18 +08:00
OleehyO	9157e0cbc8	Adapt dataset for text embeddings and add noise padding. Add text embedding support in dataset collation; pad 2 random noise frames at the beginning of latent space during training.	2025-01-06 10:44:58 +00:00
OleehyO	7e1ac76847	feat(cogvideox): add prompt embedding caching support. This change enables caching of prompt embeddings in the CogVideoX text-to-video LoRA trainer, which can improve training efficiency by avoiding redundant text encoding operations.	2025-01-04 06:17:56 +00:00
OleehyO	66e4ba2592	fix(cogvideox): add prompt embedding caching and fix frame padding. Add support for cached prompt embeddings in dataset; fix bug where first frame wasn't properly padded in latent space.	2025-01-04 06:16:42 +00:00
OleehyO	a001842834	feat: implement CogVideoX trainers for I2V and T2V tasks. Add and refactor trainers for CogVideoX model variants: implement CogVideoXT2VLoraTrainer for text-to-video generation; refactor CogVideoXI2VLoraTrainer for image-to-video generation. Both trainers support LoRA fine-tuning with proper handling of model components loading and initialization, video encoding and batch collation, loss computation with noise prediction, and the validation step for generation.	2025-01-01 15:10:54 +00:00
OleehyO	85e00a1082	feat(models): add scaffolding	2025-01-01 15:10:40 +00:00

15 Commits