mirror of
https://github.com/THUDM/CogVideo.git
synced 2026-05-09 00:24:06 +08:00
The ncopy calculation used `latent.shape[2] % patch_size_t`, which yields the remainder rather than the number of frames needed to reach alignment. For example, with shape[2]=13 and patch_size_t=4 this gives ncopy=1, so the result has 14 frames, which is still not divisible by 4, and the assertion fails.

The correct formula is `(patch_size_t - latent.shape[2] % patch_size_t) % patch_size_t`, which computes how many frames must be prepended to reach the next multiple of patch_size_t (3 in the example above, giving 16 frames). The outer modulo handles the already-aligned case, returning 0 instead of patch_size_t.

Fixes #782
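
The arithmetic can be checked with plain integers. This is a minimal sketch, not the repository code: the helper names `ncopy_old` and `ncopy_fixed` are hypothetical, and `num_frames` stands in for `latent.shape[2]`.

```python
def ncopy_old(num_frames: int, patch_size_t: int) -> int:
    """Previous formula: returns the remainder, not the padding needed."""
    return num_frames % patch_size_t


def ncopy_fixed(num_frames: int, patch_size_t: int) -> int:
    """Frames to prepend so that num_frames becomes a multiple of patch_size_t.

    The outer modulo maps the already-aligned case to 0 instead of patch_size_t.
    """
    return (patch_size_t - num_frames % patch_size_t) % patch_size_t


if __name__ == "__main__":
    # (num_frames, patch_size_t) pairs; values chosen for illustration only.
    for num_frames, patch_size_t in [(13, 4), (16, 4), (13, 2)]:
        old = ncopy_old(num_frames, patch_size_t)
        new = ncopy_fixed(num_frames, patch_size_t)
        print(
            f"frames={num_frames} patch_size_t={patch_size_t} "
            f"old: +{old} aligned={(num_frames + old) % patch_size_t == 0} "
            f"fixed: +{new} aligned={(num_frames + new) % patch_size_t == 0}"
        )
```

With patch_size_t=2 the two formulas agree for every frame count, since the remainder and the padding are both 0 or 1. That is one plausible reason the old formula could go unnoticed until a larger patch size was used.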