Mirror of https://github.com/THUDM/CogVideo.git
In VideoDiffusionLoss.__call__, the weight `w` is a multi-dimensional tensor produced by append_dims(). Python's built-in min(w, self.min_snr_value) does not clamp element-wise. It evaluates `self.min_snr_value < w`, which yields a boolean tensor, and then needs the truth value of that tensor. Whenever `w` holds more than one element (for example, one weight per sample in a batch larger than one), this raises a RuntimeError because the truth value of a multi-element tensor is ambiguous. For a single-element tensor it only works by accident, returning either `w` itself or the bare Python scalar. Replace it with torch.clamp(w, max=self.min_snr_value), which caps each weight at min_snr_value element-wise, as the min-SNR-gamma loss weighting strategy intends.
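A minimal sketch of the failure and the fix, outside the real loss class. It assumes `append_dims` appends trailing singleton dimensions (the stand-in below is written for this example), and the helper name `clamp_snr_weight`, the batch of four weights, and the threshold 5.0 are all hypothetical values chosen for illustration.

```python
import torch


def append_dims(x: torch.Tensor, target_dims: int) -> torch.Tensor:
    """Append trailing singleton dims so x broadcasts against a target_dims-D tensor.

    Stand-in for the repository helper of the same name (assumed behavior).
    """
    dims_to_append = target_dims - x.ndim
    if dims_to_append < 0:
        raise ValueError("input has more dims than target_dims")
    return x[(...,) + (None,) * dims_to_append]


def clamp_snr_weight(w: torch.Tensor, min_snr_value: float) -> torch.Tensor:
    # Hypothetical helper: element-wise upper bound, each entry becomes min(w_i, min_snr_value).
    return torch.clamp(w, max=min_snr_value)


# Hypothetical per-sample weights for a batch of 4 video latents shaped (B, T, C, H, W).
snr_weights = torch.tensor([0.5, 3.0, 8.0, 20.0])
w = append_dims(snr_weights, 5)  # shape (4, 1, 1, 1, 1)

# Built-in min compares 5.0 < w, gets a 4-element bool tensor, and must call bool() on it.
try:
    min(w, 5.0)
except RuntimeError as err:
    print("built-in min failed:", err)

clamped = clamp_snr_weight(w, 5.0)
print(clamped.shape)             # torch.Size([4, 1, 1, 1, 1])
print(clamped.flatten().tolist())  # [0.5, 3.0, 5.0, 5.0]
```

torch.clamp keeps the shape and dtype of `w`, so the clamped weights still broadcast against the per-element loss exactly as the unclamped ones did. `w.clamp(max=...)` is an equivalent method form.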