In VideoDiffusionLoss.__call__, the weight tensor `w` is a
multi-dimensional tensor produced by append_dims(). Python's
built-in min(w, self.min_snr_value) does not clamp element-wise:
it compares the two arguments once and converts the result of
that comparison to a single bool. For a tensor with more than one
element the conversion raises a RuntimeError; for a
single-element tensor it returns either `w` or the bare scalar,
so the return type is inconsistent.
Replace with torch.clamp(w, max=self.min_snr_value) to correctly
apply element-wise upper-bound clamping, which is the intended
behavior for the min-SNR-gamma loss weighting strategy.
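
A minimal sketch of the difference, not the actual
VideoDiffusionLoss code. The weight values, the 5-D shape, and
the value 5.0 for min_snr_value are assumptions chosen for
illustration; only torch.clamp and the built-in min() are taken
from the change itself.

```python
import torch

# Hypothetical cap; the real value comes from self.min_snr_value.
min_snr_value = 5.0

# Hypothetical per-sample weights, reshaped to (batch, 1, 1, 1, 1)
# to mimic what append_dims() might produce for video latents.
w = torch.tensor([0.5, 5.0, 20.0]).reshape(3, 1, 1, 1, 1)

# Built-in min() evaluates `min_snr_value < w`, which yields a bool
# tensor, and then asks for its truth value. With more than one
# element that raises RuntimeError.
try:
    min(w, min_snr_value)
    builtin_min_failed = False
except RuntimeError:
    builtin_min_failed = True

# torch.clamp applies the upper bound to every element and keeps
# the shape of w.
clamped = torch.clamp(w, max=min_snr_value)

print(builtin_min_failed)
print(clamped.flatten().tolist())
```

With these inputs, the built-in call fails while torch.clamp
caps only the entries above min_snr_value and leaves the shape
of `w` unchanged.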