VMonarch: Efficient Video Diffusion Transformers with Structured Attention

May 5, 2026
Authors: Cheng Liang, Haoxian Chen, Liang Hou, Qi Fan, Gangshan Wu, Xin Tao, Limin Wang (Nanjing University)
Type: Conference paper
Publication: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Last updated on May 5, 2026