Will Multimodal Models Be Dazzled by Multi-Image Visual Puzzles?
May 5, 2026 · Zhi Zhu, YaoQi Fan, Zhe Chen, Yue Cao, Yangzhou Liu, Tong Lu
Type: Conference paper
Publication: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Last updated on May 5, 2026
Author: Tong Lu, Nanjing University