Will Multimodal Models Be Dazzled by Multi-Image Visual Puzzles?

May 5, 2026

Zhi Zhu, YaoQi Fan, Zhe Chen, Yue Cao, Yangzhou Liu, Tong Lu

Type

Conference paper

Publication

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Last updated on May 5, 2026

Tong Lu

Authors

Nanjing University
