UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions
May 5, 2026 · Guozhen Zhang, Zixiang Zhou, Teng Hu, Ziqiao Peng, Youliang Zhang, Yi Chen, Yuan Zhou, Qinglin Lu, Limin Wang
Type: Conference paper
Publication: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Last updated on May 5, 2026
Authors: Limin Wang, Nanjing University