UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions

May 5, 2026

Guozhen Zhang, Zixiang Zhou, Teng Hu, Ziqiao Peng, Youliang Zhang, Yi Chen, Yuan Zhou, Qinglin Lu, Limin Wang



Publication

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Last updated on May 5, 2026

