DuPO: Enabling Reliable Self-Verification via Dual Preference Optimization

Apr 24, 2026 · Shuaijie She, Yu Bao, Yu Lu, Lu Xu, Tao Li, Wenhao Zhu, Jianbing Zhang, Shujian Huang, Shanbo Cheng, Lu Lu, Yuxuan Wang

Type: Conference paper
Publication: The Fourteenth International Conference on Learning Representations

Last updated on Apr 24, 2026