DuPO: Enabling Reliable Self-Verification via Dual Preference Optimization

Apr 24, 2026
Shuaijie She, Yu Bao, Yu Lu, Lu Xu, Tao Li, Wenhao Zhu, Jianbing Zhang, Shujian Huang, Shanbo Cheng, Lu Lu, Yuxuan Wang
Type
Publication: The Fourteenth International Conference on Learning Representations