DuPO: Enabling Reliable Self-Verification via Dual Preference Optimization

April 24, 2026
Shuaijie She, Yu Bao, Yu Lu, Lu Xu, Tao Li, Wenhao Zhu, Jianbing Zhang, Shujian Huang, Shanbo Cheng, Lu Lu, Yuxuan Wang
Type
Publication
The Fourteenth International Conference on Learning Representations