Balancing the Experts: Unlocking LoRA-MoE for GRPO via Mechanism-Aware Rewards

April 24, 2026
Changlian Ma, Zizheng Huang, Xiangyu Zeng, Yi Wang, Cheng Liang, Kun Tian, Xinhai Zhao, Limin Wang
Type
Publication
The Fourteenth International Conference on Learning Representations