$\boldsymbol{f}$-OPD: Stabilizing Long-Horizon On-Policy Distillation with Freshness-Aware Control
arXiv:2605.17862v1 Announce Type: cross Abstract: Scaling on-policy distillation (OPD) for large language models (LLMs) confronts a fundamental tension: asynchronous execution…
