Vision-OPD: Learning to See Fine Details for Multimodal LLMs via On-Policy Self-Distillation
arXiv:2605.18740v2
Abstract: Multimodal Large Language Models (MLLMs) still struggle with fine-grained visual understanding, where answers often depend…
