One Token Per Frame: Reconsidering Visual Bandwidth in World Models for VLA Policy
arXiv:2605.07931v2 (Announce Type: replace-cross)
Abstract: Vision-language-action (VLA) models increasingly rely on auxiliary world modules to plan over long horizons, yet…
