RoadmapBench: Evaluating Long-Horizon Agentic Software Development Across Version Upgrades
arXiv:2605.15846v2 (Announce Type: replace-cross)
Abstract: Coding agents are increasingly deployed in real software development, where a single version iteration requires…
