Code Review Agent Benchmark
arXiv:2603.23448v2 Announce Type: replace-cross Abstract: Software engineering agents have shown significant promise in writing code. As AI agents permeate code…
A Step Toward Federated Pretraining of Multimodal Large Language Models
arXiv:2603.26786v1 Announce Type: cross Abstract: The rapid evolution of Multimodal Large Language Models (MLLMs) is bottlenecked by the saturation of…
Kill-Chain Canaries: Stage-Level Tracking of Prompt Injection Across Attack Surfaces and Model Safety Tiers
arXiv:2603.28013v1 Announce Type: cross Abstract: We present a stage-decomposed analysis of prompt injection attacks against five frontier LLM agents. Prior…
CARLA-Air: Fly Drones Inside a CARLA World — A Unified Infrastructure for Air-Ground Embodied Intelligence
arXiv:2603.28032v1 Announce Type: cross Abstract: The convergence of low-altitude economies, embodied intelligence, and air-ground cooperative systems creates growing demand for…
CPUBone: Efficient Vision Backbone Design for Devices with Low Parallelization Capabilities
arXiv:2603.26425v2 Announce Type: replace-cross Abstract: Recent research on vision backbone architectures has predominantly focused on optimizing efficiency for hardware platforms…
SmaAT-QMix-UNet: A Parameter-Efficient Vector-Quantized UNet for Precipitation Nowcasting
arXiv:2603.21879v2 Announce Type: replace-cross Abstract: Weather forecasting supports critical socioeconomic activities and complements environmental protection, yet operational Numerical Weather Prediction…
FUSAR-GPT : A Spatiotemporal Feature-Embedded and Two-Stage Decoupled Visual Language Model for SAR Imagery
arXiv:2602.19190v3 Announce Type: replace-cross Abstract: Research on the intelligent interpretation of all-weather, all-time Synthetic Aperture Radar (SAR) is crucial for…
MLLM-based Textual Explanations for Face Comparison
arXiv:2603.16629v3 Announce Type: replace-cross Abstract: Multimodal Large Language Models (MLLMs) have recently been proposed as a means to generate natural-language…
When Chain-of-Thought Backfires: Evaluating Prompt Sensitivity in Medical Language Models
arXiv:2603.25960v1 Announce Type: cross Abstract: Large Language Models (LLMs) are increasingly deployed in medical settings, yet their sensitivity to prompt…
How Open Must Language Models be to Enable Reliable Scientific Inference?
arXiv:2603.26539v1 Announce Type: cross Abstract: How does the extent to which a model is open or closed impact the scientific…
