MASEval: Extending Multi-Agent Evaluation from Models to Systems
arXiv:2603.08835v1 Announce Type: new Abstract: The rapid adoption of LLM-based agentic systems has produced a rich ecosystem of frameworks (smolagents,…
PostTrainBench: Can LLM Agents Automate LLM Post-Training?
arXiv:2603.08640v2 Announce Type: replace-cross Abstract: AI agents have become surprisingly proficient at software engineering over the past year, largely due…
Common Sense vs. Morality: The Curious Case of Narrative Focus Bias in LLMs
arXiv:2603.09434v1 Announce Type: cross Abstract: Large Language Models (LLMs) are increasingly deployed across diverse real-world applications and user communities. As…
CERES: A Probabilistic Early Warning System for Acute Food Insecurity
arXiv:2603.09425v1 Announce Type: cross Abstract: We present CERES (Calibrated Early-warning and Risk Estimation System), an automated probabilistic forecasting system for…
IMSE: Intrinsic Mixture of Spectral Experts Fine-tuning for Test-Time Adaptation
arXiv:2603.07926v2 Announce Type: replace-cross Abstract: Test-time adaptation (TTA) has been widely explored to prevent performance degradation when test data differ…
Autonomous AI Agents for Option Hedging: Enhancing Financial Stability through Shortfall Aware Reinforcement Learning
arXiv:2603.06587v1 Announce Type: new Abstract: The deployment of autonomous AI agents in derivatives markets has widened a practical gap between…
Bridging Domains through Subspace-Aware Model Merging
arXiv:2603.05768v2 Announce Type: replace-cross Abstract: Model merging integrates multiple task-specific models into a single consolidated one. Recent research has made…
DSH-Bench: A Difficulty- and Scenario-Aware Benchmark with Hierarchical Subject Taxonomy for Subject-Driven Text-to-Image Generation
arXiv:2603.08090v1 Announce Type: cross Abstract: Significant progress has been achieved in subject-driven text-to-image (T2I) generation, which aims to synthesize new…
ImageEdit-R1: Boosting Multi-Agent Image Editing via Reinforcement Learning
arXiv:2603.08059v1 Announce Type: cross Abstract: With the rapid advancement of commercial multi-modal models, image editing has garnered significant attention due…
SAIL: Similarity-Aware Guidance and Inter-Caption Augmentation-based Learning for Weakly-Supervised Dense Video Captioning
arXiv:2603.05437v2 Announce Type: replace-cross Abstract: Weakly-Supervised Dense Video Captioning aims to localize and describe events in videos trained only on…
