GRPO-VPS: Enhancing Group Relative Policy Optimization with Verifiable Process Supervision for Effective Reasoning
arXiv:2604.20659v1 Announce Type: cross Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has advanced the reasoning capabilities of Large Language Models…
The Tool-Overuse Illusion: Why Does LLM Prefer External Tools over Internal Knowledge?
arXiv:2604.19749v1 Announce Type: new Abstract: Equipping LLMs with external tools effectively addresses internal reasoning limitations. However, it introduces a critical…
RoLegalGEC: Legal Domain Grammatical Error Detection and Correction Dataset for Romanian
arXiv:2604.19593v2 Announce Type: replace-cross Abstract: The importance of clear and correct text in legal documents cannot be overstated, and, consequently,…
MOMO: A framework for seamless physical, verbal, and graphical robot skill learning and adaptation
arXiv:2604.20468v1 Announce Type: cross Abstract: Industrial robot applications require increasingly flexible systems that non-expert users can easily adapt for varying…
VTouch++: A Multimodal Dataset with Vision-Based Tactile Enhancement for Bimanual Manipulation
arXiv:2604.20444v1 Announce Type: cross Abstract: Embodied intelligence has advanced rapidly in recent years; however, bimanual manipulation, especially in contact-rich tasks, remains…
Cyber Defense Benchmark: Agentic Threat Hunting Evaluation for LLMs in SecOps
arXiv:2604.19533v2 Announce Type: replace-cross Abstract: We introduce the Cyber Defense Benchmark, a benchmark for measuring how well large language model…
Speculative End-Turn Detector for Efficient Speech Chatbot Assistant
arXiv:2503.23439v2 Announce Type: replace-cross Abstract: Spoken dialogue systems powered by large language models have demonstrated remarkable abilities in understanding human…
Temp-R1: A Unified Autonomous Agent for Complex Temporal KGQA via Reverse Curriculum Reinforcement Learning
arXiv:2601.18296v2 Announce Type: replace-cross Abstract: Temporal Knowledge Graph Question Answering (TKGQA) is inherently challenging, as it requires sophisticated reasoning over…
More Than Sum of Its Parts: Deciphering Intent Shifts in Multimodal Hate Speech Detection
arXiv:2603.21298v3 Announce Type: replace-cross Abstract: Combating hate speech on social media is critical for securing cyberspace, yet relies heavily on…
FaithLens: Detecting and Explaining Faithfulness Hallucination
arXiv:2512.20182v4 Announce Type: replace-cross Abstract: Recognizing whether outputs from large language models (LLMs) contain faithfulness hallucination is crucial for real-world…
