OmniVL-Guard: Towards Unified Vision-Language Forgery Detection and Grounding via Balanced RL

By Admin

Feb 13, 2026

THE AI TODAY

arXiv:2602.10687v2 Announce Type: replace-cross
Abstract: Existing forgery detection methods are often limited to uni-modal or bi-modal settings and fail to handle the interleaved text, images, and videos prevalent in real-world misinformation. To bridge this gap, this paper aims to develop a unified framework for omnibus vision-language forgery detection and grounding. In this unified setting, the interplay between diverse modalities and the dual requirement of simultaneous detection and localization poses a critical "difficulty bias" problem: the simpler veracity classification task tends to dominate the gradients, leading to suboptimal performance in fine-grained grounding during multi-task optimization. To address this challenge, we propose OmniVL-Guard, a balanced reinforcement learning framework for omnibus vision-language forgery detection and grounding. Specifically, OmniVL-Guard comprises two core designs: Self-Evolving CoT Generation and Adaptive Reward Scaling Policy Optimization (ARSPO). Self-Evolving CoT Generation synthesizes high-quality reasoning paths, effectively overcoming the cold-start challenge. Building on this, ARSPO dynamically modulates reward scales and task weights, ensuring balanced joint optimization. Extensive experiments demonstrate that OmniVL-Guard significantly outperforms state-of-the-art methods and exhibits robust zero-shot generalization to out-of-domain scenarios.
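
The abstract does not give ARSPO's equations, so the following is only a generic sketch of the idea it names: rescaling each task's rewards and shifting weight toward the task that is currently harder, so that an easy task does not dominate the update. Everything here is an assumption made for illustration, not the authors' method: the function names, the task labels "detection" and "grounding", the within-group normalization, and the softmax weighting over negative mean reward.

```python
import math
from statistics import mean, pstdev


def normalize_rewards(rewards, eps=1e-6):
    """Center and scale one task's rewards within a group of sampled responses."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def task_weights(mean_rewards, temperature=1.0):
    """Softmax over negative mean reward: lower-reward (harder) tasks get more weight.

    Hypothetical weighting rule chosen for illustration; weights sum to 1.
    """
    scores = {t: math.exp(-m / temperature) for t, m in mean_rewards.items()}
    total = sum(scores.values())
    return {t: s / total for t, s in scores.items()}


def combined_advantages(group_rewards):
    """Blend per-task normalized rewards into one advantage per sampled response.

    group_rewards maps a task name to a list of rewards, one per response,
    with all lists the same length.
    """
    weights = task_weights({t: mean(r) for t, r in group_rewards.items()})
    normalized = {t: normalize_rewards(r) for t, r in group_rewards.items()}
    n = len(next(iter(group_rewards.values())))
    return [
        sum(weights[t] * normalized[t][i] for t in group_rewards)
        for i in range(n)
    ]


if __name__ == "__main__":
    rewards = {
        "detection": [1.0, 1.0, 1.0, 0.0],   # easy task: mostly solved
        "grounding": [0.2, 0.4, 0.1, 0.5],   # harder task: low rewards
    }
    print(task_weights({t: mean(r) for t, r in rewards.items()}))
    print(combined_advantages(rewards))
```

In this sketch the task with the lower average reward receives the larger weight, which is one simple way to counteract the kind of imbalance the abstract calls difficulty bias; the paper's actual scaling policy may differ substantially.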

