DEAF: A Benchmark for Diagnostic Evaluation of Acoustic Faithfulness in Audio Language Models

By Admin

Mar 20, 2026

THE AI TODAY

arXiv:2603.18048v1 Announce Type: new
Abstract: Recent Audio Multimodal Large Language Models (Audio MLLMs) demonstrate impressive performance on speech benchmarks, yet it remains unclear whether these models genuinely process acoustic signals or rely on text-based semantic inference. To systematically study this question, we introduce DEAF (Diagnostic Evaluation of Acoustic Faithfulness), a benchmark of over 2,700 conflict stimuli spanning three acoustic dimensions: emotional prosody, background sounds, and speaker identity. Then, we design a controlled multi-level evaluation framework that progressively increases textual influence, ranging from semantic conflicts in the content to misleading prompts and their combination, allowing us to disentangle content-driven bias from prompt-induced sycophancy. We further introduce diagnostic metrics to quantify model reliance on textual cues over acoustic signals. Our evaluation of seven Audio MLLMs reveals a consistent pattern of text dominance: models are sensitive to acoustic variations, yet predictions are predominantly driven by textual inputs, revealing a gap between high performance on standard speech benchmarks and genuine acoustic understanding.
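The abstract does not define its diagnostic metrics, so the following is only a minimal sketch of one way reliance on textual cues could be quantified on conflict stimuli. It assumes each trial carries a label supported by the audio, a conflicting label suggested by the text (content or prompt), and the model's prediction. The names ConflictTrial and text_dominance_rates are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ConflictTrial:
    """One conflict stimulus (hypothetical schema, not the paper's)."""
    acoustic_label: str  # label supported by the audio signal
    text_label: str      # conflicting label suggested by content or prompt
    prediction: str      # label the model returned


def text_dominance_rates(trials: Iterable[ConflictTrial]) -> dict[str, float]:
    """Return the share of predictions following audio, text, or neither.

    Illustrative metric only: on conflict stimuli, a high text_rate with a
    low acoustic_rate would suggest the model leans on textual cues.
    """
    trials = list(trials)
    if not trials:
        raise ValueError("need at least one trial")
    for t in trials:
        if t.acoustic_label == t.text_label:
            raise ValueError("not a conflict stimulus: labels agree")

    n = len(trials)
    acoustic = sum(t.prediction == t.acoustic_label for t in trials)
    text = sum(t.prediction == t.text_label for t in trials)
    return {
        "acoustic_rate": acoustic / n,
        "text_rate": text / n,
        "other_rate": (n - acoustic - text) / n,
    }


# Toy data, invented for illustration only.
example_trials = [
    ConflictTrial("sad", "happy", "happy"),
    ConflictTrial("sad", "happy", "sad"),
    ConflictTrial("angry", "calm", "calm"),
    ConflictTrial("angry", "calm", "neutral"),
]
example_rates = text_dominance_rates(example_trials)
```

Computing these rates separately for each evaluation level (content conflict, misleading prompt, and both combined) would show how the text-following share shifts as textual influence increases, which is the kind of comparison the multi-level design is meant to support.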
