Is Human Annotation Necessary? Iterative MBR Distillation for Error Span Detection in Machine Translation

By Admin

Mar 17, 2026

THE AI TODAY

arXiv:2603.12983v2
Abstract: Error Span Detection (ESD) is a crucial subtask in Machine Translation (MT) evaluation, aiming to identify the location and severity of translation errors. While fine-tuning models on human-annotated data improves ESD performance, acquiring such data is expensive and prone to inconsistencies among annotators. To address this, we propose a novel self-evolution framework based on Minimum Bayes Risk (MBR) decoding, named Iterative MBR Distillation for ESD, which eliminates the reliance on human annotations by leveraging an off-the-shelf LLM to generate pseudo-labels. Extensive experiments on the WMT Metrics Shared Task datasets demonstrate that models trained solely on these self-generated pseudo-labels outperform both the unadapted base model and supervised baselines trained on human annotations at the system and span levels, while maintaining competitive sentence-level performance.
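To make the MBR step concrete, here is a minimal sketch of how one pseudo-label might be picked from several sampled error-span annotations of the same translation. The abstract does not specify the utility function or how severity is handled, so everything here is an assumption for illustration: spans are character offsets, the utility is a character-level F1 overlap, severity is ignored, and the names `span_f1` and `mbr_select` are hypothetical.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def covered(spans: List[Span]) -> Set[int]:
    """Return the set of character positions covered by the spans."""
    return {i for start, end in spans for i in range(start, end)}


def span_f1(hyp: List[Span], ref: List[Span]) -> float:
    """Character-level F1 between two annotations (an assumed utility)."""
    h, r = covered(hyp), covered(ref)
    if not h and not r:
        return 1.0  # both annotations say "no errors": full agreement
    overlap = len(h & r)
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(h), overlap / len(r)
    return 2 * precision * recall / (precision + recall)


def mbr_select(candidates: List[List[Span]]) -> List[Span]:
    """Pick the candidate with the highest mean utility against the others."""
    def expected_utility(i: int) -> float:
        others = [c for j, c in enumerate(candidates) if j != i]
        total = sum(span_f1(candidates[i], other) for other in others)
        return total / max(len(others), 1)

    best = max(range(len(candidates)), key=expected_utility)
    return candidates[best]


# Four hypothetical annotations sampled from an LLM for one translation.
samples = [[(4, 9)], [(4, 10)], [(20, 25)], [(4, 9), (20, 22)]]
pseudo_label = mbr_select(samples)
print(pseudo_label)
```

With these toy samples the last candidate is selected, because it overlaps with every other annotation and so has the highest average agreement. In an iterative distillation loop, labels chosen this way would presumably serve as training targets for the next round of fine-tuning, but that loop is not shown here.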
