DeepER-Med: Advancing Deep Evidence-Based Research in Medicine Through Agentic AI

By Admin

Apr 20, 2026

arXiv:2604.15456v1 Announce Type: new
Abstract: Trustworthiness and transparency are essential for the clinical adoption of artificial intelligence (AI) in healthcare and biomedical research. Recent deep research systems aim to accelerate evidence-grounded scientific discovery by integrating AI agents with multi-hop information retrieval, reasoning, and synthesis. However, most existing systems lack explicit and inspectable criteria for evidence appraisal, creating a risk of compounding errors and making it difficult for researchers and clinicians to assess the reliability of their outputs. In parallel, current benchmarking approaches rarely evaluate performance on complex, real-world medical questions. Here, we introduce DeepER-Med, a Deep Evidence-based Research framework for Medicine with an agentic AI system. DeepER-Med frames deep medical research as an explicit and inspectable workflow of evidence-based generation, consisting of three modules: research planning, agentic collaboration, and evidence synthesis. To support realistic evaluation, we also present DeepER-MedQA, an evidence-grounded dataset comprising 100 expert-level research questions derived from authentic medical research scenarios and curated by a multidisciplinary panel of 11 biomedical experts. Expert manual evaluation demonstrates that DeepER-Med consistently outperforms widely used production-grade platforms across multiple criteria, including the generation of novel scientific insights. We further demonstrate the practical utility of DeepER-Med through eight real-world clinical cases. Human clinician assessment indicates that DeepER-Med’s conclusions align with clinical recommendations in seven cases, highlighting its potential for medical research and decision support.
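The three-module workflow named in the abstract (research planning, agentic collaboration, evidence synthesis) can be pictured with a minimal sketch. This is not the DeepER-Med implementation: the abstract gives no code-level details, so every name below (`Evidence`, `Report`, `APPRAISAL_SCORES`, `plan`, `collaborate`, `synthesize`), the study-type labels, and the numeric scores are hypothetical and chosen only for illustration. The sketch shows one way that appraisal criteria could be kept explicit and inspectable, namely by storing them in a plain lookup table and attaching the resulting score to each piece of evidence in the final report.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    source: str
    claim: str
    study_type: str  # hypothetical labels, e.g. "rct", "cohort", "case_report"


@dataclass
class Report:
    question: str
    sub_questions: list[str]
    # Each entry pairs a piece of evidence with its appraisal score.
    appraised: list[tuple[Evidence, int]] = field(default_factory=list)


# Hypothetical appraisal criteria (assumption, not from the paper):
# a higher score means stronger evidence. Keeping the table as data
# lets a reader inspect exactly how each item was ranked.
APPRAISAL_SCORES = {"rct": 3, "cohort": 2, "case_report": 1}


def plan(question: str) -> list[str]:
    """Research planning: split the question into sub-questions (stub)."""
    return [f"{question} (background)", f"{question} (clinical evidence)"]


def collaborate(sub_questions, retrieve):
    """Agentic collaboration: hand each sub-question to a retrieval callable."""
    evidence = []
    for sub_question in sub_questions:
        evidence.extend(retrieve(sub_question))
    return evidence


def synthesize(question, sub_questions, evidence):
    """Evidence synthesis: score every item and order strongest first."""
    scored = [(e, APPRAISAL_SCORES.get(e.study_type, 0)) for e in evidence]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return Report(question, sub_questions, scored)
```

In use, a caller would supply its own `retrieve` function (for example, a literature search wrapper), run `plan`, `collaborate`, and `synthesize` in order, and then read `report.appraised` to see which evidence was ranked highest and why.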


