arXiv:2605.14710v1 Announce Type: cross
Abstract: Deep learning and multi-modal fusion have demonstrated transformative potential in medical diagnosis by integrating diverse data sources. However, accurate prognosis for ischemic stroke remains challenging due to limitations in existing multi-modal approaches. First, current methods are predominantly confined to dual-modal fusion and lack a framework that effectively integrates all three relevant sources: medical images, structured clinical data, and unstructured text. Second, they often fail to establish deep bidirectional interactions between modalities. To address these gaps, this paper proposes a novel tri-modal fusion model for ischemic stroke prognosis. Our approach first enriches the data representation by employing a Large Language Model (LLM) to automatically generate semi-structured diagnostic text from brain MRIs. This process not only addresses the scarcity of expert annotations but also serves as a regularized semantic enhancement that improves the robustness of multimodal fusion. Furthermore, we design a core component termed the Vision-Conditioned Dual Alignment Fusion Module (VDAFM), which uses visual features as a conditional prior to guide fine-grained interaction with the generated text. This module achieves a dynamic, deep fusion through a dual semantic alignment loss, effectively mitigating modal heterogeneity. Extensive experiments on a real-world clinical dataset demonstrate that our model achieves state-of-the-art performance.
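The abstract does not specify the internals of VDAFM, but the two ideas it names, visual features acting as a conditional prior over text, and a symmetric (dual) alignment objective, can be illustrated with a minimal NumPy sketch. Everything below is an assumption for illustration: the cross-attention formulation, the symmetric InfoNCE-style loss, the temperature `tau`, and all function names are hypothetical, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def vision_conditioned_fusion(v, t):
    """One plausible reading of 'visual features as a conditional prior':
    visual tokens act as attention queries over the generated text tokens.

    v: (Nv, d) visual token features
    t: (Nt, d) text token features
    returns: (Nv, d) text information aggregated under visual guidance
    """
    d = v.shape[-1]
    attn = softmax(v @ t.T / np.sqrt(d), axis=-1)  # (Nv, Nt) attention weights
    return attn @ t

def dual_alignment_loss(v_emb, t_emb, tau=0.07):
    """A symmetric contrastive loss as one interpretation of the paper's
    'dual semantic alignment loss': image->text and text->image directions,
    with matched pairs on the diagonal of the similarity matrix."""
    v = v_emb / np.linalg.norm(v_emb, axis=-1, keepdims=True)
    t = t_emb / np.linalg.norm(t_emb, axis=-1, keepdims=True)
    logits = v @ t.T / tau                    # (B, B) cosine similarities / tau
    idx = np.arange(len(v))
    l_vt = -np.log(softmax(logits, axis=1)[idx, idx]).mean()  # vision -> text
    l_tv = -np.log(softmax(logits, axis=0)[idx, idx]).mean()  # text -> vision
    return 0.5 * (l_vt + l_tv)
```

In this sketch the dual loss is minimized when each visual embedding is closest to its paired text embedding in both retrieval directions, which is one standard way to mitigate heterogeneity between modality-specific feature spaces.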