Routing-Free Mixture-of-Experts
arXiv:2604.00801v1 Announce Type: cross Abstract: Standard Mixture-of-Experts (MoE) models rely on centralized routing mechanisms that introduce rigid inductive biases. We…
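The "centralized routing" this abstract contrasts itself against is the standard gating scheme in MoE layers: a learned router scores every expert for each token and dispatches the token to its top-k experts. A minimal sketch of that conventional mechanism follows; all names and dimensions (W_gate, d_model, top_k) are illustrative assumptions, not taken from the paper.

```python
# Hypothetical minimal sketch of conventional centralized top-k MoE routing
# (the baseline this paper argues against); dimensions are illustrative.
import numpy as np

rng = np.random.default_rng(0)

d_model, n_experts, top_k = 16, 4, 2
W_gate = rng.normal(size=(d_model, n_experts))           # centralized router weights
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route a single token x through its top-k experts."""
    logits = x @ W_gate                                  # router scores each expert
    top = np.argsort(logits)[-top_k:]                    # keep the k best experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                                 # softmax over the top-k
    # weighted combination of the selected experts' outputs
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

token = rng.normal(size=d_model)
print(moe_forward(token).shape)  # (16,)
```

The rigid inductive bias mentioned in the abstract comes from this single gating function deciding dispatch for every token; a routing-free design would remove that central decision point, though the truncated abstract does not specify how.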
arXiv:2604.00785v1 Announce Type: cross Abstract: Pretraining Large Language Models (LLMs) from scratch requires a massive amount of compute. The Aurora supercomputer…
arXiv:2603.28902v1 Announce Type: new Abstract: Charts are central to analytical reasoning, yet existing benchmarks for chart understanding focus almost exclusively…
arXiv:2603.28707v2 Announce Type: replace-cross Abstract: We present a physics-based neural network framework for the discovery of constitutive models in fully…
arXiv:2505.18602v3 Announce Type: replace-cross Abstract: Large language models (LLMs) have revolutionized algorithm development, yet their application in symbolic regression, where…
arXiv:2512.16081v2 Announce Type: replace-cross Abstract: Social interactions incorporate nonverbal signals to convey emotions alongside speech, including facial expressions and body…
arXiv:2603.14841v2 Announce Type: replace-cross Abstract: Road crashes remain a leading cause of preventable fatalities. Existing prediction models predominantly produce binary…
arXiv:2510.23364v2 Announce Type: replace-cross Abstract: Flood hazard mapping is essential for disaster prevention but remains challenging in data-scarce regions, where…
arXiv:2603.29661v1 Announce Type: cross Abstract: Existing narrative extraction methods face a trade-off between coherence, interactivity, and multi-storyline support. Narrative Maps…
arXiv:2603.29654v1 Announce Type: cross Abstract: Aligning human-interpretable concepts with the internal representations learned by modern machine learning systems remains a…