arXiv:2512.19097v2 Announce Type: replace-cross
Abstract: Unifying the vast heterogeneity of brain signals into a single foundation model is a longstanding challenge in neuroscience. Yet, even as large-scale pretraining becomes feasible, the field lacks principled guidance on how to scale electrophysiological foundation models under realistic data and compute constraints. We present the first systematic scaling law analysis spanning both EEG and iEEG, and uncover a distinctly data-constrained regime. Unlike language modeling, performance in electrophysiology is dominated first by data scale, followed by training duration (epochs), with model parameter count playing a subordinate role under fixed compute budgets. This challenges the prevailing "bigger is better" heuristic derived from large language models. Building on these insights, we introduce DIVER-1, a family of models trained on the largest and most diverse corpus to date: 59.3k hours (54k EEG and 5.3k iEEG) across 1.6 million channel-hours from more than 17.7k subjects, scaling up to 1.82 billion parameters. By prioritizing data diversity and training horizons over mere parameter expansion, DIVER-1 achieves state-of-the-art performance across established benchmarks. Our work provides both a powerful generalist model and actionable guidelines for efficient development of future neuro-AI systems.
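To make the scaling-law claim concrete, the following is a minimal sketch of fitting a parametric loss surface over data hours, training epochs, and parameter count, in the spirit of Chinchilla-style scaling analyses. The functional form `scaling_loss`, its coefficients, and the synthetic data are illustrative assumptions, not the paper's actual methodology or measurements.

```python
# Hypothetical illustration of a scaling-law fit over data size, training epochs,
# and parameter count. Form, coefficients, and data are assumptions for demonstration.
import numpy as np
from scipy.optimize import curve_fit

def scaling_loss(X, L0, a, alpha, b, beta, c, gamma):
    """Parametric loss surface: irreducible loss L0 plus power-law terms in
    data hours D, training epochs E, and model parameter count N."""
    D, E, N = X
    return L0 + a / D**alpha + b / E**beta + c / N**gamma

# Synthetic "pretraining runs" drawn from the same form plus noise, so the fit is well posed.
rng = np.random.default_rng(0)
D = rng.uniform(1e3, 6e4, 64)    # pretraining data, in hours
E = rng.uniform(1.0, 10.0, 64)   # training epochs
N = rng.uniform(1e8, 2e9, 64)    # parameter count
true_params = (0.50, 3.0, 0.35, 0.40, 0.25, 2.0, 0.10)
loss = scaling_loss((D, E, N), *true_params) + rng.normal(0.0, 0.005, 64)

# Fit the exponents; their relative sizes indicate which axis buys the most
# loss reduction. The abstract reports data scale first, epochs second,
# and parameter count last in this data-constrained regime.
p0 = (0.4, 1.0, 0.3, 0.3, 0.2, 1.0, 0.1)
popt, _ = curve_fit(scaling_loss, (D, E, N), loss, p0=p0, maxfev=20000)
L0, a, alpha, b, beta, c, gamma = popt
print(f"alpha (data) = {alpha:.2f}, beta (epochs) = {beta:.2f}, gamma (params) = {gamma:.2f}")
```

Comparing the fitted exponents under a fixed compute budget is one way to decide whether additional data, longer training, or a larger model yields the better return, which is the kind of guidance the abstract describes.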