TEON: Tensorized Orthonormalization Beyond Layer-Wise Muon for Large Language Model Pre-Training
arXiv:2601.23261v2
Announce Type: replace-cross
Abstract: The Muon optimizer has demonstrated strong empirical performance in pre-training large language models by performing…
