arXiv:2601.23039v2 Announce Type: replace-cross
Abstract: Differentiable matching layers, often implemented via entropy-regularized Optimal Transport, serve as a critical approximate inference mechanism in structural prediction. However, recovering discrete permutations via annealing $epsilon to 0$ is notoriously unstable. We identify a fundamental mechanism for this failure: textbf{Premature Mode Collapse}. By analyzing the non-normal dynamics of the Sinkhorn fixed-point map, we reveal a theoretical textbf{thermodynamic speed limit}. Under standard exponential cooling, the shift in the target posterior ($O(1)$) outpaces the contraction rate of the inference operator, which degrades as $O(1/epsilon)$. This mismatch inevitably forces the inference trajectory into spurious local basins. To address this, we propose textbf{Efficient PH-ASC}, an adaptive scheduling algorithm that monitors the stability of the inference process. By enforcing a linear stability law, we decouple expensive spectral diagnostics from the training loop, reducing overhead from $O(N^3)$ to amortized $O(1)$. Our implementation and interactive demo are available at https://github.com/xxx0438/torch-sinkhorn-asc and https://huggingface.co/spaces/leon0923/torch-sinkhorn-asc-demo. bounded away from zero in generic training dynamics unless the feature extractor converges unrealistically fast.
THE AI TODAY 