s2n-bignum-bench: A practical benchmark for evaluating low-level code reasoning of LLMs
arXiv:2603.14628v1 (announce type: cross)
Abstract: Neurosymbolic approaches leveraging Large Language Models (LLMs) with formal methods have recently achieved strong results…
