Nature Machine Intelligence, Published online: 13 February 2026; doi:10.1038/s42256-025-01176-7
The use of pretrained protein language models (pLMs) is growing rapidly. However, Szymborski and Emad find that pretrained pLMs can be a source of data leakage in protein–protein interaction inference, leading to inflated performance scores.

