BAGEL: Benchmarking Animal Knowledge Expertise in Language Models
arXiv:2604.16241v1 Announce Type: cross Abstract: Large language models have shown strong performance on broad-domain knowledge and reasoning benchmarks, but it…
arXiv:2604.14373v2 Announce Type: replace-cross Abstract: Rural environmental risks are shaped by place-based conditions (e.g., housing quality, road access, land-surface patterns),…
arXiv:2604.16090v1 Announce Type: cross Abstract: Probabilistic Synchronous Parallel (PSP) is a technique in distributed learning systems to reduce synchronization bottlenecks…
arXiv:2604.16104v1 Announce Type: cross Abstract: Lung cancer remains one of the leading causes of cancer-related mortality worldwide. Conventional computed tomography…
arXiv:2604.14967v2 Announce Type: replace-cross Abstract: Retrieval-Augmented Generation (RAG) extends Large Vision-Language Models (LVLMs) with external visual knowledge. However, existing visual…
arXiv:2604.15456v1 Announce Type: new Abstract: Trustworthiness and transparency are essential for the clinical adoption of artificial intelligence (AI) in healthcare…
arXiv:2505.21569v3 Announce Type: replace-cross Abstract: Although LLM-based agents are proven to master tool orchestration in scientific fields, particularly chemistry, their…
arXiv:2604.11641v3 Announce Type: replace-cross Abstract: Code agents are advancing rapidly, but debugging them is becoming increasingly difficult. As frameworks orchestrate…
arXiv:2604.13882v1 Announce Type: cross Abstract: The evaluation of supervised machine learning models is a critical stage in the development of…