SE-Bench: Benchmarking Self-Evolution with Knowledge Internalization
arXiv:2602.04811v1 Announce Type: cross Abstract: True self-evolution requires agents to act as lifelong learners that internalize novel experiences to solve…
arXiv:2602.04809v1 Announce Type: cross Abstract: Recent years have seen an explosion of interest in autonomous cyber defence agents trained to…
arXiv:2602.03900v1 Announce Type: new Abstract: Large Language Models (LLMs) can struggle with reasoning and planning tasks. Many prompting techniques…
arXiv:2602.03048v2 Announce Type: replace-cross Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a key approach for enhancing LLM…
arXiv:2602.03516v2 Announce Type: replace-cross Abstract: Learning from negative samples holds great promise for improving Large Language Model (LLM) reasoning capability,…
arXiv:2602.03257v1 Announce Type: cross Abstract: Finding frequently occurring subgraph patterns or network motifs in neural architectures is crucial for optimizing…
arXiv:2602.02230v2 Announce Type: replace-cross Abstract: Telemetry streams from large-scale Internet-connected systems (e.g., IoT deployments and online platforms) naturally form an…
arXiv:2602.03226v1 Announce Type: cross Abstract: Long-context inputs in large language models (LLMs) often suffer from the “lost in the middle”…
arXiv:2602.02515v1 Announce Type: new Abstract: Leaderboard scores on public benchmarks have been steadily rising and converging, with many frontier language…
arXiv:2602.02408v2 Announce Type: replace-cross Abstract: Model editing aims to correct errors in large, pretrained models without altering unrelated behaviors. While…