Scheduling Your LLM Reinforcement Learning with Reasoning Trees
arXiv:2510.24832v1 Announce Type: new Abstract: Using Reinforcement Learning with Verifiable Rewards (RLVR) to optimize Large Language Models (LLMs) can be…
arXiv:2510.23691v1 Announce Type: new Abstract: We present Game-TARS, a generalist game agent trained with a unified, scalable action space anchored…
arXiv:2510.23409v2 Announce Type: replace-cross Abstract: Data valuation has become central in the era of data-centric AI. It drives efficient training…
arXiv:2510.24217v1 Announce Type: cross Abstract: As more Intensive Care Unit (ICU) data becomes available, the interest in developing clinical prediction…
arXiv:2510.24235v1 Announce Type: cross Abstract: Reward models (RMs) are central to reinforcement learning from human feedback (RLHF), providing the critical…
arXiv:2510.23458v2 Announce Type: replace-cross Abstract: Confidence in LLMs is a useful indicator of model uncertainty and answer reliability. Existing work…
arXiv:2510.20819v2 Announce Type: replace-cross Abstract: Recent advances in generative modeling have positioned diffusion models as state-of-the-art tools for sampling from…
arXiv:2510.23028v1 Announce Type: cross Abstract: AutoRegressive (AR) models have demonstrated competitive performance in image generation, achieving results comparable to those…
arXiv:2510.23034v1 Announce Type: cross Abstract: Binarized Neural Networks (BNNs) are a class of deep neural networks designed to utilize minimal…
arXiv:2510.21280v2 Announce Type: replace-cross Abstract: While recent sound event detection (SED) systems can identify baleen whale calls in marine audio,…