Reward Learning through Ranking Mean Squared Error
arXiv:2601.09236v2 Announce Type: replace-cross Abstract: Reward design remains a significant bottleneck in applying reinforcement learning (RL) to real-world problems. A…
arXiv:2601.10222v1 Announce Type: cross Abstract: Optimization is central to both modern machine learning (ML) and scientific machine learning (SciML), yet…
arXiv:2601.10236v1 Announce Type: cross Abstract: AI writing assistants can reduce effort and improve fluency, but they may also weaken writers’…
arXiv:2601.09478v2 Announce Type: replace-cross Abstract: Semantic understanding of popularity bias is a crucial yet underexplored challenge in recommender systems, where…
arXiv:2601.09765v1 Announce Type: new Abstract: Since the release of ChatGPT, there has been a lot of debate about whether AI…
arXiv:2601.07182v2 Announce Type: replace-cross Abstract: Policy optimization for large language models often suffers from sparse reward signals in multi-step reasoning…
arXiv:2601.07582v2 Announce Type: replace-cross Abstract: Memory is critical for dialogue agents to maintain coherence and enable continuous adaptation in long-term…
arXiv:2601.07866v1 Announce Type: new Abstract: While machine learning shows promise for maternal health risk prediction, clinical adoption in resource-constrained settings…
arXiv:2601.07853v1 Announce Type: cross Abstract: Financial agents powered by large language models (LLMs) are increasingly deployed for investment analysis, risk…
arXiv:2601.08461v1 Announce Type: cross Abstract: We provide a formal analytic proof for a class of non-canonical polynomial continued fractions representing…