Imagine in Space: Exploring the Frontier of Spatial Intelligence and Reasoning Efficiency in Vision Language Models
arXiv:2511.13782v1 Announce Type: new Abstract: Large language models (LLMs) and vision language models (VLMs), such as DeepSeek R1, OpenAI o3, and…
Batch Acquisition Function Evaluations and Decouple Optimizer Updates for Faster Bayesian Optimization
arXiv:2511.13625v2 Announce Type: replace-cross Abstract: Bayesian optimization (BO) efficiently finds high-performing parameters by maximizing an acquisition function, which models the…
Effective Diversification of Multi-Carousel Book Recommendation
arXiv:2511.14461v1 Announce Type: cross Abstract: Using multiple carousels, lists that wrap around and can be scrolled, is the basis for…
Analyzing the Impact of Participant Failures in Cross-Silo Federated Learning
arXiv:2511.14456v1 Announce Type: cross Abstract: Federated learning (FL) is a new paradigm for training machine learning (ML) models without sharing…
Unlocking the Forgery Detection Potential of Vanilla MLLMs: A Novel Training-Free Pipeline
arXiv:2511.13442v2 Announce Type: replace-cross Abstract: With the rapid advancement of artificial intelligence-generated content (AIGC) technologies, including multimodal large language models…
The Second Law of Intelligence: Controlling Ethical Entropy in Autonomous Systems
arXiv:2511.10704v1 Announce Type: new Abstract: We propose that unconstrained artificial intelligence obeys a Second Law analogous to thermodynamics, where ethical…
Instella: Fully Open Language Models with Stellar Performance
arXiv:2511.10628v2 Announce Type: replace-cross Abstract: Large language models (LLMs) have demonstrated remarkable performance across a wide range of tasks, yet…
Experiences from Benchmarking Vision-Language-Action Models for Robotic Manipulation
arXiv:2511.11298v1 Announce Type: cross Abstract: Foundation models applied in robotics, particularly Vision–Language–Action (VLA) models, hold great promise for achieving general-purpose…
Building the Web for Agents: A Declarative Framework for Agent-Web Interaction
arXiv:2511.11287v1 Announce Type: cross Abstract: The increasing deployment of autonomous AI agents on the web is hampered by a fundamental…
Speech-Audio Compositional Attacks on Multimodal LLMs and Their Mitigation with SALMONN-Guard
arXiv:2511.10222v2 Announce Type: replace-cross Abstract: Recent progress in large language models (LLMs) has enabled understanding of both speech and non-speech…