HumanMCP: A Human-Like Query Dataset for Evaluating MCP Tool Retrieval Performance
arXiv:2602.23367v1 Announce Type: new Abstract: Model Context Protocol (MCP) servers contain a collection of thousands of open-source standardized tools, linking…
arXiv:2602.23061v2 Announce Type: replace-cross Abstract: Semi-structured documents integrate diverse interleaved data elements (e.g., tables, charts, hierarchical paragraphs) arranged in various…
arXiv:2602.23296v2 Announce Type: replace-cross Abstract: Federated learning (FL) faces challenges in uncertainty quantification (UQ). Without reliable UQ, FL systems risk…
arXiv:2602.22624v1 Announce Type: cross Abstract: Editing images via instruction provides a natural way to generate interactive content, but it is…
arXiv:2602.22661v1 Announce Type: cross Abstract: Although diffusion language models (DLMs) are evolving quickly, many recent models converge on a set…
arXiv:2602.21189v2 Announce Type: replace-cross Abstract: Pass@k is a widely used performance metric for verifiable large language model tasks, including mathematical…
arXiv:2602.22215v1 Announce Type: new Abstract: Large Language Models (LLMs) demonstrate potential in the field of scientific idea generation. However, the…
arXiv:2602.21670v2 Announce Type: replace-cross Abstract: Multi-robot task planning requires decomposing natural-language instructions into executable actions for heterogeneous robot teams. Conventional…