CreativityBench: Evaluating Agent Creative Reasoning via Affordance-Based Tool Repurposing

ByAdmin

May 7, 2026

arXiv:2605.02910v2 Announce Type: new
Abstract: Recent advances in large language models have led to strong performance on reasoning and environment-interaction tasks, yet their ability for creative problem-solving remains underexplored. We study this capability through the lens of creative tool use, where a model repurposes available objects by reasoning about their affordances and attributes rather than relying on canonical usage. As a first step, we introduce CreativityBench, a benchmark for evaluating affordance-based creativity in LLMs. To this end, we build a large-scale affordance knowledge base (KB) with 4K entities and 150K+ affordance annotations, explicitly linking objects, parts, attributes, and actionable uses. Building on this KB, we generate 14K grounded tasks that require identifying non-obvious yet physically plausible solutions under constraints. Evaluations across 10 state-of-the-art LLMs, including closed and open-source models, show that models can often select a plausible object, but fail to identify the correct parts, their affordances, and the underlying physical mechanism needed to solve the task, leading to a significant drop in performance. Furthermore, improvements from model scaling quickly saturate, strong general reasoning does not reliably translate to creative affordance discovery, and common inference-time strategies such as Chain-of-Thought yield limited gains. These results suggest that creative tool use remains a major challenge for current models, and that CreativityBench provides a useful testbed for studying this missing dimension of intelligence, with potential implications for planning and reasoning modules in future agents.

CreativityBench: Evaluating Agent Creative Reasoning via Affordance-Based Tool Repurposing

ByAdmin

By Admin

Related Post

OPIUM: Mitigating Steering Externalities and Over-Refusal via Dual Objective Latent Optimization

Scaling Interpretable Transformers with Parity Bottleneck Layers

SalesLoop: Reinforcement Learning from Performance Feedback for Sales Lead Ranking

You missed

AINTMA: Agentic AI Architecture for Autonomous Test Management with Generative Intelligence, Secure Cloud Communication and Adaptive Quality Analytics

Pushing the Frontier of Full-Song Generation: Hierarchical Autoregressive Planning Meets Flow-Matching Rendering

SalesLoop: Reinforcement Learning from Performance Feedback for Sales Lead Ranking

Scaling Interpretable Transformers with Parity Bottleneck Layers