AnchorKV: Safety-Aware KV Cache Compression via Soft Penalty with a Refusal Anchor

By Admin

Jun 17, 2026

THE AI TODAY

arXiv:2606.17872v1 Announce Type: cross
Abstract: Large language models (LLMs) outperform earlier architectures on generative inference and long-context tasks, but their size creates significant challenges in memory usage, energy cost, and on-device deployment. As pre-trained language models are scaled up to improve downstream capability \cite{zhao2023survey}, the key-value (KV) cache becomes a dominant inference bottleneck. Recent KV cache compression methods \cite{jo2025fastkv,li2024snapkv,zhou2024dynamickv} reduce this cost by retaining only a subset of attention-relevant tokens. However, while these approaches preserve accuracy on benign workloads, their compression policies either fail to defend against jailbreak attacks \cite{jiang2024robustkv} or degrade safety alignment under aggressive eviction.
We propose AnchorKV, a drop-in modification to KV cache compression that biases token retention scores away from directions in key space associated with harmful prompts. AnchorKV constructs an offline safety anchor by adapting a difference-of-means representation engineering approach \cite{arditi2024refusal,zou2023representation} to the layer-specific key projection space used in KV caching. Using this anchor, a soft-penalty token selection rule trades a small amount of utility for substantially improved safety alignment, and it reduces to the original compressor when the penalty is zero.
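
To make the two ideas in the abstract concrete, here is a minimal Python sketch of one possible reading of them. It is not the authors' implementation: the abstract does not give the exact penalty form, so the function names (`refusal_anchor`, `select_tokens`), the clipped-projection penalty, and the toy key vectors are all assumptions made for illustration. The sketch builds a unit-norm difference-of-means direction from harmful and benign key vectors, then subtracts a penalty proportional to each token's alignment with that direction from a base compressor's retention score before keeping the top `budget` tokens.

```python
import math


def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def refusal_anchor(harmful_keys, benign_keys):
    """Unit-norm difference of means between harmful and benign key vectors.

    Assumption: the keys come from one layer's key projection, collected offline.
    """
    mu_h = mean_vector(harmful_keys)
    mu_b = mean_vector(benign_keys)
    diff = [h - b for h, b in zip(mu_h, mu_b)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0.0:
        raise ValueError("harmful and benign means coincide; no direction to anchor on")
    return [d / norm for d in diff]


def select_tokens(base_scores, keys, anchor, budget, penalty=0.0):
    """Return sorted indices of the `budget` tokens to keep.

    Assumed penalty form: score - penalty * max(<key, anchor>, 0).
    With penalty == 0 the ranking is exactly that of the base compressor.
    """
    adjusted = []
    for score, key in zip(base_scores, keys):
        projection = sum(k * a for k, a in zip(key, anchor))
        adjusted.append(score - penalty * max(projection, 0.0))
    ranked = sorted(range(len(adjusted)), key=lambda i: adjusted[i], reverse=True)
    return sorted(ranked[:budget])


# Toy 2-dimensional example (hypothetical data, not from the paper).
harmful = [[1.0, 0.0], [0.8, 0.2]]
benign = [[0.0, 1.0], [0.2, 0.8]]
anchor = refusal_anchor(harmful, benign)

keys = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
base_scores = [0.9, 0.5, 0.8, 0.6]  # e.g. attention-based importance from a base compressor

print(select_tokens(base_scores, keys, anchor, budget=2, penalty=0.0))  # -> [0, 2]
print(select_tokens(base_scores, keys, anchor, budget=2, penalty=1.0))  # -> [1, 3]
```

With the penalty at zero the selection matches the base scores, mirroring the abstract's statement that the method reduces to the original compressor. With a positive penalty, the tokens whose keys align with the anchor direction lose rank and are evicted first.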
