arXiv:2603.20223v1 Announce Type: cross
Abstract: Immediate feedback is a foundational requirement of effective AI-mediated learning, yet the energy and latency costs of delivering it remain largely unexamined. This study investigates the latency-energy-learning trade-off in AI tutoring through an empirical comparison of two on-device inference configurations of Microsoft Phi-3 Mini (4k-instruct) on an NVIDIA T4 GPU: full-precision FP16 and 4-bit NormalFloat (NF4) quantisation. Both were evaluated under KV-cache-enabled inference across 500 educational prompts spanning five secondary school subject domains. Pedagogical quality was assessed for each of the 1000 generated responses by a hybrid panel of 10 Cambridge International teachers and three frontier AI systems using a four-dimension rubric. We introduce Learning-per-Watt (LpW), a novel metric quantifying pedagogical value per unit of energy over the learner's waiting window. Under realistic deployment, NF4 achieves lower per-inference energy than FP16 (329 J vs. 369 J) but higher latency (13.4 s vs. 9.2 s), yielding a modest FP16 advantage in LpW of 1.33x at a quality difference of 0.19 points. Under cache-disabled inference, which is used in offline evaluation but absent from real deployments, the gap widens to 7.4x, overstating the FP16 advantage by more than fivefold. Quantisation efficiency is therefore both hardware-dependent and inference-regime-dependent, with significant implications for equitable AI tutoring deployment in low-resource settings.
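The abstract does not state the exact LpW formula. One plausible reading, which reproduces the reported 1.33x FP16 advantage, is quality divided by the product of per-inference energy and the learner's waiting time. A minimal sketch under that assumption, with hypothetical rubric quality scores chosen only to differ by the reported 0.19 points:

```python
# Assumption: LpW = quality / (energy_J * latency_s), i.e. pedagogical
# value per joule over the learner's waiting window. The abstract does
# not publish this formula; it is inferred for illustration.

def learning_per_watt(quality: float, energy_j: float, latency_s: float) -> float:
    """Pedagogical value per joule of energy over the waiting window (assumed form)."""
    return quality / (energy_j * latency_s)

# Energy and latency are the reported cache-enabled measurements;
# the quality scores are hypothetical, differing by 0.19 points as reported.
fp16 = learning_per_watt(quality=8.19, energy_j=369.0, latency_s=9.2)
nf4 = learning_per_watt(quality=8.00, energy_j=329.0, latency_s=13.4)

ratio = fp16 / nf4
print(f"FP16 LpW advantage: {ratio:.2f}x")  # ~1.33x, matching the abstract
```

Under this reading, NF4's energy saving (about 11%) is outweighed by its roughly 46% longer waiting window, which is why FP16 retains a modest LpW edge despite drawing more power per inference.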