Post Content Post navigation PR2: Predictive Routing Replay for MoE-Based LLM Reinforcement Learning Consistency Training Can Entrench Misalignment