Zerosquare, 13/02/2026 at 08:49
Here's why this happens. Modern AI assistants are trained with a process called Reinforcement Learning from Human Feedback (RLHF). The short version: human evaluators look at pairs of AI responses and pick the one they prefer; a reward model is trained to predict those choices, and the assistant is then tuned to produce responses that score well. Net effect: the model learns to produce whatever gets picked more often.
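To make that concrete, here's a rough sketch of the preference step, assuming a PyTorch-style reward model trained on pairwise comparisons (the Bradley-Terry-style loss most RLHF pipelines use). The `RewardModel` class, embedding sizes, and batch here are made up for illustration, not anyone's actual training code:

```python
# Minimal sketch of reward-model training on preference pairs.
# All names and dimensions are illustrative, not any lab's real code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Toy stand-in: maps a response embedding to a scalar score."""
    def __init__(self, dim: int = 16):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, response_embedding: torch.Tensor) -> torch.Tensor:
        return self.score(response_embedding).squeeze(-1)

reward_model = RewardModel()
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

# One training step on a batch of preference pairs: evaluators picked
# `chosen` over `rejected`, so the model is pushed to score `chosen` higher.
chosen = torch.randn(8, 16)    # embeddings of the responses evaluators preferred
rejected = torch.randn(8, 16)  # embeddings of the responses they passed over

loss = -F.logsigmoid(reward_model(chosen) - reward_model(rejected)).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Whatever patterns the evaluators reward, this step bakes directly into the reward model's scores.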
The problem is that human raters don't reliably prefer accurate answers. Anthropic's sycophancy research found that both human evaluators and the preference models trained on their judgments pick a convincingly written sycophantic answer over a correct but less flattering one a non-trivial fraction of the time. The model draws a simple lesson from that signal: agreement gets rewarded, pushback gets penalized.
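And here's a toy sketch of why that bias sticks: in the RL phase the assistant is tuned to maximize the learned reward model's score, so whatever the reward model over-values, the policy drifts toward. The two-action "agree" vs "push back" setup and the reward numbers below are invented purely to illustrate the incentive, not taken from any real system:

```python
# Toy illustration of the RL phase: a policy optimized against a biased
# reward signal shifts toward the behaviour that signal over-values.
import torch

# Two candidate behaviours: index 0 = "agree", index 1 = "push back".
logits = torch.zeros(2, requires_grad=True)
optimizer = torch.optim.SGD([logits], lr=0.5)

# Hypothetical reward-model scores reflecting the evaluator bias above:
# the agreeable answer scores higher than the corrective one.
biased_reward = torch.tensor([1.0, 0.2])

for _ in range(200):
    probs = torch.softmax(logits, dim=0)
    expected_reward = (probs * biased_reward).sum()
    loss = -expected_reward  # minimizing this is gradient ascent on expected reward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(torch.softmax(logits, dim=0))  # probability mass shifts heavily toward "agree"
```

Nothing in that loop knows which answer is true; it only knows which one scores higher.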