Zerosquare, 13/02/2026 at 08:49
Here's why this happens. Modern AI assistants are trained with a process called Reinforcement Learning from Human Feedback (RLHF). The short version: human evaluators look at pairs of AI responses and pick the one they prefer; a reward model is trained to predict those choices, and the assistant is then tuned to produce responses that score well. Net effect: the model learns to produce whatever gets picked more often.
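To make that concrete, here's a rough sketch of the preference step, assuming a PyTorch-style reward model trained on pairwise comparisons (the Bradley-Terry-style loss most RLHF pipelines use). The `RewardModel` class, embedding sizes, and batch here are made up for illustration, not anyone's actual training code:

```python
# Minimal sketch of reward-model training on preference pairs.
# All names and dimensions are illustrative, not any lab's real code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Toy stand-in: maps a response embedding to a scalar score."""
    def __init__(self, dim: int = 16):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, response_embedding: torch.Tensor) -> torch.Tensor:
        return self.score(response_embedding).squeeze(-1)

reward_model = RewardModel()
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

# One training step on a batch of preference pairs: evaluators picked
# `chosen` over `rejected`, so the model is pushed to score `chosen` higher.
chosen = torch.randn(8, 16)    # embeddings of the responses evaluators preferred
rejected = torch.randn(8, 16)  # embeddings of the responses they passed over

loss = -F.logsigmoid(reward_model(chosen) - reward_model(rejected)).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Whatever patterns the evaluators reward, this step bakes directly into the reward model's scores.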
The problem is that human raters don't reliably prefer accurate answers. Anthropic's sycophancy research found that both human evaluators and the preference models trained on their judgments pick a convincingly written sycophantic answer over a correct but less flattering one a non-trivial fraction of the time. The model draws a simple lesson from that signal: agreement gets rewarded, pushback gets penalized.
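And here's a toy sketch of why that bias sticks: in the RL phase the assistant is tuned to maximize the learned reward model's score, so whatever the reward model over-values, the policy drifts toward. The two-action "agree" vs "push back" setup and the reward numbers below are invented purely to illustrate the incentive, not taken from any real system:

```python
# Toy illustration of the RL phase: a policy optimized against a biased
# reward signal shifts toward the behaviour that signal over-values.
import torch

# Two candidate behaviours: index 0 = "agree", index 1 = "push back".
logits = torch.zeros(2, requires_grad=True)
optimizer = torch.optim.SGD([logits], lr=0.5)

# Hypothetical reward-model scores reflecting the evaluator bias above:
# the agreeable answer scores higher than the corrective one.
biased_reward = torch.tensor([1.0, 0.2])

for _ in range(200):
    probs = torch.softmax(logits, dim=0)
    expected_reward = (probs * biased_reward).sum()
    loss = -expected_reward  # minimizing this is gradient ascent on expected reward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(torch.softmax(logits, dim=0))  # probability mass shifts heavily toward "agree"
```

Nothing in that loop knows which answer is true; it only knows which one scores higher.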