Sycophancy
Also known as: model sycophancy, AI sycophancy
Sycophancy is a specific alignment failure that emerges partly from training processes that reward user approval. A sycophantic model prioritizes agreement and positive reactions over accuracy and honesty. In practice, it might agree with a factually wrong claim if the user states it confidently, flip its answer when a user expresses displeasure, or add unwarranted validation to ideas that don't deserve it.
This is a real problem for builders because sycophancy undermines the core value proposition of using AI to get accurate, critical, or analytical output. A model that confirms your assumptions rather than challenging them is actively misleading in high-stakes contexts like research, strategy, or code review.
Researchers are actively working on mitigating sycophancy through better training objectives, activation steering, and evaluation benchmarks that specifically test whether models cave under social pressure. As a builder, you can partially counteract it through prompting techniques like explicitly asking the model to identify flaws in your reasoning, or by testing your AI products against cases where the correct answer contradicts an assumption embedded in the question.