As Large Language Models (LLMs) are increasingly embedded in real-world decision-making processes, it becomes crucial to examine the extent to which their responses exhibit cognitive biases, the systematic distortions commonly observed in human judgment. This platform presents a large-scale evaluation of eight well-established cognitive biases across a diverse set of LLMs, analyzing over 2.8 million responses generated through controlled prompt variations.
Our evaluation framework measures model susceptibility to Anchoring, Availability, Confirmation, Framing, Interpretation, Overattribution, Prospect Theory, and Representativeness biases, and examines how model size and prompt specificity shape bias expression. Each model's resistance score is computed from its performance across a curated dataset of psychologist-authored decision scenarios; higher scores indicate stronger resistance to producing biased output.
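As an illustration, a resistance score of this kind can be read as the proportion of scenario responses judged unbiased. The sketch below is a minimal, hypothetical computation assuming each response is labeled as biased or unbiased per model and bias type; the field names and the simple proportion-based aggregation are assumptions for illustration, not the platform's exact scoring procedure.

```python
from collections import defaultdict

def resistance_scores(responses):
    """Aggregate hypothetical per-model, per-bias resistance scores.

    `responses` is an iterable of dicts such as
    {"model": "model-a", "bias": "Anchoring", "biased": False}.
    The score is the fraction of responses judged unbiased,
    so higher values indicate stronger resistance to biased output.
    """
    totals = defaultdict(int)    # (model, bias) -> total responses
    unbiased = defaultdict(int)  # (model, bias) -> responses judged unbiased

    for r in responses:
        key = (r["model"], r["bias"])
        totals[key] += 1
        if not r["biased"]:
            unbiased[key] += 1

    return {key: unbiased[key] / totals[key] for key in totals}


if __name__ == "__main__":
    # Tiny demo with made-up labels, purely for illustration.
    demo = [
        {"model": "model-a", "bias": "Anchoring", "biased": False},
        {"model": "model-a", "bias": "Anchoring", "biased": True},
        {"model": "model-a", "bias": "Framing", "biased": False},
    ]
    for (model, bias), score in resistance_scores(demo).items():
        print(f"{model} / {bias}: {score:.2f}")
```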
Clicking the triangles (►) expands the table to show how bias susceptibility changes across prompts with varying levels of detail, as structured by the TELeR Taxonomy. These results provide transparent, empirical insight into model reliability and trustworthiness, highlighting how model choice and thoughtful prompt design can mitigate common reasoning pitfalls. Our paper details the experiments and results.
Cognitive Bias Resistance