The marshmallow test: how a famous study lost its punchline

The 1972 study that said willpower in preschool predicts success in adulthood became a parenting industry. A 2018 replication with ten times the sample size found something embarrassing for the original.

Dr. Sofia Vásquez
Research Director, Institute for Child Development Studies

A child sits at a table. An experimenter places a marshmallow in front of her. You can eat this one now, or you can wait fifteen minutes and have two. The experimenter leaves the room. The video runs.

This is the most famous protocol in developmental psychology, and the source of one of pop psychology's most cited findings: that the kids who could wait fifteen minutes for two marshmallows went on, decades later, to score better on the SAT, have lower BMI, earn more money, and be generally more successful at the project of being adult humans. The conclusion was that self-control was a measurable trait, visible in preschool, that predicted life outcomes. It made the cover of The New Yorker. It powered a parenting industry.

It also turns out to be a much weaker finding than the popular version suggests.

1. The original

Walter Mischel and colleagues ran the basic delay-of-gratification paradigm at the Bing Nursery School on Stanford's campus in the late 1960s and early 1970s, then followed the same children up through the 1980s. The sample was small (in some versions, under a hundred children) and demographically homogeneous: predominantly white, middle-class, and the children of Stanford-affiliated parents. The follow-up correlations between preschool wait-time and adolescent SAT scores, parent-rated competence, and other measures were striking enough that the finding entered the popular imagination by the late 1990s (Mischel, Shoda, & Rodriguez, 1989).

2. The replication

In 2018 Tyler Watts, Greg Duncan, and Haonan Quan published a conceptual replication using a vastly larger and demographically broader sample drawn from the NICHD Study of Early Child Care and Youth Development. Sample size: 918 children. The original finding, in its strong form, did not replicate.

In their analysis, the bivariate correlation between preschool delay of gratification and achievement at age 15 was about half the size reported in the original studies. Once they controlled for socioeconomic status, parental education, home environment, and early cognitive ability, it shrank by roughly another two thirds (Watts, Duncan, & Quan, 2018).

The same paper noted what the original couldn't have known: the kids in Mischel's sample were already cognitively and socioeconomically advantaged. The "self-control trait" the test appeared to be measuring was substantially confounded with growing up in a stable home where adults reliably follow through on promises. For a child from a chaotic environment, eating the marshmallow now can be the rational choice.
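To see how a confound can manufacture a correlation that fades under controls, here is a minimal simulation sketch. It is not the Watts, Duncan, and Quan analysis: the variable names, the 0.6 coefficients, and the single "home" factor are all assumptions made up for illustration, and only the sample size of 918 is borrowed from the article. In this toy model, wait time has no direct effect on achievement at all; both are driven by home stability.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def residuals(ys, xs):
    """Residuals of y after a simple least-squares regression on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]


def simulate(n=918, seed=0):
    """Return (raw correlation, correlation after controlling for home)."""
    rng = random.Random(seed)
    # Hypothetical model: home stability drives BOTH variables.
    # Wait time never appears in the equation for achievement.
    home = [rng.gauss(0, 1) for _ in range(n)]
    wait = [0.6 * h + rng.gauss(0, 1) for h in home]
    achievement = [0.6 * h + rng.gauss(0, 1) for h in home]

    raw = pearson(wait, achievement)
    # Partial correlation: correlate what is left of each variable
    # once the part explained by home has been regressed out.
    partial = pearson(residuals(wait, home), residuals(achievement, home))
    return raw, partial


raw, partial = simulate()
print(f"raw correlation:        {raw:.2f}")
print(f"controlling for home:   {partial:.2f}")
```

The raw correlation comes out clearly positive even though waiting does nothing for achievement in this model, and it drops toward zero once the shared cause is controlled for. Real data is messier: the confounders are measured imperfectly, and some genuine effect may remain.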

3. What the data actually supports

What survives:

Self-control is real. People differ in their ability to delay gratification, and that ability matters for outcomes.

A fifteen-minute test at age four is not how you measure it. The behavior captured by the original paradigm is largely a function of trust in the experimenter, hunger, fatigue, and home stability. Once those are accounted for, the residual trait variance is small.

Childhood self-control develops; it doesn't lock in. Longitudinal studies show substantial change in self-control from age four through age twenty, contingent on intervention, environment, and stress (Duckworth & Kern, 2011).

4. The methodological lesson

The original marshmallow test wasn't fraudulent. It was a finding from a small, special sample, and it became established by being retold rather than by being retested in larger samples. When the larger sample came, the strong claim shrank. The lesson, uncomfortable for the field and for pop psychology readers alike, is that findings from a single sample should be cited with an explicit note on whether, and how robustly, they have replicated.
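Why sample size matters so much here can be sketched in a few lines. The simulation below assumes a true correlation of 0.2 (a made-up value, not an estimate from either study) and compares how widely the sample correlation scatters with 90 children versus 918. The function names and the number of trials are likewise illustrative.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def sample_r(n, true_r, rng):
    """Draw n pairs with population correlation true_r; return the sample r."""
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        # Mix x with independent noise so that corr(x, y) equals true_r.
        y = true_r * x + math.sqrt(1 - true_r ** 2) * rng.gauss(0, 1)
        xs.append(x)
        ys.append(y)
    return pearson(xs, ys)


def spread(n, true_r=0.2, trials=500, seed=1):
    """Approximate middle 95% of sample correlations across repeated studies."""
    rng = random.Random(seed)
    rs = sorted(sample_r(n, true_r, rng) for _ in range(trials))
    return rs[int(0.025 * trials)], rs[int(0.975 * trials) - 1]


small = spread(90)    # roughly the scale of the original follow-ups
large = spread(918)   # the scale of the 2018 sample
print(f"n=90:  {small[0]:.2f} to {small[1]:.2f}")
print(f"n=918: {large[0]:.2f} to {large[1]:.2f}")
```

With the same underlying effect, the small studies return anything from roughly nothing to something striking, while the large ones cluster tightly around the true value. A single small study that happens to land high is exactly the kind of result that gets retold.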

The pop-psychology version of the test became a fable about character. The actual data tells a duller, more equity-conscious story: kids who grow up in environments where adults keep their promises are more likely to wait. Whether that's a moral about kids or about adults is up to the reader.

References
  1. Duckworth, A. L., & Kern, M. L. (2011). A meta-analysis of the convergent validity of self-control measures. Journal of Research in Personality, 45(3), 259-268.
  2. Mischel, W., Shoda, Y., & Rodriguez, M. L. (1989). Delay of gratification in children. Science, 244(4907), 933-938.
  3. Watts, T. W., Duncan, G. J., & Quan, H. (2018). Revisiting the marshmallow test: A conceptual replication investigating links between early delay of gratification and later outcomes. Psychological Science, 29(7), 1159-1177.