Summary

Context

  • To my knowledge, this is the largest meta-analysis of studies on AI in education to date

Methods

  • Performed a meta-analysis of studies on AI use in various educational settings
    • Meta-analysis = a statistical technique that pools the results of many individual studies into one overall effect estimate (see the sketch after this list)
  • Identified 6,621 potential studies to include
  • Filtered down to include only 51 studies, published between November 2022 and May 2025
  • For a study to be included:
    • must have been a randomized controlled trial (not an observational study)
      • i.e., researchers had to explicitly assign participants to use vs. not use AI (as opposed to giving everyone access and seeing who does vs. does not use it)
    • ChatGPT or a similar tool must have been used
      • not necessarily ChatGPT itself; the paper’s title is misleading on this point
    • the treatment group must have used ChatGPT (or a similar tool) while the control group used older, more conventional technologies (e.g., calculators or videos are okay, but not other AI tools)
  • Outcomes were categorized into one of three areas:
    • learning performance
    • learning perception
    • higher-order thinking
  • Outcomes were also broken out by the following moderator variables:
    • grade level
    • course type
    • learning model
    • duration
    • role of ChatGPT
    • area of ChatGPT application
  • In total, 72 effect sizes were extracted from the 51 studies (some studies reported more than one)
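
To make “pooling individual study results” concrete, here is a minimal random-effects meta-analysis sketch in Python. The effect sizes and variances below are made up for illustration, and DerSimonian–Laird is just one common estimator; the paper’s actual numbers and method are not reproduced here:

```python
import numpy as np

# Hypothetical per-study effect sizes (Hedges' g) and their variances.
# These numbers are illustrative only -- they are NOT from the paper.
g = np.array([0.40, 0.85, 0.30, 0.95, 0.60])
v = np.array([0.05, 0.08, 0.04, 0.10, 0.06])

# Fixed-effect weights and the Q statistic (between-study heterogeneity).
w = 1.0 / v
mean_fe = np.sum(w * g) / np.sum(w)
q = np.sum(w * (g - mean_fe) ** 2)

# DerSimonian-Laird estimate of between-study variance (tau^2).
df = len(g) - 1
c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
tau2 = max(0.0, (q - df) / c)

# Random-effects weights, pooled effect, and a 95% confidence interval.
w_re = 1.0 / (v + tau2)
pooled = np.sum(w_re * g) / np.sum(w_re)
se = np.sqrt(1.0 / np.sum(w_re))
print(f"pooled g = {pooled:.3f} (95% CI ± {1.96 * se:.3f})")
```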

Results

  • AI has a large positive effect on improving learning performance (see the effect-size note after this list), with the largest effects seen in:
    • skills & competency development courses (followed by STEM)
    • problem solving use cases
    • durations of 4–8 weeks
  • AI has a moderately positive effect on enhancing learning perception
    • longer durations improved learning perception the most (the other variables had no effects)
  • AI has a moderately positive effect on fostering higher-order thinking
    • longer durations improved higher-order thinking the most (the other variables had no effects)
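
For scale: “large” and “moderate” here refer to standardized mean differences (Cohen’s d, or its small-sample-corrected cousin Hedges’ g), where the conventional benchmarks are roughly 0.2 = small, 0.5 = moderate, 0.8 = large. A minimal sketch of how a single study’s effect size would be computed; `hedges_g` is a hypothetical helper and the scores are invented, not data from the paper:

```python
import numpy as np

def hedges_g(treat, control):
    """Standardized mean difference with Hedges' small-sample correction.

    Conventional benchmarks: ~0.2 small, ~0.5 moderate, ~0.8 large.
    """
    treat, control = np.asarray(treat, float), np.asarray(control, float)
    n1, n2 = len(treat), len(control)
    # Pooled standard deviation across the two groups.
    sp = np.sqrt(((n1 - 1) * treat.var(ddof=1) +
                  (n2 - 1) * control.var(ddof=1)) / (n1 + n2 - 2))
    d = (treat.mean() - control.mean()) / sp  # Cohen's d
    j = 1 - 3 / (4 * (n1 + n2) - 9)           # small-sample correction
    return j * d

# Hypothetical exam scores for an AI group vs. a control group.
print(hedges_g([82, 91, 75, 88, 95, 79], [70, 78, 65, 82, 74, 71]))
```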

Critiques

  • The researchers use the term “ChatGPT” throughout the paper, but they appear to be referring to any use of generative AI (GPT-3.5, GPT-4, GPT-4o, etc.)
  • ChatGPT and the underlying models have improved significantly since November 2022, yet the analysis never identifies which model each study actually used (model version should have been one of the moderator variables)
  • No mention is made of how the AI was designed or implemented in the individual studies (e.g., a Socratic tutor with quality prompting vs. vanilla GPT-3.5 used directly as a tutor)
    • the individual studies probably include these details, but the meta-analysis did not account for them

Takeaway

This is super cool to see. The phrases “large positive effect” and “moderately positive effect” have exact statistical meanings (conventionally, standardized effect sizes around 0.8 and 0.5; see the sketch above), and we rarely see such strong educational outcomes, especially across a meta-analysis of this size. For comparison, my paper co-authored with researchers from Los Angeles Pacific University across 2,090 samples saw “small-to-medium” effects, and those were still strong results. Frankly, this is about as good evidence as we can get that AI use is helping, not harming, educational outcomes. There are of course some caveats, such as whether AI use over longer time spans still positively influences outcomes. The meta-analysis references one study in passing showing decreased performance when use extended beyond 8 weeks. It is still unclear to me whether long-term AI use is good or bad (spoiler alert: it probably depends).

In my personal exploration to figure out whether our work is actually having a positive effect on humanity, this is a huge green check mark in my book ✅ But I’m still a bit concerned that we’re measuring the wrong things. Did sociology researchers think MySpace and Facebook were helpful for human social flourishing in 2008? Would they still think so today? I don’t know the answers to these questions, but I am cautiously optimistic for longer-term research findings.