Saturday, February 13, 2010

When random assignment yields implausible estimates

Consider the following educational treatment, as described on the Department of Education's What Works Clearinghouse:

Seventh graders were placed at random into intervention and comparison groups near the start of the school year.

Both groups were given structured writing assignments three to five times during their seventh- and eighth-grade years.

The intervention group wrote about their personal values (e.g., relationships with friends and family, religious values) and why these were important to them.

The comparison group wrote about neutral subjects, such as their daily routine, or why values they considered unimportant might be important to others.

The result? The WWC summarizes the findings:

Among African-American students, completing writing exercises about their values increased their average seventh- and eighth-grade GPA by a quarter of a letter grade (0.24 points), a change that was statistically significant. The intervention did not have a statistically significant effect on the academic outcomes of European-American students.

Among low-achieving African-American students, the effect was somewhat larger, an increase in average seventh- and eighth-grade GPA of 0.41 points. In addition, the intervention reduced the likelihood that low-achieving African-American students were assigned to a remedial program or were retained in grade.

Is this plausible? Both 0.24 and 0.41 GPA points are large impacts, particularly given that the support of the GPA distribution at many schools is largely confined to 2.0 to 4.0 or even 2.5 to 4.0.
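To put those magnitudes in perspective, here is a back-of-the-envelope sketch that expresses each reported effect as a share of the effective GPA range. The effect sizes come from the WWC summary and the ranges are the ones mentioned above; the function name `share_of_range` is just my own label for the arithmetic, and the calculation is a rough scale comparison, not a standardized effect size.

```python
# Effect sizes reported in the WWC summary (GPA points).
effects = {
    "all African-American students": 0.24,
    "low-achieving African-American students": 0.41,
}

# Effective GPA ranges mentioned in the text (lower bound, upper bound).
ranges = [(2.0, 4.0), (2.5, 4.0)]


def share_of_range(effect, low, high):
    """Return the effect as a fraction of the width of the GPA range."""
    return effect / (high - low)


for label, effect in effects.items():
    for low, high in ranges:
        share = share_of_range(effect, low, high)
        print(f"{label}: {effect:.2f} points is {share:.0%} of the {low}-{high} range")
```

Depending on which effect and which range one picks, the impact works out to somewhere between roughly 12 and 27 percent of the distance from the bottom to the top of the distribution.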

What could go wrong? Like most educational interventions, this one is not double-blind. It may not be single-blind either, if the teachers know which students ended up in the treatment and control groups. On the other hand, given that all students wrote essays, it may not have been clear to the students what the "treatment" was.

One might also worry that one of the alternative essay topics, namely "why values they considered unimportant might be important to others," is not really "neutral" and instead might have its own treatment effect. In this regard, it would be nice to have a no-essay control group in addition to the "other essay topic" arm. A no-essay control group has its own issues, of course, because it makes the nature of the treatment clearer to the students. Still, it would shed at least some light on whether the alternative essay topics reduce GPAs, the values essay increases them, or some combination of the two.
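The identification problem here can be sketched in a few lines. With only two essay arms, the data reveal the difference between them, not each arm's effect relative to writing no essay at all. In the sketch below, the 0.24 contrast is the reported estimate, but the three decompositions are hypothetical numbers I made up for illustration; nothing in the study tells us which, if any, is right.

```python
def two_arm_contrast(values_essay_effect, other_essay_effect):
    """Difference in mean GPA between the two essay arms.

    Both arguments are effects relative to a hypothetical no-essay group,
    which the actual study did not include.
    """
    return values_essay_effect - other_essay_effect


# Hypothetical scenarios (invented for illustration, not from the study).
scenarios = {
    "values essay helps, other essay is neutral": (0.24, 0.00),
    "values essay is neutral, other essay hurts": (0.00, -0.24),
    "a bit of both": (0.12, -0.12),
}

for name, (values_effect, other_effect) in scenarios.items():
    contrast = two_arm_contrast(values_effect, other_effect)
    print(f"{name}: two-arm contrast = {contrast:+.2f}")
```

All three scenarios produce the same two-arm contrast, so the existing design cannot tell them apart. A no-essay arm would pin down each effect separately, subject to the caveat about making the treatment more visible to students.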

Also, one would like a compelling theory of why such a modest treatment should have such a gigantic effect.

Interesting, and puzzling, and a good illustration of why one study is rarely definitive and, not unrelated, why good science involves both exact replications and small perturbations of provocative results.

I should have gotten one of my ECON 490 students to do their literature survey on such treatments.

