Wednesday, June 15, 2011


Genetic Matching for Estimating Causal Effects:
A General Multivariate Matching Method for Achieving Balance in
Observational Studies
Alexis Diamond
Jasjeet S. Sekhon

This paper presents Genetic Matching, a method of multivariate matching that uses an evolutionary search algorithm to determine the weight each covariate is given. Both propensity score matching and matching based on Mahalanobis distance are limiting cases of this method. The algorithm makes transparent certain issues that all matching methods must confront. We present simulation studies which show that the algorithm improves covariate balance and that it may reduce conditional bias if the selection on observables assumption holds. We then present a reanalysis of a number of datasets in the LaLonde (1986) controversy.

JEL classification: C13, C14, H31

Keywords: Matching, Propensity Score, Selection on Observables, Genetic Optimization, Causal Inference
I happened to read this paper (for the second time) a couple of days ago. It introduces, for an economist audience (the authors are political scientists), a new algorithm for the construction of estimates of causal effects based on an assumption of "selection on observed variables". Other disciplines sometimes call this assumption unconfoundedness or ignorability (and economists sometimes refer to the "conditional independence assumption"). What all this jargon means is that the researcher thinks that, conditional on covariates available in the data, individuals are assigned to treatment, whether by nature, by institutions, by their own choices, or some combination of these, in a way that is unrelated to their untreated outcomes. Essentially, one makes the case for an assumption of random assignment conditional on observed characteristics, where in good papers that case consists of more than "this is all I could do" or "look at how many different variables I matched on, mom!".

To see what makes the method outlined in this paper (and, in more technical detail, in other papers available on Jas Sekhon's webpage, including some co-authored with my UM political science colleague Walter Mebane) distinctive, it helps to think about what a randomized experiment does. Statistically, random assignment of individuals into treatment and control groups balances the distributions of both observed and unobserved covariates between the treated and control units. When samples are small, this balance may be imperfect in particular realizations, but as the sample gets larger, the balance, statistically speaking, becomes better and better.

What Genmatch does is to choose untreated units to match to the treated units in an observational study based solely on a criterion of post-match balance. This contrasts with the usual approach in economics of using something like nearest neighbor matching or kernel matching on estimated propensity scores (probabilities of treatment). That approach proceeds iteratively: balance is tested at each iteration and the propensity score model is made more flexible by adding additional terms until some desired level of balance is achieved.
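To make the balance-maximization idea concrete, here is a minimal sketch in Python. A weighted nearest-neighbor match is scored by the smallest p-value among covariate-by-covariate balance tests, and the covariate weights are chosen to maximize that score. A crude random search stands in for the paper's genetic algorithm, and the two-covariate toy data, function names, and use of simple t-tests are my assumptions for illustration, not details from the paper:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def match_controls(X_t, X_c, w):
    """Match each treated unit to its nearest control (with replacement),
    using a weighted Euclidean distance on standardized covariates."""
    d = ((X_t[:, None, :] - X_c[None, :, :]) ** 2 * w).sum(axis=2)
    return d.argmin(axis=1)

def balance_fitness(X_t, X_c, idx):
    """Balance score in the GenMatch spirit: the smallest p-value among
    covariate-by-covariate t-tests of treated vs. matched controls.
    Larger is better (less evidence of imbalance)."""
    pvals = [stats.ttest_ind(X_t[:, j], X_c[idx, j]).pvalue
             for j in range(X_t.shape[1])]
    return min(pvals)

# Toy data: treated units differ from controls on both covariates.
X_c = rng.normal(size=(300, 2))
X_t = rng.normal(loc=[0.5, -0.5], size=(60, 2))

# Standardize so the weights are comparable across covariates.
X_all = np.vstack([X_t, X_c])
mu, sd = X_all.mean(axis=0), X_all.std(axis=0)
X_t, X_c = (X_t - mu) / sd, (X_c - mu) / sd

# Random search over covariate weights, standing in for the genetic search.
best_w = np.ones(2)
best_fit = balance_fitness(X_t, X_c, match_controls(X_t, X_c, best_w))
for _ in range(200):
    w = rng.uniform(0.01, 10.0, size=2)
    fit = balance_fitness(X_t, X_c, match_controls(X_t, X_c, w))
    if fit > best_fit:
        best_w, best_fit = w, fit
```

The point of the sketch is the objective, not the optimizer: matched controls are selected purely to make post-match balance as good as possible, with no propensity score model to respecify.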

The paper includes two Monte Carlo analyses as well as an application to the much-abused data from LaLonde's (1986) seminal work on the National Supported Work Demonstration. The NSW application is well done and sensibly interpreted. One of the Monte Carlo analyses, drawn from the literature on matching outside of economics, has the bizarre feature that it includes matching on instruments, that is, on variables that affect participation in treatment but do not otherwise affect outcomes. As Jay Bhattacharya explains at length, you should not do this.

Readers interested in Genmatch will also likely be interested in inverse probability tilting, which has the same spirit of building balance maximization into the estimation but in the context of weighting estimators rather than matching estimators.
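For contrast with matching, here is a sketch of the plain inverse-probability-weighting estimator of the effect of treatment on the treated, the family of estimators that inverse probability tilting refines by estimating the propensity score so that weighted covariate means balance exactly. This is generic IPW, not the tilting estimator itself, and the toy data-generating process is my own:

```python
import numpy as np

def ipw_att(y, d, pscore):
    """Inverse-probability-weighting estimator of the average treatment
    effect on the treated (ATT): treated units get weight 1, controls get
    weight p/(1-p), which reweights them to resemble the treated group."""
    w_control = pscore[d == 0] / (1.0 - pscore[d == 0])
    treated_mean = y[d == 1].mean()
    control_mean = np.average(y[d == 0], weights=w_control)
    return treated_mean - control_mean

# Toy check: selection on an observable x, a constant treatment effect of 2,
# and known propensity scores, so the estimator should recover the effect
# up to sampling noise.
rng = np.random.default_rng(1)
x = rng.normal(size=20_000)
p = 1.0 / (1.0 + np.exp(-x))          # true propensity score
d = rng.binomial(1, p)
y = x + 2.0 * d + rng.normal(size=x.size)
est = ipw_att(y, d, p)
```

The tilting idea replaces the known (or plug-in estimated) `pscore` with one fitted so that the implied weights balance the covariate means exactly, building the balance objective into estimation much as Genmatch builds it into matching.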
