Wednesday, March 11, 2009

Non-parametrics in economics

Marginal Revolution wanders far from its usual selection of topics with a post that wonders why non-parametric statistics are not used more in economics.

The comments are remarkably good and cover most of the ground. One even mentions me by name! I will thus offer just a few quick remarks:

1. It is important to distinguish between what is popular in the theoretical econometrics world and what is popular in the applied world. It is also important to distinguish between what is going on in applied work in the top departments and what is going on elsewhere. Methodological innovations tend to trickle down from the theoretical econometricians to the applied people at top departments and then to the applied people at other departments (though some ideas never break out of the theoretical econometrics world). There is also variation across subfields in the speed with which methodological innovation plays out; labor economics tends to move first among the applied micro fields, I think largely because the general availability of larger and better data sets makes the new methods more useful there (and serves to attract more technical types to the subfield). The course that Guido Imbens and Jeff Wooldridge have put on at the NBER, the ASSA meetings and elsewhere (as well as related but less ambitious courses in Europe, including mine with Michael Lechner) aims to speed the second part of this process. Non- and semi-parametric methods are quite popular at present in both the theoretical econometrics world and among applied people at top departments.

2. The "treatment effects" literature is broader than one might gather from the discussion at Marginal Revolution. In a loose sense, it encompasses most empirical work in applied micro. This literature now features a lot of work using matching / weighting methods, which are either non-parametric or semi-parametric depending on the particular implementation, as well as smaller but growing literatures that apply non- and semi-parametric methods in contexts with selection on unobserved variables, such as regression discontinuity designs and the local IV methods developed by Heckman and his students. The difference-in-differences version of matching laid out in Heckman, Ichimura, Smith and Todd (1998) Econometrica can also be considered a semi-parametric method, and has seen a fair amount of use in the literature.
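As a rough illustration of the difference-in-differences matching idea, here is a minimal Python sketch. It assumes a single scalar matching covariate and a Gaussian kernel; both are my simplifications, not the exact setup in Heckman, Ichimura, Smith and Todd, who also consider local linear weights and richer conditioning sets. For each treated unit, the counterfactual before-after change is a kernel-weighted average of the changes among comparison units with similar covariate values, so time-invariant level differences between the groups difference out. The function name `kernel_did_matching_att`, the bandwidth, and the simulated data are all hypothetical.

```python
import numpy as np

def kernel_did_matching_att(x, d, y_before, y_after, bandwidth):
    """Kernel difference-in-differences matching estimate of the ATT.

    x         : scalar covariate used for matching (1-D array)
    d         : treatment indicator (1 = treated, 0 = comparison)
    y_before  : outcome in the pre-treatment period
    y_after   : outcome in the post-treatment period
    bandwidth : Gaussian kernel bandwidth (a tuning choice, assumed here)
    """
    x = np.asarray(x, float)
    d = np.asarray(d, int)
    dy = np.asarray(y_after, float) - np.asarray(y_before, float)  # before-after change per unit
    xt, dyt = x[d == 1], dy[d == 1]
    xc, dyc = x[d == 0], dy[d == 0]
    # Gaussian kernel weight of each comparison unit (columns) for each treated unit (rows)
    k = np.exp(-0.5 * ((xc[None, :] - xt[:, None]) / bandwidth) ** 2)
    w = k / k.sum(axis=1, keepdims=True)  # each row sums to one
    # ATT: each treated unit's change minus the kernel-weighted mean change of comparisons
    return float(np.mean(dyt - w @ dyc))

# Simulated example (all numbers made up): treated units have higher outcome levels,
# which a post-period-only comparison would confuse with the treatment effect.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
d = (x + rng.normal(size=n) > 0).astype(int)
alpha = 3.0 * x + 4.0 * d                      # time-invariant unit effect
y_before = alpha + rng.normal(size=n)
y_after = alpha + 1.0 + 0.5 * x + 2.0 * d + rng.normal(size=n)  # true effect = 2
print(kernel_did_matching_att(x, d, y_before, y_after, bandwidth=0.3))
```

The simulated effect is 2; the printed estimate will differ from it by sampling noise and some smoothing bias, while the unit effect `alpha` drops out of the before-after changes entirely.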

3. The asymptotics for non- and semi-parametric methods are quite complicated but actually applying them in practice tends not to be, though the point made by one commenter at MR about their relative absence in standard packages like Stata is well taken. The fact that Abadie and Imbens have just released a new NBER working paper providing serious distribution theory for single nearest neighbor matching without replacement, a method that has been routinely used in the applied statistics literature for decades, is testament to the first point. In regard to the second, what could be simpler than means, weighted means, histograms and the like?
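To make the point about practical simplicity concrete, here is a minimal Python sketch of single nearest-neighbor matching without replacement for the average treatment effect on the treated. It assumes one scalar covariate and greedy matching in the order the treated units appear; the greedy ordering is my simplification, and without replacement the result can depend on that order. It computes only the point estimate; the distribution theory in the Abadie and Imbens paper is not reproduced here. The function name is hypothetical.

```python
import numpy as np

def nn_match_att_without_replacement(x, d, y):
    """ATT via single nearest-neighbor matching on a scalar covariate, without replacement.

    Treated units are matched greedily in their original order; each comparison
    unit can be used at most once, so there must be at least as many comparison
    units as treated units.
    """
    x, d, y = (np.asarray(a, float) for a in (x, d, y))
    treated = np.flatnonzero(d == 1)
    available = list(np.flatnonzero(d == 0))
    if len(available) < len(treated):
        raise ValueError("need at least as many comparison units as treated units")
    diffs = []
    for i in treated:
        # closest comparison unit (in covariate distance) not yet used
        pos = min(range(len(available)), key=lambda k: abs(x[available[k]] - x[i]))
        j = available.pop(pos)  # remove it so it cannot be reused
        diffs.append(y[i] - y[j])
    # the estimate is just a mean of matched differences
    return float(np.mean(diffs))

x = [0.1, 0.5, 0.9, 0.0, 0.45, 1.0, 2.0]
d = [1, 1, 1, 0, 0, 0, 0]
y = [3, 4, 6, 1, 2, 3, 9]
print(nn_match_att_without_replacement(x, d, y))  # matched differences 2, 2, 3 -> about 2.333
```

The whole estimator is a loop and a mean; the hard part, as the paragraph above says, is the inference.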

4. My favorite fact about non-parametrics is that a standard parametric linear regression becomes a non-parametric estimator if you just promise to add more terms on the right hand side as the sample size gets larger. Of course, you have to promise to add them at the correct rate, so that the hard part is making the correct promise. The fuzzy boundary between parametric and non-parametric methods is also quite apparent in the recent (very fine) survey / best practices regression discontinuity paper by Guido Imbens and Thomas Lemieux. They suggest running a linear regression in a neighborhood of the discontinuity. Is this a parametric linear regression with a caliper or is it a local linear regression with a uniform kernel? As in the (very) old SNL commercial, it is a floor wax and a dessert topping.
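The "both at once" point can be checked numerically. Below is a minimal Python sketch, assuming a sharp design, a scalar running variable, and simulated data of my own: one function runs plain OLS on each side of the cutoff using only observations within a caliper of width `h`; the other runs kernel-weighted least squares on the full sample with uniform kernel weights. Because the uniform kernel gives equal positive weight inside the window and zero weight outside, the two estimates coincide up to floating-point rounding. The function names are hypothetical.

```python
import numpy as np

def _intercept(z, y, w):
    """Intercept from weighted least squares of y on (1, z); zero-weight rows drop out."""
    sw = np.sqrt(w)
    X = np.column_stack([np.ones_like(z), z]) * sw[:, None]
    coef, *_ = np.linalg.lstsq(X, y * sw, rcond=None)
    return coef[0]

def rd_caliper_ols(r, y, cutoff, h):
    """'Parametric' view: plain OLS on each side, keeping only |r - cutoff| < h."""
    z, y = np.asarray(r, float) - cutoff, np.asarray(y, float)
    est = {}
    for side, mask in (("right", z >= 0), ("left", z < 0)):
        keep = mask & (np.abs(z) < h)
        est[side] = _intercept(z[keep], y[keep], np.ones(keep.sum()))
    return est["right"] - est["left"]

def rd_local_linear(r, y, cutoff, h, kernel):
    """'Non-parametric' view: kernel-weighted local linear regression on the full sample."""
    z, y = np.asarray(r, float) - cutoff, np.asarray(y, float)
    w = kernel(z / h)
    right, left = z >= 0, z < 0
    return _intercept(z[right], y[right], w[right]) - _intercept(z[left], y[left], w[left])

def uniform_kernel(u):
    # uniform kernel on (-1, 1): K(u) = 1/2 inside the window, 0 outside
    return 0.5 * (np.abs(u) < 1)

# Simulated data (made up): true jump of 2 at a cutoff of 0
rng = np.random.default_rng(0)
r = rng.uniform(-1, 1, size=2000)
y = 1.0 + 0.8 * r + 0.6 * r**2 + 2.0 * (r >= 0) + rng.normal(scale=0.5, size=r.size)

a = rd_caliper_ols(r, y, cutoff=0.0, h=0.25)
b = rd_local_linear(r, y, cutoff=0.0, h=0.25, kernel=uniform_kernel)
print(a, b)  # the two numbers agree up to floating-point rounding
```

Swapping in a triangular kernel, `lambda u: np.maximum(1 - np.abs(u), 0)`, gives a genuinely different weighting and generally a different estimate; the uniform kernel is the case where the two descriptions name the same computation.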