There are several different things going on in this post. First, there is an implicit empirical claim about the variability in inferences drawn from alternative statistical tests applied to the same data. I think this claim is incorrect. Usually, different statistical tests of the same null applied to a given data set produce the same inference. If they do not, that is a signal that the evidence provided by the data is not very strong, which is something that should be reported.
Second, there is an implicit call for what one might label informed Bayesianism. Jeff wants a machine that allows the reader to apply his or her own prior to the data and then obtain the implied posterior distribution of the relevant parameters. In the absence of such a machine (and, in fact, there are Bayesian software packages that could function as such a machine), one might call for, and Jeff might be happy with, applied Bayesian analyses that present results conditional on a set of thoughtfully chosen informative priors rather than, as is common at present, on a single analytically convenient non-informative prior.
Indeed, one can imagine having all sorts of rhetorical fun with meta-priors over priors, or even obtaining evidence on the distribution of priors in the population of researchers and using that to guide the choice of what is presented. I would argue that, in an informal way I like to call casual Bayesianism, this is essentially what we already do as readers of scholarly articles. We take the (almost always) classical statistical (or frequentist, if you prefer) evidence and informally use it to update our informal prior to produce an informal posterior. I would further argue that formalizing this process generally does not pass a cost-benefit test.
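To make the reader-side machine concrete, here is a minimal sketch of normal-normal conjugate updating in Python, in which a paper's reported estimate and standard error stand in for the likelihood and each reader supplies his or her own prior. The estimate, standard error, and priors below are purely hypothetical illustrations, not taken from any actual study.

```python
import math

def posterior(prior_mean, prior_sd, estimate, std_error):
    """Normal-normal conjugate update for a scalar parameter,
    treating the reported estimate and standard error as a
    Gaussian likelihood."""
    prior_prec = 1.0 / prior_sd**2
    data_prec = 1.0 / std_error**2
    post_prec = prior_prec + data_prec
    post_mean = (prior_prec * prior_mean + data_prec * estimate) / post_prec
    return post_mean, math.sqrt(1.0 / post_prec)

# The same hypothetical reported result (estimate 0.12, s.e. 0.05)
# filtered through three different readers' priors about the effect.
for label, (m, s) in {"skeptic": (0.0, 0.02),
                      "agnostic": (0.0, 0.50),
                      "believer": (0.10, 0.05)}.items():
    mean, sd = posterior(m, s, estimate=0.12, std_error=0.05)
    print(f"{label:>8}: posterior mean {mean:.3f}, sd {sd:.3f}")
```

A skeptical prior pulls the same reported number much closer to zero than an agnostic one, which is exactly the informal updating described above, just made explicit.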
Contra Jeff, I think the main danger in most empirical work is not deliberate manipulation of what researchers present (other than, perhaps, choosing regression specifications at the margin to push a t-statistic from 1.85 to 2.05) but rather coding errors in constructing the analysis data or in doing the analysis itself, combined with, on occasion, not getting the standard errors right. It is those problems that keep me awake at night. Well, not really, but I do worry about them.
1 comment:
Thanks, otherbloggerjeff, for the comments. I agree with you that scholars effectively roll their own Bayesian machines when they read articles, but:
1. Given that, and given that not just scholars will read my paper, and given that the most influential readers are probably not scholars, this only emboldens me to stylize the results. I know that scholars will "undo" my stylizing but that the New York Times Magazine will not.
2. Regardless of the audience, the goal of interpreting results is to create a focal point for the entire community of readers to coordinate on. This means that I prefer yes/no questions to questions whose answers lie on a continuum.
For a simple thought experiment, ask why we even bother with stars rather than presenting only p-values. Someone should do a regression discontinuity study on the impact of papers whose results fall epsilon above or epsilon below the star threshold.
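As a rough sketch of how that study might look, assuming a hypothetical dataset with each paper's headline t-statistic and later citation count (the column names t_stat and citations are made up for illustration), a local linear regression on either side of |t| = 1.96 would estimate the jump at the star threshold:

```python
import statsmodels.formula.api as smf

def star_rd(df, cutoff=1.96, bandwidth=0.5):
    """Local linear RD of citations on crossing the star threshold.

    df is a hypothetical DataFrame with columns 't_stat' and
    'citations'; both names are assumptions for this sketch.
    """
    d = df.copy()
    d["t_abs"] = d["t_stat"].abs()
    d = d[(d["t_abs"] - cutoff).abs() < bandwidth].copy()
    d["above"] = (d["t_abs"] >= cutoff).astype(int)
    d["dist"] = d["t_abs"] - cutoff
    # Separate slopes on each side of the cutoff; the coefficient on
    # 'above' is the estimated jump in citations for results that land
    # just over the threshold rather than just under it.
    fit = smf.ols("citations ~ above + dist + above:dist", data=d).fit()
    return fit.params["above"], fit.bse["above"]
```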