
What’s multiple regression got to do with it? A comment on Michael Shalev

Lyle Scruggs
lyle.scruggs@uconn.edu
Dept. of Political Science
University of Connecticut
Storrs, CT 06269

"Unfortunately, many people like to do their statistical work as they say their prayers—merely substitute in a formula found in a highly respected book." (Hotelling et al., 1948 [cited in Kennedy 2002])

I want to begin by thanking Michael Shalev and the editors for providing a forum for discussing the role of quantitative techniques in comparative social science. My particular interest in this debate comes from two angles. First, though I am not trained as a methodologist, I regularly teach statistics to graduate students. This gives me a certain affinity with the frustrations expressed in his paper concerning the use and abuse of regression analysis. It is hard to explain to graduate students that, while statistical software is a useful hammer, everything is not, in fact, a nail. It is even more frustrating that they can, with some justification, object to my proscriptions: "but don’t a lot of published papers in our field do that?" Second, I have written several papers dealing directly with some of the examples discussed in his paper, most of them employing multiple regression (MR) techniques.

Let me start by laying out what I basically agree with in the paper. First, regarding multivariate regression:
On several points of “causation” I am also in agreement with the paper:
With these things said, I am concerned that this article throws the baby out with the proverbial bathwater, leaving us with nothing in the tub. Regarding causation, my points of agreement are completely consistent with taking a diametrically opposite position to Shalev’s with respect to the appropriate methodology for confirming causation. First, almost all of the substantive problems with applying MR that are discussed in the paper are also addressed in basic econometrics texts.^{1} Moreover, I think the paper has it flat wrong about basic aspects of what regression analysis can do. Most of the mistakes that the paper correctly identifies about how MR is too often conducted are seldom solved, and sometimes made worse, by appealing to "case analysis" or "other qualitative techniques" (see Seawright 2004). Second, leaving aside critiques of pooling for the moment, a number of the articles singled out for criticism in Shalev’s paper a) have been criticized on the empirics, and b) make as strong a case for textbook MR as for any alternative approach, particularly an alternative whose details are not clearly specified or are in any case part of the basic MR toolbox. Finally, one has to ask: what is the alternative to MR for evaluating theories? As I hope to make clear below, the paper’s most extended discussion of "alternatives to multiple regression" doesn’t provide anything approaching a basis for establishing a causal relationship between theoretical variables.

MR and causal explanation

The first thing most people learn in statistics is that correlation is not causation, and that inferring causation from statistical results requires a theoretical model (a good reason to think there is a causal effect), not just a statistical one. Usually, this implies a theory with some "mechanisms" that may also be subject to investigation.
Except in some quite limited senses of the term, almost no one thinks that MR results alone justify a causal claim (Goldthorpe 2001). Nonetheless, I can attest from my teaching experience and from reviewing manuscripts for scholarly journals that it is common for users of statistics to forget all of this. Why this is so is an interesting question. I have some guesses—e.g., researchers operate in a community that may not know enough about statistics to speak out about inappropriate use; they succumb to the temptation to ignore poor statistical methods when those methods produce results that seem to support their pet "causes." But these are only guesses.

What I found unclear in the paper is a definition of a cause, and the criteria for stating and establishing one. How attaching names to cases, for example, does anything to resolve the issue of establishing causation is a mystery to me. In later sections of the paper, it seems that this is a means by which one can introduce explanations in an ad (post?) hoc manner, with no recognition that this can easily result in a unique configuration of causes for each case. I don’t think that is Shalev’s intent. I know of no theories that are stated in terms of particular observations or cases. Shalev rightly criticizes theoretical approaches that start with a dependent variable, add "independent variables" until most variation in the sample is explained, and then claim to have a model of causes. But one would be hard-pressed to find a modern econometrics text that does not reject such an approach. At various places, Shalev raises the prospect that causal relationships can vary across units and across time in an effort to critique MR approaches. He seems to ignore the fact that unit homogeneity is necessary for any verifiable causal explanation in science.
One can always claim that an explanation might not hold in all places, just as one cannot refute the claim that a cause only "seems" to apply in times and places other than the observed case.

What can regression do?

Consider the following two quotes, drawn from an early section of the paper. I select them because I think they are widely repeated claims against MR, in contrast to a case-study method.

"[Case oriented research] assumes from the outset that the effect of any one cause depends on the broader constellation of forces in which it is embedded" (5)

"MR is even more challenged by another causal assumption that flourishes in case-oriented analysis, namely that there may be more than one constellation of causes capable of producing the phenomenon of interest." (5)

These objections are metaphysical ones, in the sense that they really undermine any attempt at explanation or verification in the sciences. The first statement amounts to saying that case-oriented research assumes that any cause cannot be separated from a broader constellation of causes, and implicitly asserts that variable-oriented research assumes that it can be. I find the first assumption inscrutable as a basis for comparative social science. If causal forces cannot be isolated from one another and identified across units of comparison, how does one move beyond explaining all differences among cases as due to irreducible differences in the cases themselves? This causal perspective would seem to imply, for example, that differences in welfare spending are ultimately explained by different "national characters" (understood broadly to include culture, history, and institutions), not by leftist governments, strong unions, the level of economic development, or some combination of just those three factors.
If each cause is considered to be embedded in other "forces," we (even the historians among us) should be required to specify what we think those forces are and how they affect the "causes" we are interested in explaining. And these explanations should be subject to some criteria of rejection, which implies a domain beyond a single event. The second statement amounts to a claim that, from the infinite set of factors that comprise a "constellation of forces" needed to explain an event, more than one such set of conditions may cause the event. This makes any causal explanation largely irrefutable. Why does the US have no socialist party? If my explanation is "because it was a former British colony," identifying some British colonies with socialist parties is not sufficient to refute the causal claim definitively, because those other former colonies are not the United States. Indeed, if we did find a condition (X) that was, empirically, unique to those countries without strong socialist parties, one could still not refute the causal claim that, for the United States, condition X was only operative because the US was, among other things, a former British colony. (The counterfactual would be that condition X would not have precluded the development of a socialist party in the United States if the US had been, say, a French colony.) If nature behaved this way, MR would certainly be humbled, but no less thoroughly than any alternative approach to evaluating causal regularities.

Can regression deal with conjunctural causation and causal heterogeneity?

The previous section suggested that any approach to explanation must specify what is supposed to matter and how it matters. Here I want to object to a narrower claim: that MR cannot really accommodate conjunctural causation and causal heterogeneity. MR does require that whatever causal possibilities we posit to exist in theory be specified and operationalized in an empirical model beforehand.
But doing that is perfectly compatible with the reality of conjunctural causality and causal heterogeneity. Conjunctural causation can essentially be accounted for by some type of "interaction term" in a regression model. This tests whether the effect of two things together is greater (or less) than the sum of the parts. In a simple case, one can take the interaction to be the product of two variables. If, for example, having A or B alone is bad for you, but having A and B together is good for you, this can be incorporated into an MR model. Interaction terms, particularly dummy-variable interaction terms, which allow the effect of a variable to differ across contexts (e.g., government spending produces inflation in non-corporatist systems, but not in corporatist ones), are standard fare in regression texts. (A related, but more complicated, causal structure amenable to MR is the hierarchical model, which Shalev praises later in his paper.) The fact that a particular regression model fails to include (or consider) interaction possibilities is a theoretical or model-specification problem, not a technical one. While it is convenient to blame this lack of creativity on making students take statistics courses—Shalev cites Abbott’s claim that using linear models causes us to think that causal effects are linear (5)—the widespread confusion about conjunctural causation is really an argument for why students desperately need more good statistical training, not less. Shalev suggests that the problem with an interaction specification in MR is that it takes up degrees of freedom. This is a pretty widespread claim about the advantages of the case-oriented approach. But how a case approach, which, if anything, reduces the number of cases analyzed, can more adequately discern the validity of an explanation with one more "moving part" is hard to understand.
It is more often in "substantive" (i.e., case) approaches that one finds much vaguer specifications of the relationships between variables. I can draw only one straight line between two points; I can draw a lot of non-straight curves. So what does it mean to move from a claim that a relationship is "linear" to a claim that it is "nonlinear"?^{2}

To illustrate how MR deals with conjunctural causation, consider the following example of ten observations (Table 1).

TABLE 1 ABOUT HERE

A standard regression model, Y = b0 + b1A + b2B, yields

Y = .9 + .5A + .5B (standard errors for A and B: .57 each)

and the overall model explains essentially none of the variance in Y (R-squared is around 0). Given a theoretical reason (or just a hunch) of conjunctural causation between A and B, we posit a different regression model, Y = b0 + b1A + b2B + b3C, where C is a new variable (A*B). Estimating that model with the same sample of data yields

Y = 0 - 1A - 1B + 3C,

which predicts the data perfectly. Individually, A and B have a negative effect, but jointly, their total effect is positive. Note that if our hunch arose simply from eyeballing the data (which you can do in this case) and not from some a priori theoretical reason, then one has simply summarized the data. The question of whether the model is good cannot be answered by mechanically fitting the data.

Causal heterogeneity

In contrast to the common assertion that MR cannot handle causal heterogeneity, the possibility that different combinations of variable values can produce the same outcome is precisely what multiple regression allows for. Indeed, when I was a graduate student, I learned that one reason for using MR, as opposed to simpler bivariate regression analysis, was that variation in most of the variables that social scientists are interested in is unlikely to have a single cause.
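Since Table 1 is not reproduced here, a minimal sketch with hypothetical binary data can illustrate the same mechanics: an additive specification misses the conjunctural structure (and even gets the signs of the individual effects wrong), while adding the A*B product term recovers the process exactly. The data and coefficients below are illustrative assumptions, not the paper’s actual numbers.

```python
import numpy as np

# Hypothetical data mimicking the pattern described in the text:
# A or B alone lowers Y, but A and B together raise it.
A = np.array([0., 0., 1., 1., 0., 0., 1., 1.])
B = np.array([0., 1., 0., 1., 0., 1., 0., 1.])
Y = 1.0 - 1.0 * A - 1.0 * B + 3.0 * (A * B)   # true conjunctural process

ones = np.ones_like(A)

# Additive model: Y = b0 + b1*A + b2*B
X_add = np.column_stack([ones, A, B])
b_add, _, _, _ = np.linalg.lstsq(X_add, Y, rcond=None)

# Interaction model: Y = b0 + b1*A + b2*B + b3*(A*B)
X_int = np.column_stack([ones, A, B, A * B])
b_int, _, _, _ = np.linalg.lstsq(X_int, Y, rcond=None)

print("additive fit:   ", np.round(b_add, 2))   # positive A, B coefficients
print("interaction fit:", np.round(b_int, 2))   # recovers [1, -1, -1, 3]
```

Note that the additive fit assigns A and B small positive coefficients, even though each variable alone is harmful in the true process; only the interaction specification reveals the conjunctural structure.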
Though some of my students have trouble seeing it at first, a regression estimate produces a predicted value for each case, and it generates predicted values for all possible combinations of the variables in the model, even if some combinations are not represented by specific cases.^{3}

TABLE 2 ABOUT HERE

To see this, Table 2 presents another simple set of seven observations. A and B are both associated with Y. Regressing Y on A and B in the form Y = b0 + b1A + b2B produces a result (.2 + .6A + .6B) that seems odd at first, because it implies that A and B do not perfectly predict Y. (If A=1 and B=0, predicted Y = .8.) However, knowing that Y only takes a zero or one value, the ordinary OLS regression model is flawed. You need a logit estimator, which is a relatively minor variation on the OLS technique, and is at least introduced in most basic econometrics texts. Estimating these data with a logit model produces a result that perfectly classifies all of the cases!^{4} As for the claim that MR does not distinguish between additive, conditional, or multiple pathways as the causal forces, the relevant distinctions are easily obtained from the predicted values of the actual cases. One thing that is sometimes overlooked is that MR approaches (OLS or variants like logit, ordered logit, etc.) can estimate parameters whether variables are measured dichotomously (0 or 1) or continuously. MR approaches are also generally robust to reductions in the number of categories of measurement. Major alternative approaches, like Qualitative Comparative Analysis, are only intuitive when the data are dichotomous for all of the variables. Too much may be made of estimation on a "continuum" when the measured concepts are not really so refined. That may be a temptation that MR permits, but it is not a cause of poor measurement.
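A sketch of the same point, again on hypothetical data (Table 2 is not reproduced here): suppose either A alone or B alone is enough to produce Y, a simple case of causal heterogeneity. A linear probability model (plain OLS) yields fractional fitted values, while a logit, here fit by hand with gradient ascent rather than a canned routine, classifies every case correctly. One caveat, in keeping with footnote 4: with perfectly separable data the logit coefficients grow without bound, even though the implied classification stabilizes quickly.

```python
import numpy as np

# Hypothetical binary data: either A or B alone suffices to produce Y.
A = np.array([0., 0., 1., 1., 0., 1., 1.])
B = np.array([0., 1., 0., 1., 1., 0., 1.])
Y = np.where((A + B) > 0, 1., 0.)

X = np.column_stack([np.ones_like(A), A, B])

# Linear probability model (plain OLS): fitted values need not be 0 or 1.
b_ols, _, _, _ = np.linalg.lstsq(X, Y, rcond=None)
ols_pred = X @ b_ols

# Logit, fit by simple gradient ascent on the log-likelihood.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(3)
for _ in range(5000):
    w += 0.5 * X.T @ (Y - sigmoid(X @ w))   # gradient of the log-likelihood

logit_class = (sigmoid(X @ w) > 0.5).astype(float)
print("OLS fitted values:", np.round(ols_pred, 2))
print("logit classifies every case correctly:", np.array_equal(logit_class, Y))
```

The predicted values for each observed case, and for unobserved combinations of A and B, are exactly the quantities from which additive, conditional, and multiple-pathway stories can be distinguished.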
Spanning large parameter spaces

Shalev is certainly correct when he critiques how many MR studies "span" many empty cells and convey an impression of linear effects that is not really justified. For the relationship displayed in Figure 1, OLS reports a "statistically significant" regression line, and would predict Y=12 given X=13. That prediction is based on the assumption that the relationship is linear, and the data obviously fail to support that assumption. ("More supportive data" for a linear effect would be that the observations (X,Y) = (9,4) and (15,21) were actually, say, (9,8) and (15,15).)

FIGURE 1 ABOUT HERE

But is this a problem that is particularly likely to plague MR as a technique? One step in developing and evaluating regression models is to examine assumptions about functional form, error distributions, and other so-called residual diagnostics before accepting MR results as genuine. Most basic econometrics texts have sections on residual diagnostics and functional form, and walk through all the basic assumptions of linear models. Econometricians like Leamer and Kennedy, and many econometrics texts, provide fundamentals on how to test the robustness of regression estimates, often in ways that reveal the problems Shalev’s paper identifies. It is thus hard to characterize most of these "spanning" problems as unique to MR, let alone as a justification for not using MR. The example in Figure 1 is a convenient illustration, because the data do follow a binary pattern. There is a very obvious break. When the data actually vary more continuously over the range of values, imposing a binary classification on the data can make results very sensitive to where one assigns the cutpoint.
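The "spanning" problem is easy to detect mechanically. Since Figure 1 is not reproduced here, the sketch below uses hypothetical two-cluster data in its spirit: X values bunch at the low and high ends with an empty middle, OLS fits a single line anyway, and a crude diagnostic flags that the prediction point falls in unobserved territory. The data values are illustrative assumptions.

```python
import numpy as np

# Hypothetical clustered data in the spirit of Figure 1: low and high
# clusters of X with nothing in between.
X = np.array([8., 9., 9., 10., 15., 15., 16., 17.])
Y = np.array([4., 4., 5., 5., 20., 21., 21., 22.])

design = np.column_stack([np.ones_like(X), X])
b, _, _, _ = np.linalg.lstsq(design, Y, rcond=None)

x_new = 13.0
print("OLS prediction at X=13:", round(b[0] + b[1] * x_new, 1))

# A crude "spanning" check: locate the widest empty interval in X and
# ask whether the prediction point lies inside it.
xs = np.sort(X)
gaps = np.diff(xs)
i = np.argmax(gaps)
lo, hi = xs[i], xs[i + 1]
print("widest empty interval in X:", (lo, hi))
in_gap = lo < x_new < hi
print("X=13 lies in unobserved territory:", in_gap)
```

This is exactly the kind of pre-estimation inspection (alongside scatterplots and residual diagnostics) that basic econometrics texts recommend before taking a fitted line at face value.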
Binary classifications are only simple if the cases are really discrete, without many cases "somewhere in between." With respect to the capability of regression analysis to identify "problems" in the data, consider the contribution of Lange and Garrett, which Shalev mentions in several places in his paper. Lange and Garrett’s initial findings (based on a simple model with an interaction term) were immediately contested by Robert Jackman (1987) on the basis of an assessment of the predicted values and errors. Hicks (1988) and Scruggs (2001) also present refined analyses.^{5}

What is the solution to avoid spanning large parameter spaces? Why are those middle cells empty? Shalev’s suggestion seems to be that cells are empty because the regressors are not, in fact, independent of one another. Such clustering can show up in MR analyses as correlated independent variables (collinearity). This problem makes it hard for MR to isolate with confidence the effect of any one independent variable, while still allowing inferences to be made about the "joint effects" of several variables. In other words, if three factors coexist with the dependent variable in most of our sample of cases, MR analysis will show that the set of factors is associated with the dependent variable, and that the primacy of these factors cannot be disentangled empirically. More problematic are situations like that in Figure 1. Simple regression can produce misleading results, but proper econometric analysis alerts us to these problems in two ways. First, scatterplots like Figure 1 will raise a red flag. Second, the residuals from a regression analysis of these data are not normally distributed. Both checks would tell us to be wary of MR estimates of a simple linear relationship. Finally, Shalev does not make a real "positive" argument for a case-centered approach being any better at diagnosing or dealing with this kind of problem. Given clustering like that in Figure 1, what is the causal explanation?
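The collinearity point above, that MR cannot apportion credit among entangled regressors but can still estimate their joint effect, can be demonstrated on simulated data. In this sketch (the data-generating process is an illustrative assumption) A and B move almost in lockstep, so each coefficient has a huge standard error, yet the summed effect of the two is estimated precisely.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two regressors that move almost in lockstep (near-collinearity).
n = 30
A = rng.normal(size=n)
B = A + rng.normal(scale=0.01, size=n)
Y = 1.0 + 1.0 * A + 1.0 * B + rng.normal(scale=0.1, size=n)

X = np.column_stack([np.ones(n), A, B])
b, _, _, _ = np.linalg.lstsq(X, Y, rcond=None)

# Conventional OLS covariance matrix: s^2 * (X'X)^-1
resid = Y - X @ b
s2 = resid @ resid / (n - 3)
cov = s2 * np.linalg.inv(X.T @ X)
se = np.sqrt(np.diag(cov))

print("coefficients:     ", np.round(b, 2))
print("standard errors:  ", np.round(se, 2))     # large for A and B separately
print("summed A+B effect:", round(b[1] + b[2], 2))

# Standard error of the sum b1 + b2: small, because the joint effect
# is well identified even when the individual effects are not.
se_sum = np.sqrt(cov[1, 1] + cov[2, 2] + 2 * cov[1, 2])
print("s.e. of the sum:  ", round(se_sum, 3))
```

The individual standard errors dwarf the standard error of the sum, which is the MR analogue of the claim in the text: the set of clustered factors is clearly associated with the outcome, but their primacy cannot be disentangled empirically.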
Informed MR diagnoses the problem. But the approach advocated in the paper, like much case-study work, assumes determinism and perfect measurement of concepts. It infers that any residual is thus due to model misspecification. What if the residual is due to a measurement problem? Or to sampling variation? Or to simple indeterminacy? It would seem that the paper’s approach would always result in overfitting the data.

Population

Shalev points out that the data that comparative welfare state researchers use are problematic as a basis for evaluating their empirical models: MR estimates are not useful if you already know the population values. Assuming determinacy, this might seem to imply that there should be no residuals, and that overfitting is not really possible. But social scientists generally want to be able to use their explanations to predict. Some notion of prediction (or a counterfactual condition) is implicit in most definitions of causality.^{6} This means that the "population" of eighteen OECD countries is really a "sample" of outcomes, which we use to create explanatory models that will inform future policy choices or that are consistent with a well-developed theory. While prediction is not necessary for an explanation to be correct, there are many conceivable explanations of a given phenomenon. This makes an explanation’s "predictive" power a good basis for choosing among competing explanations. Shalev’s discussion of this problem is unclear to me. On page 11, he cites Freedman and Leamer to the effect that hypothesis testing requires well-developed theory and data that have not been used to create the model in the first place. He then cites Ragin’s claim that data and theory are in a constant dialogue, and infers that Freedman and Leamer plus Ragin imply that we can, in fact, only count on MR as a way to summarize data, not to test hypotheses. I think this totally misconstrues Ragin, Leamer, and Freedman.
MR can never simply be used to "summarize" relationships. The "product" of a model is valid to the extent that it can explain data that are independently derived. Leamer and Freedman (and, once again, many basic econometrics texts) do not promote "purist" notions of separating theory and data. What they suggest (see also de Marchi 2006, Granger 1999, Kennedy 2002) is that researchers who want to use particular data to assist in constructing an explanatory model should not use the fit of the model to those same data as a test of the model. Instead, researchers should test the model on new data. In practice, this calls for a strategy of a) dividing your dataset into "model building" and "model testing" subsets, or b) looking for other observable implications of the model and testing those "observable implications" of the theory (King, Keohane and Verba 1995).
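Strategy (a) can be sketched in a few lines: fit the model on the "model building" half of the data, then judge it by how much variance it explains in the "model testing" half it never saw. The data-generating process below is an illustrative assumption, not any of the welfare-state datasets discussed in the text.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative data: one predictor with a genuine linear effect plus noise.
n = 40
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)

# Split into "model building" and "model testing" subsets.
train, test = np.arange(0, 20), np.arange(20, 40)

# Fit OLS on the building subset only.
X_train = np.column_stack([np.ones(20), x[train]])
b, _, _, _ = np.linalg.lstsq(X_train, y[train], rcond=None)

# Evaluate on the held-out testing subset.
X_test = np.column_stack([np.ones(20), x[test]])
pred = X_test @ b

# Out-of-sample R^2: the test of the model, per Leamer and Freedman's advice.
ss_res = np.sum((y[test] - pred) ** 2)
ss_tot = np.sum((y[test] - y[test].mean()) ** 2)
r2_out = 1.0 - ss_res / ss_tot
print("out-of-sample R^2:", round(r2_out, 2))
```

An overfit model would look excellent on the building subset and fall apart on the testing subset; a model capturing a real regularity, like this one, holds up out of sample.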