
General Conclusions

The conceptual starting point of the present analysis was the idea of taking seriously the interpretation of parametric significance tests as approximations of randomization tests. Specifically, randomization procedures have been directly related to variance components in the fixed- and random-effects ANOVA. On the one hand, randomization tests reflect the logic and purpose of significance testing in experimental research in two major respects: (1) a statistical generalization to an underlying population is not intended; instead, all conclusions are, in principle, sample-related; and (2) the significance test protects against falsely accepting a causal hypothesis and therefore supports the internal validity of a study. On the other hand, ANOVA procedures are widely used as a flexible mathematical tool for the analysis of experimental data, allowing different sources of variance to be treated as either fixed or random. The statistical assumptions underlying these procedures, which concern statistical generalization to a population, should not, however, be confused with the scientific aim of experimentation in psychology.

To put it differently, parametric statistics and randomization tests reflect two different facets of randomness. The first kind of randomness is realized by the randomization procedure in randomization tests. The randomization test protects either against errors of randomization caused by the random assignment itself (in between-subjects designs) or against chance fluctuations not accounted for by the randomized sequence of treatment levels (in within-subjects designs).
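To make the first kind of randomness concrete, the following is a minimal sketch of an approximate randomization test for a between-subjects design with two groups. It assumes a difference in means as the test statistic and samples re-randomizations at random instead of enumerating all of them; the function name `randomization_test` and its parameters are illustrative, not taken from the text. The p-value refers only to the observed scores: it is the share of alternative random assignments of these same subjects that yield a difference at least as large as the observed one, so no population is invoked.

```python
import random
from statistics import mean


def randomization_test(group_a, group_b, n_perm=10000, seed=0):
    """Approximate two-sided randomization test for a difference in means.

    Only the observed scores are reshuffled, so the resulting p-value is
    sample-related: it refers to alternative random assignments of the
    same subjects, not to samples drawn from a population.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # one hypothetical alternative random assignment
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Count the observed assignment itself so that p is never exactly 0.
    return (extreme + 1) / (n_perm + 1)


# Only 2 of the C(10, 5) = 252 possible splits are as extreme as this one,
# so p should be close to 2/252 (about .008), up to Monte Carlo error.
print(randomization_test([10, 11, 12, 13, 14], [1, 2, 3, 4, 5]))
```

For a within-subjects design, the analogous sketch would instead reshuffle the order of treatment levels within each subject; that variant is not shown here.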

The second kind of randomness is expressed in the concept of random sampling from a population. In the present experimental context, this random sampling is given eo ipso with respect to some underlying hypothetical population; that is, it need not be assumed that the sample is representative of some actual underlying population.

Based on these assumptions, what more general conclusions can be drawn from the present analysis? The first major conclusion concerns the status of inferential statistics in quasi-experiments and correlational analyses. It has been shown with respect to stimuli that the appeal to an underlying population - probably reinforced by the statistical assumptions of the ANOVA procedures - is meant to protect the intended causal inference. This logic requires, however, that the underlying - hypothetical - stimulus populations do not differ with respect to the distributions of PCVs; that is, the validity of the ceteris-paribus condition has to be postulated at the level of - hypothetical - populations. It has been argued that, apart from the fact that PCVs - and their distributions in hypothetical populations - are in principle more controllable with respect to stimuli than with respect to subjects, this reasoning also holds for all other types of studies in which a causal inference is sought but randomization is not possible. Curiously, then, in quasi-experiments and correlational analyses, the reference to an underlying population becomes necessary not because an inference to these populations is sought, but because it is needed to establish internal validity in the sample. As already indicated, the assumption that treatment and PCVs are stochastically independent in the - hypothetical - populations is crucial in this context.
To explicate further: if PCVs and the hypothetical cause(s) are not stochastically independent in the population - i.e., if PCVs are actually confounding variables (ACVs) - the actual probability of falsely rejecting the null hypothesis (at the level of the scientific hypothesis) may be higher (if the confounding variables work in the direction of the causal hypothesis) or lower (if they work against it) than the nominal α-level.
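The effect of an ACV on the actual error rate can be illustrated with a small Monte Carlo sketch. It assumes unit-variance scores, a one-sided two-sample z-test at nominal α = .05, and a treatment with no causal effect at all; the confounding variable is modeled, as a simplifying assumption, as a mean shift `confound_shift` in the treated group. The function name and all parameter values are hypothetical.

```python
import math
import random


def false_rejection_rate(confound_shift, n=30, reps=5000, z_crit=1.645, seed=1):
    """Monte Carlo estimate of P(reject H0) when the treatment has NO effect.

    A hypothetical confounding variable (ACV) is modeled as a shift of
    `confound_shift` in the treated group's mean; scores have unit variance.
    The test is a one-sided two-sample z-test at nominal alpha = .05.
    """
    rng = random.Random(seed)
    se = math.sqrt(2.0 / n)  # SE of the mean difference (unit variance, n per group)
    rejections = 0
    for _ in range(reps):
        control = [rng.gauss(0.0, 1.0) for _ in range(n)]
        treated = [rng.gauss(confound_shift, 1.0) for _ in range(n)]
        z = (sum(treated) / n - sum(control) / n) / se
        if z > z_crit:
            rejections += 1
    return rejections / reps


# 0.0: no confounding; 0.3: ACV favors the causal hypothesis; -0.3: ACV works against it.
for shift in (0.0, 0.3, -0.3):
    print(shift, false_rejection_rate(shift))
```

With these settings the estimated rates should come out close to .05 without confounding, roughly .3 when the ACV works in the direction of the causal hypothesis, and well below .01 when it works against it, mirroring the two cases described above.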

Unfortunately, the assumption of stochastic independence is very difficult to verify empirically, since most likely not all PCVs are known to begin with; knowing them all would require the unattainable ideal of a "complete psychological theory". What one can do is, first, argue for the validity of this assumption on the basis of substantive theoretical reasoning and, second, try to control as many PCVs as possible. If one could in fact match subjects with respect to all PCVs, the validity of a causal inference would be guaranteed. Incidentally, this idea of a "perfect matching" closely mirrors similar conceptions in philosophical theories of probabilistic causality and causal explanation, such as the concept of "objectively homogeneous reference classes" in Salmon's (1984) theory of causal explanation. Although this "perfect matching" can never be achieved, the problem definitely cannot be left to statistical inference.
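The difference between matching on some PCVs and matching on all of them can be sketched with stratification, a simple form of matching. The data below are hypothetical: two binary PCVs, `A` (measured) and `B` (unmeasured), both raise the probability of receiving the treatment and both raise the outcome, while the treatment itself has no effect. The helpers `simulate` and `stratified_effect` are illustrative names, not procedures from the text.

```python
import random
from collections import defaultdict


def simulate(n=20000, seed=2):
    """Hypothetical data: PCVs A and B both raise the chance of treatment
    and both raise the outcome Y; the treatment T itself has NO effect."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        a = rng.random() < 0.5
        b = rng.random() < 0.5
        t = rng.random() < 0.2 + 0.3 * a + 0.3 * b
        y = 1.0 * a + 1.0 * b + rng.gauss(0.0, 1.0)
        rows.append({"A": a, "B": b, "T": t, "Y": y})
    return rows


def stratified_effect(rows, match_on):
    """Size-weighted mean of within-stratum differences (T=1 minus T=0),
    with strata defined by the variables named in `match_on`."""
    strata = defaultdict(lambda: {True: [], False: []})
    for r in rows:
        key = tuple(r[v] for v in match_on)
        strata[key][r["T"]].append(r["Y"])
    total, weighted = 0, 0.0
    for groups in strata.values():
        treated, control = groups[True], groups[False]
        if treated and control:  # skip strata lacking one of the conditions
            size = len(treated) + len(control)
            diff = sum(treated) / len(treated) - sum(control) / len(control)
            weighted += size * diff
            total += size
    return weighted / total


rows = simulate()
for match_on in [(), ("A",), ("A", "B")]:
    print(match_on, round(stratified_effect(rows, match_on), 3))
```

In expectation, the unmatched difference is 0.6, matching on `A` alone still leaves a spurious effect of about 0.33, and only matching on both PCVs brings it to zero. The sketch works only because the data-generating process, and hence the full set of PCVs, is known by construction, which is exactly what real studies lack.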

Similarly, in regression or path analysis the idea of a "perfect matching" is reflected in the methodological requirement to include all causally relevant variables - either alone or in combination - in a path analysis in order to be able to interpret paths causally (the so-called "closedness" condition). This inclusion of other causally relevant variables - which change or mediate the critical causal relation - is also found in philosophical conceptions of probabilistic causation, where it is usually referred to as the "screening off" of other conditions (e.g., Davis, 1988). Note that conclusions based on testing variables sequentially in this respect (i.e., testing whether they mediate or change the influence of the hypothetical cause on the dependent variable) are dubious in principle, since mediating effects could arise in any possible combination of other variables. As a result, any evaluation of a variable in terms of a causal relation is essentially preliminary unless all relevant variables are included in the analysis at the same time.
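The consequence of violating the closedness condition can be illustrated with ordinary least squares on hypothetical simulated data: a variable `z` causes both `x` and `y`, while `x` has no causal effect on `y`. This is a standard omitted-variable illustration chosen for this sketch, not an analysis from the text; `ols_slopes` is a hypothetical helper that solves the centered normal equations for one or two predictors.

```python
import random


def ols_slopes(y, x1, x2=None):
    """OLS slope(s) with intercept: y on x1, or y on x1 and x2,
    obtained from the centered normal equations."""
    n = len(y)
    my, m1 = sum(y) / n, sum(x1) / n
    yc = [v - my for v in y]
    c1 = [v - m1 for v in x1]
    s11 = sum(a * a for a in c1)
    s1y = sum(a * b for a, b in zip(c1, yc))
    if x2 is None:
        return (s1y / s11,)
    m2 = sum(x2) / n
    c2 = [v - m2 for v in x2]
    s22 = sum(a * a for a in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s2y = sum(a * b for a, b in zip(c2, yc))
    det = s11 * s22 - s12 * s12  # determinant of the 2x2 cross-product matrix
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return (b1, b2)


rng = random.Random(3)
n = 10000
z = [rng.gauss(0, 1) for _ in range(n)]              # causally relevant variable
x = [0.8 * zi + rng.gauss(0, 1) for zi in z]         # hypothetical "cause", driven by z
y = [1.0 * zi + rng.gauss(0, 1) for zi in z]         # x has no causal effect on y
print(ols_slopes(y, x))       # model violating closedness (z omitted)
print(ols_slopes(y, x, z))    # model including z
```

In expectation, the slope of `y` on `x` alone is 0.8 / 1.64 ≈ 0.49, a purely spurious path, whereas including `z` drives the coefficient of `x` toward 0 and recovers a coefficient of about 1 for `z`. Adding variables one at a time offers no such guarantee when several variables jointly confound or mediate the relation.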

The second major conclusion concerns the concept of external validity and its relation to both kinds of randomness. It has been shown that it is important to distinguish conceptually between variability originating either from the randomization procedure (in the case of between-subjects designs) or from chance fluctuations across points of measurement (in the case of within-subjects designs), on the one hand, and manifest variability caused by subject/stimulus-treatment interactions, on the other. For stimuli, all such variability was argued to be manifest, since there is no reasonable notion of chance fluctuation with respect to stimuli. With respect to subjects, both sources of variance are statistically indistinguishable but should nevertheless be separated conceptually. Whereas variability originating from the process of randomization or from chance fluctuations is a potential threat to the internal validity of a study, variability caused by subject/stimulus-treatment interactions is not. As a consequence, taking into account the variability caused by stimulus-treatment interactions does not raise the internal validity of a study. Rather, differential effectiveness of the treatment is an aspect of "the concept formerly known as external validity". This aspect has been named aggregation validity (AV) to set it apart from population or stimulus validity (cf. Hager & Westermann, 1983). AV does not require the notion of an underlying population and is therefore exclusively sample-related, as is internal validity. Of course, evaluating AV might require replicating an experiment with different, possibly selected, samples of subjects. However, one should distinguish between the empirical evaluation of AV and the concept of AV, the latter clearly requiring no appeal to a population. The term AV has been chosen to emphasize that its focus lies on the validity of aggregating across stimuli and subjects.

For methods of isolating subjects for whom the treatment is differentially effective - thus violating the hypothesized causal law - and who therefore cause disordinal subject-treatment interactions, see Iseler (1996b). From the present perspective, the major aim is to separate variability originating from the randomization process itself from variability caused by manifest subject-treatment interactions. Inasmuch as there is manifest differential effectiveness of the treatment - i.e., homogeneous subgroups of subjects with different treatment effects can be identified - aggregating across these subjects is not a valid procedure and therefore threatens the aggregation validity of an experiment.
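How a positive aggregated effect can conceal a disordinal subject-treatment interaction is shown in the following sketch, which uses hypothetical per-subject effects (e.g., treatment minus control score in a within-subjects design) and assumes the subgroup labels are already known. It does not reproduce Iseler's (1996b) methods for identifying such subgroups. The helpers `subgroup_effects` and `is_disordinal` are illustrative names.

```python
from statistics import mean


def subgroup_effects(effects, subgroup):
    """Mean treatment effect per subgroup.

    effects:  per-subject treatment effects (hypothetical data)
    subgroup: matching list of subgroup labels (assumed to be known)
    """
    groups = {}
    for e, g in zip(effects, subgroup):
        groups.setdefault(g, []).append(e)
    return {g: mean(v) for g, v in groups.items()}


def is_disordinal(group_means):
    """True if some subgroups show mean effects of opposite sign."""
    values = group_means.values()
    return any(m > 0 for m in values) and any(m < 0 for m in values)


effects = [3, 4, 2, 5, 3, -2, -1, -3]
labels = ["g1"] * 5 + ["g2"] * 3
per_group = subgroup_effects(effects, labels)
print(mean(effects))                        # aggregated effect across all subjects
print(per_group, is_disordinal(per_group))  # subgroup effects and sign reversal
```

Here the aggregated effect is 1.375, although subgroup `g2` (mean effect -2) responds in the opposite direction to `g1` (mean effect 3.4), so aggregating across both subgroups would misrepresent the treatment's effect for every subject. A sign check of this kind cannot by itself tell manifest differential effectiveness from variability due to randomization or chance fluctuations; that is precisely the distinction drawn above.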



Methods of Psychological Research 1997 Vol.2 No.2
© 1998 Pabst Science Publishers