Recent interpretations of the significance test do not treat it as a means of inferring from a sample to an underlying population, at least not in experimental contexts. Rather, the significance test is seen as following the logic of randomization tests (cf. Bredenkamp, 1980; Edgington, 1995; Erdfelder & Bredenkamp, 1994; Gadenne, 1984; Hager & Westermann, 1983). In randomization tests, the critical random process is not the drawing of a random sample from an underlying population distribution, but the random assignment of subjects (or stimuli) to experimental conditions. This random assignment is required to secure stochastic independence of the treatment and potentially confounding variables (PCVs). PCVs designate all factors other than the treatment that affect the dependent variable(s), in the sense of being (parts of) sufficient conditions for a causal influence on the dependent variable(s). This definition reflects the fact that psychological hypotheses and explanations are essentially incomplete: They specify only one or a few causes as parts of a complete causal network. Moreover, these causes are neither necessary nor sufficient in themselves but have to be supplemented by other (background) conditions to be effective (see Siemer, 1993, for a more detailed discussion of this property of psychological explanations). PCVs become actually confounding variables (ACVs) to the extent that the stochastic independence of PCVs and treatment does not hold. The randomization procedure is therefore required to secure the internal or ceteris-paribus ("other things equal") validity of an experiment (Hager & Westermann, 1983). If, however, the treatment procedure itself induces confounding variables, randomization does not ensure internal validity, since the confounding takes place after randomization (e.g., Cook & Campbell, 1986).
According to this rationale, a randomization test gives the probability of obtaining data at least as extreme as the empirical data (in terms of differences in central tendencies) under the assumption that they are exclusively the result of the random assignment procedure, that is, chance. Most importantly, the distribution of the test statistic is generated solely on the basis of the sample data.
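The logic of such a test can be made concrete with a minimal sketch. The function name `randomization_test`, the choice of the absolute difference in means as the test statistic, and the example scores are illustrative assumptions, not taken from the sources cited above. The sketch enumerates every possible assignment of the observed scores to two conditions and reports the proportion of assignments that yield a difference at least as large as the observed one.

```python
from itertools import combinations
from statistics import mean


def randomization_test(treatment, control):
    """Exact two-sided randomization test on the difference in means.

    The reference distribution is built solely from the observed scores:
    every way of assigning them to two groups of the original sizes is
    enumerated, which is only feasible for small samples.
    """
    pooled = list(treatment) + list(control)
    n_treatment = len(treatment)
    observed = abs(mean(treatment) - mean(control))

    n_extreme = 0
    n_assignments = 0
    for indices in combinations(range(len(pooled)), n_treatment):
        chosen = set(indices)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        difference = abs(mean(group_a) - mean(group_b))
        # Small tolerance guards against floating-point ties.
        if difference >= observed - 1e-12:
            n_extreme += 1
        n_assignments += 1

    return n_extreme / n_assignments


# Hypothetical scores for illustration only.
treatment_scores = [12, 14, 15, 17]
control_scores = [8, 9, 10, 11]
p_value = randomization_test(treatment_scores, control_scores)
print(round(p_value, 4))  # 2 of 70 assignments are this extreme: 0.0286
```

No assumption about a population distribution enters the computation; the only random process modeled is the assignment of scores to conditions. For larger samples, full enumeration becomes impractical, and a random subset of assignments is typically sampled instead.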
To summarize, randomization is necessary to secure the internal validity of an experiment that tests causal hypotheses. Accordingly, the randomization test answers the question of how likely the results are under the assumption that the treatment has no effect ($H_0$) and all effects are exclusively a consequence of the randomization procedure. The randomization test (with an appropriate $\alpha$-level) thus protects against falsely accepting a causal hypothesis and thereby increases the internal validity of an experiment. In particular, inferential statistics do not provide the (inductive) generalization from a sample to a population, that is, the "external validity" or expected probability of replicating the results of an experiment.
Therefore, the question of interest can be rephrased as: Under which circumstances does the treatment of stimuli as random effects raise the internal validity of a study? Note that in the present context ANOVA is treated as an approximation to randomization tests. This is possible because Monte Carlo studies have shown that randomization tests and their parametric equivalents lead to very similar results in most cases (summarized in, e.g., Bredenkamp, 1980). The main reason for this approach is that ANOVA is a much more common method for the evaluation of experimental designs than randomization tests, at least for the designs considered. Thus, from a statistical point of view, I will discuss random effects or mixed ANOVA models, whereas from a conceptual or methodological point of view, these ANOVAs will be treated as approximations to randomization tests.