In this common design, stimuli are nested under treatment conditions.
As a result, both factors are confounded. Examples are studies in which
different word classes (e.g. nouns vs. adjectives) serve as treatments.
The linear model of an individual score $X_{ijm}$ of subject $m$ on
stimulus $j$ in treatment condition $i$ is given by Equation (1).

$$X_{ijm} = \mu + \alpha_i + B_{j(i)} + S_m + (\alpha S)_{im} + (BS)_{jm(i)} + E_{ijm} \qquad (1)$$

The terms of Equation (1) specify the potential sources of variability in
the (quasi-)experiment. The term $\mu$ represents the overall mean, and
$\alpha_i$, $B_{j(i)}$, and $S_m$ are the treatment, stimulus and subject
effects, respectively. (The grand mean and the treatment effect are
designated by Greek letters to indicate that they are assumed fixed; the
others are potentially random.) The treatment-by-subject interaction is
expressed as $(\alpha S)_{im}$. The quantity $(BS)_{jm(i)}$ is a particular
stimulus-by-subject interaction, while $E_{ijm}$ is the random error
associated with that particular subject-stimulus combination in that
particular experiment. Table 2 shows the respective expected mean squares
of the ANOVA models for the random- vs. fixed-effects model. In the
random-effects model, the mean square of the treatment effect has an
expected value that entails the stimulus variance component
($\sigma^2_{St/T}$). Therefore, the appropriate error term to test the
treatment effect is one that includes this source of error, resulting in a
Quasi-F-ratio (e.g., Clark, 1973). In contrast, in the fixed-effects model,
variability originating from stimulus variability is not considered.

| Source | df | E(MS) |
| --- | --- | --- |
| T | p-1 | |
| St/T | p(q-1) | |
| Su | r-1 | |
| T × Su | (p-1)(r-1) | |
| St × Su/T | p(q-1)(r-1) | |

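
For orientation, the expected mean squares that standard textbook derivations yield for a design of this kind can be sketched as follows. This is an illustration under stated assumptions, not a reproduction of the entries of Table 2: the design is taken to be balanced with $p$ fixed treatments, $q$ randomly sampled stimuli per treatment, $r$ randomly sampled subjects and one observation per cell, and the usual restricted-model conventions are used. The symbols are chosen here for illustration: variance components carry the source labels of the table as subscripts, $\theta^2_{T}$ denotes the sum of squared treatment effects divided by $p-1$, and $\sigma^2_{res}$ lumps together the stimulus-by-subject interaction and the random error, which cannot be separated with a single observation per cell.

$$
\begin{aligned}
E(MS_{T}) &= \sigma^2_{res} + q\,\sigma^2_{T \times Su} + r\,\sigma^2_{St/T} + qr\,\theta^2_{T} \\
E(MS_{St/T}) &= \sigma^2_{res} + r\,\sigma^2_{St/T} \\
E(MS_{Su}) &= \sigma^2_{res} + pq\,\sigma^2_{Su} \\
E(MS_{T \times Su}) &= \sigma^2_{res} + q\,\sigma^2_{T \times Su} \\
E(MS_{St \times Su/T}) &= \sigma^2_{res} \\[6pt]
F' &= \frac{MS_{T} + MS_{St \times Su/T}}{MS_{St/T} + MS_{T \times Su}}
\end{aligned}
$$

Under these assumptions, numerator and denominator of $F'$ have the same expectation except for the term $qr\,\theta^2_{T}$, which is why no single mean square can serve as the error term. If the stimuli are treated as fixed, the term $r\,\sigma^2_{St/T}$ drops out of $E(MS_{T})$, and the treatment effect can be tested by the ordinary ratio $MS_{T} / MS_{T \times Su}$.
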
In order to find the appropriate model, one first has to analyze whether the hypotheses about the stimuli are of the same nature as those previously discussed, that is: Do the critical hypotheses concern populations of stimuli, or are they essentially causal? Only if the stimulus hypotheses are actually hypotheses about populations is it necessary and reasonable to seek statistical generalization by means of random sampling of material.
Even looking at the research questions stated by the authors arguing in favor of the treatment of stimuli as random effects (e.g., Clark, 1973) makes clear, however, that the critical hypotheses are far from being hypotheses about central tendencies in clearly defined and closed populations of stimuli. Instead, the hypotheses concern the causal properties of certain features of the stimuli. The appeal to populations serves a completely different purpose. Consider, for instance, the following explanation of what Clark (1973) sees as the purpose of a "central tendency" hypothesis (i.e., a population hypothesis about aggregates) in the context of a comparison of homographs vs. nonhomographs:
"... homographs take longer to recognize than nonhomographs all other things being equal. Since it is impossible to find single homograph/nonhomograph pairs identical in all other possible factors - frequency, meaning, word length, spelling difficulty, and other undetermined factors - it is only possible to test the hypotheses by looking at the central tendencies (for example, the means) of homographs versus nonhomographs." (Clark, 1973, p. 352, italics added)

This paragraph makes perfectly clear that the hypothesis of interest does not concern aggregates of stimuli in well-defined populations. Rather, the very property of homography itself is assumed to causally influence identification times. This is exactly the reason why all other properties of the stimuli have to be controlled in order to realize the ceteris-paribus condition (and secure internal validity). More precisely, it is only necessary to control the PCVs and not all other properties of the stimuli, because some of these properties have no causal effect. Why the ceteris-paribus condition is assumed to hold at the level of central tendencies in populations, rather than in samples, remains entirely unexplained, however. Indeed, it holds only if a very restrictive additional assumption is met.
First of all, how is a conceptualization of the significance test as a
randomization test in the present design possible at all? Obviously,
the stimuli are not randomly assigned to conditions, to begin with. One
could argue, however, that the test of significance informs about the
probability of the empirically obtained differences in central tendencies,
under the assumption that the stimuli come from the same population.
If this assumption is met, random sampling has the same effects as random
assignment to conditions. The assumption that the stimuli are drawn from
the same population ($H_0$) can in turn be reduced to two underlying
assumptions:
The same line of argument holds - mutatis mutandis - for other quasi-experimental designs or correlational studies in which randomization is not possible - at least as far as causal and not "true population hypotheses" are concerned (see the Conclusions). True population hypotheses, however, are commonly of limited theoretical value, since the major aim of (basic) science lies in the investigation of general causal mechanisms rather than in the description of incidental and local phenomena.
Usually, one tries to increase the ceteris-paribus validity in these cases by quasi-experimental control techniques like matching or the inclusion of PCVs either in the experimental design or in the regression or path analysis. However, matching with respect to stimuli seems to be more feasible than matching with respect to subjects, because the number of PCVs is more restricted. Indeed, if it were possible to reach a "perfect matching" (i.e., with respect to all PCVs), or to include all PCVs in a regression analysis (thus fulfilling the so-called "closedness" condition), it would be possible to interpret a regression coefficient or a difference in central tendencies unambiguously in a causal manner.
For the present design this means that the stimulus variance that is not
evoked by the treatment ($\sigma^2_{St/T}$) is reduced to zero.
As this variance component is exactly the term that separates the
random-effects from the fixed-effects model, this means that differences
between the two models are reduced inasmuch as a matching of the stimuli is
successful. Furthermore, as blocking factors (e.g., word length) are usually
considered in the process of designing a study but not in the subsequent
statistical analysis, the application of the random-effects model results in
a smaller actual $\alpha$-level than the nominal one (even if the
statistical assumptions of the random-effects model, like true random
sampling, are actually met; cf. Wickens & Keppel, 1983). Elimination of
extreme material (i.e., truncation of the stimulus distributions) has the
same effect (e.g., Cohen, 1976).
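
The relation between the two models can be illustrated numerically. The following is a minimal sketch, not part of the original analysis: it assumes a balanced design with one observation per cell and normally distributed effects, and the function names (`simulate`, `mean_squares`, `f_tests`) as well as all parameter values are hypothetical choices made for illustration. It computes the mean squares of the design, the ordinary F-ratio of the fixed-effects model, and the Quasi-F-ratio with Satterthwaite-type approximate degrees of freedom.

```python
import numpy as np


def simulate(p=2, q=20, r=30, sd_stim=1.0, sd_subj=1.0, sd_err=1.0,
             effect=0.0, seed=0):
    """Simulate scores x[i, j, m]: treatment i, stimulus j (nested in i), subject m."""
    rng = np.random.default_rng(seed)
    alpha = effect * np.linspace(-0.5, 0.5, p)       # fixed treatment effects
    stim = rng.normal(0.0, sd_stim, size=(p, q, 1))  # stimulus effects within treatments
    subj = rng.normal(0.0, sd_subj, size=(1, 1, r))  # subject effects
    err = rng.normal(0.0, sd_err, size=(p, q, r))    # residual per cell
    return alpha[:, None, None] + stim + subj + err


def mean_squares(x):
    """Sums of squares, degrees of freedom and mean squares for x of shape (p, q, r)."""
    p, q, r = x.shape
    grand = x.mean()
    t = x.mean(axis=(1, 2))    # treatment means, shape (p,)
    st = x.mean(axis=2)        # stimulus means, shape (p, q)
    su = x.mean(axis=(0, 1))   # subject means, shape (r,)
    tsu = x.mean(axis=1)       # treatment-by-subject means, shape (p, r)
    ss = {
        "T": q * r * np.sum((t - grand) ** 2),
        "St/T": r * np.sum((st - t[:, None]) ** 2),
        "Su": p * q * np.sum((su - grand) ** 2),
        "TxSu": q * np.sum((tsu - t[:, None] - su[None, :] + grand) ** 2),
        "StxSu/T": np.sum(
            (x - st[:, :, None] - tsu[:, None, :] + t[:, None, None]) ** 2
        ),
    }
    df = {
        "T": p - 1,
        "St/T": p * (q - 1),
        "Su": r - 1,
        "TxSu": (p - 1) * (r - 1),
        "StxSu/T": p * (q - 1) * (r - 1),
    }
    ms = {k: ss[k] / df[k] for k in ss}
    return ss, df, ms


def f_tests(ms, df):
    """Fixed-effects F (stimuli fixed) and Quasi-F (stimuli random) for the treatment."""
    f_fixed = ms["T"] / ms["TxSu"]
    num = ms["T"] + ms["StxSu/T"]
    den = ms["St/T"] + ms["TxSu"]
    # Satterthwaite approximation for the degrees of freedom of summed mean squares
    df_num = num ** 2 / (ms["T"] ** 2 / df["T"] + ms["StxSu/T"] ** 2 / df["StxSu/T"])
    df_den = den ** 2 / (ms["St/T"] ** 2 / df["St/T"] + ms["TxSu"] ** 2 / df["TxSu"])
    return {"F_fixed": f_fixed, "F_quasi": num / den,
            "df_num": df_num, "df_den": df_den}


if __name__ == "__main__":
    # No true treatment effect; stimulus variance present vs. removed by "perfect matching"
    for sd_stim in (1.0, 0.0):
        _, df, ms = mean_squares(simulate(sd_stim=sd_stim, seed=1))
        print(sd_stim, f_tests(ms, df))
```

In this sketch, setting `sd_stim` to zero mimics a perfectly matched set of stimuli: the mean squares for St/T and St × Su/T then estimate the same quantity, so the two ratios should tend to agree, whereas with substantial stimulus variance the fixed-effects ratio will typically be the larger one. A single simulated data set is of course only an illustration; statements about actual vs. nominal error rates would require repeating the simulation many times.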