
Stimuli nested under treatment conditions

In this common design, stimuli are nested under treatment conditions, so the two factors are confounded. Examples are studies in which different word classes (e.g., nouns vs. adjectives) serve as treatments. The linear model of an individual score $X_{ijm}$ of subject m on stimulus j in treatment condition i is given by Equation (1).


\[ X_{ijm} = \mu + \alpha_i + b_{j(i)} + c_m + (\alpha c)_{im} + (bc)_{jm(i)} + e_{ijm} \tag{1} \]

The terms of Equation (1) specify the potential sources of variability in the (quasi-)experiment. The term $\mu$ represents the overall mean, and $\alpha_i$, $b_{j(i)}$, and $c_m$ are the treatment, stimulus, and subject effects, respectively. (The grand mean and the treatment effect are designated by Greek letters to indicate that they are assumed fixed; the others are potentially random.) The treatment-by-subject interaction is expressed as $(\alpha c)_{im}$. The quantity $(bc)_{jm(i)}$ is a particular stimulus-by-subject interaction, while $e_{ijm}$ is the random error associated with that particular subject-stimulus combination in that particular experiment. Table 2 shows the expected mean squares of the ANOVA for the random-effects vs. the fixed-effects model. In the random-effects model, the expected mean square of the treatment effect includes the stimulus variance component ($\sigma^2_{St/T}$). Therefore, the appropriate error term for testing the treatment effect is one that also includes this source of error, which results in a quasi-F ratio (e.g., Clark, 1973). In contrast, in the fixed-effects model, variability originating from the stimuli is not considered.

 

Source (a)    df             E(MS) (b)
T             p-1            [formula not recoverable]
St/T          p(q-1)         [formula not recoverable]
Su            r-1            [formula not recoverable]
T x Su        (p-1)(r-1)     [formula not recoverable]
St x Su/T     p(q-1)(r-1)    [formula not recoverable] (Residual) (c)

Table 2: Sources of Variance and Expected Mean Squares; Repeated Measurements Design with Stimuli Nested Under Treatment Conditions.

a. T = Treatment (p), St = Stimuli (q), Su = Subjects (r)
b. Variance components set in bold are part of the random-effects model but not of the fixed-effects model.
c. The error variance ($\sigma^2_e$) and the stimulus-by-subject interaction variance ($\sigma^2_{St \times Su/T}$) cannot be estimated independently with only one observation per stimulus-subject combination.
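
The quasi-F ratio mentioned above can be sketched in a few lines. The sketch below assumes one common form of the ratio discussed in this literature (e.g., Clark, 1973), in which the treatment and residual mean squares are summed in the numerator and the treatment-by-subject and stimulus mean squares in the denominator, with Satterthwaite-type approximations for the degrees of freedom. The function names, argument names, and the example mean squares are hypothetical and serve only as illustration.

```python
def satterthwaite_df(ms_a, df_a, ms_b, df_b):
    """Approximate df for the sum of two independent mean squares."""
    return (ms_a + ms_b) ** 2 / (ms_a ** 2 / df_a + ms_b ** 2 / df_b)


def quasi_f(ms_t, ms_st, ms_txsu, ms_stxsu, p, q, r):
    """Quasi-F ratio for the treatment effect (illustrative sketch).

    ms_t     : mean square for treatments (T)
    ms_st    : mean square for stimuli within treatments (St/T)
    ms_txsu  : mean square for the treatment-by-subject interaction
    ms_stxsu : mean square for the stimulus-by-subject interaction (residual)
    p, q, r  : numbers of treatments, stimuli per treatment, and subjects
    """
    # Degrees of freedom as listed in Table 2.
    df_t = p - 1
    df_st = p * (q - 1)
    df_txsu = (p - 1) * (r - 1)
    df_stxsu = p * (q - 1) * (r - 1)

    # Assumed form: numerator and denominator are built so that their
    # expectations differ only by the treatment component.
    f_prime = (ms_t + ms_stxsu) / (ms_txsu + ms_st)
    df_num = satterthwaite_df(ms_t, df_t, ms_stxsu, df_stxsu)
    df_den = satterthwaite_df(ms_txsu, df_txsu, ms_st, df_st)
    return f_prime, df_num, df_den


# Made-up mean squares, for illustration only.
f_prime, df_num, df_den = quasi_f(100.0, 10.0, 8.0, 2.0, p=2, q=10, r=20)
print(f"F' = {f_prime:.2f} on about {df_num:.1f} and {df_den:.1f} df")
```

Because both degrees of freedom are approximations, they are generally not integers; the resulting test is approximate as well.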

In order to find the appropriate model, one first has to analyze whether the hypotheses about the stimuli are of the same nature as those previously discussed, that is: Do the critical hypotheses concern populations of stimuli, or are they essentially causal? Only if the stimulus hypotheses are actually hypotheses about populations is it necessary and reasonable to seek statistical generalization by means of random sampling of the material.

However, even a look at the research questions stated by authors who argue for treating stimuli as random effects (e.g., Clark, 1973) makes clear that the critical hypotheses are far from being hypotheses about central tendencies in clearly defined and closed populations of stimuli. Instead, the hypotheses concern the causal properties of certain features of the stimuli. The appeal to populations serves a completely different purpose. Consider, for instance, the following explanation of what Clark (1973) sees as the purpose of a "central tendency" hypothesis (i.e., a population hypothesis about aggregates) in the context of a comparison of homographs vs. nonhomographs:

... homographs take longer to recognize than nonhomographs all other things being equal. Since it is impossible to find single homograph/nonhomograph pairs identical in all other possible factors - frequency, meaning, word length, spelling difficulty, and other undetermined factors - it is only possible to test the hypotheses by looking at the central tendencies (for example, the means) of homographs versus nonhomographs. (Clark, 1973, p. 352, italics added)
This paragraph makes perfectly clear that the hypothesis of interest does not concern aggregates of stimuli in well-defined populations. Rather, the very property of homography itself is assumed to causally influence identification times. This is exactly why all other properties of the stimuli have to be controlled in order to realize the ceteris-paribus condition (and to secure internal validity). More precisely, it is only necessary to control the PCVs, not all other properties of the stimuli, because some of these properties have no causal effect. However, why the ceteris-paribus condition is assumed to hold at the level of central tendencies in populations, rather than in samples, remains entirely unexplained. Indeed, it holds only if a very restrictive additional assumption is met.

First of all, how can the significance test be conceptualized as a randomization test in the present design at all? Obviously, the stimuli are not randomly assigned to conditions to begin with. One could argue, however, that the test of significance indicates the probability of the empirically obtained differences in central tendencies under the assumption that the stimuli come from the same population. If this assumption is met, random sampling has the same effects as random assignment to conditions. The assumption that the stimuli are drawn from the same population ($H_0$) can in turn be reduced to two underlying assumptions:

  1. The null hypothesis is true, that is, the stimuli come from the same population with respect to the treatment.
  2. The stimuli come from the same population with respect to all PCVs.
The second assumption implies nothing less than the validity of the ceteris-paribus condition at the population level. This assumption becomes necessary because the ceteris-paribus condition is not guaranteed by a randomization procedure (at least with respect to the distributions of the PCVs; e.g., Steyer, 1992). It thus has to be secured in a different way. However, this is a very strong a priori assumption. As a result, the significance test informs only about the probability of the data given the conjunction of $H_0$ and the assumption that the ceteris-paribus condition is met at the population level. The internal validity of the inference is therefore secured if and only if there are no systematic differences (in terms of PCVs) between the kinds of stimuli in the population, that is, if stimulus category and PCVs are stochastically independent. For a recent approach to falsifying this assumption of unconfoundedness in the population with respect to single PCVs, referred to as potential confounders, see Steyer, Gabler, and Rucai (1995).
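
The role of the second assumption can be illustrated with a small simulation. The sketch below is a toy example under stated assumptions, not an analysis from the literature cited here: response times depend only on a single hypothetical PCV (all numbers are made up), and the stimulus category itself has no causal effect. When the PCV distribution differs between the two categories, their mean response times differ anyway.

```python
import random
import statistics


def simulate_mean_difference(pcv_shift, n_per_category=20000, seed=1):
    """Mean RT difference between two stimulus categories when the
    category itself has NO causal effect on RT (toy model)."""
    rng = random.Random(seed)
    slope = 50.0  # hypothetical effect of the PCV, in ms per unit
    means = []
    # Category 0 has PCV mean 0; category 1 has PCV mean `pcv_shift`.
    for shift in (0.0, pcv_shift):
        rts = []
        for _ in range(n_per_category):
            pcv = rng.gauss(shift, 1.0)
            # RT depends on the PCV and on noise, never on the category.
            rts.append(500.0 + slope * pcv + rng.gauss(0.0, 30.0))
        means.append(statistics.fmean(rts))
    return means[1] - means[0]


# Category and PCV independent: difference close to zero.
print(round(simulate_mean_difference(pcv_shift=0.0), 1))
# Category and PCV confounded: a spurious "category effect" appears.
print(round(simulate_mean_difference(pcv_shift=1.0), 1))
```

With a shift of one unit in the PCV, the difference between the category means is close to the assumed slope, although the category has no effect of its own in this model. A significance test on such data would reject the conjunction of the two assumptions without indicating which of them failed.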

The same line of argument holds - mutatis mutandis - for other quasi-experimental designs or correlational studies in which randomization is not possible - at least as far as causal and not "true population hypotheses" are concerned (see the Conclusions). True population hypotheses, however, are commonly of limited theoretical value, since the major aim of (basic) science lies in the investigation of general causal mechanisms rather than in the description of incidental and local phenomena.

Usually, one tries to increase the ceteris-paribus validity in these cases by quasi-experimental control techniques such as matching or the inclusion of PCVs either in the experimental design or in the regression or path analysis. However, matching with respect to stimuli seems to be more feasible than matching with respect to subjects, because the number of PCVs is more restricted. Indeed, if it were possible to reach a "perfect matching" (i.e., with respect to all PCVs), or to include all PCVs in a regression analysis (thus fulfilling the so-called "closedness" condition), it would be possible to interpret a regression coefficient or a difference in central tendencies unambiguously in a causal manner.

For the present design, this means that the stimulus variance that is not evoked by the treatment ($\sigma^2_{St/T}$) is reduced to zero. As this variance component is exactly the term that separates the random-effects from the fixed-effects model, the differences between the two models shrink to the extent that the matching of the stimuli is successful. Furthermore, as blocking factors (e.g., word length) are usually considered when designing a study but not in the subsequent statistical analysis, applying the random-effects model results in an actual $\alpha$-level that is smaller than the nominal one (even if the statistical assumptions of the random-effects model, like true random sampling, are actually met; cf. Wickens & Keppel, 1983). Eliminating extreme material (i.e., truncating the stimulus distributions) has the same effect (e.g., Cohen, 1976).
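
How matching brings the two models together can be sketched numerically. The example below assumes the usual textbook expected mean squares for a design with random stimuli nested under a fixed treatment and crossed with random subjects; these formulas and all variance values are assumptions for illustration and are not taken from Table 2. It computes the ratio of the expected treatment mean square to the expected treatment-by-subject mean square when the treatment has no effect.

```python
def expected_ms_ratio_under_h0(var_st, var_e=1.0, var_stxsu=0.5,
                               var_txsu=0.5, q=10, r=20):
    """Ratio E(MS_T) / E(MS_TxSu) when the treatment has no effect.

    Assumed expected mean squares (random stimuli, random subjects):
      E(MS_TxSu) = var_e + var_stxsu + q * var_txsu
      E(MS_T)    = E(MS_TxSu) + r * var_st      (under the null hypothesis)
    All variance components are hypothetical values.
    """
    common = var_e + var_stxsu + q * var_txsu
    return (common + r * var_st) / common


# The better the matching, the smaller the remaining stimulus variance.
for var_st in (1.0, 0.5, 0.1, 0.0):
    print(var_st, round(expected_ms_ratio_under_h0(var_st), 3))
```

The ratio of expectations is only a rough guide to the behavior of the F statistic itself, but it shows the direction of the argument: as the stimulus variance approaches zero, the ratio approaches 1, and the test against the treatment-by-subject interaction is no longer inflated by stimulus variability.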



Methods of Psychological Research 1997 Vol.2 No.2
© 1998 Pabst Science Publishers