
Methodological Consequences

It may be questioned whether a situation of the kind demonstrated in Section 2.1 will ever occur in actual research. Doesn't the median paradox belong to that species of highly artificial counterexamples, devoid of practical relevance, which meticulous mathematicians construct to prove the foolishness of everyday research practice? Even if one shares the critical view behind this question, the example suffices to show that 'deriving' an aggregate hypothesis from a hypothesis referring to individuals requires a plausibility argument if the hypothesis is explicated in terms of an order of medians: although a situation with $\mu^*_{ua} < \mu^*_{ub}$ for every $u \in D$ and $\mu^*_{\pi a} \ge \mu^*_{\pi b}$ cannot be excluded, it may be considered unlikely. In this respect, the derivation of the aggregate hypothesis differs qualitatively from the case of hypotheses referring to expectations, where it can be regarded as a strict derivation based on the results of Steyer et al. (1995 [23], 1996 [24]) mentioned in Section 1.2.
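The paradox can be reproduced with a few lines of exact arithmetic. The following is a minimal sketch, not the urn model of Section 2: it assumes two hypothetical units with discrete response distributions on an integer scale, equal selection probabilities $\pi(u) = 1/2$, an aggregate distribution given by the $\pi$-weighted mixture of the unit distributions, and the median taken as the smallest value whose cumulative probability reaches 1/2. All probabilities and helper names are invented for illustration.

```python
from fractions import Fraction as Fr


def median(dist):
    """Smallest value x whose cumulative probability F(x) reaches 1/2."""
    cum = Fr(0)
    for x in sorted(dist):
        cum += dist[x]
        if cum >= Fr(1, 2):
            return x
    raise ValueError("probabilities must sum to 1")


def mixture(dists, weights):
    """Aggregate distribution: pi-weighted mixture of the unit distributions."""
    out = {}
    for dist, w in zip(dists, weights):
        for x, p in dist.items():
            out[x] = out.get(x, Fr(0)) + w * p
    return out


# Hypothetical units (value -> probability under conditions a and b).
unit1 = {"a": {5: Fr(1)}, "b": {0: Fr(45, 100), 6: Fr(55, 100)}}  # ogives cross
unit2 = {"a": {1: Fr(6, 10), 6: Fr(4, 10)}, "b": {2: Fr(6, 10), 6: Fr(4, 10)}}
units = [unit1, unit2]
pi = [Fr(1, 2), Fr(1, 2)]  # the same selection distribution under both conditions

for u in units:
    # Individual medians: 5 < 6 for unit1, 1 < 2 for unit2.
    assert median(u["a"]) < median(u["b"])

agg_a = mixture([u["a"] for u in units], pi)
agg_b = mixture([u["b"] for u in units], pi)
print(median(agg_a), median(agg_b))  # 5 2: the order of medians is reversed
```

Each unit's median is higher under b, yet the aggregate median under a (5) exceeds the aggregate median under b (2).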

The difference between the two kinds of 'derivation' has further consequences for the methodology of testing causal hypotheses in psychology. We are now prepared to discuss the fairness of the method of strict hypothesis testing outlined in Section 1.1. If treatment effects are explicated as individual or average causal effects on the expectation of the dependent variable, then a causal hypothesis claiming a positive individual causal effect (i.e., $\mu_{ub} - \mu_{ua} > 0$) for every unit belonging to a domain $D$ implies $\mu_{\pi b} - \mu_{\pi a} > 0$ for every selection distribution $\pi$, including those governed by covariates that have identified a subgroup with a reversed average causal effect in an exploratory analysis. Recall, however, that the selection distribution $\pi$ must be identical for both conditions. Under this provision, the prediction $\mu_{\pi b} - \mu_{\pi a} > 0$ follows for every selection distribution $\pi$ from the aggregation stability of positive causal effects on expectations, and a test of this prediction is indeed fair.
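The contrast with medians can be checked numerically. Assuming a finite domain in which the aggregate distribution under condition $c$ is the $\pi$-weighted mixture of the unit distributions, linearity gives $\mu_{\pi b} - \mu_{\pi a} = \sum_u \pi(u)\,(\mu_{ub} - \mu_{ua})$, so positive individual effects on expectations cannot become negative under any selection distribution $\pi$. The sketch below verifies this identity with exact fractions on randomly generated, purely hypothetical unit distributions; the function names are illustrative.

```python
import random
from fractions import Fraction as Fr


def expectation(dist):
    """Expected value of a discrete distribution given as {value: probability}."""
    return sum(x * p for x, p in dist.items())


def mixture(dists, weights):
    """Aggregate distribution: pi-weighted mixture of the unit distributions."""
    out = {}
    for dist, w in zip(dists, weights):
        for x, p in dist.items():
            out[x] = out.get(x, Fr(0)) + w * p
    return out


def random_dist(rng, scale=range(7)):
    """Hypothetical distribution on a 7-point scale with rational probabilities."""
    counts = [rng.randint(0, 9) for _ in scale]
    counts[rng.randrange(len(counts))] += 1  # guarantee a non-zero total
    total = sum(counts)
    return {x: Fr(c, total) for x, c in zip(scale, counts) if c}


def check_expectations(trials=1000, seed=1):
    """Check the identity in every trial; return how many trials had only positive unit effects."""
    rng = random.Random(seed)
    all_positive = 0
    for _ in range(trials):
        n = rng.randint(2, 6)
        units = [(random_dist(rng), random_dist(rng)) for _ in range(n)]  # (a, b) pairs
        raw = [rng.randint(1, 9) for _ in range(n)]
        pi = [Fr(r, sum(raw)) for r in raw]  # selection distribution, same for a and b
        agg_diff = (expectation(mixture([b for _, b in units], pi))
                    - expectation(mixture([a for a, _ in units], pi)))
        weighted = sum(w * (expectation(b) - expectation(a)) for w, (a, b) in zip(pi, units))
        assert agg_diff == weighted  # exact linearity of the expectation
        if all(expectation(b) > expectation(a) for a, b in units):
            all_positive += 1
            assert agg_diff > 0  # positive unit effects aggregate to a positive effect
    return all_positive


print(check_expectations())
```

No analogous identity holds for medians, which is why the preceding median example can reverse the order.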

Whereas the difference $\mu_{ub} - \mu_{ua}$ is the individual effect of b vs. a on the expectation of the dependent variable, a causal hypothesis may also refer to differences in other properties of the distribution of the dependent variable, e.g. its variability or its median.$^{20}$ Hence, the hypothesis that the medians follow the order $\mu^*_{ua} < \mu^*_{ub}$ for every unit $u$ belonging to a domain set $D$ may with the same right be called a causal hypothesis. Formally, the median paradox leads immediately to the conclusion that testing this hypothesis via the aggregate hypothesis $\mu^*_{\pi a} < \mu^*_{\pi b}$ is unfair. But to evaluate the methodological weight of this unfairness, we have to return to the question of whether a situation of the kind described in Section 2 is likely to occur in actual research. Even if this is doubted for 'representative sampling' or for the common practice of using arbitrarily biased samples, matters are different in the framework of strict hypothesis testing with selection distributions governed by covariates that have enabled (in an exploratory analysis) the isolation of a subgroup with an order of medians contrary to the hypothesized one. Consider a situation where the domain set $D$ contains a (possibly small) subset $A$ of units with the typical intersections of ogives that were found in Section 2.2 to favour the occurrence of a median paradox. Would it be entirely implausible that some pattern of covariate values correlates with membership in the set $A$? In a situation of this kind, selecting units on the basis of these covariate values can considerably increase the probability of a sample exhibiting a median paradox, even if $A$ is a small minority of $D$ that would never dominate a sample selected without a bias favouring elements of $A$. In other words, the technique of deliberately biased samples can systematically produce a median paradox.
Hence, the prediction underlying a hypothesis test would be too daring, since it can lead to an unfairly strict test of the hypothesis that $\mu^*_{ua} < \mu^*_{ub}$ holds for every $u \in D$.
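The effect of a covariate-based selection can be illustrated with the same kind of hypothetical units. Assume a domain with a regular unit type, for which b is strictly stochastically greater than a, and a type representing the subset $A$ with crossing ogives; both types have a higher median under b. As a simplifying assumption, a biased selection is modelled here only by the total weight `share_A` that the selection distribution places on $A$. The sketch computes the aggregate medians for several such weights; all distributions, weights and names are invented for illustration.

```python
from fractions import Fraction as Fr


def median(dist):
    """Smallest value x whose cumulative probability F(x) reaches 1/2."""
    cum = Fr(0)
    for x in sorted(dist):
        cum += dist[x]
        if cum >= Fr(1, 2):
            return x
    raise ValueError("probabilities must sum to 1")


def mixture(dists, weights):
    """Aggregate distribution: pi-weighted mixture of the unit distributions."""
    out = {}
    for dist, w in zip(dists, weights):
        for x, p in dist.items():
            out[x] = out.get(x, Fr(0)) + w * p
    return out


# Hypothetical unit types (value -> probability); both have a higher median under b.
REGULAR = {"a": {1: Fr(6, 10), 6: Fr(4, 10)}, "b": {2: Fr(6, 10), 6: Fr(4, 10)}}
SUBSET_A = {"a": {5: Fr(1)}, "b": {0: Fr(45, 100), 6: Fr(55, 100)}}  # crossing ogives


def aggregate_medians(share_A):
    """Aggregate medians (under a, under b) when the selection puts weight share_A on A."""
    pi = [1 - share_A, share_A]
    med_a = median(mixture([REGULAR["a"], SUBSET_A["a"]], pi))
    med_b = median(mixture([REGULAR["b"], SUBSET_A["b"]], pi))
    return med_a, med_b


for share in (Fr(1, 10), Fr(4, 10), Fr(9, 10)):
    med_a, med_b = aggregate_medians(share)
    print(f"share of A = {share}: medians {med_a} vs {med_b}, paradox: {med_a >= med_b}")
```

With a 10% share of $A$ the aggregate medians keep the hypothesized order (1 vs. 2); raising the share to 40% reverses it (5 vs. 2); a selection consisting almost entirely of $A$ (90%) shows no paradox again (5 vs. 6). In this toy example, it is the mixture produced by the biased selection, not the subset $A$ alone, that creates the reversal.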

Should this fact be turned into an argument against median hypotheses or against the method of strict hypothesis testing based on deliberate sampling bias? Under a reconstructive (i.e., non-normative) view of methodology, the task of this discipline is not to establish norms but to point out possible consequences of methods, including their advantages as well as the risks of erroneous interpretations of data. In this understanding, the methodological relevance of the median paradox can be summarized as follows: if effects of conditions are explicated in terms of medians, the paradox blocks one way of redeeming hypothesis testing in psychological research from justified complaints about its lack of daring predictions.

On the other hand, shifting to tests of differences in expectations may be considered problematic for dependent variables on an ordinal scale level. It should therefore be mentioned that there is another way of explicating a 'positive' effect of a condition b (relative to a) on a dependent variable: a hypothesis can claim that the relation $f_u(a,x) \ge f_u(b,x)$ holds for every $x \in R$ and that $f_u(a,x) > f_u(b,x)$ holds for some real number $x$, which may differ between units. Since this property (called 'strict stochastic order'$^{21}$) is based only on the $\le$-relation of the dependent variable$^{22}$, it is meaningful (in the sense of invariance under admissible transformations) for ordinal data. Furthermore, this property is stable under aggregation in a way similar to that reported in Section 1.2 for the hypothesis of positive individual causal effects on expectations: if it holds for every unit belonging to a domain $D$, it also holds for every map $y_\pi: (C \times R) \rightarrow R$ characterizing an RSO-process $\pi$ in that domain. (See Iseler, 1996b [13], for this conclusion.) Hence, the derivation and the test of a corresponding aggregate hypothesis (including suitable significance tests; see Townsend, 1990 [26], and Iseler, 1994 [11] and 1996b [13]) is fair in a context of strict hypothesis testing based on deliberately biased samples.
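Reading $f_u(c,x)$ as the ogive (cumulative distribution function) of unit $u$ under condition $c$, strict stochastic order and its behaviour under aggregation can be checked directly for discrete distributions. Assuming a finite domain where the aggregate ogive is $\sum_u \pi(u) f_u(c,x)$, pointwise inequalities between ogives carry over to every mixture, and a strictly increasing relabelling of the scale leaves the relation unchanged. The sketch assumes finite supports, so comparing the ogives at the support points suffices; the unit distributions, the transformation and the helper names are hypothetical.

```python
from fractions import Fraction as Fr


def ogive(dist, x):
    """Cumulative probability F(x) = P(X <= x) of a discrete distribution."""
    return sum((p for v, p in dist.items() if v <= x), Fr(0))


def strictly_greater(dist_b, dist_a):
    """Strict stochastic order: F_a(x) >= F_b(x) for all x, with '>' for some x.

    Both ogives are step functions, so checking the union of the supports suffices.
    """
    points = sorted(set(dist_a) | set(dist_b))
    diffs = [ogive(dist_a, x) - ogive(dist_b, x) for x in points]
    return all(d >= 0 for d in diffs) and any(d > 0 for d in diffs)


def mixture(dists, weights):
    """Aggregate distribution: pi-weighted mixture of the unit distributions."""
    out = {}
    for dist, w in zip(dists, weights):
        for x, p in dist.items():
            out[x] = out.get(x, Fr(0)) + w * p
    return out


def relabel(dist, g):
    """Apply a strictly increasing transformation g to the scale values."""
    return {g(x): p for x, p in dist.items()}


def cube(x):
    """An admissible (strictly increasing) transformation of an ordinal scale."""
    return x ** 3


# Hypothetical units, each with b strictly stochastically greater than a.
units = [
    {"a": {1: Fr(6, 10), 6: Fr(4, 10)}, "b": {2: Fr(6, 10), 6: Fr(4, 10)}},
    {"a": {0: Fr(1, 2), 3: Fr(1, 2)}, "b": {0: Fr(1, 4), 3: Fr(1, 4), 4: Fr(1, 2)}},
]
crossing = {"a": {5: Fr(1)}, "b": {0: Fr(45, 100), 6: Fr(55, 100)}}

assert all(strictly_greater(u["b"], u["a"]) for u in units)
assert not strictly_greater(crossing["b"], crossing["a"])  # crossing ogives fail

for pi in ([Fr(1, 2), Fr(1, 2)], [Fr(1, 10), Fr(9, 10)], [Fr(9, 10), Fr(1, 10)]):
    agg_a = mixture([u["a"] for u in units], pi)
    agg_b = mixture([u["b"] for u in units], pi)
    print(strictly_greater(agg_b, agg_a),
          strictly_greater(relabel(agg_b, cube), relabel(agg_a, cube)))
```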

In summary, an adequate answer to the median paradox and to the requirement of strict and fair hypothesis testing would consist in the transition from median hypotheses to strict stochastic order as an explication of a 'positive' effect of a treatment b (vs. a) on an ordinally scaled dependent variable. But of course it must be left to the individual researcher to judge whether his or her theory really implies a concept of an effect under which shifts of probability mass count as positive or negative effects if and only if the median is changed. This would mean, e.g., speaking of a positive effect of treatment b (vs. a) in a situation where the $p$th quantile (i.e., the scale value with cumulative probability $p$) is greater under condition b only for values of $p$ in the immediate neighbourhood of $p = 0.5$.$^{23}$ If a psychological theory is based on a concept of a positive effect with these implications, then it would be unfair to replace a median hypothesis by the hypothesis of strict stochastic order: results with crossing ogives would be regarded as instances of refutation, although the non-occurrence of such results does not follow from the psychological theory. However, due to the median paradox, a test of the aggregate hypothesis $\mu^*_{\pi a} < \mu^*_{\pi b}$ would be unfair, too.
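The kind of effect described here can be made concrete with quantile functions. In the hypothetical pair of distributions below (invented for illustration), the median rises from 3 under a to 4 under b, but the $p$th quantile is greater under b only for $0.45 < p \le 0.55$ and smaller for all other $p$; the ogives therefore cross, and strict stochastic order does not hold. The quantile is taken here as the smallest scale value whose cumulative probability reaches $p$, which is one common convention among several.

```python
from fractions import Fraction as Fr


def quantile(dist, p):
    """Smallest scale value x with cumulative probability F(x) >= p (0 < p <= 1)."""
    cum = Fr(0)
    for x in sorted(dist):
        cum += dist[x]
        if cum >= p:
            return x
    raise ValueError("probabilities must sum to 1")


# Hypothetical distributions: median 3 under a, median 4 under b.
dist_a = {2: Fr(45, 100), 3: Fr(10, 100), 6: Fr(45, 100)}
dist_b = {1: Fr(45, 100), 4: Fr(10, 100), 5: Fr(45, 100)}

grid = [Fr(k, 100) for k in range(1, 101)]  # p = 0.01, 0.02, ..., 1.00
b_higher = [p for p in grid if quantile(dist_b, p) > quantile(dist_a, p)]
b_lower = [p for p in grid if quantile(dist_b, p) < quantile(dist_a, p)]

print("median a:", quantile(dist_a, Fr(1, 2)), "median b:", quantile(dist_b, Fr(1, 2)))
print("b higher for p in", float(min(b_higher)), "...", float(max(b_higher)))
print("b lower for", len(b_lower), "of", len(grid), "grid points")
```

A median hypothesis counts this pattern as a positive effect of b, whereas the hypothesis of strict stochastic order counts it as a refutation.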

A more general methodological consequence of the median paradox is a warning against the common practice of 'intuitive' derivation of statistical hypotheses, in which a property intended to refer to individuals is translated into a formally identical aggregate hypothesis. Perhaps the disregard for such warnings is partly due to the fact that they have frequently been based on the lack of aggregation stability of highly formalized mathematical models such as exponential learning models (Sidman, 1952 [21]) or the Thurstonian law of comparative judgment (Bakan, 1967/1970 [4]). But the difference in the behaviour of expectations and medians under aggregation should make us cautious towards assertions (e.g., Hager, 1992, p. 21 [9]) that the choice of particular parameters for a statistical hypothesis does not affect the validity of a hypothesis deduction from a psychological theory that is not itself formalized mathematically, and that this choice can be governed entirely by the scale level of the dependent variable and by the need to control the power of statistical tests. In order to derive an aggregate hypothesis leading to valid and fair tests of a psychological hypothesis referring to individuals, due attention must be paid to the aggregation stability of the hypothesized property.


Methods of Psychological Research 1997 Vol.1 No.4
© 1997 Pabst Science Publishers