Whether tests for quantitative trends are applied usually depends on one of two prerequisites: first, that the independent variable is quantitative; second, that the independent variable is quantitative and a particular quantitative trend hypothesis is to be tested (see Keppel, 1973, p. 114). In the first case, the experimenter does not proceed from specific expectations but simply looks for the best functional description of the data. In the second case, however, the data are examined as to their compatibility with predictions derived from a certain theory or substantive (i.e., psychological) hypothesis. Under these circumstances, it is always possible to specify the exact relations between the independent variable and the values of the dependent variable in advance. Although this article deals exclusively with the case of testing theories and (psychological) hypotheses by means of predictions derived from them, the considerations presented here are relevant to other cases, too.
The distinction between substantive or psychological hypotheses and statistical hypotheses is often either blurred or not taken into account in the empirical psychological literature as well as in some textbooks. Psychological hypotheses refer to psychological constructs such as 'aggression,' 'self-esteem,' or 'imagery,' and they 'treat the phenomena of nature and man' (Clark, 1963, p. 457). In contrast, 'statistical hypotheses concern the behavior of observable random variables' (Clark, 1963, pp. 456-457) such as 'population variances,' 'population means,' 'population correlations,' and 'distribution functions.' Most often, psychological hypotheses are examined using statistical hypotheses which are only loosely related to them. This is the case, for instance, when the psychological hypothesis enables the prediction of a certain rank order of parameters across several experimental conditions and the well-known F test is applied, which tests against the hypothesis that not all parameters (population means) are equal or homogeneous.
Some authors call for a closer connection between the psychological hypothesis and the statistical hypothesis or hypotheses. They specifically demand that statistical hypotheses should be derived from the psychological one, 'even in a rather loose sense of derive' (cf. Hager, 1987, 1992; Meehl, 1967; Wampold, Davis & Good, 1992; Westermann & Hager, 1986). Hager (1992, pp. 54-68) has argued that this derivation should preserve the psychological hypothesis' empirical content as it is understood by Popper (1981, 1992). To this end, he has proposed two additional criteria of derivation, namely appropriateness and exhaustiveness.
'Appropriateness' means that the derived statistical hypothesis has to conform to the direction of the relation claimed in the psychological hypothesis, and 'exhaustiveness' means that a prediction has to encompass any relation or aspect of the psychological hypothesis which can be expressed by statistical concepts (see Hager, 1987, 1992, and Hager & Hasselhorn, 1995, for further details). If a statistical hypothesis is connected to a psychological hypothesis by a derivation and if it meets the two criteria just mentioned, it is called a statistical prediction (SP for short). This linkage between two kinds of hypotheses by a derivation together with two criteria seems necessary and sufficient to ensure an unambiguous separation of those results which are in complete accordance with the psychological hypothesis from those that contradict it. Such a partition of possible results conforms to demands formulated by Fisher (e.g., 1966) as well as by Popper (1980). In the empirical psychological literature, however, this basic principle, advocated independently by a statistician and by a philosopher, is very often violated, as the analyses by Hager (1992), by Hager and Westermann (1983), and by Westermann and Hager (1986) show.
A statistical prediction is a special statistical hypothesis which is not necessarily equivalent to the null or the alternative hypothesis of a (widespread and/or single) statistical test. A null hypothesis (H0) is any statistical hypothesis which comprises one of the signs '=', '≤', or '≥' and which is testable by a given statistical test. Its opposite is an alternative hypothesis (H1), which usually is complementary to the H0 and against which the test is performed. Furthermore, an H1 usually refers to the relations '≠', '>', or '<'. This distinction is made in most textbooks for psychologists (see Hays, 1988; Howell, 1992; Kirk, 1982; Wilcox, 1987; Winer, Brown & Michels, 1991) and suffices for the purposes of this article. If the statistical prediction is not equivalent to a single testable H0 or H1, there are basically two options: either to perform a less well-suited test and interpret the 'apparent' empirical relations among the sample statistics, or to apply more than one test. The more tests are performed, the greater the cumulation of statistical error probabilities, but in general also the greater the information gained. The cumulation can be adjusted for, but the possible adjustments will not be considered in any detail here (see, among many others, Hochberg & Tamhane, 1987; Kirk, 1982, 1994; Miller, 1981; Westermann & Hager, 1986).
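The cumulation of error probabilities over several tests can be illustrated with a minimal sketch. It assumes, purely for illustration, that the tests are independent and that every null hypothesis is true; the function names and the Bonferroni adjustment shown are illustrative choices, not a procedure prescribed in this article.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false rejection across n_tests tests.

    Assumption (illustrative): the tests are independent, each is run at
    level alpha, and all null hypotheses are true.
    """
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_level(alpha_total: float, n_tests: int) -> float:
    """Per-test level that keeps the cumulated error at or below alpha_total.

    This rests on the Bonferroni inequality and needs no independence.
    """
    return alpha_total / n_tests


if __name__ == "__main__":
    # Cumulated error probability and adjusted per-test level for 1, 3, 6 tests.
    for n_tests in (1, 3, 6):
        print(n_tests,
              round(familywise_error_rate(0.05, n_tests), 4),
              round(bonferroni_level(0.05, n_tests), 4))
```

Under these assumptions, three tests at the .05 level already cumulate to an error probability of roughly .14, which is why the number of tests derived from a statistical prediction matters.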
Choosing the first option means that one or both of the principles of appropriateness and exhaustiveness with respect to the particular statistical prediction are violated by the statistical hypotheses actually tested, and/or that the decisions made are mainly data-based. Data-based decisions rely on statistical tests and on subsequent differential interpretations of data patterns. If, for example, the significance of an overall F test is taken as the basis for interpreting the rank order of the sample means as being the same as that of the population means, this is a data-based decision not covered by the test performed. If the F test is performed on a comparison with more than one degree of freedom, it does not refer to distances among the means, but to a quadratic function of these distances, which are squared, summed up, and averaged for the purposes of the F test. Besides, more individual decisions are made than are covered by the nominal significance level of the F test (Ramsey, 1980), as the increase in the conditional probabilities α and/or β depends on the number of decisions actually made rather than on the number of tests performed. The 'correct' test-based interpretation of a significant F value only permits saying that there are at least two population means different from one another. The numerous techniques of multiple comparisons can be said to have been developed to replace mainly data-based statements with test-based propositions, controlling for the cumulation of the error probability α. In contrast to data-based decisions, test-based decisions are based on tests only and are not modified, 'corrected,' or augmented by additional interpretations of the data patterns. These considerations should not be taken as an argument against careful data inspection, which should always be done. The present article deals with some testing strategies whose application enables making test-based decisions and avoiding data-based decisions.
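The point that the overall F test responds only to a quadratic function of the distances among the means, and not to their rank order, can be made concrete with a small sketch. The group means, the common group size n, and the within-groups mean square used here are hypothetical values chosen for illustration; equal n is assumed.

```python
from itertools import combinations


def between_groups_ss(means, n):
    """Between-groups sum of squares for groups of equal size n."""
    grand_mean = sum(means) / len(means)
    return n * sum((m - grand_mean) ** 2 for m in means)


def between_groups_ss_pairwise(means, n):
    """The same quantity written in terms of squared pairwise distances."""
    k = len(means)
    return n / k * sum((a - b) ** 2 for a, b in combinations(means, 2))


def f_statistic(means, n, ms_within):
    """Overall F value for a one-way layout with equal n (df1 = K - 1)."""
    k = len(means)
    return between_groups_ss(means, n) / (k - 1) / ms_within


if __name__ == "__main__":
    ordered = [10.0, 12.0, 14.0, 16.0]    # hypothetical means, strictly increasing
    scrambled = [14.0, 10.0, 16.0, 12.0]  # the same means in a different order
    # Both orderings yield the same F value: the test ignores rank order.
    print(f_statistic(ordered, n=10, ms_within=8.0))
    print(f_statistic(scrambled, n=10, ms_within=8.0))
```

Because any permutation of the means leaves the squared pairwise distances unchanged, a significant F value by itself cannot support a statement about the order of the population means.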
If, on the other hand, the statistical hypotheses actually tested turn out to be only loosely linked to the psychological hypothesis of interest or to the statistical prediction derived from it, the probability of false decisions concerning the psychological hypothesis can be increased substantially, or in more general terms: the probability of false 'truths' can be increased greatly. I will cite no examples from the current empirical literature to demonstrate this, but will rather deal with some textbook presentations; empirical researchers should not be expected to act in a more sophisticated manner than textbook authors. To lower the probability of false 'truths,' it is important to apply the criteria of appropriateness and exhaustiveness when deriving testable statistical hypotheses from the statistical prediction or when decomposing it into testable partial hypotheses. Several of the subsequent considerations will focus on this demand.
If psychologists describe relations among variables by means of mathematical functions, they aim for a greater degree of exactness or precision than is possible with less precise methods of description. This goal of exactness, however, may be rendered unattainable by choosing tests which are not exact enough: one works with a very precise and seemingly exact scientific terminology and hypotheses, but because of inappropriate statistical procedures the hypotheses actually tested do not reflect the quantification or the functionality to a sufficient degree. This lack of correspondence between the quantitative hypothesis to be tested and the one actually tested will be examined subsequently. It will also be argued that certain tests of qualitative trend hypotheses can result in analogous problems.
By calling a trend hypothesis a statistical prediction, it is meant that hypotheses of this kind can occur in empirical research, that is, may serve as the target hypothesis to be tested. I shall not deal with psychological hypotheses leading to a particular statistical trend prediction, but intend to discuss some trend hypotheses and their relation to some commonly administered tests. Thus, the main question I seek to answer is: Given a particular (quantitative or qualitative) trend hypothesis, which of some well-known statistical tests is best suited to test it, where 'best suited' refers not to statistical assumptions but to features of trends? This question will be discussed from the perspective of the method of planned or focussed contrasts (among expectations of normally distributed random variables or population means), since '... it is to the experimenter's advantage to specify a select, limited number of contrasts in advance' (Kirk, 1982, p. 106). The usual parametric assumptions are taken for granted throughout, equal n's in a one-way layout are assumed, and the quantitative variable X (values X_1 through X_K) is equidistant. Despite these restrictions, the general considerations are applicable to other parameters, tests, and layouts than those addressed herein (see, for example, Marascuilo & McSweeney, 1977). Furthermore, it is assumed that appropriate power analyses for controlling both conditional error probabilities (α and β) take place (see Cohen, 1988; Hager, 1987, 1995). No reference will be made to more robust alternatives to the tests considered (see, e.g., Wilcox, 1987) or to the various procedures of ordering and selection which seem to be more appropriate for data analyses after data collection (see, e.g., Dykstra, Robertson & Wright, 1986; Lovie, 1986; Robertson, Wright & Dykstra, 1988; and Wilcox, 1987, chap. 12).
These techniques, however, may be applied in addition to the tests considered here, but the examination of psychological hypotheses formulated in advance should be separated carefully from additional data analyses which could also be interesting. Testing psychological hypotheses means that the kind of trend can and, most importantly, should be predicted prior to data collection. Since the testing strategies proposed subsequently mainly consist of suggestions on how to link certain well-known tests, no reference will be made to particular computer programs for data analyses.
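As a generic illustration of the planned-contrast perspective under the restrictions stated above (equidistant X, equal n, one-way layout), the following minimal sketch computes a single linear trend contrast specified in advance. The data values, the function names, and the use of centered X values as contrast coefficients are assumptions made for illustration; the sketch is not tied to any particular statistics package.

```python
import math


def linear_contrast_coefficients(x_values):
    """Centered X values, used here as linear trend coefficients.

    They sum to zero; for equidistant X they are proportional to the
    linear orthogonal polynomial coefficients.
    """
    mean_x = sum(x_values) / len(x_values)
    return [x - mean_x for x in x_values]


def contrast_t(means, coefficients, n, ms_within):
    """t value of a planned contrast with equal n per group.

    The statistic has K * (n - 1) degrees of freedom, K = number of groups.
    """
    estimate = sum(c * m for c, m in zip(coefficients, means))
    standard_error = math.sqrt(ms_within * sum(c * c for c in coefficients) / n)
    return estimate / standard_error


if __name__ == "__main__":
    x = [1, 2, 3, 4]                       # equidistant levels of X (hypothetical)
    c = linear_contrast_coefficients(x)    # [-1.5, -0.5, 0.5, 1.5]
    increasing = [10.0, 12.0, 14.0, 16.0]  # hypothetical sample means
    scrambled = [14.0, 10.0, 16.0, 12.0]   # same means, different order
    print(contrast_t(increasing, c, n=10, ms_within=8.0))
    print(contrast_t(scrambled, c, n=10, ms_within=8.0))
```

Unlike an overall test of homogeneity, the contrast depends on which condition each mean belongs to, so a directional hypothesis about the linear component can be tested directly. A significant linear contrast alone, however, says nothing about possible deviations from linearity.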
Methods of Psychological Research 1996, Vol. 1, No. 4
© 1997 Pabst Science Publishers