| HOME | HELP | FEEDBACK | SUBSCRIPTIONS | ARCHIVE | SEARCH | TABLE OF CONTENTS |
LETTERS TO THE EDITOR
Abteilung für Psychosomatische Medizin und Psychotherapie, Universitätsklinikum Freiburg, Freiburg, Germany, armin.hartmann{at}uniklinik-freiburg.de
In a recent article, Vickers criticizes the use and abuse of analysis of variance (ANOVA) (1). He identifies at least two major problems. First, the results of ANOVAs of randomized controlled trials (RCTs) are badly reported: very often only p-values are given, without measures of effect size, and ANOVA does not provide adequate statistics to contrast groups. Second, the results of ANOVAs for repeated measurement are hard to understand and easily misinterpreted, so other kinds of analyses such as "longitudinal mixed models" or "generalized linear modeling" are preferable.
In my opinion this is mainly a "cross-cultural" misunderstanding, as Vickers seems to belong to the culture of medical biometricians, whereas ANOVA belongs to the culture of psychological methodologists (of which I am a member). Let me therefore argue for my old "aunt" ANOVA.
RCTs and ANOVA
I do agree that any statistical analysis is incomplete if effect sizes are not determined. I would even claim that no reviewer of any journal should accept a manuscript in which measures of effect size are missing. But this problem is not specific to ANOVA; it arises with many other statistical procedures too. As a solution, Vickers suggests reporting differences between means and their corresponding confidence intervals, but I think that this is not an adequate measure of effect size. In psychotherapy research, Cohen's measures of effect size are widely used and well understood. An excellent and comprehensive overview is given by Rosenthal (2), who provides formulas for their computation and shows their relation to other measures of effect size.
It is just not true that clinically relevant differences cannot be detected or reported when the overall difference between groups is nonsignificant. To my knowledge, any statistical package allows for the computation of contrasts between groups. Another issue is the power of trials, where we very often must conclude that the sample sizes were too small to detect small differences (which would be the case for the constructed trial in Vickers' example).
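To illustrate what a contrast between groups involves, here is a minimal sketch in Python of a planned contrast in a one-way design. It is not taken from Vickers' article or from any particular statistical package; the function name `planned_contrast`, the toy data, and the contrast weights are assumptions made for illustration only. It uses the pooled error variance of all groups, as a standard one-way ANOVA does.

```python
import math
import statistics


def planned_contrast(groups, weights):
    """Estimate a planned contrast between group means in a one-way design.

    groups  -- list of lists of scores, one list per treatment group
    weights -- contrast weights, one per group, summing to zero
    Returns (estimate, standard_error, t_statistic, error_df).
    """
    if len(groups) != len(weights):
        raise ValueError("need exactly one weight per group")
    if abs(sum(weights)) > 1e-9:
        raise ValueError("contrast weights must sum to zero")

    means = [statistics.mean(g) for g in groups]
    sizes = [len(g) for g in groups]

    # Pooled within-group (error) variance, as in a one-way ANOVA.
    error_df = sum(sizes) - len(groups)
    ss_error = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_error = ss_error / error_df

    # Contrast estimate and its standard error.
    estimate = sum(w * m for w, m in zip(weights, means))
    se = math.sqrt(ms_error * sum(w ** 2 / n for w, n in zip(weights, sizes)))
    return estimate, se, estimate / se, error_df


# Hypothetical scores for three groups; compare group 1 with group 3.
toy_groups = [[1, 2, 3], [2, 3, 4], [4, 5, 6]]
estimate, se, t, df = planned_contrast(toy_groups, [1, 0, -1])
print(estimate, se, t, df)
```

The t statistic is referred to a t distribution with the error degrees of freedom of the full design, so the contrast can be tested and reported whether or not the overall F test is significant.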
Yes, not all ANOVA procedures of the standard statistical packages provide "clickable" options for the computation of effect sizes. On the other hand, an experienced statistician should be able to write some program code, which, with modern statistical software, is not much more work than a few clicks. A simple table for the computation of Cohen's d, realized with SAS-JMP, is available from the author and may be freely distributed.
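As an indication of how little code is needed, the following is a minimal sketch in Python of Cohen's d for two independent groups, using the pooled standard deviation. It is not the SAS-JMP table mentioned above; the function name `cohens_d` and the example scores are assumptions made for illustration.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 in the denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(
        ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    )
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


# Hypothetical outcome scores for a treatment and a control group.
print(cohens_d([2, 4, 6], [1, 3, 5]))
```

With these toy scores the means differ by 1 and the pooled standard deviation is 2, so d is 0.5.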
To complain that the results of ANOVAs for repeated measurement are hard to understand is just not fair. Have you ever tried to explain the meaning of an odds ratio to somebody who is able to understand a risk difference only? Any scientific methodology must be learned and taught (sometimes to a whole community). The proposed solution, to use a regression with baseline scores, is statistically more or less the same. Its rationale and interpretation also need explanation or expertise, so it seems to me more a matter of taste (or of belonging to a certain culture) which choice one makes.
I agree that it is not sufficient to report only the significance (and F, df) of a time x treatment interaction. This is like the first argument: significance is worthless without effect size. A picture is worth a thousand numbers; some visualization is required as soon as there are more than two points of measurement. The proposed solution, applying mixed models or hierarchical linear models, is fine; but these models are even more complex and harder to understand for inexperienced statisticians, let alone clinicians. Last but not least, these models require some decisions about the nature of the development of scores over time. For example, change can be modeled with "growth curves"/hierarchical linear models (3,4). These are a special case of mixed models and they need a "formula" for the level-1 models of change. Researchers have to decide in advance on the best theoretical model of improvement. Linear, higher-order exponential, and logarithmic functions have been discussed and fitted (5-9), but it is still an open issue what we should expect and which function to use (10).
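To make the required decision concrete, the following is a sketch of what such a level-1 "formula" might look like in common hierarchical linear model notation. The symbols are assumptions chosen for illustration and are not taken from the cited references: Y_{ti} is the score of patient i at measurement occasion t, X_i is a treatment indicator, and e_{ti}, r_{0i}, r_{1i} are residual terms.

```latex
% Level 1, linear change: each patient has an individual intercept and slope
Y_{ti} = \pi_{0i} + \pi_{1i}\, t + e_{ti}

% Level 1, logarithmic change: an alternative functional form for the same data
Y_{ti} = \pi_{0i} + \pi_{1i}\, \ln(t + 1) + e_{ti}

% Level 2: individual intercepts and slopes depend on treatment group X_i
\pi_{0i} = \beta_{00} + \beta_{01} X_i + r_{0i}
\pi_{1i} = \beta_{10} + \beta_{11} X_i + r_{1i}
```

In either variant the treatment effect on the rate of change is carried by \beta_{11}, but its meaning depends on which level-1 function was chosen, which is exactly the decision that has to be made in advance.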
To summarize, my good old "aunt" ANOVA is not as bad as Vickers suggests. She can do more (contrasting groups). She causes fewer problems than her younger relatives (HLMs, mixed models) do. They make life complex, whereas she gives you a K.I.S.S. (keep it safe and simple). Her limitations and statistical requirements are well known (who can't spell homogeneity?). If you do not ask for more than ANOVA promises to give (a comparison of means between groups and/or over time), you will get reliable and interpretable answers (if you have learned to speak ANOVA).
I therefore conclude that what is needed is a "family tree" of methods and (longitudinal) designs, including the related family of event occurrence analysis (survival analysis, Cox regression). A decision tree or a mental map of the advantages and shortcomings of the available designs and their corresponding statistical methods could show researchers which option best matches their research questions. Such a tool would show that ANOVA, among others, is still a good choice for the analysis of RCTs when the issue is testing differences of group means in a design with a fixed schedule of measurement.
DOI:10.1097/01.psy.0000199925.51075.36
REFERENCES