Psychosomatic Medicine
Psychosomatic Medicine 64:531-547 (2002)
© 2002 American Psychosomatic Society


ORIGINAL ARTICLES

Some Statistical Issues in the Analyses of Data From Longitudinal Studies of Elderly Chronic Care Populations

Eva Petkova, PhD and Jeanne Teresi, EdD, PhD

From the New York State Psychiatric Institute and Columbia University, Department of Biostatistics (E.P.), New York, New York; and Columbia University, Stroud Center, New York State Psychiatric Institute and Hebrew Home for the Aged at Riverdale (J.T.), Riverdale, New York.

Address reprint requests to: Eva Petkova, PhD, The New York State Psychiatric Institute and Columbia University, Department of Biostatistics, 1051 Riverside Drive, Unit 48, New York, NY 10032. Email: ep120{at}columbia.edu


    ABSTRACT
 
OBJECTIVE: This article discusses broad statistical issues common to much medical research: intent-to-treat analysis vs. completers analysis; clustered hierarchical and repeated-measures data; missing data and dropouts; and assessment of direct, indirect, and total effects. Traditional approaches and statistical techniques are reviewed and contrasted with modern methods for analysis of medical studies.

METHOD: The concepts are introduced and discussed in general terms; they are illustrated with an example. The example comes from a study of the effect of residence in special care units (SCUs) for demented elderly on the daily function of nursing home residents. More than 700 residents from 22 nursing facilities, residing in either an SCU or a non-SCU, were assessed three times at approximately 6-month intervals.

RESULTS: Results from both the application of traditional statistical techniques and modern methods for the analysis of repeated-measures of hierarchical multicenter data are presented, interpreted, and compared. Advantages and shortcomings of these approaches are discussed.

CONCLUSION: This article advocates the use of mixed models and proper causal reasoning and terminology in the analysis and publication of results from studies on aging and life course.

Key Words: observational studies, direct and total effects, dropouts, intent-to-treat, longitudinal data, mixed-effects models.

Abbreviations: ANOVA = analysis of variance; LOCF = last observation carried forward; RCT = randomized clinical trials; SCU = special care unit; ITT = intent-to-treat; TR = treatment received.


    INTRODUCTION
 
The purpose of this article is to revisit some statistical issues related to the analysis of longitudinal data, with a special focus on elderly chronic care populations. Although long known to statisticians, some of these issues are not frequently addressed in the nonstatistical literature. This article includes several main focuses: design issues (eg, randomized trials vs. observational studies), clustering, analytic strategies (eg, intent-to-treat and last observation carried forward), statistical methods (eg, analysis of variance and mixed models), and model assumptions (eg, patterns of longitudinal correlation and missing data).

Although randomized clinical trials arguably constitute the gold standard for much research, special problems associated with the study of elderly, chronic care populations in some settings result in the use of different designs. Observational studies remain a mainstay for investigation of many topics in aging (and nonaging) research.

A focus of the example provided below is the comparison of two methods of analysis of the longitudinal data from an observational study whose central goal was to estimate the difference in efficacy of two ways of delivering care to elderly nursing home residents: special care units and traditional care. All statistical models are introduced under the rubric of the general linear model; however, some models are simple analysis of variance (ANOVA), whereas others include both fixed and random effects. Although the latter are introduced here as "modern," the modeling of random effects has a long history in other fields and, in fact, has been used extensively in reliability studies by psychologists. However, only recently have software and computational capabilities emerged that allow widespread use of these statistical models for analysis of large data sets. Moreover, although the relatively restrictive assumptions of traditional ANOVA models are well known, such models are still frequently used as a first-line method of analyzing longitudinal data from clinical trials. Similarly, last observation carried forward (LOCF), an approach discussed below as inadequate, is still widely used, even by sophisticated researchers: peer-reviewed medical journals to this day publish articles reporting the results from LOCF. In part, the disconnect between statisticians and investigators in terms of recommendations and practices is due to a combination of lack of knowledge, mistrust of or discomfort with newly proposed statistical methods, and the belief that the choice of model does not really make much difference. The purpose of this article is to illustrate and discuss the ways in which the application of nonoptimal methods can and does make a difference.

One article cannot deal with all statistical methods of potential use in medical research; therefore, discussed here are statistical techniques for the analysis of clustered, hierarchical and/or longitudinal data, reflecting the fact that most medical studies collect data that fall into one or more of these categories. The overall presentation is concerned with conceptual rather than with technical aspects of different approaches, emphasizing the assumptions underlying the statistical techniques and the conditions under which they provide a basis for valid inference.


    OVERVIEW OF THE PROBLEMS
 
Below, we list several features of medical studies as well as the potential problems related to each. A comprehensive and accessible discussion of these and other important advanced topics in medical research is presented by Piantadosi (1).

Randomized Clinical Trials and Observational Studies
Randomized clinical trials (RCT) are characterized by randomization and blinding. Randomization is the principal method used to reduce selection bias in comparative studies. Randomization is effective in this role because it guarantees, theoretically, that both observed and unobserved baseline differences between the treatment groups are attributable to chance, the effects of which can be quantified. After accounting for chance variations, the remaining differences can be attributed reliably to the treatments so long as other sources of bias have been eliminated.

Observational studies are those in which the investigators do not control the assignment of treatment to individual study subjects. Observational studies include cohort and case-control studies. In a cohort study, a group of people is followed through some period of time to study the occurrence (or nonoccurrence) of a specified event of interest. In a case-control study, the subjects who develop the disease (the case subjects) are registered by some mechanism other than follow-up, and a group of healthy subjects (the control subjects) is used to represent subjects who do not develop the disease, thus eliminating the need for follow-up. Observational, nonexperimental studies also can provide valid and convincing evidence of treatment efficacy. For example, epidemiological designs such as those just mentioned provide an efficient way in which to study rare events and can control some important sources of random error and bias as well.

Observational studies, however, have been viewed as providing a lower standard of evidence for treatment efficacy than randomized trials because without randomization, unknown confounders cannot be controlled. In this view, treatment comparisons from observational studies may be suggestive but do not provide definitive tests of efficacy. The relative merits of RCTs and observational studies have recently been debated in the literature (2), with some authors (eg, Ref. 3) suggesting that socially complex service interventions may not be amenable to study using a classic RCT approach. Using meta-analysis of both types of studies, investigators have challenged the widely accepted guideline that RCTs are always of a higher order and superior to case-control and cohort studies. These analyses provide evidence that well-designed observational studies can produce valid results (4, 5). In our view, RCTs remain the gold standard; however, RCTs often are compromised by difficulties in recruitment and by dropout. Many factors contribute to the subjects’ failure to complete intended therapy, including side effects, disease progression, patient or physician preference for a different treatment, and/or change of mind. Moreover, some topics are not amenable to study with an RCT due to practical or ethical considerations, for example, attempting to assess the impact of residence in special care units (SCUs) for demented adults (this example is used later in the article). A controlled clinical trial would require randomizing subjects to either an SCU or to a traditional unit on admission to the nursing home. Although theoretically not impossible, such a controlled experiment is unlikely to take place because of ethical and practical issues relating to the ability of demented individuals to give informed consent and family members’ unwillingness to subject their relatives to experimentation. In such instances, the best available evidence may be based on carefully performed and carefully analyzed observational studies.

Multicenter, Clustered, and Other Sampling Designs
Many medical studies are multicenter studies in which the data are collected from several sites. Generally, multicenter studies are performed because there is not a sufficient number of suitable patients in any one center or with the deliberate intention of assessing the effectiveness of treatments in more than a single setting. Data sets in which the observations have a natural grouping are termed hierarchical. Measurements on subjects coming from the same center might be correlated. Additionally, there is often extra variability in treatment effect estimates due to differences among centers. The analysis of such data needs to accommodate these features.

One such mode of accommodation is to include the center effect and the center by treatment effects in the model used to estimate the treatment effect. Traditionally, this has been accomplished by fitting some fixed effects model such as two-factor (treatment and center) ANOVA with an interaction term. Fitting center and center by treatment effects as fixed might be appropriate when local causal inference is required (ie, when inferences pertain only to the centers used in the study). On the other hand, when global inference is sought regarding circumstances and locations for which the experimental centers can be viewed as a sample from the universe of centers, center and center by treatment effects should be fitted as random (see the section on mixed-effects models). A significant interaction term (center by treatment) is an indication that the treatment effect was different at different centers, thus requiring the estimation of several treatment effects. If the interaction term center by treatment can be omitted, estimating the center effects as random can increase the accuracy of the treatment estimate because information from the center error stratum is used in addition to that of the residual error stratum. The amount of extra information available will depend on the degree of treatment imbalance within the centers and the relative sizes of the error variances. In the analysis of multicenter trials, it is important to check whether results from any particular center constitute outliers. If a center outlier is present, it might be an indication of a severe protocol violation or it might point to a distinct characteristic of a particular center that makes the treatment particularly successful or unsuccessful. When the centers are modeled as fixed effects, spurious outlying estimates due to random variation may occur, particularly in centers with a small number of study subjects. By contrast, estimates obtained using a random effects model adjust for this spurious effect by producing shrunken (toward the overall mean) estimates of center and center by treatment effects.
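
The shrinkage of center estimates toward the overall mean can be illustrated with a minimal sketch. It assumes a simple random-intercept model in which the between-center variance `tau2` and the residual variance `sigma2` are already known (in practice both are estimated from the data), and it uses the pooled mean of all observations as a stand-in for the estimated overall mean. The function name and the numbers are hypothetical and do not come from the study discussed in this article.

```python
def shrunken_center_means(center_data, tau2, sigma2):
    """Shrink each center's raw mean toward the overall mean.

    center_data: dict mapping a center label to a list of outcomes.
    tau2: assumed between-center variance.
    sigma2: assumed residual (within-center) variance.
    """
    all_values = [y for ys in center_data.values() for y in ys]
    grand_mean = sum(all_values) / len(all_values)
    shrunken = {}
    for center, ys in center_data.items():
        n = len(ys)
        raw_mean = sum(ys) / n
        # Weight on the center's own data; it grows with the number of subjects.
        weight = n * tau2 / (n * tau2 + sigma2)
        shrunken[center] = grand_mean + weight * (raw_mean - grand_mean)
    return grand_mean, shrunken


# Hypothetical data: a larger center "A" and a small center "B" with an extreme mean.
data = {"A": [10, 12, 11, 13, 9, 11, 12, 10], "B": [18, 20]}
grand_mean, estimates = shrunken_center_means(data, tau2=4.0, sigma2=16.0)
print(grand_mean, estimates)
```

In this sketch the small center is pulled more strongly toward the overall mean than the large one, which is the behavior that protects against spurious outlying estimates in centers with few subjects.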

Dropouts, Missing Data, and Nonadherence
Most medical studies produce data that are less perfect than those stipulated by the study protocol. Data imperfections often are occasioned by inability to collect a measurement at a scheduled time. Another reason for missing data, especially in psychiatric and psychological studies, might be a subject’s refusal to complete some of the tests, often a result of the time-consuming nature of some interviews and questionnaires.

Traditional statistical approaches to this challenge have included imputation of numbers in place of the missing values or the use of data only from individuals with complete observations. Both approaches suffer from ineffective use of available data, and the inferences based on either of them are vulnerable to selection bias (6, 7). Modern statistical methods offer an alternative by allowing all available information from every study subject, weighted appropriately, to be incorporated in the analysis. Thus, the selection bias due to missing data, although not completely eliminated, is minimized.

Analysis "as Randomized" vs. "as Treated"
As discussed above, the analysis usually will be complicated by an unknown number of subjects who fail to appear for some assessment visits or who withdraw from the study before it is completed. Most work regarding the treatment of nonadherence has focused on ways of measuring or preventing nonadherence and on ways of improving statistical estimates of treatment effects when adherence is a problem. There is a continuing debate about the advantages and problems in the analysis of trials based on treatment assigned, as contrasted with treatment received (8, 9). Treatment-received (TR) connotes an approach in which subjects are analyzed only according to the treatment actually given, even if the randomization called for something else. In the statistical literature, TR classically refers to the particular case when the two compared treatments are a new therapy vs. no therapy or standard/default treatment. In such cases, when a subject drops out from the new therapy, the subject is assumed to have switched to the alternative experimental therapy. The TR analysis then treats the dropouts from the new therapy as receiving the standard/default therapy. Many medical studies, however, compare the new treatment to a treatment that is not the standard/default treatment: the placebo treatment, given in many clinical trials, is not the standard treatment that would have been received if the patient were not in the study. Therefore, it would be inappropriate to treat the dropouts from the new treatment group as having received the comparison experimental therapy. In these cases, the dropouts would not be analyzed as part of any experimental group in the TR analysis. Thus, in such cases, the TR analysis is a "completers" analysis or "complete cases analysis."

Intent-to-treat (ITT) is the view that patients in a randomized clinical trial should be analyzed as part of the treatment group to which they were assigned even if they did not actually receive the intended treatment. The term "intent-to-treat" seems to have been originated by Hill in 1961 (10) and has been associated with the adage "once randomized, always analyzed." It can be defined generally as an analysis that "...includes all randomized patients in the groups to which they were randomly assigned, regardless of their adherence to the entry criteria, regardless of the treatment they actually received, and regardless of subsequent withdrawal from treatment or deviation from the protocol..." (11). Thus, ITT is an approach to several types of protocol nonadherence.

Investigators conducting clinical trials cannot guarantee that participants will always complete (or even receive) the treatment assigned. In nearly all circumstances, failure to complete the assigned therapy is partially an outcome of the study and, therefore, may produce nonignorable bias if used as a basis for selection of individuals for subanalysis (eg, completers of the treatment protocol). From this perspective, a clinical trial with ITT analysis provides a test of treatment policy and is not a test of the actual treatment. Inferences based on ITT refer to the treatment policy: they are always unbiased with respect to statements about policy or programmatic efficacy but are not necessarily unbiased for treatment efficacy. Of course, the inference about the treatment policy will be valid only for populations with a similar compliance pattern. For example, inference based on a sample from a population with high compliance (inpatients) should not be used to make treatment policy for populations with poor compliance (outpatients).
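
The difference between an ITT comparison and a completers comparison can be made concrete with a small sketch. The records below are invented for illustration only; the sketch assumes that a higher outcome is better and that end-of-study outcomes were obtained for everyone, including subjects who did not complete treatment. The names `records` and `arm_difference` are hypothetical.

```python
from statistics import mean

# Hypothetical records: (assigned arm, completed treatment?, outcome at study end).
# Outcomes of non-completers are assumed to have been collected at follow-up.
records = [
    ("new", True, 8), ("new", True, 9), ("new", False, 3), ("new", False, 4),
    ("control", True, 5), ("control", True, 6), ("control", True, 5),
    ("control", False, 4),
]


def arm_difference(rows):
    """Mean outcome in the 'new' arm minus mean outcome in the 'control' arm."""
    new = [y for arm, _, y in rows if arm == "new"]
    control = [y for arm, _, y in rows if arm == "control"]
    return mean(new) - mean(control)


# ITT: everyone is analyzed in the arm to which they were assigned.
itt_estimate = arm_difference(records)

# Completers: only subjects who finished the assigned treatment are analyzed.
completers_estimate = arm_difference([r for r in records if r[1]])

print(itt_estimate, completers_estimate)
```

With these invented numbers the completers estimate is considerably larger than the ITT estimate, because the subjects who dropped out of the new treatment were those with poor outcomes. The two estimates answer different questions (policy vs. treatment actually completed), and neither is guaranteed to be unbiased for the other's question.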

The limitations of the ITT approach have been discussed extensively (12, 13). For example, breakdown of the experimental paradigm can render an analysis plan based on ITT irrelevant for answering biological questions. This does not mean that TR analysis would provide the best solution because TR analyses are subject to biases of their own. The valid performance of the TR approach depends on adjusting for all covariates related to nonadherence or dropout. The investigator would have to be aware of all factors responsible for patients’ failure to obtain an assigned treatment and incorporate those factors in correct statistical models describing the treatment effect. This often is not feasible because investigators do not know the reasons why, or the covariates associated with, patients’ failure to successfully initiate or complete the assigned treatment.

There are a number of statistical methods for modeling noncompliance in intervention studies. In the case of randomized clinical trials, several approaches are available, a discussion of which is beyond the scope of this article: instrumental variables approach (14), latent class approach (15), and structural nested models (16). All these methods make use of the randomization as an instrumental variable. In the case of observational studies, the propensity score approach (17) provides a useful supplement to the usual ITT analysis and can often recover valid estimates of effectiveness when nonadherence is present (13). Propensity score analysis is based on the use of covariates to adjust for baseline differences among the compared groups. When TR analysis is applied, the differences between groups may be due not solely to chance, but to other factors related to compliance. In this approach a "propensity for compliance" (PC) score is computed based on all available information, and the comparison between the groups is performed having adjusted for this score. The resulting estimate of treatment effect will contain less bias than would have the unadjusted estimate. Potential for bias in the treatment effect still exists because of the possibility that important information related to compliance has not been collected and has not been incorporated in the PC score. However, the construction of the PC score helps to ensure that the estimate of the treatment effect contains minimum selection bias for given available information in subgroup analysis. Although a PC score analysis is a covariate adjustment method, it is the adjustment that has the best statistical properties among all possible such adjustments for bias reduction in observational studies (18), although it may not be the best method for RCTs.

The inferences drawn from a clinical trial and the validity of these inferences are a consequence of both the experimental design and the method of analysis. There are legitimate biological questions that cannot be answered effectively even using rigorous designs. Such are the attempts to estimate the efficacy of treatments that produce high dropout rates; for example, an exposure-based behavioral therapy for obsessive-compulsive disorder (OCD), where the patient is forced to confront the obsessive-compulsive behaviors, such as frequent hand-washing and checking. The dropout rate from such treatment is very high, between 50% and 60% (19), and is due to the patients’ difficulty in complying with the exposure required by the therapy. On the other hand, practically all completers of the treatment procedure are responders, ie, the severity of the OCD is reduced at least 50% compared with baseline, and many of the subjects are complete responders (they exhibit no OCD symptoms at the study end). From the patients’ point of view, it is important to know the chances for response if they actually undergo the whole course of the therapy; it would make a difference in patients’ commitment to stay in the treatment if they knew that the response rate among completers of therapy is 90%, 50%, or 25%. From a data-analytic point of view, the completers of the treatment are a subset of the initial random sample selected after the treatment has been initiated; therefore, the estimated treatment effect using only the treatment completers might be partly or entirely due to some unknown potential confounding factor. Nevertheless, in such circumstances, it is not wise to insist on an ITT analysis; instead, an approximate answer to the correct question will be more useful than the exact answer to the wrong question.


    STATISTICAL METHODS
 
Traditional Methods
The mainstay of analysis of both observational and clinical trials data has been the general linear model (cf. Ref. 20). Included under this general rubric are multiple regression (including path analysis), ANOVA, analysis of covariance (ANCOVA), repeated-measures ANOVA, and multivariate analysis of covariance (MANCOVA). A common feature of these statistical techniques is that they model the treatment groups’ means or mean changes in the outcome over time and not individual rates of change. These methods are based on strong assumptions about the data, many of which are typically not satisfied by medical research data. These assumptions are as follows.

Equal Variance of the Outcome Across Treatment Groups.
Experience shows that often in placebo-controlled clinical trials, the group treated with active medication has larger variance at study end than does the placebo-treated group. The reason for this is that although the subjects in two treatment groups start at similar severity levels according to the study inclusion/exclusion criteria, the group treated with active medication usually responds to the treatment, resulting in increased variance of the outcome, whereas the variance of the untreated group remains almost as at baseline. Although it is possible to model heterogeneous variances with traditional methods, such models are not optimal (21).

Balanced Data.
When repeated observations are taken, all study subjects should have the same number of measurements. Because data from most medical studies include missing values due either to occasional missed visits or dropout, the application of these methods requires the use of data from subjects with complete measurements only or imputation of missing data. The most commonly used data imputation technique is termed last observation carried forward. For every missing value, this method substitutes the last observed value of the variable. For example, in a 6-week antidepressant clinical trial, if the subject dropped out of the study at the end of week 2, the ITT analysis based on end of the study data will replace the missing week 6 measurement with the LOCF, ie, the value observed at the end of week 2. The LOCF technique of data imputation could be badly biased (22) in many common situations, and it is unfortunate that it is still so widely used. Alternative data imputation methods are available but rarely used, although often they would introduce less bias than LOCF. Such alternatives include imputing a number randomly selected from the observed end-of-study values on subjects with similar baseline characteristics as the individual who dropped out or imputing a number randomly selected from the observed end-of-study values of subjects with values at the second week (last time the individual who dropped out was observed) similar to the individual who dropped out. No one imputational technique is uniformly superior to the others over all circumstances encountered in medical studies, but it should be emphasized that LOCF is perhaps one of the worst of all alternatives. A good statistical practice involves performing the analysis based on imputed data using a variety of data imputation techniques and comparing the results. Consistency of the inference over a wide range of data imputation methods guarantees more certainty of the findings than does LOCF application alone.
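
The mechanics of LOCF are simple enough to state in a few lines. The following sketch is illustrative only: the function name `locf` and the weekly scores are hypothetical, and missing visits are represented by `None`.

```python
def locf(values):
    """Replace each missing value (None) with the most recent observed value.

    Leading missing values stay None because there is nothing to carry forward.
    """
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


# Hypothetical depression scores at baseline and weeks 1-6;
# the subject drops out after the week 2 assessment.
scores = [24, 20, 17, None, None, None, None]
print(locf(scores))  # [24, 20, 17, 17, 17, 17, 17]
```

The imputed week 6 value is simply the week 2 value, so the method implicitly assumes that nothing changes after dropout. That assumption is the source of the bias noted above.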

Equal Time Intervals Between Repeated Observations on the Same Unit.
The classic repeated-measures ANOVA model treats time of measurement as a nominal factor; this has several consequences. Let the time of the first measurement for all subjects be denoted by zero; for example, the factor "time" denotes time in treatment. If the second measurement on one subject is 7 days later as scheduled by protocol and the second measurement of another subject is 9 days later, that is, 2 days later than scheduled by protocol, it is not appropriate to assume that the measurements on these two subjects are taken at the same level of the factor "time." But this is exactly what is postulated when repeated-measures ANOVA analysis is applied to such data. Another consequence of treating time as a nominal factor, as opposed to a continuous variable, is that it presents some difficulties in understanding the mechanism through which time affects outcome. For example, it is less straightforward to assess a quadratic effect of time through classic repeated-measures ANOVA that assumes the levels of time to be nominal categories than it is through the application of methods that treat time as a continuous variable and at the same time allow for the repeated observations to be mutually correlated. Although the classic linear regression assumes independence among all observations, linear models can accommodate correlated data through the application of appropriate weights (23). Still, when estimating fixed effects for time, the standard error for the estimate of treatment effect should be adjusted to include between- and within-subject variation. However, when there are missing data this adjustment will not be adequate unless additional correction is made for the degrees of freedom.

Equal Correlations Between Repeated Observations on a Subject.
The classic repeated-measures ANOVA is based on the assumption that every two observations on the same subject are as equally correlated as are any other two observations on the subject. This type of equal correlations is termed "compound symmetry." This assumption is at least questionable when the repeated factor is time: the correlation between severity of depression measured at two occasions 1 week apart is likely to be higher than the correlation between two depression measurements taken 6 weeks apart.
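
To make the contrast concrete, the sketch below builds a compound-symmetry correlation matrix and, for comparison, a first-order autoregressive, or AR(1), matrix in which the correlation decays with the time lag. AR(1) is used here only as one familiar example of a decaying pattern; it is an assumption of this sketch rather than a structure named in the text, and the function names and the value of `rho` are hypothetical.

```python
def compound_symmetry(n_times, rho):
    """Correlation matrix in which every pair of occasions has correlation rho."""
    return [[1.0 if i == j else rho for j in range(n_times)]
            for i in range(n_times)]


def ar1(n_times, rho):
    """Correlation matrix in which correlation decays as rho ** lag."""
    return [[rho ** abs(i - j) for j in range(n_times)]
            for i in range(n_times)]


# Four weekly assessments with an assumed lag-1 correlation of 0.5.
for row in compound_symmetry(4, 0.5):
    print(row)
for row in ar1(4, 0.5):
    print(row)
```

Under compound symmetry, measurements 1 week apart and 3 weeks apart are equally correlated; under the decaying pattern, the correlation between the first and last assessment is much smaller than that between adjacent assessments.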

Although these traditional methods of analysis provide unbiased estimators of the regression parameters (ie, the parameters of interest), the estimated variances of these parameter estimates are biased, and, thus, inferences based on them (eg, hypothesis testing and estimation with confidence intervals) might be invalid. Reviews in the clinical literature discuss the problems inherent in typical repeated-measures ANOVA analysis as applied to mental health data (21, 24). In summary, these include violations of assumptions, eg, the assumption of compound symmetry; inability to handle unequally spaced or missing data; and/or inability to model individual differences in slopes (rates of change). Problems in the application of traditional multivariate models when data are missing due to unbalanced designs (subjects are measured at different time intervals) or incomplete data (subjects are missing from some waves of data) have led to the recommended use of mixed-effects modeling, of which growth curve analysis and repeated measures can be viewed as special cases.

Mixed-Effects Models
The key distinguishing feature of mixed-effects models (MEMs) (24–28) compared with traditional statistical methods used in medical research is that they are based on less restrictive assumptions. Among the assumptions that are relaxed in MEMs are those related to the correlation between observations. MEMs allow flexible modeling of the covariance structure of the data and thus can adequately model data in which the observations are not independent. MEMs are also less restrictive in that they can be applied to unbalanced data and to repeated measurements taken at unequal (between and within subjects) time intervals.

There are several types of MEMs that are mathematically equivalent: the random effects model, the random coefficient model, and the covariance pattern model. According to the random effects model, certain effects in the model are assumed to have arisen from a specified underlying distribution and thus to constitute another source of random variation in addition to the residual variance. An example is the center effect in multicenter trials as discussed in the section on multicenter, clustered, and other sampling designs.

In random coefficients models, the covariate effect is allowed to vary randomly. For example, in a 6-week antidepressant trial, the measurement focus might be on the rate of improvement over the course of 1.5 months under the experimental treatments, instead of merely on the response at the study end. The random coefficients model allows the effect of time to vary randomly among patients, corresponding to a subject-specific rate of improvement or a subject-specific slope. This is technically achieved by fitting patients and the patients by time interaction as random; these effects are then referred to as "random coefficients."
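
In symbols, one common way of writing a random coefficients model with a subject-specific intercept and slope is given below. The notation is introduced here only for illustration and is an assumption of this sketch: $y_{ij}$ is the outcome of subject $i$ at occasion $j$, $t_{ij}$ is the time of that measurement, $\beta_0$ and $\beta_1$ are the fixed (population average) intercept and slope, $b_{0i}$ and $b_{1i}$ are the random deviations of subject $i$ from them, and $\mathbf{G}$ is the covariance matrix of the random coefficients.

```latex
y_{ij} = (\beta_0 + b_{0i}) + (\beta_1 + b_{1i})\, t_{ij} + \varepsilon_{ij},
\qquad
\begin{pmatrix} b_{0i} \\ b_{1i} \end{pmatrix} \sim N(\mathbf{0}, \mathbf{G}),
\qquad
\varepsilon_{ij} \sim N(0, \sigma^2)
```

Here $\beta_1 + b_{1i}$ is the subject-specific rate of improvement. Because $t_{ij}$ enters as a continuous variable, subjects need not be measured at the same times or the same number of times.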

The covariance pattern model is a type of MEM that directly models a pattern of correlations among observations. For example, in the hypothetical 6-week antidepressant trial with weekly measurements, we can allow the correlations among the repeated observations on the same subject to follow a certain pattern. An obvious choice is for the lagged autocorrelations to decrease with time distance between the observations according to some monotone (eg, linear) function. Suitable covariance pattern models provide insight into the nature of the correlations among the repeated observations that leads to more accurate estimates of the fixed effects.

In summary, some of the advantages of using MEMs instead of traditional approaches in the analysis of medical studies are as follows:

Although MEMs are based on less restrictive assumptions about the data than are the traditional statistical methods for medical research, the validity of results obtained with MEMs still depends on whether certain assumptions are satisfied. The most important of these assumptions regards the mechanism of missingness, or dropout (ie, the reason for dropout). If the missingness/dropout is not related to the potential outcome (that cannot be observed when the subject drops out), then the missingness is called "ignorable" and also "noninformative" because it does not inform about the unobserved outcome. Otherwise it is called "nonignorable," or "informative" (29). For example, if a subject in an antidepressant trial does not complete the study because of committing suicide, the dropout is likely to be nonignorable, or informative; if, on the other hand, the same subject died from being hit by an object accidentally dropped from a window, the dropout is likely to be ignorable, or noninformative.

Treatment effect estimates based on MEMs are unbiased if the missingness/dropout is ignorable. In many medical studies, the implication of the reasoning about ignorability and informativeness above may seem circular because to decide whether the missingness/dropout is or is not ignorable, the researcher needs to know the outcome, which cannot be observed because the subject missed the visit or dropped out. This circularity in statistical reasoning is called an "untestable" assumption. Often the assumption of ignorable or noninformative missingness/dropout underlying the MEMs will be untestable based on the available data. This means that in many situations, the researcher cannot be sure whether the estimate of the treatment effect obtained through MEMs is unbiased.

In all instances, however, some reduction of the bias due to missingness/dropout is made possible by collecting relevant information, which is used to identify factors related to the missingness/dropout, and adjusting for the identified factors. The extent to which violation of the assumption about the ignorability of the missingness/dropout can affect the inference about treatment effect can be assessed by performing sensitivity analysis. Here, "sensitivity" refers to the sensitivity of the inference to the untestable assumptions about the dropout. Generally, the sensitivity analysis will consist of obtaining estimates of the treatment effect assuming a variety of mechanisms of missingness/dropout and comparing them with the estimate based on the assumption of ignorable missingness/dropout. For example, in an antidepressant study, the researcher may assume that at the time of dropout, 1) the subject was very depressed and thus unable to come for treatment, 2) the subject was in remission and thus did not feel a need for treatment, or 3) some mixture of the first two options, possibly dependent on the treatment. Dramatic differences between the treatment effects obtained under different assumptions indicate sensitivity of the results to the untestable assumptions made in the course of the analysis and will render the inferences less reliable. High sensitivity to the assumption of ignorable missingness/dropout is usually observed in studies with high missingness/dropout rates, but also in studies with relatively low missingness/dropout rates when the dropout is strongly related to the unobserved outcome, ie, when the dropout is informative (30). Finally, dropout that is more prevalent in one of the treatment groups than in the other (differential dropout) does not necessarily produce biased estimates of the efficacy because the dropout, although differential, may still be ignorable (ie, not related to the outcome of interest).
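The logic of such a sensitivity analysis can be sketched with a toy calculation. All scores below are invented, the fill-in values for "very depressed" and "in remission" are assumptions, and the completers' difference merely stands in for the estimate obtained under ignorable dropout; a real analysis would refit the model under each assumed dropout mechanism.

```python
from statistics import mean

# Hypothetical end-of-study depression scores (higher = more depressed);
# None marks a subject who dropped out before the final visit.
drug = [8, 10, 7, None, 9, None]
placebo = [14, 15, None, 13, 16, 12]

def effect(drug_fill, placebo_fill):
    """Mean difference (drug - placebo) after filling dropouts with fixed values."""
    d = [drug_fill if y is None else y for y in drug]
    p = [placebo_fill if y is None else y for y in placebo]
    return mean(d) - mean(p)

SEVERE, REMITTED = 25, 5  # assumed scores for the two dropout scenarios

scenarios = {
    "all dropouts very depressed": effect(SEVERE, SEVERE),
    "all dropouts in remission": effect(REMITTED, REMITTED),
    "drug dropouts remitted, placebo dropouts depressed": effect(REMITTED, SEVERE),
    "drug dropouts depressed, placebo dropouts remitted": effect(SEVERE, REMITTED),
}
# Reference value: difference among subjects who completed the study.
completers = (mean(y for y in drug if y is not None)
              - mean(y for y in placebo if y is not None))
# A wide spread across scenarios signals sensitivity to the dropout assumption.
spread = max(scenarios.values()) - min(scenarios.values())

for name, value in scenarios.items():
    print(f"{name}: {value:+.2f}")
print(f"completers only: {completers:+.2f}; spread across scenarios: {spread:.2f}")
```

In this toy data set the estimated effect even changes sign across scenarios, which is the kind of dramatic difference that would render the inference unreliable.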

A problem with the use of increasingly complex statistical methods is the difficulty in communicating the results; there is thus a danger of obfuscation rather than elucidation. Estimation methods for MEMs are more complex than those of traditional ANOVA methods, and the results can therefore be more difficult to justify to a nonstatistical audience. It is not usually realistic to describe the exact methodology, but a satisfactory explanation can often be given by emphasizing the key point that MEMs take into account the covariance structure or interdependence of the data and, hence, provide a more appropriate analysis.


    EXAMPLE
 
The data used in this example are taken from the New York State site of the National Institute on Aging Collaborative Studies of Special Care Units for Alzheimer’s Disease (31, 32). This was a decade-long investigation of the impacts associated with specialized care for individuals with dementia. Major findings of this study are summarized in a recent issue of Research and Practice in Alzheimer’s Disease (33).

The main aim of the study was to investigate the impact of SCUs on the quality of life of demented and nondemented elderly residents in long-term care facilities. The focus of this analysis is on one aspect of quality of life: functional disorder. Here, the main goal is to compare functional decline over time between SCU and non-SCU residents.

Design of the Study
The study was observational and used a nonequivalent experimental comparison group design, with non-SCU residents matched (to the extent possible) to SCU residents in terms of cognitive status. There were three waves of data: baseline, first follow-up, and second follow-up. The planned interval between waves of data collection was 6 months; in practice, however, the intervals between waves were not always equal. The sampling scheme involved two levels of clustering: within each nursing unit and within each nursing facility. Therefore, modeling the covariance structure over time has to take into account correlations due to repeated measures on the same subject, correlations due to residence in the same unit, which may have different effects depending on the type of unit involved (SCU or non-SCU), and, finally, correlations among subjects in SCUs and non-SCUs due to their residence in the same facility. Figure 1 is a schematic representation of the clustering of the study data: Figure 1, left, represents the covariance among individuals from a facility with only non-SCU residents, and Figure 1, right, represents the covariance among individuals from a facility with an SCU and a non-SCU. In the first case, the sampling has introduced covariance among the individuals due to their residence in the same unit; it is assumed to be the same between any two individuals and is denoted by $\varsigma_0$. The covariance of the three observations on the same subject is denoted by the 3 × 3 covariance matrix $\Sigma_0$. In the second case, when subjects are sampled both from the SCU and the non-SCU in the same facility, an additional association among individuals in the two units, due to the presence of both units in the same facility, is introduced and is denoted by $\varsigma$ in Figure 1, right. Note that the within-subject covariance of the three observations on each SCU resident, denoted by $\Sigma_1$, is allowed to be different from the within-subject covariance for non-SCU residents, $\Sigma_0$, as are the associations among subjects due to residence in the same unit: $\varsigma_1$ for SCU and $\varsigma_0$ for non-SCU residents.



Fig. 1. Schematic presentation of the covariances. $\Sigma_0$ and $\Sigma_1$ are 3 × 3 covariance matrices.

 
Sample
Facility sample.
All nursing homes in New York state were surveyed to determine whether they had an SCU, using standard criteria developed by the National Institute on Aging (34). A random sample of New York facilities, stratified in terms of their maintaining or not maintaining a dementia SCU, was then selected. The overall response rate among facilities was 70%; the sample consisted of 22 facilities, 11 of which contained SCUs.

Resident sample.
From each of the 22 facilities, a random sample of 50 non-SCU residents was selected and screened for cognitive impairment using a cognitive screening measure (35). They were classified as either cognitively impaired or nonimpaired using a standard cut score of 24. Within each facility, a random sample of 20 of the cognitively impaired non-SCU residents and a random sample of 10 of the cognitively intact subjects were selected for this study. From each of the 11 facilities with SCUs, an additional random sample of 20 SCU residents was selected.

The study sample comprised 857 people; a total of 706 subjects (186 SCU and 520 non-SCU residents) completed at least two waves of data; of the 706 subjects, 417 (105 SCU and 302 non-SCU residents) provided observations from all three occasions on all of the variables used in the analysis that follows. The response rates at the individual level varied according to the instrument, but were generally between 80% and 95%.

Variables
Functional limitation.
Function, the outcome variable, was measured using a modification of the Performance Activities of Daily Living (PADL) (36). The PADL measures capability to perform various upper and lower body tasks associated with basic activities of daily living such as eating, dressing, and grooming. The total score reflects overall impairment, weighted by the need for cuing and by the amount of time spent completing the task. The scores for all tasks were then standardized, and a scoring algorithm was used to obtain a total score for the PADL measure. Missing data algorithms were applied to proportionately weight items according to the amount of missing data. If more than 50% of the PADL items were missing due to refusal, the scale score was coded as missing. Individuals who were unable to attempt tasks due to severe cognitive impairment were given scores reflecting more impairment than individuals who attempted to perform the task but failed. The internal consistency of the PADL for this sample was 0.95. The final standardized score used in the analysis was a variable ranging in value from -10 to +15. Small (negative) values indicate relatively less functional limitation, and larger positive values indicate higher levels of functional disability.
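The exact PADL scoring and missing-data algorithms are not given here. The following is only a generic sketch of proportional weighting (proration) combined with the 50% rule described above; the function name and the simple rescaling are assumptions, and the sketch ignores the standardization and cuing/time weights.

```python
def prorated_total(item_scores, max_missing_fraction=0.5):
    """Sum observed item scores, scaled up for missing items.

    Returns None when more than `max_missing_fraction` of items are missing.
    Missing items are represented by None.
    """
    observed = [s for s in item_scores if s is not None]
    n_items = len(item_scores)
    n_missing = n_items - len(observed)
    if n_items == 0 or n_missing / n_items > max_missing_fraction:
        return None  # too little information: scale score coded as missing
    # Rescale the observed sum as if every item had been answered.
    return sum(observed) * n_items / len(observed)

print(prorated_total([2, 4, None, 6]))        # one of four items missing
print(prorated_total([1, None, None, None]))  # three of four items missing
```

With exactly half of the items missing the score is still computed, because the rule applies only when more than 50% are missing.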

Several covariates were included initially in the analysis: age, gender, length of stay in the facility, cognitive status, and comorbidity. The last two of these are described below.

Cognition.
Cognition was measured using the Mattis NIA Research Dementia Rating Scale (37, 38), a widely used neuropsychological test of cognitive impairment. For these analyses, the total scale was used. The internal consistency, estimated using Cronbach’s $\alpha$, was 0.95 in this sample. The range of scores developed using this measure was from -3 to 144; small and negative values correspond to low cognitive abilities, and larger values indicate higher cognitive abilities.

Comorbidity.
Comorbidity was measured using the Minimum Data Set (39). An unweighted comorbidity index was constructed for this data set equaling the sum of all comorbidities the subject exhibited. The maximum number observed in this sample was 10 of a possible maximum score of 25.

Time in the facility.
This measures the time spent in the facility before baseline (in months). Table 1 shows that compared with non-SCU residents, SCU residents at baseline were significantly older, had lower cognition, more functional limitations, and fewer comorbidities, and had been in the facility for significantly less time.


Table 1. Comparison of SCU and Non-SCU Residents on Study Variables at Baseline
 
All variables listed in Table 1 were examined. Variables were selected based on 1) whether they differed between SCU and non-SCU residents at baseline, 2) their theoretical importance, and 3) their noncollinearity, as evidenced by examination of zero-order correlations and tolerances. Examination for skewness indicated that no transformations were necessary.

Results From the Traditional Analysis
This section illustrates the application of traditional statistical methods to estimate the effect of residence in an SCU on the functioning of demented individuals by comparing the level of functional limitation at the study end between SCU and non-SCU residents.

Examination of the attrition rates showed that of the 706 subjects with baseline data on functioning, only 460 had observations on functioning at the time of the last study visit (ie, at the second wave), 129 from SCUs (129/188 = 0.69) and 331 from non-SCUs (331/518 = 0.64). Two approaches were possible: 1) application of a method that would have imputed the missing wave 2 data or 2) analysis that used only the data from subjects who had both baseline and wave 2 observations available. The first approach, corresponding to the ITT principle, is usually executed by applying the LOCF imputation technique. The second approach corresponds to the TR principle. The general merits and shortcomings of each of these approaches were discussed in the sections on analysis "as randomized" vs. "as treated" and traditional methods. In this particular study, the functional limitations are unlikely to decrease with time; thus, the question really is about the rate of increase of limitations in functioning. Also, it is known that most of the missing data were due to death. Therefore, LOCF imputation of missing functional limitation values at the second data collection wave with either wave 1 data or baseline data would likely have biased the estimates. The percentage of missing wave 2 functional limitation data in the non-SCU group was slightly larger than that encountered in the SCU group, possibly related to the higher comorbidity level among the non-SCU residents. This could lead to bias in the estimated difference between the two groups. It could be argued that such bias is less likely to be introduced by the second approach of using only available data because the missingness (due to death) might not be related to the outcome of interest (ie, functioning). Unfortunately, available data did not permit tests of this assumption, and arguments for the opposite view can be made as well. Therefore, reported below are the results of both the ITT analysis based on LOCF and the TR analysis.
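The difference between the two approaches can be sketched on a few invented records. The scores and the helper name `locf` are hypothetical; the point is only that when limitations tend to increase over time, carrying an earlier value forward understates the final level.

```python
from statistics import mean

# Hypothetical functional-limitation scores at (baseline, wave 1, wave 2);
# None marks a missing observation.
subjects = [
    (2.0, 3.0, 4.5),
    (1.0, 2.5, None),   # wave 2 missing
    (0.5, None, None),  # both follow-ups missing
    (3.0, 3.5, 5.0),
]

def locf(waves):
    """Replace each missing value with the most recent observed one."""
    filled, last = [], None
    for y in waves:
        last = y if y is not None else last
        filled.append(last)
    return filled

# LOCF-based analysis: everyone contributes a (possibly carried-forward) value.
locf_final = [locf(s)[-1] for s in subjects]
# Available-data analysis: only subjects observed at wave 2 contribute.
completer_final = [s[-1] for s in subjects if s[-1] is not None]

print("LOCF mean at wave 2:      ", mean(locf_final))
print("Completers mean at wave 2:", mean(completer_final))
```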

A two-sample t test or one-factor ANOVA, where the factor is residence or nonresidence in an SCU, can be used to compare the mean functional limitations of SCU and non-SCU residents at the end of the study. A linear regression model with one predictor, the indicator for residence in an SCU, is equivalent to these methods; thus, for comparability with models presented later, we use the linear regression notation. In its simplest form, this model can be expressed as

$$Y_{i2} = \beta_0 + \beta_1\,\mathrm{scu}_i + \varepsilon_i \qquad (1)$$

where $Y_{i2}$ is the functional limitation at the end of the study (second follow-up data) for the $i$th study subject, $\mathrm{scu}_i$ is an indicator for residence in an SCU (equal to 1 if the $i$th subject is from an SCU, and equal to 0 otherwise), and $\varepsilon_i$ is the error term for the $i$th subject. Here, $\beta_0$ corresponds to the end-of-study average functional limitation for the non-SCU group, and $\beta_1$ is the difference in average functional limitations between SCU and non-SCU residents at the study end.
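The equivalence between the one-predictor regression and the comparison of two group means can be checked numerically. The sketch below uses invented outcome values and a hypothetical helper, `ols_simple`, that computes the ordinary least-squares intercept and slope directly from their textbook formulas.

```python
from statistics import mean

def ols_simple(y, x):
    """Least-squares intercept and slope for a single predictor x."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope

# Hypothetical data: indicator for SCU residence and end-of-study outcome.
scu = [1, 1, 1, 0, 0, 0, 0]
y = [5.0, 7.0, 6.0, 3.0, 4.0, 2.0, 3.0]

b0, b1 = ols_simple(y, scu)
mean_scu = mean(yi for yi, s in zip(y, scu) if s == 1)
mean_non = mean(yi for yi, s in zip(y, scu) if s == 0)

print(f"intercept = {b0:.3f}  (non-SCU mean = {mean_non:.3f})")
print(f"slope     = {b1:.3f}  (difference in means = {mean_scu - mean_non:.3f})")
```

The intercept reproduces the non-SCU mean and the slope reproduces the difference between the two group means, which is why the regression notation can be used interchangeably with the t test here.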

Because baseline values are usually associated with the outcome at the end of the study and because of the differences at baseline between the two groups, an adjustment can be made for baseline functioning by using ANCOVA or, equivalently, via a linear regression model by adding a baseline effect to the model represented by Equation 1:

$$Y_{i2} = \beta_0 + \beta_1\,\mathrm{scu}_i + \beta_2 Y_{i0} + \varepsilon_i \qquad (2)$$

where $Y_{i0}$ is the baseline value of functional limitation for the $i$th subject.

Adjustment can be made for other baseline covariates as well, by extending the model represented by Equation 2:

$$Y_{i2} = \beta_0 + \beta_1\,\mathrm{scu}_i + \beta_2 Y_{i0} + \beta_3 X_{1i0} + \beta_4 X_{2i} + \beta_5 X_{3i} + \beta_6 X_{4i} + \beta_7 X_{5i0} + \varepsilon_i \qquad (3)$$

where $X_{1i0}$, $X_{2i}$, $X_{3i}$, $X_{4i}$, and $X_{5i0}$ are baseline cognition, age, gender, duration of stay in the facility, and baseline comorbidity, respectively. Some of the terms in Equation 3 are not statistically significant and therefore might be omitted from the analysis. However, sometimes when the purpose of the study is to replicate previously reported results or to test a specific, theoretical model, retention of all terms in the model facilitates the comparison.

An important point is that more often than not, Equation 2 actually should contain an interaction term between the treatment variable, scu here, and the baseline value of the outcome variable, Y0 here. This is because of floor effects found in many medical studies; larger effects on the additive scale can be observed on subjects with higher baseline severity (ie, individuals who start at a higher level of severity have the potential to improve more than do those that start at a lower severity level). Indeed, in the current study, the interaction effect between SCU status and baseline functional limitations was statistically significant in the model represented by Equation 2 in both the ITT and the TR analyses. However, to keep the focus on the major statistical issues of this article, these results are not discussed.

The models have not yet taken into account the fact that the data were collected at different facilities. It is possible that measurements taken in some facilities may tend to be consistently different from those taken in other facilities. Such differences could be due, for example, to intersite differences in the training and supervision of personnel. It is also possible that some facilities might admit persons at different levels of severity of dementia and functioning and could therefore, on average, have higher or lower functional limitations. Allowance could be made for this possibility by adding a facility effect to Equation 3:

$$Y_{i2} = \beta_0 + \beta_1\,\mathrm{scu}_i + \beta_2 Y_{i0} + \beta_3 X_{1i0} + \beta_4 X_{2i} + \beta_5 X_{3i} + \beta_6 X_{4i} + \beta_7 X_{5i0} + f_j + \varepsilon_i \qquad (4)$$

where $f_j$, $j = 1, 2, \ldots, 22$, is the effect of the $j$th facility, in which subject $i$ resides. Thus, part of the residual term in Equation 3 may now be explained by the facility effect $f_j$. If there are differences among the facilities, the model represented by Equation 4 will have a smaller error variance than the previous equations, thus allowing effects to be estimated with greater accuracy. However, estimating a fixed parameter for each of the 22 nursing homes leads to a loss of 22 degrees of freedom and thus may result in a larger standard error for the treatment effect parameter. In fact, this is what happens in the current example. Table 2 shows that the standard error for the estimate of $\beta_1$ in Equation 4 is larger than that for Equation 3. Additionally, there are times when such "correction" is not desirable. For example, in the current study, the individual facilities were viewed as a random sample of all nursing homes for elderly individuals, and the goal was to generalize the results to all such institutions. Therefore, estimating each individual facility effect is not desirable. Instead, it is preferable to estimate the variance of the facility effects and to make inferences about the variability among nursing homes; therefore, the random effects model is more appropriate. Note that Equation 4 does not take into account the possibly differential effect of facility on SCU and non-SCU units, but only allows overall differences among facilities; the differential effect could be examined by introducing interaction terms between the indicator variable for SCU and the facility indicators. Such interaction terms would be estimable only for the nursing homes that have both an SCU and a non-SCU (ie, 11 of the 22 facilities in this sample).


Table 2. Estimates of the Effect of Residence in SCU on the Limitation in Function Using the Traditional Analyses
 
Studies of dementia indicate that cognition is a strong determinant of the level of functional limitations among individuals with the disease. Additionally, residence in SCUs has been shown to be related to accelerated decline of cognitive abilities, possibly due to cohabitation with severely cognitively impaired adults (40). This suggests that the effect of SCU residence on functioning might be decomposable into a sum of a direct effect of SCU and an indirect effect occurring through the effect of SCU on cognition. One can think of the direct effect as resulting from the specialized activities and programs in nursing homes for demented adults that are designed to keep the residents occupied, interested, and active. On the other hand, the indirect effect can be thought of as resulting from commingling with severely mentally impaired adults, which is a secondary function, or a side effect of being in an SCU that affects functioning through its effect on cognition. One can estimate the direct and indirect effects pertaining to any of the four models presented above. For the sake of simplicity, we report the results from decomposing the total effect estimated by the model represented by Equation 2, which adjusts the total effect only with respect to baseline functional limitations, Yi0. Equations 2 and 5 and Figure 2 correspond to this breakdown of the effect of interest into a direct and an indirect SCU effect through cognition.




Fig. 2. Total, direct, and indirect (through cognition) effects of SCU residence on functioning. For clarity of presentation, the paths involving baseline function and cognition are not shown.

 
Here $X_{1i2}$ is the second wave cognition value for the $i$th individual. The first equation in the Equation 5 series permits estimating the direct effect of SCU residence on functioning at the second follow-up, $\beta_{1\mathrm{dir}}$. In addition to all terms in the model for the total effect (Equation 2), this equation adjusts for cognition at second follow-up, thus reflecting our hypothesis that part of the observed functional limitations at the second follow-up is due to the cognitive decline from baseline to the study end. The second equation in the Equation 5 series relates cognition at the second follow-up to SCU status, adjusting for only those factors that have been used in the estimation of the total SCU effect (Equation 2), ie, baseline functioning. The third equation in the Equation 5 series describes functioning at the second follow-up as a function of cognition at the same time, which reflects the supposition that cognition affects functioning and not the other way around. This equation also includes a term for baseline functioning, as does the equation for the total effect (Equation 2), to render comparable the models for the total, direct, and indirect effects. The product $\alpha_1 \times \gamma_1 = \beta_{1\mathrm{ind}}$ represents the indirect effect of SCUs on function through cognition. An important point is that the total effect of SCUs from Equation 2 ($\beta_1 = 2.02$ from Table 2) is the sum of the direct effect of SCUs ($\beta_{1\mathrm{dir}} = 0.07$) and the indirect effect of SCUs acting through cognition ($\beta_{1\mathrm{ind}} = 1.95$) in Equation 5. Notice that the direct effect is not significantly different from zero at $\alpha = 0.05$ in either the ITT or the TR analysis (Table 2).
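The Equation 5 series itself is not reproduced in this text. One form consistent with the verbal description is sketched below. Only $\beta_{1\mathrm{dir}}$, $\alpha_1$, $\gamma_1$, and $\beta_{1\mathrm{ind}}$ are names used in the text; every other coefficient label, and the labels of the error terms, are assumptions introduced for the sketch.

```latex
\begin{align*}
Y_{i2}  &= \beta_{0\mathrm{dir}} + \beta_{1\mathrm{dir}}\,\mathrm{scu}_i
           + \beta_{2\mathrm{dir}}\,Y_{i0} + \beta_{3\mathrm{dir}}\,X_{1i2} + \varepsilon_i
           && \text{(direct effect of SCU)} \\
X_{1i2} &= \alpha_0 + \alpha_1\,\mathrm{scu}_i + \alpha_2\,Y_{i0} + \delta_i
           && \text{(SCU effect on cognition)} \\
Y_{i2}  &= \gamma_0 + \gamma_1\,X_{1i2} + \gamma_2\,Y_{i0} + \eta_i
           && \text{(cognition effect on functioning)} \\[4pt]
\beta_{1\mathrm{ind}} &= \alpha_1 \times \gamma_1, \qquad
\beta_1 = \beta_{1\mathrm{dir}} + \beta_{1\mathrm{ind}} = 0.07 + 1.95 = 2.02 .
\end{align*}
```

The final line simply restates the decomposition reported in the text, using the values quoted from Table 2.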

This section illustrates how traditional statistical analytic tools deal with the analysis of a typical medical study. Several shortcomings of these statistical approaches are apparent. First, not all of the data are used. For example, data from individuals who did not have wave 2 data were not included in the TR analysis, and the ITT analysis actually fabricated data that otherwise would be missing. Also, the data from the first follow-up were not analyzed; such data are, in fact, often ignored. Although they could have been analyzed in a fashion similar to that used for the second wave of data, the researcher then would have to report and interpret both sets of analyses, compare them, and reconcile any differences between them. Alternatively, one could apply a repeated-measures ANOVA to the data collected from subjects who completed all three observations. Such an approach would assume that all wave 1 observations were made at times equally distant from the baseline and that all wave 2 observations were made at times equally distant from the wave 1 assessment times. In addition, in all models presented above, the validity of the inferences depends on an implicit assumption that the variance of the measure of functioning is equal across residency groups, units, and different values of the covariates. This assumption is common in the application of classic linear regression and of ANOVA-type models. However, as shown in the next section, this assumption is not satisfied in this example.

Results From Mixed-Effects Models Analysis
The flexibility of the MEMs allows use of all available data, selection of the most appropriate correlation structure for the measurements, and estimation of the total and direct effects of SCU on functioning using models adequate for the data. Two models are studied here: one for the total effect and one for the direct effect of SCU residence on functioning. These models are given below as Equations 6 and 7. Through the application of graphical and modeling techniques, the type of association between the outcome measure and time was first examined. Polynomial and other curvilinear functions were tested before a linear relationship was chosen.

Before performing tests for the fixed effects, the covariance structure was examined and modeled. Modeling the covariance structure has several goals. First, an adequate modeling of the covariance among the observations ensures validity of the inference regarding the regression parameters of interest (ie, the effect of SCU residence on functioning). Second, the covariance structure characterizes the association among the outcome measures (eg, functioning at baseline, wave 1, and wave 2), identifies factors that affect it, and provides information about the phenomenon studied. For example, assessment and modeling of the lower correlation between repeated measures observed among residents in SCUs, as contrasted with the correlation observed among residents in non-SCUs, provides important information that is not available from other analyses aimed at understanding the outcome. Such a finding indicates less predictability of function over time among SCU residents than among non-SCU residents. Finally, explicit statements about the model that best fits the covariance structure provide information for planning, design, and analysis of future studies of the same phenomenon. The nested structure of the data represented in Figure 1 calls for the estimation of several covariance parameters: 1) a maximum of 12 parameters, 6 for each of the two possibly different 3 × 3 covariance matrices $\Sigma_0$ and $\Sigma_1$, reflecting the association among the three observations on each subject in SCUs and non-SCUs; 2) the parameters $\varsigma_0$ and $\varsigma_1$, possibly different, that describe the association between subjects from the same unit; and 3) the parameter $\varsigma$ modeling the association between subjects from different units within the same facility.

There are numerous possibilities for the form that could be assumed for the covariance matrices $\Sigma_0$ and $\Sigma_1$. They can have equal diagonal elements and equal off-diagonal elements, ie, they can be determined by only two parameters. Such a covariance matrix is appropriate when the variance of the outcome is the same at each observation time and when the correlations are the same between the outcome measured at baseline and first follow-up, baseline and the second follow-up, and first and second follow-ups. This type of covariance pattern is called homogeneous compound symmetry and is assumed by the traditional repeated-measures ANOVA models. A less restrictive covariance pattern can allow for different variances of the outcome at the different observation times. A more adequate representation of the correlations among repeated observations taken over time is achieved when the correlations are allowed to decrease as the time span between the measurements increases.
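The patterns just described can be written out for three occasions. The sketch below is illustrative only: the variances, the correlation values, the visit times, and the function names are all made up, and the power-decay form is one common way (not necessarily the one used in this study) to let correlation weaken with the time between visits, including unequally spaced ones.

```python
def compound_symmetry(variances, rho):
    """Covariance with a common correlation rho between every pair of occasions."""
    sd = [v ** 0.5 for v in variances]
    n = len(sd)
    return [[sd[i] * sd[j] * (1.0 if i == j else rho) for j in range(n)]
            for i in range(n)]

def power_decay(variances, rho, times):
    """Covariance with correlation rho ** |t_i - t_j| (weaker for longer gaps)."""
    sd = [v ** 0.5 for v in variances]
    n = len(sd)
    return [[sd[i] * sd[j] * rho ** abs(times[i] - times[j]) for j in range(n)]
            for i in range(n)]

def unstructured_param_count(n_occasions):
    """Number of free parameters in an unrestricted n x n covariance matrix."""
    return n_occasions * (n_occasions + 1) // 2

# Homogeneous compound symmetry: 2 parameters (one variance, one correlation).
homogeneous = compound_symmetry([4.0, 4.0, 4.0], rho=0.5)
# Heterogeneous compound symmetry: 4 parameters (three variances, one correlation).
heterogeneous = compound_symmetry([4.0, 9.0, 16.0], rho=0.5)
# Decay with elapsed time; visit times in months need not be equally spaced.
decaying = power_decay([4.0, 9.0, 16.0], rho=0.9, times=[0.0, 5.0, 13.0])

print(unstructured_param_count(3))  # free parameters per unrestricted 3 x 3 matrix
```

An unrestricted 3 × 3 matrix has six free parameters, so two such matrices (one per residency group) account for the maximum of 12 within-subject covariance parameters.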

Different relationships among the three parameters mentioned above correspond to statements about the associations among the measurements in our data. For example, observing $\varsigma = 0$ corresponds to a lack of association between subjects residing in the same facility but in different units, SCU and non-SCU, which is equivalent to a lack of facility effect. On the other hand, nonzero $\varsigma_0$ and $\varsigma_1$ correspond to the presence of correlation between subjects due to their residence in the same unit. This is equivalent to the presence of a unit effect, and $\varsigma_0 \neq \varsigma_1$ indicates a differential unit effect for SCUs and non-SCUs.

Note that the sampling structure and design of this study call for the simultaneous use of random effects together with covariance pattern mixed-effects models (see "Mixed-Effects Models"). The part of the model related to the parameter $\varsigma$ is of the type "random effects," assuming that the sampled facilities come from a population of nursing homes and represent a random sample of this population, about which inferences are to be made. The effect of each nursing home from this population is a random number, and it is the variance of these random numbers and not the individual numbers that is relevant to the inference about the SCU effect. Therefore, the MEM is used to estimate the variance of the random facility effects. In the same way, $\varsigma_0$ is the variance of the random effects of the population of non-SCUs, and the study uses the sampled non-SCUs to estimate this population variance, to be contrasted with estimating each individual non-SCU effect. Analogous considerations are made about the random effects of SCUs, the variance of which is denoted by $\varsigma_1$. On the other hand, the part of the model related to the 3 × 3 matrices $\Sigma_0$ and $\Sigma_1$ is of the type "covariance pattern" because it postulates a particular covariance for each subject’s three repeated observations. The covariance pattern that best describes the data can be identified by comparing models that assume a variety of different covariance structures, with respect to some criteria for goodness of fit.

Seventeen covariance structures, including compound symmetry and various decreasing functions of time for the correlation between repeated observations, were compared using PROC MIXED in SAS software (41). Akaike’s Information Criterion and Schwarz’s Bayesian Criterion were used to compare the goodness of fit and to select the best correlation structure. The covariance model that best fit the data exhibited lack of facility effect ($\varsigma = 0$) and presence of unit effect ($\varsigma_0 = \varsigma_1 \neq 0$). As expected, the variances of the outcome at different observation times were different within the same residency group and between the two residency groups (ie, $\Sigma_0 \neq \Sigma_1$), and both were heterogeneous (ie, had unequal variances on the diagonal). The correlations among repeated observations declined faster with time in the SCUs than they did in the non-SCUs.
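The selection step can be sketched with the textbook forms of the two criteria. The log-likelihoods, parameter counts, and sample size below are invented for illustration, and statistical packages may compute these criteria with different conventions (for example, a restricted likelihood or a different effective sample size), so the numbers are not comparable with any software output.

```python
import math

def aic(loglik, n_params):
    """Akaike's criterion: smaller values indicate a better trade-off."""
    return -2.0 * loglik + 2.0 * n_params

def bic(loglik, n_params, n_obs):
    """Schwarz's criterion: penalizes parameters more heavily as n_obs grows."""
    return -2.0 * loglik + n_params * math.log(n_obs)

# Hypothetical fits: structure name -> (maximized log-likelihood, covariance parameters)
candidates = {
    "compound symmetry": (-2510.0, 2),
    "heterogeneous compound symmetry": (-2495.0, 4),
    "unstructured": (-2493.5, 6),
}
N_OBS = 500  # assumed sample size used in the BIC penalty

best_aic = min(candidates, key=lambda k: aic(*candidates[k]))
best_bic = min(candidates, key=lambda k: bic(*candidates[k], N_OBS))

for name, (ll, k) in candidates.items():
    print(f"{name}: AIC = {aic(ll, k):.1f}, BIC = {bic(ll, k, N_OBS):.1f}")
print("selected by AIC:", best_aic, "| selected by BIC:", best_bic)
```

In this made-up comparison the unstructured matrix fits slightly better but pays for its extra parameters, so both criteria prefer the intermediate structure.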

To discuss the fixed effects part of the models, we adopt the following notation: $Y_{ik}$ is the measure of functional limitations for the $i$th individual at visit $k$, where $k = 0$ corresponds to baseline, $k = 1$ corresponds to the first follow-up, and $k = 2$ corresponds to the second follow-up; $\mathrm{scu}_i$ is the residency status of the $i$th subject, as in the section on results from the traditional analysis; $\mathrm{time}_{ik}$ is the time since baseline of the $k$th visit for the $i$th subject (note that not all visits occurred at the time planned by the study protocol, and therefore $\mathrm{time}_{ik}$ will not be the same for all $i$ at a given $k$, $k = 1, 2$); as in the section on results from the traditional analysis, $X_{1i0}$ is the baseline cognition level for the $i$th subject, and $X_{2i}$ is the $i$th subject’s age at baseline.

The fixed part of the model for the total effect of SCU on functioning given below estimates the linear effect of time in residence on functional limitations, conditional on the baseline values. The model contains a quadratic effect of baseline cognition, reflecting a well-documented curvilinear relationship between functioning and cognition among demented individuals (40, 42).


This model is similar to the model represented by Equation 4 in that it conditions on baseline values by including the terms β_3 X_1i0 and β_4 X_2i. It differs from Equation 4 in the following ways: 1) it estimates the rate of increase of functional limitations over time, ie, a slope of the increase of limitations over time; 2) it uses both follow-up waves of data; 3) it includes an interaction effect between time and the indicator variable for residence in SCUs (β_12 scu_i × time_ik), which corresponds to different rates of increase in limitations between SCU and non-SCU residents (Figure 3); and 4) the covariance of the outcome measures assumed by these two models is different, ie, whereas Equation 4 assumes that the variance of the outcome at study end is the same in the two groups, Equation 6 estimates and uses different variances.
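Based on the terms named above (a linear time effect, the interaction between SCU residence and time, a quadratic effect of baseline cognition, and baseline age), the fixed part of Equation 6 can be sketched as follows. The coefficient labels β_0, β_1, β_2, and β_5 are assumed here for illustration, and the original equation may contain further terms.

```latex
E(Y_{ik}) = \beta_0 + \beta_1\,\mathrm{scu}_i + \beta_2\,\mathrm{time}_{ik}
          + \beta_{12}\,\mathrm{scu}_i \times \mathrm{time}_{ik}
          + \beta_3 X_{1i0} + \beta_5 X_{1i0}^{2} + \beta_4 X_{2i}
```

In a model of this form, the SCU and non-SCU lines differ by β_1 in intercept and by β_12 in slope for any fixed baseline cognition and age.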



Fig. 3. Increase of functional limitations over time (Equation 6). For this figure, we have set the baseline cognition to a moderately impaired level (50) and the baseline age to about the average age for the entire sample (80 years).

 
As shown in Figure 3, the slope with respect to time is higher for patients from SCUs, indicating that, conditional on baseline cognition and age, the functional limitations of subjects residing in SCUs increase faster than those of subjects from non-SCUs. At the same time, for every given combination of baseline cognitive level and age, the intercept of the line for SCU residents is smaller than the intercept for non-SCU residents. This suggests that for a given baseline age and cognitive level, the SCU residents have fewer functional limitations at baseline. The two lines cross at approximately 10 months after baseline. This means that, holding cognitive level the same, residents of SCUs initially have fewer functional limitations; however, after 10 months, residents of SCUs begin to exhibit more functional limitations than do residents of non-SCUs. Because the outcome measure is in standard units, the maximum change observed (three points) is meaningful. For example, such a change might mean more independence in several subtasks involved in eating or dressing; this change translates into better quality of life, more aide time available for other types of care, and some cost savings.
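The crossing time follows from simple algebra on two straight lines. A minimal sketch, assuming the SCU line differs from the non-SCU line by an intercept shift and a slope shift (called intercept_diff and slope_diff here, names introduced for this illustration). The numeric values are hypothetical, chosen only so that the crossing falls at 10 months, and are not the study's estimates.

```python
def crossing_time(intercept_diff, slope_diff):
    """Time t at which two lines meet, solving intercept_diff + slope_diff * t = 0."""
    if slope_diff == 0:
        raise ValueError("parallel lines never cross")
    return -intercept_diff / slope_diff

# Hypothetical values: SCU residents start 2.5 points lower (fewer limitations)
# but gain 0.25 points per month more than non-SCU residents.
t_cross = crossing_time(intercept_diff=-2.5, slope_diff=0.25)
print("lines cross after", t_cross, "months")
```

Before the crossing time the group with the lower intercept has fewer limitations; after it, the ordering reverses.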

In Equation 6 the only explanatory variable that changed over time was time itself. To separate the direct effect of SCU residence from the indirect effect that occurs through the effect of residence on cognition, cognition (another time-varying explanatory variable) was included as a predictor in Equation 7. Cognition at the first and second follow-up is actually an outcome of residence, and, therefore, as discussed in the section on analysis "as randomized" vs. "as treated," it should not be used to perform a subgroup analysis or, as it is here, to adjust in the analysis when estimating the total residence effect. However, in this study, well-established knowledge about the course of dementia clearly places cognitive decline as preceding functional decline. This gives substantive validity to the attempt to separate the direct effect of residence associated with care and programs from the indirect effect due to cognitive decline. As shown in the prior section, using traditional analysis, examining only the total effect can be misleading because the indirect effect through cognition accounts for almost all of the effect of SCU status on functioning. The fixed part of the model for the direct effect of SCU on functioning given below includes two time-varying covariates: time and cognition.


Here, X_1ik, k = 0, 1, 2, is the cognitive level of the ith subject at the kth observation. The interpretation of this model is complicated by the presence of the three-factor interaction between the indicator variable for SCU residence and the two time-varying predictors time and cognition (scu_i × time_ik × X_1ik). The goodness of the linear fit was assessed and confirmed against quadratic and cubic alternatives. This model is graphically presented in Figure 4. Each pair of lines on that figure corresponds to a fixed level of cognition. Therefore, the lines on the plots do not correspond to the course of increase of functional limitation over time for a particular individual because cognition does not stay fixed over time. Rather, these lines show the time effect on functional limitations for "virtual" residents from SCUs and from non-SCUs whose cognition does not change over time. This can be conceptualized as the direct effect of time on the limitations after the effect that occurs through cognition has been removed. This can also be interpreted as residence (SCU or non-SCU) and cognition acting as moderators of the direct effect of time in residence on functioning after the indirect effect of time in residence on functioning through cognition has been accounted for. Residence in non-SCUs does not affect functioning directly, ie, it affects it only through the effect of time on cognition and cognition on functioning. Residence in SCUs affects functioning directly, and the effect depends on the subject’s cognitive level: for mildly and moderately cognitively impaired residents, the direct effect of time in residence on functioning is positive, and such subjects benefit more from residence in SCUs than from residence in non-SCUs, whereas for severely cognitively impaired subjects, residence in SCUs is not more beneficial than residence in non-SCUs, and is detrimental when subjects are extremely cognitively impaired, ie, at the end stage of the disease.



Fig. 4. Direct effect of residence on functional limitations over time (Equation 7). The indirect effect of residence on functioning, occurring through cognition, has been controlled.
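Each line in a plot of this kind is a time slope evaluated at a fixed cognition level. The sketch below shows how a three-factor interaction makes that slope depend on both residence and cognition. The coefficient names and values are hypothetical, chosen for illustration only, and do not reproduce the terms or estimates of Equation 7. The non-SCU time terms are set to zero to mirror the statement that residence in non-SCUs does not affect functioning directly, and no direction is assumed for the cognition scale.

```python
def time_slope(coef, scu, cognition):
    """Change in functional limitations per unit time at a fixed cognition level.

    coef holds hypothetical coefficients for time, time x cognition,
    scu x time, and scu x time x cognition.
    """
    base = coef["time"] + coef["time_x_cog"] * cognition
    scu_part = coef["scu_x_time"] + coef["scu_x_time_x_cog"] * cognition
    return base + scu * scu_part

# Hypothetical coefficients (not study estimates).
coef = {
    "time": 0.0,
    "time_x_cog": 0.0,
    "scu_x_time": -0.3,
    "scu_x_time_x_cog": 0.005,
}

for cognition in (20, 60, 100):
    non_scu = time_slope(coef, scu=0, cognition=cognition)
    scu = time_slope(coef, scu=1, cognition=cognition)
    print(f"cognition={cognition}: non-SCU slope={non_scu:.3f}, SCU slope={scu:.3f}")
```

With these made-up values the SCU slope changes sign as the cognition score varies. A sign change of this kind is what allows a model with this interaction to show benefit at some cognitive levels and harm at others.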

 

    SUMMARY AND DISCUSSION
 
This article has focused on a comparison between traditional and modern methods in the analysis of health-related data. Many such data sets are based on observational studies with complex designs and sampling strategies and with attendant problems. For example, missing data may occur in one or more waves of data, data collection intervals are not always evenly spaced, and correlations among observations across follow-ups are not always equal. Additionally, the variances of outcome may differ between treatment and control groups, and treatment and comparison groups often may differ on key variables at baseline; moreover, these covariates may change over time. Finally, designs that include sampling of individuals from larger units (eg, residents from units and units from facilities or patients from doctors) violate assumptions regarding independence of observations and result in the so-called unit-of-analysis problem. These problems are not well (if at all) managed by traditional methods.

The pros and cons of ITT and TR analyses were reviewed. Policy analysts prefer ITT because it focuses on what happens globally (within the whole population) when a treatment policy is established, regardless of whether the individuals took the prescribed treatment; clinicians and patients may be more interested in the individual outcomes of subjects who actually took the treatment, which corresponds to the TR analysis. TR analyses result in loss of more data, whereas ITT frequently relies on imputation of missing measurements that may bear little resemblance to the truth. A propensity score approach provides a way of modeling important explanatory factors contributing to dropout.

In the illustrative case, use of traditional methodology would lead to the conclusion that SCUs are deleterious in terms of functional decline. That is, the estimates of the total effect of SCUs for all traditional ANOVA and ANCOVA models were significant in the direction of there being more functional limitation in SCUs. However, even using traditional methods, it was seen that examination of total effects alone is misleading. Most of the effect of SCU status on functional decline is through cognitive status. Thus, the direct effect of SCU on decline was not significant, using either ITT or TR traditional approaches to analysis. The mixed model approach examining direct and indirect effects showed that there is actually a benefit of SCUs for some levels of cognitive function. At moderate through mildly severe levels of cognitive disorder, SCU members exhibited better function. At very severe levels, however, SCU members declined more rapidly, lending support to experts in dementia care who maintain that such units should be targeted for residents in the moderate ranges of dementia rather than for those at the very severe, end stage of the illness. These findings indicate that remaining in an SCU after the point of benefit is deleterious in terms of functional status; other findings from this data set (40) indicate that remaining too long in an SCU also can result in excess cognitive disability. The findings argue for targeting, tailoring, and discharge criteria. Yet national data (43) show that SCUs are populated by individuals with severe and profound cognitive impairment. For example, about one-third of SCU residents are at the very severe, end stages of the disease. It is unlikely that they can benefit from any activities or behavioral therapies at this point.

Several conclusions result from the analyses presented in the example. As is true in most analyses of observational data, the addition of covariates generally improved estimation (reduced standard errors) because, if not included in the model, systematic, uncontrolled variation between treatment and control groups introduced additional unexplained error. The use of mixed models resulted in additional benefits. First, mixed models allowed proper modeling of the clustering or nesting feature of the design. Second, time-varying covariates could be included. Third, stringent assumptions regarding compound symmetry, balance, and spacing of data collection could be relaxed. Fourth, better modeling of direct and indirect effects could be achieved. Fifth, MEMs provided better ability to handle regression dilution bias (regression toward the mean) because random intercepts and slopes (eg, different starting points affecting the rate of change) could be incorporated (44). Finally, more of the data could be used in the analysis because a subject could be included as long as there are at least two waves of data. Thus, there was no need either to use the LOCF imputation method or to delete all subjects with any missing data.
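The last point can be illustrated with a small sketch that counts how many subjects each strategy retains. The records are made up, the function names are labels chosen here, and None marks a missing wave. The "at least two waves" rule is the inclusion criterion described above, not a general property of every mixed-model implementation.

```python
# Hypothetical outcome records: subject id -> [baseline, follow-up 1, follow-up 2]
records = {
    "s1": [10.0, 11.5, 12.0],
    "s2": [9.0, None, 10.5],
    "s3": [12.0, 12.5, None],
    "s4": [8.0, None, None],
}

def complete_cases(data):
    """Subjects kept when anyone with a missing wave is deleted."""
    return [sid for sid, waves in data.items()
            if all(w is not None for w in waves)]

def at_least_two_waves(data):
    """Subjects kept when two or more observed waves suffice."""
    return [sid for sid, waves in data.items()
            if sum(w is not None for w in waves) >= 2]

def locf(waves):
    """Last observation carried forward: fill each gap with the previous value."""
    filled, last = [], None
    for w in waves:
        if w is not None:
            last = w
        filled.append(last)
    return filled

print("complete cases:", complete_cases(records))
print("two or more waves:", at_least_two_waves(records))
print("LOCF for s4:", locf(records["s4"]))
```

In this toy data set, complete-case deletion keeps one subject out of four and the two-wave rule keeps three. LOCF keeps all four only by treating the baseline value of the fourth subject as if it had been observed at both follow-ups.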

Although MEMs are generally to be preferred over traditional approaches, some caveats are noted. First, MEMs assume, just as all traditional statistical methods do, that data are missing at random (that the reasons for missing data are not related to the outcome). This assumption may not always be reasonable and should be investigated before proceeding. Second, mixed models assume multivariate normality of the random terms in the model, which may not always be the case; for example, the random center effects might have some nonsymmetric distribution. Third, the MEMs most commonly used in medical research present the outcome as a linear function of time, as opposed to quadratic or some other nonlinear function. Although nothing in the theory or the software for fitting MEMs requires linear functions, the complexity of MEMs and their estimating algorithm makes it more difficult to fit and interpret relationships of higher order than linear functions. However, some constructs may not follow a linear decline/increase pattern.

Finally, mixed models cannot account for unknown time of disease onset. For example, as shown in (42), if the duration of time between the (unmeasured) onset of cognitive decline and its first measured observation is correlated with explanatory (risk) factors, and the rate of decline is nonlinear, then confounding occurs and mixed models may not handle this well; if residents in SCUs had earlier disease onset than those in non-SCUs, they would be at a later stage of illness when the first observation is taken and might seem to decline more rapidly on cognitive tests. Some newer nonlinear methods (which are extensions of growth-curve models) that do not assume a known time of onset have been suggested for use in such circumstances (40, 45).

In summary, these analyses show that the two analytic approaches produce different results. The traditional model results would suggest a deleterious effect (or a null effect using the path analytic approach of examining direct and indirect effects), whereas the mixed model approach suggests a positive effect in earlier (mild through early severe) stages of illness, but a deleterious effect later. There are several reasons for these differences in results. First, the mixed procedure allows more efficient use of available data, thus increasing statistical power. Second, inclusion of time-varying covariates (cognition changes over time, which needs to be modeled) increases accuracy of estimation. Third, proper modeling of the covariance structure (the different variances of the SCU and non-SCU residents and the pattern of correlations over time) improves estimation. Finally, inferences based on analyses that do not take into account the design feature of clustering (modeling the correlation due to repeated measures and due to presence in the same unit and the same facility) may be invalid. For these reasons, it is recommended that MEMs be considered in the analysis of longitudinal experimental and observational data.


    ACKNOWLEDGMENTS
 
This work was supported, in part, by National Institute on Aging Coordinating Center for the Collaborative Studies of Special Care Units for Alzheimer Disease Grant AG10330 and National Institute on Aging Grant AG08948. The authors thank Douglas Holmes for his helpful suggestions on several drafts of this manuscript.

Received for publication August 8, 2000.


    REFERENCES
 

  1. Piantadosi S. Clinical trials: a methodological perspective. New York: John Wiley & Sons; 1997.
  2. Pocock SJ, Elbourne DR. Randomized trials or observational tribulations? N Engl J Med 2000; 342: 1907–9.
  3. Wolff N. Using randomized controlled trials to evaluate socially complex services: problems, challenges and recommendations. J Ment Health Policy Econ 2000; 3: 97–109.
  4. Benson K, Hartz AJ. A comparison of observational studies and randomized, controlled trials. N Engl J Med 2000; 342: 1878–86.
  5. Concato J, Shah N, Horwitz RI. Randomized, controlled trials, observational studies, and the hierarchy of research designs. N Engl J Med 2000; 342: 1887–92.
  6. Wang-Clow F, Lange M, Laird NM, Ware JH. A simulation study of estimators for rates of change in longitudinal studies with attrition. Stat Med 1995; 14: 283–97.
  7. Wu MC, Carroll RJ. Estimation and comparison of changes in the presence of informative right censoring: modeling the censoring process. Biometrics 1988; 44: 175–88.
  8. Newell DJ. Intention-to-treat analysis: implications for quantitative and qualitative research. Int J Epidemiol 1992; 21: 837–41.
  9. Lewis JA, Machin D. Intention-to-treat: who should use ITT? Br J Cancer 1993; 68: 647–50.
  10. Hill AB. Principles of medical statistics. 7th ed. London: The Lancet; 1961.
  11. Fisher LD, Dixon DO, Herson J, Frankowski RK, Hearron MS, Peace KE. Intention-to-treat in clinical trials. In: Peace KE, editor. Statistical issues in drug research and development. New York: Marcel Dekker; 1990.
  12. Feinstein AR. Intention-to-treat policy for analyzing randomized trials: statistical distortions and neglected clinical challenges. In: Cramer JA, Spilker B, editors. Patient compliance in medical practice and clinical trials. New York: Raven Press; 1991.
  13. Sheiner LB, Rubin DB. Intention-to-treat analysis and the goals of clinical trials. Clin Pharmacol Ther 1995; 57: 6–15.
  14. Angrist JD, Imbens GW, Rubin DB. Identification of causal effects using instrumental variables. J Am Statistical Assoc 1996; 91: 444.
  15. Rubin DB. More powerful randomization-based p values in double blind trials with non-compliance. Stat Med 1998; 17: 371–85.
  16. Mark SD, Robins JM. A method for the analysis of randomized trials with compliance information: an application to the multiple risk factor intervention trial. Control Clin Trials 1993; 14: 79–97.
  17. Rosenbaum PR, Rubin DB. The central role of propensity score in observational studies. Biometrika 1983; 70: 41–55.
  18. Rosenbaum PR, Rubin DB. Reducing bias in observational studies using subclassification on the propensity score. J Am Statistical Assoc 1984; 79: 516–24.
  19. Simpson HB, Gorfinkle KS, Liebowitz MR. Cognitive-behavioral therapy as an adjunct to serotonin reuptake inhibitors in OCD: an open trial. J Clin Psychiatry 1999; 60: 584–90.
  20. Cohen J. Multiple regression as a general data-analytic system. Psychol Bull 1968; 70: 426–43.
  21. Gibbons RD, Hedeker D, Elkin I, Waternaux C, Kraemer H, Greenhouse JB, Shea TM, Imber SD, Sotsky SM. Some conceptual and statistical issues in analysis of longitudinal psychiatric data. Arch Gen Psychiatry 1993; 50: 739–50.
  22. Lavori PW, Dawson R, Shera D. A multiple imputation strategy for clinical trials with truncation of patient data. Stat Med 1995; 14: 1913–25.
  23. Carroll RJ, Ruppert D. Transformation and weighting in regression. New York: Chapman & Hall; 1988.
  24. Gibbons RD, Hedeker D, Waternaux C, David JM. Random regression models: a comprehensive approach to the analysis of longitudinal psychiatric data. Psychopharmacol Bull 1988; 24: 438–43.
  25. Laird NM, Ware JH. Random effects models for longitudinal data. Biometrics 1982; 38: 963–74.
  26. Zeger SL, Liang K-Y. Overview of methods for analysis of longitudinal data. Stat Med 1992; 11: 1825–39.
  27. Diggle PJ, Liang K-Y, Zeger SL. Analysis of longitudinal data. Oxford: Oxford University Press; 1994.
  28. Brown H, Prescott R. Applied mixed models in medicine. Chichester, England: John Wiley & Sons; 1999.
  29. Little RJ, Rubin DB. Statistical analysis with missing data. New York: John Wiley & Sons; 1987.
  30. Liu X, Waternaux C, Petkova E. Influence of HIV infection on neurological impairment: analysis of longitudinal binary data with informative dropout. J Royal Statistical Soc Series C Appl Statistics 1999; 48: 103–15.
  31. Holmes D, Teresi JA. Relating personnel cost in special care units and in traditional care units to resident characteristics. J Ment Health Policy Econ 1998; 1: 31–40.
  32. Ory MG. Dementia special care units: the development of a national research initiative. Alzheimer Dis Assoc Disord 1994; 8 (Suppl 1): S389–94.
  33. Holmes D, Teresi JA, Ory M. Special care units: overview of the volume. Res Practice Alzheimer Dis 2000; 4: 7–17.
  34. Holmes D, Teresi JA, Monaco C. Special care units in nursing homes: prevalence in five states. Gerontologist 1992; 32: 191–6.
  35. Folstein M, Folstein E, McHugh P. Mini-mental state: a practical guide for grading the cognitive state of patients for the clinician. J Psychiatr Res 1975; 12: 189–98.
  36. Kuriansky J, Gurland B. The Performance Test of Activities of Daily Living. Int J Aging Human Develop 1976; 7: 343–52.
  37. Mattis S. Mental status examination for organic mental syndrome in the elderly patient. In: Bellak L, Karasu TB, editors. Geriatric psychiatry: a handbook for psychiatrists and primary care physicians. New York: Grune & Stratton; 1976. p. 77–121.
  38. Dementia Rating Scale. Lutz (FL): Psychological Assessment Resources; 1988.
  39. Morris JN, Hawes C, Fries BE, Phillips CD, Mor V, Katz S, Murphy K, Drugovich ML, Friedlob AS. Designing the National Resident Assessment Instrument for Nursing Homes. Gerontologist 1990; 30: 293–307.
  40. Liu X, Teresi JA, Waternaux C. Modeling the decline pattern in functional measures from a prevalent cohort study. Stat Med 2000; 19: 1593–606.
  41. SAS Institute Inc. SAS/STAT Software: changes and enhancements through release 6.11. Cary (NC): SAS Institute; 1996.
  42. Milliken JK, Edland SD. Mixed effects models of longitudinal Alzheimer’s disease data: a cautionary note. Stat Med 2000; 19: 1617–29.
  43. Teresi JA, Morris JN, Mattis S, Reisberg B. Cognitive impairment among SCU and non-SCU residents in the United States: prevalence estimates from the National Institute on Aging Collaborative Studies of Special Care Units for Alzheimer’s Disease. Res Practice Alzheimer Dis 2000; 4: 117–38.
  44. Brookmeyer R, Zeger S. Statistical issues in prevention and therapeutic trials of Alzheimer’s disease. Alzheimer Dis Assoc Disord 1996; 10 (Suppl): 27–30.
  45. Liu X, Tsai W, Stern Y. A functional decline model for prevalent cohort data. Stat Med 1996; 15: 1023–32.


