STATISTICAL CORNER
From Duke University Medical Center, Durham, NC.
Address correspondence and reprint requests to Michael A. Babyak, PhD, Department of Psychiatry and Behavioral Science, Duke University Medical Center, Box 3119, Durham, NC 27710. E-mail: michael.babyak{at}duke.edu
ABSTRACT
OBJECTIVE: Statistical models, such as linear or logistic regression or survival analysis, are frequently used as a means to answer scientific questions in psychosomatic research. Many who use these techniques, however, apparently fail to appreciate fully the problem of overfitting, ie, capitalizing on the idiosyncrasies of the sample at hand. Overfitted models will fail to replicate in future samples, thus creating considerable uncertainty about the scientific merit of the finding. The present article is a nontechnical discussion of the concept of overfitting and is intended to be accessible to readers with varying levels of statistical expertise. The notion of overfitting is presented in terms of asking too much from the available data. Given a certain number of observations in a data set, there is an upper limit to the complexity of the model that can be derived with any acceptable degree of uncertainty. Complexity arises as a function of the number of degrees of freedom expended (the number of predictors including complex terms such as interactions and nonlinear terms) against the same data set during any stage of the data analysis. Theoretical and empirical evidence, with a special focus on the results of computer simulation studies, is presented to demonstrate the practical consequences of overfitting with respect to scientific inference. Three common practices (automated variable selection, pretesting of candidate predictors, and dichotomization of continuous variables) are shown to pose a considerable risk for spurious findings in models. The dilemma between overfitting and exploring candidate confounders is also discussed. Alternative means of guarding against overfitting are discussed, including variable aggregation and the fixing of coefficients a priori. Techniques that account and correct for complexity, including shrinkage and penalization, also are introduced.
Key Words: statistical models, regression, simulation, dichotomization, overfitting.
Abbreviations: ANOVA = analysis of variance.
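The abstract's central claim, that a model with too many predictors for the available observations will look good in the sample at hand and fail to replicate, can be illustrated with a small simulation. The following is a minimal sketch and is not taken from the article: the function names, sample size, predictor counts, and number of replications are all illustrative assumptions. Predictors and outcome are generated independently, so the true R² is zero and any apparent fit is due to overfitting alone.

```python
import numpy as np


def r_squared(y, y_hat):
    """Proportion of variance in y explained by the predictions y_hat."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def apparent_vs_validated_r2(n_obs=30, n_predictors=15, n_sims=200, seed=0):
    """Fit ordinary least squares to pure noise and compare in-sample R^2
    with R^2 in a fresh sample of the same size (hypothetical settings)."""
    rng = np.random.default_rng(seed)
    apparent, validated = [], []
    for _ in range(n_sims):
        # Outcome is independent of every predictor: the true R^2 is 0.
        X_train = rng.standard_normal((n_obs, n_predictors))
        y_train = rng.standard_normal(n_obs)
        X_new = rng.standard_normal((n_obs, n_predictors))
        y_new = rng.standard_normal(n_obs)

        # Add an intercept column and estimate coefficients by least squares.
        A_train = np.column_stack([np.ones(n_obs), X_train])
        A_new = np.column_stack([np.ones(n_obs), X_new])
        beta, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)

        apparent.append(r_squared(y_train, A_train @ beta))
        validated.append(r_squared(y_new, A_new @ beta))
    return float(np.mean(apparent)), float(np.mean(validated))


if __name__ == "__main__":
    for p in (2, 15):
        app, val = apparent_vs_validated_r2(n_predictors=p)
        print(f"{p:2d} predictors, 30 observations: "
              f"apparent R^2 = {app:.2f}, new-sample R^2 = {val:.2f}")
```

Under these assumptions, the average apparent R² should grow roughly in proportion to the number of predictors relative to the number of observations, while the R² in a new sample should fall below zero, meaning the fitted model predicts worse than simply using the mean of the outcome.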