Psychosomatic Medicine 66:411-421 (2004)
© 2004 American Psychosomatic Society


STATISTICAL CORNER

What You See May Not Be What You Get: A Brief, Nontechnical Introduction to Overfitting in Regression-Type Models

Michael A. Babyak, PhD

From Duke University Medical Center, Durham, NC.

Address correspondence and reprint requests to Michael A. Babyak, PhD, Department of Psychiatry and Behavioral Science, Duke University Medical Center, Box 3119, Durham, NC 27710. E-mail: michael.babyak@duke.edu

ABSTRACT

OBJECTIVE: Statistical models, such as linear or logistic regression or survival analysis, are frequently used as a means to answer scientific questions in psychosomatic research. Many who use these techniques, however, apparently fail to appreciate fully the problem of overfitting, ie, capitalizing on the idiosyncrasies of the sample at hand. Overfitted models will fail to replicate in future samples, thus creating considerable uncertainty about the scientific merit of the finding. The present article is a nontechnical discussion of the concept of overfitting and is intended to be accessible to readers with varying levels of statistical expertise. The notion of overfitting is presented in terms of asking too much from the available data. Given a certain number of observations in a data set, there is an upper limit to the complexity of the model that can be derived with any acceptable degree of uncertainty. Complexity arises as a function of the number of degrees of freedom expended (the number of predictors including complex terms such as interactions and nonlinear terms) against the same data set during any stage of the data analysis. Theoretical and empirical evidence—with a special focus on the results of computer simulation studies—is presented to demonstrate the practical consequences of overfitting with respect to scientific inference. Three common practices—automated variable selection, pretesting of candidate predictors, and dichotomization of continuous variables—are shown to pose a considerable risk for spurious findings in models. The dilemma between overfitting and exploring candidate confounders is also discussed. Alternative means of guarding against overfitting are discussed, including variable aggregation and the fixing of coefficients a priori. Techniques that account and correct for complexity, including shrinkage and penalization, also are introduced.

Key Words: statistical models, regression, simulation, dichotomization, overfitting.

Abbreviations: ANOVA = analysis of variance.
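
The abstract's central claim, that a model asking too much of a small data set will look better in the sample at hand than in future samples, can be illustrated with a small simulation. The sketch below is not taken from the article; the sample size (30), the number of predictors (10), and all function names are assumptions chosen for illustration. The outcome is generated independently of every predictor, so any apparent fit is pure overfitting. Under these conditions the expected apparent R^2 is roughly p/(n - 1), here about 0.34, while R^2 in fresh data from the same process should fall at or below zero.

```python
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_ols(xs, ys):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + x for x in xs]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)

def r_squared(beta, xs, ys):
    """1 - SSE/SST for fixed coefficients evaluated on the given data."""
    preds = [beta[0] + sum(b * v for b, v in zip(beta[1:], x)) for x in xs]
    mean_y = sum(ys) / len(ys)
    sse = sum((y - p) ** 2 for y, p in zip(ys, preds))
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - sse / sst

def noise_sample(rng, n, p):
    """Predictors and outcome are independent: the true R^2 is zero."""
    xs = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    ys = [rng.gauss(0, 1) for _ in range(n)]
    return xs, ys

def simulate(n=30, p=10, n_new=500, reps=200, seed=1):
    """Average apparent R^2 and R^2 in fresh samples over many replications."""
    rng = random.Random(seed)
    apparent, fresh = 0.0, 0.0
    for _ in range(reps):
        xs, ys = noise_sample(rng, n, p)
        beta = fit_ols(xs, ys)
        apparent += r_squared(beta, xs, ys)           # same data used to fit
        new_xs, new_ys = noise_sample(rng, n_new, p)  # a "future" sample
        fresh += r_squared(beta, new_xs, new_ys)
    return apparent / reps, fresh / reps

if __name__ == "__main__":
    apparent, fresh = simulate()
    print(f"mean apparent R^2:       {apparent:.2f}")
    print(f"mean R^2 in new samples: {fresh:.2f}")
```

Raising n while holding p fixed (for example, simulate(n=300)) should shrink the gap between the two figures, which is one way to see the abstract's point that the supportable complexity of a model is bounded by the number of observations.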



