Psychosomatic Medicine




Figure 2. Pure noise variables still produce good R² values if the model is overfitted. The distribution of R² values from a series of simulated regression models containing only noise variables. Each model contained 15 predictors, each consisting of randomly generated values, and a response variable whose values were also randomly generated. Thus, the true model has an R² of 0. Four sets of 10,000 random samples were drawn, one set at each sample size: N = 50, N = 100, N = 150, and N = 200. The smoothed frequency distribution of the R² values from the 10,000 models is plotted here for each of the 4 sample size conditions. Note that even when the number of cases per predictor is reasonably good (200/15 = 13.3), a fair number of R² values are noticeably above 0 solely because of the chance of the draw. When there were only approximately 50/15 = 3.3 observations per predictor, the frequency of large R² values was quite high.
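
The simulation described in the caption can be approximated with a short sketch. This is not the authors' code: the caption does not say how the random values were generated or how the models were fitted, so the sketch assumes standard normal draws and ordinary least squares with an intercept. The function name simulate_null_r2 and its parameters are hypothetical, chosen for illustration only.

```python
import numpy as np


def simulate_null_r2(n, n_predictors=15, n_reps=10_000, seed=0):
    """Return R^2 values from regressions of pure noise on pure noise.

    Assumptions (not stated in the caption): predictors and response are
    independent standard normal draws, and each model is fitted by
    ordinary least squares with an intercept.
    """
    rng = np.random.default_rng(seed)
    r2 = np.empty(n_reps)
    for i in range(n_reps):
        X = rng.standard_normal((n, n_predictors))  # noise predictors
        y = rng.standard_normal(n)                  # noise response
        design = np.column_stack([np.ones(n), X])   # add intercept column
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ beta
        ss_res = resid @ resid                      # residual sum of squares
        ss_tot = np.sum((y - y.mean()) ** 2)        # total sum of squares
        r2[i] = 1.0 - ss_res / ss_tot
    return r2


if __name__ == "__main__":
    # Fewer replications than the caption's 10,000 to keep the demo quick.
    for n in (50, 100, 150, 200):
        r2 = simulate_null_r2(n, n_reps=2_000)
        print(f"N={n:3d}  mean R2={r2.mean():.3f}  "
              f"95th percentile={np.percentile(r2, 95):.3f}")
```

Under these assumptions, the expected R² of a null model with p predictors and an intercept is p/(N - 1), which gives roughly 0.31 for N = 50 and roughly 0.075 for N = 200, consistent with the pattern the figure describes.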






