American Statistical Association
A model selection criterion is often formulated by constructing an approximately unbiased estimator of an expected discrepancy, a measure that gauges the separation between the true model and a fitted candidate model. The expected discrepancy reflects how well, on average, the fitted candidate model predicts “new” data generated under the true model. A related measure, the estimated discrepancy, reflects how well the fitted candidate model predicts the data at hand. In general, a model selection criterion consists of a goodness-of-fit term and a penalty term. The natural estimator of the expected discrepancy, the estimated discrepancy, corresponds to the goodness-of-fit term of the criterion. However, the estimated discrepancy yields an overly optimistic assessment of how effectively the fitted model predicts new data. It therefore serves as a negatively biased estimator of the expected discrepancy. Correcting for this bias leads to the penalty term. Specifically, the penalty term provides an approximation to the expectation of the difference between the expected discrepancy and the estimated discrepancy, a measure known as the expected optimism.
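The quantities described above can be written out in symbols. The following is only a sketch under assumptions not stated in the abstract: it takes the Kullback-Leibler (log-likelihood) discrepancy as the discrepancy, and the symbols $d$, $\hat d$, $B$, $\hat\theta$, and $\mathrm{E}_o$ (expectation under the true model) are notation introduced here for illustration.

```latex
\begin{align*}
d(\hat\theta) &= \mathrm{E}_{o}\{-2 \log f(Y \mid \theta)\}\big|_{\theta = \hat\theta}
  && \text{(expected discrepancy: fit to new data $Y$)} \\
\hat d(\hat\theta) &= -2 \log f(y \mid \hat\theta)
  && \text{(estimated discrepancy: fit to the data at hand $y$)} \\
B &= \mathrm{E}_{o}\{ d(\hat\theta) - \hat d(\hat\theta) \}
  && \text{(expected optimism)} \\
\text{criterion} &= \hat d(\hat\theta) + \hat B
  && \text{(goodness-of-fit term plus penalty term)}
\end{align*}
```

Under this assumed discrepancy, the classical large-sample approximation $B \approx 2k$, where $k$ is the number of estimated parameters, yields the Akaike information criterion.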
Classical approaches to approximating the expected optimism often lead to simplistic penalty terms based on the sample size and the dimension of the fitted candidate model. However, such approaches generally involve large-sample arguments, restrictive assumptions on the form of the candidate model, or both. The resulting penalty terms may fail to perform adequately in small-sample applications or in settings where the requisite assumptions do not hold. Modern computational statistical methods, such as Monte Carlo simulation, bootstrapping, and cross-validation, facilitate the development of flexible and accurate estimators of the expected optimism. Model selection criteria based on such penalty terms often provide more realistic measures of predictive efficacy than their classical counterparts, thereby resulting in superior model determinations.
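As one illustration of a computationally intensive penalty, the sketch below estimates the expected optimism with an Efron-style optimism bootstrap for a Gaussian linear model. It is not taken from the talk: the choice of model, the use of -2 log-likelihood as the discrepancy, the resampling of (y, X) pairs, and all function names (`neg2_loglik`, `fit_gaussian_lm`, `bootstrap_optimism`, `bootstrap_criterion`) are assumptions made for this example.

```python
import numpy as np


def neg2_loglik(y, X, beta, sigma2):
    """-2 * Gaussian log-likelihood of (beta, sigma2), evaluated on (y, X)."""
    resid = y - X @ beta
    n = y.shape[0]
    return n * np.log(2.0 * np.pi * sigma2) + resid @ resid / sigma2


def fit_gaussian_lm(y, X):
    """Maximum likelihood fit of a Gaussian linear model (illustrative helper)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / y.shape[0]  # ML variance estimate (divides by n)
    return beta, sigma2


def bootstrap_optimism(y, X, n_boot=200, seed=0):
    """Average, over bootstrap fits, of (discrepancy on the original data)
    minus (discrepancy on the bootstrap sample used for fitting)."""
    rng = np.random.default_rng(seed)
    n = y.shape[0]
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample cases with replacement
        yb, Xb = y[idx], X[idx]
        beta_b, sigma2_b = fit_gaussian_lm(yb, Xb)
        diffs[b] = (neg2_loglik(y, X, beta_b, sigma2_b)
                    - neg2_loglik(yb, Xb, beta_b, sigma2_b))
    return diffs.mean()


def bootstrap_criterion(y, X, n_boot=200, seed=0):
    """Goodness-of-fit term plus the bootstrap estimate of the optimism."""
    beta, sigma2 = fit_gaussian_lm(y, X)
    return neg2_loglik(y, X, beta, sigma2) + bootstrap_optimism(y, X, n_boot, seed)
```

To compare candidate models, one would compute `bootstrap_criterion(y, X_candidate)` for each candidate design matrix and prefer the smallest value. The bootstrap penalty here plays the role that a fixed penalty based on model dimension plays in a classical criterion.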
In this talk, we review the general paradigm for discrepancy-based model selection criteria, and discuss computationally intensive approaches to approximating the expected optimism. We illustrate the utility of some of the resulting criteria in a simulation study based on the state-space time series modeling framework, and in an application from dentistry that involves generalized linear models.
Date: Wednesday, February 25, 2009
Presentation: 11:00 A.M. - 12:00 P.M.
Discussion: 12:00 - 12:30 P.M.
Mailman School of Public Health
Department of Biostatistics
722 West 168th Street
Biostatistics Computer Lab
6th Floor - Room 656
New York, New York