External validation of a model is the process of assessing the generalizability or transportability of its predictions to an independent sample. External validation may lead to rejection of a proposed model, or the model may be accepted (validated) if its performance characteristics are deemed sufficient. Resampling methods are used for ‘internal validation’ of the model, that is, validation on data originating from the same setting as the data used to develop it. Although internally validated performance can be thought of as an estimate of externally validated performance, it is not sufficient evidence of external validity. An independent dataset suitable for external validation generally has one or more of the following properties (Moons et al., 2012):
- Temporal differences
  - Data may be collected from the same locations, but over different periods of time.
- Geographic differences
  - Data was collected from different locations.
- Institutional differences
  - Data was collected from an organization not connected with the original source.
A principled approach to external validation follows these steps:
- Collect a suitable independent sample of sufficient size.
- Create a descriptive summary table that compares the characteristics of the original sample vs. the external sample.
- Compare prediction performance estimates for the following scenarios:
  - Original apparent: The original model applied to the original data.
  - Original internally validated: The cross-validation-based estimate for the original model applied to the original data.
  - Original externally validated: The original model applied to the new data.
- Compare model parameter estimates and prediction performance estimates from:
  - The original model
  - The original model selection algorithm applied to the new data only
  - The original model selection algorithm applied to the combined data (original + new)
  - Potentially an updated model selection algorithm applied to the combined data (original + new)
- Discuss the differences among these results and whether they might be due to population differences, overfitting, underfitting, differences in data capture, extrapolation, etc.
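
The comparisons in the steps above can be sketched in a few lines of Python. This is only an illustrative sketch under strong assumptions, not a recommended implementation: both samples are simulated (the `simulate` helper and all of its settings are hypothetical), the "model selection algorithm" is reduced to a fixed one-predictor least-squares fit, and mean squared error (MSE) stands in for whatever performance measure suits the real model.

```python
import random
from statistics import mean, stdev

def fit_ols(xs, ys):
    """Least-squares (intercept, slope) for a single predictor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def mse(model, xs, ys):
    """Mean squared prediction error of a fitted model on (xs, ys)."""
    b0, b1 = model
    return mean((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

def cv_mse(xs, ys, k=5, seed=0):
    """k-fold cross-validated MSE: refit on k-1 folds, score the held-out fold."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    scores = []
    for fold in (idx[i::k] for i in range(k)):
        held = set(fold)
        train = [i for i in idx if i not in held]
        model = fit_ols([xs[i] for i in train], [ys[i] for i in train])
        scores.append(mse(model, [xs[i] for i in fold], [ys[i] for i in fold]))
    return mean(scores)

def simulate(n, slope, x_shift, seed):
    """Hypothetical data generator; stands in for real study samples."""
    rng = random.Random(seed)
    xs = [rng.gauss(x_shift, 1.0) for _ in range(n)]
    ys = [1.0 + slope * x + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys

# Assumed setup: the external sample differs in predictor distribution and slope.
x_orig, y_orig = simulate(200, slope=2.0, x_shift=0.0, seed=1)
x_new, y_new = simulate(150, slope=1.5, x_shift=1.0, seed=2)

# Descriptive summary: original vs. external sample characteristics.
for label, xs, ys in [("original", x_orig, y_orig), ("external", x_new, y_new)]:
    print(f"{label}: n={len(xs)}, x mean={mean(xs):.2f} (SD {stdev(xs):.2f}), "
          f"y mean={mean(ys):.2f} (SD {stdev(ys):.2f})")

# Prediction performance of the original model under the three scenarios.
original_model = fit_ols(x_orig, y_orig)
performance = {
    "original apparent": mse(original_model, x_orig, y_orig),
    "original internally validated": cv_mse(x_orig, y_orig),
    "original externally validated": mse(original_model, x_new, y_new),
}
for name, value in performance.items():
    print(f"{name}: MSE = {value:.3f}")

# Parameter estimates from refitting on the new data only and on the combined data.
refit = {
    "original": original_model,
    "new only": fit_ols(x_new, y_new),
    "combined": fit_ols(x_orig + x_new, y_orig + y_new),
}
for name, (b0, b1) in refit.items():
    print(f"{name}: intercept = {b0:.3f}, slope = {b1:.3f}, "
          f"MSE on new data = {mse((b0, b1), x_new, y_new):.3f}")
```

A gap between the apparent and internally validated estimates points toward overfitting in the original fit, while a further gap between the internally and externally validated estimates, together with shifts in the refitted coefficients, suggests differences between the two populations or in how the data were captured. With a real model, the fixed fit would be replaced by the full model selection procedure, rerun inside every cross-validation fold.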