A Variable Selection Algorithm for Creating Replicable Simple Structure Factor Solutions
Factor analysis is commonly employed to identify and explore underlying factor structures. Unfortunately, when variable selection is involved, results often fluctuate across studies, making it difficult to determine the "best" and most replicable factor structure. We propose a new factor analytic variable selection algorithm that examines the reproducibility, quality, and model fit of factor solutions using training and validation subsamples. Observed variables are considered for removal from a factor structure sequentially, based on statistical measures and practices commonly used in factor analysis, such as the magnitude of the primary and secondary loadings, model fit statistics, and under-representation of factors. We also propose a new statistic, the D-score, which measures the relative "distance" between a variable's primary and secondary factor loadings and is used as part of the variable selection process. Using simulated data, we examined the algorithm's performance across 108 samples from 36 factor structures varying in model complexity, interfactor correlation magnitude (ρ = 0, .3, and .6), and sample size (n = 300, 500, and 1000). Overall, the algorithm's performance was a function of model complexity, interfactor correlation, and sample size. Under the algorithm defaults, poor performance occurred most often with highly correlated factors, small sample sizes, and complex factor structures. As with other variable selection algorithms, changes to the algorithm's criteria can improve or degrade performance and change the final model selected. Implications of the algorithm's results are discussed.
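To make the idea of a gap between primary and secondary loadings concrete, the following is a minimal sketch in Python. The abstract does not give the D-score formula, so the quantity computed here is a hypothetical stand-in (the difference between the two largest absolute loadings divided by their sum), not the D-score itself. The function name `primary_secondary_gap` and the example loading matrix are likewise assumptions made for illustration.

```python
import numpy as np


def primary_secondary_gap(loadings):
    """Return each variable's primary loading, secondary loading, and a relative gap.

    `loadings` is a (variables x factors) matrix with at least two factors.
    The gap is an ILLUSTRATIVE stand-in, not the paper's D-score:
        gap = (|primary| - |secondary|) / (|primary| + |secondary|)
    """
    abs_loadings = np.abs(np.asarray(loadings, dtype=float))
    if abs_loadings.ndim != 2 or abs_loadings.shape[1] < 2:
        raise ValueError("need a 2-D loading matrix with at least two factors")

    # Sort each row's absolute loadings from largest to smallest.
    ranked = -np.sort(-abs_loadings, axis=1)
    primary = ranked[:, 0]    # largest absolute loading per variable
    secondary = ranked[:, 1]  # second-largest absolute loading per variable

    # Guard against division by zero when a variable loads on nothing.
    total = primary + secondary
    gap = np.divide(primary - secondary, total,
                    out=np.zeros_like(total), where=total > 0)
    return primary, secondary, gap


# Hypothetical loading matrix: three variables, two factors.
example = [[0.80, 0.10],   # clean loading on factor 1
           [0.45, 0.40],   # cross-loading, small gap
           [0.05, 0.70]]   # clean loading on factor 2
primary, secondary, gap = primary_secondary_gap(example)
for p, s, g in zip(primary, secondary, gap):
    print(f"primary={p:.2f} secondary={s:.2f} gap={g:.2f}")
```

In this sketch a gap near 1 indicates a variable that loads almost exclusively on one factor, while a gap near 0 flags a cross-loading variable that a selection procedure of this kind might consider for removal. Any cutoff applied to such a measure would be a tunable criterion rather than something stated in the abstract.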