Abstract
High-dimensional datasets are frequently affected by cellwise contamination, a form of outlier corruption in which individual cells (single entries of the data matrix) are corrupted rather than entire observations (rows). This paper presents a systematic comparison of robust variable selection procedures, evaluating their performance across varying contamination rates and data dimensionalities.
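To make the contamination model concrete, the Python sketch below simulates independent cellwise contamination: each cell of an n × p standard-normal matrix is replaced, with probability eps, by a fixed outlying value. The function names, the outlier value of 10, and the assumption that cells are contaminated independently are illustrative choices, not details taken from the paper. The helper also evaluates 1 - (1 - eps)^p, the probability that a row contains at least one contaminated cell, which shows why cellwise contamination is especially damaging in high dimensions.

```python
import random


def contaminate_cellwise(n, p, eps, outlier_value=10.0, seed=0):
    """Simulate independent cellwise contamination (illustrative model).

    Each cell of an n x p matrix is drawn from N(0, 1) and, independently with
    probability eps, replaced by outlier_value. Returns the data matrix and a
    boolean mask marking the contaminated cells.
    """
    rng = random.Random(seed)
    X, mask = [], []
    for _ in range(n):
        row, row_mask = [], []
        for _ in range(p):
            if rng.random() < eps:
                row.append(outlier_value)  # corrupt this single cell only
                row_mask.append(True)
            else:
                row.append(rng.gauss(0.0, 1.0))
                row_mask.append(False)
        X.append(row)
        mask.append(row_mask)
    return X, mask


def expected_fraction_rows_contaminated(p, eps):
    """A row is clean only if all p of its cells are clean: 1 - (1 - eps)^p."""
    return 1.0 - (1.0 - eps) ** p


X, mask = contaminate_cellwise(n=2000, p=50, eps=0.05, seed=1)
observed = sum(any(row) for row in mask) / len(mask)
print(round(expected_fraction_rows_contaminated(50, 0.05), 3))  # 0.923
print(observed)  # should be close to the expected fraction above
```

With only 5% of cells contaminated and p = 50, about 92% of rows (1 - 0.95^50 ≈ 0.923) contain at least one corrupted entry, so procedures that treat whole rows as outliers would have to set aside most observations.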
Methods Compared
- LASSO with robust scale estimators
- Elastic Net with cellwise-robust preprocessing
- Sparse PCA with contamination-resistant covariance estimation
- Adaptive LASSO with weights derived from the Minimum Covariance Determinant (MCD) estimator
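The summary does not describe how the preprocessing or the penalized fits are implemented. As a minimal sketch, assuming a simple univariate cellwise filter in place of whatever robust preprocessing the paper actually uses, the Python code below (i) standardizes each column by its median and MAD, (ii) flags cells whose robust z-score exceeds a cutoff and replaces them with the column median (0 on the standardized scale), and (iii) fits an elastic net by cyclic coordinate descent, which reduces to the LASSO when l1_ratio = 1. The function names, the cutoff of 2.5, and the median-imputation rule are hypothetical choices for illustration.

```python
import statistics


def robust_standardize_and_filter(X, cutoff=2.5):
    """Hypothetical univariate cellwise filter.

    Each column is centered by its median and scaled by 1.4826 * MAD (a
    consistent estimate of the standard deviation under normality). Cells
    whose robust z-score exceeds `cutoff` in absolute value are flagged and
    replaced by 0.0, i.e. by the column median on the standardized scale.
    Returns the cleaned standardized matrix and a mask of flagged cells.
    """
    n, p = len(X), len(X[0])
    Z = [[0.0] * p for _ in range(n)]
    flagged = [[False] * p for _ in range(n)]
    for j in range(p):
        col = [X[i][j] for i in range(n)]
        med = statistics.median(col)
        mad = 1.4826 * statistics.median([abs(v - med) for v in col])
        scale = mad if mad > 0 else 1.0  # guard against constant columns
        for i in range(n):
            z = (col[i] - med) / scale
            if abs(z) > cutoff:
                flagged[i][j] = True
                z = 0.0  # impute the column median
            Z[i][j] = z
    return Z, flagged


def soft_threshold(x, t):
    """Proximal operator of t * |x|."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def elastic_net_cd(Z, y, lam, l1_ratio=1.0, n_iter=200, tol=1e-8):
    """Cyclic coordinate descent for the elastic net objective

        (1 / (2n)) * ||y - Z b||^2
            + lam * (l1_ratio * ||b||_1 + (1 - l1_ratio) / 2 * ||b||^2),

    which is the LASSO when l1_ratio = 1. y is mean-centered internally and
    no intercept is fitted, since the filter centers Z's columns at their medians.
    """
    n, p = len(Z), len(Z[0])
    y_mean = sum(y) / n
    r = [yi - y_mean for yi in y]  # residuals at b = 0
    b = [0.0] * p
    col_sq = [sum(Z[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (b_j added back).
            rho = sum(Z[i][j] * r[i] for i in range(n)) / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam * l1_ratio) / (
                col_sq[j] + lam * (1.0 - l1_ratio)
            )
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= Z[i][j] * delta  # keep residuals in sync with b
                b[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b
```

A typical call is `Z, flagged = robust_standardize_and_filter(X)` followed by `b = elastic_net_cd(Z, y, lam=0.3)`; the indices with nonzero `b[j]` form the selected set. Running `elastic_net_cd` on data standardized by mean and standard deviation, without the filter, gives a naive baseline. scikit-learn's `ElasticNet` minimizes the same objective, with `alpha` playing the role of `lam`.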
Key Findings
Under moderate contamination (10–15%), cellwise-robust preprocessing consistently improves variable selection accuracy by 18–34% relative to naive (non-robust) implementations. At high contamination rates (>25%), approaches based on sparse PCA show greater stability than the other methods compared.
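The summary does not state how variable selection accuracy is defined, or whether the 18–34% figure is a relative gain or an absolute (percentage-point) gain. Purely as an illustration, the sketch below assumes accuracy is the fraction of the p candidate variables whose inclusion or exclusion matches the true support, and computes a relative improvement as (robust - naive) / naive. The function names are hypothetical, and the selected sets in the usage note are made up, not results from the paper.

```python
def selection_accuracy(selected, true_support, p):
    """Hypothetical metric: share of the p candidate variables whose
    inclusion/exclusion agrees with the true support. Also returns the
    true positive rate (TPR) and false positive rate (FPR)."""
    selected, true_support = set(selected), set(true_support)
    tp = len(selected & true_support)  # true variables that were selected
    fp = len(selected - true_support)  # noise variables that were selected
    fn = len(true_support - selected)  # true variables that were missed
    tn = p - tp - fp - fn              # noise variables correctly excluded
    n_noise = p - len(true_support)
    return {
        "accuracy": (tp + tn) / p,
        "tpr": tp / len(true_support) if true_support else 1.0,
        "fpr": fp / n_noise if n_noise else 0.0,
    }


def relative_improvement(robust_score, naive_score):
    """Relative (not percentage-point) gain of the robust over the naive score."""
    return (robust_score - naive_score) / naive_score
```

For example, with p = 10 and true support {0, 1, 2}, a hypothetical naive fit selecting {0, 5, 7} scores an accuracy of 0.6 and a hypothetical robust fit selecting {0, 1, 2, 4} scores 0.9. That is a relative improvement of 50% but an absolute gain of 30 percentage points, so the two readings of an "improvement" figure can differ substantially.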
Citation
Journal: Dhaka University Journal of Science
DOI: 10.3329/dujs.v73i2.82773
Year: 2025