Abstract

High-dimensional datasets frequently suffer from cellwise contamination, a form of outlier corruption that affects individual cells (entries of the data matrix) rather than entire observations (rows). This paper presents a systematic comparison of robust variable selection procedures and evaluates their performance across varying contamination rates and dimensions.
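Cellwise contamination becomes more damaging as the dimension grows. Suppose, as an illustrative assumption not taken from the paper, that each cell is corrupted independently with probability ε. A row of p cells is then clean only with probability (1 − ε)^p. At ε = 5%, about 40% of rows contain at least one bad cell when p = 10, and more than 92% do when p = 50. Procedures that downweight or discard whole rows can therefore lose most of the data even at modest cellwise rates. The function names below are hypothetical.

```python
import random


def prob_row_contaminated(eps, p):
    """P(a row of p cells has at least one outlying cell), assuming each
    cell is contaminated independently with probability eps."""
    return 1.0 - (1.0 - eps) ** p


def simulated_share(eps, p, n_rows=20_000, seed=1):
    """Monte Carlo estimate of the same probability under the same
    independence assumption."""
    rng = random.Random(seed)
    hits = sum(any(rng.random() < eps for _ in range(p)) for _ in range(n_rows))
    return hits / n_rows


# Compare the closed form with simulation at 5% cellwise contamination.
for p in (10, 50, 200):
    print(p, round(prob_row_contaminated(0.05, p), 3),
          round(simulated_share(0.05, p), 3))
```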

Methods Compared

  • LASSO with robust scale estimators
  • Elastic Net with cellwise-robust preprocessing
  • Sparse PCA with contamination-resistant covariance estimation
  • Adaptive LASSO incorporating weights based on the Minimum Covariance Determinant (MCD)
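The first two methods follow the same general pattern. Each predictor is standardized with robust location and scale estimates, outlying cells are neutralized, and a penalized regression is then fitted. The sketch below makes one simple choice at each step, chosen by us for illustration. It uses median/MAD standardization and a univariate filter that flags cells with |z| > c and resets them to the robust centre. The model is then fitted with a plain cyclic coordinate-descent LASSO. All function names, the cutoff c = 2.5, and the simulated data are hypothetical. The response is only median-centred and no intercept is fitted. The paper's own preprocessing and estimators may differ.

```python
import random
import statistics

MAD_TO_SD = 1.4826  # scales the MAD to estimate the SD under normality


def filter_cells(column, c=2.5):
    """Robustly standardize one column (median/MAD), then flag cells with
    |z| > c and reset them to 0.0, the robust centre (hypothetical filter)."""
    med = statistics.median(column)
    mad = MAD_TO_SD * statistics.median([abs(v - med) for v in column])
    scale = mad if mad > 0 else 1.0  # guard against a zero MAD
    z = [(v - med) / scale for v in column]
    return [0.0 if abs(v) > c else v for v in z]


def soft_threshold(rho, lam):
    """Soft-thresholding operator S(rho, lam) used in the LASSO update."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1.
    X is a list of rows; no intercept is fitted."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # residual y - Xb, starting from b = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # inner product of column j with the partial residual excluding j
            rho = sum(X[i][j] * (resid[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_bj
    return b


def robust_lasso(X, y, lam, c=2.5):
    """Hypothetical pipeline: filter cells column by column, median-centre y,
    then fit the LASSO on the filtered, robustly scaled predictors."""
    p = len(X[0])
    cols = [filter_cells([row[j] for row in X], c) for j in range(p)]
    Xf = [list(row) for row in zip(*cols)]
    y_med = statistics.median(y)
    return lasso_cd(Xf, [v - y_med for v in y], lam)


def simulate(n=100, p=8, eps=0.10, seed=0):
    """Toy data: y depends on columns 0 and 1; afterwards each cell of X is
    independently replaced by an outlier (50.0) with probability eps."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [2.0 * r[0] - 1.5 * r[1] + rng.gauss(0, 0.5) for r in X]
    for row in X:
        for j in range(p):
            if rng.random() < eps:
                row[j] = 50.0
    return X, y


if __name__ == "__main__":
    X, y = simulate()
    print([round(b, 2) for b in robust_lasso(X, y, lam=0.1)])
```

Flagged cells are reset to the centre instead of being clipped to ±c. A clipped cell still enters the fit at ±c, with no relation to the response, which tends to attenuate that column's coefficient. The cost of resetting is that genuinely extreme but clean values are discarded too.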

Key Findings

Under moderate contamination (10–15%), cellwise-robust preprocessing consistently improves variable selection accuracy by 18–34% compared to naive implementations. At high contamination rates (>25%), sparse PCA-based approaches demonstrate superior stability.

Citation

Journal: Dhaka University Journal of Science
DOI: 10.3329/dujs.v73i2.82773
Year: 2025