Publications

ALPCAH: Subspace Learning for Sample-wise Heteroscedastic Data

Published in IEEE Transactions on Signal Processing (TSP), 2025

Principal component analysis (PCA) is a key tool in the field of data dimensionality reduction. However, some applications involve heterogeneous data that vary in quality due to noise characteristics associated with each data sample. Heteroscedastic methods aim to deal with such mixed data quality. This paper develops a subspace learning method, named ALPCAH, that can estimate the sample-wise noise variances and use this information to improve the estimate of the subspace basis associated with the low-rank structure of the data. Our method makes no distributional assumptions about the low-rank component and does not assume that the noise variances are known. Further, this method uses a soft rank constraint that does not require the subspace dimension to be known. Additionally, this paper develops a matrix-factorized version of ALPCAH, named LR-ALPCAH, that is much faster and more memory efficient, at the cost of requiring the subspace dimension to be known or estimated. Simulations and real-data experiments show the effectiveness of accounting for data heteroscedasticity compared to existing algorithms. Code available at https://github.com/javiersc1/ALPCAH.

Recommended citation: J. Salazar Cavazos, J. A. Fessler and L. Balzano, "ALPCAH: Subspace Learning for Sample-wise Heteroscedastic Data," in IEEE Transactions on Signal Processing, doi: 10.1109/TSP.2025.3537867. Keywords: heteroscedastic data, heterogeneous data quality, subspace basis estimation, subspace learning. http://javiersc1.github.io/files/paper_alpcah_journal.pdf

ALPCAH: Sample-wise Heteroscedastic PCA with Tail Singular Value Regularization

Published in Sampling Theory and Applications (SampTA), 2023

Principal component analysis (PCA) is a key tool in the field of data dimensionality reduction that is useful for various data science problems. However, many applications involve heterogeneous data that vary in quality due to noise characteristics associated with different sources of the data. Methods that deal with such mixed-quality data are known as heteroscedastic methods. Current methods like HePPCAT make Gaussian assumptions about the basis coefficients that may not hold in practice. Other methods such as Weighted PCA (WPCA) assume the noise variances are known, which may not be the case in practice. This paper develops a PCA method that can estimate the sample-wise noise variances and use this information in the model to improve the estimate of the subspace basis associated with the low-rank structure of the data. This is done without distributional assumptions about the low-rank component and without assuming the noise variances are known. Simulations show the effectiveness of accounting for such heteroscedasticity in the data and the benefits of using such a method with all of the data versus retaining only good data. Comparisons are made against other methods established in the literature, namely PCA, Robust PCA (RPCA), and HePPCAT. Code available at https://github.com/javiersc1/ALPCAH

Recommended citation: J. A. S. Cavazos, J. A. Fessler, and L. Balzano. ALPCAH: Sample-wise heteroscedastic PCA with tail singular value regularization. In Fourteenth International Conference on Sampling Theory and Applications, 2023. http://javiersc1.github.io/files/paper_alpcah.pdf