By exploiting the similarities across tasks, one can hope to overcome data scarcity. Under a canonical scenario where each task is drawn from a mixture of \(k\) linear regressions, we study a fundamental question: can abundant small-data tasks compensate for the lack of big-data tasks? This work introduces a spectral approach that is simultaneously robust to data scarcity and to outlier tasks. We design an outlier-robust principal component analysis algorithm that achieves optimal accuracy, followed by a sum-of-squares algorithm that exploits the information in higher-order moments. Together, these components are robust against outlier tasks and achieve a graceful trade-off between the number of tasks and the number of examples per task.
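To make the spectral idea concrete, the following is a minimal sketch (not the paper's algorithm) of how a second-moment spectral step can pool many small-data tasks: for each task, the statistic \(\hat{m} = X^\top y / n\) is an unbiased estimate of that task's regression vector, and averaging \(\hat{m}\hat{m}^\top\) over tasks yields a matrix whose top-\(k\) eigenvectors recover the span of the \(k\) regression vectors. All parameter values and variable names here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative problem sizes: dimension d, mixture size k.
d, k = 20, 3
true_betas = rng.standard_normal((k, d))  # ground-truth regression vectors

def sample_task(n):
    """Draw one small-data task: pick a mixture component, emit (X, y)."""
    beta = true_betas[rng.integers(k)]
    X = rng.standard_normal((n, d))              # isotropic Gaussian covariates
    y = X @ beta + 0.1 * rng.standard_normal(n)  # noisy linear responses
    return X, y

# Second-moment spectral step: with E[x x^T] = I, the per-task statistic
# m = X^T y / n satisfies E[m] = beta, and E[m m^T] equals beta beta^T plus
# an isotropic term that does not affect the top eigenspace. Averaging over
# many tiny tasks (n = 5 samples each) therefore reveals span{beta_1..beta_k}.
n_tasks, n_per_task = 5000, 5
M = np.zeros((d, d))
for _ in range(n_tasks):
    X, y = sample_task(n_per_task)
    m = X.T @ y / n_per_task
    M += np.outer(m, m) / n_tasks

eigvals, eigvecs = np.linalg.eigh(M)
U = eigvecs[:, -k:]  # estimated k-dimensional subspace (top-k eigenvectors)

# Sanity check: the true regression vectors should nearly lie in span(U).
residual = true_betas.T - U @ (U.T @ true_betas.T)
rel_err = np.linalg.norm(residual) / np.linalg.norm(true_betas)
print(rel_err)
```

Note the key point this sketch illustrates: no single task has enough samples (\(n = 5 < d = 20\)) to estimate its own regression vector, yet pooling second moments across many such tasks still recovers the shared subspace.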
This project resulted in two publications: