# Robust Mixed Linear Regression using heterogeneous batches

Presented at University of Washington, Seattle, WA, USA, 2020

By exploiting the similarities across tasks, one can hope to overcome data scarcity. Under a canonical scenario where each task is drawn from a mixture of $k$ linear regressions, we study a fundamental question: Can abundant small-data tasks compensate for the lack of big-data tasks? This work introduces a spectral approach that is simultaneously robust to both data scarcity and outlier tasks. We design an outlier-robust principal component analysis algorithm that achieves optimal accuracy. This is followed by a sum-of-squares algorithm that exploits the information in higher-order moments. Together, these two steps yield an approach that is robust against outliers and achieves a graceful statistical trade-off.
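
To make the setting concrete, the following is a minimal sketch of the data model together with a plain, non-robust spectral step. It is not the outlier-robust PCA or the sum-of-squares algorithm described above. It assumes standard Gaussian covariates, and the names `sample_tasks` and `estimate_subspace` are hypothetical. Each task draws one of the $k$ regression vectors and observes only a small batch. Because $\mathbb{E}[y\,x] = w$ for $x \sim \mathcal{N}(0, I)$, averaging products of two independent half-batch estimates across many tasks gives a matrix whose expectation is $\sum_j p_j w_j w_j^\top$, so its top-$k$ eigenvectors approximately span the regression vectors.

```python
import numpy as np


def sample_tasks(n_tasks, batch_size, weights, noise_std, rng):
    """Draw tasks from a uniform mixture of k linear regressions.

    weights has shape (k, d); each task picks one row and observes
    batch_size noisy examples (assumption: x ~ N(0, I)).
    """
    k, d = weights.shape
    labels = rng.integers(k, size=n_tasks)
    X = rng.standard_normal((n_tasks, batch_size, d))
    noise = noise_std * rng.standard_normal((n_tasks, batch_size))
    y = np.einsum("ntd,nd->nt", X, weights[labels]) + noise
    return X, y, labels


def estimate_subspace(X, y, k):
    """Non-robust spectral estimate of span{w_1, ..., w_k}.

    Splits each batch in two, forms the per-task estimates mean(y * x)
    on each half, and averages their symmetrized outer products.
    """
    n, t, d = X.shape
    half = t // 2
    b1 = np.einsum("nt,ntd->nd", y[:, :half], X[:, :half]) / half
    b2 = np.einsum("nt,ntd->nd", y[:, half:], X[:, half:]) / (t - half)
    M = (b1.T @ b2 + b2.T @ b1) / (2 * n)
    eigvals, eigvecs = np.linalg.eigh(M)
    top = np.argsort(np.abs(eigvals))[-k:]
    return eigvecs[:, top]  # orthonormal basis, shape (d, k)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = np.zeros((2, 8))
    W[0, 0] = 1.0
    W[1, 1] = 1.0
    # Many tasks, each with only two examples.
    X, y, _ = sample_tasks(50_000, 2, W, 0.1, rng)
    U = estimate_subspace(X, y, k=2)
    residual = W - (W @ U) @ U.T
    print("max residual norm:", np.linalg.norm(residual, axis=1).max())
```

With only two examples per task, no single task can be solved on its own, yet the subspace estimate improves as the number of tasks grows. A plain average like this one can be corrupted by a small fraction of adversarial tasks, which is the gap an outlier-robust PCA step is meant to close.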

This project resulted in two publications: