Scaling Limits of Algorithms on Large Matrices
Ph.D. thesis, University of Washington, Seattle, WA, USA
This work has been accepted as my Ph.D. thesis at the University of Washington. Please find the full text of the thesis here.
Research project, University of Washington, Seattle, WA, USA
The non-linear training dynamics of two-layer neural networks can be modeled as a mean-field interacting particle system in which the hidden-layer neurons act as “particles.” Because of the permutation symmetry of neurons, these dynamics lead to Wasserstein gradient flows, recasting training as optimization over probability measures. This work extends the picture to multi-layer networks, which exhibit the richer symmetries of large computational graphs, and describes the analytical scaling limit of stochastic optimization algorithms as the network size grows. Leveraging the theory of exchangeable arrays, graphons, gradient flows on metric spaces, and propagation of chaos, we characterize this limit as a generalized McKean-Vlasov equation on graphons for which propagation of chaos holds; in the zero-noise limit, the scaling limit becomes a gradient flow on the metric space of graphons.
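For context, a minimal sketch of the two-layer case (standard mean-field theory, not the graphon generalization developed in this work): writing the empirical distribution of the \(N\) hidden neurons as \(\mu^N_t = \frac{1}{N}\sum_{i=1}^{N}\delta_{\theta_i(t)}\), noisy gradient descent on the neuron parameters converges, as \(N \to \infty\), to a measure-valued evolution of McKean-Vlasov type,

\[
\partial_t \mu_t \;=\; \nabla_\theta \cdot \Big( \mu_t \, \nabla_\theta \tfrac{\delta F}{\delta \mu}(\mu_t, \theta) \Big) \;+\; \lambda \, \Delta_\theta \mu_t,
\]

where \(F\) is the risk viewed as a functional of the neuron distribution and \(\lambda\) is the noise level; at \(\lambda = 0\) this is the Wasserstein gradient flow of \(F\). The work summarized here generalizes this picture from a single probability measure to graphons, which capture the richer symmetries of multi-layer networks.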
Research project, University of Washington, Seattle, WA, USA
For the problem of learning mixed linear regression, this work introduces a spectral approach that is simultaneously robust to data scarcity and to outlier tasks.
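As a hedged illustration of what a spectral approach to mixed linear regression can look like, here is a generic second-moment sketch in Python (the function name `regressor_subspace`, the Gaussian-covariate assumption, and the toy data are illustrative, not taken from this work): with covariates \(x \sim \mathcal{N}(0, I_d)\) and responses \(y = \langle \beta_z, x \rangle + \text{noise}\), the matrix \(\mathbb{E}[y^2 x x^\top]\) equals a multiple of the identity plus \(2\sum_k p_k \beta_k \beta_k^\top\), so its top eigenvectors approximate the span of the unknown regressors.

```python
import numpy as np

def regressor_subspace(X, y, k):
    """Estimate the span of the mixture regressors from i.i.d. samples.

    Generic moment sketch: with Gaussian covariates,
    E[y^2 x x^T] = c * I + 2 * sum_k p_k beta_k beta_k^T,
    so the top-k eigenvectors approximate span{beta_1, ..., beta_k}.
    """
    n, d = X.shape
    M = (X * (y ** 2)[:, None]).T @ X / n    # empirical E[y^2 x x^T]
    M -= np.trace(M) / d * np.eye(d)         # remove the (approximately) isotropic part
    eigvals, eigvecs = np.linalg.eigh(M)     # ascending eigenvalues
    return eigvecs[:, -k:]                   # top-k eigenvectors as columns

# Toy usage (hypothetical data): two regressors in d = 20 dimensions.
rng = np.random.default_rng(0)
d, n, k = 20, 20000, 2
betas = rng.standard_normal((k, d))
z = rng.integers(0, k, size=n)
X = rng.standard_normal((n, d))
y = np.einsum("nd,nd->n", X, betas[z]) + 0.1 * rng.standard_normal(n)
U = regressor_subspace(X, y, k)
print(U.shape)  # (20, 2)
```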
Research project, Microsoft Research Lab - India, Bengaluru, India
This is an attempt to understand how stochasticity in an optimization algorithm affects the generalization properties of a neural network.
Research project, Microsoft Research Lab - India, Bengaluru, India
This work proposes and demonstrates a surprising pattern in the training of neural networks: there is a one-to-one relation between the values of any pair of losses (such as cross entropy, mean squared error, and \(0/1\) error) evaluated for a model arising at any point of a training run. This pattern is universal in the sense that the one-to-one relationship is identical across architectures (such as VGG, ResNet, and DenseNet), algorithms (SGD and SGD with momentum), and training loss functions (cross entropy and mean squared error).
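Below is a hedged sketch of how such a pattern can be probed (a generic PyTorch checkpoint-evaluation loop; `loss_pair`, `checkpoints`, and `val_loader` are hypothetical names, and this is not necessarily the exact protocol of the work): evaluate two losses for the same model at many points of a training run and inspect the resulting scatter, which should trace a single curve irrespective of architecture or training loss.

```python
import torch
import torch.nn.functional as F

def loss_pair(model, loader, device="cpu"):
    """Evaluate (cross-entropy, 0/1 error) for one model on one dataset."""
    model.eval()
    ce_sum, err_sum, n = 0.0, 0.0, 0
    with torch.no_grad():
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            logits = model(x)
            ce_sum += F.cross_entropy(logits, y, reduction="sum").item()
            err_sum += (logits.argmax(dim=1) != y).sum().item()
            n += y.numel()
    return ce_sum / n, err_sum / n

# Hypothetical usage: `checkpoints` is a list of models saved along a training
# run; plotting the resulting pairs should trace out the one-to-one relation.
# pairs = [loss_pair(m, val_loader) for m in checkpoints]
```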
Research project, Microsoft Research Lab - India, Bengaluru, India
We study the problem of sparse regression, where the goal is to learn a sparse vector that best optimizes a given objective function. Under the assumption that the objective function satisfies restricted strong convexity (RSC), we analyze Orthogonal Matching Pursuit (OMP) and obtain a support recovery result as well as a tight generalization error bound for OMP. Furthermore, we obtain lower bounds for OMP, showing that both our results on support recovery and generalization error are tight up to logarithmic factors. To the best of our knowledge, these support recovery and generalization bounds are the first such matching upper and lower bounds (up to logarithmic factors) for any sparse regression algorithm under the RSC assumption.
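For reference, a minimal Python sketch of standard Orthogonal Matching Pursuit, the algorithm analyzed here (this generic version omits stopping rules and any RSC-specific details of the analysis):

```python
import numpy as np

def omp(X, y, s):
    """Orthogonal Matching Pursuit: greedily build a support of size s.

    At each step, add the column most correlated with the current residual,
    then refit by least squares on the selected support.
    """
    n, d = X.shape
    support, residual = [], y.copy()
    beta = np.zeros(d)
    for _ in range(s):
        corr = np.abs(X.T @ residual)
        corr[support] = -np.inf                        # do not re-select chosen columns
        support.append(int(np.argmax(corr)))
        coef, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        residual = y - X[:, support] @ coef
    beta[support] = coef
    return beta, support
```

Given a design matrix \(X \in \mathbb{R}^{n \times d}\), a response \(y\), and a sparsity level \(s\), `omp(X, y, s)` returns the refitted coefficient vector and the selected support.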
Bachelor Thesis Project, Department of Mathematics, Indian Institute of Technology Guwahati, Guwahati, India
The project aims to apply different recommendation methods to different kinds of real-world data, such as rating matrices, images, and text, using deep learning and optimization.