Montreal, Canada during NeurIPS 2018
Visited Montreal, Canada with Microsoft Research labmates to attend and present at NeurIPS 2018.
Visited Melbourne, Australia to attend and present at WSDM 2019.
Visited Vancouver, Canada to attend NeurIPS 2019 and present at SEDL 2019.
Blog in progress.
The project applies different recommendation methods to different kinds of real-world data, such as rating matrices, images, and text, using deep learning and optimization.
We study the problem of sparse regression, where the goal is to learn a sparse vector that best optimizes a given objective function. Under the assumption that the objective function satisfies restricted strong convexity (RSC), we analyze Orthogonal Matching Pursuit (OMP) and obtain a support recovery result as well as a tight generalization error bound for OMP. Furthermore, we obtain lower bounds for OMP, showing that both our results on support recovery and generalization error are tight up to logarithmic factors. To the best of our knowledge, these support recovery and generalization bounds are the first such matching upper and lower bounds (up to logarithmic factors) for any sparse regression algorithm under the RSC assumption.
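As a concrete illustration of the algorithm analyzed above, here is a minimal numpy sketch of OMP (greedy column selection followed by a least-squares refit on the current support). The toy data and variable names are illustrative, not the paper's setup:

```python
import numpy as np

def omp(X, y, k):
    """Orthogonal Matching Pursuit: greedily build a k-sparse solution.

    At each step, pick the column most correlated with the current
    residual, then refit by least squares on the selected support.
    """
    n, d = X.shape
    support = []
    w = np.zeros(d)
    residual = y.copy()
    for _ in range(k):
        corr = np.abs(X.T @ residual)   # correlation with the residual
        corr[support] = -np.inf         # never reselect a column
        support.append(int(np.argmax(corr)))
        # orthogonal projection step: least squares on the support
        w_s, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        w = np.zeros(d)
        w[support] = w_s
        residual = y - X @ w
    return w, sorted(support)

# demo: noiseless data with a 2-sparse ground truth
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
w_true = np.zeros(20)
w_true[3], w_true[7] = 2.0, -1.5
y = X @ w_true
w_hat, S = omp(X, y, 2)
```

In the noiseless, well-conditioned demo above, OMP recovers the true support exactly, which is the easy regime of the support-recovery question the paper studies.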
This work proposes and demonstrates a surprising pattern in the training of neural networks: there is a one-to-one relation between the values of any pair of losses (such as cross entropy, mean squared error, \(0/1\) error, etc.) evaluated for a model arising at (any point of) a training run. This pattern is universal in the sense that this one-to-one relationship is identical across architectures (such as VGG, ResNet, DenseNet, etc.), algorithms (SGD and SGD with momentum), and training loss functions (cross entropy and mean squared error).
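A toy sketch of what "recording loss pairs along a run" means, using a plain logistic-regression model on synthetic data (a hypothetical stand-in, not the paper's architectures or experiments): at every gradient step we evaluate both cross entropy and \(0/1\) error for the same iterate, producing the kind of (loss, loss) pairs whose relationship the work studies.

```python
import numpy as np

# synthetic binary classification problem (illustrative only)
rng = np.random.default_rng(1)
n, d = 200, 5
X = rng.standard_normal((n, d))
w_star = rng.standard_normal(d)
ylab = (X @ w_star + 0.5 * rng.standard_normal(n) > 0).astype(float)

def losses(w):
    # evaluate two different losses for the same parameter vector
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    ce = -np.mean(ylab * np.log(p) + (1 - ylab) * np.log(1 - p))
    err = np.mean((p > 0.5) != ylab.astype(bool))  # 0/1 error
    return ce, err

w = np.zeros(d)
pairs = []  # (cross entropy, 0/1 error) at each step of the run
for _ in range(100):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    w -= 0.5 * (X.T @ (p - ylab)) / n  # full-batch gradient step
    pairs.append(losses(w))
```

Plotting `pairs` traces out the curve relating the two losses along the run; the paper's claim concerns how this curve looks across architectures and training setups.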
This is an attempt to understand how stochasticity in an optimization algorithm affects the generalization properties of a neural network.
For the problem of learning Mixed Linear Regression, this work introduces a spectral approach that is simultaneously robust under both data scarcity and outlier tasks.
Stochastic optimization algorithms are used to train large (Deep) Neural Networks ((D)NNs). The non-linear training dynamics of two-layer NNs can be described as a mean-field interacting particle system where the “particles” are the neurons in the single hidden layer. Wasserstein gradient flows often arise from such mean-field interactions among exchangeable particles. This relies on the observation that the permutation symmetry of the neurons in the hidden layer allows the problem to be viewed as an optimization problem over probability measures. Going beyond two layers, multi-layer NNs can be considered as large computational graphs and can therefore possess different groups of symmetries. This body of work aims to describe analytical scaling limits of stochastic optimization algorithms as the size of the network grows. We use and build on the existing theory of exchangeable arrays, graphons (analytical limits of dense graphs), the general theory of gradient flows on metric spaces, and insights from propagation of chaos to characterize this scaling limit. We discover a generalized notion of the McKean-Vlasov equation on graphons for which the phenomenon of propagation of chaos holds. In the asymptotically zero-noise case, this limit is a gradient flow on the metric space of graphons.
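For context, the classical mean-field object referred to above is the McKean-Vlasov dynamics, stated here in its standard form (the work itself generalizes this to graphons):
\[
dX_t = b(X_t, \mu_t)\,dt + \sigma\,dB_t, \qquad \mu_t = \mathrm{Law}(X_t),
\]
where each particle (neuron) interacts with the others only through the common law \(\mu_t\). Propagation of chaos then means that, as the number of particles grows, any finite collection of particles becomes asymptotically independent with common law \(\mu_t\).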
The paper has been accepted for an oral presentation (84/511 submissions ≈ 16% acceptance rate).
The paper has been accepted for a Spotlight presentation.
We study the distribution of the stochastic gradient noise during training and observe that, for batch sizes \(256\) and above, the distribution is best described as Gaussian, at least in the early phases of training.
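A minimal numpy sketch (a toy least-squares model, not the paper's networks) of how one can sample the stochastic gradient noise at a fixed parameter. Under minibatch sampling with replacement, the noise covariance scales as \(1/B\) in the batch size \(B\), which is the central-limit heuristic behind the Gaussian description:

```python
import numpy as np

# toy least-squares model (illustrative setup only)
rng = np.random.default_rng(0)
n, d = 5000, 10
X = rng.standard_normal((n, d))
w_star = rng.standard_normal(d)
y = X @ w_star + 0.5 * rng.standard_normal(n)
w = np.zeros(d)  # sample the noise at a fixed parameter value

def grad(idx):
    # gradient of the mean squared error over the rows in idx
    Xi, yi = X[idx], y[idx]
    return 2.0 * Xi.T @ (Xi @ w - yi) / len(idx)

full = grad(np.arange(n))  # full-batch gradient

def noise_samples(batch, reps=3000):
    # first coordinate of (minibatch gradient - full gradient),
    # drawn many times with replacement
    return np.array([grad(rng.integers(0, n, batch))[0] - full[0]
                     for _ in range(reps)])

g256 = noise_samples(256)   # noise at batch size 256
g512 = noise_samples(512)   # doubling the batch roughly halves the variance
```

The samples `g256` have mean close to zero, and their variance is roughly twice that of `g512`; a histogram of either is the kind of empirical noise distribution whose Gaussianity the work examines.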
The paper has been accepted for a presentation.
The paper has been accepted for a poster presentation.
The paper is published in the Journal of Theoretical Probability.