# Sitemap

A list of all the posts and pages found on the site. For you robots out there, an XML version is available for digesting as well.

## Pages

## Posts

## Note on the Kadison-Singer Problem and its Solution

**Posted on:**

The Kadison-Singer problem arose from Paul Dirac's work on quantum mechanics in the 1930s. The problem is equivalent to fundamental problems in areas such as operator theory, Hilbert and Banach space theory, frame theory, harmonic analysis, discrepancy theory, graph theory, signal processing, and theoretical computer science. It stood open for decades and defied the efforts of many mathematicians until it was solved by Adam Wade Marcus, Daniel Alan Spielman, and Nikhil Srivastava in 2013. **Read more**

## A note on Conformal Symplectic and Relativistic Optimization

**Posted on:**

This note on a spotlight paper at NeurIPS 2020 was made while I was reading the literature on the principal connections between continuous and discrete optimization. The motivation is to understand and design accelerated large-scale discrete optimization algorithms from first principles by considering the geometry of phase spaces and numerical integration, specifically symplectic integration. Recent works have successfully thrown light on the connection between the two and have therefore attracted my attention. **Read more**

## Geometry of Relativistic Spacetime Physics

**Posted on:**

This article introduces and describes the mathematical structures and frameworks needed to understand the modern fundamental theory of Relativistic Spacetime Physics. The self-referential and self-contained nature of mathematics provides enough power to prescribe the rigorous language needed to formulate the building blocks of Einstein's General Theory of Relativity, such as spacetime, matter, and gravity, along with their behaviors and interactions. In these notes, we introduce and develop these abstract components, starting by defining the arena of smooth manifolds and then adding the necessary and sufficient differential-geometric structures needed to build up to the General Theory of Relativity. **Read more**

## Dual spaces and the Fenchel conjugate

**Posted on:**

Dual spaces lie at the core of linear algebra and allow us to reason formally about the concept of duality in mathematics. Duality shows up naturally and elegantly in measure theory, functional analysis, and mathematical optimization. In this post, I explore the nature of duality via dual spaces and their interpretation in linear algebra, motivated by the so-called *convex conjugate*, or *Fenchel conjugate*, in mathematical optimization. **Read more**
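For reference, the Fenchel conjugate that motivates the post can be written as follows (a standard definition, stated here for convenience rather than quoted from the post):

```latex
% Fenchel (convex) conjugate of f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}
f^*(y) = \sup_{x \in \mathbb{R}^n} \bigl\{ \langle y, x \rangle - f(x) \bigr\}
% Example: for f(x) = \tfrac{1}{2}\|x\|^2 one gets f^*(y) = \tfrac{1}{2}\|y\|^2.
```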

## A survey on Strongly Rayleigh measures and their mixing time analysis

**Posted on:**

Strongly Rayleigh measures are natural generalizations of measures that satisfy the notion of negative dependence. The class of Strongly Rayleigh measures provides the most useful characterization of negative dependence by grounding it in the theory of multivariate stable polynomials. This post sheds some light on the origin of Strongly Rayleigh measures and Determinantal Point Processes, and highlights the fast mixing-time analysis of the natural MCMC chain on the support of a Strongly Rayleigh measure, as shown by Anari, Gharan, and Rezaei (2016). **Read more**

## Analysis of Newton’s Method

**Posted on:**

In optimization, Newton's method is used to find the roots of the derivative of a twice-differentiable function given oracle access to its gradient and Hessian. At the cost of super-linear memory in the dimension of the ambient space, Newton's method can take advantage of second-order curvature and optimize the objective function at a quadratically convergent rate. Here I consider the case where the objective function is smooth and strongly convex. **Read more**
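As a concrete illustration of the iteration described above, here is a minimal sketch of a pure Newton step, assuming oracle access to the gradient and Hessian (the function names and the quadratic test problem are illustrative, not taken from the post):

```python
import numpy as np

def newton_method(grad, hess, x0, tol=1e-10, max_iter=50):
    """Find a root of grad (a stationary point of the objective)
    using pure Newton steps: x <- x - H(x)^{-1} g(x)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        # Solving this d x d linear system costs O(d^2) memory,
        # the super-linear memory footprint mentioned above.
        x = x - np.linalg.solve(hess(x), g)
    return x

# Strongly convex quadratic: f(x) = 0.5 x^T A x - b^T x, minimized at A^{-1} b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x_star = newton_method(lambda x: A @ x - b, lambda x: A, np.zeros(2))
```

On a quadratic the method converges in a single step, which is the degenerate case of its quadratic convergence rate.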

## Deriving the Fokker-Planck equation

**Posted on:**

In the theory of dynamical systems, the Fokker-Planck equation describes the time evolution of a probability density function. It is a partial differential equation that describes how the density of a stochastic process changes over time under the influence of a potential field. Common applications include the study of Brownian motion, the Ornstein–Uhlenbeck process, and statistical physics. My motivation for understanding the derivation is to study Lévy flight processes, which have caught my recent attention. **Read more**
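For reference, the one-dimensional form of the equation in standard notation (stated here for convenience, not quoted from the post):

```latex
% SDE:  dX_t = \mu(X_t, t)\,dt + \sigma(X_t, t)\,dW_t
% The density p(x, t) of X_t then evolves as
\frac{\partial p}{\partial t}(x, t)
  = -\frac{\partial}{\partial x}\bigl[\mu(x, t)\, p(x, t)\bigr]
  + \frac{1}{2}\frac{\partial^2}{\partial x^2}\bigl[\sigma^2(x, t)\, p(x, t)\bigr]
```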

## SGD without replacement

**Posted on:**

This article continues my previous blog post and discusses the work of Prateek Jain, Dheeraj Nagaraj, and Praneeth Netrapalli (2019). The authors provide tight rates for SGD without replacement for general smooth, and smooth and strongly convex, functions using the method of exchangeable pairs to bound Wasserstein distances, along with techniques from optimal transport. **Read more**
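The sampling schemes being compared can be sketched for a toy least-squares objective as follows (an illustrative setup of my own, not the authors' construction): one epoch of SGD without replacement visits each sample exactly once in a random order, while with-replacement SGD draws i.i.d. indices.

```python
import numpy as np

def sgd_epoch_without_replacement(w, X, y, lr, rng):
    """One epoch of least-squares SGD visiting each sample once, in random order."""
    for i in rng.permutation(len(X)):  # sample indices without replacement
        w = w - lr * (X[i] @ w - y[i]) * X[i]
    return w

def sgd_epoch_with_replacement(w, X, y, lr, rng):
    """n i.i.d. with-replacement steps, for comparison."""
    for i in rng.integers(0, len(X), size=len(X)):
        w = w - lr * (X[i] @ w - y[i]) * X[i]
    return w

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
w_true = rng.standard_normal(5)
y = X @ w_true  # noiseless, so SGD can converge to w_true exactly
w = np.zeros(5)
for _ in range(200):
    w = sgd_epoch_without_replacement(w, X, y, lr=0.01, rng=rng)
```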

## Non-asymptotic rate for Random Shuffling for Quadratic functions

**Posted on:**

This article continues my previous blog post and discusses a section of the work of Jeff Z. HaoChen and Suvrit Sra (2018), in which the authors derive a non-asymptotic rate of \(\mathcal{O}\left(\frac{1}{T^2} + \frac{n^3}{T^3} \right)\) for the Random Shuffling stochastic gradient algorithm, which is strictly better than that of SGD. **Read more**

## Bias-Variance Trade-offs for Averaged SGD in Least Mean Squares

**Posted on:**

This article is on the work of Défossez and Bach (2014), in which the authors develop an operator viewpoint for analyzing averaged SGD updates to exhibit the bias-variance trade-off and provide tight convergence rates for the Least Mean Squares problem. **Read more**

## Random Reshuffling converges to a smaller neighborhood than SGD

**Posted on:**

This article is on the work of Ying et al. (2018), in which the authors show that SGD with Random Reshuffling outperforms independent sampling with replacement. **Read more**

## Nesterov’s Acceleration

**Posted on:**

This post contains an error-vector analysis of Nesterov's accelerated gradient descent method and some insightful implications that can be derived from it. **Read more**
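For context, one common form of the accelerated update for an \(L\)-smooth convex objective \(f\) (standard notation, stated here for convenience rather than quoted from the post) is:

```latex
x_{k+1} = y_k - \frac{1}{L}\,\nabla f(y_k), \qquad
y_{k+1} = x_{k+1} + \frac{k}{k+3}\left(x_{k+1} - x_k\right), \qquad y_0 = x_0,
```

which achieves \(f(x_k) - f^\star = \mathcal{O}(1/k^2)\), compared to \(\mathcal{O}(1/k)\) for plain gradient descent.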

## Some resources to start with Fundamentals of Machine Learning

**Posted on:**

With the number of courses, books, and reading materials out there, here is a list of some that I personally find useful for building a fundamental understanding of Machine Learning. **Read more**

## A survey on Large Scale Optimization

**Posted on:**

This post contains a summary and survey of the theoretical understanding of Large Scale Optimization, drawing on some talks, papers, and lectures that I have come across recently. **Read more**

## misc

## Montreal, Canada during NeurIPS 2018

**Posted on:**

Visited Montreal, Canada with Microsoft Research labmates to attend and present at NeurIPS 2018. **Read more**

## Melbourne, Australia during WSDM 2019

**Posted on:**

Visited Melbourne, Australia to attend and present at WSDM 2019. **Read more**

## Vancouver, Canada during NeurIPS 2019

**Posted on:**

Visited Vancouver, Canada to attend NeurIPS 2019 and present at SEDL 2019. **Read more**

## Personal thoughts about Mathematics

**Posted on:**

Blog in progress. **Read more**

## projects

## Some Approaches of Building Recommendation Systems

**Posted on:**

The project aims to apply different recommendation methods to different kinds of real-world data, such as rating matrices, images, and text, using Deep Learning and Optimization. **Read more**

## Sparse Regression and Support Recovery bounds for Orthogonal Matching Pursuit

**Posted on:**

We study the problem of sparse regression, where the goal is to learn a sparse vector that best optimizes a given objective function. Under the assumption that the objective function satisfies restricted strong convexity (RSC), we analyze Orthogonal Matching Pursuit (OMP) and obtain a support recovery result as well as a tight generalization error bound for OMP. Furthermore, we obtain lower bounds for OMP, showing that both our results on support recovery and generalization error are tight up to logarithmic factors. To the best of our knowledge, these support recovery and generalization bounds are the first such matching upper and lower bounds (up to logarithmic factors) for any sparse regression algorithm under the RSC assumption. **Read more**
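For reference, a minimal sketch of OMP in the noiseless linear regression setting (illustrative only; this is not the paper's code, and the paper's analysis covers general objectives satisfying RSC):

```python
import numpy as np

def omp(X, y, k):
    """Orthogonal Matching Pursuit: greedily grow a support of size k,
    re-fitting least squares on the selected columns at each step."""
    n, d = X.shape
    support, residual = [], y.copy()
    for _ in range(k):
        # Pick the column most correlated with the current residual.
        j = int(np.argmax(np.abs(X.T @ residual)))
        if j not in support:
            support.append(j)
        # Orthogonal projection: least squares restricted to the support.
        w_S, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        residual = y - X[:, support] @ w_S
    w = np.zeros(d)
    w[support] = w_S
    return w, sorted(support)

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 50))
w_true = np.zeros(50)
w_true[[3, 17, 29]] = [2.0, -1.5, 1.0]
y = X @ w_true  # noiseless observations
w_hat, S = omp(X, y, k=3)
```

With well-conditioned Gaussian data and no noise, the greedy selection recovers the true support exactly, which is the regime the upper bounds formalize.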

## Universality Patterns in the Training of Neural Networks

**Posted on:**

This work proposes and demonstrates a surprising pattern in the training of neural networks: there is a one-to-one relation between the values of any pair of losses (such as cross entropy, mean squared error, \(0/1\) error, etc.) evaluated for a model arising at any point of a training run. This pattern is universal in the sense that this one-to-one relationship is identical across architectures (such as VGG, ResNet, DenseNet, etc.), algorithms (SGD and SGD with momentum), and training loss functions (cross entropy and mean squared error). **Read more**

## Connections between Stochasticity of SGD and Generalizability

**Posted on:**

This is an attempt to understand how stochasticity in an optimization algorithm affects the generalization properties of a Neural Network. **Read more**

## Robust Mixed Linear Regression using heterogeneous batches

**Posted on:**

For the problem of learning Mixed Linear Regression, this work introduces a spectral approach that is simultaneously robust to both data scarcity and outlier tasks. **Read more**

## Scaling laws of optimization algorithms for Deep Learning - the Graphon perspective

**Posted on:**

Stochastic optimization algorithms are used to train large (Deep) Neural Networks ((D)NNs). The non-linear training dynamics of two-layer NNs can be described as a mean-field interacting particle system where the "particles" are the neurons in the single hidden layer. Wasserstein gradient flows often arise from such mean-field interactions among exchangeable particles. This relies on the observation that the permutation symmetry of the neurons in the hidden layer allows the problem to be viewed as an optimization problem over probability measures. Going beyond, multi-layer NNs can be considered as large computational graphs and can therefore possess different groups of symmetries. This body of work aims to describe analytical scaling limits of stochastic optimization algorithms as the size of the network grows. We use and build on the existing theory of exchangeable arrays, graphons (analytical limits of dense graphs), the general theory of gradient flows on metric spaces, and insights from propagation of chaos to characterize this scaling limit. We discover a generalized notion of the McKean-Vlasov equation on graphons where the phenomenon of *propagation of chaos* holds. In the asymptotically zero-noise case, this limit is a gradient flow on the metric space of graphons. **Read more**

## publications

## Clustered Monotone Transforms for Rating Factorization

*Raghav Somani\*, Gaurush Hiranandani\*, Sanmi Koyejo & Sreangsu Acharyya*

**Published at:**

*Web Search and Data Mining (WSDM)*, 2019

The paper has been accepted for an oral presentation (84/511 submissions ≈ *16% acceptance rate*). **Read more**

## Support Recovery for Orthogonal Matching Pursuit: Upper and Lower bounds

*Raghav Somani\*, Chirag Gupta\*, Prateek Jain & Praneeth Netrapalli*

**Published at:**

*Neural Information Processing Systems (NeurIPS)*, 2018

The paper has been accepted for a **Spotlight** presentation. **Read more**

## Non-Gaussianity of Stochastic Gradient Noise

*Abhishek Panigrahi, Raghav Somani, Navin Goyal & Praneeth Netrapalli*

**Published at:**

*Science meets Engineering of Deep Learning (SEDL) workshop, Neural Information Processing Systems (NeurIPS)*, 2019

We study the distribution of the stochastic gradient noise during training and observe that for batch sizes \(256\) and above, the distribution is best described as Gaussian, at least in the early phases of training. **Read more**

## Meta-learning for Mixed Linear Regression

*Weihao Kong, Raghav Somani, Zhao Song, Sham Kakade, Sewoong Oh*

**Published at:**

*International Conference on Machine Learning (ICML)*, 2020

The paper has been accepted for a presentation. **Read more**

## Robust Meta-learning for Mixed Linear Regression with Small Batches

*Weihao Kong, Raghav Somani, Sham Kakade, Sewoong Oh*

**Published at:**

*Neural Information Processing Systems (NeurIPS)*, 2020

The paper has been accepted for a poster presentation. **Read more**

## Gradient Flows on Graphons: Existence, Convergence, Continuity Equations

*Sewoong Oh, Soumik Pal, Raghav Somani & Raghav Tripathi*

**Published at:**

*Journal of Theoretical Probability and the Optimal Transport and Machine Learning (OTML) workshop, Neural Information Processing Systems (NeurIPS)*, 2021

The paper is published in the Journal of Theoretical Probability. **Read more**