# Sitemap

A list of all the posts and pages found on the site. For you robots out there, an XML version is available for digesting as well.

## Analysis of Newton’s Method

Posted on:

In optimization, Newton’s method is used to find the roots of the derivative of a twice-differentiable function, given oracle access to its gradient and Hessian. By using memory that is super-linear in the dimension of the ambient space, Newton’s method can take advantage of second-order curvature and optimize the objective function at a quadratically convergent rate. Here I consider the case when the objective function is smooth and strongly convex. Read more
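As a rough illustration of the iteration described in this excerpt, here is a minimal sketch of pure Newton steps without a line search. The function name `newton_minimize`, the tolerances, and the test objective are assumptions made for this example and are not taken from the post.

```python
import numpy as np

def newton_minimize(grad, hess, x0, tol=1e-10, max_iter=50):
    """Pure Newton iteration x <- x - H(x)^{-1} g(x), with no line search."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        # Solve H(x) step = g(x) instead of forming the inverse Hessian.
        x = x - np.linalg.solve(hess(x), g)
    return x

# Hypothetical strongly convex objective: f(x) = sum(exp(x_i)) + 0.5 * ||x||^2
grad = lambda x: np.exp(x) + x
hess = lambda x: np.diag(np.exp(x) + 1.0)

x_star = newton_minimize(grad, hess, x0=np.ones(3))
print(x_star)  # each coordinate should solve exp(x) + x = 0
```

Storing the Hessian takes memory quadratic in the dimension, which is the super-linear memory cost the excerpt refers to.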

## Deriving the Fokker-Planck equation

Posted on:

In the theory of dynamic systems, the Fokker-Planck equation is used to describe the time evolution of a probability density function. It is a partial differential equation that describes how the density of a stochastic process changes as a function of time under the influence of a potential field. Common applications include the study of Brownian motion, the Ornstein–Uhlenbeck process, and statistical physics. The motivation behind understanding the derivation is to study Lévy flight processes, which have recently caught my attention. Read more
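For reference, here is the standard one-dimensional form of the equation, assuming the process follows an Itô SDE with drift $\mu$ and diffusion coefficient $\sigma$ (this notation is an assumption for this note and may differ from the post's):

$$
dX_t = \mu(X_t, t)\,dt + \sigma(X_t, t)\,dW_t
\quad\Longrightarrow\quad
\frac{\partial p(x, t)}{\partial t}
= -\frac{\partial}{\partial x}\big[\mu(x, t)\,p(x, t)\big]
+ \frac{1}{2}\frac{\partial^2}{\partial x^2}\big[\sigma^2(x, t)\,p(x, t)\big]
$$

When the drift comes from a potential field $U$, one takes $\mu(x, t) = -U'(x)$.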

## SGD without replacement

Posted on:

This article continues my previous blog post and discusses the work by Prateek Jain, Dheeraj Nagaraj and Praneeth Netrapalli (2019). The authors provide tight rates for SGD without replacement for general smooth functions, and for general smooth and strongly convex functions, using the method of exchangeable pairs to bound Wasserstein distances, along with techniques from optimal transport. Read more
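To make the sampling scheme concrete, here is a minimal sketch of SGD without replacement on a least-squares objective. Each epoch draws a fresh random permutation and visits every data point exactly once. The function name, step size, and synthetic data are assumptions for this illustration and are not taken from the paper.

```python
import numpy as np

def sgd_without_replacement(A, b, lr=0.05, epochs=200, seed=0):
    """SGD on f(x) = (1/n) * sum_i 0.5 * (a_i^T x - b_i)^2, reshuffling each epoch."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(epochs):
        # One pass over a random permutation: every index is used exactly once.
        for i in rng.permutation(n):
            x -= lr * (A[i] @ x - b[i]) * A[i]
    return x

# Hypothetical noiseless data, so the minimizer is exactly x_true.
data_rng = np.random.default_rng(1)
A = data_rng.normal(size=(20, 3))
x_true = np.array([1.0, -2.0, 0.5])
b = A @ x_true

x_hat = sgd_without_replacement(A, b)
print(np.linalg.norm(x_hat - x_true))
```

Replacing `rng.permutation(n)` with `rng.integers(0, n, size=n)` would give the usual with-replacement variant for comparison.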

## Non-asymptotic rate for Random Shuffling for Quadratic functions

Posted on:

This article continues my previous blog post and discusses a section of the work by Jeffery Z. HaoChen and Suvrit Sra (2018), in which the authors derive a non-asymptotic rate of $\mathcal{O}\left(\frac{1}{T^2} + \frac{n^3}{T^3} \right)$ for the Random Shuffling Stochastic algorithm, which is strictly better than that of SGD. Read more

## Nesterov’s Acceleration

Posted on:

This post contains an error vector analysis of Nesterov’s accelerated gradient descent method and some insightful implications that can be derived from it. Read more

Posted on:

With so many courses, books, and reading materials out there, here is a list of some that I personally find useful for building a fundamental understanding of Machine Learning. Read more

## A survey on Large Scale Optimization

Posted on:

This post contains a summary and survey of the theoretical understanding of Large Scale Optimization, drawing on talks, papers, and lectures that I have come across recently. Read more

## Montreal, Canada during NeurIPS 2018

Posted on:

Visited Montreal, Canada with Microsoft Research labmates to attend and present at NeurIPS 2018. Read more

## Melbourne, Australia during WSDM 2019

Posted on:

Visited Melbourne, Australia to attend and present at WSDM 2019. Read more

## Mini Search Engine

Posted on:

We used data structures such as hash tables and balanced trees to design a text search engine that reports the frequency of a searched word in a given folder of files. Read more

## Modelling Economic Policy Uncertainty Index using Text Classification

Posted on:

Using a Soft Margin Kernel Support Vector Machine to classify newspaper articles in order to model an Economic Policy Uncertainty Index for India. Read more

## Some Approaches of Building Recommendation Systems

Posted on:

The project aims to apply different recommendation methods to different kinds of real-world data, such as rating matrices, images, and text, using Deep Learning and Optimization. Read more

## A case study of Empirical Bayes in Recommendation system

Posted on:

We provide a formulation of empirical Bayes, as described by Atchadé (2011), to tune the hyperparameters of the priors used in the Bayesian set-up of a collaborative filter. Read more

## Clustered Monotone Transforms for Rating Factorization

Posted on:

We propose Clustered Monotone Transforms for Rating Factorization (CMTRF), a novel approach to perform regression up to unknown monotonic transforms over unknown population segments. For recommendation systems, the technique searches for monotonic transformations of the rating scales resulting in a better fit. This is combined with an underlying matrix factorization regression model that couples the user-wise ratings to exploit shared low dimensional structure. The rating scale transformations can be generated for each user (N-CMTRF), for a cluster of users (CMTRF), or for all the users at once (1-CMTRF), forming the basis of three simple and efficient algorithms proposed, all of which alternate between transformation of the rating scales and matrix factorization regression. Despite the non-convexity, CMTRF is theoretically shown to recover a unique solution under mild conditions. Read more

## Sparse Regression and Support Recovery bounds for Orthogonal Matching Pursuit

Posted on:

We study the problem of sparse regression where the goal is to learn a sparse vector that best optimizes a given objective function. Under the assumption that the objective function satisfies restricted strong convexity (RSC), we analyze Orthogonal Matching Pursuit (OMP) and obtain a support recovery result as well as a tight generalization error bound for OMP. Furthermore, we obtain lower bounds for OMP, showing that both our results on support recovery and generalization error are tight up to logarithmic factors. To the best of our knowledge, these support recovery and generalization bounds are the first such matching upper and lower bounds (up to logarithmic factors) for any sparse regression algorithm under the RSC assumption. Read more
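For readers unfamiliar with OMP, here is a minimal sketch of the greedy procedure, specialized to the least-squares objective (the paper treats general objectives under RSC). The function name `omp` and the synthetic design with orthonormal columns are assumptions chosen so that the example is easy to check; they are not taken from the paper.

```python
import numpy as np

def omp(A, y, sparsity):
    """Orthogonal Matching Pursuit for min ||y - A x||^2 with at most `sparsity` nonzeros."""
    n, d = A.shape
    support = []
    x = np.zeros(d)
    residual = y.copy()
    for _ in range(sparsity):
        # Greedy step: pick the column most correlated with the current residual.
        correlations = np.abs(A.T @ residual)
        if support:
            correlations[support] = 0.0  # never reselect a chosen column
        support.append(int(np.argmax(correlations)))
        # Orthogonal step: refit least squares on all selected columns.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        x = np.zeros(d)
        x[support] = coef
        residual = y - A @ x
    return x, sorted(support)

# Hypothetical noiseless problem with orthonormal columns (via QR).
rng = np.random.default_rng(0)
A, _ = np.linalg.qr(rng.normal(size=(50, 20)))
x_true = np.zeros(20)
x_true[[2, 7, 11]] = [3.0, -2.0, 1.5]
y = A @ x_true

x_hat, support = omp(A, y, sparsity=3)
print(support)
```

With orthonormal columns and no noise, the correlations equal the true coefficients, so the greedy step selects the true support in order of decreasing magnitude.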

## Universality Patterns in the Training of Neural Networks

Posted on:

This work proposes and demonstrates a surprising pattern in the training of neural networks: there is a one-to-one relation between the values of any pair of losses (such as cross entropy, mean squared error, $0/1$ error, etc.) evaluated for a model arising at any point of a training run. This pattern is universal in the sense that this one-to-one relationship is identical across architectures (such as VGG, ResNet, DenseNet, etc.), algorithms (SGD and SGD with momentum), and training loss functions (cross entropy and mean squared error). Read more

## A case study of empirical Bayes in a user-movie recommendation system

Arabin Kumar Dey, Raghav Somani & Sreangsu Acharyya
Published at: Communications in Statistics: Case Studies, Data Analysis and Applications Vol. 3, 2017 - Issue 1-2

We provide a formulation of empirical Bayes, as described by Atchadé, to tune the hyperparameters of the priors used in the Bayesian set-up of a collaborative filter. Read more

[paper] [arXiv] [bib]

## Clustered Monotone Transforms for Rating Factorization

Raghav Somani*, Gaurush Hiranandani*, Sanmi Koyejo & Sreangsu Acharyya
Published at: Web Search and Data Mining (WSDM), 2019

The paper has been accepted for an oral presentation (84/511 submissions ≈ 16% Acceptance Rate). Read more

[paper] [arXiv] [bib] [code]

## Support Recovery for Orthogonal Matching Pursuit: Upper and Lower bounds

Raghav Somani*, Chirag Gupta*, Prateek Jain & Praneeth Netrapalli
Published at: Neural Information Processing Systems (NeurIPS), 2018

The paper has been accepted for a Spotlight presentation (168/4856 submissions ≈ 3.5% Acceptance Rate). Read more

[paper] [bib]

## Non-Gaussianity of Stochastic Gradient Noise

Abhishek Panigrahi, Raghav Somani, Navin Goyal & Praneeth Netrapalli
Published at: Science meets Engineering of Deep Learning (SEDL) workshop, Neural Information Processing Systems (NeurIPS), 2019

We study the distribution of the Stochastic Gradient Noise during training and observe that for batch sizes $256$ and above, the distribution is best described as Gaussian, at least in the early phases of training. Read more

[arXiv] [bib]