# Blog Notes

## A survey on Strongly Rayleigh measures and their mixing time analysis

Posted on:

Strongly Rayleigh measures are natural generalizations of measures that satisfy the notion of negative dependence. The class of Strongly Rayleigh measures provides the most useful characterization of negative dependence by grounding it in the theory of multivariate stable polynomials. This post attempts to shed some light on the origin of Strongly Rayleigh measures and Determinantal Point Processes, and highlights the fast mixing time analysis of the natural MCMC chain on the support of a Strongly Rayleigh measure, as shown by Anari, Gharan and Rezaei (2016).

## Analysis of Newton’s Method

Posted on:

In optimization, Newton’s method is used to find the roots of the derivative of a twice differentiable function, given oracle access to its gradient and Hessian. By using memory that is super-linear in the dimension of the ambient space, Newton’s method can take advantage of second-order curvature and optimize the objective function at a quadratically convergent rate. Here I consider the case when the objective function is smooth and strongly convex.
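A minimal sketch of the pure Newton iteration described above may help fix ideas. The function name `newton_minimize`, the stopping tolerance, and the test objective $f(x) = \sum_i e^{x_i} + \tfrac{1}{2}\|x\|^2$ are all assumptions chosen for illustration; they are not taken from the post. The objective is strongly convex, so its Hessian is always invertible.

```python
import numpy as np

def newton_minimize(grad, hess, x0, tol=1e-10, max_iter=50):
    """Pure Newton iteration: x <- x - H(x)^{-1} g(x), no line search."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:
            return x, k
        # Solve H(x) d = g(x) instead of forming the inverse explicitly.
        x = x - np.linalg.solve(hess(x), g)
    return x, max_iter

# Illustrative objective (an assumption): f(x) = sum(exp(x)) + 0.5 * ||x||^2
def grad_f(x):
    return np.exp(x) + x

def hess_f(x):
    return np.diag(np.exp(x)) + np.eye(x.size)

x_star, iters = newton_minimize(grad_f, hess_f, [1.0, -2.0])
print(x_star, iters)
```

Storing and solving with the $d \times d$ Hessian is where the super-linear memory cost comes from; the gradient alone needs only $d$ numbers.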

## Deriving the Fokker-Planck equation

Posted on:

In the theory of dynamical systems, the Fokker-Planck equation is used to describe the time evolution of a probability density function. It is a partial differential equation that describes how the density of a stochastic process changes over time under the influence of a potential field. Some common applications are in the study of Brownian motion, the Ornstein–Uhlenbeck process, and statistical physics. The motivation behind understanding the derivation is to study Lévy flight processes, which have recently caught my attention.
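For reference, one standard form of the equation is the one for an overdamped diffusion in a potential. The notation here is assumed for illustration and is not taken from the post: $U$ is the potential, $D$ a constant diffusion coefficient, $W_t$ a standard Brownian motion, and $p(x, t)$ the density of $X_t$.

$$
dX_t = -\nabla U(X_t)\,dt + \sqrt{2D}\,dW_t
\quad\Longrightarrow\quad
\frac{\partial p}{\partial t} = \nabla \cdot \left( p\,\nabla U \right) + D\,\Delta p .
$$

Setting $\partial p / \partial t = 0$ shows that, when it is normalizable, the stationary density is proportional to $\exp(-U(x)/D)$.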

## SGD without replacement

Posted on:

This article continues my previous post and discusses the work by Prateek Jain, Dheeraj Nagaraj and Praneeth Netrapalli (2019). The authors provide tight rates for SGD without replacement, both for general smooth functions and for smooth and strongly convex functions, using the method of exchangeable pairs to bound Wasserstein distances together with techniques from optimal transport.

## Non-asymptotic rate for Random Shuffling for Quadratic functions

Posted on:

This article continues my previous post and discusses a section of the work by Jeffery Z. HaoChen and Suvrit Sra (2018), in which the authors derive a non-asymptotic rate of $\mathcal{O}\left(\frac{1}{T^2} + \frac{n^3}{T^3} \right)$ for the Random Shuffling stochastic algorithm, which is strictly better than that of SGD.

## Bias-Variance Trade-offs for Averaged SGD in Least Mean Squares

Posted on:

This article is on the work by Défossez and Bach (2014), in which the authors develop an operator viewpoint for analyzing Averaged SGD updates, use it to show the bias-variance trade-off, and provide tight convergence rates for the least mean squares problem.

## Random Reshuffling converges to a smaller neighborhood than SGD

Posted on:

This article is on the recent work by Ying et al. (2018), in which the authors show that SGD with Random Reshuffling outperforms SGD with independent sampling with replacement.
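The two sampling schemes being compared differ only in how the indices for each epoch are drawn. The sketch below is an illustration under assumed settings (a synthetic least-squares problem, a constant step size, and the hypothetical helper `run_sgd`); it is not the experimental setup of the paper.

```python
import numpy as np

def run_sgd(A, b, step, epochs, reshuffle, seed=0):
    """Constant-step SGD on f(x) = (1/2n) * sum_i (a_i^T x - b_i)^2."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(epochs):
        if reshuffle:
            # Random Reshuffling: a fresh permutation, each index once per epoch.
            order = rng.permutation(n)
        else:
            # Independent sampling: n uniform draws with replacement.
            order = rng.integers(0, n, size=n)
        for i in order:
            # Stochastic gradient of the i-th squared residual.
            x -= step * (A[i] @ x - b[i]) * A[i]
    return x

# Synthetic data (an assumption for illustration only).
data_rng = np.random.default_rng(1)
A = data_rng.normal(size=(50, 5))
b = A @ data_rng.normal(size=5) + 0.5 * data_rng.normal(size=50)
x_opt = np.linalg.lstsq(A, b, rcond=None)[0]

err_rr = np.linalg.norm(run_sgd(A, b, 0.01, 200, reshuffle=True) - x_opt)
err_iid = np.linalg.norm(run_sgd(A, b, 0.01, 200, reshuffle=False) - x_opt)
print("random reshuffling:", err_rr, "with replacement:", err_iid)
```

With a constant step size neither variant converges exactly to the minimizer; the printed distances give a rough sense of the neighborhood each one settles into on this particular instance.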

## Nesterov’s Acceleration

Posted on:

This post contains an error vector analysis of Nesterov’s accelerated gradient descent method and some insightful implications that can be derived from it.

## Some resources to start with Fundamentals of Machine Learning

Posted on:

With so many courses, books, and reading materials out there, here is a list of some that I personally find useful for building a fundamental understanding of Machine Learning.

## A survey on Large Scale Optimization

Posted on:

This post contains a summary and survey of the theoretical understanding of Large Scale Optimization, drawing on some talks, papers, and lectures that I have come across recently.