Jekyll2021-04-04T21:33:28-07:00https://raghavsomani.github.io/feed.xmlRaghav Somani2<sup>nd</sup> year Ph.D. student
University of WashingtonRaghav SomaniNote on the Kadison-Singer Problem and its Solution2021-03-21T00:00:00-07:002021-03-21T00:00:00-07:00https://raghavsomani.github.io/posts/2021/03/blog-post-14<p>The <a href="https://en.wikipedia.org/wiki/Kadison%E2%80%93Singer_problem" target="_blank">Kadison-Singer problem</a> arose from the work on quantum mechanics done by <a href="https://en.wikipedia.org/wiki/Paul_dirac" target="_blank">Paul Dirac</a> in the 1930s. The problem is equivalent to fundamental problems in areas like Operator theory, Hilbert and Banach space theory, Frame theory, Harmonic Analysis, Discrepancy theory, Graph theory, Signal Processing and theoretical Computer Science. The Kadison-Singer problem had been long standing and defied the efforts of most Mathematicians until it was recently solved by <a href="https://en.wikipedia.org/wiki/Adam_Marcus_(mathematician)" target="_blank">Adam Wade Marcus</a>, <a href="https://en.wikipedia.org/wiki/Daniel_Spielman" target="_blank">Daniel Alan Spielman</a> and <a href="https://en.wikipedia.org/wiki/Nikhil_Srivastava" target="_blank">Nikhil Srivastava</a> for which they were awarded the <a href="https://evoq-eval.siam.org/prizes-recognition/major-prizes-lectures/detail/george-polya-prize-for-mathematics" target="_blank">George Polya Prize in Mathematics</a> in 2014, and very recently, the <a href="http://www.nasonline.org/programs/awards/michael-and-sheila-held-prize.html?fbclid=IwAR3C3V9b7UGnSrTNaL55qMKWGQoNx4AN8stkcq3v7gDOms29HeW8_UqslP4" target="_blank">Michael and Sheila Held Prize</a> in 2021. The proof uses an existence argument which reduces the problem to bounding the roots of the expected characteristic polynomial of certain random matrices employing tools from the theory of random polynomials.</p>
<p>This note was a part of my <a href="https://courses.cs.washington.edu/courses/cse525/21wi/" target="_blank">Randomized Algorithms and Probabilistic Analysis</a> course project. The complete note can be found <a href="\files\KS.pdf" target="_blank">here</a>.</p>Raghav SomaniThe Kadison-Singer problem arose from the work on quantum mechanics done by Paul Dirac in the 1930s. The problem is equivalent to fundamental problems in areas like Operator theory, Hilbert and Banach space theory, Frame theory, Harmonic Analysis, Discrepancy theory, Graph theory, Signal Processing and theoretical Computer Science. The Kadison-Singer problem had been long standing and defied the efforts of most Mathematicians until it was recently solved by Adam Wade Marcus, Daniel Alan Spielman and Nikhil Srivastava in 2013.A note on Conformal Symplectic and Relativistic Optimization2020-12-28T00:00:00-08:002020-12-28T00:00:00-08:00https://raghavsomani.github.io/posts/2020/12/blog-post-13<p>This note, on a spotlight paper at NeurIPS 2020, was written while I was reading the literature on the principled connections between continuous and discrete optimization. The motivation is to understand and create accelerated discrete large-scale optimization algorithms from first principles by considering the geometry of phase spaces and numerical integration, specifically symplectic integration. Recent works have shed considerable light on both, which is what drew my attention.</p>
<p>The complete notes can be found <a href="\files\CSRO.pdf" target="_blank">here</a>.</p>Raghav SomaniThis note, on a spotlight paper at NeurIPS 2020, was written while I was reading the literature on the principled connections between continuous and discrete optimization. The motivation is to understand and create accelerated discrete large-scale optimization algorithms from first principles by considering the geometry of phase spaces and numerical integration, specifically symplectic integration. Recent works have shed considerable light on both, which is what drew my attention.Geometry of Relativistic Spacetime Physics2020-11-24T00:00:00-08:002020-11-24T00:00:00-08:00https://raghavsomani.github.io/posts/2020/11/blog-post-12<p>This article introduces and describes the mathematical structures and frameworks needed to understand the modern fundamental theory of Relativistic Spacetime Physics. The self-referential and self-contained nature of Mathematics provides enough power to prescribe a rigorous language needed to formulate the building components of the standard Einstein’s General Theory of Relativity like Spacetime, Matter, and Gravity, along with their behaviors and interactions. These notes introduce these abstract components, starting with defining the arena of smooth manifolds and then adding the necessary and sufficient differential geometric structures needed to build the primers to the General Theory of Relativity.</p>
<p>The complete notes can be found <a href="\files\MathFoundationGR.pdf" target="_blank">here</a>.</p>
<p>These notes were made while I was following the <a href="https://www.youtube.com/playlist?list=PLFeEvEPtX_0S6vxxiiNPrJbLu9aK1UVC_" target="_blank">Central Lecture course</a> of the <a href="https://gravity-and-light.herokuapp.com/" target="_blank">International Winter School on Gravity and Light 2015</a>. Many thanks for the excellent teaching by <a href="https://people.utwente.nl/f.p.schuller?tab=about-me" target="_blank">Prof. Frederic P. Schuller</a> in these lectures and in all the other lectures available online on the Mathematical anatomy of Theoretical Physics.</p>Raghav SomaniThis article introduces and describes the mathematical structures and frameworks needed to understand the modern fundamental theory of Relativistic Spacetime Physics. The self-referential and self-contained nature of Mathematics provides enough power to prescribe a rigorous language needed to formulate the building components of the standard Einstein's General Theory of Relativity like Spacetime, Matter, and Gravity, along with their behaviors and interactions. In these notes, we will introduce and understand these abstract components, starting with defining the arena of smooth manifolds and then adding the necessary and sufficient differential geometric structures needed to build the primers to the General Theory of Relativity.Dual spaces and the Fenchel conjugate2020-02-09T00:00:00-08:002020-02-09T00:00:00-08:00https://raghavsomani.github.io/posts/2020/02/blog-post-11<p>\(\newcommand{\round}[1]{\left( #1 \right)}
\newcommand{\curly}[1]{\left\lbrace #1 \right\rbrace}
\newcommand{\squarebrack}[1]{\left\lbrack #1 \right\rbrack}
\newcommand{\sumi}[2]{\sum\limits_{i=#1}^{#2}}
\newcommand{\sumj}[2]{\sum\limits_{j=#1}^{#2}}
\newcommand{\sumk}[2]{\sum\limits_{k=#1}^{#2}}
\newcommand{\sump}[2]{\sum\limits_{p=#1}^{#2}}
\newcommand{\suml}[2]{\sum\limits_{l=#1}^{#2}}
\newcommand{\sumn}[2]{\sum\limits_{n=#1}^{#2}}
\newcommand{\summ}[2]{\sum\limits_{m=#1}^{#2}}
\newcommand{\sumt}[2]{\sum\limits_{t=#1}^{#2}}
\newcommand{\Sum}{\sum_{i = 1}^{n}}
\newcommand{\Sumi}[1]{\sum\limits_{i = 1}^{#1}}
\newcommand{\Sumt}[1]{\sum\limits_{t = 1}^{#1}}
\newcommand{\abs}[1]{\left\lvert #1 \right\rvert}
\newcommand{\norm}[2]{\left\lVert#2\right\rVert_{#1}}
\newcommand{\esqnorm}[1]{\left\lVert#1\right\rVert_2^2}
\newcommand{\enorm}[1]{\left\lVert#1\right\rVert_2}
\newcommand{\infnorm}[1]{\left\lVert#1\right\rVert_\infty}
\newcommand{\opnorm}[1]{\left\lVert#1\right\rVert_\text{op}}
\newcommand{\normF}[1]{\left\lVert#1\right\rVert_{\text{F}}}
\newcommand{\inner}[1]{\left\langle#1\right\rangle}
\newcommand{\ceil}[1]{\left\lceil#1\right\rceil}
\newcommand{\floor}[1]{\left\lfloor#1\right\rfloor}
\newcommand{\zero}{\mathbf{0}}
\newcommand{\one}{\mathbf{1}}
\newcommand{\avec}{\mathbf{a}}
\newcommand{\bvec}{\mathbf{b}}
\newcommand{\cvec}{\mathbf{c}}
\newcommand{\dvec}{\mathbf{d}}
\newcommand{\e}{\mathbf{e}}
\newcommand{\f}{\mathbf{f}}
\newcommand{\g}{\mathbf{g}}
\newcommand{\h}{\mathbf{h}}
\newcommand{\ivec}{\mathbf{i}}
\newcommand{\jvec}{\mathbf{j}}
\newcommand{\kvec}{\mathbf{k}}
\newcommand{\lvec}{\mathbf{l}}
\newcommand{\m}{\mathbf{m}}
\newcommand{\n}{\mathbf{n}}
\newcommand{\ovec}{\mathbf{o}}
\newcommand{\p}{\mathbf{p}}
\newcommand{\q}{\mathbf{q}}
\newcommand{\rvec}{\mathbf{r}}
\newcommand{\s}{\mathbf{s}}
\newcommand{\tvec}{\mathbf{t}}
\newcommand{\uvec}{\mathbf{u}}
\newcommand{\vvec}{\mathbf{v}}
\newcommand{\w}{\mathbf{w}}
\newcommand{\x}{\mathbf{x}}
\newcommand{\y}{\mathbf{y}}
\newcommand{\z}{\mathbf{z}}
\newcommand{\A}{\mathbf{A}}
\newcommand{\B}{\mathbf{B}}
\newcommand{\C}{\mathbf{C}}
\newcommand{\D}{\mathbf{D}}
\newcommand{\Emat}{\mathbf{E}}
\newcommand{\F}{\mathbf{F}}
\newcommand{\G}{\mathbf{G}}
\newcommand{\Hmat}{\mathbf{H}}
\newcommand{\I}{\mathbf{I}}
\newcommand{\J}{\mathbf{J}}
\newcommand{\K}{\mathbf{K}}
\newcommand{\Lmat}{\mathbf{L}}
\newcommand{\M}{\mathbf{M}}
\newcommand{\N}{\mathbf{N}}
\newcommand{\Omat}{\mathbf{O}}
\newcommand{\Pmat}{\mathbf{P}}
\newcommand{\Q}{\mathbf{Q}}
\newcommand{\Rmat}{\mathbf{R}}
\newcommand{\Smat}{\mathbf{S}}
\newcommand{\T}{\mathbf{T}}
\newcommand{\U}{\mathbf{U}}
\newcommand{\V}{\mathbf{V}}
\newcommand{\W}{\mathbf{W}}
\newcommand{\X}{\mathbf{X}}
\newcommand{\Y}{\mathbf{Y}}
\newcommand{\Z}{\mathbf{Z}}
\newcommand{\SIGMA}{\mathbf{\Sigma}}
\newcommand{\LAMBDA}{\mathbf{\Lambda}}
\newcommand{\Acal}{\mathcal{A}}
\newcommand{\Bcal}{\mathcal{B}}
\newcommand{\Ccal}{\mathcal{C}}
\newcommand{\Dcal}{\mathcal{D}}
\newcommand{\Ecal}{\mathcal{E}}
\newcommand{\Fcal}{\mathcal{F}}
\newcommand{\Gcal}{\mathcal{G}}
\newcommand{\Hcal}{\mathcal{H}}
\newcommand{\Ical}{\mathcal{I}}
\newcommand{\Jcal}{\mathcal{J}}
\newcommand{\Kcal}{\mathcal{K}}
\newcommand{\Lcal}{\mathcal{L}}
\newcommand{\Mcal}{\mathcal{M}}
\newcommand{\Ncal}{\mathcal{N}}
\newcommand{\Ocal}{\mathcal{O}}
\newcommand{\Pcal}{\mathcal{P}}
\newcommand{\Qcal}{\mathcal{Q}}
\newcommand{\Rcal}{\mathcal{R}}
\newcommand{\Scal}{\mathcal{S}}
\newcommand{\Tcal}{\mathcal{T}}
\newcommand{\Ucal}{\mathcal{U}}
\newcommand{\Vcal}{\mathcal{V}}
\newcommand{\Wcal}{\mathcal{W}}
\newcommand{\Xcal}{\mathcal{X}}
\newcommand{\Ycal}{\mathcal{Y}}
\newcommand{\Zcal}{\mathcal{Z}}
\newcommand{\alphavec}{\boldsymbol{\alpha}}
\newcommand{\betavec}{\boldsymbol{\beta}}
\newcommand{\gammavec}{\boldsymbol{\gamma}}
\newcommand{\deltavec}{\boldsymbol{\delta}}
\newcommand{\epsvec}{\boldsymbol{\epsilon}}
\newcommand{\etavec}{\boldsymbol{\eta}}
\newcommand{\nuvec}{\boldsymbol{\nu}}
\newcommand{\tauvec}{\boldsymbol{\tau}}
\newcommand{\rhovec}{\boldsymbol{\rho}}
\newcommand{\lmbda}{\boldsymbol{\lambda}}
\newcommand{\muvec}{\boldsymbol{\mu}}
\newcommand{\thetavec}{\boldsymbol{\theta}}
\newcommand{\BigO}[1]{\mathcal{O}\round{#1}}
\newcommand{\BigOmega}[1]{\Omega\round{#1}}
\newcommand{\R}{\mathbb{R}}
\newcommand{\Rd}[1]{\mathbb{R}^{#1}}
\newcommand{\Natural}{\mathbb{N}}
\newcommand{\Complex}{\mathbb{C}}
\newcommand{\Integer}{\mathbb{Z}}
\newcommand{\Rational}{\mathbb{Q}}
\newcommand{\Field}{\mathbb{F}}
\newcommand{\E}[1]{\mathbb{E}\squarebrack{#1}}
\newcommand{\Exp}[2]{\mathbb{E}_{#1}\squarebrack{#2}}
\newcommand{\Prob}[1]{P\curly{#1}}
\newcommand{\Var}[1]{\mathrm{Var}\squarebrack{#1}}
\newcommand{\inv}[1]{\frac{1}{#1}}
\newcommand{\indicator}[2]{\mathbbm{1}_{#1}\squarebrack{#2}}
\newcommand{\Tr}[1]{\text{Tr}\squarebrack{#1}}
\newcommand{\BOX}[1]{\fbox{\parbox{\linewidth}{\centering#1}}}
\newcommand{\textequal}[1]{\stackrel{#1}{=}}
\newcommand{\textleq}[1]{\stackrel{#1}{\leq}}
\newcommand{\textgeq}[1]{\stackrel{#1}{\geq}}
\newcommand{\dd}[2]{\frac{d #1}{d #2}}
\newcommand{\ddn}[3]{\frac{d^{#1} #2}{d #3^{#1}}}
\newcommand{\dodo}[2]{\frac{\partial #1}{\partial #2}}
\newcommand{\dodon}[3]{\frac{\partial^{#1} #2}{\partial {#3}^{#1}}}\)Dual spaces lie at the core of linear algebra and allow us to reason formally about the concept of duality in mathematics. Duality shows up naturally and elegantly in measure theory, functional analysis, and mathematical optimization. In this post, I explore the nature of duality through dual spaces and their interpretation in general linear algebra, all of which was motivated by the so-called <em>convex conjugate</em>, or <em>Fenchel conjugate</em>, in mathematical optimization.</p>
<p>For the sake of abstraction and general applicability, we work over a general field $\Field$, which can for example be $\R$ or $\Complex$. Our space of vectors is a vector space $\Vcal$ over $\Field$, which can for example be a subspace of $\Field^d$ for some $d\in\Natural$.</p>
<h1 id="dual-spaces">Dual spaces</h1>
<blockquote>
<p><strong>Dual space</strong> - The <em>dual space</em> $\Vcal^{*}$ of the vector space $\Vcal$, is the set of all linear transformations $f$ from $\Vcal$ to $\Field$ denoted as $\Vcal^{*} = \mathscr{L}(\Vcal,\Field)$.</p>
</blockquote>
<blockquote>
<p><strong>Linear functionals</strong> - An element $f:\Vcal\to\Field$ of $\Vcal^{*}$ is called a <em>linear functional</em>, which takes in a vector $x\in\Vcal$, and outputs an element in $\Field$.</p>
</blockquote>
<p>Note that if $f,g\in\Vcal^{*}$, then $\alpha f + \beta g$ is also in $\Vcal^{*}$ for $\alpha,\beta\in\Field$ and would be defined as $(\alpha f + \beta g)(x) = \alpha f(x) + \beta g(x)$ for all $x\in\Vcal$. Let us look at a few examples :</p>
<ol>
<li>If $\Vcal = \Pcal_n$ is the space of all univariate polynomials over the field $\R$ of degree at most $n\in\Natural$, then $f$ defined as $f(p) = p(1)+2p'(0)$ is a linear functional contained in $\Vcal^{*} = \mathscr{L}(\Pcal_n,\R)$.</li>
<li>If $\Vcal = \Rd{n\times n}$ is the space of all real square matrices of size $n\times n$, then $f$ defined as $f(\M) = \Tr{\M}$ is a linear functional contained in $\Vcal^{*} = \mathscr{L}(\Rd{n\times n},\R)$.</li>
<li>If $\Vcal = \Ccal\round{\squarebrack{-\pi,\pi}}$ is the space of all real valued continuous functions on $\squarebrack{-\pi,\pi}$, then $f$ defined as $f(g) = \int_{-\pi}^{\pi} g(x)\cos(2x)dx$ is a linear functional contained in $\Vcal^{*} = \mathscr{L}(\Ccal\round{\squarebrack{-\pi,\pi}},\R)$. Up to normalization, this functional outputs the 2<sup>nd</sup> cosine Fourier coefficient of its input function.</li>
</ol>
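<p>The three examples above can be tried numerically. The following is a minimal sketch, assuming NumPy; the function names (<code>poly_functional</code>, <code>trace_functional</code>, <code>fourier_functional</code>) are hypothetical and chosen only for illustration, and the integral is approximated with a trapezoidal rule rather than computed exactly.</p>

```python
import numpy as np

def poly_functional(coeffs):
    """f(p) = p(1) + 2 p'(0), where p(x) = sum_k coeffs[k] * x**k."""
    p = np.polynomial.Polynomial(coeffs)
    return p(1.0) + 2.0 * p.deriv()(0.0)

def trace_functional(M):
    """f(M) = Tr[M] for a square matrix M."""
    return np.trace(M)

def fourier_functional(g, num_points=20001):
    """Trapezoidal approximation of the integral of g(x) cos(2x) over [-pi, pi]."""
    x = np.linspace(-np.pi, np.pi, num_points)
    y = g(x) * np.cos(2.0 * x)
    dx = x[1] - x[0]
    return dx * (y.sum() - 0.5 * (y[0] + y[-1]))

# Linearity check for the polynomial functional: f(a p + b q) = a f(p) + b f(q).
p = np.array([1.0, 2.0, 3.0])   # 1 + 2x + 3x^2
q = np.array([0.0, -1.0, 4.0])  # -x + 4x^2
a, b = 2.0, -3.0
lhs = poly_functional(a * p + b * q)
rhs = a * poly_functional(p) + b * poly_functional(q)
```

<p>Here <code>lhs</code> and <code>rhs</code> agree up to floating point error, which is the defining property of a linear functional. The same check can be repeated for the other two functionals.</p>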
<h1 id="dual-basis">Dual basis</h1>
<p>Let $\Vcal$ be a finite dimensional vector space, and $\beta = \curly{v_i}_{i=1}^n$ be a basis of $\Vcal$. Therefore, to determine the action of any linear functional $f\in\Vcal^{*}$, it is enough to know its action on the basis vectors $\curly{v_i}_{i=1}^n$, i.e., it is enough to know what $\curly{f(v_i)\in\Field}_{i=1}^n$ are.</p>
<blockquote>
<p><strong>Dual basis</strong> - If $\Vcal$ is finite dimensional and $\beta = \curly{v_i}_{i=1}^n$ is a basis of $\Vcal$, then $\beta^{*} = \curly{f_i}_{i=1}^n$, the set of functionals defined by $f_i(v_j) = \delta_{ij}\ \forall\ i,j\in\squarebrack{n}$, is the corresponding basis of $\Vcal^{*}$, known as the dual basis of $\beta$.</p>
</blockquote>
<p>To verify that $\beta^{*}$ is indeed a basis of $\Vcal^{*}$, we just need to check the following</p>
<ol>
<li>$\beta^{*}$ is a linearly independent set of linear functionals, i.e., if $\sumi{1}{n}a_if_i = f_0$ for some $\curly{a_j\in\Field}_{j=1}^n$, where $f_0$ is the zero functional, then $a_j=0\ \forall\ j\in\squarebrack{n}$. To verify this, we evaluate both sides at each basis vector $v_j\in\beta$, which gives $\sumi{1}{n}a_if_i(v_j) = \sumi{1}{n}a_i\delta_{ij} = a_j = 0\ \forall\ j \in\squarebrack{n}$, establishing the linear independence of the set $\beta^{*}$.</li>
<li>$\beta^{*}$ spans $\Vcal^{*}$, i.e., for any $f\in\Vcal^{*}$, there is some set of elements $\curly{a_j\in\Field}_{j=1}^n$ such that $f(v) = \sumi{1}{n}a_if_i(v)\ \forall\ v\in\Vcal$. To find the coefficients, we evaluate $f$ on the basis vectors $v_j\in\beta$ for $j\in\squarebrack{n}$, which gives us $f(v_j) = \sumi{1}{n}a_if_i(v_j) = \sumi{1}{n}a_i\delta_{ij} = a_j\ \forall\ j\in\squarebrack{n}$. Therefore, if $f$ is in the span of $\beta^{*}$, then the coefficients must satisfy $a_j=f(v_j)\ \forall\ j \in \squarebrack{n}$. Conversely, let $g = \sumi{1}{n}f(v_i)f_i$. To show that $g=f$, it is enough to show that they agree on the basis vectors $\curly{v_j}_{j=1}^n$ of $\Vcal$, and indeed $g(v_j) = \sumi{1}{n}f(v_i)f_i(v_j) = f(v_j)$ by the construction of $\curly{f_i}_{i=1}^n$. Therefore $\beta^{*}$ spans $\Vcal^{*}$.</li>
</ol>
<p>This establishes that $\beta^{*} = \curly{f_i}_{i=1}^n$ is indeed a basis of $\Vcal^{*}$, and any $f\in\Vcal^{*}$ can be decomposed into the basis of $\Vcal^{*}$ as</p>
<blockquote>
\[\begin{equation}
f = \sumi{1}{n}f(v_i)f_i
\end{equation}\]
</blockquote>
<p>where $\beta = \curly{v_i}_{i=1}^n$ is the basis of $\Vcal$. The existence of the dual basis $\beta^{*}$ also shows that $\mathrm{dim}(\Vcal^{*}) = \mathrm{dim}(\Vcal) = n$, so $\Vcal$ and $\Vcal^{*}$ are isomorphic whenever $\Vcal$ is finite dimensional.</p>
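<p>In coordinates, the dual basis can be computed with a matrix inverse. The sketch below assumes $\Vcal = \R^3$, stores the basis vectors $v_i$ as the columns of a matrix <code>B</code>, and represents each functional as a row vector acting by a dot product; the names <code>dual_basis</code>, <code>B</code>, <code>F</code> and <code>c</code> are illustrative and not part of the original post.</p>

```python
import numpy as np

def dual_basis(B):
    """Return F whose i-th row is the dual functional f_i, so that f_i(x) = F[i] @ x.

    B holds the basis vectors v_1, ..., v_n as columns. The rows of B^{-1}
    satisfy F[i] @ B[:, j] = delta_ij, which is the defining property f_i(v_j) = delta_ij.
    """
    return np.linalg.inv(B)

# A non-standard basis of R^3 (columns are v_1, v_2, v_3).
B = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])
F = dual_basis(B)

# Matrix of values f_i(v_j); this should be the identity.
kronecker = F @ B

# An arbitrary linear functional f(x) = c @ x and its expansion f = sum_i f(v_i) f_i.
c = np.array([2.0, -1.0, 3.0])
coefficients = c @ B             # the numbers f(v_i)
reconstructed = coefficients @ F  # row vector representing sum_i f(v_i) f_i
```

<p>The array <code>reconstructed</code> equals <code>c</code> up to rounding, which is the decomposition $f = \sumi{1}{n}f(v_i)f_i$ written in coordinates.</p>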
<h1 id="transpose">Transpose</h1>
<p>The existence of the dual basis $\beta^{*}$ helps us define the concept of the <em>transpose</em> in linear algebra, which is used extensively in applications.</p>
<blockquote>
<p><strong>Transpose</strong> - Let $T:\Vcal\to\Wcal$ be a linear transformation from the vector space $\Vcal$ to the vector space $\Wcal$, then $T^\top:\Wcal^{*}\to\Vcal^{*}$ is a linear transformation from $\Wcal^{*}$ to $\Vcal^{*}$ that takes in a linear functional $g\in\Wcal^{*}$, and outputs another linear functional $T^\top(g)\in\Vcal^{*}$ defined as $T^\top(g)(x) = g\round{T(x)}\ \forall\ x\in\Vcal$.</p>
</blockquote>
<div>
$$
\begin{align}
x &\xrightarrow{T} T(x) &\xrightarrow{g\in\Wcal^{*}}&\ g(T(x)) &\qquad\equiv\qquad& x &\xrightarrow{T^\top(g)\in\Vcal^*}& g(T(x))\\
\Vcal &\xrightarrow{T}\Wcal &\xrightarrow{g\in\Wcal^*}&\ \Field &\qquad\qquad& \Vcal &\xrightarrow{T^\top(g)\in\Vcal^*}& \Field
\end{align}
$$
</div>
<blockquote>
<p><strong>Theorem</strong> - Let $T:\Vcal\to\Wcal$ be a linear transformation between finite dimensional vector spaces $\Vcal$ and $\Wcal$. Let $\beta = \curly{v_i}_{i=1}^n$ be a basis of $\Vcal$ with dual basis $\beta^{*} = \curly{f_i}_{i=1}^n$ of $\Vcal^{*}$, and let $\gamma = \curly{w_j}_{j=1}^m$ be a basis of $\Wcal$ with dual basis $\gamma^{*} = \curly{g_j}_{j=1}^m$ of $\Wcal^{*}$. Let $A = \squarebrack{T}_{\beta}^\gamma$ be the matrix of $T$ with respect to $\beta$ and $\gamma$, which maps the coordinates of a vector in $\Vcal$ to the coordinates of its image in $\Wcal$. Then $\squarebrack{T^\top}_{\gamma^{*}}^{\beta^{*}}=A^\top$.</p>
</blockquote>
<p>The matrix $\squarebrack{T^\top}_{\gamma^{*}}^{\beta^{*}}$ takes in an element of $\Wcal^{*}$ represented in the basis $\gamma^{*}$, and outputs an element of $\Vcal^{*}$ represented in the basis $\beta^{*}$. Therefore, it is sufficient to represent every element of $\curly{T^\top(g_j)}_{j=1}^m$ in the basis $\beta^{*}$ of $\Vcal^{*}$. Since any functional $f$ in $\Vcal^{*}$ can be expressed as a linear combination of the elements of $\beta^{*}$, as we proved earlier, we have</p>
<div>
$$
\begin{align}
T^\top(g_j) &= \sumi{1}{n}T^\top(g_j)(v_i)f_i\\
&= \sumi{1}{n}g_j(T(v_i))f_i
\end{align}
$$
</div>
<p>Therefore, the $(i,j)$-th element of the matrix $\squarebrack{T^\top}_{\gamma^{*}}^{\beta^{*}}$ is $g_j(T(v_i))$. Now, the columns of the matrix $\squarebrack{T}_{\beta}^{\gamma}$ are the representations of the images $T(v_i)$ of the basis vectors $\beta$ of $\Vcal$ in the basis $\gamma$ of $\Wcal$. Therefore we have,</p>
<div>
$$
\begin{align}
T(v_i) &= \sumk{1}{m}A_{ki}w_k\\
\implies g_j(T(v_i)) &= g_j\round{\sumk{1}{m}A_{ki}w_k}\\
&= \sumk{1}{m}A_{ki}g_j(w_k)\qquad(\because g_j \text{ is a linear functional})\\
&= A_{ji}\qquad(\because g_j(w_k)=\delta_{jk})
\end{align}
$$
</div>
<p>This finally gives us</p>
<div>
$$
\begin{align}
T^\top(g_j) &= \sumi{1}{n}A_{ji}f_i
\end{align}
$$
</div>
<p>and shows us that the $(i,j)$-th element of the matrix $\squarebrack{T^\top}_{\gamma^{*}}^{\beta^{*}}$ is nothing but the $(j,i)$-th element of $A$, which implies that the matrix $\squarebrack{T^\top}_{\gamma^{*}}^{\beta^{*}} = A^\top$.</p>
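<p>The theorem can be checked numerically on a small example. This is only a sketch under stated assumptions: $\Vcal = \R^3$ and $\Wcal = \R^2$, the map $T$ is given by a matrix <code>M</code> in standard coordinates, the bases $\beta$ and $\gamma$ are the columns of <code>B</code> and <code>C</code>, and functionals are row vectors. All of these names and the helper <code>transpose_matrix</code> are hypothetical.</p>

```python
import numpy as np

M = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])    # T in standard coordinates, from R^3 to R^2
B = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])    # columns are the basis beta of V
C = np.array([[2.0, 1.0],
              [1.0, 1.0]])         # columns are the basis gamma of W

# A = [T]_beta^gamma: column i holds the gamma-coordinates of T(v_i).
A = np.linalg.inv(C) @ M @ B

def transpose_matrix(M, B, C):
    """Build [T^T]_{gamma*}^{beta*} column by column from the definition.

    Row j of C^{-1} is the dual functional g_j. T^T(g_j) = g_j o T is the row
    vector G[j] @ M, and its coefficient on f_i is its value at v_i = B[:, i].
    """
    G = np.linalg.inv(C)
    columns = [G[j] @ M @ B for j in range(C.shape[0])]
    return np.column_stack(columns)

T_transpose = transpose_matrix(M, B, C)
```

<p>For these matrices <code>T_transpose</code> coincides with <code>A.T</code>, matching the statement $\squarebrack{T^\top}_{\gamma^{*}}^{\beta^{*}} = A^\top$.</p>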
<h1 id="double-dual">Double Dual</h1>
<p>If $\Vcal$ is a vector space, then its dual space $\Vcal^{*}$ is also a vector space. We can therefore define the dual space of $\Vcal^{*}$, the set of all linear transformations from $\Vcal^{*}$ to $\Field$, as $\Vcal^{**} = \mathscr{L}(\Vcal^{*},\Field)$, which is the double dual space of $\Vcal$. Unlike the identification of $\Vcal$ with $\Vcal^{*}$, for which we needed to choose a basis, there is an elegant basis-free way to go from $\Vcal$ to $\Vcal^{**}$ via an injective map $\Psi$. If $x\in\Vcal$, then $\hat{x} = \Psi(x)\in\Vcal^{**}$ acts on a linear functional $f\in\Vcal^{*}$ by evaluating $f$ at $x$, returning an element of the field $\Field$, i.e.,</p>
<div>
$$
\begin{align}
f & \xrightarrow{\hat{x}\in\Vcal^{**}} & f(x)\\
\Vcal^* & \xrightarrow{\hat{x}\in\Vcal^{**}} & \Field
\end{align}
$$
</div>
<blockquote>
<p><strong>Theorem</strong> - If $\Vcal$ is a vector space of finite dimension, then $\Psi : \Vcal\to\Vcal^{**}$ is an isomorphism.</p>
</blockquote>
<p>First, we show that $\Psi$ is linear. If $x,y\in\Vcal$, $f\in\Vcal^{*}$, and $c\in\Field$, then</p>
<div>
$$
\begin{align}
\Psi(x+cy)(f) &= f(x+cy)\\
&= f(x)+cf(y)\\
&= (\Psi(x) + c\Psi(y))(f)
\end{align}
$$
</div>
<p>Next, we show that $\Psi$ is one to one, i.e., that $\hat{x} = 0$ (the zero functional on $\Vcal^{*}$) implies $x = 0$. Let $\beta = \curly{v_i}_{i=1}^n$ be a basis of $\Vcal$, so that $x = \sumi{1}{n}a_iv_i$ for some $\curly{a_i\in\Field}_{i=1}^n$. Suppose $\Psi(x) = \hat{x}=0$. Then</p>
<div>
$$
\begin{align}
\Psi(x) &= \Psi\round{\sumi{1}{n}a_iv_i}\\
&= \sumi{1}{n}a_i\Psi(v_i)\\
\implies \Psi(x)(f) &= \sumi{1}{n}a_i\Psi(v_i)(f)\qquad \forall\ f \in\Vcal^*\\
&= \sumi{1}{n}a_if(v_i)
\end{align}
$$
</div>
<p>Taking $f = f_j$ from the dual basis $\curly{f_i}_{i=1}^n$ gives $0 = \Psi(x)(f_j) = \sumi{1}{n}a_if_j(v_i) = a_j$ for every $j\in\squarebrack{n}$. This implies that $x = \sumi{1}{n}a_iv_i = 0$, so $\Psi$ is one to one. Since $\mathrm{dim}(\Vcal) = \mathrm{dim}(\Vcal^{*}) = \mathrm{dim}(\Vcal^{**})$, an injective linear map from $\Vcal$ to $\Vcal^{**}$ is also onto, and therefore $\Psi$ is an isomorphism.
The isomorphism $\Psi$ between $\Vcal$ and $\Vcal^{**}$ maps each element of $\Vcal$ to a unique element of $\Vcal^{**}$ without reference to any basis.</p>
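<p>The evaluation map is easy to mimic with closures. The following sketch assumes $\Vcal = \R^3$ and represents functionals as Python callables; <code>Psi</code> and <code>functional</code> are illustrative names, not notation from a library.</p>

```python
import numpy as np

def functional(c):
    """Return the linear functional f(x) = c @ x on R^n as a callable."""
    c = np.asarray(c, dtype=float)
    return lambda x: float(c @ x)

def Psi(x):
    """Evaluation map: send x in V to x_hat in V**, where x_hat(f) = f(x)."""
    return lambda f: f(x)

x = np.array([1.0, 2.0, 3.0])
y = np.array([-1.0, 0.5, 4.0])
c = 2.0
f = functional([3.0, -1.0, 2.0])

# Linearity of Psi: Psi(x + c y)(f) = Psi(x)(f) + c Psi(y)(f).
lhs = Psi(x + c * y)(f)
rhs = Psi(x)(f) + c * Psi(y)(f)

# Injectivity in coordinates: evaluating x_hat on the dual basis of the
# standard basis (the coordinate functionals) recovers x itself.
recovered = np.array([Psi(x)(functional(e)) for e in np.eye(3)])
```

<p>Note that <code>Psi</code> never mentions a basis. A basis only appears when we want to read off coordinates, as in <code>recovered</code>.</p>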
<h1 id="the-infinite-sequence-of-dual-spaces">The infinite sequence of dual spaces</h1>
<p>We can keep taking duals of the parent space, and there will always exist an isomorphism between alternate dual spaces. Formally, let us denote $\Vcal^{(i+1)*} = \round{\Vcal^{(i)*}}^{*}$ for all $i\in\Natural$, where $\Vcal^{(0)*} = \Vcal$ is a finite dimensional vector space. Let $\Psi_i$ be the evaluation isomorphism between $\Vcal^{(i-1)*}$ and $\Vcal^{(i+1)*}$. Then, in general, define $\hat{x}^{(i+1)} = \Psi_{i}\round{\hat{x}^{(i-1)}}$ for $\hat{x}^{(i-1)}\in\Vcal^{(i-1)*}$ when $i$ is odd, and $\hat{f}^{(i+1)} = \Psi_{i}\round{\hat{f}^{(i-1)}}$ for $\hat{f}^{(i-1)}\in\Vcal^{(i-1)*}$ when $i$ is an even natural number, with $\hat{x}^{(0)} = x\in\Vcal$ and $\hat{f}^{(1)} = f\in\Vcal^{*}$. Then by the property of these isomorphisms we have</p>
<div>
$$
\hat{x}^{(i+1)}\round{\hat{f}^{(i)}} = \Psi_i\round{\hat{x}^{(i-1)}}\round{\hat{f}^{(i)}} = \hat{f}^{(i)}\round{\hat{x}^{(i-1)}} = \ldots = \hat{x}^{(2)}\round{\hat{f}^{(1)}} = \hat{x}^{(2)}(f)
$$
</div>
<p>when $i$ is odd, and</p>
<div>
$$
\hat{f}^{(i+1)}\round{\hat{x}^{(i)}} = \Psi_{i}\round{\hat{f}^{(i-1)}}\round{\hat{x}^{(i)}} = \hat{x}^{(i)}\round{\hat{f}^{(i-1)}} = \ldots = \hat{f}^{(1)}\round{\hat{x}^{(0)}} = f(x)
$$
</div>
<p>when $i$ is an even natural number. Note, however, that $\hat{x}^{(2)}(f)$ and $f(x)$ are the same element of $\Field$. The difference lies only in the order of application of these linear functionals, which makes them dual representations of each other.</p>
<h1 id="application-to-fenchel-conjugate">Application to Fenchel conjugate</h1>
<p>With this formalization, one can reason about the Fenchel conjugate in an elegant way. Let us first define the Fenchel conjugate of a function.</p>
<blockquote>
<p><strong>Fenchel conjugate</strong> - Let $\Vcal^{*} = \mathscr{L}(\Vcal,\Field)$ be the dual space of $\Vcal$, and $\inner{\ \cdot\ ,\ \cdot\ } : \Vcal^{*} \times \Vcal \to \Field$ denote the dual pairing between the two vector spaces, $\inner{g,\x} = g(\x)$. Here $\Field$ must be an ordered field such as $\R$ for the supremum below to be meaningful. For a function $f : \Vcal\to \Field\cup\curly{\infty_{\Field}}$, its convex conjugate $f^{*} : \Vcal^{*}\to \Field\cup\curly{\infty_{\Field}}$ is defined as</p>
\[\begin{equation}
f^{*}(g) = \sup\limits_{\x\in\Vcal}\curly{\inner{g,\x} - f(\x)} \qquad \forall\ g\in\Vcal^{*}
\end{equation}\]
</blockquote>
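<p>As a worked instance of this definition, take $\Vcal = \R^2$ and the quadratic $f(\x) = \frac{1}{2}\x^\top Q\x$ with $Q$ symmetric positive definite. The objective $\inner{g,\x} - f(\x)$ is concave in $\x$ with gradient $g - Q\x$, so the supremum is attained at $\x = Q^{-1}g$ and $f^{*}(g) = \frac{1}{2}g^\top Q^{-1}g$. The sketch below checks this numerically; the matrix <code>Q</code>, the test vector <code>g</code>, and the function names are assumptions made for illustration.</p>

```python
import numpy as np

Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])   # symmetric positive definite (trace 3, determinant 1.75)

def f(x):
    """Quadratic f(x) = 0.5 * x^T Q x."""
    return 0.5 * x @ Q @ x

def pairing_objective(g, x):
    """The quantity <g, x> - f(x) whose supremum over x defines f*(g)."""
    return g @ x - f(x)

def conjugate(g):
    """f*(g), evaluated at the maximizer x = Q^{-1} g of the concave objective."""
    x_star = np.linalg.solve(Q, g)
    return pairing_objective(g, x_star)

g = np.array([1.0, -1.0])
closed_form = 0.5 * g @ np.linalg.solve(Q, g)

# No sampled x should beat the value attained at the maximizer.
rng = np.random.default_rng(0)
samples = rng.uniform(-5.0, 5.0, size=(1000, 2))
best_sampled = max(pairing_objective(g, x) for x in samples)

# Gradient maps for this example: grad f(x) = Q x and grad f*(g) = Q^{-1} g.
x = np.array([0.3, -0.7])
round_trip = np.linalg.solve(Q, Q @ x)
```

<p>For this quadratic, <code>round_trip</code> returns <code>x</code>, so the gradient of the conjugate undoes the gradient of $f$ once $\Vcal^{**}$ is identified with $\Vcal$.</p>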
<p>For a differentiable function $f:\Vcal\to\Field$, the first-order Taylor expansion shows that the derivative $\nabla f(\x)$ acts linearly on displacements in $\Vcal$, so it is a linear functional for every $\x\in\Vcal$ and therefore lies in the dual space $\Vcal^{*}$. If $f^{*}:\Vcal^{*}\to\Field$ is the convex conjugate of $f$ and is itself differentiable, then by the same argument $\nabla f^{*}(g)$ is an element of $\Vcal^{**}$ for all $g\in\Vcal^{*}$. Since the evaluation map $\Psi : \Vcal\to\Vcal^{**}$ is a natural isomorphism between $\Vcal$ and $\Vcal^{**}$, for a convex $f$ this in particular gives us $\Psi(\x) = \hat{\x} = \nabla f^{*}\round{\nabla f(\x)}\in\Vcal^{**}$.</p>Raghav SomaniDual spaces lie at the core of linear algebra and allow us to formally reason about the concept of duality in mathematics. Duality shows up naturally and elegantly in measure theory, functional analysis, and mathematical optimization. In this post, I have tried to learn and explore the nature of duality via Dual spaces, its interpretation in general linear algebra, all of which was motivated by the so-called _convex conjugate_, or the _Fenchel conjugate_ in mathematical optimization.A survey on Strongly Rayleigh measures and their mixing time analysis2020-01-02T00:00:00-08:002020-01-02T00:00:00-08:00https://raghavsomani.github.io/posts/2020/01/blog-post-10<p>Strongly Rayleigh measures are natural generalizations of measures that satisfy the notion of negative dependence. The Negative Dependence property reflects a <em>repelling</em> nature of items, a property that occurs across probability theory, combinatorial optimization, physics, and other fields. Negatively dependent probability measures provide a powerful tool for modeling non-i.i.d. data and thus can impact all aspects of learning and algorithm design like anomaly detection, information maximization, experimental design, fast MCMC sampling, interpretable learning etc.
The class of Strongly Rayleigh measures provides the most useful characterization of Negative Dependence by grounding it in the theory of multivariate stable polynomials. This post attempts to throw some light on the origin of Strongly Rayleigh measures and Determinantal Point Processes and highlights the fast mixing time analysis of the natural MCMC chain in the support of a Strongly Rayleigh measure as shown by <a href="https://arxiv.org/abs/1602.05242">Anari, Gharan and Rezaei 2016</a>.</p>
<p>This work has been done with <a href="https://mitchellnw.github.io/">Mitchell Wortsman</a> as a part of our joint course project for <a href="https://courses.cs.washington.edu/courses/cse521/19au/">Design and Analysis of Algorithms</a> instructed by <a href="https://homes.cs.washington.edu/~shayan/">Prof. Shayan Oveis Gharan</a>. The complete PDF post can be viewed <a href="\files\SRmixing.pdf" target="_blank">here</a>.</p>Raghav SomaniStrongly Rayleigh measures are natural generalizations of measures that satisfy the notion of negative dependence. The class of Strongly Rayleigh measures provides the most useful characterization of Negative Dependence by grounding it in the theory of multivariate stable polynomials. This post attempts to throw some light on the origin of Strongly Rayleigh measures and Determinantal Point Processes and highlights the fast mixing time analysis of the natural MCMC chain in the support of a Strongly Rayleigh measure as shown by [Anari, Gharan and Rezaei 2016](https://arxiv.org/abs/1602.05242).Analysis of Newton’s Method2019-10-12T00:00:00-07:002019-10-12T00:00:00-07:00https://raghavsomani.github.io/posts/2019/10/blog-post-9<p>In optimization, Newton’s method is used to find the roots of the derivative of a twice differentiable function given oracle access to its gradient and Hessian. By using memory that is super-linear in the dimension of the ambient space, Newton’s method can take advantage of the second order curvature and optimize the objective function at a locally quadratically convergent rate. Here I consider the case when the objective function is smooth and strongly convex.</p>
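<p>To make the update concrete, here is a minimal sketch of the pure (unit step) Newton iteration $x \leftarrow x - \round{\nabla^2 f(x)}^{-1}\nabla f(x)$. The test objective $f(x) = \sum_i \log\cosh(x_i - c_i) + \frac{\mu}{2}\lVert x\rVert_2^2$ is an assumption chosen because it is smooth and strongly convex. It is not taken from the PDF note, and no damping or line search is included, so convergence is only expected from a reasonable starting point.</p>

```python
import numpy as np

def newton(grad, hess, x0, num_iters=10):
    """Pure Newton iteration: solve hess(x) d = grad(x) and step x <- x - d."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        x = x - np.linalg.solve(hess(x), grad(x))
    return x

# Hypothetical smooth, strongly convex objective:
# f(x) = sum_i log cosh(x_i - c_i) + (mu / 2) * ||x||^2
c = np.array([1.0, -2.0, 0.5])
mu = 1.0

def grad(x):
    return np.tanh(x - c) + mu * x

def hess(x):
    # The d x d Hessian is stored explicitly, which is the super-linear memory cost.
    return np.diag(1.0 - np.tanh(x - c) ** 2) + mu * np.eye(len(x))

x_star = newton(grad, hess, np.zeros(3))
gradient_norm = np.linalg.norm(grad(x_star))
```

<p>The Hessian here has eigenvalues between $\mu$ and $1+\mu$, so the objective is $\mu$-strongly convex and $(1+\mu)$-smooth, and <code>gradient_norm</code> is driven to the level of floating point error within a handful of iterations.</p>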
<p>The complete PDF post can be viewed <a href="\files\newton.pdf" target="_blank">here</a>.</p>Raghav SomaniIn optimization, Newton's method is used to find the roots of the derivative of a twice differentiable function given oracle access to its gradient and Hessian. By using memory that is super-linear in the dimension of the ambient space, Newton's method can take advantage of the second order curvature and optimize the objective function at a locally quadratically convergent rate. Here I consider the case when the objective function is smooth and strongly convex.Deriving the Fokker-Planck equation2019-06-11T00:00:00-07:002019-06-11T00:00:00-07:00https://raghavsomani.github.io/posts/2019/06/blog-post-8<p>In the theory of dynamic systems, the Fokker-Planck equation is used to describe the time evolution of the probability density function. It is a partial differential equation that describes how the density of a stochastic process changes as a function of time under the influence of a potential field. Some of its common applications are in the study of Brownian motion, the Ornstein–Uhlenbeck process, and statistical physics. Here I attempt to note the formal derivation of the partial differential equation by deriving a master equation and using Taylor series to obtain the Kramers-Moyal expansion. The special case in which the expansion is truncated after its second-order term is called the Fokker-Planck equation.</p>
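<p>For orientation, one common one-dimensional form of the two objects named above is sketched here. The notation is an assumption and may differ from the PDF note: $p(x,t)$ is the density, and $D^{(n)}(x,t)$ denotes the $n$-th Kramers-Moyal coefficient, defined from the $n$-th conditional moment of the increment of the process $X$.</p>
<div>
$$
\begin{align}
D^{(n)}(x,t) &= \frac{1}{n!}\lim_{\tau\to 0}\frac{1}{\tau}\,\mathbb{E}\left[\left(X(t+\tau)-x\right)^{n}\ \middle|\ X(t)=x\right]\\
\frac{\partial p(x,t)}{\partial t} &= \sum_{n=1}^{\infty}\left(-\frac{\partial}{\partial x}\right)^{n}\left[D^{(n)}(x,t)\,p(x,t)\right]\\
\frac{\partial p(x,t)}{\partial t} &= -\frac{\partial}{\partial x}\left[D^{(1)}(x,t)\,p(x,t)\right] + \frac{\partial^{2}}{\partial x^{2}}\left[D^{(2)}(x,t)\,p(x,t)\right]
\end{align}
$$
</div>
<p>The second line is the Kramers-Moyal expansion, and the third is its truncation at $n=2$, the Fokker-Planck equation, in which $D^{(1)}$ plays the role of the drift and $D^{(2)}$ that of the diffusion coefficient.</p>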
<p>The complete PDF post can be viewed <a href="\files\fokker_planck.pdf" target="_blank">here</a>.</p>Raghav SomaniIn the theory of dynamic systems, the Fokker-Planck equation is used to describe the time evolution of the probability density function. It is a partial differential equation that describes how the density of a stochastic process changes as a function of time under the influence of a potential field. Some of its common applications are in the study of Brownian motion, the Ornstein–Uhlenbeck process, and statistical physics. The motivation behind understanding the derivation is to study Levy flight processes, which have recently caught my attention.SGD without replacement2019-03-24T00:00:00-07:002019-03-24T00:00:00-07:00https://raghavsomani.github.io/posts/2019/03/blog-post-7<p>This article continues my <a href="https://raghavsomani.github.io/posts/2018/07/blog-post-6/">previous blog</a>, and discusses the work by <a href="https://arxiv.org/pdf/1903.01463.pdf" target="_blank">Prateek Jain, Dheeraj Nagaraj and Praneeth Netrapalli 2019</a>, which attempts to answer Léon Bottou’s (2009) open question of understanding SGD without replacement. The authors provide tight rates for SGD without replacement for general smooth, and general smooth and strongly convex functions using the method of exchangeable pairs to bound Wasserstein distances, and techniques from optimal transport. They show that SGD without replacement on general smooth and strongly convex functions can achieve a rate of \(\mathcal{O}\left( \frac{1}{K^2} \right)\) where \(K\) is the number of passes over the data. This result requires \(K\in\Omega(\kappa^2)\) where $\kappa$ is the condition number of the problem. This is strictly better than SGD with replacement, which has a rate of \(\mathcal{O}\left( \frac{1}{K} \right)\) in the same setting. When $K$ is smaller than \(\mathcal{O}(\kappa^2)\), they show that SGD without replacement matches the rate of SGD with replacement.
Their analysis does not require the Hessian Lipschitz condition required by previous works and holds for any general smooth and strongly convex function. They also show that SGD without replacement is at least as good as SGD with replacement for general smooth convex functions in the absence of strong convexity.</p>
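<p>The two sampling schemes being compared differ only in how the indices of an epoch are drawn. The sketch below makes that difference explicit on a least squares objective. It is an illustration under assumptions (synthetic noiseless data, a constant step size, and the hypothetical function name <code>sgd_least_squares</code>) and does not reproduce the step size schedules or the rates analyzed in the paper.</p>

```python
import numpy as np

def sgd_least_squares(A, b, num_epochs, step_size, without_replacement, seed=0):
    """SGD on (1/n) * sum_i 0.5 * (A[i] @ x - b[i])**2, with n steps per epoch."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(num_epochs):
        if without_replacement:
            order = rng.permutation(n)           # every index exactly once per epoch
        else:
            order = rng.integers(0, n, size=n)   # i.i.d. uniform indices
        for i in order:
            # Gradient of the i-th component function 0.5 * (A[i] @ x - b[i])**2.
            x = x - step_size * (A[i] @ x - b[i]) * A[i]
    return x

# Synthetic, noiseless problem (an assumption made so both runs share one minimizer).
data_rng = np.random.default_rng(1)
A = data_rng.standard_normal((20, 3))
x_true = np.array([1.0, -2.0, 0.5])
b = A @ x_true

x_without = sgd_least_squares(A, b, num_epochs=200, step_size=0.05, without_replacement=True)
x_with = sgd_least_squares(A, b, num_epochs=200, step_size=0.05, without_replacement=False)
```

<p>Because the labels are noiseless, both variants approach <code>x_true</code> here. Observing a gap between the two schemes would need noisy labels and a decaying step size, which this sketch deliberately leaves out.</p>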
<p>The complete PDF post can be viewed <a href="\files\SGD_without_replacement.pdf" target="_blank">here</a>.</p>Raghav SomaniThis article continues my [previous blog](https://raghavsomani.github.io/posts/2018/07/blog-post-6/), and discusses the work by [Prateek Jain, Dheeraj Nagaraj and Praneeth Netrapalli 2019](https://arxiv.org/pdf/1903.01463.pdf). The authors provide tight rates for SGD without replacement for general smooth, and general smooth and strongly convex functions using the method of exchangeable pairs to bound Wasserstein distances, and techniques from optimal transport.Non-asymptotic rate for Random Shuffling for Quadratic functions2018-07-12T00:00:00-07:002018-07-12T00:00:00-07:00https://raghavsomani.github.io/posts/2018/07/blog-post-6<p>This article continues my <a href="https://raghavsomani.github.io/posts/2018/04/blog-post-4/" target="_blank">previous blog</a>, and discusses a section of the work by <a href="https://arxiv.org/pdf/1806.10077.pdf" target="_blank">Jeffery Z. HaoChen and Suvrit Sra 2018</a>, in which the authors derive a non-asymptotic rate of \(\mathcal{O}\left(\frac{1}{T^2} + \frac{n^3}{T^3} \right)\) for the Random Shuffling stochastic algorithm, which is strictly better than that of SGD. The article covers the simple case in which the objective function is a sum of quadratic functions, where with a fixed step-size and after a reasonable number of epochs, we can guarantee a faster rate for Random Shuffling.</p>
<p>The complete PDF post can be viewed <a href="\files\RRQuadratic.pdf" target="_blank">here</a>.</p>Raghav SomaniThis article continues my [previous blog](https://raghavsomani.github.io/posts/2018/04/blog-post-4/), and discusses a section of the work by [Jeffery Z. HaoChen and Suvrit Sra 2018](https://arxiv.org/pdf/1806.10077.pdf), in which the authors derive a non-asymptotic rate of $$\mathcal{O}\left(\frac{1}{T^2} + \frac{n^3}{T^3} \right)$$ for the Random Shuffling stochastic algorithm, which is strictly better than that of SGD.Bias-Variance Trade-offs for Averaged SGD in Least Mean Squares2018-07-04T00:00:00-07:002018-07-04T00:00:00-07:00https://raghavsomani.github.io/posts/2018/07/blog-post-5<p>This article is on the work by <a href="https://arxiv.org/pdf/1412.0156.pdf" target="_blank">Défossez and Bach 2014</a>, in which the authors develop an operator viewpoint for analyzing Averaged SGD updates to show the Bias-Variance Trade-off and provide tight convergence rates for the Least Mean Squares problem.</p>
<p>The complete PDF post can be viewed <a href="\files\BiasVariance.pdf" target="_blank">here</a>.</p>Raghav SomaniThis article is on the work by [Défossez and Bach 2014](https://arxiv.org/pdf/1412.0156.pdf), in which the authors develop an operator viewpoint for analyzing Averaged SGD updates to show the Bias-Variance Trade-off and provide tight convergence rates for the Least Mean Squares problem.