Dual spaces and the Fenchel conjugate

Posted on:

\(\newcommand{\round}[1]{\left( #1 \right)} \newcommand{\curly}[1]{\left\lbrace #1 \right\rbrace} \newcommand{\squarebrack}[1]{\left\lbrack #1 \right\rbrack} \newcommand{\sumi}[2]{\sum\limits_{i=#1}^{#2}} \newcommand{\sumj}[2]{\sum\limits_{j=#1}^{#2}} \newcommand{\sumk}[2]{\sum\limits_{k=#1}^{#2}} \newcommand{\sump}[2]{\sum\limits_{p=#1}^{#2}} \newcommand{\suml}[2]{\sum\limits_{l=#1}^{#2}} \newcommand{\sumn}[2]{\sum\limits_{n=#1}^{#2}} \newcommand{\summ}[2]{\sum\limits_{m=#1}^{#2}} \newcommand{\sumt}[2]{\sum\limits_{t=#1}^{#2}} \newcommand{\Sum}{\sum_{i = 1}^{n}} \newcommand{\Sumi}[1]{\sum\limits_{i = 1}^{#1}} \newcommand{\Sumt}[1]{\sum\limits_{t = 1}^{#1}} \newcommand{\abs}[1]{\left\lvert #1 \right\rvert} \newcommand{\norm}[2]{\left\lVert#2\right\rVert_{#1}} \newcommand{\esqnorm}[1]{\left\lVert#1\right\rVert_2^2} \newcommand{\enorm}[1]{\left\lVert#1\right\rVert_2} \newcommand{\infnorm}[1]{\left\lVert#1\right\rVert_\infty} \newcommand{\opnorm}[1]{\left\lVert#1\right\rVert_\text{op}} \newcommand{\normF}[1]{\left\lVert#1\right\rVert_{\text{F}}} \newcommand{\inner}[1]{\left\langle#1\right\rangle} \newcommand{\ceil}[1]{\left\lceil#1\right\rceil} \newcommand{\floor}[1]{\left\lfloor#1\right\rfloor} \newcommand{\zero}{\mathbf{0}} \newcommand{\one}{\mathbf{1}} \newcommand{\avec}{\mathbf{a}} \newcommand{\bvec}{\mathbf{b}} \newcommand{\cvec}{\mathbf{c}} \newcommand{\dvec}{\mathbf{d}} \newcommand{\e}{\mathbf{e}} \newcommand{\f}{\mathbf{f}} \newcommand{\g}{\mathbf{g}} \newcommand{\h}{\mathbf{h}} \newcommand{\ivec}{\mathbf{i}} \newcommand{\jvec}{\mathbf{j}} \newcommand{\kvec}{\mathbf{k}} \newcommand{\lvec}{\mathbf{l}} \newcommand{\m}{\mathbf{m}} \newcommand{\n}{\mathbf{n}} \newcommand{\ovec}{\mathbf{o}} \newcommand{\p}{\mathbf{p}} \newcommand{\q}{\mathbf{q}} \newcommand{\rvec}{\mathbf{r}} \newcommand{\s}{\mathbf{s}} \newcommand{\tvec}{\mathbf{t}} \newcommand{\uvec}{\mathbf{u}} \newcommand{\vvec}{\mathbf{v}} \newcommand{\w}{\mathbf{w}} \newcommand{\x}{\mathbf{x}} \newcommand{\y}{\mathbf{y}} \newcommand{\z}{\mathbf{z}} \newcommand{\A}{\mathbf{A}} \newcommand{\B}{\mathbf{B}} \newcommand{\C}{\mathbf{C}} \newcommand{\D}{\mathbf{D}} \newcommand{\Emat}{\mathbf{E}} \newcommand{\F}{\mathbf{F}} \newcommand{\G}{\mathbf{G}} \newcommand{\Hmat}{\mathbf{H}} \newcommand{\I}{\mathbf{I}} \newcommand{\J}{\mathbf{J}} \newcommand{\K}{\mathbf{K}} \newcommand{\Lmat}{\mathbf{L}} \newcommand{\M}{\mathbf{M}} \newcommand{\N}{\mathbf{N}} \newcommand{\Omat}{\mathbf{O}} \newcommand{\Pmat}{\mathbf{P}} \newcommand{\Q}{\mathbf{Q}} \newcommand{\Rmat}{\mathbf{R}} \newcommand{\Smat}{\mathbf{S}} \newcommand{\T}{\mathbf{T}} \newcommand{\U}{\mathbf{U}} \newcommand{\V}{\mathbf{V}} \newcommand{\W}{\mathbf{W}} \newcommand{\X}{\mathbf{X}} \newcommand{\Y}{\mathbf{Y}} \newcommand{\Z}{\mathbf{Z}} \newcommand{\SIGMA}{\mathbf{\Sigma}} \newcommand{\LAMBDA}{\mathbf{\Lambda}} \newcommand{\Acal}{\mathcal{A}} \newcommand{\Bcal}{\mathcal{B}} \newcommand{\Ccal}{\mathcal{C}} \newcommand{\Dcal}{\mathcal{D}} \newcommand{\Ecal}{\mathcal{E}} \newcommand{\Fcal}{\mathcal{F}} \newcommand{\Gcal}{\mathcal{G}} \newcommand{\Hcal}{\mathcal{H}} \newcommand{\Ical}{\mathcal{I}} \newcommand{\Jcal}{\mathcal{J}} \newcommand{\Kcal}{\mathcal{K}} \newcommand{\Lcal}{\mathcal{L}} \newcommand{\Mcal}{\mathcal{M}} \newcommand{\Ncal}{\mathcal{N}} \newcommand{\Ocal}{\mathcal{O}} \newcommand{\Pcal}{\mathcal{P}} \newcommand{\Qcal}{\mathcal{Q}} \newcommand{\Rcal}{\mathcal{R}} \newcommand{\Scal}{\mathcal{S}} \newcommand{\Tcal}{\mathcal{T}} \newcommand{\Ucal}{\mathcal{U}} \newcommand{\Vcal}{\mathcal{V}} \newcommand{\Wcal}{\mathcal{W}} \newcommand{\Xcal}{\mathcal{X}} \newcommand{\Ycal}{\mathcal{Y}} \newcommand{\Zcal}{\mathcal{Z}} \newcommand{\alphavec}{\boldsymbol{\alpha}} \newcommand{\betavec}{\boldsymbol{\beta}} \newcommand{\gammavec}{\boldsymbol{\gamma}} \newcommand{\deltavec}{\boldsymbol{\delta}} \newcommand{\epsvec}{\boldsymbol{\epsilon}} \newcommand{\etavec}{\boldsymbol{\eta}} \newcommand{\nuvec}{\boldsymbol{\nu}} \newcommand{\tauvec}{\boldsymbol{\tau}} \newcommand{\rhovec}{\boldsymbol{\rho}} \newcommand{\lmbda}{\boldsymbol{\lambda}} \newcommand{\muvec}{\boldsymbol{\mu}} \newcommand{\thetavec}{\boldsymbol{\theta}} \newcommand{\BigO}[1]{\mathcal{O}\round{#1}} \newcommand{\BigOmega}[1]{\Omega\round{#1}} \newcommand{\R}{\mathbb{R}} \newcommand{\Rd}[1]{\mathbb{R}^{#1}} \newcommand{\Natural}{\mathbb{N}} \newcommand{\Complex}{\mathbb{C}} \newcommand{\Integer}{\mathbb{Z}} \newcommand{\Rational}{\mathbb{Q}} \newcommand{\Field}{\mathbb{F}} \newcommand{\E}[1]{\mathbb{E}\squarebrack{#1}} \newcommand{\Exp}[2]{\mathbb{E}_{#1}\squarebrack{#2}} \newcommand{\Prob}[1]{P\curly{#1}} \newcommand{\Var}[1]{\mathrm{Var}\squarebrack{#1}} \newcommand{\inv}[1]{\frac{1}{#1}} \newcommand{\indicator}[2]{\mathbbm{1}_{#1}\squarebrack{#2}} \newcommand{\Tr}[1]{\text{Tr}\squarebrack{#1}} \newcommand{\BOX}[1]{\fbox{\parbox{\linewidth}{\centering#1}}} \newcommand{\textequal}[1]{\stackrel{#1}{=}} \newcommand{\textleq}[1]{\stackrel{#1}{\leq}} \newcommand{\textgeq}[1]{\stackrel{#1}{\geq}} \newcommand{\dd}[2]{\frac{d #1}{d #2}} \newcommand{\ddn}[3]{\frac{d^{#1} #2}{d #3^{#1}}} \newcommand{\dodo}[2]{\frac{\partial #1}{\partial #2}} \newcommand{\dodon}[3]{\frac{\partial^{#1} #2}{\partial {#3}^{#1}}}\)Dual spaces lie at the core of linear algebra and allows us to formally reason about the concept of duality in mathematics. Duality shows up naturally and elegantly in measure theory, functional analysis, and mathematical optimization. In this post, I have tried to learn and explore the nature of duality via Dual spaces, its interpretation in general linear algebra, all of which was motivated by the so called convex conjugate, or the Fenchel conjugate in mathematical optimization.

For the sake of abstractness and general applicability, we will consider that we are in a general field $\Field$ which for example can be $\R$, $\Complex$ etc. Our space of vectors, is the vector space $\Vcal$, which in general can be subspace of $\Field^d$ for some $d\in\Natural$ for example.

Dual spaces

Dual space - The dual space $\Vcal^{*}$ of the vector space $\Vcal$, is the set of all linear transformations $f$ from $\Vcal$ to $\Field$ denoted as $\Vcal^{*} = \mathscr{L}(\Vcal,\Field)$.

Linear functionals - An element $f:\Vcal\to\Field$ of $\Vcal^{*}$ is called a linear functional, which takes in a vector $x\in\Vcal$, and outputs an element in $\Field$.

Note that if $f,g\in\Vcal^{*}$, then $\alpha f + \beta g$ is also in $\Vcal^{*}$ for $\alpha,\beta\in\Field$ and would be defined as $(\alpha f + \beta g)(x) = \alpha f(x) + \beta g(x)$ for all $x\in\Vcal$. Let us look at a few examples :

  1. If $\Vcal = \Pcal_n$ is the space of all univariate polynomial over the field $\R$ with degree $n\in\Natural$, then $f$ defined as $f(p) = p(1)+2p’(0)$, is a linear functional contained in $\Vcal^{*} = \mathscr{L}(\Pcal_n,\R)$.
  2. If $\Vcal = \Rd{m\times n}$ be the space of all real matrices of size $m\times n$, then $f$ defined as $f(\M) = \Tr{\M}$, is a linear functional contained in $\Vcal^{*} = \mathscr{L}(\Rd{m\times n},\R)$.
  3. If $\Vcal = \Ccal\round{\squarebrack{-\pi,\pi}}$ is the space of all real valued continuous functions on $\squarebrack{-\pi,\pi}$, then $f$ defined as $f(g) = \int_{-\pi}^{\pi} g(x)\cos(2x)dx$, is a linear functional contained in $\Vcal^{*} = \mathscr{L}(\squarebrack{-\pi,\pi},\R)$. This function essentially outputs the 2nd Fourier coefficient of its input function.

Dual basis

Let $\Vcal$ be a finite dimensional vector space, and $\beta = \curly{v_i}_{i=1}^n$ be a basis of $\Vcal$. Therefore, to determine the action of any linear functional $f\in\Vcal^{*}$, it is enough to know its action on the basis vectors $\curly{v_i}_{i=1}^n$, i.e., it is enough to know what $\curly{f(v_i)\in\Field}_{i=1}^n$ are.

Dual basis - If $\Vcal$ is finite dimensional and $\beta = \curly{v_i}_{i=1}^n$ is a basis of $\Vcal$, then $\beta^{*} = \curly{f_i}_{i=1}^n$, the set of functionals defined by $f_i(v_j) = \delta_{ij}\ \forall\ i,j\in\squarebrack{n}$, is the corresponding basis of $\Vcal^{*}$ also known as a dual basis of $\Vcal$.

To verify that $\beta^{*}$ is indeed a basis of $\Vcal^{*}$, we just need to check the following

  1. $\beta^{*}$ is a linearly independent set of linear functionals, i.e., if $\sumi{1}{n}a_if_i = f_0$ for some set $\curly{a_j\in\Field}_{j=1}^n$, then $a_j=0\ \forall\ j\in\squarebrack{n}$, where $f_0$ is the $0$ functional. To verify this, we just need to check this condition for all the basis elements of $\Vcal$, $v \in\beta$, which gives us $\sumi{1}{n}a_if_i(v_j) = 0\ \forall\ j \in\squarebrack{n}$, and implies that $a_j = 0\ \forall\ j \in\squarebrack{n}$ establishing the linear independence of the set $\beta^{*}$.
  2. $\beta^{*}$ spans $\Vcal^{*}$, i.e., for any $f\in\Vcal^{*}$, we have some set of elements $\curly{a_j\in\Field}_{j=1}^n$ such that $f(v) = \sumi{1}{n}a_if_i(v)\ \forall\ v\in\Vcal$. To check this, we evaluate $f$ on basis vectors $v_j\in\beta$ for $j\in\squarebrack{n}$, which gives us $f(v_j) = \sumi{1}{n}a_if_i(v_j) = \sumi{1}{n}a_i\delta_{ij} = a_j\ \forall\ j\in\squarebrack{n}$, or, $a_j = f(v_j)\ \forall\ j \in\squarebrack{n}$. Therefore, if $f$ is in the span of $\beta^{*}$, then it must be the case that the coefficients satisfy $a_j=f(v_j)\ \forall\ j \in \squarebrack{n}$. Now, if $g = \sumi{1}{n}f(v_i)f_i$, then to show that $g=f$, it is enough to show that they agree on the basis vectors $\curly{v_j}$ of $\Vcal$, that is $f(v_i)=g(v_i)$. This is simply true by the construction of $\curly{f_i}_{i=1}^n$, therefore $\beta^{*}$ spans $\Vcal^{*}$.

This establishes that $\beta^{*} = \curly{f_i}_{i=1}^n$ is indeed a basis of $\Vcal^{*}$, and any $f\in\Vcal^{*}$ can be decomposed into the basis of $\Vcal^{*}$ as

\[\begin{equation} f = \sumi{1}{n}f(v_i)f_i \end{equation}\]

where $\beta = \curly{v_i}$ is the basis of $\Vcal$. The existence of dual basis $\beta^{*}$ and the relation with $\beta$ also shows that $\Vcal$ and $\Vcal^{*}$ are isomorphic.

Transpose

The existence of the dual basis $\beta^{*}$ helps us define the concept of transpose in linear algebra which is extensively used in application.

Transpose - Let $T:\Vcal\to\Wcal$ be a linear transformation from the vector space $\Vcal$ to the vector space $\Wcal$, then $T^\top:\Wcal^{*}\to\Vcal^{*}$ is a linear transformation from $\Wcal^{*}$ to $\Vcal^{*}$ that takes in a linear functional $g\in\Wcal^{*}$, and outputs another linear functional $T^\top(g)\in\Vcal^{*}$ defined as $T^\top(g)(x) = g\round{T(x)}\ \forall\ x\in\Vcal$.

$$ \begin{align} x &\xrightarrow{T} T(x) &\xrightarrow{g\in\Wcal^{*}}&\ g(T(x)) &\qquad\equiv\qquad& x &\xrightarrow{T^\top(g)\in\Vcal^*}& g(T(x))\\ \Vcal &\xrightarrow{T}\Wcal &\xrightarrow{g\in\Wcal^*}&\ \Field &\qquad\qquad& \Vcal &\xrightarrow{T^\top(g)\in\Vcal^*}& \Field \end{align} $$

Theorem - Let $T:\Vcal\to\Wcal$ be a linear transformation from vector space $\Vcal$ to vector space $\Wcal$. Let $\beta = \curly{v_i}_{i=1}^n$ and $\beta^{*} = \curly{f_i}_{i=1}^n$ be the basis of $\Vcal$ and $\Vcal^{*}$ respectively, and $\gamma = \curly{w_j}_{j=1}^m$ and $\gamma^{*} = \curly{g_j}_{j=1}^m$ be the basis of $\Wcal$ and $\Wcal^{*}$ respectively. Let $A = \squarebrack{T}_{\beta}^\gamma$ be the matrix which transforms a vector in $\Vcal$ to a vector in $\Wcal$. Then $\squarebrack{T^\top}_{\gamma^{*}}^{\beta^{*}}=A^\top$.

The matrix of $\squarebrack{T^\top}_{\gamma^{*}}^{\beta^{*}}$ takes in an element of $\Wcal^{*}$ represented in its basis $\gamma^{*}$, and outputs an element in $\Vcal^{*}$ represented in its basis $\beta^{*}$. Therefore, it is sufficient to represent every element in $\curly{T^\top(g_j)}_{j=1}^m$ in the basis $\beta^{*}$ of $\Vcal^{*}$. Since any vector $f$ in $\Vcal^{*}$ can be expressed as a linear combination of the basis vectors $\beta^{*}$ as we proved earlier, we have

$$ \begin{align} T^\top(g_j) &= \sumi{1}{n}T^\top(g_j)(v_i)f_i\\ &= \sumi{1}{n}g_j(T(v_i))f_i \end{align} $$

Therefore, the $(i,j)$-th element of the matrix $\squarebrack{T^\top}_{\gamma^{*}}^{\beta^{*}}$ is $g_j(T(v_i))$. Now, the elements of the matrix $\squarebrack{T}_{\beta}^{\gamma}$ are the representations of the basis vectors $\beta$ of $\Vcal$, in the basis vectors $\gamma$ of $\Wcal$. Therefore we have,

$$ \begin{align} T(v_i) &= \sumk{1}{m}A_{ki}w_k\\ \implies g_j(T(v_i)) &= g_j\round{\sumk{1}{m}A_{ki}w_k}\\ &= \sumk{1}{m}A_{ki}g_j(w_k)\qquad(\because g \text{ is a linear functional})\\ &= A_{ji}\qquad(\because g_j(w_k)=\delta_{jk}) \end{align} $$

Which finally gives us

$$ \begin{align} T^\top(g_j) &= \sumi{1}{n}A_{ji}f_i \end{align} $$

and shows us that the $(i,j)$-th element of the matrix $\squarebrack{T^\top}_{\gamma^{*}}^{\beta^{*}}$ is nothing but the $(j,i)$-th element of $A$, which implies that the matrix $\squarebrack{T^\top}_{\gamma^{*}}^{\beta^{*}} = A^\top$.

Double Dual

If $\Vcal$ is a vector space, then its dual space $\Vcal^{*}$ is also a vector space. We can again define the set of all linear transformations from $\Vcal^{*}$ to $\Field$ as its dual space $\Vcal^{**} = \mathscr{L}(\Vcal^{*},\Field)$, which makes it the double dual space of $\Vcal$. Unlike $\Vcal^{*}$ where we needed to define basis, there is an elegant way to go from $\Vcal$ to $\Vcal^{**}$ via an injective map $\Psi$. If $x\in\Vcal$, and $\hat{x} = \Psi(x)\in\Vcal^{**}$ acts on a linear functional $f\in\Vcal^{*}$ and evaluates $f$ at $x$ returning an element in the field $\Field$, i.e.,

$$ \begin{align} f & \xrightarrow{\hat{x}\in\Vcal^{**}} & f(x)\\ \Vcal^* & \xrightarrow{\hat{x}\in\Vcal^{**}} & \Field \end{align} $$

Theorem - If $\Vcal$ is a vector space of finite dimension, then $\Psi : \Vcal\to\Vcal^{**}$ is an isomorphism. First, we show that $\Psi$ is linear. If $x,y\in\Vcal$, $f\in\Vcal^{*}$, and $c\in\Field$, then

$$ \begin{align} \Psi(x+cy)(f) &= f(x+cy)\\ &= f(x)+cf(y)\\ &= (\Psi(x) + c\Psi(y))(f) \end{align} $$

Next, we show that $\Psi$ is one to one. Suppose $\hat{x} = 0$, the zero functional, then $x = 0$. Let $\beta = \curly{v_i}_{i=1}^n$, then $x = \sumi{1}{n}a_iv_i$ for some $\curly{a_i\in\Field}$. Therefore, if $\Psi(x) = \hat{x}=0$, then

$$ \begin{align} \Psi(x) &= \Psi\round{\sumi{1}{n}a_iv_i}\\ &= \sumi{1}{n}a_i\Psi(v_i)\\ \implies \Psi(x)(f) &= \sumi{1}{n}a_i\Psi(v_i)(f)\qquad \forall\ f \in\Vcal^*\\ &= \sumi{1}{n}a_if(v_i) \end{align} $$

For $f\in\curly{f_i}_{i=1}^n$, we get $a_i = 0\ \forall\ i \in\squarebrack{n}$. This implies that $x = \sumi{1}{n}a_iv_i = 0$. Since $\mathrm{dim}(\Vcal) = \mathrm{dim}(\Vcal^{**})$, we have that $\Psi$ is one to one and therefore an isomorphism. The isomorphism between $\Vcal$ and $\Vcal^{**}$, $\Psi$, allows us to map each element in $\Vcal$ to a unique element in $\Vcal^{**}$.

The infinite sequence of dual spaces

We can continue thinking about the dual of the parent space and there always will exist an isomorphism between alternate dual spaces. Formally, Let us denote $\Vcal^{(i+1)*} = \round{\Vcal^{(i)*}}^{*}$ for all $i\in\Natural$ where $\Vcal^{(0)*} = \Vcal$ is a finite dimensional vector space. Let $\Psi_i$ be the isomorphism between $\Vcal^{(i-1)*}$ and $\Vcal^{(i+1)*}$. Then, in general define $\hat{x}^{(i+1)} = \Psi_{i}\round{\hat{x}^{(i-1)}}$ for $\hat{x}^{(i-1)}\in\Vcal^{(i-1)*}$ when $i$ is odd, and $\hat{f}^{(i+1)} = \Psi_{i}\round{\hat{f}^{(i-1)}}$ for $\hat{f}^{(i-1)}\in\Vcal^{(i-1)*}$ when $i$ is an even natural number, with $\hat{x}^{(0)} = x\in\Vcal$, and $\hat{f}^{(1)} = f\in\Vcal^{*}$. Then by the property of these isomorphisms we have

$$ \hat{x}^{(i+1)}\round{\hat{f}^{(i)}} = \Psi_i\round{\hat{x}^{(i-1)}}\round{\hat{f}^{(i)}} = \hat{f}^{(i)}\round{\hat{x}^{(i-1)}} = \ldots = \hat{x}^{(2)}\round{\hat{f}^{(1)}} = \hat{x}^{(2)}(f) $$

when $i$ is odd, and

$$ \hat{f}^{(i+1)}\round{\hat{x}^{(i)}} = \Psi_{i}\round{\hat{f}^{(i-1)}}\round{\hat{x}^{(i)}} = \hat{x}^{(i)}\round{\hat{f}^{(i-1)}} = \ldots = f^{(1)}(x) = f(x) $$

when $i$ is an even natural number. However, note that $\hat{x}^{(2)}(f)$ and $f(x)$ are the same quantities in $\Field$, the difference lies in the order of application of these linear functionals, which makes them the dual representation of each other.

Application to Fenchel conjugate

With this formalization, one can reason about the mathematical object, Fenchel conjugate in an elegant way. Let us first define what a Fenchel conjugate of a function is.

Fenchel conjugate - Let $\Vcal^{*} = \mathscr{L}(\Vcal,\Field)$ be the dual space of $\Vcal$, and $\inner{\ \cdot\ ,\ \cdot\ } : \Vcal^{*} \times \Vcal \to \Field$ denote the dual pairing between the two vector spaces. For a function $f : \Vcal\to \Field\cup\curly{\infty_{\Field}}$, its convex conjugate $f^{*} : \Vcal^{*}\to \Field\cup\curly{\infty_{\Field}}$ is defined as

\[\begin{equation} f^{*}(g) = \sup\limits_{\x\in\Vcal}\curly{\inner{g,\x} - f(\x)} \qquad \forall\ g\in\Vcal^{*} \end{equation}\]

For a differentiable function $f:\Vcal\to\Field$, we have from the Taylor’s expansion that $\nabla f(\x)$ is a linear functional for all $\x\in\Vcal$, and therefore lies in the dual space $\Vcal^{*}$. If $f^{*}:\Vcal^{*}\to\Field$ is the convex conjugate of $f$, we have that $\nabla f^{*}(g)$ is an element in $\Vcal^{**}$ for all $g\in\Vcal^{*}$. Since there exists a natural isomorphism $\Psi : \Vcal\to\Vcal^{**}$ between $\Vcal$ and $\Vcal^{**}$ which is the evaluation map, this in particular gives us $\Psi(\x) = \hat{\x} = \nabla f^{*}\round{\nabla f(\x)}\in\Vcal^{**}$.

Leave a Comment