Convex functions¶
Definition and examples¶
Definition (Convex function) A function \(f : \mathbb{R}^n \to \mathbb{R}\) is convex if its domain is a convex set and it satisfies
\[f(\theta x + (1 - \theta) y) \leq \theta f(x) + (1 - \theta) f(y)\]
for every \(x, y \in \textbf{dom}(f)\) and \(\theta \in [0, 1]\). If the inequality is strict whenever \(x \neq y\) and \(\theta \in (0, 1)\), we call the function strictly convex.
Concave and strictly concave functions: If a function \(f\) satisfies the above definition but with the inequality sign reversed, we call it a (strictly) concave function - the domain still needs to be a convex set. The negative of a concave function is convex and vice versa.
Examples of convex functions in \(\mathbb{R}\):
affine: \(ax + b\), for any \(a, b \in \mathbb{R}\).
exponential: \(e^{\alpha x}\) for any \(\alpha \in \mathbb{R}\).
powers: \(x^\alpha\) on \(\mathbb{R}_{++}\) for \(\alpha \geq 1\) or \(\alpha \leq 0\).
powers of absolute value: \(|x|^p\) for \(p \geq 1\).
Examples of concave functions in \(\mathbb{R}\):
affine: \(ax + b\), for any \(a, b \in \mathbb{R}\).
logarithm: \(\log x\) for \(x \in \mathbb{R}_{++}\).
powers: \(x^\alpha\) on \(\mathbb{R}_{++}\) for \(0 \leq \alpha \leq 1\).
Examples of convex functions in \(\mathbb{R}^n\):
affine: \(a^\top x + b\), for any \(a \in \mathbb{R}^n, b \in \mathbb{R}\).
\(p\)-norm: \(||x||_p = \left[ \sum_{i = 1}^n |x_i|^p\right]^{1/p}\) for \(p \geq 1\).
Examples of convex functions in \(\mathbb{R}^{n \times m}\):
affine: \(f(X) = \text{Tr}(A^\top X) + b = \left[\sum_{i = 1}^n \sum_{j = 1}^m A_{ij} X_{ij}\right] + b\), for any \(A \in \mathbb{R}^{n \times m}, b \in \mathbb{R}\).
spectral norm: \(f(X) = ||X||_2 = \sigma_{\text{max}}(X) = \lambda^{1/2}_{\text{max}}(X^\top X)\), where \(\lambda_{\text{max}}(\cdot)\) is the maximum eigenvalue of a matrix.
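The defining inequality can be spot-checked numerically on random pairs of points. The sketch below, in plain Python, probes a few of the scalar examples above; `is_convex_on_samples` is an illustrative helper of our own, and such sampling can only refute convexity, never certify it.

```python
import math
import random

def is_convex_on_samples(f, xs, trials=1000, tol=1e-9):
    """Check f(th*x + (1-th)*y) <= th*f(x) + (1-th)*f(y) on random pairs."""
    for _ in range(trials):
        x, y = random.choice(xs), random.choice(xs)
        th = random.random()
        lhs = f(th * x + (1 - th) * y)
        rhs = th * f(x) + (1 - th) * f(y)
        if lhs > rhs + tol:
            return False
    return True

random.seed(0)
pts = [0.1 + 0.05 * k for k in range(60)]                       # points in R_{++}
print(is_convex_on_samples(lambda x: math.exp(0.5 * x), pts))   # exponential: convex
print(is_convex_on_samples(lambda x: abs(x) ** 3, [p - 1.5 for p in pts]))  # |x|^3: convex
print(is_convex_on_samples(math.log, pts))                      # log is concave, not convex
```

With 1000 random pairs, the concave logarithm violates the inequality almost immediately, while the convex examples never do.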
Restriction to a line¶
Lemma (Restriction to a line) A function \(f\) from \(\mathbb{R}^n\) to \(\mathbb{R}\) is convex if and only if its restriction \(g\) to any line,
\[g(t) = f(x + tv), \quad \textbf{dom}~g = \{t~|~x + tv \in \textbf{dom}~f\},\]
is also convex, for any \(x \in \textbf{dom}~f, v \in \mathbb{R}^n\).
Example use of restriction: Using the above result we can prove the convexity of quite complicated functions. Take for example the function \(\log \det : \mathbf{S}_{++}^n \to \mathbb{R}\) and restrict it to a line \(X + tV\) along matrix \(V\). For \(\textbf{dom}~g = \{t~|~X + tV \in \textbf{dom}~f\}\) to be non-empty, \(V\) will have to be symmetric but does not have to be positive-definite. The restricted function \(g\) is
\[g(t) = \log \det (X + tV) = \log \det X + \sum_{i = 1}^n \log(1 + t \lambda_i),\]
where \(\lambda_i\) is the \(i^\text{th}\) eigenvalue of \(X^{-1/2}VX^{-1/2}\) and \(X^{-1/2}\) exists because \(X\) is positive-definite. The components \(\log (1 + t \lambda_i)\) are all concave in \(t\) regardless of the value of \(\lambda_i\), so \(g\) is concave along any line in \(\mathbf{S}_{++}^n\) and thus \(\log \det\) is also concave.
Extended value extension¶
Definition (Extended-value extension) The extended-value extension \(\tilde{f}\) of \(f\) is
\[\tilde{f}(x) = \begin{cases} f(x) & x \in \textbf{dom}~f \\ \infty & x \notin \textbf{dom}~f. \end{cases}\]
This definition helps to extend the domain of convex functions to a larger set, for example extending \(f(x) = x^2, \textbf{dom}~f = \mathbb{R}_{+}\) from the positive reals \(\mathbb{R}_{+}\) to the whole real line \(\mathbb{R}\) and still have \(\tilde{f}\) be convex.
Two conditions for convexity¶
Definition (First-order differentiability) A function \(f\) is differentiable if its domain is open and the gradient
\[\nabla f(x) = \left(\frac{\partial f(x)}{\partial x_1}, \frac{\partial f(x)}{\partial x_2}, \dots, \frac{\partial f(x)}{\partial x_n}\right)\]
exists at each \(x \in \textbf{dom}~f\).
The following first-order condition for convexity follows from this definition.
Lemma (First-order condition) A differentiable function \(f\) with a convex domain is convex if and only if
\[f(y) \geq f(x) + \nabla f(x)^\top (y - x) \quad \text{for all}~x, y \in \textbf{dom}~f.\]
Definition (Second-order differentiability) A function \(f\) is twice differentiable if its domain is open and the Hessian
\[\nabla^2 f(x)_{ij} = \frac{\partial^2 f(x)}{\partial x_i \partial x_j}, \quad i, j = 1, \dots, n,\]
exists at each \(x \in \textbf{dom}~f\).
From this definition, the following second-order condition for convexity follows.
Lemma (Second-order condition) A twice differentiable function \(f\) with convex domain is convex if and only if
\[\nabla^2 f(x) \succeq 0 \quad \text{for all}~x \in \textbf{dom}~f.\]
If \(\nabla^2 f(x) \succ 0\) for all \(x \in \textbf{dom}~f\), then \(f\) is strictly convex.
Note \(f\) can be strictly convex and still have \(\nabla^2 f(x) = 0\) for some \(x\) - take for example \(f(x) = x^4\) at \(x = 0\).
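The \(x^4\) caveat is easy to see with a central finite-difference estimate of the second derivative: the estimate is nonnegative everywhere, yet essentially zero at the origin. This is a numerical sketch, not a proof, and `second_derivative` is a hypothetical helper of our own.

```python
def second_derivative(f, x, h=1e-4):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)

f = lambda x: x ** 4   # strictly convex, but f''(0) = 0

# Estimates of f'' at several points: nonnegative everywhere,
# and vanishing (up to discretisation error) at the origin.
vals = [second_derivative(f, x) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
print(vals)
```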
Examples of convex functions¶
Here are some examples of convex functions. For some, it is easy to see they are convex, while for others it is more challenging.
Quadratic form: \(f(x) = \frac{1}{2} x^\top P x + q^\top x + r\), convex if and only if \(P \succeq 0\).
Least squares: \(f(x) = ||Ax - b||_2^2\), convex for any \(A\).
Quadratic-over-linear: \(f(x, y) = x^2 / y\), convex for \(y > 0\).
Log-sum-exp or soft-max: \(f(x) = \log \sum_{i = 1}^n e^{x_i}\), convex on \(\mathbb{R}^n\).
Geometric mean: \(f(x) = \left(\prod_{i = 1}^n x_i\right)^{1/n}\), concave on \(\mathbb{R}_{++}^n\).
When in doubt, one can plot the values of the function being tested, along randomly chosen lines in the space. This is a useful heuristic which can help eliminate non-convex functions with minimal effort. Since a function is convex if and only if its restriction along any line is also convex, we can be certain the function is not convex if any of its restrictions are not convex. If all the restrictions appear convex, we can then move in and try to prove that the function itself is.
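The random-line heuristic above can be sketched as follows; `restriction_violates_convexity` is our own illustrative name, and as noted, a sampling search like this can only ever refute convexity, never establish it.

```python
import random

def restriction_violates_convexity(f, n, trials=2000, tol=1e-9):
    """Sample random lines x + t*v in R^n and test the midpoint inequality
    g((t + s)/2) <= (g(t) + g(s))/2 on the restriction g(t) = f(x + t*v)."""
    for _ in range(trials):
        x = [random.uniform(-2, 2) for _ in range(n)]
        v = [random.uniform(-1, 1) for _ in range(n)]
        t, s = random.uniform(-1, 1), random.uniform(-1, 1)
        g = lambda u: f([xi + u * vi for xi, vi in zip(x, v)])
        if g((t + s) / 2) > (g(t) + g(s)) / 2 + tol:
            return True    # found a non-convex restriction
    return False

random.seed(1)
sum_sq = lambda z: sum(zi * zi for zi in z)        # convex: no violation expected
print(restriction_violates_convexity(sum_sq, 3))
prod = lambda z: z[0] * z[1]                       # indefinite quadratic: not convex
print(restriction_violates_convexity(prod, 2))
```

For \(f(x, y) = xy\), roughly half of all random directions give a concave restriction, so a violation is found almost immediately.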
Epigraphs and sublevel sets¶
Definition (Sublevel set) The \(\boldsymbol{\alpha}\)-sublevel set of \(f\) is
\[C_\alpha = \{x \in \textbf{dom}~f~|~f(x) \leq \alpha\},\]
the set of \(x\) for which \(f(x)\) is no larger than \(\alpha\). Sublevel sets of convex functions are convex, but the converse does not hold: a non-convex function can have all of its sublevel sets convex.
Definition (Epigraph) The epigraph of \(f : \mathbb{R}^n \to \mathbb{R}\) is
\[\textbf{epi}~f = \{(x, t)~|~x \in \textbf{dom}~f,~f(x) \leq t\},\]
the set of points which are on or above \(f\) in \(\mathbb{R}^{n + 1}\). A function is convex if and only if its epigraph is a convex set.
Jensen’s inequality¶
The definition of a convex function
\[f(\theta x + (1 - \theta) y) \leq \theta f(x) + (1 - \theta) f(y)\]
is a special case of Jensen’s inequality.
Theorem (Jensen’s inequality) If \(f\) is convex and \(\mathbb{E}\) is the expectation over a random variable \(x\), then
\[f(\mathbb{E}[x]) \leq \mathbb{E}[f(x)].\]
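Jensen’s inequality is easy to observe with a Monte Carlo estimate. For \(f(x) = x^2\) the gap \(\mathbb{E}[f(x)] - f(\mathbb{E}[x])\) equals the variance of \(x\), which gives a concrete check; the sketch below assumes Gaussian samples with mean 1 and standard deviation 2.

```python
import random

random.seed(2)
samples = [random.gauss(1.0, 2.0) for _ in range(100_000)]

f = lambda x: x * x                                  # convex
mean = sum(samples) / len(samples)
e_f = sum(f(s) for s in samples) / len(samples)

# Jensen: f(E[x]) <= E[f(x)]; for f(x) = x^2 the gap is Var(x) ~ 4.
print(f(mean), e_f, e_f - f(mean))
```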
Establishing convexity¶
We have seen two main ways through which we can establish the convexity of a function. The first is to directly verify the definition, for example by restricting to an arbitrary line and showing the restricted function is convex. The second is to show the Hessian is positive semi-definite – provided the function is twice differentiable.
A third way is to show that a convex function can be expressed in terms of operations that preserve convexity. The following is a list of operations which preserve convexity.
Lemma (Nonnegative multiple) \(\alpha f\) is convex if \(f\) is convex and \(\alpha \geq 0\).
Lemma (Sum of convex functions) If \(f_1\) and \(f_2\) are convex, \(f_1 + f_2\) is convex. This extends to infinite sums and integrals.
Lemma (Composition with affine function) \(f(Ax + b)\) is convex if \(f\) is convex.
Lemma (Pointwise maximum) If \(f_1, ..., f_n\) are convex, then the following is convex
\[f(x) = \max\{f_1(x), \dots, f_n(x)\}.\]
Lemma (Pointwise supremum) If \(f(x, y)\) is convex in \(x\) for each \(y \in \mathcal{Y}\), then the following is convex
\[g(x) = \sup_{y \in \mathcal{Y}} f(x, y).\]
Lemma (Composition with scalar function) Given functions \(g : \mathbb{R}^n \to \mathbb{R}\) and \(h : \mathbb{R} \to \mathbb{R}\), their composition
\[f(x) = h(g(x))\]
is convex if (1) \(g\) is convex, \(h\) is convex and \(\tilde{h}\) is nondecreasing; or (2) \(g\) is concave, \(h\) is convex and \(\tilde{h}\) is nonincreasing. Here \(\tilde{h}\) is the extended-value extension of \(h\).
Lemma (Vector composition) Composition with scalar functions can be extended to composition with vector functions. Given functions \(g : \mathbb{R}^n \to \mathbb{R}^k\) and \(h : \mathbb{R}^k \to \mathbb{R}\), the composition
\[f(x) = h(g(x)) = h(g_1(x), \dots, g_k(x))\]
is convex if (1) \(g_i\) are all convex, \(h\) is convex and \(\tilde{h}\) is nondecreasing in each argument; or (2) \(g_i\) are all concave, \(h\) is convex and \(\tilde{h}\) is nonincreasing in each argument.
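As an instance of rule (1): with \(g(x) = x^2\) convex and \(h(z) = e^z\) convex and nondecreasing, the composition \(e^{x^2}\) should be convex, which the random check below (a sanity test, not a proof) is consistent with.

```python
import math
import random

# h(z) = e^z is convex and nondecreasing; g(x) = x^2 is convex,
# so by the composition rule f(x) = e^{x^2} is convex.
f = lambda x: math.exp(x * x)

random.seed(3)
ok = True
for _ in range(5000):
    x, y = random.uniform(-2, 2), random.uniform(-2, 2)
    th = random.random()
    if f(th * x + (1 - th) * y) > th * f(x) + (1 - th) * f(y) + 1e-9:
        ok = False
print(ok)
```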
Lemma (Minimisation) If \(f\) is convex in \((x, y)\) and \(C\) is a convex nonempty set, then
\[g(x) = \inf_{y \in C} f(x, y)\]
is convex in \(x\), provided \(g(x) > - \infty\) for all \(x\).
We provide two alternative proofs from Boyd and Vandenberghe below. The first shows the result, while the second gives additional intuition for a special case involving additional assumptions.
Proof: Minimisation
Proof (a) The first proof verifies the definition of convexity directly. Since \(C\) may be open and the definition involves an infimum, this proof involves an analysis argument. For any \(y_1, y_2 \in C\) and \(\theta \in [0, 1]\) we have
\[\begin{aligned} g(\theta x_1 + (1 - \theta) x_2) &= \inf_{y \in C} f(\theta x_1 + (1 - \theta) x_2, y) \\ &\leq f(\theta x_1 + (1 - \theta) x_2, \theta y_1 + (1 - \theta) y_2) \\ &\leq \theta f(x_1, y_1) + (1 - \theta) f(x_2, y_2), \end{aligned}\]
where from the first to the second line we have upper-bounded the infimum over \(y\) by inserting \(y = \theta y_1 + (1 - \theta) y_2\) and using the fact that \(C\) is convex - so since \(y_1, y_2 \in C\), we also have \(\theta y_1 + (1 - \theta) y_2 \in C\). From the second to the third we have used the convexity of \(f\) in \((x, y)\).
Now, since we have assumed that \(g(x) > - \infty\) for all \(x\), for any \(\epsilon > 0\) there exist \(y_1, y_2 \in C\) such that \(f(x_i, y_i) \leq g(x_i) + \epsilon\). So for any \(\epsilon > 0\) there exist \(y_1, y_2 \in C\) such that
\[g(\theta x_1 + (1 - \theta) x_2) \leq \theta g(x_1) + (1 - \theta) g(x_2) + \epsilon\]
holds, proving the result. We see that the \(\epsilon\) argument is necessary to handle the possibility that the infimum over \(y\) may not be attained in \(C\), either because \(C\) is open or because \(f\) is monotonically decreasing along some direction.
In the figure above, \(C\) is the set of points above the curve and \(y^*\) minimises \(f(\theta x_1 + (1 - \theta) x_2, y)\). At the convex combination of \((x_1, y_1)\) and \((x_2, y_2)\), the function \(f\) attains a value larger than the one it attains at \((\theta x_1 + (1 - \theta) x_2, y^*)\). Since this holds for any \(y_1, y_2\), we see that by choosing \(y_1, y_2\) such that \(f(x_i, y_i) \leq g(x_i) + \epsilon\) for a sequence of vanishing \(\epsilon\), the result must also follow.
Proof (b) We can get some additional intuition by considering the special case where the infimum over \(y\) is attained in \(C\) for every \(x\). In this case we have
\[\textbf{epi}~g = \{(x, t)~|~(x, y, t) \in \textbf{epi}~f~\text{for some}~y \in C\},\]
which is a convex set because it is the projection of a convex set onto a subset of its coordinates.
Lemma (Perspective preserves convexity) If \(f\) is a convex function, then its perspective function \(g\), given by
\[g(x, t) = t f(x / t)\]
with domain \(\text{dom}~g = \{(x, t) | x/t \in \text{dom}~f, t > 0\}\), is also convex.
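For \(f(x) = x^2\) the perspective is \(g(x, t) = t (x/t)^2 = x^2/t\), the quadratic-over-linear function, so joint convexity in \((x, t)\) can be sanity-checked numerically (a sampling sketch, not a proof):

```python
import random

# Perspective of f(x) = x^2 is g(x, t) = t * f(x/t) = x^2 / t for t > 0:
# the quadratic-over-linear function, which is jointly convex in (x, t).
g = lambda x, t: x * x / t

random.seed(4)
violated = False
for _ in range(5000):
    x1, t1 = random.uniform(-3, 3), random.uniform(0.1, 3)
    x2, t2 = random.uniform(-3, 3), random.uniform(0.1, 3)
    th = random.random()
    lhs = g(th * x1 + (1 - th) * x2, th * t1 + (1 - th) * t2)
    rhs = th * g(x1, t1) + (1 - th) * g(x2, t2)
    if lhs > rhs + 1e-9:
        violated = True
print(violated)
```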
Proof: Perspective preserves convexity
The epigraphs of \(f\) and \(g\) satisfy
\[(x, t, s) \in \textbf{epi}~g \iff t f(x/t) \leq s \iff f(x/t) \leq s/t \iff (x/t, s/t) \in \textbf{epi}~f.\]
Therefore, the epigraph of \(g\) is the inverse image of the epigraph of \(f\) under the perspective mapping \((x, t, s) \mapsto (x/t, s/t)\). Since \(\text{epi}~f\) is a convex set, so is its inverse image under the perspective mapping, so \(\text{epi}~g\) is convex.
Conjugate function¶
Definition (Conjugate function) Given a function \(f : \mathbb{R}^n \to \mathbb{R}\), its conjugate \(f^* : \mathbb{R}^n \to \mathbb{R}\) is defined as
\[f^*(y) = \sup_{x \in \textbf{dom}~f} \left(y^\top x - f(x)\right).\]
The conjugate \(f^*\) of a function is always convex, even if \(f\) is not convex.
Lemma (Conjugate is convex) The conjugate \(f^*\) of any function \(f\), convex or not, is convex.
Proof: Conjugate is convex
The conjugate is the pointwise supremum, over \(x\), of the functions \(y \mapsto y^\top x - f(x)\), each of which is affine (and thus convex) in \(y\); a pointwise supremum of convex functions is convex, so \(f^*\) is convex.
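The supremum in the definition can be approximated by a grid search, which lets us compare against a closed form: for \(f(x) = x^2\) the conjugate is \(f^*(y) = y^2/4\). A minimal sketch, assuming the maximiser lies inside the chosen grid:

```python
def conjugate(f, y, grid):
    """Grid-search approximation of f*(y) = sup_x (y*x - f(x))."""
    return max(y * x - f(x) for x in grid)

grid = [-5 + 0.001 * k for k in range(10001)]   # x values in [-5, 5]
f = lambda x: x * x

# For f(x) = x^2 the conjugate is known in closed form: f*(y) = y^2 / 4.
for y in (-2.0, 0.0, 1.0, 3.0):
    print(y, conjugate(f, y, grid), y * y / 4)
```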
Lemma (Convex \(f \implies f^{**} = f\)) Suppose \(f\) is convex and closed (its epigraph is a closed set). Then the conjugate of its conjugate is the original function,
\[f^{**} = f.\]
Proof: Convex \(f \implies f^{**} = f\)
Suppose \(f\) is a convex function and define its conjugate
\[f^*(y) = \sup_{x \in \textbf{dom}~f} \left(x^\top y - f(x)\right)\]
and the conjugate of the conjugate
\[f^{**}(x) = \sup_{y \in \textbf{dom}~f^*} \left(x^\top y - f^*(y)\right).\]
Note that \(-f^*(y)\) is the largest \(y\)-intercept that a plane with slope \(y\) which underestimates \(f\) can have. Given an input \(x\), \(f^{**}(x)\) is equal to the supremum of \(x^\top y - f^*(y)\) over \(y\). This is the pointwise supremum of all planes with slope \(y\) and intercept \(-f^*(y)\). Consider a point \((x, t)\) in the epigraph of \(f\). This point is contained in the epigraphs of all affine functions with slope \(y\) and intercept \(-f^*(y)\). Therefore, it is also contained in the epigraph of the pointwise supremum of these affine functions, which is equal to \(f^{**}\), and \(\text{epi}~f \subseteq \text{epi}~f^{**}\). Conversely, if \((x, t)\) is contained in the epigraph of \(f^{**}\), then it is contained in the epigraph of the pointwise supremum of all affine underestimators of \(f\) and is therefore also contained in the epigraph of \(f\), so \(\text{epi}~f^{**} \subseteq \text{epi}~f\). Thus \(\text{epi}~f = \text{epi}~f^{**}\) and \(f = f^{**}\).
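The biconjugation identity can also be checked numerically: approximating both suprema by grid search should recover \(f(x) = x^2\) from its double conjugate. This sketch assumes the relevant maximisers fall inside the grid; `biconjugate` is an illustrative name of our own.

```python
def conjugate(f, y, grid):
    """Grid-search approximation of f*(y) = sup_x (y*x - f(x))."""
    return max(y * x - f(x) for x in grid)

grid = [-6 + 0.01 * k for k in range(1201)]       # values in [-6, 6]
f = lambda x: x * x

# Precompute f*(y) on the grid (for f(x) = x^2, f*(y) = y^2 / 4).
f_star = {y: conjugate(f, y, grid) for y in grid}

def biconjugate(x):
    """f**(x) = sup_y (x*y - f*(y)), approximated on the same grid."""
    return max(x * y - f_star[y] for y in grid)

# The double conjugate should recover x^2 up to discretisation error.
for x in (-1.5, 0.0, 0.5, 2.0):
    print(x, biconjugate(x), x * x)
```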
Quasiconvex functions¶
Definition (Quasiconvex, quasiconcave, quasilinear) A function \(f : \mathbb{R}^n \to \mathbb{R}\) is quasiconvex (or unimodal) if its domain and all sublevel sets are convex. A function \(f\) is quasiconcave if \(-f\) is quasiconvex. If a function is both quasiconvex and quasiconcave, we say it is quasilinear.
Lemma (Jensen’s inequality for quasiconvex functions) A function \(f\) with convex domain is quasiconvex if and only if
\[f(\theta x + (1 - \theta) y) \leq \max\{f(x), f(y)\}\]
for all \(\theta \in [0, 1]\) and all \(x, y \in \text{dom}~f\).
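A numerical illustration of the distinction: \(f(x) = \sqrt{|x|}\) has convex sublevel sets \([-\alpha^2, \alpha^2]\), so it satisfies the max-inequality above, yet it violates the ordinary convexity inequality. The sampling check below is a sketch, not a proof.

```python
import math
import random

f = lambda x: math.sqrt(abs(x))   # quasiconvex on R, but not convex

random.seed(5)
quasi_ok, convex_ok = True, True
for _ in range(5000):
    x, y = random.uniform(-4, 4), random.uniform(-4, 4)
    th = random.random()
    z = f(th * x + (1 - th) * y)
    if z > max(f(x), f(y)) + 1e-9:          # quasiconvexity (max) inequality
        quasi_ok = False
    if z > th * f(x) + (1 - th) * f(y) + 1e-9:   # convexity (chord) inequality
        convex_ok = False
print(quasi_ok, convex_ok)
```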
Proof: Jensen's inequality for quasiconvex functions
Suppose \(f\) is quasiconvex and consider \(x, y \in \text{dom}~f\). Then the sublevel set
\[S_\alpha = \{z \in \text{dom}~f~|~f(z) \leq \alpha\}, \quad \text{with}~\alpha = \max\{f(x), f(y)\},\]
is convex and therefore contains all convex combinations of \(x, y\), which is equivalent to the condition
\[f(\theta x + (1 - \theta) y) \leq \max\{f(x), f(y)\}.\]
Conversely, suppose that for any \(x, y \in \text{dom}~f\) the above inequality holds. We see that if \(x, y\) are in the sublevel set \(S_{\alpha}\), then any convex combination of \(x, y\) satisfies
\[f(\theta x + (1 - \theta) y) \leq \max\{f(x), f(y)\} \leq \alpha,\]
so \(\theta x + (1 - \theta) y \in S_{\alpha}\), and every sublevel set is convex.
As for convex functions, when a function is differentiable we have conditions for quasiconvexity based on derivative information. There is a first-order necessary and sufficient condition, as well as a second-order necessary condition, for quasiconvexity.
Lemma (First-order condition for quasiconvexity) Suppose \(f\) is differentiable with convex domain. Then \(f\) is quasiconvex if and only if
\[f(y) \leq f(x) \implies \nabla f(x)^\top (y - x) \leq 0 \quad \text{for all}~x, y \in \text{dom}~f.\]
Proof: First order condition for quasiconvexity
Suppose the condition
\[f(y) \leq f(x) \implies \nabla f(x)^\top (y - x) \leq 0\]
holds and let \(x_1, x_2 \in \text{dom}~f\) be such that \(f(x_1) \leq f(x_2)\). Restrict \(f\) to the line through \(x_1, x_2\) by defining \(g(\theta) = f(\theta x_1 + (1 - \theta) x_2)\); to show that \(f\) is quasiconvex it suffices to show that \(g(\theta) \leq g(0) = f(x_2)\) for all \(\theta \in [0, 1]\). Suppose instead that \(g(\theta) > g(0)\) for some \(\theta \in (0, 1)\), and let \(z = \theta x_1 + (1 - \theta) x_2\). Since \(f(x_1) \leq f(x_2) < f(z)\), applying the condition at \(x = z\), first with \(y = x_1\) and then with \(y = x_2\), gives
\[\nabla f(z)^\top (x_1 - z) \leq 0 \quad \text{and} \quad \nabla f(z)^\top (x_2 - z) \leq 0.\]
Because \(x_1 - z = (1 - \theta)(x_1 - x_2)\) and \(x_2 - z = -\theta (x_1 - x_2)\), these two inequalities together force \(g'(\theta) = \nabla f(z)^\top (x_1 - x_2) = 0\). So \(g'\) vanishes at every \(\theta\) where \(g(\theta) > g(0)\). The set \(U = \{\theta \in (0, 1)~|~g(\theta) > g(0)\}\) is open by continuity, and \(g\) is constant on each connected interval of \(U\); by continuity, the value of \(g\) at the left endpoint of such an interval equals that constant, which exceeds \(g(0)\), yet the left endpoint lies outside \(U\) (or is \(\theta = 0\)), where \(g \leq g(0)\) - a contradiction. Hence \(U\) is empty and \(f\) is quasiconvex.
Suppose \(f\) is quasiconvex and differentiable, and consider \(x_1, x_2 \in \text{dom}~f\) such that \(f(x_1) \leq f(x_2)\). By quasiconvexity, \(f(\alpha x_1 + (1 - \alpha) x_2) \leq \max\{f(x_1), f(x_2)\} = f(x_2)\) for all \(\alpha \in [0, 1]\). If
\[\nabla f(x_2)^\top (x_1 - x_2) > 0,\]
then there exists a sufficiently small \(\alpha > 0\) such that
\[f(x_2 + \alpha (x_1 - x_2)) > f(x_2),\]
which contradicts the previous inequality. Therefore
\[\nabla f(x_2)^\top (x_1 - x_2) \leq 0.\]