Convex functions

Definition and examples

Definition (Convex function) A function whose domain is convex, and which also satisfies

\[f\big(\theta x + (1 - \theta) y\big) \leq \theta f(x) + (1 - \theta) f(y),\]

for every \(x, y \in \textbf{dom}(f)\) and \(\theta \in [0, 1]\). If the inequality is strict, we call the function strictly convex.

Concave and strictly concave funtions: If a function \(f\) satisfies the above definition but with the inequality sign going the other way, we call it a (strictly) concave function - the domain still needs to be a convex set. The negative of a concave function is convex and vice versa.

Examples of convex functions in \(\mathbb{R}\):

  • affine: \(ax + b\), for any \(a, b \in \mathbb{R}\).

  • exponential: \(e^{\alpha x}\) for any \(\alpha \in \mathbb{R}\).

  • powers: \(x^\alpha\) on \(\mathbb{R}_{++}\) for \(\alpha \geq 1\) or \(\alpha \leq 0\).

  • powers of absolute value: \(|x|^p\) for \(p \geq 1\).

Examples of concave functions in \(\mathbb{R}\):

  • affine: \(ax + b\), for any \(a, b \in \mathbb{R}\).

  • logarithm: \(\log x\) for \(x \in \mathbb{R}_{++}\).

  • powers: \(x^\alpha\) on \(\mathbb{R}_{++}\) for \(0 \leq \alpha \leq 1\).

Examples of convex functions in \(\mathbb{R}^n\):

  • affine: \(a^\top x + b\), for any \(a \in \mathbb{R}^n, b \in \mathbb{R}\).

  • \(p\)-norm: \(||x||_p = \left[ \sum_{i = 1}^n |x_i|^p\right]^{1/p}\) for \(p \geq 0\).

Examples of convex functions in \(\mathbb{R}^{n \times m}\):

  • affine: \(\text{Tr}(A^\top X) = \left[\sum_{i = 1}^n \sum_{j = 1}^m A_{ij} X_{ij}\right] + b\)

  • spectral norm: \(f(X) = ||X||_2 = \sigma_{\text{max}}(X) = \lambda^{1/2}_{\text{max}}(X^\top X)\), where \(\lambda_{\text{max}}(\cdot)\) is the maximum eigenvalue of a matrix.

Restriction to a line

Lemma (Restriction to a line) A function \(f\) from \(\mathbb{R}^n\) to \(\mathbb{R}\) is convex if and only if the restriction \(g\) of \(f\) on any line, is also convex

\[\begin{align} g(t) = f(x + tv),~~~\textbf{dom}~g = \{t~|~x + tv \in \textbf{dom}~f\}, \end{align}\]

for any \(x \in \textbf{dom}~f, v \in \mathbb{R}^n\).

Example use of restriction: Using the above result we can prove the convexity of quite complicated functions. Take for example the function \(\log \det : \mathbf{S}_{++}^n \to \mathbb{R}\) and restrict it to a line along matrix \(V\). For \(\textbf{dom}~g = \{t~|~X + tV \in \textbf{dom}~f\}\) to be non-empty, \(V\) will have to be symmetric but does not have to be positive-definite. The restricted function \(g\) is

\[\begin{split}\begin{align} g(t) = \log \det (X + tV) &= \log \det X + \log \det (I + tX^{-1/2}VX^{-1/2})\\ &= \log \det X + \sum_{i = 1}^n \log (1 + t \lambda_i) \end{align}\end{split}\]

where \(\lambda_i\) is the \(i^\text{th}\) eigenvalue of \(X^{-1/2}VX^{-1/2}\) and \(X^{-1/2}\) exists because \(X\) is positive-definite. The components \(\log (1 + t \lambda_i)\) are all concave regardless of the value of \(\lambda_i\), so \(g\) is concave along any line in \(\mathbf{S}_{++}^n\) and thus \(\log \det\) is also concave.

Extended value extension

Lemma (Extended-value extension) The extended-value extension \(\tilde{f}\) of \(f\) is

\[\begin{align} \tilde{f}(x) = f(x) \text{ if } x\in \textbf{dom}~f, \text{ otherwise } \infty. \end{align}\]

This definition helps to extend the domain of convex functions to a larger set, for example extending \(f(x) = x^2, \textbf{dom}~f = \mathbb{R}_{+}\) from the positive reals \(\mathbb{R}_{+}\) to the whole real line \(\mathbb{R}\) and still have \(\tilde{f}\) be covex.

Two conditions for convexity

Defintion (First-order differentiability) A function \(f\) is differentiable if its domain is open and the gradient

\[\begin{align} \nabla f(x) = \left(\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, ..., \frac{\partial f}{\partial x_n} \right) \end{align}\]

exists at each \(x \in \textbf{dom}~f\).

The following first-order condition for convexity follows from this definition.

Lemma (First-order condition) A differentiable function \(f\) with a convex domain is convex if and only if

\[\begin{align} f(x) \geq f(x_0) + \nabla f(x_0)^\top (x - x_0), \text{ for all } x_0, x \in \textbf{dom}~f. \end{align}\]

Defintion (Second-order differentiability) A function \(f\) is twice differentiable if its domain is open and the hessian

\[\begin{align} \nabla^2 f(x) \in \mathbf{S}^n, \nabla^2 f(x)_{ij} = \frac{\partial^2f(x)}{\partial x_i \partial x_j},~i, j = 1, ..., n, \end{align}\]

exists at each \(x \in \textbf{dom}~f\).

From this definition, the following second-order condition for convexity follows.

Lemma (Second-order condition) A twice differentiable function \(f\) with convex domain is

\[\begin{split}\begin{align} \text{convex if and only if }\nabla^2 f &\succcurlyeq 0 \text{ for all } x \in \textbf{dom} f\\ \text{strictly convex if }\nabla^2 f &\succ 0 \text{ for all } x \in \textbf{dom} f. \end{align}\end{split}\]

Note \(f\) can be stricly convex and still have \(\nabla^2 f = 0\) for some \(x\) - take for example \(f(x) = x^4\).

Examples of convex functions

Here are some examples of convex functions. For some, it is easy to see they are convex, while for others it is more challenging.

Quadratic form

\[\begin{align} f(x) = \frac{1}{2} x^\top P x + q^\top x + r \text{ for } P \succ 0. \end{align}\]

Least squares

\[\begin{align} f(x) = ||Ax - b||_2^2 \text{ for any } A, b. \end{align}\]


\[\begin{align} f(x, y) = \frac{x^2}{y} \text{ for } y > 0. \end{align}\]

Log-sum-exp or soft-max

\[\begin{align} f(x) = \log \sum_{n = 1}^N e^{x_k}. \end{align}\]

Geometric mean

\[\begin{align} f(x) = \left(\prod_{i = 1}^n x_i\right)^{1/n} \text{ where } x > 0. \end{align}\]

When in doubt, one can plot the values of the function being tested, along randomly chosen lines in the space. This is a useful heuristic which can help eliminate non-convex functions with minimal effort. Since a function is convex if and only if its restriction along any line is also convex, we can be certain the function is not convex if any of its restrictions are not convex. If all the restrictions appear convex, we can then move in and try to prove that the function itself is.

Epigraphs and sublevel sets

Defintion (Sublevel set) The \(\boldsymbol{\alpha}\)-sublevel set of \(f\) is

\[\begin{align} C_\alpha = \{x \in \textbf{dom}~f~|~f(x) \leq \alpha \}, \end{align}\]

the set of \(x\) for which \(f(x)\) is no larger than \(\alpha\). Sublevel sets of convex functions are convex.

Defintion (Epigraph) The epigraph of \(f : \mathbb{R}^n \to \mathbb{R}\) is

\[\begin{align} \textbf{epi}~f = \{(x, t) \in \mathbb{R}^{n + 1} : x \in \textbf{dom}~f, f(x \leq t)\}, \end{align}\]

the set of points which are on or above \(f\) in \(\mathbb{R}^{n + 1}\).

Jensen’s inequality

The definition of a convex function

\[\begin{align} f\big(\theta x + (1 - \theta)y\big) \leq \theta f(x) + (1 - \theta)f(y), \end{align}\]

is a special case of Jensen’s inequality.

Theorem (Jensen’s inequality) If \(f\) is convex and \(\mathbb{E}\) is the expectation over a random variable \(x\), then

\[\begin{align} f\big(\mathbb{E}[x]\big) \leq \mathbb{E}\big[f(x)\big]. \end{align}\]

Establishing convexity

We have seen two main ways through which we can establish the convexity of a function. The first is to directly verify the definition, for example by restricting to an arbitrary line and showing the restricted function is convex. The second is to show the Hessian is positive semi-definite – provided the function is twice differentiable.

A third way is to show that a convex function can be expressed in terms of operations that preserve convexity. The following is a list of operations which preserve convexity.

Lemma (Nonnegative multiple) \(\alpha f\) is convex if \(f\) is convex and \(\alpha \geq 0\).

Lemma (Sum of convex functions) If \(f_1\) and \(f_2\) are convex, \(f_1 + f_2\) is convex. This extends to infinite sums and integrals.

Lemma (Composition with affine function) \(f(Ax + b)\) is convex if \(f\) is convex.

Lemma (Pointwise maximum) If \(f_1, ..., f_n\) are convex, then the following is convex

\[\begin{align} f(x) = \max \{f_1(x), f_2(x), ..., f_n(x)\}. \end{align}\]

Lemma (Pointwise supremum) If \(f(x, y)\) is convex in \(x\) for each \(y \in \mathcal{Y}\), then the following is convex

\[\begin{align} \DeclareMathOperator*{\sup}{sup} g(x) = \sup_{y \in \mathcal{Y}}f(x, y). \end{align}\]

Lemma (Composition with scalar function) Given functions \(g : \mathbb{R}^n \to \mathbb{R}\) and \(h : \mathbb{R} \to \mathbb{R}\) their composition

\[\begin{align} f(x) = h(g(x)), \end{align}\]

is convex if (1) \(g\) is convex, \(h\) is convex and \(\tilde{h}\) is nondecreasing; if (2) \(g\) is concave, \(h\) is convex and \(\tilde{h}\) is nonincreasing.

Lemma (Vector composition) Composition with scalar functions can be extended to composition of vector functions. Suppose Given functions \(g : \mathbb{R}^n \to \mathbb{R}^k\) and \(h : \mathbb{R}^k \to \mathbb{R}\), then

\[\begin{align} f(x) = h\big(g(x)\big) = h\big(g_1(x), g_2(x), ..., g_k(x)\big), \end{align}\]

is convex if (1) \(g_i\) are all convex, \(h\) is convex and \(\tilde{h}\) non-decreasing in each argument; (2) \(g_i\) are all concave, \(h\) is convex and \(\tilde{h}\) non-increasing in each argument.

Lemma (Minimisation) If \(f\) is convex in \((x, y)\) and \(C\) is a convex nonempty set, then

\[\begin{align} g(x) = \inf_{y \in C} f(x, y) \end{align}\]

is convex in \(x\), provided \(g(x) < - \infty\) for all \(x\).

We provide two alternative proofs from Boyd and Vandenberghe below. The first shows the result, while the second gives additional intuition for a special case involving additional assumptions.

Proof: Minimisation

Proof (a) The first proof confirms Jensen’s inequality. Since \(C\) may be open and the definition involves an infimum, this proof involves an analysis argument. For any \(y_1, y_2 \in C\) we have

\[\begin{split}\begin{align} g(\theta x_1 + (1 - \theta) x_2) &= \inf_{y \in C} f(\theta x_1 + (1 - \theta) x_2, y) \\ &\leq f(\theta x_1 + (1 - \theta) x_2, \theta y_1 + (1 - \theta) y_2) \\ &\leq \theta f(x_1, y_1) + (1 - \theta) f(x_2, y_2) \end{align}\end{split}\]

where from the first to the second line we have upper-bounded the infimum over \(y\) by inserting \(y = \theta y_1 + (1 - \theta) y_2\) and using the fact that \(C\) is convex - so since \(y_1, y_2 \in C\), we also have \(\theta y_1 + (1 - \theta) y_2\). From the second to the third we have used the convexity of \(f\).

Now, since we have assumed that \(g(x) < - \infty\) for all \(x\), then for any \(\epsilon > 0\) there exist \(y_1, y_2 \in C\) such that \(f(x_i, y_i) \leq g(x_i) + \epsilon\). So for any \(\epsilon > 0\) there exist \(y_1, y_2 \in C\) such that

\[\begin{align} g(\theta x_1 + (1 - \theta) x_2) &\leq \theta g(x_1) + (1 - \theta) g(x_2) + \epsilon \end{align}\]

holds, proving the result. We see that the \(\epsilon\) argument is necessary to handdle the possibility that the infimum over \(y\) may not be attained in \(C\), either because \(C\) is open or because \(f\) is monotonic decreasing along some direction.


In the figure above, \(C\) is the set of points above the curve and \(y^*\) minimises \(f(\theta x_1 + (1 - \theta) x_2, y)\). At the convex combination of \((x_1, y_1)\) and \((x_2, y_2)\), the function \(f\) attains a value larger than the one it attains at \((\theta x_1 + (1 - \theta) x_2, y^*)\). Since this holds for any \(y_1, y_2\), we see that by chosing \(y_1, y_2\) such that \(f(x_i, y_i) \leq g(x_i, y_i) + \epsilon\) for a sequence of vanishing \(\epsilon\), the result must also follow.

Proof (b) We can get some additional intuition, by considering the special case where the infimum over \(y\) is attained in \(C\), for every \(x\). In this case we have

\[\begin{align} \text{epi}~g = \{(x, t)~|~(x, y, t) \in \text{epi}~f \text{ for some } y \in C\}, \end{align}\]

which is a convex set because it is the projection of a convex set on a subset of the axes of the space it belongs.

Lemma (Perspective preserves convexity) If \(f\) is a convex function, then its perspective function \(g\) given by

\[\begin{align} g(x, t) = tf\left(\frac{x}{t}\right), \end{align}\]

with domain \(\text{dom}~g = \{(x, t) | x/t \in \text{dom}~f, t > 0\}\) is also convex.

Proof: Perspective preserves convexity

The epigraphs of \(f\) and \(g\) satisfy

\[\begin{split}\begin{align} (x, t, y) \in \text{epi}~g &\doublearrow tf\left(\frac{x}{t}\right) \leq y \\ &\doublearrow f\left(\frac{x}{t}\right) \leq \frac{y}{t} \\ &\left(\frac{x}{t}, \frac{y}{t}\right) \in \text{epi}~f. \end{align}\end{split}\]
\[\begin{split}\begin{align} (\frac{x}{t}, \frac{y}{t}\right) \in \text{epi}~f &\doublearrow f\left(\frac{x}{t}\right) \leq \frac{y}{t} \\ &\doublearrow tf\left(\frac{x}{t}\right) \leq y \\ &\doublearrow (x, t, y) \in \text{epi}~g \\ \end{align}\end{split}\]

Therefore, the epigraph of \(g\) is the inverse image of the epigraph of \(f\) under the perspective function. Since \(\text{epi}~f\) is a convex set, so is its inverse image under the perspective function, so \(\text{epi}~g\) is convex.

Conjugate function

Definition (Conjugate function) Given a function \(f : \mathbb{R}^n \to \mathbb{R}\), its conjugate \(f^* : \mathbb{R}^n \to \mathbb{R}\) is defined as

\[\begin{align} f^*(y) = \sup_{x \in \text{dom}~f} \left(y^\top x - f(x)\right). \end{align}\]

The conjugate \(f^*\) of a function is always convex, even if \(f\) is not convex.

Lemma (Conjugate is convex) Suppose \(f\) is a convex function. Then its conjugate \(f^*\) is convex.

Proof: Conjugate is convex

The conjugate of a function is the pointwise supremum of a set of affine (and thus convex) functions, and it is therefore convex.

Lemma (Convex \(f \implies f^{**} = f\)) Suppose \(f\) is convex. Then the conjugate of its conjugate is the original function

\[\begin{align} f^{**} = f \end{align}\]

Proof: Convex \(f \implies f^{**} = f\)

Suppose \(f\) is a convex function and define its conjugate

\[\begin{align} f^*(y) = \sup_{x \in \text{dom}~f} \left(y^\top x - f(x)\right), \end{align}\]

and the conjugate of the conjugate

\[\begin{align} f^{**}(x) = \sup_{y \in \text{dom}~f^{*}} \left(x^\top y - f^*(y)\right). \end{align}\]

Note that \(-f^*(y)\) is the largest \(y\)-intercept that a plane with slope \(y\) which underestimates \(f\) can have. Given an in input \(x\), \(f^{**}(x)\) is equal to the supremum of \(x^\top y - f^*(y)\) over \(y\). This is the pointwise supremum of all planes with slope \(y\) and intercept \(-f^*(y)\). Consider a point \((x, t)\) in the epigraph of \(f\). This point is contained in the epigraphs of all affine functions with slope \(y\) and intercept \(-f^*(y)\). Therefore, it is also contained in the epigraph of the pointwise supremum of these affine functions, which is equal to \(f^{**}\), and \(\text{epi}~f \subseteq \text{epi}~f^{**}\). Conversely, if \((x, t)\) is contained in the epigraph \(f^{**}\), then it is contained in the epigraph of the pointwise supremum of all affine underestimators of \(f\) and is therefore also contained in the epigraph of \(f\), so \(\text{epi}~f^{**} \subseteq \text{epi}~f\). Thus \(\text{epi}~f = \text{epi}~f^{**}\) and \(f = f^{**}\).

Quasiconvex functions

Definition (Quasiconvex, quasiconcave, quasilinear) A function \(f : \mathbb{R}^n \to \mathbb{R}\) is quasiconvex (or unimodal) if its domain and all sublevel sets are convex. A function \(f\) is quasiconcave if \(-f\) is quasiconvex. If a function is both quasiconvex and quasiconcave, we say it is quasilinear.

Lemma (Jensen’s inequality for quasiconvex functions) A function \(f\) is quasiconvex if

\[\begin{align} f\left(\theta x + (1 - \theta) y\right) = \max \left\{f(x), f(y)\right\}, \end{align}\]

for all \(\theta \in [0, 1]\) and all \(x, y \in \text{dom}~f\).

Proof: Jensen's inequality for quasiconvex functions

Suppose \(f\) is quasiconvex and consider \(x, y \in \text{dom}~f\). Then the sublevel set

\[\begin{align} S_{\alpha} = \left\{z | z \leq \alpha\right\}, \text{ where } \alpha = \max \left\{f(x), f(y)\right\}, \end{align}\]

is convex and therefore contains all convex combinations of \(x, y\), which is equivalent to the condition

\[\begin{align} f\left(\theta x + (1 - \theta) y\right) = \max \left\{f(x), f(y)\right\}. \end{align}\]

Conversely, suppose that for any \(x, y \in \text{dom}~f\) the above inequality holds. We see that if \(x, y\) are in the sublevel set \(S_{\alpha}\), then any convex combination of \(x, y\) satisfies

\[\begin{align} f\left(\theta x + (1 - \theta) y\right) = \max \left\{f(x), f(y)\right\} \leq \alpha \end{align}\]

so \(\theta x + (1 - \theta) y \in S_{\alpha}\).

As for convex functions, when a function is differentiable, we have conditions for quasiconvexity based on deerivative information. There is a first-order necessary and sufficient condition as well as a second order necessary condition for quasiconvexity.

Lemma (First order condition for quasiconvexity) Suppose \(f\) is differentiable. Then \(f\) is quasiconvex if and only if

\[\begin{align} f(x_1) \leq f(x_2) \implies \nabla f(x_2)^\top (x_2 - x_1) \leq 0. \end{align}\]

Proof: First order condition for quasiconvexity

Suppose the condition

\[\begin{align} f(x_1) \leq f(x_2) \implies \nabla f(x_2)^\top (x_2 - x_1) \leq 0 \end{align}\]

holds and let \(x_1, x_2 \in \text{dom}~f\) be such that \(f(x_1) \leq f(x_2)\). Now, constrain the function \(f\) to the line \(\alpha x_1 + (1 - \alpha) x_2\), and fix \(\theta \in [0, 1]\). If the point \(\theta x_1 + (1 - \theta) x_2\) is not in the sub-\(f(x_2)\) set of \(f\). Then

\[\begin{align} f(\theta x_1 + (1 - \theta) x_2) > f(x_2), \end{align}\]

so there must exist \(\alpha \in [0, 1]\), such that \(\nabla f(\theta x_1 + (1 - \theta) x_2) < 0\)

Since sub-level sets are closed This means that any given sub-level set is the intersection of a set of halfplanes, that is the halfplanes whose boundary is tangent to the boundary of the sublevel set. Th

Suppose \(f\) is quasiconvex and differentiable, and consider \(x_1, x_2 \in \text{dom}~f\) such that \(f(x_1) \leq f(x_2)\). Now, constrain the function \(f\) to the line \(\alpha x_1 + (1 - \alpha) x_2\). If

\[\begin{align} \nabla f(x_2)^\top (x_2 - x_1) > 0 \end{align}\]

then there exists sufficiently small \(\alpha > 0\) such that

\[\begin{align} f(\alpha x_1 + (1 - \alpha) x_2) > 0 \end{align}\]

which means that the level set of \(f\) constrained along the line is disconnected, which contradicts the assumption that \(f\) is quasiconvex. Therefore

\[\begin{align} f(x_1) \leq f(x_2) \implies \nabla f(x_2)^\top (x_2 - x_1) \leq 0. \end{align}\]