Convex sets

Affine sets

Definition (Line) The line through points \(x_1\) and \(x_2\) is the set

\[x = \theta x_1 + (1 - \theta) x_2, \text{ where } \theta \in \mathbb{R}.\]

Definition (Affine set) An affine set is a set which contains the line defined by any two of its points.

Convex sets

Definition (Line segment) The line segment between two points \(x_1\) and \(x_2\) is the set of points

\[\begin{align} x = \theta x_1 + (1 - \theta) x_2, \text{ where } \theta \in [0, 1]. \end{align}\]

Definition (Convex set) A set is convex if it contains the line segments between any two of its points.

Convex combinations and hulls

Definition (Convex combination) A linear combination of points \(x_1, x_2, ..., x_n\) of the form

\[\begin{align} x = \theta_1 x_1 + \theta_2 x_2 + ... + \theta_n x_n, \end{align}\]

is a convex combination if \(\theta_k \geq 0\) for all \(k\) and \(\sum_k \theta_k = 1\).

Definition (Convex hull) The convex hull of a set \(S\), \(\textbf{conv}S\), is the set of all convex combinations of \(S\).

The convex hull of a set \(S\) is the smallest convex set that contains \(S\). This is because any convex superset of \(S\) must contain all convex combinations of elements in \(S\) and by extension \(\textbf{conv}S\) as well.

Cones and convex cones

Definition (Cone) A cone is a set which contains all non-negative multiples of its points

\[\begin{align} x \in C, \theta \geq 0 \implies \theta x \in C. \end{align}\]

Definition (Conic combination) A linear combination of points \(x_1\) and \(x_2\)

\[\begin{align} x = \theta_1 x_1 + \theta_2 x_2, \end{align}\]

is a conic combination if \(\theta_1, \theta_2 \geq 0\).

Definition (Convex cone) A convext cone is a set which contains all conic combinations of its points.

Since the set of conic combinations of a set subsumes the set of convex combinations, such a set is convex as well as a cone.

Definition (Conic hull) The conic hull of a set \(S\) is the smallest set which contains all conic combinations of \(S\).

A conic hull is the smallest convex cone that contains \(S\), by the same argument applied to convex hulls and convex sets.

Hyperplanes and halfspaces

Definition (Hyperplane) A hyperplane is a set of points of the form

\[\begin{align} \{x | a^\top x = b\}, a \neq 0. \end{align}\]

Definition (Halfspace) A halfspace is a set of points of the form

\[\begin{align} \{x | a^\top x \leq b\}, a \neq 0. \end{align}\]

Hyperplanes are affine and convex. Halfspaces are convex but they are not affine.

Euclidian balls and ellipsoids

Definition (Euclidean ball) A Euclidean ball with center \(x_c\) and radius \(r\) is the set

\[\begin{align} B(x_c, r) = \{x~|~||x - x_c||_2 \leq r \} = \{x_c + ru~|~||u||_2 \leq 1 \}. \end{align}\]

Definition (Ellipsoid) An ellipsoid is a set of the form

\[\begin{align} E(x_c, P) = \{x~|~(x - x_c)^\top P (x - x_c) \leq 1\} = \{x_c + Au~|~||u||_2 \leq 1 \}, \end{align}\]

where \(P\) is a symmetric positive-definite and \(A\) is non-singular.

Euclidean balls and ellipsoids are convex sets.

Norm balls and norm cones

Definition (Norm) A norm is a function \(||\cdot||\) that satisfies

  1. \(||x|| \geq 0\) and \(||x|| = 0 \iff x = 0\).

  2. \(||cx|| = |c|~||x||\)

  3. \(||x + y|| \leq ||x|| + ||y||\)

Definition (Norm ball) A norm ball with center \(x_c\) and radius \(r\) is a set of the form

\[\begin{align} \{x~|~||x - x_c|| \leq r\}. \end{align}\]

Definition (Norm cone) A norm cone is a set of the form

\[\begin{align} \{(x, t) ~|~ ||x|| \leq t\}. \end{align}\]

Norm balls and norm cones are convex sets.


Definition (Polyhedron) A polyhedron is a set defined by finitely many linear inequalities and equalities

\[\begin{align} Ax \preccurlyeq b, \text{ and } Cx = d, \end{align}\]

where \(\preccurlyeq\) is the component-wise inequality.

Polyhedra are convex and can be expressed as an intersection of finitely many halfspaces (inequalities) and hyperplances (equalities).

Positive-semidefinite cone

Definition (Positive (semi)definite matrices) We use the symbols \(\mathbf{S}_+^n\) and \(\mathbf{S}_{++}^n\) to denote positive semi-definite and positive definite \(n\times n\) symmetric matrices.

Both \(\mathbf{S}_+^n\) and \(\mathbf{S}_{++}^n\) are convex cones. The difference between these two sets is that \(\mathbf{S}_+^n\) is closed, whereas \(\mathbf{S}_{++}^n\) is open.

Operations that preserve convexity

Lemma (Intersection of convex sets) The intersection of any number of convex sets is convex.

Proof: Intersection of convex sets

To show this, suppose \(S_1, S_2, ...\) is a (finite or infinite) sequence of convex sets whose intersection is \(S\) and \(x, y \in S\), then \(x, y\) are inside every \(S_n\) and so is the line segment between them (by convexity of \(S_n\)). This implies the line segment is itself in \(S\), showing \(S\) is convex.

Lemma (Affine functions and convexity) An affine mapping, \(f(x) = Ax + b\), of a convex set is also convex. Convexity is also preserved under inverses of affine maps, where the inverse map is defined as \(f^{-1}(S)\{x~|~f(x) \in S\}\).

Proof: Affine functions and convexity

To show the first statement, let \(S_1\) be a convex set, \(S_2 = f(S_1)\) and \(x, y \in S_1\). Then the line segment \(\theta x + (1 - \theta) y, \theta \in [0, 1]\) is in \(S_1\) and

\[\begin{align} f\big(\theta x + (1 - \theta) y\big) = \theta Ax + (1 - \theta) Ay + b = \theta f(x) + (1 - \theta) f(y)~\in~f(S_2) \end{align}\]

so \(S_2\) is convex. To show the second statement, let \(S_2\) be a convex set, \(S_1 = f^{-1}(S_2)\) and \(x, y \in S_2\). Then \(f^{-1}(x), f^{-1}(y) \subset S_1\), and we want to show that if \(x^* \in f^{-1}(x), y^* \in f^{-1}(y)\), then the line segment \(\theta x^* + (1 - \theta)y^*\) is also in \(S_1\). By the convexity of \(S_2\), the line segment \(\theta x + (1 - \theta) y\) is in \(S_2\), so:

\[\begin{align} f\Big(\big\{\theta x^* + (1 - \theta)y^* | x^* \in f^{-1}(x), y^* \in f^{-1}(y)\big\}\Big) = \theta x + (1 - \theta) y \in S_2 \end{align}\]

and since the left hand side is in \(S_2\), the argument of \(f\) on the left hand side must be a subset of \(S_1\) (by the definition of \(f^{-1}\)), showing that \(S_1\) is convex.

Definition (Perspective function) A perspective function \(f : \{x~|~ x \in \mathbb{R}^{n} \text{ and } x_n > 0\} \to \mathbb{R}^{n - 1}\) is the mapping

\[\begin{align} f(x) = \left(\frac{x_1}{x_{n}}, \frac{x_2}{x_{n}}, ..., \frac{x_{n-1}}{x_{n}} \right), \end{align}\]

and it is a convex function.

The perspective function can be interpreted as a projection of a point in \(\mathbb{R}^{n}\) to the image plane of a pinhole camera. It is also closely related to homogeneous coordinates, used in graphics and computer vision - for further reading see Roberto Cipolla’s notes on Computer Vision.

Definition (Linear-fractional function) A perspective function \(f : \mathbb{R}^{n} \to \mathbb{R}^m\) is a mapping of the form

\[\begin{align} f(x) = \left(\frac{x_1}{x_{n}}, \frac{x_2}{x_{n}}, ..., \frac{x_{n-1}}{x_{n}} \right), \end{align}\]

and is also a convex function. It is a generalisation of the perspective funtion, and equal to it if \(c^\top = [0, 0, ..., 1]\) and \(d = 0\).

Proper cones, generalised inequalities, minimum and minimal elements

Definition (Proper cone) A convex cone is a proper cone if it is

  1. Closed (contains its boundary),

  2. Solid (it has non-zero interior),

  3. Pointed (does not contain a line).

Proper cones are used to define another useful concept, the generalised inequality.

Definition (Generalised inequality) The generalised inequalities \(\preccurlyeq_K\) and \(\prec_K\) with respect to a proper cone \(K\) are defined as

\[\begin{split}\begin{align} x \preccurlyeq_K y &\implies y - x \in K \\ x \prec_K y &\implies y - x \in \textbf{ int}(K). \end{align}\end{split}\]

In general, we might have both \(x \not \preccurlyeq_K y\) and \(x \not \succcurlyeq_K y\) simultaneously, if \(x\) and \(y\) are not comparable under the inequality. For example, comparing the points \(x = (1, 0)\) and \(y = (-1, 1)\) using the inequality \(\preccurlyeq_{\mathbb{R}^2_{+}}\) (where \(\mathbb{R}^2_{+}\) is the positive quadrant of \(\mathbb{R}^2\)), we see that neither \(x \preccurlyeq_{\mathbb{R}^2_{+}} y\) nor \(x \succcurlyeq_{\mathbb{R}^2_{+}} y\) hold, because neither \(y - x\) nor \(x - y\) are in \(\mathbb{R}^2_{+}\). This inequality is therefore a partial ordering, and we need to distinguish between different kinds of “minimum” points.

Definition (Minimum element) We call \(x\) the minimum element of \(S\) if it satisfies

\[\begin{align} y \in S \implies x \preccurlyeq_K y. \end{align}\]

So if \(y \in S\) then \(y\) is no greater than \(x\). Thus \(x\) can be compared to all other points \(y \in S\) and inequality can be unambiguously decided.

Definition (Minimal element) A minimal element \(x\) of \(S\) satisfies

\[\begin{align} y \in S, y \preccurlyeq_K x \implies x = y. \end{align}\]

This means that if there is another element \(y\) in \(S\) such that it can be compared to \(x\) and is in fact smaller than or equal to \(x\), then \(x = y\). The reason we define minimal elements in addition to minimum elements, is to deal with cases where not all points can be compared to each other, like the \(\mathbb{R}^2_{+}\) example above. In such cases, there may not be a minimum point but there will be minimal points.

Separating hyperplane theorem

The separating hyperplane theorem is very useful for proving other results about convex sets and functions. Before proving it, we will state and prove a weaker version of it which will be used in its proof.

Lemma (Weaker separating hyperplane theorem) Suppose \(C\) and \(D\) are convex and disjoint sets, such that their infimum distance is realised, that is there exist \(d \in D, c \in C\) such that

\[\begin{align} ||c - d||_2 = \inf_{x \in C, y \in D} ||x - y||_2. \end{align}\]

Then there exist \(a \neq 0\) and \(b\) such that

\[\begin{split}\begin{align} &a^\top x \leq b \text{ for } x \in C \\ \text{ and } &a^\top x \geq b \text{ for } x \in D. \end{align}\end{split}\]

Proof: Weaker separating hyperplane theorem

Suppose \(C\) and \(D\) are convex and disjoint sets, such that the infimum distance between them is realised, for points \(c \in C\) and \(d \in D\). Then consider the hyperplane defined as

\[\begin{align} H = \{x | a^\top x + b \geq 0\}, \text{ where } a = (d - c), b = \frac{||c||_2^2 - ||d||_2^2}{2}, \end{align}\]

that is the plane with normal \((d - c)\) which passes through \(\frac{c + d}{2}\). Now if there exists \(d' \in D\) such that \(d' \not \in H\), then it must contain all convex combinations of \(d\) and \(d'\). Thus for some small enough \(\epsilon > 0\) we have

\[\begin{align} || \epsilon d' + (1 - \epsilon) d - c||_2 < ||d - c||_2, \end{align}\]

which contradicts the assumption that \(c \in C\) and \(d \in D\) are the points which minimise \(||d - c||_2\). Thus there cannot exist such \(d'\), and \(H\) contains \(D\). Repeating the same argument, we see that

\[\begin{align} H' = \{x | a^\top x + b \leq 0\}, \end{align}\]

contains \(C\). Therefore, the hyperplane \(a^\top x + b = 0\) separates \(C\) and \(D\).

Now we can state and prove the full separating hyperplane theorem. The separating hyperplane theorem applies to any convex sets with disjoint interiors, not just disjoint sets where the infimum distance is realised. The proof of this theorem uses the previous, more limited version of the supporting hyperplane theorem.

Theorem (Separating hyperplane theorem) If \(C\) and \(D\) are convex sets with disjoint interiors, there exist \(a \neq 0\) and \(b\) such that

\[\begin{split}\begin{align} &a^\top x \leq b \text{ for } x \in C \\ \text{ and } &a^\top x \geq b \text{ for } x \in D. \end{align}\end{split}\]

Proof: Separating hyperplane theorem

First, we will assume the sets \(C\) and \(D\) to be closed. If the separating hyperplane theorem holds for the closure of \(C\) and \(D\), then it also holds for \(C\) and \(D\). Consider the set difference between \(C\) and \(D\) defined as

\[\begin{align} S = C - D = \{c - d | c \in C, d \in D\}. \end{align}\]

The set \(S\) is a convex set. In order to show the result, it suffices to show that there exists a separating hyperplane between \(S\) and \(\{0\}\). If \(S\) does not contain \(0\), then from the previous lemma we can find a hyperplane separating \(\{0\}\) and \(S\), which proves the result.

Now consider the case where \(S\) contains \(0\). We will show that \(0\) must be on the boundary of \(S\). If \(0\) is not on the boundary of \(S\), then there exists \(\epsilon > 0\) such that the \(\epsilon\)-ball centered at \(0\), is contained in \(S\). This implies that \(C\) and \(D\) do not have disjoint interiors, leading to a contradiction. Thus if \(0\) is contained in \(S\), it must be a boundary point.

Now, consider the \(\epsilon\)-constraint of \(S\) defined as

\[\begin{align} S_{-\epsilon} = \{x | B_\epsilon(x) \in S\}. \end{align}\]

Since \(S_{-\epsilon}\) and \(\{0\}\) are disjoint, there exists a hyperplane which separates them. Then, for any sequence of \(\epsilon_n \to 0\), there exists a sequence \(a_n, b_n\) where \(a_n \in \mathbb{R}^N\) and \(b_n \in \mathbb{R}\) such that

\[\begin{align} S_{-\epsilon} \subseteq \{x | a_n^\top x - b_n \geq 0\}. \end{align}\]

Without loss of generality we can take \(a_n\) and \(b_n\) to satisfy

\[\begin{align} ||a_n||^2_2 + b_n^2 = 1, \end{align}\]

by re-scaling them. Since \((a_n, b_n) \in \mathbb{R}^{N + 1}\) is a bounded sequence, it must have a convergent subsequence, by Weierstrass’s theorem. Thus the limiting point, \((a, b)\), of this subsequence contains \(S_{-\epsilon}\) for all \(\epsilon > 0\), and thus also contains \(\epsilon\). This proves the result.

In other words, the hyperplane defined by \(a, b\) separates the two sets. Strict inequality would require further restrictions: consider the case where \(C\) is an open (does not contain its boundary) convex set and \(D\) is a point on the boundary of \(C\); strict equality \(a^\top x > b \text{ for } x \in D\) would not hold.

Supporting hyperplane theorem

Definition (Supporting hyperplane theorem) A supporting hyperplane to \(C\), at an \(x_0\) on the boundary of \(C\), is the set

\[\begin{align} \{x | a^\top x = a^\top x_0\}, \end{align}\]

where \(a \neq 0\) and \(a^\top x \leq a^\top x_0\) for \(x \in C\).

The following theorem says that a convex set always has a supporting hyperplane at each of its boundary points.

Theorem (Supporting hyperplane theorem) If \(C\) is a convex set, then there exists a supporting hyperplane to \(C\) at every one of its boundary points.

Proof: Supporting hyperplane theorem

The supporting hyperplane theorem can be proved by using the separating hyperplane theorem. If \(x_0\) is a boundary point of \(C\) we can define \(D = \{x_0\}\), and apply the separating hyperplane theorem to show that a separating hyperplane exists, which is also supporting because \(x_0\) is on the boundary of \(C\).

Dual cones and inequalities

Definition (Dual cone) The dual cone of a cone \(K\) is defined as

\[\begin{align} K^* = \{z~|~ z^\top x \geq 0 \text{ for all } x \in K\}, \end{align}\]

and satisfies the useful properties:

  • \(K^*\) is closed and convex.

  • If \(K_1 \subseteq K_2\) then \(K_2^* \subseteq K_1^*\).

  • If \(K\) has nonempty interior, then \(K^*\) is pointed.

  • If the closure of \(K\) is pointed then \(K^*\) has non-empty interior.

  • \(K^{**}\) is the closure of the convex hull of \(K\). If \(K\) is closed and convex to begin with, then \(K = K^{**}\).

Lemma (Dual generalised inequalities) The dual cones of proper cones are proper, and can also be used to define generalised inequalities

\[\begin{split}\begin{align} x \preccurlyeq_{K^*} y &\implies y - x \in K^* \\ x \prec_{K^*} y &\implies y - x \in \textbf{ int}(K^*). \end{align}\end{split}\]

Two important properties that relate generalised inequalities and their duals are

  • \(x \preccurlyeq_K y \iff \lambda^\top x \leq \lambda^\top y\) for all \(\lambda \succcurlyeq_{K*} 0\),

  • \(x \prec_K y \iff \lambda^\top x < \lambda^\top y\) for all \(\lambda \succcurlyeq_{K*} 0, \lambda \neq 0\),

and since \(K = K^{**}\) for proper cones, the above properties also hold if \(K^*\) and \(K^{**}\) are swapped.

Minimum and minimal elements

We have the following conditions on minimum and minimal elements based on dual inequalities.

Lemma (Minimum elements and generalised inqualities) An element \(x \in S\) is the minimum element of \(S\) if and only if \(x\) is the unique minimiser of \(\lambda^\top z, z \in S\) for all \(\lambda \succ_{K^*} 0\).

Lemma (Minimal element) An element \(x \in S\) is a minimal element of \(S\) if \(x\) minimises \(\lambda^\top z, z \in S\) for some \(\lambda \succcurlyeq_{K^*} 0\).

The converse of the above statement is generally not true. If \(x\) is a minimal point, it is not true in general that there exists \(\lambda \succ_{K^*} 0\), such that \(x\) minimises \(\lambda^\top z, z \in S\). However, if \(S\) is convex, then a necessary and sufficient condition exists.

Lemma (Minimal element of a convex set) An element \(x\) of a convex set \(S\), is minimal if and only if there exists \(\lambda \succcurlyeq_{K^*} 0\) such that \(x\) minimises \(\lambda^\top z, z \in S\).

For the statements on minimal elements, the strict inequality \(\lambda \succ_{K^*} 0\) plays an important role. If \(\lambda \succcurlyeq_{K^*} 0\) was used in the statements, we can easily find counter-examples.