Convex Conditions for Strong Convexity

An important concept in online learning and convex optimization is that of strong convexity: a twice-differentiable function f is said to be strongly convex with respect to a norm \|\cdot\| if

z^T\frac{\partial^2 f}{\partial x^2}z \geq \|z\|^2

for all z (for functions that are not twice-differentiable, there is an analogous criterion in terms of the Bregman divergence). To check strong convexity, then, we basically need to check a condition on the Hessian, namely that z^THz \geq \|z\|^2. So, under what conditions does this hold?

For the l^2 norm, the answer is easy: z^THz \geq \|z\|_2^2 if and only if H \succeq I (i.e., H-I is positive semidefinite). This can be shown in many ways, perhaps the easiest is by noting that z^THz-\|z\|_2^2 = z^T(H-I)z.

For the l^{\infty} norm, the answer is a bit trickier but still not too complicated. Recall that we want necessary and sufficient conditions under which z^THz \geq \|z\|_{\infty}^2. Note that this is equivalent to asking that z^THz \geq (z_i)^2 for each coordinate i of z, which in turn is equivalent to H \succeq e_ie_i^T for each coordinate vector e_i (these are the vectors that are 1 in the ith coordinate and 0 everywhere else).

More generally, for any norm \|\cdot\|, there exists a dual norm \|\cdot\|_* which satisfies, among other properties, the relationship \|z\| = \sup_{\|w\|_* = 1} w^Tz. So, in general, z^THz \geq \|z\|^2 is equivalent to asking that z^THz \geq (w^Tz)^2 for all w with \|w\|_* = 1. But this is in turn equivalent to asking that

H \succeq ww^T for all w such that \|w\|_* = 1.

In fact, it suffices to pick a subset of the w such that the convex hull consists of all w with \|w\|_* \leq 1; this is why we were able to obtain such a clean formulation in the l^{\infty} case: the dual norm to l^{\infty} is l^1, whose unit ball is the simplex, which is a polytope with only 2n vertices (namely, each of the signed unit vectors \pm e_i).

We can also derive a simple (but computationally expensive) criterion for l^1 strong convexity: here the dual norm is l^{\infty}, whose unit ball is the n-dimensional hypercube, with vertices given by all 2^n vectors of the form [ \pm 1 \ \cdots \ \pm 1]. Thus z^THz \geq \|z\|_1^2 if and only if H \succeq ss^T for all 2^n sign vectors s.

Finally, we re-examine the l^2 case; even though the l^2-ball is not a polytope, we were still able to obtain a very simple expression. This was because the condition H \succeq I manages to capture simultaneously all dual vectors such that w^Tw \leq 1. We thus have the general criterion:

Theorem. H \succeq M_jM_j^T for j = 1,\ldots,m if and only if H is strongly convex with respect to the norm \|\cdot\| whose dual unit ball is the convex hull of the transformed unit balls M_j\mathcal{B}_j, j = 1, \ldots, m, where \mathcal{B}_j is the l^2 unit ball whose dimension matches the number of columns of M_j.

Proof. H \succeq M_jM_j^T if and only if z^THz \geq \max_{j=1}^m \|M_j^Tz\|_2^2. Now note that \|M_j^Tz\|_2 = \sup_{w \in \mathcal{B}_j} w^TM_j^Tz = \sup_{w' \in M_j\mathcal{B}_j} (w')^Tz. If we define \|z\| = \max_{j=1}^m \|M_j^Tz\|_2, it is then apparent that the dual norm unit ball is the convex hull of the M_j\mathcal{B}_j.