Algebra trick of the day

I’ve decided to start recording algebra tricks as I end up using them. Today I actually have two tricks, but they end up being used together a lot. I don’t know if they have more formal names, but I call them the “trace trick” and the “rank 1 relaxation”.

Suppose that we want to maximize the Rayleigh quotient \frac{x^TAx}{x^Tx} of a matrix A. There are many reasons we might want to do this, for instance of A is symmetric then the maximum corresponds to the largest eigenvalue. There are also many ways to do this, and the one that I’m about to describe is definitely not the most efficient, but it has the advantage of being flexible, in that it easily generalizes to constrained maximizations, etc.

The first observation is that \frac{x^TAx}{x^Tx} is homogeneous, meaning that scaling x doesn’t affect the result. So, we can assume without loss of generality that x^Tx = 1, and we end up with the optimization problem:

maximize x^TAx

subject to x^Tx = 1

This is where the trace trick comes in. Recall that the trace of a matrix is the sum of its diagonal entries. We are going to use two facts: first, the trace of a number is just the number itself. Second, trace(AB) = trace(BA). (Note, however, that trace(ABC) is not in general equal to trace(BAC), although trace(ABC) is equal to trace(CAB).) We use these two properties as follows — first, we re-write the optimization problem as:

maximize Trace(x^TAx)

subject to Trace(x^Tx) = 1

Second, we re-write it again using the invariance of trace under cyclic permutations:

maximize Trace(Axx^T)

subject to Trace(xx^T) = 1

Now we make the substitution X = xx^T:

maximize Trace(AX)

subject to Trace(X) = 1, X = xx^T

Finally, note that a matrix X can be written as xx^T if and only if X is positive semi-definite and has rank 1. Therefore, we can further write this as

maximize Trace(AX)

subject to Trace(X) = 1, Rank(X) = 1, X \succeq 0

Aside from the rank 1 constraint, this would be a semidefinite program, a type of problem that can be solved efficiently. What happens if we drop the rank 1 constraint? Then I claim that the solution to this program would be the same as if I had kept the constraint in! Why is this? Let’s look at the eigendecomposition of X, written as \sum_{i=1}^n \lambda_i x_ix_i^T, with \lambda_i \geq 0 (by positive semidefiniteness) and \sum_{i=1}^n \lambda_i = 1 (by the trace constraint). Let’s also look at Trace(AX), which can be written as \sum_{i=1}^n \lambda_i Trace(Ax_ix_i^T). Since Trace(AX) is just a convex combination of the Trace(Ax_ix_i^T), we might as well have just picked X to be x_ix_i^T, where i is chosen to maximize Trace(Ax_ix_i^T). If we set that \lambda_i to 1 and all the rest to 0, then we maintain all of the constraints while increasing Trace(AX), meaning that we couldn’t have been at the optimum value of X unless n was equal to 1. What we have shown, then, is that the rank of X must be 1, so that the rank 1 constraint was unnecessary.

Technically, X could be a linear combination of rank 1 matrices that all have the same value of Trace(AX), but in that case we could just pick any one of those matrices. So what I have really shown is that at least one optimal point has rank 1, and we can recover such a point from any solution, even if the original solution was not rank 1.

Here is a problem that uses a similar trick. Suppose we want to find x that simultaneously satisfies the equations:

b_i = |a_i^Tx|^2

for each i = 1,\ldots,n (this example was inspired from the recent NIPS paper by Ohlsson, Yang, Dong, and Sastry, although the idea itself goes at least back to Candes, Strohmer, and Voroninski). Note that this is basically equivalent to solving a system of linear equations where we only know each equation up to a sign (or a phase, in the complex case). Therefore, in general, this problem will not have a unique solution. To ensure the solution is unique, let us assume the very strong condition that whenever a_i^TVa_i = 0 for all i = 1,\ldots,n, the matrix V must itself be zero (note: Candes et al. get away with a much weaker condition). Given this, can we phrase the problem as a semidefinite program? I highly recommend trying to solve this problem on your own, or at least reducing it to a rank-constrained SDP, so I’ll include the solution below a fold.

Solution. We can, as before, re-write the equations as:

b_i = Trace(a_ia_i^Txx^T)

and further write this as

b_i = Trace(a_ia_i^TX), X \succeq 0, rank(X) = 1

As before, drop the rank 1 constraint and let X = \sum_{j=1}^m \lambda_j x_jx_j^T. Then we get:

b_i = \sum_{j=1}^m Trace(a_ia_i^Tx_jx_j^T)\lambda_j,

which we can re-write as b_i = a_i^T\left(\sum_{j=1}^m \lambda_jx_jx_j^T\right)a_i. But if x^* is the true solution, then we also know that b_i = a_i^Tx^*(x^*)^Ta_i, so that a_i^T\left(-x^*(x^*)^T+\sum_{j=1}^m \lambda_jx_jx_j^T\right) = 0 for all i. By the non-degeneracy assumption, this implies that

x^*(x^*)^T = \sum_{j=1}^m \lambda_jx_jx_j^T,

so in particular X = x^*(x^*)^T. Therefore, X = x^*(x^*)^T is the only solution to the semidefinite program even after dropping the rank constraint.

Advertisements

2 Responses to Algebra trick of the day

  1. Nice! You probably have, but you should see “The Matrix Cookbook”. There are tons of great results within.

  2. jsteinhardt says:

    Yup! The Matrix Cookbook is awesome!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: