Linear Control Theory: Part 0

The purpose of this post is to introduce you to some of the basics of control theory and to introduce the Linear-Quadratic Regulator, an extremely good hammer for solving stabilization problems.

To start with, what do we mean by a control problem? We mean that we have some system with dynamics described by an equation of the form

\dot{x} = Ax,

where x is the state of the system and A is some matrix (which itself is allowed to depend on x). For example, we could have an object that is constrained to move in a line along a frictionless surface. In this case, the system dynamics would be

\left[ \begin{array}{c} \dot{q} \\ \ddot{q} \end{array} \right] = \left[ \begin{array}{cc} 0 & 1 \\ 0 & 0 \end{array} \right]\left[ \begin{array}{c} q \\ \dot{q} \end{array} \right].

Here q represents the position of the object, and \dot{q} represents the velocity (which is a relevant component of the state, since we need it to fully determine the future behaviour of the system). If there were drag, then we would instead have the following equation of motion:

\left[ \begin{array}{c} \dot{q} \\  \ddot{q} \end{array} \right] = \left[ \begin{array}{cc} 0 & 1 \\ 0  & -b \end{array} \right]\left[ \begin{array}{c} q \\ \dot{q}  \end{array} \right],

where b is the coefficient of drag.
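As a rough illustration of what these two equations of motion predict, the sketch below integrates \dot{x} = Ax with a forward Euler step for both the frictionless and the drag case. The helper name simulate, the step size, and the value b = 0.5 are illustrative assumptions, not values from this post.

```python
def simulate(A, x0, dt=0.001, t_end=5.0):
    """Integrate x' = A x with forward Euler; A is a 2x2 nested list."""
    x = list(x0)
    for _ in range(int(round(t_end / dt))):
        # dx = A x, written out for the 2-state case [q, qdot]
        dx = [A[0][0] * x[0] + A[0][1] * x[1],
              A[1][0] * x[0] + A[1][1] * x[1]]
        x = [x[0] + dt * dx[0], x[1] + dt * dx[1]]
    return x

b = 0.5  # illustrative drag coefficient (an assumption)

# Both runs start at q = 0 with unit velocity.
frictionless = simulate([[0.0, 1.0], [0.0, 0.0]], [0.0, 1.0])
with_drag = simulate([[0.0, 1.0], [0.0, -b]], [0.0, 1.0])

print("frictionless [q, qdot]:", frictionless)
print("with drag    [q, qdot]:", with_drag)
```

Without drag the velocity never changes, so the position grows linearly in time. With drag the velocity decays exponentially and the position levels off below 1/b.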

If you think a bit about the form of these equations, you will realize that it is both redundant and not fully general. The form is redundant because A can be an arbitrary function of x, yet it also acts on x as an argument, so the equation \ddot{q} = q\dot{q}, for example, could be written as

\left[ \begin{array}{c} \dot{q} \\ \ddot{q} \end{array} \right] = \left[ \begin{array}{cc} 0 & 1 \\ \alpha \dot{q} & (1-\alpha) q \end{array} \right] \left[ \begin{array}{c} q \\ \dot{q} \end{array} \right]

for any choice of \alpha. On the other hand, this form is also not fully general, since x = 0 will always be a fixed point of the system. (We could in principle fix this by making \dot{x} affine, rather than linear, in x, but for now we’ll use the form given here.)
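The redundancy is easy to check numerically. The sketch below evaluates the second row of the matrix above times x for several values of \alpha and confirms that each one gives the same \ddot{q} = q\dot{q}. The function name qddot and the test point are illustrative assumptions.

```python
def qddot(q, qdot, alpha):
    """Second row of A(x) times x, for the alpha-parameterised A above."""
    return (alpha * qdot) * q + ((1.0 - alpha) * q) * qdot

q, qdot = 2.0, 3.0  # arbitrary test point
values = [qddot(q, qdot, alpha) for alpha in (0.0, 0.25, 1.0, -2.0)]

# Every choice of alpha reproduces q * qdot.
print(values, "expected:", q * qdot)
```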

So, if this representation doesn’t uniquely describe most systems, and can’t describe other systems, why do we use it? The answer is that, for most systems arising in classical mechanics, the equations naturally take on this form (I think there is a deeper reason for this coming from Lagrangian mechanics, but I don’t yet understand it).

Another thing you might notice is that in both of the examples above, x was of the form \left[ \begin{array}{c} q \\ \dot{q} \end{array} \right]. This is another common phenomenon (although q and \dot{q} may be vectors instead of scalars in general), owing to the fact that Newtonian mechanics produces second-order systems, and so we care about both the position and velocity of the system.

So, now we have a mathematical formulation, as well as some notation, for what we mean by the equations of motion of a system. We still haven’t gotten to what we mean by control. What we mean is that we assume that, in addition to the system state x, we have a control input u (usually we can choose u independently from x),  such that the actual equations of motion satisfy

\dot{x} = Ax+Bu,

where again, A and B can both depend on x. What this really means physically is that, for any configuration of the system, we can choose a control input u, and u will affect the instantaneous change in state in a linear manner. We normally call each of the entries of u a torque.

The assumption of linearity might seem strong, but it is again true for most systems, in the sense that a linear increase in a given torque will induce a linear response in the kinematics of the system. But note that this is only true once we talk about mechanical torques. If we think of a control input as an electrical signal, then the system will usually respond non-linearly with respect to the signal. This is simply because the actuator itself provides a force that is non-linear with its electrical input.

We can deal with this either by saying that we only care about a local model, and the actuator response is locally linear to its input; or, we can say that the problem of controlling the actuator itself is a disjoint problem that we will let someone worry about. In either case, I will shamelessly use the assumption that the system response is linear in the control input.

So, now we have a general form for equations of motion with a control input. The general goal of a control problem is to pick a function f(x,t) such that if we let u = f(x,t) then the trajectory X(t) induced by the equation \dot{x} = Ax+Bf(x,t) minimizes some objective function J(X,f). Sometimes our goals are more modest and we really just want to get to some final state, in which case we can make J just be a function of the final state that assigns a score based on how close we end up to the target state. We might also have hard constraints on u (because our actuators can only produce a finite amount of torque), in which case we can make J assign an infinite penalty to any f that violates these constraints.

As an example, let’s return to our first example of an object moving in a straight line. This time we will say that \left[ \begin{array}{c} \dot{q} \\ \ddot{q} \end{array} \right] = \left[ \begin{array}{cc} 0 & 1 \\ 0 & 0 \end{array} \right] \left[ \begin{array}{c} q \\ \dot{q} \end{array} \right]+\left[ \begin{array}{c} 0 \\ 1 \end{array} \right]u, with the constraint that |u| \leq A (here A is a scalar torque limit, not the dynamics matrix). We want to get to x = \left[ \begin{array}{c} 0 \\ 0 \end{array} \right] as quickly as possible, meaning we want to get to q = 0 and then stay there. We could have J(X,f) just be the amount of time it takes to get to the desired endpoint, with a cost of infinity on any f that violates the torque limits. However, this is a bad idea, for two reasons.

The first reason is that, numerically, you will never really end up at exactly \left[ \begin{array}{c} 0 \\ 0 \end{array} \right], just very close to it. So if we try to use this function on a computer, unless we are particularly clever we will assign a cost of \infty to every single control policy.

However, we could instead have J(X,f) be the amount of time it takes to get close to the desired endpoint. I personally still think this is a bad idea, and this brings me to my second reason. Once you come up with an objective function, you need to somehow come up with a controller (that is, a choice of f) that minimizes that objective function, or at the very least performs reasonably well as measured by the objective function. You could do this by being clever and constructing such a controller by hand, but in many cases you would much rather have a computer find the optimal controller. If you are going to have a computer search for a good controller, you want to make the search problem as easy as possible, or at least reasonable. This means that, if we think of J as a function on the space of control policies, we would like to make the problem of optimizing J tractable. I don’t know how to make this precise, but there are a few properties we would like J to satisfy: there aren’t too many local minima, and the minima aren’t approached too steeply (meaning that there is a reasonably large neighbourhood of small values around each local minimum). If we choose an objective function that assigns a value of \infty to almost everything, then we will end up spending most of our time wading through a sea of infinities without any direction (because all directions will just yield more values of \infty). So a very strict objective function will be very hard to optimize. Ideally, we would like a different choice of J that has its minimum at the same location but that decreases gradually to that minimum, so that we can solve the problem using gradient descent or some similar method.

In practice, we might have to settle for an objective function that only is trying to minimize the same thing qualitatively, rather than in any precise manner. For example, instead of the choice of J discussed above for the object moving in a straight line, we could choose

J(X,f) = \int_{0}^{T} |q(t)|^2 dt,

where T is some arbitrary final time. In this form, we are trying to minimize the time-integral of some function of the deviation of q from 0. With a little bit of work, we can deduce that, for large enough T, the optimal controller is, at least to a good approximation, a bang-bang controller that accelerates towards 0 at the greatest rate possible, until accelerating any more would cause the object to overshoot q = 0, at which point the controller should decelerate at the greatest rate possible (there are some additional cases for when the object will overshoot the origin no matter what, but this is the basic idea).
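To make the bang-bang rule concrete, here is a minimal sketch that simulates it on the object moving in a straight line and accumulates the cost J along the way. It implements the switching rule described above (full acceleration, then full braking along the curve where braking ends exactly at the origin). It does not claim to be the exact minimizer of J. The names bang_bang and simulate, the Euler step, and u_max (standing in for the torque limit) are illustrative assumptions.

```python
def bang_bang(q, qdot, u_max):
    """Full acceleration toward q = 0, then full braking."""
    # Braking curve: states from which full braking stops exactly at the origin,
    # i.e. q = -qdot * |qdot| / (2 * u_max). s measures which side we are on.
    s = q + qdot * abs(qdot) / (2.0 * u_max)
    if s > 0:
        return -u_max
    if s < 0:
        return u_max
    # Exactly on the curve: brake (push against the current velocity).
    if qdot > 0:
        return -u_max
    if qdot < 0:
        return u_max
    return 0.0

def simulate(q0, qdot0, u_max=1.0, dt=0.001, t_end=4.0):
    """Forward-Euler rollout; returns final q, final qdot and the cost J."""
    q, qdot, cost = q0, qdot0, 0.0
    for _ in range(int(round(t_end / dt))):
        u = bang_bang(q, qdot, u_max)
        cost += q * q * dt  # running integral of |q(t)|^2
        q, qdot = q + dt * qdot, qdot + dt * u
    return q, qdot, cost

q_final, qdot_final, J = simulate(1.0, 0.0)
print(q_final, qdot_final, J)
```

In discrete time the controller chatters slightly around the origin instead of stopping exactly, which is one practical reason to prefer smoother objectives and controllers.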

This brings us to my original intention in making this post, which is LQR (linear-quadratic regulator) control. In this case, we assume that A and B are both constant and that our cost function is of the form

J(X,f) = \int_{0}^{\infty} X(t)^{T}QX(t) + f(X(t),t)^{T}Rf(X(t),t) dt,

where the T means transpose and Q and R are both positive definite matrices. In other words, we assume that our goal is to get to x = 0, and we penalize both our distance from x = 0 and the amount of torque we apply at each point in time. If we have a cost function of this form, then we can actually solve analytically for the optimal control policy f. The solution involves solving the Hamilton-Jacobi-Bellman equation, and I won’t go into the details, but when the smoke clears we end up with a linear feedback policy u = -Kx, where K = R^{-1}B^{T}P, and P is given by the solution to the algebraic Riccati equation

A^{T}P + PA - PBR^{-1}B^{T}P + Q = 0.

What’s even better is that MATLAB has a function called lqr (part of the Control System Toolbox) that will set up and solve the Riccati equation automatically.
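For readers without MATLAB, the following is a dependency-free sketch for the object moving in a straight line (A = [[0, 1], [0, 0]], B = [0, 1]^T). In practice one would call a library routine; SciPy, for instance, provides scipy.linalg.solve_continuous_are. Here the Riccati differential equation dP/dt = A^{T}P + PA - PBR^{-1}B^{T}P + Q is simply integrated from P = 0 until it settles, at which point its right-hand side vanishes and P solves the algebraic equation. The function name, step size, and horizon are illustrative assumptions.

```python
def lqr_double_integrator(q1=1.0, q2=1.0, r=1.0, dt=1e-3, t_end=20.0):
    """LQR gain for qddot = u with Q = diag(q1, q2) and scalar R = r.

    P = [[p1, p2], [p2, p3]] is symmetric, so three numbers suffice.
    """
    p1 = p2 = p3 = 0.0
    for _ in range(int(round(t_end / dt))):
        # Entries of A^T P + P A - P B R^{-1} B^T P + Q for this A and B.
        dp1 = q1 - p2 * p2 / r
        dp2 = p1 - p2 * p3 / r
        dp3 = q2 + 2.0 * p2 - p3 * p3 / r
        p1, p2, p3 = p1 + dt * dp1, p2 + dt * dp2, p3 + dt * dp3
    K = [p2 / r, p3 / r]  # K = R^{-1} B^T P
    P = [[p1, p2], [p2, p3]]
    return K, P

K, P = lqr_double_integrator()
print("K =", K)  # feedback law: u = -K[0] * q - K[1] * qdot
print("P =", P)
```

With Q and R both equal to the identity, solving the three scalar equations by hand gives K = [1, \sqrt{3}], which the numerical result should match closely.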

You might have noticed that we had to make the assumption that both A and B were constant, which is a fairly strong assumption, as it implies that we have an LTI (linear time-invariant) system. So what is LQR control actually good for? The answer is stabilization. If we want to design a controller that will stabilize a system about a point, we can shift coordinates so that the point is at the origin, then take a linear approximation about the origin. As long as we have a moderately accurate linear model for the system about that point, the LQR controller will successfully stabilize the system to that point within some basin of attraction. More technically, the LQR controller will make the system locally asymptotically stable, and the optimal cost-to-go of the linear system, x^{T}Px, will be a valid local Lyapunov function.
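Here is a small sketch of that recipe on a hypothetical pendulum balanced upright, with \theta measured from the upright position and dynamics \ddot{\theta} = a \sin\theta + u (a = g/l, with u already divided by the inertia). Linearizing about \theta = 0 gives A = [[0, 1], [a, 0]] and B = [0, 1]^T. For this particular structure the Riccati equation can be solved by hand, which is what pendulum_lqr_gain does. The model, the parameter values, and the function names are all assumptions made for illustration.

```python
import math

def pendulum_lqr_gain(a, q1=1.0, q2=1.0, r=1.0):
    """LQR gain for the linearization A = [[0, 1], [a, 0]], B = [0, 1]^T.

    Closed form for this 2-state case (hand-derived from the Riccati equation):
      2*a*p2 - p2^2/r + q1 = 0   and   2*p2 - p3^2/r + q2 = 0.
    """
    p2 = r * (a + math.sqrt(a * a + q1 / r))
    p3 = math.sqrt(r * (2.0 * p2 + q2))
    return [p2 / r, p3 / r]  # K = R^{-1} B^T P

def simulate(theta0, a=9.81, dt=1e-3, t_end=10.0):
    """Apply u = -K x to the *nonlinear* pendulum using forward Euler."""
    K = pendulum_lqr_gain(a)
    theta, omega = theta0, 0.0
    for _ in range(int(round(t_end / dt))):
        u = -(K[0] * theta + K[1] * omega)
        alpha = a * math.sin(theta) + u  # true dynamics, not the linear model
        theta, omega = theta + dt * omega, omega + dt * alpha
    return theta, omega

theta_final, omega_final = simulate(0.5)  # start 0.5 rad from upright
print(theta_final, omega_final)
```

The controller is designed from the linear model only, yet it brings the nonlinear pendulum back to the upright position from a fairly large initial angle.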

Really, the best reason to make use of LQR controllers is that they are a solution to stabilization problems that works out of the box. Many controllers that work in theory will actually require a ton of tuning in practice; this isn’t the case for an LQR controller. As long as you can identify a linear system about the desired stabilization point, even if your identification isn’t perfect, you will end up with a pretty good controller.

I was thinking of also going into techniques for linear system identification, but I think I’ll save that for a future post. The short answer is that you find a least-squares fit of the data you collect. I’ll also go over how this all applies to the underwater cart-pole in a future post.
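As a placeholder until then, here is a minimal sketch of what a least-squares fit can look like for a scalar system \dot{x} = ax + bu. It is only an illustration of the idea and not necessarily the procedure a later post would use. The function name, the synthetic noise-free data, and the true values a = -0.5, b = 2 are assumptions.

```python
import math

def fit_scalar_system(samples):
    """Least-squares fit of xdot = a*x + b*u from (x, u, xdot) samples."""
    sxx = sxu = suu = sxy = suy = 0.0
    for x, u, xdot in samples:
        sxx += x * x
        sxu += x * u
        suu += u * u
        sxy += x * xdot
        suy += u * xdot
    # Solve the 2x2 normal equations with Cramer's rule.
    det = sxx * suu - sxu * sxu
    a = (sxy * suu - sxu * suy) / det
    b = (sxx * suy - sxu * sxy) / det
    return a, b

# Synthetic, noise-free data from a known system (illustrative values).
a_true, b_true = -0.5, 2.0
samples = []
for k in range(20):
    x, u = math.sin(0.3 * k), math.cos(0.7 * k)
    samples.append((x, u, a_true * x + b_true * u))

a_hat, b_hat = fit_scalar_system(samples)
print(a_hat, b_hat)
```

With real measurements the derivative \dot{x} would have to be estimated from noisy data, so the recovered values would only be approximate.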


4 Responses to Linear Control Theory: Part 0

  1. jnub says:

    The matrices Q and R in the objective function for LQR seem to be left undetermined. Is there a way to select them, or is this actually robust enough to work for any positive definite Q and R?

    • jsteinhardt says:

      Well if you have a perfectly accurate linear model of the system (in the sense that your linear model is the same as the linearization of the actual system about the point you care about), then for any Q and R you will be able to stabilize the system for sufficiently small perturbations. (Usually we refer to the portion of state space that the controller can stabilize as the basin of attraction.)

      Of course, this by itself isn’t that helpful, for a few reasons — first of all, you will never have a perfectly accurate model; secondly, if the basin of attraction becomes too small then issues like floating point precision and actuator lag will make the system unstable anyway. Perhaps most importantly, you usually want the basin of attraction to be reasonably large. This makes it easier to reach the basin from other parts of state space, and it also makes your controller more robust to noise and perturbations.

      However, in practice I have found that you get fairly reasonably-sized basins without worrying much about Q and R (I usually set them both to the identity). If you tune them a bit to reflect the actual performance considerations of the system in mind, you will do even better, but this probably isn’t necessary if all you care about is getting something that works.

      One reason things work out like this is that an LQR controller is, among other things, modifying the linear system so that all of its eigenvalues are in the left half-plane (that is, they have negative real part). Since eigenvalues vary continuously with the system parameters, there is actually an entire neighbourhood of linear systems that are all stabilized by a single LQR controller. Intuitively, as long as the instantaneous system dynamics stay inside that neighbourhood, you should end up with a stable system. I’m pretty sure this isn’t actually true, in the sense that if you have a time-varying linear system, even if the matrix describing the system dynamics has negative eigenvalues at every point in time, you can still end up with a system that is unstable. So maybe this isn’t the best intuition ever. I’ll have to think a bit more to come up with a better mathematical explanation for why LQR works. I suspect it has to do with the fact that the cost function (J) for LQR is a Lyapunov function for the linear system, and will therefore be a locally valid Lyapunov function for all nearby systems.

      • jnub says:

        Ok, but do Q and R have any physical significance? I realize that for the most part they don’t have much effect, but is the manner in which they control the shape of the basin understood?

  2. jsteinhardt says:

    @Arvind: If you make Q and R diagonal, then in a rough sense a given entry in Q is inversely proportional to how good your model for that variable has to be, and for R it is inversely proportional to how powerful your controller has to be. Of course, if you scale both Q and R equally then you end up with the same controller, so this is more a statement about relative rather than absolute amounts.

    If you want to make precise statements about how Q and R shape the basin, then you will need an actual model for the rest of the system as well, at which point (as long as the model is polynomial / well-approximated by polynomials) you can do something like sum-of-squares programming on the LQR cost function to find a region where it is a Lyapunov function. This region is contained in the basin of attraction, and Russ (the lab director of my group) claims that it is also a pretty good approximation of it.

    I don’t know how much that explanation makes sense unless you know a bit about Lyapunov functions, although the Wikipedia article should contain everything you’d need. I can elaborate more if you want. I should probably also write a post about Lyapunov functions at some point, although that’s lower-priority right now, since I’ve spent the last two posts talking about theory and I feel I should go into practice as well.
