## The Underwater Cartpole

My last few posts have been rather abstract. I thought I’d use this one to go into some details about the actual system we’re working with.

As I mentioned before, we are looking at a cart pole in a water tunnel. A cart pole is sometimes also called an inverted pendulum. Here is a diagram from wikipedia:

The parameter we have control over is F, the force on the cart. We would like to use this to control both the position of the cart and the angle of the pendulum. If the cart is standing still, the only two possible fixed points of the system are $\theta = 0$ (the bottom, or “downright”) and $\theta = \pi$ (the “upright”). Since $\theta = 0$ is easy to get to, we will be primarily interested with getting to $\theta = \pi$.

For now, I’m just going to worry about the regular cart pole system, without introducing any fluid dynamics. This is because the fluid dynamics are complicated, even with a fairly rough model (called the Quasi-steady Model), and I don’t know how to derive them anyway. Before continuing, it would be nice to have an explicit parametrization of the system. There are two position states we care about: $x$, the cart position; and $\theta$, the pendulum angle, which we will set to $0$ at the bottom with the counter-clockwise direction being positive. I realize that this is not what the picture indicates, and I apologize for any confusion. I couldn’t find any good pictures that parametrized it the way I wanted, and I’m going to screw up if I use a different parametrization than what I’ve written down.

At any rate, in addition to the two position states $x$ and $\theta$, we also care about the velocity states $\dot{x}$ and $\dot{\theta}$, so that we have four states total. For convenience, we’ll also name a variable $u := \frac{F}{M}$, so that we have a control input $u$ that directly affects the acceleration of the cart. We also have system parameters $M$ (the mass of the cart), $g$ (the acceleration due to gravity), $l$ (the length of the pendulum arm), and $I$ (the inertia of the pendulum arm). With these variables, we have the following equations of motion:

$\left[ \begin{array}{c} \dot{x} \\ \dot{\theta} \\ \ddot{x} \\ \ddot{\theta} \end{array} \right] = \left[ \begin{array}{c} \dot{x} \\ \dot{\theta} \\ 0 \\ -\frac{mgl\sin(\theta)}{I} \end{array} \right] + \left[ \begin{array}{c} 0 \\ 0 \\ 1 \\ -\frac{mg\cos(\theta)}{I} \end{array} \right] u$

You will note that the form of these equations is different from in my last post. This is because I misspoke last time. The actual form we should use for a general system is

$\dot{x} = f(x) + B(x)u,$

or, if we are assuming a second-order system, then

$\left[ \begin{array}{c} \dot{q} \\ \ddot{q} \end{array} \right] = \left[ \begin{array}{c} \dot{q} \\ f(q,\dot{q}) \end{array} \right] + B(q,\dot{q}) u.$

Here we are assuming that the natural system dynamics can be arbitrarily non-linear in $x$, but the effect of control is still linear for any fixed system state (which, as I noted last time, is a pretty safe assumption). The time when we use the form $\dot{x} = Ax + Bu$ is when we are talking about a linear system — usually a linear time-invariant system, but we can also let $A$ and $B$ depend on time and get a linear time-varying system.

I won’t go into the derivation of the equations of motion of the above system, as it is a pretty basic mechanics problem and you can find the derivation on Wikipedia if you need it. Instead, I’m going to talk about some of the differences between this system and the underwater system, why this model is still important, and how we can apply the techniques from the last two posts to get a good controller for this system.

Differences from the Underwater System

In the underwater system, instead of having gravity, we have a current (the entire system is on the plane perpendicular to gravity). I believe that the effect of current is much the same as the affect of gravity (although with a different constant), but that could actually be wrong. At any rate, the current plays the role that gravity used to play in terms of defining “up” and “down” for the system (as well as creating a stable fixed point at $\theta = 0$ and an unstable fixed point at $\theta = \pi$).

More importantly, there is significant drag on the pendulum, and the drag is non-linear. (There is always some amount of drag on a pendulum due to friction of the joint, but it’s usually fairly linear, or at least easily modelled.) The drag becomes the greatest when $\theta = \pm \frac{\pi}{2}$, which is also the point at which $u$ becomes useless for controlling $\theta$ (note the $\cos(\theta)$ term in the affect of $u$ on $\ddot{\theta}$). This means that getting past $\frac{\pi}{2}$ is fairly difficult for the underwater system.

Another difference is that high accelerations will cause turbulence in the water, and I’m not sure what affect that will have. The model we’re currently using doesn’t account for this, and I haven’t had a chance to experiment with the general fluid model (using PDEs) yet.

Why We Care

So with all these differences, why am I bothering to give you the equations for the regular (not underwater) system? More importantly, why would I care about them for analyzing the actual system in question?

I have to admit that one of my reasons is purely pedagogical. I wanted to give you a concrete example of a system, but I didn’t want to just pull out a long string of equations from nowhere, so I chose a system that is complex enough to be interesting but that still has dynamics that are simple to derive. However, there are also better reasons for caring about this system. The qualitative behaviour of this system can still be good for giving intuition about the behaviour of the underwater system.

For instance, one thing we want to be able to do is swing-up. With limited magnitudes of acceleration and a limited space (in terms of $x$) to perform maneuvers in, it won’t be possible in general to perform a swing-up. However, there are various system parameters that could make it easier or harder to perform the swing-up. For instance, will increasing $I$ (the inertia of the pendulum) make it easier or harder to perform a swing-up? (You should think about this if you don’t know the answer, so I’ve provided it below the fold.)

## Linear Control Theory: Part 0

The purpose of this post is to introduce you to some of the basics of control theory and to introduce the Linear-Quadratic Regulator, an extremely good hammer for solving stabilization problems.

To start with, what do we mean by a control problem? We mean that we have some system with dynamics described by an equation of the form

$\dot{x} = Ax,$

where $x$ is the state of the system and $A$ is some matrix (which itself is allowed to depend on $x$). For example, we could have an object that is constrained to move in a line along a frictionless surface. In this case, the system dynamics would be

$\left[ \begin{array}{c} \dot{q} \\ \ddot{q} \end{array} \right] = \left[ \begin{array}{cc} 0 & 1 \\ 0 & 0 \end{array} \right]\left[ \begin{array}{c} q \\ \dot{q} \end{array} \right].$

Here $q$ represents the position of the object, and $\dot{q}$ represents the velocity (which is a relevant component of the state, since we need it to fully determine the future behaviour of the system). If there was drag, then we could instead have the following equation of motion:

$\left[ \begin{array}{c} \dot{q} \\ \ddot{q} \end{array} \right] = \left[ \begin{array}{cc} 0 & 1 \\ 0 & -b \end{array} \right]\left[ \begin{array}{c} q \\ \dot{q} \end{array} \right],$

where $b$ is the coefficient of drag.

If you think a bit about the form of these equations, you will realize that it is both redundant and not fully general. The form is redundant because $A$ can be an arbitrary function of $x$, yet it also acts on $x$ as an argument, so the equation $\ddot{q} = q\dot{q}$, for example, could be written as

$\left[ \begin{array}{c} \dot{q} \\ \ddot{q} \end{array} \right] = \left[ \begin{array}{cc} 0 & 1 \\ \alpha \dot{q} & (1-\alpha) q \end{array} \right] \left[ \begin{array}{c} q \\ \dot{q} \end{array} \right]$

for any choice of $\alpha$. On the other hand, this form is also not fully general, since $x = 0$ will always be a fixed point of the system. (We could in principle fix this by making $\dot{x}$ affine, rather than linear, in $x$, but for now we’ll use the form given here.)

So, if this representation doesn’t uniquely describe most systems, and can’t describe other systems, why do we use it? The answer is that, for most systems arising in classical mechanics, the equations naturally take on this form (I think there is a deeper reason for this coming from Lagrangian mechanics, but I don’t yet understand it).

Another thing you might notice is that in both of the examples above, $x$ was of the form $\left[ \begin{array}{c} q \\ \dot{q} \end{array} \right]$. This is another common phenomenon (although $q$ and $\dot{q}$ may be vectors instead of scalars in general), owing to the fact that Newtonian mechanics produces second-order systems, and so we care about both the position and velocity of the system.

So, now we have a mathematical formulation, as well as some notation, for what we mean by the equations of motion of a system. We still haven’t gotten to what we mean by control. What we mean is that we assume that, in addition to the system state $x$, we have a control input $u$ (usually we can choose $u$ independently from $x$),  such that the actual equations of motion satisfy

$\dot{x} = Ax+Bu,$

where again, $A$ and $B$ can both depend on $x$. What this really means physically is that, for any configuration of the system, we can choose a control input $u$, and $u$ will affect the instantaneous change in state in a linear manner. We normally call each of the entries of $u$ a torque.

The assumption of linearity might seem strong, but it is again true for most systems, in the sense that a linear increase in a given torque will induce a linear response in the kinematics of the system. But note that this is only true once we talk about mechanical torques. If we think of a control input as an electrical signal, then the system will usually respond non-linearly with respect to the signal. This is simply because the actuator itself provides a force that is non-linear with its electrical input.

We can deal with this either by saying that we only care about a local model, and the actuator response is locally linear to its input; or, we can say that the problem of controlling the actuator itself is a disjoint problem that we will let someone worry about. In either case, I will shamelessly use the assumption that the system response is linear in the control input.

So, now we have a general form for equations of motion with a control input. The general goal of a control problem is to pick a function $f(x,t)$ such that if we let $u = f(x,t)$ then the trajectory $X(t)$ induced by the equation $\dot{x} = Ax+Bf(x,t)$ minimizes some objective function $J(X,f)$. Sometimes our goals are more modest and we really just want to get to some final state, in which case we can make $J$ just be a function of the final state that assigns a score based on how close we end up to the target state. We might also have hard constraints on $u$ (because our actuators can only produce a finite amount of torque), in which case we can make $J$ assign an infinite penalty to any $f$ that violates these constraints.

As an examples, let’s return to our first example of an object moving in a straight line. This time we will say that $\left[ \begin{array}{c} \dot{q} \\ \ddot{q} \end{array} \right] = \left[ \begin{array}{cc} 0 & 1 \\ 0 & 0 \end{array} \right] \left[ \begin{array}{c} q \\ \dot{q} \end{array} \right]+\left[ \begin{array}{c} 0 \\ 1 \end{array} \right]u$, with the constraint that $|u| \leq A$. We want to get to $x = \left[ \begin{array}{c} 0 \\ 0 \end{array} \right]$ as quickly as possible, meaning we want to get to $q = 0$ and then stay there. We could have $J(X,f)$ just be the amount of time it takes to  get to the desired endpoint, with a cost of infinity on any $f$ that violates the torque limits. However, this is a bad idea, for two reasons.

The first reason is that, numerically, you will never really end up at exactly $\left[ \begin{array}{c} 0 \\ 0 \end{array} \right]$, just very close to it. So if we try to use this function on a computer, unless we are particularly clever we will assign a cost of $\infty$ to every single control policy.

However, we could instead have $J(X,f)$ be the amount of time it takes to get close to the desired endpoint. I personally still think this is a bad idea, and this brings me to my second reason. Once you come up with an objective function, you need to somehow come up with a controller (that is, a choice of $f$) that minimizes that objective function, or at the very least performs reasonably well as measured by the objective function. You could do this by being clever and constructing such a controller by hand, but in many cases you would much rather have a computer find the optimal controller. If you are going to have a computer search for a good controller, you want to make the search problem as easy as possible, or at least reasonable. This means that, if we think of $J$ as a function on the space of control policies, we would like to make the problem of optimizing $J$ tractable. I don’t know how to make this precise, but there are a few properties we would like $J$ to satisfy — there aren’t too many local minima, and the minima aren’t approached too steeply (meaning that there is a reasonable large neighbourhood of small values around each local minimum). If we choose an objective function that assigns a value of $\infty$ to almost everything, then we will end up spending most of our time wading through a sea of infinities without any direction (because all directions will just yield more values of $\infty$). So a very strict objective function will be very hard to optimize. Ideally, we would like a different choice of $J$ that has its minimum at the same location but that decreases gradually to that minimum, so that we can solve the problem using gradient descent or some similar method.

In practice, we might have to settle for an objective function that only is trying to minimize the same thing qualitatively, rather than in any precise manner. For example, instead of the choice of $J$ discussed above for the object moving in a straight line, we could choose

$J(X,f) = \int_{0}^{T} |q(t)|^2 dt,$

where $T$ is some arbitrary final time. In this form, we are trying to minimize the time-integral of some function of the deviation of $q$ from $0$. With a little bit of work, we can deduce that, for large enough $T$, the optimal controller is a bang-bang controller that accelerates towards $0$ at the greatest rate possible, until accelerating any more would cause the object to overshoot $q = 0$, at which point the controller should decelerate at the greatest rate possible (there are some additional cases for when the object will overshoot the origin no matter what, but this is the basic idea).

This brings us to my original intention in making this post, which is LQR (linear-quadratic regulator) control. In this case, we assume that $A$ and $B$ are both constant and that our cost function is of the form

$J(X,f) = \int_{0}^{\infty} X(t)^{T}QX(t) + f(X(t),t)^{T}Rf(X(t),t) dt,$

where the $T$ means transpose and $Q$ and $R$ are both positive definite matrices. In other words, we assume that our goal is to get to $x = 0$, and we penalize both our distance from $x = 0$ and the amount of torque we apply at each point in time. If we have a cost function of this form, then we can actually solve analytically for the optimal control policy $f$. The solution involves solving the Hamilton-Bellman-Jacobi equations, and I won’t go into the details, but when the smoke clears we end up with a linear feedback policy $u = -Kx$, where $K = R^{-1}B^{T}P$, and $P$ is given by the solution to the algebraic Riccati equation

$A^TP+PA-PBR^{-1}B^TP+Q=0.$

What’s even better is that MATLAB has a built-in function called lqr that will set up and solve the Riccati equation automatically.

You might have noticed that we had to make the assumption that both $A$ and $B$ were constant, which is a fairly strong assumption, as it implies that we have a LTI (linear time-invariant) system. So what is LQR control actually good for? The answer is stabilization. If we want to design a controller that will stabilize a system about a point, we can shift coordinates so that the point is at the origin, then take a linear approximation about the origin. As long as we have a moderately accurate linear model for the system about that point, the LQR controller will successfully stabilize the system to that point within some basin of attraction. More technically, the LQR controller will make the system locally asymptotically stable, and the cost function $J$ for the linear system will be a valid local Lyapunov function.

Really, the best reason to make use of LQR controllers is that they are a solution to stabilization problems that work out of the box. Many controllers that work in theory will actually require a ton of tuning in practice; this isn’t the case for an LQR controller. As long as you can identify a linear system about the desired stabilization point, even if your identification isn’t perfect, you will end up with a pretty good controller.

I was thinking of also going into techniques for linear system identification, but I think I’ll save that for a future post. The short answer is that you find a least-squares fit of the data you collect. I’ll also go over how this all applies to the underwater cart-pole in a future post.

## Robotics

This summer I am working in the Robotics Locomotion group at CSAIL (MIT’s Computer Science and Artificial Intelligence Laboratory). I’ve decided to start a blog to exposit on the ideas involved. This ranges from big theoretical ideas (like general system identification techniques) to problem-specific ideas (specific learning strategies for the system we’re interested in) to useful information on using computational tools (how to make MATLAB’s ode45 do what you want it to).

To start with, I’m going to describe the problem that I’m working on, together with John (a grad student in mechanical engineering).

Last spring, I took 6.832 (Underactuated Robotics) at MIT. In that class, we learned multiple incredibly powerful techniques for nonlinear control. After taking it, I was more or less convinced that we could solve, at least off-line, pretty much any control problem once it was posed properly. After coming to the Locomotion group, I realized that this wasn’t quite right. What is actually true is that we can solve any control problem where we have a good model and a reasonable objective function (we can also run into problems in high dimensions, but even there you can make progress if the objective function is nice enough).

So, we can (almost) solve any problem once we have a good model. That means the clear next thing to do is to come up with really good modelling techniques. Again, this is sort of true. There are actually three steps to constructing a good controller: experimental design, system identification, and a control policy.

System identification is the process of building a good model for your system given physical data from it. But to build a good model, you need good data. That’s where experimental design comes in. Many quick and dirty ways of collecting data (like measuring the response of a sinusoidal input) introduce flaws into the model (which cannot be fixed except by collecting more data). I will explain these issues in more detail in a later post. For simple systems, you can get away with still-quick but slightly-less-dirty methods (such as a chirp input), but for more general systems you need better techniques. Ian (a research scientist in our lab) has a nice paper on this topic that involves semidefinite optimization.

Designing a control policy we have already discussed. It is the process of, once we have collected data and built a model, designing an algorithm that will guide our system to its desired end state.

So, we have three tasks — experimental design, system identification, and control policy. If we can do the first two well, then we already know how to do the third. So one solution is to do a really good job on the experimental design and system identification so that we can use our sophisticated control techniques. This is what Michael and Zack are working on with a walking robot. Zack has spent months building the robot in such a way that it will obey all of our idealized models and behave nicely enough that we can run LQR-trees on it.

Another solution is to give up on having a perfect model and have a system identification algorithm that, instead of returning a single model, returns an entire space of models (for example, by giving an uncertainty on each parameter). Then, as long as we can build a controller that works for every system in this space, we’ll be good to go. This can be done by using techniques from robust control.

A final idea is to give up on models entirely and try to build a controller that relies only on the qualitative behaviour of the system in question (for example, by using model-free learning techniques). This is what I am working on. More specifically, I’m working on control in fluids with reasonably high Reynold’s number. Unless you can solve the Navier-Stokes equations, you can’t hope to get a model for this system, so you’ll have to do something more clever.

The first system we’re working with is the underwater cart-pole system. This involves a pendulum attached to a cart. The pendulum itself is not actuated (meaning there is no motor to swing it up). Instead, the only way to control the pendulum is indirectly, by moving the cart around (the cart is constrained to move in a line). The fluid dynamics enter when we put the pendulum in a water tunnel and replace the arm of the pendulum with a water foil.

When the pendulum is in a constant-velocity stream, the system becomes a cart and pendulum with non-linear damping. However, when we add objects to the stream, the objects shed vortices and the dynamics become too complicated to model with an ordinary differential equation. Instead, we need to simulate the solution to a partial differential equation, which is significantly more difficult computationally.

Our first goal is to design a controller that will stabilize the pendulum at the top in the case of a constant stream (we have already done this — more about that in a later post). Our next goal is to design a controller to swing the pendulum up to the top, again in a constant stream (this is what I hope to finish tomorrow — again, more details in a later post). Once we have these finished, the more interesting work begins — to accomplish the same tasks in the presence of vortices. If we were to use existing ideas, we would design a robust version of the controller for a constant stream, and treat the vortices as disturbances. And this will probably be the first thing we do, so that we have a standard of comparison. But there are many examples in nature of animals using vortices to their advantage. So our ultimate goal is to do the same in this simple system. Since vortices represent lots of extra energy in the water, our hope is to actually pull energy out of the vortex to aid in the swing-up task, thus actually using less energy than would be needed without vortices (if this sounds crazy, consider that dead trout can swim upstream using vortices).

Hopefully this gives you a good overview of my project. This is my first attempt at maintaining a research blog, so if you have any comments to help improve the exposition, please give them to me. Also, if you’d like me to elaborate further on anything, let me know. I’ll hopefully have a post or two going into more specific details this weekend.