OPTIMAL CONTROL BY DIRECT APPROXIMATION OF THE GRADIENT OF THE COST-TO-GO

Douglas B. Tweed

Keywords

Optimal control, learning algorithms, nonlinear systems

Abstract

A promising approach to optimal control is to start with a non-optimal controller u^(1) and improve it. One very efficient example is the method of generalized Hamilton–Jacobi–Bellman (GHJB) equations, which learns an approximation to the gradient ∇J^(1) of the cost-to-go function of u^(1), uses that gradient to define a better controller u^(2), and then repeats, creating a sequence u^(n) that converges to the optimal controller. Here we point out that GHJB works indirectly, in the sense that it does not learn the best approximation to ∇J but instead learns the time derivative dJ/dt and infers ∇J from that. We show that we can get lower-cost controllers with fewer adjustable parameters by learning ∇J directly. We then compare this direct method with GHJB on test problems from the literature.
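For concreteness, the following is a minimal sketch of one such improvement step on a control-affine system ẋ = f(x) + g(x)u with running cost q(x) + uᵀRu, for which the improved controller is u^(2)(x) = -(1/2)R⁻¹g(x)ᵀ∇J^(1)(x). It illustrates the direct idea only and is not the paper's implementation: the dynamics f and g, the cost terms q and R, the initial controller, the basis, and the sample set are all assumptions made for the example. Instead of fitting a scalar approximation of J and differentiating it, the sketch fits an approximation of ∇J itself (here the state is scalar, so the gradient is too) by least squares on the GHJB residual ∇Jᵀ(f + gu) + q + uᵀRu = 0.

    # Minimal sketch of one gradient-based policy-improvement step
    # (illustrative assumptions throughout; not the paper's code).
    # System: x_dot = f(x) + g(x)*u,  cost rate: q(x) + R*u^2.
    import numpy as np

    def f(x):            # assumed drift dynamics (scalar example)
        return -x + x**3

    def g(x):            # assumed input map
        return np.ones_like(x)

    q = lambda x: x**2   # assumed state cost
    R = 1.0              # assumed control weight

    def u1(x):           # initial non-optimal controller u^(1)
        return -2.0 * x

    # Assumed basis for the gradient approximation:
    # gradJ_hat(x) = w1*x + w2*x**3
    def basis(x):
        return np.stack([x, x**3], axis=-1)      # shape (N, 2)

    # Sample states and impose the GHJB residual
    #   gradJ(x)*(f(x) + g(x)*u(x)) + q(x) + R*u(x)^2 = 0
    # as a linear least-squares problem in the weights w.
    xs = np.linspace(-1.5, 1.5, 201)
    u = u1(xs)
    xdot = f(xs) + g(xs) * u
    A = basis(xs) * xdot[:, None]                # gradJ_hat(x) * x_dot
    b = -(q(xs) + R * u**2)                      # target: -(q + R*u^2)
    w, *_ = np.linalg.lstsq(A, b, rcond=None)

    def gradJ(x):                                # learned gradient of cost-to-go
        return basis(x) @ w

    def u2(x):                                   # improved controller u^(2)
        return -0.5 / R * g(x) * gradJ(x)

Iterating this step (re-sampling under u^(2), refitting the gradient, and so on) produces the sequence u^(n) described in the abstract; the point of the direct method is that the adjustable parameters go into ∇J itself rather than into a scalar approximation of J.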
