
Problem 1

# Consider the discrete-time time-invariant one-dimensional system with $$X=\mathbb{R}$$, transitions $$x^{+}=x+u,$$ and control value space the nonnegative reals: $$\mathcal{U}=\mathbb{R}_{+} .$$ With $$q(t, x, u):=u^{2}, p(x):=x^{2}, \sigma=0$$, and $$\tau=3$$, find $$V(t, x)$$ for all $$x$$ and $$t=0,1,2$$ using the dynamic programming technique. Next guess a general formula for arbitrary $$t, \sigma, \tau$$ and establish its validity.

In summary, applying the dynamic programming technique to this one-dimensional discrete-time system gives the cost-to-go function $$V(t, x) = x^2$$ for $$x \ge 0$$ at every $$t$$, and $$V(2, x) = \frac{x^2}{2}$$, $$V(1, x) = \frac{x^2}{3}$$, $$V(0, x) = \frac{x^2}{4}$$ for $$x < 0$$. For arbitrary $$\sigma \le t \le \tau$$ the general formula is: $$V(t, x) = \frac{x^2}{\tau - t + 1}$$ for $$x < 0$$, and $$V(t,x) = x^2$$ for $$x \ge 0$$, with optimal control $$u^*(t, x) = \max\left(-\frac{x}{\tau - t + 1}, 0\right)$$. The formula does not depend on $$\sigma$$.

## Step 1: Write the dynamic programming equation

The dynamic programming equation, also known as the Bellman equation, is: $$V(t, x) = \min_{u \in \mathcal{U}} [q(t, x, u) + V(t+1, x^+)]$$ for $$\sigma \le t < \tau$$, with $$x^+ = x + u$$ and the terminal condition: $$V(\tau, x) = p(x)$$ Since $$q(t, x, u) = u^2$$, each stage requires minimizing $$u^2 + V(t+1, x+u)$$ over $$u \ge 0$$. We work backward from $$t = \tau = 3$$ to $$t = 0$$, finding at each stage the optimal control $$u^* \in \mathcal{U}$$ that attains this minimum.
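The backward recursion can be sketched numerically. This is a discretization chosen for illustration, not part of the original solution: states are restricted to the lattice $$x = ih$$ and controls to $$u = jh$$ with $$j \ge 0$$, so $$x + u$$ always lands back on the lattice and no interpolation is needed. The spacing $$h = 0.01$$, the range $$[-2, 2]$$, and the helper names `solve_lattice` and `lookup` are assumptions. Because the controls are limited to a grid, the tabulated values are upper bounds on the true $$V(t, x)$$.

```python
def solve_lattice(tau=3, h=0.01, x_max=2.0):
    """Approximate V(t, x) by backward dynamic programming on a lattice.

    States are x = i*h for i = -n..n and controls are u = j*h with j >= 0,
    so x + u = (i + j)*h stays on the lattice. Controls that would leave the
    grid are skipped; u = 0 (optimal for x >= 0) is always available.
    """
    n = round(x_max / h)
    xs = [i * h for i in range(-n, n + 1)]       # lattice states, index k = i + n
    V = [x * x for x in xs]                      # terminal condition V(tau, x) = p(x) = x^2
    tables = {tau: V}
    for t in range(tau - 1, -1, -1):             # t = tau-1, ..., 0 (sigma = 0)
        V_next = V
        V = [
            # Bellman step: min over u = j*h >= 0 of q = u^2 plus V(t+1, x + u)
            min((j * h) ** 2 + V_next[k + j] for j in range(len(xs) - k))
            for k in range(len(xs))
        ]
        tables[t] = V
    return xs, tables


def lookup(xs, table, x):
    """Return the tabulated value at the lattice point closest to x."""
    k = min(range(len(xs)), key=lambda idx: abs(xs[idx] - x))
    return table[k]


xs, tables = solve_lattice()
for t in (2, 1, 0):
    print(t, round(lookup(xs, tables[t], -1.0), 4), round(lookup(xs, tables[t], 1.0), 4))
```

At $$x = -1$$ the printed values come out close to $$\frac{1}{2}$$, $$\frac{1}{3}$$ and $$\frac{1}{4}$$ for $$t = 2, 1, 0$$, and at $$x = 1$$ they equal $$1$$ for every $$t$$.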

## Step 2: Calculate V(3, x)

The terminal condition is given as: $$V(\tau, x) = V(3, x) = p(x) = x^2$$

## Step 3: Calculate V(2, x)

Substitute $$t = 2$$ into the Bellman equation. Since $$V(3, x+u) = (x + u)^2$$: $$V(2, x) = \min_{u \in \mathcal{U}} [u^2 + (x + u)^2]$$ The objective is a convex quadratic in $$u$$. Setting its derivative to zero, $$\frac{d}{du}[u^2 + (x + u)^2] = 2u + 2(x + u) = 0$$, gives the unconstrained minimizer $$u = -\frac{x}{2}$$. Because the control must be nonnegative and the objective is convex, the constrained minimizer is the unconstrained one clipped to $$[0, \infty)$$: $$u^* = \max\left(-\frac{x}{2}, 0\right)$$ For $$x < 0$$, $$u^* = -\frac{x}{2}$$ and $$x + u^* = \frac{x}{2}$$, so the cost is $$\frac{x^2}{4} + \frac{x^2}{4} = \frac{x^2}{2}$$. For $$x \ge 0$$, $$u^* = 0$$ and the cost is $$x^2$$. Hence $$V(2, x) = \frac{x^2}{2}$$ for $$x < 0$$, and $$V(2, x) = x^2$$ for $$x \ge 0$$
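The $$t = 2$$ stage can be checked numerically. The sketch compares the clipped minimizer $$u^* = \max\left(-\frac{x}{2}, 0\right)$$ and the resulting closed form with a brute-force search over a uniform grid of nonnegative controls. The grid bound `u_max = 5.0`, the number of grid points, and the helper names are illustrative assumptions.

```python
def v2_closed_form(x):
    """V(2, x) from u* = max(-x/2, 0): x^2/2 if x < 0, else x^2."""
    return x * x / 2 if x < 0 else x * x


def u2_star(x):
    """Unconstrained minimizer -x/2 clipped to the control set U = [0, inf)."""
    return max(0.0, -x / 2)


def stage_objective(x, u):
    """Right-hand side of the Bellman equation at t = 2: u^2 + V(3, x + u)."""
    return u * u + (x + u) ** 2


def v2_brute_force(x, u_max=5.0, steps=50_000):
    """Minimize the stage objective over a uniform grid of u in [0, u_max]."""
    return min(stage_objective(x, u_max * j / steps) for j in range(steps + 1))


for x in (-3.0, -1.0, 0.0, 2.0):
    print(x, u2_star(x), v2_closed_form(x), v2_brute_force(x))
```

Clipping is valid here only because the objective is convex in $$u$$; for a nonconvex stage cost the constrained minimum would have to be found by comparing the boundary $$u = 0$$ with every feasible stationary point.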

## Step 4: Calculate V(1, x)

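Following the same procedure for $$t = 1$$: $$V(1, x) = \min_{u \in \mathcal{U}} [u^2 + V(2, x+u)]$$ If $$x \ge 0$$, then $$x + u \ge 0$$ for every $$u \ge 0$$, so the objective is $$u^2 + (x + u)^2 \ge x^2$$, with equality at $$u^* = 0$$. If $$x < 0$$, consider first $$0 \le u \le -x$$, where $$x + u \le 0$$ and the objective is $$u^2 + \frac{(x + u)^2}{2}$$. Its derivative $$2u + (x + u)$$ vanishes at $$u^* = -\frac{x}{3}$$, which lies in this interval and gives $$\frac{x^2}{9} + \frac{1}{2}\cdot\frac{4x^2}{9} = \frac{x^2}{3}$$. Any $$u > -x$$ costs at least $$u^2 > x^2 > \frac{x^2}{3}$$, so it cannot be optimal. Therefore: $$V(1, x) = \left\{ \begin{array}{ll} x^2 & \mbox{if } x \ge 0 \\ \frac{x^2}{3} & \mbox{if } x < 0 \end{array} \right.$$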
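The case analysis for $$t = 1$$ can be reproduced in exact rational arithmetic. The objective $$u^2 + V(2, x+u)$$ is a convex quadratic on each side of the kink $$u = -x$$, so it suffices to compare a few candidate controls rather than search a grid. This candidate-enumeration shortcut and the names `V1` and `V2` are our own illustration; `V2` encodes the piecewise form $$\frac{y^2}{2}$$ for $$y < 0$$, $$y^2$$ for $$y \ge 0$$.

```python
from fractions import Fraction


def V2(y):
    """V(2, y): y^2/2 for y < 0 and y^2 for y >= 0."""
    return y * y / 2 if y < 0 else y * y


def V1(x):
    """Exact V(1, x) = min over u >= 0 of u^2 + V(2, x + u).

    For x >= 0 the objective u^2 + (x + u)^2 increases in u, so u = 0 wins.
    For x < 0 it is convex on [0, -x] with stationary point u = -x/3 inside,
    and increasing on [-x, inf) (its stationary point -x/2 lies left of -x),
    so the minimum is at u = 0, u = -x/3, or the kink u = -x.
    """
    x = Fraction(x)
    candidates = [Fraction(0)]
    if x < 0:
        candidates += [-x / 3, -x]
    return min(u * u + V2(x + u) for u in candidates)


for x in (-3, -1, Fraction(-1, 2), 2):
    print(x, V1(x))
```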

## Step 5: Calculate V(0, x)

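Finally, for $$t = 0$$: $$V(0, x) = \min_{u \in \mathcal{U}} [u^2 + V(1, x+u)]$$ For $$x \ge 0$$ the same argument as before gives $$u^* = 0$$ and $$V(0, x) = x^2$$. For $$x < 0$$ and $$0 \le u \le -x$$, the objective is $$u^2 + \frac{(x + u)^2}{3}$$; its derivative $$2u + \frac{2(x + u)}{3}$$ vanishes at $$u^* = -\frac{x}{4}$$, giving $$\frac{x^2}{16} + \frac{1}{3}\cdot\frac{9x^2}{16} = \frac{x^2}{4}$$, while any $$u > -x$$ costs more than $$x^2$$. Hence $$V(0, x) = \frac{x^2}{4}$$ for $$x < 0$$, and $$V(0,x) = x^2$$ for $$x \ge 0$$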
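The three stages follow one pattern on the negative half-line. If $$V(t+1, y) = a_{t+1} y^2$$ for $$y \le 0$$, minimizing $$u^2 + a_{t+1}(x + u)^2$$ gives $$u^* = -\frac{a_{t+1}}{1 + a_{t+1}}x$$ and $$V(t, x) = a_t x^2$$ with $$a_t = \frac{a_{t+1}}{1 + a_{t+1}}$$, so the optimal gain equals $$a_t$$ itself. The sketch below iterates this recursion exactly from $$a_\tau = 1$$; the coefficient notation $$a_t$$ and the function name are introduced here for illustration.

```python
from fractions import Fraction


def negative_side_coefficients(tau=3, sigma=0):
    """Return {t: a_t} with V(t, x) = a_t * x**2 for x < 0, computed backward.

    With V(t+1, y) = a*y**2 for y <= 0, minimizing u**2 + a*(x + u)**2 over u
    gives u* = -a*x/(1 + a), which lies in (0, -x) for x < 0, and the value
    a/(1 + a) * x**2. Controls u > -x cost more than x**2 and are never better.
    The optimal feedback at time t is therefore u*(t, x) = -a_t * x for x < 0.
    """
    a = Fraction(1)                      # a_tau = 1 because p(x) = x^2
    coeffs = {tau: a}
    for t in range(tau - 1, sigma - 1, -1):
        a = a / (1 + a)                  # a_t = a_{t+1} / (1 + a_{t+1})
        coeffs[t] = a
    return coeffs


for t, a in sorted(negative_side_coefficients().items()):
    print(t, a)
```

For $$\tau = 3$$ it returns $$a_2 = \frac{1}{2}$$, $$a_1 = \frac{1}{3}$$, $$a_0 = \frac{1}{4}$$. Taking reciprocals, $$\frac{1}{a_t} = \frac{1}{a_{t+1}} + 1$$, which suggests the closed form $$a_t = \frac{1}{\tau - t + 1}$$.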

## Step 6: Generalize the result for arbitrary t, σ, τ

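For $$\sigma \le t \le \tau$$ the computations above suggest: $$V(t, x) = \frac{x^2}{\tau - t + 1}$$ for $$x < 0$$, and $$V(t,x) = x^2$$ for $$x \ge 0$$, with optimal feedback $$u^*(t, x) = \max\left(-\frac{x}{\tau - t + 1}, 0\right)$$. The formula does not involve $$\sigma$$, which only fixes the initial time of the horizon.

We prove it by backward induction on $$t$$. At $$t = \tau$$ it reduces to $$V(\tau, x) = x^2 = p(x)$$. Suppose it holds at time $$t + 1 \le \tau$$ and write $$k = \tau - t \ge 1$$, so that $$V(t+1, y) = \frac{y^2}{k}$$ for $$y < 0$$ and $$V(t+1, y) = y^2$$ for $$y \ge 0$$. If $$x \ge 0$$, then $$x + u \ge 0$$ for every $$u \ge 0$$, so $$u^2 + (x + u)^2 \ge x^2$$ with equality at $$u = 0$$; hence $$V(t, x) = x^2$$. If $$x < 0$$ and $$0 \le u \le -x$$, the cost $$u^2 + \frac{(x + u)^2}{k}$$ is convex in $$u$$ and its derivative vanishes at $$u = -\frac{x}{k+1}$$, which lies in $$(0, -x)$$; the resulting value is $$\frac{x^2}{(k+1)^2} + \frac{k x^2}{(k+1)^2} = \frac{x^2}{k+1}$$. If $$u > -x$$, the cost is at least $$u^2 > x^2 > \frac{x^2}{k+1}$$. Therefore $$V(t, x) = \frac{x^2}{k+1} = \frac{x^2}{\tau - t + 1}$$, which completes the induction.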
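The general feedback can also be checked by forward simulation: starting from $$x_\sigma$$, apply $$u^*(t, x) = \max\left(-\frac{x}{\tau - t + 1}, 0\right)$$, accumulate $$u^2$$ along the trajectory, add the terminal cost $$x_\tau^2$$, and compare with $$\frac{x_\sigma^2}{\tau - \sigma + 1}$$ (or $$x_\sigma^2$$ when $$x_\sigma \ge 0$$). This only confirms that the claimed cost is achieved, i.e. that it is an upper bound on $$V(\sigma, x_\sigma)$$; optimality itself requires the Bellman argument, not simulation. The function names `V`, `u_star` and `closed_loop_cost` are hypothetical.

```python
from fractions import Fraction


def V(t, x, tau):
    """Conjectured value function: x^2/(tau - t + 1) for x < 0, x^2 for x >= 0."""
    return x * x / (tau - t + 1) if x < 0 else x * x


def u_star(t, x, tau):
    """Feedback implied by the formula: -x/(tau - t + 1) for x < 0, else 0."""
    return -x / (tau - t + 1) if x < 0 else Fraction(0)


def closed_loop_cost(x, sigma, tau):
    """Total cost of running u_star from (sigma, x) up to the final time tau."""
    cost = Fraction(0)
    for t in range(sigma, tau):
        u = u_star(t, x, tau)
        cost += u * u        # running cost q(t, x, u) = u^2
        x = x + u            # transition x+ = x + u
    return cost + x * x      # terminal cost p(x) = x^2


for x0 in (Fraction(-1), Fraction(-6), Fraction(3)):
    print(x0, closed_loop_cost(x0, 0, 3), V(0, x0, 3))
```

For $$x_\sigma < 0$$ every control along this trajectory turns out to equal $$-\frac{x_\sigma}{\tau - \sigma + 1}$$, so the optimal policy spreads the correction evenly over the $$\tau - \sigma$$ stages and leaves $$x_\tau = \frac{x_\sigma}{\tau - \sigma + 1}$$.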
