Prerequisites
Value functions represent CDE rewards from a state
The value function $V^\pi(s)$ is defined as the CDE rewards conditioned on running the policy $\pi$ from a given state $s$:

$$V^\pi(s) = \mathbb{E}_{\tau \sim \pi}\left[\sum_{t=0}^{\infty} \gamma^t r_t \,\middle|\, s_0 = s\right]$$
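As a concrete illustration, $V^\pi(s)$ can be estimated by averaging discounted returns over rollouts of $\pi$ started at $s$. The sketch below assumes a hypothetical environment interface (`env.reset(state)` and `env.step(action)` returning `(next_state, reward, done)`) and a policy given as a callable; those names are illustrative, not from the text.

```python
import numpy as np

def mc_value_estimate(env, policy, state, gamma=0.99, n_rollouts=1000, horizon=200):
    """Monte Carlo estimate of V^pi(state): average discounted return over rollouts."""
    returns = []
    for _ in range(n_rollouts):
        s = env.reset(state)              # assumed: env can be reset to a chosen state
        total, discount = 0.0, 1.0
        for _ in range(horizon):
            a = policy(s)                 # sample an action from the policy
            s, r, done = env.step(a)      # assumed interface: (next_state, reward, done)
            total += discount * r
            discount *= gamma
            if done:
                break
        returns.append(total)
    return float(np.mean(returns))
```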
Given a policy $\pi$ and an MDP, estimating $V^\pi$ is called policy evaluation. One way to understand why the value function is useful (i.e., why evaluating a policy is useful) is to rewrite our objective above in terms of it:

$$\begin{aligned} J(\pi) &= \mathbb{E}_{\tau \sim \pi}\left[\sum_{t=0}^{\infty} \gamma^t r_t\right] \\ &= \mathbb{E}_{s_0 \sim \rho_0}\left[V^\pi(s_0)\right] \end{aligned}$$
Notice how in the first line the expectation is over a distribution of trajectories, while in the second line it is over a distribution of (starting) states. This means that the cumulative discounted expected reward of a policy (an expectation over trajectories) is equal to the expected value function of that policy (an expectation over the distribution of starting states).
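To make the equivalence concrete, here is a small numerical check on a toy three-state MDP (the transition matrix, rewards, and start distribution below are invented for illustration, with the policy already folded into the transition matrix): the discounted return averaged over sampled trajectories should match the Monte Carlo value estimates averaged over the starting-state distribution, up to sampling noise.

```python
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.9

# Toy 3-state MDP under a fixed policy:
# P[s, s'] is the state-transition matrix induced by the policy, r[s] the expected reward in s.
P = np.array([[0.7, 0.3, 0.0],
              [0.1, 0.6, 0.3],
              [0.2, 0.0, 0.8]])
r = np.array([1.0, 0.0, 2.0])
rho0 = np.array([0.5, 0.5, 0.0])  # distribution over starting states

def discounted_return(s, horizon=200):
    """Roll the chain forward from state s and accumulate the discounted reward."""
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        total += discount * r[s]
        discount *= gamma
        s = rng.choice(3, p=P[s])
    return total

# First line of the objective: expectation over trajectories (start states drawn from rho0).
J_traj = np.mean([discounted_return(rng.choice(3, p=rho0)) for _ in range(5000)])

# Second line: expectation over starting states of the (Monte Carlo) value function.
V = [np.mean([discounted_return(s) for _ in range(5000)]) for s in range(3)]
J_start = float(rho0 @ np.array(V))

print(J_traj, J_start)  # the two estimates should agree up to Monte Carlo noise
```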