Prerequisites
Value functions represent CDE rewards from a state
The value function $V^\pi(s)$ is defined as the CDE rewards conditioned on running the policy $\pi$ from a given state $s$:

$$V^\pi(s) = \mathbb{E}_{\tau \sim \pi}\left[\sum_{t=0}^{\infty} \gamma^t r_t \,\middle|\, s_0 = s\right]$$
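As a concrete illustration, $V^\pi(s)$ can be estimated by averaging discounted returns over rollouts of $\pi$ started at $s$. The sketch below assumes a hypothetical environment interface (`env.reset(state)` and `env.step(action)` returning `(next_state, reward, done)`) and a policy given as a callable; those names are illustrative, not from the text.

```python
import numpy as np

def mc_value_estimate(env, policy, state, gamma=0.99, n_rollouts=1000, horizon=200):
    """Monte Carlo estimate of V^pi(state): average discounted return over rollouts."""
    returns = []
    for _ in range(n_rollouts):
        s = env.reset(state)              # assumed: env can be reset to a chosen state
        total, discount = 0.0, 1.0
        for _ in range(horizon):
            a = policy(s)                 # sample an action from the policy
            s, r, done = env.step(a)      # assumed interface: (next_state, reward, done)
            total += discount * r
            discount *= gamma
            if done:
                break
        returns.append(total)
    return float(np.mean(returns))
```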
Given a policy $\pi$ and an MDP, estimating $V^\pi$ is called policy evaluation. One way to understand why the value function is useful (i.e., why evaluating a policy is useful) is to rewrite our objective above in terms of it:

$$\begin{aligned} J(\pi) &= \mathbb{E}_{\tau \sim \pi}\left[\sum_{t=0}^{\infty} \gamma^t r_t\right] \\ &= \mathbb{E}_{s_0 \sim \rho_0}\left[V^\pi(s_0)\right] \end{aligned}$$
Notice how in the first line the expectation is over a distribution of trajectories, while in the second line it is over a distribution of (starting) states. This means that the cumulative discounted expected reward of a policy (an expectation over trajectories) is equal to the expected value function of that policy (an expectation over the distribution of starting states).
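To make the equivalence concrete, here is a small numerical check on a toy three-state MDP (the transition matrix, rewards, and start distribution below are invented for illustration, with the policy already folded into the transition matrix): the discounted return averaged over sampled trajectories should match the Monte Carlo value estimates averaged over the starting-state distribution, up to sampling noise.

```python
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.9

# Toy 3-state MDP under a fixed policy:
# P[s, s'] is the state-transition matrix induced by the policy, r[s] the expected reward in s.
P = np.array([[0.7, 0.3, 0.0],
              [0.1, 0.6, 0.3],
              [0.2, 0.0, 0.8]])
r = np.array([1.0, 0.0, 2.0])
rho0 = np.array([0.5, 0.5, 0.0])  # distribution over starting states

def discounted_return(s, horizon=200):
    """Roll the chain forward from state s and accumulate the discounted reward."""
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        total += discount * r[s]
        discount *= gamma
        s = rng.choice(3, p=P[s])
    return total

# First line of the objective: expectation over trajectories (start states drawn from rho0).
J_traj = np.mean([discounted_return(rng.choice(3, p=rho0)) for _ in range(5000)])

# Second line: expectation over starting states of the (Monte Carlo) value function.
V = [np.mean([discounted_return(s) for _ in range(5000)]) for s in range(3)]
J_start = float(rho0 @ np.array(V))

print(J_traj, J_start)  # the two estimates should agree up to Monte Carlo noise
```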