The Action-Value Function (Q Value Function)
In the previous sections, we learned about the state-value function, which tells us how rewarding it is for an agent to be in a particular state. Now we will look at a second function, one that pairs states with actions. The action-value function tells us how good it is for the agent to take a given action from a given state. The action value is also called the Q value. The equation can be written as follows:
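One common way to write the action value is as the expected discounted return. This is a sketch in standard notation rather than the exact form used in this text: the policy symbol \pi, the discount factor \gamma (between 0 and 1), and the time-indexed reward r_{t+k+1} are all assumed here.

```latex
% Action value: the expected discounted return when starting in state s,
% taking action a, and following policy \pi afterwards.
% \pi, \gamma and r_{t+k+1} are assumed notation, not taken from the text.
Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[ \sum_{k=0}^{\infty} \gamma^{k} \, r_{t+k+1} \;\middle|\; s_t = s,\ a_t = a \right]
```

Read this way, the Q value differs from the state value only in that the first action is fixed to a instead of being chosen by the policy.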
The preceding equation can be written in an iterative fashion, as follows:
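A hedged sketch of the iterative form, assuming the action value is the expected discounted return with discount factor \gamma under a policy \pi (assumed notation, not taken from the text): splitting off the first reward from the rest of the return gives a recursion on Q itself.

```latex
% Step 1: the discounted return G_t splits into the first reward plus the
% discounted return from the next time step onward.
G_t = \sum_{k=0}^{\infty} \gamma^{k} \, r_{t+k+1} = r_{t+1} + \gamma \, G_{t+1}

% Step 2: taking expectations gives the recursive form for a fixed policy \pi,
% where a_{t+1} is the action that \pi selects in s_{t+1}.
Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[ r_{t+1} + \gamma \, Q^{\pi}(s_{t+1}, a_{t+1}) \;\middle|\; s_t = s,\ a_t = a \right]

% Step 3: for the optimal action value, the next action is the best one available.
Q^{*}(s, a) = \mathbb{E}\left[ r_{t+1} + \gamma \max_{a'} Q^{*}(s_{t+1}, a') \;\middle|\; s_t = s,\ a_t = a \right]
```

In a deterministic environment the expectation drops out, and the last line is often abbreviated to Q(s, a) = r + \gamma \max_{a'} Q(s', a'), where s' is the next state.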
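This equation is also known as the Bellman equation. From the equation, we can express the Q value of the current state-action pair in terms of the Q value of the next state-action pair. The Bellman equation can be described as follows: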
"The total expected reward being in state s and taking action a is the sum of two components: the reward (which is r) that we can...