- Monte Carlo methods apply only to episodic tasks, whereas TD learning can be applied to both episodic and non-episodic (continuing) tasks
- The difference between the actual value (the TD target, r + γV(s')) and the predicted value, V(s), is called the TD error, as shown in the TD(0) sketch after this list
- Refer to the sections TD prediction and TD control
- Refer to the section Solving the taxi problem using Q learning
- In Q learning, we take the action using an epsilon-greedy policy, but while updating the Q value we simply pick the maximum Q value over the next state's actions, so the policy we follow and the policy we update toward differ (off-policy). In SARSA, we take the action using the epsilon-greedy policy and, while updating the Q value, we also pick the next action using the same epsilon-greedy policy (on-policy); see the Q learning versus SARSA sketch after this list
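The following is a minimal TD(0) sketch showing how the TD error is computed and used in the update; the state count, learning rate, and discount factor are illustrative assumptions, not values from the chapter:

```python
import numpy as np

# Hypothetical toy setup: 5 states, value estimates initialized to zero.
V = np.zeros(5)
alpha = 0.1   # learning rate (assumed for illustration)
gamma = 0.9   # discount factor (assumed for illustration)

def td_update(state, reward, next_state):
    """One TD(0) update: move V(state) toward the TD target by the TD error."""
    td_target = reward + gamma * V[next_state]   # r + gamma * V(s')
    td_error = td_target - V[state]              # actual (target) minus predicted
    V[state] += alpha * td_error
    return td_error
```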

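To make the Q learning versus SARSA contrast concrete, here is a minimal tabular sketch of the two update rules; the table shape and hyperparameters are illustrative assumptions:

```python
import numpy as np

# Hypothetical toy setup: tabular Q values for 5 states x 2 actions.
Q = np.zeros((5, 2))
alpha, gamma, epsilon = 0.1, 0.9, 0.1  # illustrative hyperparameters

def epsilon_greedy(state):
    """Pick a random action with probability epsilon, else the greedy one."""
    if np.random.rand() < epsilon:
        return np.random.randint(Q.shape[1])
    return int(np.argmax(Q[state]))

def q_learning_update(s, a, r, s_next):
    # Off-policy: bootstrap from the maximum Q value in s_next,
    # regardless of which action the epsilon-greedy policy takes there.
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])

def sarsa_update(s, a, r, s_next, a_next):
    # On-policy: bootstrap from Q of the action a_next that the
    # epsilon-greedy policy actually selected in s_next.
    target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (target - Q[s, a])
```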