- What's the main property of the Monte Carlo method used in RL?
- Why are Monte Carlo methods considered offline, i.e. why must they wait until the end of an episode before updating?
- What are the two main ideas of TD learning?
- What are the differences between Monte Carlo and TD?
- Why is exploration important in TD learning?
- Why is Q-learning off-policy?
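
A minimal tabular sketch can make several of these questions concrete. It assumes a toy episode `A -> B -> terminal` with rewards 0 and 1, a constant step size `alpha` (constant-α Monte Carlo rather than sample averaging), and hypothetical helper names (`mc_update`, `td0_update`, `epsilon_greedy`, `q_learning_update`). It is an illustration under those assumptions, not a reference implementation. The MC update needs the whole episode. TD(0) updates after every single step from a sampled transition and a bootstrapped estimate. Q-learning's target takes a max over next actions no matter what the ε-greedy behaviour policy does next.

```python
import random
from collections import defaultdict


def mc_update(V, episode, alpha=0.1, gamma=1.0):
    """Constant-alpha every-visit Monte Carlo.
    episode: list of (state, reward) pairs, where reward is R_{t+1}.
    It can only run once the episode is over, because the target G_t
    needs every future reward (this is why MC is 'offline')."""
    G = 0.0
    # Walk backwards so G accumulates the discounted return from each step.
    for state, reward in reversed(episode):
        G = reward + gamma * G
        V[state] += alpha * (G - V[state])
    return V


def td0_update(V, state, reward, next_state, done, alpha=0.1, gamma=1.0):
    """TD(0): one update per step.
    Sampling: uses a single observed transition, not the full model.
    Bootstrapping: V[next_state] stands in for the rest of the return."""
    target = reward + (0.0 if done else gamma * V[next_state])
    V[state] += alpha * (target - V[state])
    return V


def epsilon_greedy(Q, state, actions, epsilon, rng):
    """Behaviour policy: explore with probability epsilon, else act greedily."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])


def q_learning_update(Q, state, action, reward, next_state, done, actions,
                      alpha=0.1, gamma=1.0):
    """Off-policy: the target uses the greedy (max) action in next_state,
    independent of the action the behaviour policy will actually take."""
    best_next = 0.0 if done else max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
    return Q


if __name__ == "__main__":
    episode = [("A", 0.0), ("B", 1.0)]  # toy episode: A -> B -> terminal

    # Monte Carlo: a single update pass after the episode has finished.
    V_mc = mc_update(defaultdict(float), episode, alpha=0.5)
    print(f"MC:  A={V_mc['A']:.1f}, B={V_mc['B']:.1f}")  # MC:  A=0.5, B=0.5

    # TD(0): updates online, step by step, as transitions arrive.
    V_td = defaultdict(float)
    td0_update(V_td, "A", 0.0, "B", False, alpha=0.5)
    td0_update(V_td, "B", 1.0, None, True, alpha=0.5)
    print(f"TD:  A={V_td['A']:.1f}, B={V_td['B']:.1f}")  # TD:  A=0.0, B=0.5

    # Q-learning with an epsilon-greedy behaviour policy.
    rng = random.Random(0)
    Q = defaultdict(float)
    actions = ["left", "right"]
    a = epsilon_greedy(Q, "A", actions, epsilon=0.1, rng=rng)
    q_learning_update(Q, "A", a, 0.0, "B", False, actions, alpha=0.5)
```

After one episode, TD(0) has not yet propagated the final reward back to `A`, while MC has. TD, however, could update during the episode and needs no episode end at all. Without the random branch in `epsilon_greedy`, ties and early estimates could lock the agent into one action forever, which is one way to see why exploration matters.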