- What is the primary limitation of Q-learning algorithms?
- Why are stochastic policy gradient algorithms sample inefficient?
- How does DPG overcome the maximization problem?
- How does DPG guarantee enough exploration?
- What does DDPG stand for, and what is its main contribution?
- Which problems does TD3 aim to mitigate?
- What new mechanisms does TD3 employ?