- What is the difference between a reward and a value?
- What is a hyperparameter? Give an example of a hyperparameter other than the ones discussed in this chapter.
- Why will a Q-learning agent not always choose the action with the highest Q-value for its current state?
- Explain one benefit of a decaying gamma.
- Describe in one or two sentences the difference between the decision-making strategies of SARSA and Q-learning.
- What kind of policy does Q-learning implicitly assume the agent is following?
- Under what circumstances will SARSA and Q-learning produce the same results?