7. Temporal Difference Learning
Activity 7.01: Using TD(0) Q-Learning to Solve FrozenLake-v0 Stochastic Transitions
- Import the required modules:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import gym
- Instantiate the gym environment called FrozenLake-v0 with the is_slippery flag set to True in order to enable stochasticity:
env = gym.make('FrozenLake-v0', is_slippery=True)
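To see what the is_slippery flag changes, the following is a minimal sketch that does not call gym at all. It assumes the slippery rule as implemented in FrozenLake-v0: the agent moves in the requested direction or in one of the two perpendicular directions, each with probability 1/3. The names slippery_candidates and sample_actual_move are hypothetical helpers introduced only for illustration.

```python
import random

# Action encoding used by FrozenLake: 0 = Left, 1 = Down, 2 = Right, 3 = Up.
N_ACTIONS = 4

def slippery_candidates(action):
    """Hypothetical helper: directions the agent may actually move in.

    Assumption: with is_slippery=True the requested action and its two
    perpendicular neighbours are equally likely (1/3 each).
    """
    return [(action - 1) % N_ACTIONS, action, (action + 1) % N_ACTIONS]

def sample_actual_move(action, rng=random):
    """Draw one of the three candidate directions uniformly at random."""
    return rng.choice(slippery_candidates(action))

# Requesting "Right" (2) can result in Down (1), Right (2), or Up (3).
print(slippery_candidates(2))
```

Under this assumption the agent never slips directly backwards, but it follows the requested direction only about one time in three, which is why the learned policy has to account for the transition noise.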
- Take a look at the action and observation spaces:
print("Action space = ", env.action_space)
print("Observation space = ", env.observation_space)
This will print out the following:
Action space = Discrete(4)
Observation space = Discrete(16)
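The 16 discrete observations correspond to the cells of the default 4x4 map, numbered row by row from the top-left corner. As a small sketch of that numbering, the helpers below convert between a state index and its grid cell. The names state_to_cell, cell_to_state, and N_COLS are hypothetical and not part of gym, and the 4x4 layout is an assumption that holds only for the default map.

```python
# Hypothetical helpers, not part of gym. They assume the default 4x4 map,
# where states are numbered row by row starting at the top-left cell.
N_COLS = 4

def state_to_cell(state, n_cols=N_COLS):
    """Return the (row, col) cell for a state index."""
    return divmod(state, n_cols)

def cell_to_state(row, col, n_cols=N_COLS):
    """Return the state index for a (row, col) cell."""
    return row * n_cols + col

print(state_to_cell(0))   # start cell, top-left corner
print(state_to_cell(15))  # goal cell, bottom-right corner
```

With this mapping, a Q-table of shape (16, 4) has one row per grid cell and one column per move.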
- Create two dictionaries to easily translate the action numbers into moves:
actionsDict = {}
actionsDict[0] = " L "
actionsDict[1] = " D "
actionsDict[2] = " R "
actionsDict[3] = " U "
actionsDictInv = {}
actionsDictInv["L"] = 0
actionsDictInv["D"...