So far, we have randomly picked an action and applied it to the game. Now, let's apply a DQN to select actions for playing the PacMan game.
- We define the q_nn policy function, policy_q_nn, as follows:
def policy_q_nn(obs, env):
    # Exploration strategy - select a random action
    if np.random.random() < explore_rate:
        action = env.action_space.sample()
    # Exploitation strategy - select the action with the highest Q-value
    else:
        action = np.argmax(q_nn.predict(np.array([obs])))
    return action
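The function above assumes that a Q-network named q_nn and an exploration probability explore_rate already exist in the surrounding program. As a point of reference only, a minimal sketch of such a setup, here assumed to be a small Keras model that outputs one Q-value per discrete action of the MsPacman-v0 environment, could look as follows (the environment id, layer sizes, and explore_rate value are illustrative assumptions, not the chapter's exact settings):

# Minimal sketch (assumptions, not the chapter's exact model): a small Keras MLP
# that maps a raw game frame to one Q-value per discrete action.
import numpy as np
import gym
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten

env = gym.make('MsPacman-v0')          # assumed environment id; reuse the env created earlier if available
n_actions = env.action_space.n         # number of discrete PacMan actions

q_nn = Sequential([
    Flatten(input_shape=env.observation_space.shape),  # flatten the raw frame
    Dense(512, activation='relu'),
    Dense(n_actions, activation='linear')               # one Q-value per action
])
q_nn.compile(optimizer='adam', loss='mse')

explore_rate = 0.1                     # epsilon for epsilon-greedy exploration (illustrative value)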
- Next, we modify the episode function to incorporate the calculation of the Q-values and to train the neural network on mini-batches sampled from the experience buffer. This is shown in the following code; a sketch of how the elided part of the loop might look follows the snippet:
def episode(env, policy, r_max=0, t_max=0):
    # create the empty list to contain game memory
    #memory = deque(maxlen=1000)
    # observe initial state
    ...
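The remainder of the episode function is elided above. As an illustration only, here is a minimal sketch of how such a loop might be completed, assuming a global replay buffer memory, a discount factor discount_rate, and a mini-batch size batch_size (these names and values are assumptions for illustration, not necessarily the book's exact code), and reusing the q_nn and policy defined earlier:

# Minimal sketch (assumptions: global memory deque, discount_rate, batch_size;
# q_nn and the epsilon-greedy policy are the objects defined above).
import numpy as np
from collections import deque

memory = deque(maxlen=1000)   # global experience replay buffer (assumption)
discount_rate = 0.9           # gamma (illustrative value)
batch_size = 32               # mini-batch size for replay training (illustrative value)

def episode(env, policy, r_max=0, t_max=0):
    # observe the initial state
    obs = env.reset()
    episode_reward = 0
    done = False
    t = 0
    while not done:
        # select an action with the epsilon-greedy policy
        action = policy(obs, env)
        next_obs, reward, done, info = env.step(action)
        # store the transition in the replay buffer
        memory.append((obs, action, reward, next_obs, done))
        episode_reward += reward
        obs = next_obs
        t += 1
        if t_max and t >= t_max:
            break
    # train on a random mini-batch of past transitions (experience replay)
    if len(memory) >= batch_size:
        idx = np.random.choice(len(memory), batch_size, replace=False)
        batch = [memory[i] for i in idx]
        states = np.array([b[0] for b in batch])
        next_states = np.array([b[3] for b in batch])
        q_values = q_nn.predict(states)
        q_next = q_nn.predict(next_states)
        for i, (s, a, r, s2, d) in enumerate(batch):
            # Q-learning target: r + gamma * max_a' Q(s', a'), with no future value on terminal states
            q_values[i, a] = r if d else r + discount_rate * np.max(q_next[i])
        q_nn.fit(states, q_values, epochs=1, verbose=0)
    return episode_reward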