The last step in reinforcement learning is to run the experiment. To do this, we drop the agent into the environment and let it take steps until it reaches the goal. The agent is constrained by a limited set of possible moves, and the environment adds a further constraint, in our case by setting boundaries. We set up a for loop in which, on each iteration, the agent attempts a legal move and we then check whether the maze has been solved. The loop runs for at most 5,000 iterations and stops when the agent reaches the goal. To begin our experiment with our defined agent and environment, we write the following code:
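# Reset the environment and get the starting state
state = reset(env)
for (j in 1:5000) {
  # Ask the agent for an action given the current state
  action = agent$act(state)
  # Apply the action; nrd holds the next state, the reward, and the done flag
  nrd = step(env, action)
  next_state = unlist(nrd[1])
  reward = as.integer(nrd[2])
  done = as.logical(nrd[3])
  # Repackage the next state as a matrix
  next_state = matrix(c(next_state[1],next_state...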