Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
TensorFlow 2 Reinforcement Learning Cookbook

You're reading from   TensorFlow 2 Reinforcement Learning Cookbook Over 50 recipes to help you build, train, and deploy learning agents for real-world applications

Arrow left icon
Product type Paperback
Published in Jan 2021
Publisher Packt
ISBN-13 9781838982546
Length 472 pages
Edition 1st Edition
Languages
Arrow right icon
Author (1):
Arrow left icon
Palanisamy Palanisamy
Author Profile Icon Palanisamy
Palanisamy
Arrow right icon
View More author details
Toc

Table of Contents (11) Chapters Close

Preface 1. Chapter 1: Developing Building Blocks for Deep Reinforcement Learning Using Tensorflow 2.x 2. Chapter 2: Implementing Value-Based, Policy-Based, and Actor-Critic Deep RL Algorithms FREE CHAPTER 3. Chapter 3: Implementing Advanced RL Algorithms 4. Chapter 4: Reinforcement Learning in the Real World – Building Cryptocurrency Trading Agents 5. Chapter 5: Reinforcement Learning in the Real World – Building Stock/Share Trading Agents 6. Chapter 6: Reinforcement Learning in the Real World – Building Intelligent Agents to Complete Your To-Dos 7. Chapter 7: Deploying Deep RL Agents to the Cloud 8. Chapter 8: Distributed Training for Accelerated Development of Deep RL Agents 9. Chapter 9: Deploying Deep RL Agents on Multiple Platforms 10. Other Books You May Enjoy

Building a neural agent

This recipe will guide you through the steps to build a complete agent and the agent-environment interaction loop, which is the main building block for any RL application. When you complete the recipe, you will have an executable script where a simple agent tries to act in a Gridworld environment. A glimpse of what the agent you build will likely be doing is shown in the following screenshot:

Figure 1.11 – Screenshot of output from the neural_agent.py script

Figure 1.11 – Screenshot of output from the neural_agent.py script

Getting ready

Let's get started by activating the tf2rl-cookbook Conda Python environment and running the following code to install and import the necessary Python modules:

pip install tensorflow gym tqdm  # Run this line in a terminal
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import gym
import envs
from tqdm import tqdm

How to do it…

We will start by implementing a Brain class powered by a neural network implemented using TensorFlow 2.x:

  1. Let's first initialize a neural brain model using TensorFlow 2.x and the Keras functional API:
    class Brain(keras.Model):
        def __init__(self, action_dim=5, 
                       input_shape=(1, 8 * 8)):
            """Initialize the Agent's Brain model
            Args:
                action_dim (int): Number of actions
            """
            super(Brain, self).__init__()
            self.dense1 = layers.Dense(32, input_shape= \
                              input_shape, activation="relu")
            self.logits = layers.Dense(action_dim)
  2. Next, we implement the Brain class's call(…) method:
    def call(self, inputs):
            x = tf.convert_to_tensor(inputs)
            if len(x.shape) >= 2 and x.shape[0] != 1:
                x = tf.reshape(x, (1, -1))
            return self.logits(self.dense1(x))
  3. Now we need to implement the Brain class's process() method to conveniently perform predictions on a batch of inputs/observations:
    def process(self, observations):
            # Process batch observations using `call(inputs)`
            # behind-the-scenes
            action_logits = \
                self.predict_on_batch(observations)
            return action_logits
  4. Let's now implement the init function of the agent class:
    class Agent(object):
        def __init__(self, action_dim=5, 
                     input_shape=(1, 8 * 8)):
            """Agent with a neural-network brain powered
               policy
            Args:
                brain (keras.Model): Neural Network based 
            model
            """
            self.brain = Brain(action_dim, input_shape)
            self.policy = self.policy_mlp
  5. Now let's define a simple policy function for the agent:
    def policy_mlp(self, observations):
            observations = observations.reshape(1, -1)
            # action_logits = self.brain(observations)
            action_logits = self.brain.process(observations)
            action = tf.random.categorical(tf.math.\
                           log(action_logits), num_samples=1)
            return tf.squeeze(action, axis=1)
  6. After that, let's implement a convenient get_action method for the agent:
    def get_action(self, observations):
            return self.policy(observations)
  7. Let's now create a placeholder function for learn() that will be implemented as part of RL algorithm implementation in future recipes:
    def learn(self, samples):
            raise NotImplementedError

    This completes our basic agent implementation with the necessary ingredients!

  8. Let's now evaluate the agent in a given environment for one episode:
    def evaluate(agent, env, render=True):
        obs, episode_reward, done, step_num = env.reset(),
                                              0.0, False, 0
        while not done:
            action = agent.get_action(obs)
            obs, reward, done, info = env.step(action)
            episode_reward += reward
            step_num += 1
            if render:
                env.render()
        return step_num, episode_reward, done, info
  9. Finally, let's implement the main function:
    if __name__ == "__main__":
        env = gym.make("Gridworld-v0")
        agent = Agent(env.action_space.n, 
                      env.observation_space.shape)
        for episode in tqdm(range(10)):
            steps, episode_reward, done, info = \
                                         evaluate(agent, env)
            print(f"EpReward:{episode_reward:.2f}\
                   steps:{steps} done:{done} info:{info}")
        env.close()
  10. Execute the script as follows:
    python neural_agent.py

    You should see the Gridworld environment GUI pop up. This will show you what the agent is doing in the environment, and it will look like the following screenshot:

Figure 1.12 – A screenshot of the neural agent acting in the Gridworld environment

Figure 1.12 – A screenshot of the neural agent acting in the Gridworld environment

This provides a simple, yet complete, recipe to build an agent and the agent-environment interaction loop. All that is left is to add the RL algorithm of your choice to the learn() method and the agent will start acting intelligently!

How it works…

This recipe puts together the necessary ingredients to build a complete agent-environment system. The Brain class implements the neural network that serves as the processing unit of the agent, and the agent class utilizes the Brain class and a simple policy that chooses an action based on the output of the brain after processing the observations received from the environment.

We implemented the Brain class as a subclass of the keras.Model class, which allows us to define a custom neural network-based model for the agent's brain. The __init__ method initializes the Brain model and defines the necessary layers using the TensorFlow 2.x Keras functional API. In this Brain model, we are creating two dense (also known as fully-connected) layers to build our starter neural network. In addition to the __init__ method, the call(…) method is also a mandatory method that needs to be implemented by child classes inheriting from the keras.Model class. The call(…) method first converts the inputs to a TensorFlow 2.x tensor and then flattens the inputs to be of the shape 1 x total_number_of_elements in the input tensor. For example, if the input data has a shape of 8 x 8 (8 rows and 8 columns), the data is first converted to a tensor and the shape is flattened to 1 x 8 * 8 = 1 x 64. The flattened inputs are then processed by the dense1 layer, which contains 32 neurons and a ReLU activation function. Finally, the logits layer processes the output from the previous layer and produces n number of outputs corresponding to the action dimension (n).

The predict_on_batch(…) method performs predictions on the batch of inputs given as the argument. This function (unlike the predict() function of Keras) assumes that the inputs (observations) provided as the argument are exactly one batch of inputs and thus feeds the batch to the network without any further splitting of the input data.

We then implemented the Agent class and, in the agent initialization function, we created an object instance of the Brain class by defining the following:

self.brain = Brain(action_dim, input_shape)

Here, input_shape is the shape of the input that is expected to be processed by the brain, and action_dim is the shape of the output expected from the brain. The agent's policy is defined to be a custom Multi-Layer Perceptron (MLP)-based policy based on the brain's neural network architecture. Note that we can reuse DiscretePolicy from the previous recipe to initialize the agent's policy as well.

The agent's policy function, policy_mlp, flattens the input observations and sends it for processing by the agent's brain to receive the action_logits, which are the unnormalized probabilities for the actions. The final action to be taken is obtained by using TensorFlow 2.x's categorical method from the random module, which samples a valid action from the given action_logits (unnormalized probabilities).

Important note

If all of the observations supplied to the predict_on_batch function cannot be accommodated in the given amount of GPU memory or RAM (CPU), the operation can cause a GPU Out Of Memory (OOM) error.

The main function that gets launched – if the neural_agent.py script is run directly – creates an instance of the Gridworld-v0 environment, initializes an agent using the action and observation space of this environment, and starts evaluating the agent for 10 episodes.

lock icon The rest of the chapter is locked
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image