Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Free Learning
Arrow right icon
Deep Learning Quick Reference
Deep Learning Quick Reference

Deep Learning Quick Reference: Useful hacks for training and optimizing deep neural networks with TensorFlow and Keras

Arrow left icon
Profile Icon Mike Bernico
Arrow right icon
$19.99 per month
Full star icon Full star icon Full star icon Full star icon Half star icon 4.5 (6 Ratings)
Paperback Mar 2018 272 pages 1st Edition
eBook
$9.99 $35.99
Paperback
$43.99
Subscription
Free Trial
Renews at $19.99p/m
Arrow left icon
Profile Icon Mike Bernico
Arrow right icon
$19.99 per month
Full star icon Full star icon Full star icon Full star icon Half star icon 4.5 (6 Ratings)
Paperback Mar 2018 272 pages 1st Edition
eBook
$9.99 $35.99
Paperback
$43.99
Subscription
Free Trial
Renews at $19.99p/m
eBook
$9.99 $35.99
Paperback
$43.99
Subscription
Free Trial
Renews at $19.99p/m

What do you get with a Packt Subscription?

Free for first 7 days. $19.99 p/m after that. Cancel any time!
Product feature icon Unlimited ad-free access to the largest independent learning library in tech. Access this title and thousands more!
Product feature icon 50+ new titles added per month, including many first-to-market concepts and exclusive early access to books as they are being written.
Product feature icon Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.
Product feature icon Thousands of reference materials covering every tech concept you need to stay up to date.
Subscribe now
View plans & pricing
Table of content icon View table of contents Preview book icon Preview Book

Deep Learning Quick Reference

The Building Blocks of Deep Learning

Welcome to Deep Learning Quick Reference! In this book, I am going to attempt to make deep learning techniques more accessible, practical, and consumable to data scientists, machine learning engineers, and software engineers who need to solve problems with deep learning. If you want to train your own deep neural network and you're stuck somewhere, there is a good chance this guide will help.

This book is hands on and is intended to be a practical guide that can help you solve your problems fast. It is primarily intended for experienced machine learning engineers and data scientists who need to use deep learning to solve a problem. Aside from this chapter, which provides some of the terminology, frameworks, and background that we will need to get started, it's not meant to be read in order. Each chapter contains a practical example, complete with code and a few best practices and safe choices. We expect you to flip to the chapter you need and get started.

This book won't go deeply into the theory of deep learning and neural networks. There are many wonderful books that can provide that background, and I highly recommend that you read at least one of them (maybe a bibliography or just recommendations). We hope to provide just enough theory and mathematical intuition to get you started.

We will cover the following topics in this chapter:

  • Deep neural network architectures
  • Optimization algorithms for deep learning
  • Deep learning frameworks
  • Building datasets for deep learning

The deep neural network architectures

The deep neural network architectures can vary greatly in structure depending on the network's application, but they all have some basic components. In this section, we will talk briefly about those components.

In this book, I'll define a deep neural network as a network with more than a single hidden layer. Beyond that we won't attempt to limit the membership to the Deep Learning Club. As such, our networks might have less than 100 neurons, or possibly millions. We might use special layers of neurons, including convolutions and recurrent layers, but we will refer to all of these as neurons nonetheless.

Neurons

A neuron is the atomic unit of a neural network. This is sometimes inspired by biology; however, that's a topic for a different book. Neurons are typically arranged into layers. In this book, if I'm referring to a specific neuron, I'll use the notation where l is the layer the neuron is in and k is the neuron number. As we will be using programming languages that observe 0th notation, my notation will also be 0th based.

At their core, most neurons are composed of two functions that work together: a linear function and an activation function. Let us take a high-level look at those two components.

The neuron linear function

The first component of the neuron is a linear function whose output is the sum of the inputs, each multiplied by a coefficient. This function is really more or less a linear regression. These coefficients are typically referred to as weights in neural network speak. For example, given some neuron with the input features of x1, x2, and x3, and output z, this linear component or the neuron linear function would simply be:

Where are weights or coefficients that we will need to learn given the data and b is a bias term.

Neuron activation functions

The second function of the neuron is the activation function, which is tasked with introducing a nonlinearity between neurons. A commonly used activation is the sigmoid activation, which you may be familiar with from logistic regression. It squeezes the output of the neuron into an output space where very large values of z are driven to 1 and very small values of z are driven to 0.

The sigmoid function looks like this:

It turns out that the activation function is very important for intermediate neurons. Without it one could prove that a stack of neurons with linear activation's (which is really no activation, or more formally an activation function where z=z) is really just a single linear function.

A single linear function is undesirable in this case because there are many scenarios where our network may be under specified for the problem at hand. That is to say that the network can't model the data well because of non-linear relationships present in the data between the input features and target variable (what we're predicting).

The canonical example of a function that cannot be modeled with a linear function is the exclusive OR function, which is shown in the following figure:

Other common activation functions are the tanh function and the ReLu or Rectilinear Activation.

The hyperbolic tangent or the tanh function looks like this:

>

The tanh usually works better than sigmoid for intermediate layers. As you can probably see, the output of tanh will be between [-1, 1], whereas the output of sigmoid is [0, 1]. This additional width provides some resilience from a phenomenon known as the vanishing/exploding gradient problem, which we will cover in more detail later. For now, it's enough to know that the vanishing gradient problem can cause networks to converge very slowly in the early layers, if at all. Because of that, networks using tanh will tend to converge somewhat faster than networks that use sigmoid activation. That said, they are still not as fast as ReLu.

ReLu, or Rectilinear Activation, is defined simply as:

It's a safe bet and we will use it most of the time throughout this book. Not only is ReLu easy to compute and differentiate, it's also resilient against the vanishing gradient problem. The only drawback to ReLu is that it's first derivative is undefined at exactly 0. Variants including leaky ReLu, are computationally harder, but more robust against this issue.

For completeness, here's a somewhat obvious graph of ReLu:

The loss and cost functions in deep learning

Every machine learning model really starts with a cost function. Simply, a cost function allows you to measure how well your model is fitting the training data. In this book, we will define the loss function as the correctness of fit for a single observation within the training set. The cost function will then most often be an average of the loss across the training set. We will revisit loss functions later when we introduce each type of neural network; however, quickly consider the cost function for linear regression as an example:

In this case, the loss function would be , which is really the squared error. So then J, our cost function, is really just the mean squared error, or an average of the squared error across the entire dataset. The term 1/2 is added to make some of the calculus cleaner by convention.

The forward propagation process

Forward propagation is the process by which we attempt to predict our target variable using the features present in a single observation. Imagine we had a two-layer neural network. In the forward propagation process, we would start with the features present within that observation and then multiply those features by their associated coefficients within layer 1 and add a bias term for each neuron. After that, we would send that output to the activation for the neuron. Following that, the output would be sent to the next layer, and so on, until we reach the end of the network where we are left with our network's prediction:

>

The back propagation function

Once forward propagation is complete, we have the network's prediction for each data point. We also know that data point's actual value. Typically, the prediction is defined as while the actual value of the target variable is defined as y.

Once both y and are known, the network's error can be computed using the cost function. Recall that the cost function is the average of the loss function.

In order for learning to occur within the network, the network's error signal must be propagated backwards through the network layers from the last layer to the first. Our goal in back propagation is to propagate this error signal backwards through the network while using it to update the network weights as the signal travels. Mathematically, to do so we need to minimize the cost function by nudging the weights towards values that make the cost function the smallest. This process is called gradient descent.

The gradient is the partial derivative of the error function with respect to each weight within the network. The gradient of each weight can be calculated, layer by layer, using the chain rule and the gradients of the layers above.

Once the gradients of each layer are known, we can use the gradient descent algorithm to minimize the cost function.

The Gradient Descent will repeat this update until the network's error is minimized and the process has converged:

The gradient descent algorithm multiples the gradient by a learning rate called alpha and subtracts that value from the current value of each weight. The learning rate is a hyperparameter.

Stochastic and minibatch gradient descents

The algorithm describe in the previous section assumes a forward and corresponding backwards pass over the entire dataset and as such it's called batch gradient descent.

Another possible way to do gradient descent would be to use a single data point at a time, updating the network weights as we go. This method might help speed up convergence around saddle points where the network might stop converging. Of course, the error estimation of only a single point may not be a very good approximation of the error of the entire dataset.

The best solution to this problem is using mini batch gradient descent, in which we will take some random subset of the data called a mini batch to compute our error and update our network weights. This is almost always the best option. It has the additional benefit of naturally splitting a very large dataset into chunks that are more easily managed in the memory of a machine, or even across machines.

This is an extremely high-level description of one of the most important parts of a neural network, which we believe fits with the practical nature of this book. In practice, most modern frameworks handle these steps for us; however, they are most certainly worth knowing at least theoretically. We encourage the reader to go deeper into forward and backward propagation as time permits.

Optimization algorithms for deep learning

The gradient descent algorithm is not the only optimization algorithm available to optimize our network weights, however it's the basis for most other algorithms. While understanding every optimization algorithm out there is likely a PhD worth of material, we will devote a few sentences to some of the most practical.

Using momentum with gradient descent

Using gradient descent with momentum speeds up gradient descent by increasing the speed of learning in directions the gradient has been constant in direction while slowing learning in directions the gradient fluctuates in direction. It allows the velocity of gradient descent to increase.

Momentum works by introducing a velocity term, and using a weighted moving average of that term in the update rule, as follows:

Most typically is set to 0.9 in the case of momentum, and usually this is not a hyper-parameter that needs to be changed.

The RMSProp algorithm

RMSProp is another algorithm that can speed up gradient descent by speeding up learning in some directions, and dampening oscillations in other directions, across the multidimensional space that the network weights represent:

This has the effect of reducing oscillations more in directions where is large.

The Adam optimizer

Adam is one of the best performing known optimizer and it's my first choice. It works well across a wide variety of problems. It combines the best parts of both momentum and RMSProp into a single update rule:

Where is some very small number to prevent division by 0.

Adam is often a great choice, and it's a great place to start when you're prototyping, so save yourself some time by starting with Adam.

Deep learning frameworks

While it's most certainly possible to build and train deep neural networks from scratch using just Python's numpy, that would take a great deal of time and code. It's far more practical, in almost every case, to use a deep learning framework.

Throughout this book we will be using TensorFlow and Keras to make developing deep neural networks much easier and faster.

What is TensorFlow?

TensorFlow is a library that can be used to quickly build deep neural networks. In TensorFlow, the mathematical operations that we've covered thus far are expressed as nodes. The edges between these nodes are tensors, or multidimensional data arrays. TensorFlow can, given a neural network defined as a graph and a loss function, automatically compute gradients for the network and optimize the graph to minimize the loss function.

TensorFlow was released as an open source project by Google in 2015. Since then it has gained a very large following and enjoys a large user community. While TensorFlow provides APIs in Java, C++, Go, and Python, we will only be covering the Python API. The Python API is used in this book because it's both the most commonly used, and the API most commonly used for the development of new models.

TensorFlow can greatly accelerate computation by performing those calculations on one or more Graphics Processing Units. The acceleration that GPU computation provides has become a necessity in modern deep learning.

What is Keras?

While building deep neural networks in TensorFlow is far easier than doing it from scratch, TensorFlow is still a very low-level API. Keras is a high-level API that allows us to use TensorFlow (or alternatively Theano or Microsoft's CNTK) to rapidly build deep learning networks.

Models built in Keras and TensorFlow are portable and can be trained or served in native TensorFlow as well. Models constructed in TensorFlow can be loaded into Keras and used there as well.

Popular alternatives to TensorFlow

There are many other great deep learning frameworks out there. We chose Keras and TensorFlow primarily because of popularity, ease of use, availability for support, and readiness for production deployments. There are undoubtedly other worthy alternatives.

Some of my favorites alternatives to TensorFlow include:

While I do strongly believe that Keras and TensorFlow are the correct choices for this book, I also want to acknowledge these great frameworks and the contributions to the field that each project has made.

GPU requirements for TensorFlow and Keras

For the remainder of the book, we will be using Keras and TensorFlow. Most of the examples we will be exploring require a GPU for acceleration. Most modern deep learning frameworks, including TensorFlow, use GPUs to greatly accelerate the vast amount of calculations required during network training. Without a GPU, the training time of most of the models we discuss will be unreasonably long.

If you don't have a computer with a GPU installed, GPU-based compute instances can be rented by the second from a variety of cloud providers including Amazon's Amazon Web Services and Google's Google Cloud Platform. For the examples in this book, we will be using a p2.xlarge instance in Amazon EC2 running Ubuntu Server 16.04. The p2.xlarge instance provides an Nvidia Tesla K80 GPU with 2,496 CUDA cores, which will make running the models we show in this book much faster than what is achievable on even very high end desktop computers.

Installing Nvidia CUDA Toolkit and cuDNN

Since you'll likely be using a cloud based solution for your deep learning work, I've included instructions that will get you up and running fast on Ubuntu Linux, which is commonly available across cloud providers. It's also possible to install TensorFlow and Keras on Windows. As of TensorFlow v1.2, TensorFlow unfortunately does not support GPUs on OS X.

Before we can utilize the GPU, the NVidia CUDA Toolkit and cuDNN must be installed. We will be installing CUDA Toolkit 8.0 and cuDNN v6.0, which are recommended for use with TensorFlow v1.4. There is a good chance that a new version will be released before you finish reading this paragraph, so check www.tensorflow.org for the latest required versions.

We will start by installing the build-essential package on Ubuntu, which contains most of what we need to compile C++ programs. The code is given here:

sudo apt-get update
sudo apt-get install build-essential

Next, we can download and install CUDA Toolkit. As previously mentioned, we will be installing version 8.0 and it's associated patch. You can find the CUDA Toolkit that is right for you at https://developer.nvidia.com/cuda-zone.

wget https://developer.nvidia.com/compute/cuda/8.0/Prod2/local_installers/cuda_8.0.61_375.26_linux-run
sudo sh cuda_8.0.61_375.26_linux-run # Accept the EULA and choose defaults
wget https://developer.nvidia.com/compute/cuda/8.0/Prod2/patches/2/cuda_8.0.61.2_linux-run
sudo sh cuda_8.0.61.2_linux-run # Accept the EULA and choose defaults

The CUDA Toolkit should now be installed in the following path: /usr/local/cuda. You'll need to add a few environment variables so that TensorFlow can find it. You should probably consider adding these environment variables to ~/.bash_profile, so that they're set at every login, as shown in the following code:

export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64"
export CUDA_HOME="/usr/local/cuda"

At this point, you can test that everything is working by executing the following command: nvidia-smi. The output should look similar to this:

$nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.26 Driver Version: 375.26 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 0000:00:1E.0 Off | 0 |
| N/A 41C P0 57W / 149W | 0MiB / 11439MiB | 99% Default |
+-------------------------------+----------------------+----------------------+

Lastly, we need to install cuDNN, which is the NVIDIA CUDA Deep Neural Network library.

First, download cuDNN to your local computer. To do so, you will need to register as a developer in the NVIDIA Developer Network. You can find cuDNN at the cuDNN homepage at https://developer.nvidia.com/cuDNN. Once you have downloaded it to your local computer, you can use scp to move it to your EC2 instance. While exact instructions will vary by cloud provider you can find additional information about connecting to AWS EC2 via SSH/SCP at https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AccessingInstancesLinux.html.

Once you've moved cuDNN to your EC2 image, you can unpack the file, using the following code:

tar -xzvf cudnn-8.0-linux-x64-v6.0.tgz

Finally, copy the unpacked files to their appropriate locations, using the following code:

sudo cp cuda/include/cudnn.h /usr/local/cuda/include/
sudo cp cuda/lib64/* /usr/local/cuda/lib64
It's unclear to me why CUDA and cuDNN are distributed separately and why cuDNN requires registrations. The overly complicated download process and manual installation of cuDNN is really one of the greatest mysteries in deep learning.

Installing Python

We will be using virtualenv to create an isolated Python virtual environment. While this isn't strictly necessary, it's an excellent practice. By doing so, we will keep all our Python libraries for this project in a separate isolated environment that won't interfere with the system Python installation. Additionally, virtualenv environments will make it easier to package and deploy our deep neural networks later on.

Let's start by installing Python, pip, and virtualenv, using the aptitude package manager in Ubuntu. The following is the code:

sudo apt-get install python3-pip python3-dev python-virtualenv

Now we can create a virtual environment for our work. We will be keeping all our virtual environment files in a folder called ~/deep-learn. You are free to choose any name you wish for this virtual environment. The following code shows how to create a virtual environment:

virtualenv --no-site-packages -p python3 ~/deep-learn
If you're an experienced Python developer, you might have noticed that I've set up the environment to default to Python 3.x. That's most certainly not required, and TensorFlow / Keras both support Python 2.7. That said, the author feels a moral obligation to the Python community to support modern versions of Python.

Now that the virtual environment has been created, you can activate it as follows:

$source ~/deep-learn/bin/activate
(deep-learn)$ # notice the shell changes to indicate the virtualenv
At this point, every time you log in you will need to activate the virtual environment you want to work in. If you would like to always enter the virtual environment you just created, you can add the source command to ~/.bash_profile.

Now that we've configured our virtual environment, we can add Python packages as required within it. To start, let's make sure we have the latest version of pip, the Python package manager:

easy_install -U pip

Lastly, I recommend installing IPython, which is an interactive Python shell that makes development much easier.

pip install ipython

And that's it. Now we're ready to install TensorFlow and Keras.

Installing TensorFlow and Keras

After everything we've just been through together, you'll be pleased to see how straightforward installing TensorFlow and Keras now is.

Let's start with installing TensorFlow

The installation of TensorFlow can be done using the following code:

pip install --upgrade tensorflow-gpu 
Be sure to pip install tensorflow-gpu. If you pip install TensorfFow (without -gpu), you will install the CPU-only version.

Before we install Keras, let's test our TensorFlow installation. To do this, I'll be using some sample code from the TensorfFow website and the IPython interpreter.

Start the IPython interpreter by typing IPython at the bash prompt. Once IPython has started, let's attempt to import TensorFlow. The output would look like the following:

In [1]: import tensorflow as tf
In [2]:

If importing TensorFlow results in an error, troubleshoot the steps you have followed so far. Most often when TensorFlow cannot be imported, the CUDA or cuDNN might not be installed correctly.

Now that we've successfully installed TensorFlow, we will run a tiny bit of code in IPython that will verify we can run computations on the GPU:

a = tf.constant([1.0,</span> 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b
= tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c
= tf.matmul(a, b)
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
print(sess.run(c))

If everything goes as we hope, we will see lots of indications that our GPU is being used. I have included some output here and highlighted the evidence to draw your attention to it. Your output will likely be different based on hardware, but you should see similar evidence the one shown here:

/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0, compute capability: 3.7
MatMul: (MatMul): /job:localhost/replica:0/task:0/device:GPU:0
: I tensorflow/core/common_runtime/placer.cc:874] MatMul: (MatMul)/job:localhost/replica:0/task:0/device:GPU:0
b: (Const): /job:localhost/replica:0/task:0/device:GPU:0
: I tensorflow/core/common_runtime/placer.cc:874] b: (Const)/job:localhost/replica:0/task:0/device:GPU:0
a: (Const): /job:localhost/replica:0/task:0/device:GPU:0
: I tensorflow/core/common_runtime/placer.cc:874] a: (Const)/job:localhost/replica:0/task:0/device:GPU:0
[[ 22. 28.]
[ 49. 64.]]

In the preceding output, we can see that tensors a and b, as well as the matrix multiplication operation, were assigned the the GPU. If there was a problem with accessing the GPU, the output might look as follows:

I tensorflow/core/common_runtime/placer.cc:874] b_1: (Const)/job:localhost/replica:0/task:0/device:CPU:0
a_1: (Const): /job:localhost/replica:0/task:0/device:CPU:0
I tensorflow/core/common_runtime/placer.cc:874] a_1: (Const)/job:localhost/replica:0/task:0/device:CPU:0

Here we can see the tensors b_1 and a_1 were assigned to the CPU rather than the GPU. If this happens there is a problem with your installation of TensorFlow, CUDA, or cuDNN.

If you've made it this far, you have a working installation of TensorFlow. The only remaining task is to install Keras.

The installation of Keras can be done with the help of the following code:

pip install keras

And that's it! Now we're ready to build deep neural networks in Keras and TensorFlow.

This might be a great time to create a snapshot or even an AMI of your EC2 instance, so that you don't have to go through this installation again.

Building datasets for deep learning

Compared to other predictive models that you might have used, deep neural networks are very complicated. Consider a network with 100 inputs, two hidden layers with 30 neurons each, and a logistic output layer. That network would have 3,930 learnable parameters as well as the hyperparameters needed for optimization, and that's a very small example. A large convolutional neural network will have hundreds of millions of learnable parameters. All these parameters are what make deep neural networks so amazing at learning structures and patterns. However, this also makes overfitting possible.

Bias and variance errors in deep learning

You may be familiar with the so-called bias/variance trade-off in typical predictive models. In case you're not, we'll provide a quick reminder here. With traditional predictive models, there is usually some compromise when we try to find an error from bias and an error from variance. So let's see what these two errors are:

  • Bias error: Bias error is the error that is introduced by the model. For example, if you attempted to model a non-linear function with a linear model, your model would be under specified and the bias error would be high.
  • Variance error: Variance error is the error that is introduced by randomness in the training data. When we fit our training distribution so well that our model no longer generalizes, we have overfit or introduce a variance error.

In most machine learning applications, we seek to find some compromise that minimizes bias error, while introducing as little variance error as possible. I say most because one of the great things about deep neural networks is that, for the most part, bias and variance can be manipulated independently of one another. However, to do so, we will need to be very careful with how we structure our training data.

The train, val, and test datasets

For the rest of the book, I will be structuring my data into three separate sets that I'll refer to as train, val, and test. These three separate datasets, drawn as random samples from the total dataset will be structured and sized approximately like this.

The train dataset will be used for training the network, as expected.

The val dataset, or the validation dataset, will be used to find ideal hyperparameters, and to measure overfitting. At the end of an epoch, which is when the network has has the opportunity to observe every data point in the training set, we will make a prediction on the val set. That prediction will be used to watch for overfitting and will help us know when the network has finished training. Using the val set at the end of each epoch like this somewhat differs from the typical usage. For more information on Hold-Out Validation please reference The Elements of Statistical Learning by Hastie and Tibshirani (https://web.stanford.edu/~hastie/ElemStatLearn/).

The test dataset will be used once all training is complete, to accurately measure model performance on a set of data that the network hasn't seen.

It is very important that the val and test data comes from the same datasets. It is less important that the train dataset matches val and test, although that is still ideal. If image augmentation were being used (performing minor modifications to training images in an attempt to amplify the training set size) for example, the training set distribution may no longer match the val set distribution. This is acceptable and network performance can be adequately measured as long as val and test are from the same distribution.

In traditional machine learning applications it's somewhat customary to use 10-20 percent of the available data for val and test. In deep neural networks it's often the case that our data volume is so large that we can adequately measure network performance with much smaller val and test sets. When data volume goes into the 10s of millions of observations, a 98 percent, 1 percent, 1 percent split may be completely appropriate.

Managing bias and variance in deep neural networks

Now that we've defined how we will structure data and refreshed ourselves on bias and variance, let's consider how we will control bias and variance errors in our deep neural networks.

  • High bias: A network with high bias will have a very high error rate when predicting on the training set. The model is not doing well at fitting the data. In order to reduce the bias you will likely need to change the network architecture. You may need to add layers, neurons, or both. It may be that your problem is better solved using a convolutional or recurrent network.

Of course, sometimes a problem is high bias because of a lack of signal or very difficult problem, so be sure to calibrate your expectations on a reasonable rate (I like to start by calibrating on human accuracy).

  • High variance: A network with a low bias error is fitting the training data well; however, if the validation error is greater than the test error the network has begun to overfit the training data. The two best ways to reduce variance are by adding data and adding regularization to the network.

Adding data is straightforward but not always possible. Throughout the book, we will cover regularization techniques as they apply. The most common regularization techniques we will talk about are L2 regularization, dropout, and batch normalization.

K-Fold cross-validation

If you're experienced with machine learning, you may be wondering why I would opt for Hold-Out (train/val/test) validation over K-Fold cross-validation. Training a deep neural network is a very expensive operation, and put very simply, training K of them per set of hyperparameters we'd like to explore is usually not very practical.

We can be somewhat confident that Hold-Out validation does a very good job, given a large enough val and test set. Most of the time, we are hopefully applying deep learning in situations where we have an abundance of data, resulting in an adequate val and test set.

Ultimately, it's up to you. As we will see later, Keras provides a scikit-learn interface that allows Keras models to be integrated into a scikit-learn pipeline. This allows us to perform K-Fold, Stratified K-Fold, and even grid searches with K-Fold. It's both possible and appropriate to sometimes use K-Fold CV in training deep models. That said, for the rest of the book we will focus on the using Hold-Out validation.

Summary

Hopefully, this chapter served to refresh your memory on deep neural network architectures and optimization algorithms. Because this is a quick reference we didn't go into much detail and I'd encourage the reader to dig deeper into any material here that might be new or unfamiliar.

We talked about the basics of Keras and TensorFlow and why we chose those frameworks for this book. We also talked about the installation and configuration of CUDA, cuDNN, Keras, and TensorFlow.

Lastly, we covered the Hold-Out validation methodology we will use throughout the remainder of the book and why we prefer it to K-Fold CV for most deep neural network applications.

We will be referring back to this chapter quite a bit as we revisit these topics in the chapters to come. In the next chapter, we will start using Keras to solve regression problems, as a first step into building deep neural networks.

Left arrow icon Right arrow icon
Download code icon Download Code

Key benefits

  • A quick reference to all important deep learning concepts and their implementations
  • Essential tips, tricks, and hacks to train a variety of deep learning models such as CNNs, RNNs, LSTMs, and more
  • Supplemented with essential mathematics and theory, every chapter provides best practices and safe choices for training and fine-tuning your models in Keras and Tensorflow.

Description

Deep learning has become an essential necessity to enter the world of artificial intelligence. With this book deep learning techniques will become more accessible, practical, and relevant to practicing data scientists. It moves deep learning from academia to the real world through practical examples. You will learn how Tensor Board is used to monitor the training of deep neural networks and solve binary classification problems using deep learning. Readers will then learn to optimize hyperparameters in their deep learning models. The book then takes the readers through the practical implementation of training CNN's, RNN's, and LSTM's with word embeddings and seq2seq models from scratch. Later the book explores advanced topics such as Deep Q Network to solve an autonomous agent problem and how to use two adversarial networks to generate artificial images that appear real. For implementation purposes, we look at popular Python-based deep learning frameworks such as Keras and Tensorflow, Each chapter provides best practices and safe choices to help readers make the right decision while training deep neural networks. By the end of this book, you will be able to solve real-world problems quickly with deep neural networks.

Who is this book for?

If you are a Data Scientist or a Machine Learning expert, then this book is a very useful read in training your advanced machine learning and deep learning models. You can also refer this book if you are stuck in-between the neural network modeling and need immediate assistance in getting accomplishing the task smoothly. Some prior knowledge of Python and tight hold on the basics of machine learning is required.

What you will learn

  • Solve regression and classification challenges with TensorFlow and Keras
  • Learn to use Tensor Board for monitoring neural networks and its training
  • Optimize hyperparameters and safe choices/best practices
  • Build CNN s, RNN s, and LSTM s and using word embedding from scratch
  • Build and train seq2seq models for machine translation and chat applications.
  • Understanding Deep Q networks and how to use one to solve an autonomous agent problem.
  • Explore Deep Q Network and address autonomous agent challenges.

Product Details

Country selected
Publication date, Length, Edition, Language, ISBN-13
Publication date : Mar 09, 2018
Length: 272 pages
Edition : 1st
Language : English
ISBN-13 : 9781788837996
Vendor :
Google
Category :
Languages :
Concepts :
Tools :

What do you get with a Packt Subscription?

Free for first 7 days. $19.99 p/m after that. Cancel any time!
Product feature icon Unlimited ad-free access to the largest independent learning library in tech. Access this title and thousands more!
Product feature icon 50+ new titles added per month, including many first-to-market concepts and exclusive early access to books as they are being written.
Product feature icon Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.
Product feature icon Thousands of reference materials covering every tech concept you need to stay up to date.
Subscribe now
View plans & pricing

Product Details

Publication date : Mar 09, 2018
Length: 272 pages
Edition : 1st
Language : English
ISBN-13 : 9781788837996
Vendor :
Google
Category :
Languages :
Concepts :
Tools :

Packt Subscriptions

See our plans and pricing
Modal Close icon
$19.99 billed monthly
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Simple pricing, no contract
$199.99 billed annually
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Choose a DRM-free eBook or Video every month to keep
Feature tick icon PLUS own as many other DRM-free eBooks or Videos as you like for just $5 each
Feature tick icon Exclusive print discounts
$279.99 billed in 18 months
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Choose a DRM-free eBook or Video every month to keep
Feature tick icon PLUS own as many other DRM-free eBooks or Videos as you like for just $5 each
Feature tick icon Exclusive print discounts

Frequently bought together


Stars icon
Total $ 131.97
Deep Learning By Example
$43.99
TensorFlow: Powerful Predictive Analytics with TensorFlow
$43.99
Deep Learning Quick Reference
$43.99
Total $ 131.97 Stars icon
Banner background image

Table of Contents

14 Chapters
The Building Blocks of Deep Learning Chevron down icon Chevron up icon
Using Deep Learning to Solve Regression Problems Chevron down icon Chevron up icon
Monitoring Network Training Using TensorBoard Chevron down icon Chevron up icon
Using Deep Learning to Solve Binary Classification Problems Chevron down icon Chevron up icon
Using Keras to Solve Multiclass Classification Problems Chevron down icon Chevron up icon
Hyperparameter Optimization Chevron down icon Chevron up icon
Training a CNN from Scratch Chevron down icon Chevron up icon
Transfer Learning with Pretrained CNNs Chevron down icon Chevron up icon
Training an RNN from scratch Chevron down icon Chevron up icon
Training LSTMs with Word Embeddings from Scratch Chevron down icon Chevron up icon
Training Seq2Seq Models Chevron down icon Chevron up icon
Using Deep Reinforcement Learning Chevron down icon Chevron up icon
Generative Adversarial Networks Chevron down icon Chevron up icon
Other Books You May Enjoy Chevron down icon Chevron up icon

Customer reviews

Top Reviews
Rating distribution
Full star icon Full star icon Full star icon Full star icon Half star icon 4.5
(6 Ratings)
5 star 83.3%
4 star 0%
3 star 0%
2 star 16.7%
1 star 0%
Filter icon Filter
Top Reviews

Filter reviews by




Sara May 21, 2018
Full star icon Full star icon Full star icon Full star icon Full star icon 5
I found this book to be one of the more comprehensive but still easily 'readable' texts on practical application of deep learning techniques. The project and code examples are practical, and display a variety of useful deep learning techniques. This will be a book I will likely reference again and again for tips and reminders as I work on deep learning projects both at work and in hobby coding.While there are a lot of exciting ideas and usecases presented in the book, the one I found most useful was one of the more 'humble' ideas, at least mathematically speaking... I really appreciated the deep dive into TensorBoard, and showing some really useful things to do with that particular tool. It showed some things that aren't always necessarily obvious to do from the documentation. The information presented will help make TensorBoard a much more prominent tool in my deep learning development process than it has been so far.
Amazon Verified review Amazon
Ms. One More Time Aug 06, 2018
Full star icon Full star icon Full star icon Full star icon Full star icon 5
Love all the code in the book. Almost every bit of it worked without bugs. Almost every book of this type I have read had weak code that was often broken. This book was ten stars in this area.The level of text details was just perfect for me. I am past the beginner stage but not by much. I now feel after reading and running I can venture into kaggle datasets using Keras.I really loved learning about Tensorboard. It really helps me see if my model is too stupid or a real over fitter.
Amazon Verified review Amazon
Shannon Apr 20, 2018
Full star icon Full star icon Full star icon Full star icon Full star icon 5
This book has really accelerated my education in deep learning with a wide range of case studies that helped me understand how and when to use different deep learning solutions. It comes with a great git repo with the code for each chapter, so I could follow the code along with the text. I’ve taken deep learning MOOCs where I implemented neural networks in numpy and spent a lot of time on the math and theory, but this book has a variety of practical examples that make this topic a lot more accessible.A nice constant in this book is that it stays at a high level, but provides great references along the way if I want to take a deeper dive into a topic later on. This helped me stay focused to start implementing code, rather than diving into complex theories and sending me into endless Wikipedia loops.The author has a really approachable voice; he explains things clearly with humor and without condescension. I am particularly excited to use this book’s example on LSTMs to build a predictive model on cryptocurrency pricing to make better trading decisions!
Amazon Verified review Amazon
muuh Mar 16, 2019
Full star icon Full star icon Full star icon Full star icon Full star icon 5
I am interested in applying machine learning and deep learning in practice, not so much in learning the details of the math and writing all the algorithms from scratch myself. I have taken various online courses from Coursera, Udemy, etc. Those tend to go either into the math at a level I am not interested in, or scratch the surface with very basic examples.This book provides foundations to understand what the different types of models (LSTM, CNN, MLP) are based on, and illustrates each with various realistic case studies and models, along with explanations. For me this provided a great way to get the overall idea, build and understanding, and apply the models to my data and problems. Sometimes I did go looking for a bit more in-depth understanding on how each model works, but the explanations in this book are the ones that enabled me to do that.Also having all the code available (and working) on Github is great for quick reference. Just have to remember to look for "deep learning reference github", and can quickly find myself a reminder on how to write some specific code.
Amazon Verified review Amazon
Winston Li Apr 20, 2018
Full star icon Full star icon Full star icon Full star icon Full star icon 5
I have an ebook version of this book, which I very much enjoyed reading. First of all, I really appreciate the author’s effort in nicely and properly formatting the equations and code snippets so that they are reader-friendly on kindle e-readers. It’s usually difficult to find a math/programming related ebook formatted well.As an analytics professional, I like the breadth of the book which covers almost all important types of deep learning frameworks and the usecases of each one. It does not focus only on Deep Learning, but also gives great overviews of more general backgrounds, such as time series for recurrent neural networks and NLP for LSTM/seq2seq models. The reader will have a full picture of the fields without losing the main concentration on Deep Learning. At the same time, the book keeps the depth of the content at the right amount so that the reader can quickly grasp the concept but also have sufficient math to get started with implementation and see how things play out. In addition, the book goes gradually from simple but foundational blocks to advanced models, which I find very easy to digest despite of the complexity of most the advanced topics.This book is a great starting point for people who are familiar with programming and some basic machine learning concepts, but want to know more about deep learning, especially what the state of the art looks like. It is also a good reference book for professionals who are familiar with the concepts but want to double check on how Deep Learning frameworks are set up in Tensorflow or Keras. The book comes with a lot of code examples, and also complete data and tutorial code downloadable through github. I personally like the fact that the book also covers how to locally set up GPU for deep learning, which is such an important topic yet a lot of deep learning books easily skip.
Amazon Verified review Amazon
Get free access to Packt library with over 7500+ books and video courses for 7 days!
Start Free Trial

FAQs

What is included in a Packt subscription? Chevron down icon Chevron up icon

A subscription provides you with full access to view all Packt and licnesed content online, this includes exclusive access to Early Access titles. Depending on the tier chosen you can also earn credits and discounts to use for owning content

How can I cancel my subscription? Chevron down icon Chevron up icon

To cancel your subscription with us simply go to the account page - found in the top right of the page or at https://subscription.packtpub.com/my-account/subscription - From here you will see the ‘cancel subscription’ button in the grey box with your subscription information in.

What are credits? Chevron down icon Chevron up icon

Credits can be earned from reading 40 section of any title within the payment cycle - a month starting from the day of subscription payment. You also earn a Credit every month if you subscribe to our annual or 18 month plans. Credits can be used to buy books DRM free, the same way that you would pay for a book. Your credits can be found in the subscription homepage - subscription.packtpub.com - clicking on ‘the my’ library dropdown and selecting ‘credits’.

What happens if an Early Access Course is cancelled? Chevron down icon Chevron up icon

Projects are rarely cancelled, but sometimes it's unavoidable. If an Early Access course is cancelled or excessively delayed, you can exchange your purchase for another course. For further details, please contact us here.

Where can I send feedback about an Early Access title? Chevron down icon Chevron up icon

If you have any feedback about the product you're reading, or Early Access in general, then please fill out a contact form here and we'll make sure the feedback gets to the right team. 

Can I download the code files for Early Access titles? Chevron down icon Chevron up icon

We try to ensure that all books in Early Access have code available to use, download, and fork on GitHub. This helps us be more agile in the development of the book, and helps keep the often changing code base of new versions and new technologies as up to date as possible. Unfortunately, however, there will be rare cases when it is not possible for us to have downloadable code samples available until publication.

When we publish the book, the code files will also be available to download from the Packt website.

How accurate is the publication date? Chevron down icon Chevron up icon

The publication date is as accurate as we can be at any point in the project. Unfortunately, delays can happen. Often those delays are out of our control, such as changes to the technology code base or delays in the tech release. We do our best to give you an accurate estimate of the publication date at any given time, and as more chapters are delivered, the more accurate the delivery date will become.

How will I know when new chapters are ready? Chevron down icon Chevron up icon

We'll let you know every time there has been an update to a course that you've bought in Early Access. You'll get an email to let you know there has been a new chapter, or a change to a previous chapter. The new chapters are automatically added to your account, so you can also check back there any time you're ready and download or read them online.

I am a Packt subscriber, do I get Early Access? Chevron down icon Chevron up icon

Yes, all Early Access content is fully available through your subscription. You will need to have a paid for or active trial subscription in order to access all titles.

How is Early Access delivered? Chevron down icon Chevron up icon

Early Access is currently only available as a PDF or through our online reader. As we make changes or add new chapters, the files in your Packt account will be updated so you can download them again or view them online immediately.

How do I buy Early Access content? Chevron down icon Chevron up icon

Early Access is a way of us getting our content to you quicker, but the method of buying the Early Access course is still the same. Just find the course you want to buy, go through the check-out steps, and you’ll get a confirmation email from us with information and a link to the relevant Early Access courses.

What is Early Access? Chevron down icon Chevron up icon

Keeping up to date with the latest technology is difficult; new versions, new frameworks, new techniques. This feature gives you a head-start to our content, as it's being created. With Early Access you'll receive each chapter as it's written, and get regular updates throughout the product's development, as well as the final course as soon as it's ready.We created Early Access as a means of giving you the information you need, as soon as it's available. As we go through the process of developing a course, 99% of it can be ready but we can't publish until that last 1% falls in to place. Early Access helps to unlock the potential of our content early, to help you start your learning when you need it most. You not only get access to every chapter as it's delivered, edited, and updated, but you'll also get the finalized, DRM-free product to download in any format you want when it's published. As a member of Packt, you'll also be eligible for our exclusive offers, including a free course every day, and discounts on new and popular titles.