
Neural network architectures

There are various types of architectures in neural networks. We can categorize DL architectures into four groups: Deep Neural Networks (DNNs), Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Emergent Architectures (EAs).

Nowadays, based on these architectures, researchers have come up with many variants for domain-specific use cases and research problems. The following sections of this chapter give a brief introduction to these architectures. A more detailed analysis, with example applications, is the subject of later chapters of this book.

Deep neural networks

DNNs are neural networks with deeper, more complex architectures, containing a large number of neurons in each layer and many connections between layers. The computation in each layer transforms the representation from the preceding layer into a slightly more abstract representation. However, we will use the term DNN to refer specifically to the MLP, the Stacked Auto-Encoder (SAE), and Deep Belief Networks (DBNs).

SAEs and DBNs use AEs and Restricted Boltzmann Machines (RBMs), respectively, as their building blocks. The main difference between these and MLPs is that training is executed in two phases: unsupervised pre-training and supervised fine-tuning.

SAE and DBN using AE and RBM respectively

In unsupervised pre-training, shown in the preceding diagram, the layers are stacked sequentially and trained in a layer-wise manner as an AE or RBM, using unlabeled data. Afterwards, in supervised fine-tuning, an output classifier layer is stacked on top and the complete neural network is optimized by retraining with labeled data.
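As a rough sketch of this two-phase scheme with Deeplearning4j (the library used throughout this book), the following configuration stacks two AutoEncoder layers and then a classifier output layer. The layer sizes are illustrative assumptions, and the exact pre-training API differs between DL4J versions, so treat this as a sketch rather than the chapter's reference implementation:

```java
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.AutoEncoder;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.lossfunctions.LossFunctions;

public class StackedAutoEncoderSketch {
    public static MultiLayerNetwork build(int numInputs, int numClasses) {
        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                .seed(12345)
                .list()
                // Unsupervised building blocks: two stacked auto-encoder layers
                .layer(0, new AutoEncoder.Builder().nIn(numInputs).nOut(256)
                        .activation(Activation.SIGMOID).build())
                .layer(1, new AutoEncoder.Builder().nIn(256).nOut(128)
                        .activation(Activation.SIGMOID).build())
                // Supervised classifier stacked on top for the fine-tuning phase
                .layer(2, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                        .nIn(128).nOut(numClasses)
                        .activation(Activation.SOFTMAX).build())
                .build();

        MultiLayerNetwork net = new MultiLayerNetwork(conf);
        net.init();
        return net;
    }
}

// Hypothetical training calls with DataSetIterators over your data:
//   net.pretrain(unlabeledIter); // phase 1: greedy layer-wise unsupervised pre-training
//   net.fit(labeledIter);        // phase 2: supervised fine-tuning via backpropagation
```

Depending on the DL4J version, the pre-training phase may also need to be enabled explicitly in the configuration.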

Multilayer Perceptron

As discussed earlier, a single perceptron is incapable of approximating even an XOR function. To overcome this limitation, multiple perceptrons are stacked together as an MLP, where layers are connected as a directed graph. This way, the signal propagates one way, from the input layer through the hidden layers to the output layer, as shown in the following diagram:

An MLP architecture having an input layer, two hidden layers, and an output layer

Fundamentally, an MLP is one of the simplest FFNNs, having at least three layers: an input layer, a hidden layer, and an output layer. An MLP was first trained with the backpropagation algorithm in the 1980s.
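To make this concrete, here is a minimal MLP sketch in Deeplearning4j with an input layer, two hidden layers, and a softmax output layer, matching the diagram above. The layer sizes, updater, and hyperparameters are illustrative assumptions only:

```java
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.learning.config.Adam;
import org.nd4j.linalg.lossfunctions.LossFunctions;

public class MlpSketch {
    public static void main(String[] args) {
        int numInputs = 4, numHidden = 16, numClasses = 3;

        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                .seed(12345)
                .updater(new Adam(0.001))                      // gradient-based optimizer
                .list()
                .layer(0, new DenseLayer.Builder().nIn(numInputs).nOut(numHidden)
                        .activation(Activation.RELU).build())  // first hidden layer
                .layer(1, new DenseLayer.Builder().nIn(numHidden).nOut(numHidden)
                        .activation(Activation.RELU).build())  // second hidden layer
                .layer(2, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                        .nIn(numHidden).nOut(numClasses)
                        .activation(Activation.SOFTMAX).build()) // output layer
                .build();

        MultiLayerNetwork mlp = new MultiLayerNetwork(conf);
        mlp.init();
        // Training would be a call such as mlp.fit(trainingIterator) with a DataSetIterator.
    }
}
```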

Deep belief networks

To overcome the overfitting problem in MLPs, the DBN was proposed by Hinton et al. It uses a greedy, layer-by-layer, pre-training algorithm to initialize the network weights through probabilistic generative models.

DBNs are composed of a visible layer and multiple layers of hidden units. The top two layers have undirected, symmetric connections between them and form an associative memory, whereas the lower layers receive top-down, directed connections from the layer above. The building blocks of a DBN are RBMs, as you can see in the following figure, where several RBMs are stacked one after another to form a DBN:

A DBN configured for semi-supervised learning

A single RBM consists of two layers. The first layer is composed of visible neurons, and the second layer consists of hidden neurons. The following figure shows the structure of a simple RBM, where the neurons are arranged according to a symmetrical bipartite graph:

RBM architecture

In a DBN, the RBMs are first trained on the input data in a greedy, layer-wise fashion, called unsupervised pre-training, so that each hidden layer represents features learned from the layer below; the complete network is then refined with labeled data in a supervised fine-tuning phase. Despite numerous successes, DBNs are being replaced by AEs.

Autoencoders

An AE is a network with three or more layers, where the input layer and the output layer have the same number of neurons, and the intermediate (hidden) layers have fewer neurons. The network is trained to reproduce at the output, for each piece of input data, the same pattern of activity as at the input.

Useful applications of AEs are data denoising and dimensionality reduction for data visualization. The following diagram shows how an AE typically works. It reconstructs the received input through two phases: an encoding phase, which corresponds to a dimensionality reduction of the original input, and a decoding phase, which reconstructs the original input from the encoded (compressed) representation:

Encoding and decoding phases of an AE
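A minimal Deeplearning4j sketch of this encode/decode scheme follows; the layer sizes and input dimension are illustrative assumptions. The decoder mirrors the encoder, and the network is trained with the input itself as the target:

```java
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.dataset.DataSet;
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.linalg.lossfunctions.LossFunctions;

public class AutoEncoderSketch {
    public static void main(String[] args) {
        int numInputs = 784;   // e.g. a flattened 28x28 image
        int codeSize  = 32;    // compressed representation

        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                .seed(12345)
                .list()
                // Encoder: input -> 128 -> 32 (the code)
                .layer(0, new DenseLayer.Builder().nIn(numInputs).nOut(128)
                        .activation(Activation.RELU).build())
                .layer(1, new DenseLayer.Builder().nIn(128).nOut(codeSize)
                        .activation(Activation.RELU).build())
                // Decoder: 32 -> 128 -> reconstructed input
                .layer(2, new DenseLayer.Builder().nIn(codeSize).nOut(128)
                        .activation(Activation.RELU).build())
                .layer(3, new OutputLayer.Builder(LossFunctions.LossFunction.MSE)
                        .nIn(128).nOut(numInputs)
                        .activation(Activation.SIGMOID).build())
                .build();

        MultiLayerNetwork autoEncoder = new MultiLayerNetwork(conf);
        autoEncoder.init();

        // The target equals the input: the network learns to reconstruct what it sees.
        INDArray batch = Nd4j.rand(16, numInputs);   // placeholder data for illustration
        autoEncoder.fit(new DataSet(batch, batch));
    }
}
```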

Convolutional neural networks

CNNs have achieved great success and wide adoption in computer vision (for example, image recognition). In CNNs, the connection scheme that defines the convolutional (conv) layer is significantly different from that of an MLP or DBN.

Importantly, a DNN has no prior knowledge of how the pixels are organized; it does not know that nearby pixels are close. A CNN's architecture embeds this prior knowledge. Lower layers typically identify features in small areas of the image, while higher layers combine lower-level features into larger features. This works well with most natural images, giving CNNs a decisive head start over DNNs:

A regular DNN versus a CNN

Take a close look at the preceding diagram; on the left is a regular three-layer neural network, and on the right, a CNN arranges its neurons in three dimensions (width, height, and depth). In a CNN architecture, a few convolutional layers are connected in a cascade style, where each layer is followed by a ReLU layer, then a pooling layer, then a few more convolutional layers (+ReLU), then another pooling layer, and so on.

The output of each conv layer is a set of feature maps, each generated by a single kernel filter. The feature maps then serve as the input to the next layer. Each neuron in a conv layer produces an output followed by an activation (for example, ReLU), which is proportional to its input and not bounded. This type of layer is called a convolutional layer. The following diagram is a schematic of the architecture of a CNN used for facial recognition:

A schematic architecture of a CNN used for facial recognition
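As a concrete sketch of such a conv (+ReLU) and pooling cascade, the following Deeplearning4j configuration is illustrative only (filter counts, kernel sizes, and layer depths are assumptions, not the network from the diagram):

```java
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.inputs.InputType;
import org.deeplearning4j.nn.conf.layers.ConvolutionLayer;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.conf.layers.SubsamplingLayer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.lossfunctions.LossFunctions;

public class CnnSketch {
    public static MultiLayerNetwork build(int height, int width, int channels, int numClasses) {
        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                .seed(12345)
                .list()
                // Conv + ReLU: 16 feature maps, one per 5x5 kernel filter
                .layer(0, new ConvolutionLayer.Builder(5, 5).nIn(channels).nOut(16)
                        .activation(Activation.RELU).build())
                // Max pooling halves the spatial resolution
                .layer(1, new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX)
                        .kernelSize(2, 2).stride(2, 2).build())
                // Another conv (+ReLU) and pooling stage
                .layer(2, new ConvolutionLayer.Builder(5, 5).nOut(32)
                        .activation(Activation.RELU).build())
                .layer(3, new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX)
                        .kernelSize(2, 2).stride(2, 2).build())
                // Fully connected layers on top of the learned feature maps
                .layer(4, new DenseLayer.Builder().nOut(128).activation(Activation.RELU).build())
                .layer(5, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                        .nOut(numClasses).activation(Activation.SOFTMAX).build())
                // Lets DL4J infer nIn values and add the conv-to-dense flattening
                .setInputType(InputType.convolutional(height, width, channels))
                .build();

        MultiLayerNetwork cnn = new MultiLayerNetwork(conf);
        cnn.init();
        return cnn;
    }
}
```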

Recurrent neural networks

A recurrent neural network (RNN) is a class of artificial neural network (ANN) where connections between units form a directed cycle. The LSTM architecture, the most widely used RNN variant, was proposed by Hochreiter and Schmidhuber in 1997. RNN architectures are standard MLPs plus added loops (as shown in the following diagram), so they can exploit the powerful nonlinear mapping capabilities of the MLP; and they have some form of memory:

RNN architecture

The preceding image shows a very basic RNN with an input layer, two recurrent layers, and an output layer. However, this basic RNN suffers from the vanishing and exploding gradient problems and cannot model long-term dependencies. Therefore, more advanced architectures have been designed to utilize the sequential information in input data, with cyclic connections among building blocks such as perceptrons. These architectures include Long Short-Term Memory (LSTM), Gated Recurrent Units (GRUs), bidirectional LSTMs, and other variants.

Consequently, LSTMs and GRUs can overcome the drawbacks of regular RNNs: the vanishing/exploding gradient problem and the difficulty of modeling long-term dependencies. We will look at these architectures in Chapter 2.
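The following minimal Deeplearning4j sketch shows one such architecture: a single LSTM layer over sequential input followed by an output layer applied at each time step. Sizes are illustrative assumptions, and depending on the DL4J version the layer class may be LSTM or GravesLSTM:

```java
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.LSTM;
import org.deeplearning4j.nn.conf.layers.RnnOutputLayer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.lossfunctions.LossFunctions;

public class LstmSketch {
    public static MultiLayerNetwork build(int numFeatures, int numClasses) {
        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                .seed(12345)
                .list()
                // LSTM layer: gated memory cells mitigate vanishing/exploding gradients
                .layer(0, new LSTM.Builder().nIn(numFeatures).nOut(64)
                        .activation(Activation.TANH).build())
                // Output layer applied at every time step of the sequence
                .layer(1, new RnnOutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                        .nIn(64).nOut(numClasses)
                        .activation(Activation.SOFTMAX).build())
                .build();

        MultiLayerNetwork lstm = new MultiLayerNetwork(conf);
        lstm.init();
        // Input shape for training: [miniBatchSize, numFeatures, timeSeriesLength]
        return lstm;
    }
}
```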

Emergent architectures

Many other emergent DL architectures have been suggested, such as Deep SpatioTemporal Neural Networks (DST-NNs), Multi-Dimensional Recurrent Neural Networks (MD-RNNs), and Convolutional AutoEncoders (CAEs).

Nevertheless, there are a few more emerging networks, such as CapsNets (an improved version of CNNs, designed to remove the drawbacks of regular CNNs), RNNs for image recognition, and Generative Adversarial Networks (GANs) for simple image generation. Apart from these, factorization machines for personalization and deep reinforcement learning are also widely used.

Residual neural networks

Since there are sometimes millions or even billions of parameters, along with other practical considerations, it is really difficult to train deeper neural networks. To overcome this limitation, Kaiming He et al. (see https://arxiv.org/abs/1512.03385v1) proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously.

They also explicitly reformulated the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. This way, these residual networks are easier to optimize and can gain accuracy from considerably increased depth.

The downside is that building a network by simply stacking residual blocks inevitably limits its optimization ability. To overcome this limitation, Ke Zhang et al. also proposed using a Multilevel Residual Network (https://arxiv.org/abs/1608.02908).
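To sketch the core idea, a residual block adds the block's input back onto the output of its stacked layers, so those layers only need to learn the residual function. In Deeplearning4j this can be expressed with a ComputationGraph and an element-wise addition vertex; the dense layers and sizes below are illustrative assumptions rather than the architecture from the paper:

```java
import org.deeplearning4j.nn.conf.ComputationGraphConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.graph.ElementWiseVertex;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.graph.ComputationGraph;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.lossfunctions.LossFunctions;

public class ResidualBlockSketch {
    public static ComputationGraph build(int numFeatures, int numClasses) {
        ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
                .seed(12345)
                .graphBuilder()
                .addInputs("input")
                // Two dense layers that learn the residual function F(x)
                .addLayer("fc1", new DenseLayer.Builder().nIn(numFeatures).nOut(numFeatures)
                        .activation(Activation.RELU).build(), "input")
                .addLayer("fc2", new DenseLayer.Builder().nIn(numFeatures).nOut(numFeatures)
                        .activation(Activation.IDENTITY).build(), "fc1")
                // Skip connection: the block's output is F(x) + x
                .addVertex("residualAdd", new ElementWiseVertex(ElementWiseVertex.Op.Add),
                        "fc2", "input")
                .addLayer("out", new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                        .nIn(numFeatures).nOut(numClasses)
                        .activation(Activation.SOFTMAX).build(), "residualAdd")
                .setOutputs("out")
                .build();

        ComputationGraph net = new ComputationGraph(conf);
        net.init();
        return net;
    }
}
```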

Generative adversarial networks

GANs are deep neural net architectures that consist of two networks pitted against each other (hence the name "adversarial"). Ian Goodfellow et al. introduced GANs in a paper (see more at https://arxiv.org/abs/1406.2661v1). In GANs, the two main components are the generator and discriminator.

Working principle of Generative Adversarial Networks (GANs)

The generator tries to generate data samples from a specific probability distribution that closely resemble the real data, while the discriminator judges whether its input comes from the original training set or from the generator.

Capsule networks

CNNs perform well at classifying images. However, if the images are rotated, tilted, or presented in any other orientation, CNNs perform relatively poorly. Even the pooling operation in CNNs does not help much with positional invariance.

This issue with CNNs led to the recent advance of CapsNets, introduced in the paper titled Dynamic Routing Between Capsules (see more at https://arxiv.org/abs/1710.09829) by Geoffrey Hinton et al.

Unlike a regular DNN, where we keep on adding layers, in a CapsNet the idea is to add more layers inside a single layer. This way, a CapsNet is a nested set of neural layers. We'll discuss this more in Chapter 11, Discussion, Current Trends, and Outlook.
