Java Deep Learning Cookbook

Introduction to Deep Learning in Java

Let's discuss various deep learning libraries so as to pick the best for the purpose at hand. This is a context-dependent decision and will vary according to the situation. In this chapter, we will start with a brief introduction to deep learning and explore how DL4J is a good choice for solving deep learning puzzles. We will also discuss how to set up DL4J in your workspace.

In this chapter, we will cover the following recipes:

Deep learning intuition
Determining the right network type to solve deep learning problems
Determining the right activation function
Combating overfitting problems
Determining the right batch size and learning rates
Configuring Maven for DL4J
Configuring DL4J for a GPU-accelerated environment
Troubleshooting installation issues

Deep learning intuition

If you're a newbie to deep learning, you may be wondering how exactly it differs from machine learning; or is it the same? Deep learning is a subset of the larger domain of machine learning. Let's think about this in the context of an automobile image classification problem:

As you can see in the preceding diagram, we need to perform feature extraction ourselves as legacy machine learning algorithms cannot do that on their own. They might be super-efficient with accurate results, but they cannot learn signals from data. In fact, they don't learn on their own and still rely on human effort:

On the other hand, deep learning algorithms learn to perform tasks on their own. Under the hood, deep learning is based on neural networks, which train on their own to optimize the results. However, the final decision process is hidden and cannot be tracked. The intent of deep learning is to imitate the functioning of a human brain.

Backpropagation

The backbone of a neural network is the backpropagation algorithm. Refer to the sample neural network structure shown as follows:

For any neural network, data flows from the input layer to the output layer during the forward pass. Each circle in the diagram represents a neuron. Every layer has a number of neurons present. Our data will pass through the neurons across layers. The input needs to be in a numerical format to support computational operations in neurons. Each neuron in the neural network is assigned a weight (matrix) and an activation function. Using the input data, weight matrix, and an activation function, an output value is generated at each neuron. The error (that is, a deviation from the actual value) is calculated at the output layer using a loss function. We utilize the loss score during the backward pass (that is, from the output layer to the input layer) by adjusting the weights of the neurons to reduce the loss score. During this stage, some output layer neurons will be assigned higher weights and others lower weights, depending upon the loss score results. This process will continue backward as far as the input layer by updating the weights of neurons. In a nutshell, we are tracking the rate of change of loss with respect to the change in weights across all neurons. One such cycle (a forward and backward pass) over the entire training dataset is called an epoch. We perform multiple epochs during a training session. A neural network will tend to optimize the results after every training epoch.
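The forward and backward passes described above can be sketched for a single neuron in plain Java. This is a minimal illustration under stated assumptions, not DL4J code: the class name, the sigmoid activation, the squared-error loss, and the starting values are all chosen here for the example.

```java
/** One sigmoid neuron trained by gradient descent on a single example (illustrative only). */
final class SingleNeuronBackprop {

    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** Forward pass: weighted input plus bias, followed by the activation function. */
    static double forward(double w, double b, double x) {
        return sigmoid(w * x + b);
    }

    /** Squared-error loss between the prediction and the target. */
    static double loss(double prediction, double target) {
        double diff = prediction - target;
        return 0.5 * diff * diff;
    }

    /** Runs the given number of forward/backward passes and returns {w, b}. */
    static double[] train(double w, double b, double x, double target,
                          double learningRate, int steps) {
        for (int i = 0; i < steps; i++) {
            double out = forward(w, b, x);
            // Chain rule: dL/dz = (out - target) * sigmoid'(z), with sigmoid'(z) = out * (1 - out).
            double dz = (out - target) * out * (1.0 - out);
            w -= learningRate * dz * x; // dL/dw = dL/dz * x
            b -= learningRate * dz;     // dL/db = dL/dz
        }
        return new double[] {w, b};
    }

    public static void main(String[] args) {
        double x = 1.5, target = 1.0; // hypothetical training example
        double before = loss(forward(0.1, 0.0, x), target);
        double[] p = train(0.1, 0.0, x, target, 0.5, 100);
        double after = loss(forward(p[0], p[1], x), target);
        System.out.println("loss before: " + before + ", loss after: " + after);
    }
}
```

Each loop iteration is one forward pass followed by one weight update; a real network repeats the same chain-rule step layer by layer from the output back to the input.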

Multilayer Perceptron (MLP)

An MLP is a standard feed-forward neural network with at least three layers: an input layer, a hidden layer, and an output layer. Hidden layers come after the input layer in the structure. The simplest MLP has a single hidden layer, while deep neural networks have two or more hidden layers in the structure.

Convolutional Neural Network (CNN)

CNNs are generally used for image classification problems, but can also be applied in Natural Language Processing (NLP), in conjunction with word vectors, because of their proven results. Unlike a regular neural network, a CNN will have additional layers such as convolutional layers and subsampling layers. Convolutional layers take input data (such as images) and apply convolution operations on top of them. You can think of it as applying a function to the input. Convolutional layers act as filters that pass a feature of interest to the upcoming subsampling layer. A feature of interest can be anything (for example, fur, shade, and so on in the case of an image) that can be used to identify the image. In the subsampling layer, the input from convolutional layers is further smoothed. So, we end up with a much smaller image resolution and reduced color contrast, preserving only the important information. The input is then passed on to fully connected layers. Fully connected layers resemble regular feed-forward neural networks.
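To make the convolution and subsampling steps concrete, here is a small plain-Java sketch. It assumes a stride of 1 with no padding for the convolution and uses max pooling as the subsampling operation; both are common choices picked for this illustration, and the class and method names are hypothetical rather than DL4J APIs.

```java
/** Tiny convolution + max-pooling illustration on 2D arrays (not DL4J code). */
final class ConvPoolSketch {

    /** Slides the kernel over the image (stride 1, no padding) and sums the element-wise products. */
    static double[][] convolve(double[][] image, double[][] kernel) {
        int kh = kernel.length, kw = kernel[0].length;
        int oh = image.length - kh + 1, ow = image[0].length - kw + 1;
        double[][] out = new double[oh][ow];
        for (int r = 0; r < oh; r++) {
            for (int c = 0; c < ow; c++) {
                double sum = 0.0;
                for (int i = 0; i < kh; i++) {
                    for (int j = 0; j < kw; j++) {
                        sum += image[r + i][c + j] * kernel[i][j];
                    }
                }
                out[r][c] = sum;
            }
        }
        return out;
    }

    /** Keeps only the largest value in each non-overlapping size x size window. */
    static double[][] maxPool(double[][] in, int size) {
        int oh = in.length / size, ow = in[0].length / size;
        double[][] out = new double[oh][ow];
        for (int r = 0; r < oh; r++) {
            for (int c = 0; c < ow; c++) {
                double max = Double.NEGATIVE_INFINITY;
                for (int i = 0; i < size; i++) {
                    for (int j = 0; j < size; j++) {
                        max = Math.max(max, in[r * size + i][c * size + j]);
                    }
                }
                out[r][c] = max;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical 5x5 image with a vertical edge between columns 1 and 2.
        double[][] image = new double[5][];
        for (int r = 0; r < 5; r++) {
            image[r] = new double[] {0, 0, 1, 1, 1};
        }
        double[][] edgeKernel = {{-1, 1}, {-1, 1}}; // responds to left-to-right brightness increases
        double[][] pooled = maxPool(convolve(image, edgeKernel), 2);
        System.out.println(java.util.Arrays.deepToString(pooled));
    }
}
```

The 5x5 input shrinks to a 4x4 feature map after convolution and to 2x2 after pooling, while the position of the edge (the feature of interest) is still visible in the result.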

Recurrent Neural Network (RNN)

An RNN is a neural network that can process sequential data. In a regular feed-forward neural network, only the current input is considered for neurons in the next layer. On the other hand, an RNN can take previously received inputs into account as well. It can also use memory to retain previous inputs. So, it is capable of preserving long-term dependencies throughout the training session. RNNs are a popular choice for NLP tasks such as speech recognition. In practice, a slightly variant structure called Long Short-Term Memory (LSTM) is used as a better alternative to a plain RNN.
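The idea that an RNN carries earlier inputs forward can be sketched with a single recurrent unit in plain Java. This is a simplified illustration, assuming scalar weights and a tanh activation; the names and values are hypothetical and it is not how DL4J implements its recurrent layers.

```java
/** A single recurrent unit with scalar weights (illustrative only). */
final class SimpleRnnCell {

    /** New hidden state from the current input and the previous hidden state. */
    static double step(double input, double previousHidden,
                       double wInput, double wHidden, double bias) {
        return Math.tanh(wInput * input + wHidden * previousHidden + bias);
    }

    /** Feeds the whole sequence through the unit and returns the final hidden state. */
    static double run(double[] sequence, double wInput, double wHidden, double bias) {
        double hidden = 0.0; // no memory before the first time step
        for (double x : sequence) {
            hidden = step(x, hidden, wInput, wHidden, bias);
        }
        return hidden;
    }

    public static void main(String[] args) {
        // Both sequences end with the same input (0), but their histories differ.
        double withHistory = run(new double[] {1.0, 0.0}, 0.5, 0.8, 0.0);
        double withoutHistory = run(new double[] {0.0, 0.0}, 0.5, 0.8, 0.0);
        System.out.println(withHistory + " vs " + withoutHistory);
    }
}
```

A feed-forward unit would produce the same output for the last input in both sequences; the recurrent unit does not, because the hidden state still reflects the earlier input.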

Why is DL4J important for deep learning?

The following points will help you understand why DL4J is important for deep learning:

DL4J provides commercial support. It is the first commercial-grade, open source, deep learning library in Java.
Writing training code is simple and precise. DL4J supports Plug and Play mode, which means switching between hardware (CPU to GPU) is just a matter of changing the Maven dependencies and no modifications are needed on the code.
DL4J uses ND4J as its backend. ND4J is a computation library that can run twice as fast as NumPy (a computation library in Python) in large matrix operations. DL4J exhibits faster training times in GPU environments compared to other Python counterparts.
DL4J supports training on a cluster of machines that are running in CPU/GPU using Apache Spark. DL4J brings in automated parallelism in distributed training. This means that DL4J bypasses the need for extra libraries by setting up worker nodes and connections.
DL4J is a good production-oriented deep learning library. As a JVM-based library, DL4J applications can be easily integrated/deployed with existing corporate applications that are running in Java/Scala.

Determining the right network type to solve deep learning problems

It is crucial to identify the right neural network type to solve a business problem efficiently. A standard neural network can be a good fit for most use cases and can produce approximate results. However, in some scenarios, the core neural network architecture needs to be changed in order to accommodate the features (input) and to produce the desired results. In the following recipe, we will walk through key steps to decide the best network architecture for a deep learning problem with the help of known use cases.

How to do it...

Determine the problem type.
Determine the type of data engaged in the system.

How it works...

To solve use cases effectively, we need to use the right neural network architecture by determining the problem type. The following are some common use cases and respective problem types to consider for step 1:

Fraud detection problems: We want to differentiate between legitimate and suspicious transactions so as to separate unusual activities from the entire activity list. The intent is to reduce false-positive (that is, incorrectly tagging legitimate transactions as fraud) cases. Hence, this is an anomaly detection problem.
Prediction problems: Prediction problems can be classification or regression problems. For classification, the data is labeled with discrete labels, and we need to model data against those discrete labels. On the other hand, regression models don't have discrete labels; they predict continuous values.
Recommendation problems: You would need to build a recommender system (a recommendation engine) to recommend products or content to customers. Recommendation engines can also be applied to an agent performing tasks such as gaming, autonomous driving, robotic movements, and more. Recommendation engines implement reinforcement learning and can be enhanced further by introducing deep learning into it.

We also need to know the type of data that is consumed by the neural network. Here are some use cases and respective data types for step 2:

Fraud detection problems: Transactions usually happen over a number of time steps. So, we need to continuously collect transaction data over time. This is an example of time series data. Each time sequence represents a new transaction sequence. These time sequences can be regular or irregular. For instance, if you have credit card transaction data to analyze, then you have labeled data. You can also have unlabeled data in the case of user metadata from production logs. We can have supervised/unsupervised datasets for fraud detection analysis, for example. Take a look at the following CSV supervised dataset:

In the preceding screenshot, features such as amount, oldBalanceOrg, and so on make sense and each record has a label indicating whether the particular observation is fraudulent or not.

On the other hand, an unsupervised dataset will not give you any clue about input features. It doesn't have any labels either, as shown in the following CSV data:

As you can see, the feature labels (top row) follow a numbered naming convention without any clue as to their significance for fraud detection outcomes. We can also have time series data where transactions are logged over a series of time steps.

Prediction problems: Historical data collected from organizations can be used to train neural networks. These are usually simple file types such as CSV/text files. Data can be obtained as records. For a stock market prediction problem, the data type would be a time series. A dog breed prediction problem requires feeding in dog images for network training. Stock price prediction is an example of a regression problem. Stock price datasets usually are time series data where stock prices are measured over a series of time steps as follows:

In most stock price datasets, there are multiple files. Each one of them represents the stock of one company. And each file will have stock prices recorded over a series of time steps, as shown here:

Recommendation problems: For a product recommendation system, explicit data might be customer reviews posted on a website and implicit data might be the customer activity history, such as product search or purchase history. We will use unlabeled data to feed the neural network. Recommender systems can also solve games or learn a job that requires skills. Agents (trained to perform tasks during reinforcement learning) can take real-time data in the form of image frames or any text data (unsupervised) to learn what actions to make depending on their states.

There's more...

The following are possible deep learning solutions to the problem types previously discussed:

Fraud detection problems: The optimal solution varies according to the data. We previously mentioned two data sources. One was credit card transactions and the other was user metadata based on their login/logoff activities. In the first case, we have labeled data and have a transaction sequence to analyze.

Recurrent networks may be best suited to sequential data. You can add LSTM (https://deeplearning4j.org/api/latest/org/deeplearning4j/nn/layers/recurrent/LSTM.html) recurrent layers, and DL4J has an implementation for that. For the second case, we have unlabeled data and the best choice would be a variational (https://deeplearning4j.org/api/latest/org/deeplearning4j/nn/layers/variational/VariationalAutoencoder.html) autoencoder to compress unlabeled data.

Prediction problems: For classification problems that use CSV records, a feed-forward neural network will do. For time series data, the best choice would be recurrent networks because of the nature of sequential data. For image classification problems, you would need a CNN (https://deeplearning4j.org/api/latest/org/deeplearning4j/nn/conf/layers/ConvolutionLayer.Builder.html).
Recommendation problems: We can employ Reinforcement Learning (RL) to solve recommendation problems. RL is very often used for such use cases and might be a better option. RL4J was specifically developed for this purpose. We will introduce RL4J in Chapter 9, Using RL4J for Reinforcement Learning, as it would be an advanced topic at this point. We can also go for simpler options such as feed-forward networks (or RNNs) with a different approach. We can feed an unlabeled data sequence to recurrent or convolutional layers as per the data type (image/text/video). Once the recommended content/product is classified, you can apply further logic to pull random products from the list based on customer preferences.

In order to choose the right network type, you need to understand the type of data and the problem it tries to solve. The most basic neural network that you could construct is a feed-forward network or a multilayer perceptron. You can create multilayer network architectures using NeuralNetConfiguration in DL4J.

Refer to the following sample neural network configuration in DL4J:

// Sample fragment: layerOneInputNeurons, dropOutRatio, weightsArray, and so on are defined elsewhere.
MultiLayerConfiguration configuration = new NeuralNetConfiguration.Builder()
    .weightInit(WeightInit.RELU_UNIFORM)
    .updater(new Nesterovs(0.008, 0.9)) // learning rate 0.008, momentum 0.9
    .list()
    // Two hidden layers with ReLU activation and dropout
    .layer(new DenseLayer.Builder().nIn(layerOneInputNeurons).nOut(layerOneOutputNeurons)
        .activation(Activation.RELU).dropOut(dropOutRatio).build())
    .layer(new DenseLayer.Builder().nIn(layerTwoInputNeurons).nOut(layerTwoOutputNeurons)
        .activation(Activation.RELU).dropOut(0.9).build())
    // Output layer: weighted multi-class cross-entropy loss with softmax activation
    .layer(new OutputLayer.Builder(new LossMCXENT(weightsArray))
        .nIn(layerThreeInputNeurons).nOut(numberOfLabels).activation(Activation.SOFTMAX).build())
    .backprop(true).pretrain(false)
    .build();

We specify activation functions for every layer in a neural network, and nIn() and nOut() represent the number of connections in/out of the layer of neurons. The purpose of the dropOut() function is to reduce overfitting so that the network generalizes better. We will discuss it in Chapter 3, Building Deep Neural Networks for Binary Classification. Essentially, we are ignoring some neurons at random to avoid blindly memorizing patterns during training. Activation functions will be discussed in the Determining the right activation function recipe in this chapter. Other attributes control how weights are distributed between neurons and how to deal with errors calculated across each epoch.

Let's focus on a specific decision-making process: choosing the right network type. Sometimes, it is better to use a custom architecture to yield better results. For example, you can perform sentence classification using word vectors combined with a CNN. DL4J offers the ComputationGraph (https://deeplearning4j.org/api/latest/org/deeplearning4j/nn/graph/ComputationGraph.html) implementation to accommodate CNN architecture.

ComputationGraph allows an arbitrary (custom) neural network architecture. Here is how it is defined in DL4J:

public ComputationGraph(ComputationGraphConfiguration configuration) {
 this.configuration = configuration;
 this.numInputArrays = configuration.getNetworkInputs().size();
 this.numOutputArrays = configuration.getNetworkOutputs().size();
 this.inputs = new INDArray[numInputArrays];
 this.labels = new INDArray[numOutputArrays];
 this.defaultConfiguration = configuration.getDefaultConfiguration();
//Additional source is omitted from here. Refer to https://github.com/deeplearning4j/deeplearning4j
}

Implementing a CNN is just like constructing network layers for a feed-forward network:

public class ConvolutionLayer extends FeedForwardLayer

A CNN has ConvolutionLayer and SubsamplingLayer apart from DenseLayer and OutputLayer.

Determining the right activation function

The purpose of an activation function is to introduce non-linearity into a neural network. Non-linearity helps a neural network to learn more complex patterns. We will discuss some important activation functions, and their respective DL4J implementations.

The following are the activation functions that we will consider:

Tanh
Sigmoid
ReLU (short for Rectified Linear Unit)
Leaky ReLU
Softmax

In this recipe, we will walk through the key steps to decide the right activation functions for a neural network.

How to do it...

Choose an activation function according to the network layers: We need to know the activation functions to be used for the input/hidden layers and output layers. Use ReLU for input/hidden layers preferably.
Choose the right activation function to handle data impurities: Inspect the data that you feed to the neural network. Do you have inputs with a majority of negative values that lead to dead neurons? Choose the appropriate activation functions accordingly. Use Leaky ReLU if dead neurons are observed during training.
Choose the right activation function to handle overfitting: Observe the evaluation metrics and their variation for each training period. Understand gradient behavior and how well your model performs on new unseen data.
Choose the right activation function as per the expected output of your use case: Examine the desired outcome of your network as a first step. For example, the SOFTMAX function can be used when you need to measure the probability of the occurrence of the output class. It is used in the output layer. For any input/hidden layers, ReLU is what you need for most cases. If you're not sure about what to use, just start experimenting with ReLU; if that doesn't meet your expectations, then try other activation functions.

How it works...

For step 1, ReLU is most commonly used because of its non-linear behavior. The output layer activation function depends on the expected output behavior. Step 4 targets this too.
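For the output layer, the softmax function mentioned in step 4 turns raw scores into class probabilities. The following plain-Java sketch shows the calculation; the class name and the sample scores are hypothetical, and in DL4J you would simply select Activation.SOFTMAX rather than write this yourself.

```java
import java.util.Arrays;

/** Softmax over raw output-layer scores (illustrative only). */
final class SoftmaxSketch {

    /** Converts raw scores into probabilities that sum to 1. */
    static double[] softmax(double[] scores) {
        double max = Double.NEGATIVE_INFINITY;
        for (double s : scores) {
            max = Math.max(max, s);
        }
        double sum = 0.0;
        double[] out = new double[scores.length];
        for (int i = 0; i < scores.length; i++) {
            out[i] = Math.exp(scores[i] - max); // subtracting the max avoids overflow in exp()
            sum += out[i];
        }
        for (int i = 0; i < out.length; i++) {
            out[i] /= sum;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] probabilities = softmax(new double[] {2.0, 1.0, 0.1}); // hypothetical scores
        System.out.println(Arrays.toString(probabilities));
    }
}
```

The largest score receives the largest probability, which is why softmax suits output layers that must rank mutually exclusive classes.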

For step 2, Leaky ReLU is an improved version of ReLU and is used to avoid the zero gradient problem. However, you might observe a performance drop. We use Leaky ReLU if dead neurons are observed during training. Dead neurons are neurons with a zero gradient for all possible inputs, which makes them useless for training.
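The difference between ReLU and Leaky ReLU for negative inputs can be seen in a few lines of plain Java. This is a sketch for illustration; the class name and the slope value of 0.01 are assumptions made here, not DL4J defaults.

```java
/** ReLU and Leaky ReLU with their gradients (illustrative only). */
final class ReluVariants {

    static double relu(double x) {
        return Math.max(0.0, x);
    }

    /** Gradient of ReLU: zero for every non-positive input, which is what creates dead neurons. */
    static double reluGradient(double x) {
        return x > 0 ? 1.0 : 0.0;
    }

    /** Leaky ReLU lets a small fraction (alpha) of negative inputs through. */
    static double leakyRelu(double x, double alpha) {
        return x > 0 ? x : alpha * x;
    }

    /** Gradient of Leaky ReLU: never zero as long as alpha is non-zero. */
    static double leakyReluGradient(double x, double alpha) {
        return x > 0 ? 1.0 : alpha;
    }

    public static void main(String[] args) {
        double alpha = 0.01; // hypothetical slope for negative inputs
        System.out.println("ReLU gradient at -2: " + reluGradient(-2.0));
        System.out.println("Leaky ReLU gradient at -2: " + leakyReluGradient(-2.0, alpha));
    }
}
```

A neuron whose inputs are always negative receives no weight updates under ReLU, whereas Leaky ReLU still passes a small gradient back.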

For step 3, the tanh and sigmoid activation functions are similar and are used in feed-forward networks. If you use these activation functions, then make sure you add regularization to network layers to avoid the vanishing gradient problem. These are generally used for classifier problems.

There's more...

The ReLU activation function is non-linear, hence, the backpropagation of errors can easily be performed. Backpropagation is the backbone of neural networks. This is the learning algorithm that computes the gradient of the loss with respect to the weights across neurons, which gradient descent then uses to update them. The following are ReLU variations currently supported in DL4J:

ReLU: The standard ReLU activation function:

public static final Activation RELU

ReLU6: ReLU activation, which is capped at 6, where 6 is an arbitrary choice:

public static final Activation RELU6

RReLU: The randomized ReLU activation function:

public static final Activation RRELU

ThresholdedReLU: Threshold ReLU:

public static final Activation THRESHOLDEDRELU

There are a few more implementations, such as SeLU (short for the Scaled Exponential Linear Unit), which is similar to the ReLU activation function but has a slope for negative values.

Combating overfitting problems

As we know, overfitting is a major challenge that machine learning developers face. It becomes a big challenge when the neural network architecture is complex and the training data is limited relative to that complexity. While mentioning overfitting, we're not ignoring the chances of underfitting at all. We will keep overfitting and underfitting in the same category. Let's discuss how we can combat overfitting problems.

The following are possible reasons for overfitting, including but not limited to:

Too many feature variables compared to the number of data records
A complex neural network model

Self-evidently, overfitting reduces the generalization power of the network and the network will fit noise instead of a signal when this happens. In this recipe, we will walk through key steps to prevent overfitting problems.

How to do it...

Use KFoldIterator to perform k-fold cross-validation-based resampling:

KFoldIterator kFoldIterator = new KFoldIterator(k, dataSet);

Construct a simpler neural network architecture.
Use enough training data to train the neural network.

How it works...

In step 1, k is a number of your choice and dataSet is the dataset object that represents your training data. We perform k-fold cross-validation to optimize the model evaluation process.

Complex neural network architectures can cause the network to tend to memorize patterns. Hence, your neural network will have a hard time generalizing unseen data. For example, it's better and more efficient to have a few hidden layers rather than hundreds of hidden layers. That's the relevance of step 2.

Fairly large training data will encourage the network to learn better and a batch-wise evaluation of test data will increase the generalization power of the network. That's the relevance of step 3. Although there are multiple types of data iterator and various ways to introduce batch size in an iterator in DL4J, the following is a more conventional definition for RecordReaderDataSetIterator:

public RecordReaderDataSetIterator(RecordReader recordReader,
 WritableConverter converter,
 int batchSize,
 int labelIndexFrom,
 int labelIndexTo,
 int numPossibleLabels,
 int maxNumBatches,
 boolean regression)

There's more...

When you perform k-fold cross-validation, data is divided into k number of subsets. For every subset, we perform evaluation by keeping one of the subsets for testing and the remaining k-1 subsets for training. We will repeat this k number of times. Effectively, we use the entire data for training with no data loss, as opposed to wasting some of the data on testing.
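The way k-fold cross-validation partitions the data can be sketched with plain indices. The following is an illustration only, assuming a simple round-robin assignment of records to folds; it is not how DL4J's KFoldIterator is implemented, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Splits record indices into k folds for cross-validation (illustrative only). */
final class KFoldSplit {

    /** Returns, for each of the k folds, the record indices held out for testing. */
    static List<List<Integer>> testFolds(int numRecords, int k) {
        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) {
            folds.add(new ArrayList<>());
        }
        for (int i = 0; i < numRecords; i++) {
            folds.get(i % k).add(i); // round-robin assignment
        }
        return folds;
    }

    /** Returns the indices used for training when the given fold is held out. */
    static List<Integer> trainIndices(int numRecords, int k, int fold) {
        List<Integer> train = new ArrayList<>();
        for (int i = 0; i < numRecords; i++) {
            if (i % k != fold) {
                train.add(i);
            }
        }
        return train;
    }

    public static void main(String[] args) {
        int numRecords = 10, k = 5; // hypothetical dataset size and fold count
        for (int fold = 0; fold < k; fold++) {
            System.out.println("fold " + fold
                    + " test=" + testFolds(numRecords, k).get(fold)
                    + " train=" + trainIndices(numRecords, k, fold));
        }
    }
}
```

Every record lands in exactly one test fold and in the training set of the other k-1 rounds, so no record is permanently set aside for testing.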

Underfitting is handled here. However, note that we perform the evaluation k number of times only.

When you perform batch training, the entire dataset is divided as per the batch size. If your dataset has 1,000 records and the batch size is 8, then you have 125 training batches.
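The batch arithmetic above is a ceiling division, sketched here in plain Java. The class and method names are hypothetical; the point is that the final batch may be smaller when the batch size does not divide the record count evenly.

```java
/** Computes how many batches a dataset yields for a given batch size (illustrative only). */
final class BatchCount {

    static int numberOfBatches(int numRecords, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        return (numRecords + batchSize - 1) / batchSize; // integer ceiling of numRecords / batchSize
    }

    public static void main(String[] args) {
        System.out.println(numberOfBatches(1000, 8));  // the example from the text
        System.out.println(numberOfBatches(1000, 64)); // last batch holds fewer than 64 records
    }
}
```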

You need to note the training-to-testing ratio as well. According to that ratio, every batch will be divided into a training set and testing set. Then the evaluation will be performed accordingly. For 8-fold cross-validation, you evaluate the model 8 times, but for a batch size of 8, you perform 125 model evaluations.

Note the rigorous mode of evaluation here, which will help to improve the generalization power while increasing the chances of underfitting.

Determining the right batch size and learning rates

Although there is no specific batch size or learning rate that works for all models, we can find the best values for them by experimenting with multiple training instances. The primary step is to experiment with a set of batch size values and learning rates with the model. Observe the efficiency of the model by evaluating additional parameters such as Precision, Recall, and F1 Score. Test scores alone don't confirm the model's performance. Also, parameters such as Precision, Recall, and F1 Score vary according to the use case. You need to analyze your problem statement to get an idea about this. In this recipe, we will walk through key steps to determine the right batch size and learning rates.

How to do it...

Run the training instance multiple times and track the evaluation metrics.
Run experiments by increasing the learning rate and track the results.

How it works...

Consider the following experiments to illustrate step 1.

The following training was performed on 10,000 records with a batch size of 8 and a learning rate of 0.008:

The following is the evaluation performed on the same dataset for a batch size of 50 and a learning rate of 0.008:

To perform step 2, we increased the learning rate to 0.6, to observe the results. Note that a learning rate beyond a certain limit will not help efficiency in any way. Our job is to find that limit:

You can observe that Accuracy is reduced to 82.40% and F1 Score is reduced to 20.7%. This indicates that F1 Score might be the evaluation parameter to be accounted for in this model. This is not true for all models, and we reach this conclusion after experimenting with a couple of batch sizes and learning rates. In a nutshell, you have to repeat the same process for your model's training and choose the values that yield the best results.
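To see why accuracy and F1 Score can tell different stories, it helps to compute them from raw counts. The sketch below uses plain Java with hypothetical counts that are not taken from the experiments above; in DL4J these metrics are reported by the evaluation classes rather than computed by hand.

```java
/** Precision, recall, F1, and accuracy from confusion-matrix counts (illustrative only). */
final class ClassificationMetrics {

    static double precision(int tp, int fp) {
        return tp + fp == 0 ? 0.0 : (double) tp / (tp + fp);
    }

    static double recall(int tp, int fn) {
        return tp + fn == 0 ? 0.0 : (double) tp / (tp + fn);
    }

    /** Harmonic mean of precision and recall. */
    static double f1(double precision, double recall) {
        return precision + recall == 0.0 ? 0.0 : 2.0 * precision * recall / (precision + recall);
    }

    static double accuracy(int tp, int tn, int fp, int fn) {
        return (double) (tp + tn) / (tp + tn + fp + fn);
    }

    public static void main(String[] args) {
        // Hypothetical imbalanced result: 50 positives among 1,000 records.
        int tp = 30, fp = 10, fn = 20, tn = 940;
        double p = precision(tp, fp);
        double r = recall(tp, fn);
        System.out.println("accuracy=" + accuracy(tp, tn, fp, fn)
                + " precision=" + p + " recall=" + r + " f1=" + f1(p, r));
    }
}
```

With these counts the accuracy is high because negatives dominate, while the F1 Score is noticeably lower; that gap is the reason to track more than one metric on imbalanced data.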

There's more...

When we increase the batch size, the number of iterations per epoch is reduced, hence the number of evaluations will also be reduced. This can overfit the data for a large batch size. A batch size of 1 is as useless as a batch size based on an entire dataset. So, you need to experiment with values starting from a safe arbitrary point.

A very small learning rate will lead to very slow convergence to the target. This can also impact the training time. If the learning rate is very large, this will cause divergent behavior in the model. We need to keep increasing the learning rate as long as the evaluation metrics keep getting better. There is an implementation of a cyclic learning rate in the fast.ai and Keras libraries; however, a cyclic learning rate is not implemented in DL4J.
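The effect of the learning rate on convergence can be reproduced on a toy problem. The following plain-Java sketch minimizes f(w) = w^2 with gradient descent; the function, the starting point, and the three learning rates are assumptions chosen for illustration and say nothing about suitable values for a real network.

```java
/** Gradient descent on f(w) = w^2 with different learning rates (illustrative only). */
final class LearningRateSketch {

    /** Starts at w = 1.0, applies the update w -= learningRate * f'(w), and returns the final w. */
    static double descend(double learningRate, int steps) {
        double w = 1.0;
        for (int i = 0; i < steps; i++) {
            double gradient = 2.0 * w; // derivative of w^2
            w -= learningRate * gradient;
        }
        return w;
    }

    public static void main(String[] args) {
        int steps = 50;
        System.out.println("too small (0.001): " + descend(0.001, steps)); // barely moves toward 0
        System.out.println("moderate  (0.1):   " + descend(0.1, steps));   // close to the minimum at 0
        System.out.println("too large (1.1):   " + descend(1.1, steps));   // overshoots and diverges
    }
}
```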

Configuring Maven for DL4J

We need to add DL4J/ND4J Maven dependencies to leverage DL4J capabilities. ND4J is a scientific computation library dedicated to DL4J. It is necessary to mention the ND4J backend dependency in your pom.xml file. In this recipe, we will add a CPU-specific Maven configuration in pom.xml.

Getting ready

Let's discuss the required Maven dependencies. We assume you have already done the following:

JDK 1.7, or higher, is installed and the PATH variable is set.
Maven is installed and the PATH variable is set.

A 64-bit JVM is required to run DL4J.

Set the PATH variable for JDK and Maven:

On Linux: Use the export command to add Maven and JDK to the PATH variable:

export PATH=/opt/apache-maven-3.x.x/bin:$PATH
export PATH=${PATH}:/usr/java/jdk1.x.x/bin

Replace the version number as per the installation.

On Windows: Set the system environment variables from System Properties, or use the set command in a Command Prompt:

set "PATH=C:\Program Files\Apache Software Foundation\apache-maven-3.x.x\bin;%PATH%"
set "PATH=C:\Program Files\Java\jdk1.x.x\bin;%PATH%"

Replace the JDK version number as per the installation.

How to do it...

Add the DL4J core dependency:

<dependency>
    <groupId>org.deeplearning4j</groupId>
    <artifactId>deeplearning4j-core</artifactId>
    <version>1.0.0-beta3</version>
</dependency>

Add the ND4J native dependency:

<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-native-platform</artifactId>
    <version>1.0.0-beta3</version>
</dependency>

Add the DataVec dependency to perform ETL (short for Extract, Transform and Load) operations:

<dependency>
    <groupId>org.datavec</groupId>
    <artifactId>datavec-api</artifactId>
    <version>1.0.0-beta3</version>
</dependency>

Enable logging for debugging:

<dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-simple</artifactId>
    <version>1.7.25</version> <!-- change to the latest version -->
</dependency>

Note that 1.0.0-beta3 is the current DL4J release version at the time of writing this book, and is the official version used in this cookbook. Also, note that DL4J relies on an ND4J backend for hardware-specific implementations.

How it works...

After adding the DL4J core dependency and ND4J dependencies, as mentioned in step 1 and step 2, we are able to create neural networks. In step 2, the ND4J Maven configuration is mentioned as a necessary backend dependency for Deeplearning4j. ND4J is the scientific computation library for Deeplearning4j.

ND4J is a scientific computing library written for Java, just like NumPy is for Python.

Step 3 is crucial for the ETL process: that is, data extraction, transformation, and loading. So, we definitely need this as well in order to train the neural network using data.

Step 4 is optional but recommended, since logging will reduce the effort involved in debugging.

Configuring DL4J for a GPU-accelerated environment

For GPU-powered hardware, DL4J comes with a different API implementation. This is to ensure the GPU hardware is utilized effectively without wasting hardware resources. Resource optimization is a major concern for expensive GPU-powered applications in production. In this recipe, we will add a GPU-specific Maven configuration to pom.xml.

Getting ready

You will need the following in order to complete this recipe:

JDK version 1.7, or higher, installed and added to the PATH variable
Maven installed and added to the PATH variable
NVIDIA-compatible hardware
CUDA v9.2+ installed and configured
cuDNN (short for CUDA Deep Neural Network) installed and configured

How to do it...

Download and install CUDA v9.2+ from the NVIDIA developer website URL: https://developer.nvidia.com/cuda-downloads.
Configure the CUDA dependencies. For Linux, go to a Terminal and edit the .bashrc file. Run the following commands and make sure you replace username and the CUDA version number as per your downloaded version:

nano /home/username/.bashrc
export PATH=/usr/local/cuda-9.2/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-9.2/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
source /home/username/.bashrc

Add the lib64 directory to PATH for older DL4J versions.
Run the nvcc --version command to verify the CUDA installation.
Add Maven dependencies for the ND4J CUDA backend:

<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-cuda-9.2</artifactId>
    <version>1.0.0-beta3</version>
</dependency>

Add the DL4J CUDA Maven dependency:

<dependency>
    <groupId>org.deeplearning4j</groupId>
    <artifactId>deeplearning4j-cuda-9.2</artifactId>
    <version>1.0.0-beta3</version>
</dependency>

Add cuDNN dependencies to use bundled CUDA and cuDNN:

<dependency>
    <groupId>org.bytedeco.javacpp-presets</groupId>
    <artifactId>cuda</artifactId>
    <version>9.2-7.1-1.4.2</version>
    <classifier>linux-x86_64-redist</classifier> <!-- system specific -->
</dependency>

How it works...

We configured NVIDIA CUDA using steps 1 to 4. For more detailed OS-specific instructions, refer to the official NVIDIA CUDA website at https://developer.nvidia.com/cuda-downloads.

Depending on your OS, installation instructions will be displayed on the website. DL4J version 1.0.0-beta3 currently supports CUDA installation versions 9.0, 9.2, and 10.0. For instance, if you need to install CUDA v10.0 for Ubuntu 16.04, you should navigate the CUDA website as shown here:

Note that step 3 is not applicable to newer versions of DL4J. For 1.0.0-beta and later versions, the necessary CUDA libraries are bundled with DL4J. However, this is not applicable for step 7.

Additionally, before proceeding with steps 5 and 6, make sure that there are no redundant dependencies (such as CPU-specific dependencies) present in pom.xml.

DL4J supports CUDA, but performance can be further accelerated by adding a cuDNN library. cuDNN does not show up as a bundled package in DL4J. Hence, make sure you download and install NVIDIA cuDNN from the NVIDIA developer website. Once cuDNN is installed and configured, we can follow step 7 to add support for cuDNN in the DL4J application.

There's more...

For multi-GPU systems, you can consume all GPU resources by placing the following code in the main method of your application:

CudaEnvironment.getInstance().getConfiguration().allowMultiGPU(true);

This is a temporary workaround for initializing the ND4J backend in the case of multi-GPU hardware. In this way, we will not be limited to only a few GPU resources if more are available.

Troubleshooting installation issues

Though the DL4J setup doesn't seem complex, installation issues can still happen because of different OSes or applications installed on the system, and so on. CUDA installation issues are not within the scope of this book. Maven build issues that are due to unresolved dependencies can have multiple causes. If you are working for an organization with its own internal repositories and proxies, then you need to make relevant changes in the pom.xml file. These issues are also outside the scope of this book. In this recipe, we will walk through the steps to mitigate common installation issues with DL4J.

Getting ready

The following checks are mandatory before we proceed:

Verify Java and Maven are installed and the PATH variables are configured.
Verify the CUDA and cuDNN installations.
Verify that the Maven build is successful and the dependencies are downloaded at ~/.m2/repository.
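Some of these checks can be scripted. The following is a minimal sketch that reports the Java version, looks for a Maven launcher on the PATH, and checks whether the DL4J-related group directories exist in the local repository. The class PreflightCheck is a hypothetical helper, and the launcher suffixes it tries are assumptions; the CUDA and cuDNN checks are not covered.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of DL4J or Maven.
class PreflightCheck {

    /** True if the executable (optionally with a Windows suffix) is in one of the PATH directories. */
    static boolean onPath(String pathValue, String executable) {
        if (pathValue == null || pathValue.isEmpty()) {
            return false;
        }
        for (String dir : pathValue.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;
            }
            for (String suffix : new String[] {"", ".cmd", ".bat", ".exe"}) {
                try {
                    if (Files.isRegularFile(Paths.get(dir, executable + suffix))) {
                        return true;
                    }
                } catch (InvalidPathException e) {
                    // Skip malformed PATH entries instead of failing the whole check.
                }
            }
        }
        return false;
    }

    /** True if the local repository contains a directory for the given groupId. */
    static boolean groupDownloaded(Path localRepo, String groupId) {
        return Files.isDirectory(localRepo.resolve(groupId.replace('.', File.separatorChar)));
    }

    public static void main(String[] args) {
        Path repo = Paths.get(System.getProperty("user.home"), ".m2", "repository");
        System.out.println("Java version       : " + System.getProperty("java.version"));
        System.out.println("mvn on PATH        : " + onPath(System.getenv("PATH"), "mvn"));
        for (String group : new String[] {"org.deeplearning4j", "org.nd4j", "org.datavec"}) {
            System.out.println(group + " downloaded: " + groupDownloaded(repo, group));
        }
    }
}
```

A directory being present does not prove that every artifact inside it was downloaded completely, so a successful Maven build remains the authoritative check.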

How to do it...

Enable logging levels to yield more information on errors:

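// Assumes Logback is the SLF4J binding; the SLF4J Logger interface itself has no setLevel()
ch.qos.logback.classic.Logger log =
    (ch.qos.logback.classic.Logger) org.slf4j.LoggerFactory.getLogger(YourClassFile.class);
log.setLevel(ch.qos.logback.classic.Level.DEBUG);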

Verify the JDK/Maven installation and configuration.
Check whether all the required dependencies are added in the pom.xml file.
Remove the contents of the Maven local repository and rebuild Maven to mitigate NoClassDefFoundError in DL4J. For Linux, this is as follows:

rm -rf ~/.m2/repository/org/deeplearning4j
rm -rf ~/.m2/repository/org/nd4j
rm -rf ~/.m2/repository/org/datavec
mvn clean install

Mitigate ClassNotFoundException in DL4J. Try this if step 4 didn't resolve the issue: DL4J, ND4J, and DataVec should all have the same version. For CUDA-related error stacks, check the installation as well.

If adding the proper DL4J CUDA version doesn't fix this, then check your cuDNN installation.
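The version check in step 5 can be expressed as a small program. The following is a minimal sketch, assuming the dependency coordinates have already been collected (for example, from the output of mvn dependency:list). The class VersionAlignmentCheck is a hypothetical helper, and the coordinates in main are illustrative only.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of DL4J.
class VersionAlignmentCheck {

    /**
     * Collects the distinct versions used by DL4J, ND4J, and DataVec artifacts.
     * Keys are "groupId:artifactId" strings; values are version strings.
     */
    static Set<String> dl4jStackVersions(Map<String, String> versionsByCoordinate) {
        Set<String> versions = new TreeSet<>();
        for (Map.Entry<String, String> entry : versionsByCoordinate.entrySet()) {
            String groupId = entry.getKey().split(":")[0];
            if (groupId.equals("org.deeplearning4j")
                    || groupId.equals("org.nd4j")
                    || groupId.equals("org.datavec")) {
                versions.add(entry.getValue());
            }
        }
        return versions;
    }

    public static void main(String[] args) {
        // Illustrative coordinates only; substitute the ones from your own pom.xml.
        Map<String, String> deps = new LinkedHashMap<>();
        deps.put("org.deeplearning4j:deeplearning4j-core", "1.0.0-beta3");
        deps.put("org.nd4j:nd4j-cuda-9.2", "1.0.0-beta3");
        deps.put("org.datavec:datavec-api", "1.0.0-beta2");

        Set<String> versions = dl4jStackVersions(deps);
        System.out.println(versions.size() == 1
                ? "Versions aligned: " + versions
                : "Version mismatch: " + versions);
    }
}
```

More than one entry in the returned set means that at least one of the three libraries is on a different version from the others.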

How it works...

To mitigate exceptions such as ClassNotFoundException, the primary task is to verify that the JDK is installed properly (step 2) and that the environment variables we set up point to the right place. Step 3 is also important, as missing dependencies result in the same error.

In step 4, we remove redundant dependencies present in the local repository and attempt a fresh Maven build. Here is a sample NoClassDefFoundError thrown while trying to run a DL4J application:
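A quick way to confirm whether a dependency is really missing is to probe the classpath for a class it should provide. The following is a minimal sketch; the class ClasspathProbe is a hypothetical helper, and the class names probed in main are examples from the ND4J, DL4J, and DataVec libraries that you may want to replace with the class named in your own stack trace.

```java
// Hypothetical helper, not part of DL4J.
class ClasspathProbe {

    /** True if the named class can be loaded from the classpath (without initializing it). */
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError raised for a class's own dependencies.
            return false;
        }
    }

    public static void main(String[] args) {
        String[] probes = {
            "org.nd4j.linalg.factory.Nd4j",
            "org.deeplearning4j.nn.multilayer.MultiLayerNetwork",
            "org.datavec.api.records.reader.RecordReader"
        };
        for (String name : probes) {
            System.out.println((isPresent(name) ? "found   " : "MISSING ") + name);
        }
    }
}
```

Run the probe with the same classpath as the failing application; otherwise the result says nothing about the application's environment.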

root@instance-1:/home/Deeplearning4J# java -jar target/dl4j-1.0-SNAPSHOT.jar
 09:28:22.171 [main] INFO org.nd4j.linalg.factory.Nd4jBackend - Loaded [JCublasBackend] backend
 Exception in thread "main" java.lang.NoClassDefFoundError: org/nd4j/linalg/api/complex/IComplexDouble
 at java.lang.Class.forName0(Native Method)
 at java.lang.Class.forName(Class.java:264)
 at org.nd4j.linalg.factory.Nd4j.initWithBackend(Nd4j.java:5529)
 at org.nd4j.linalg.factory.Nd4j.initContext(Nd4j.java:5477)
 at org.nd4j.linalg.factory.Nd4j.<clinit>(Nd4j.java:210)
 at org.datavec.image.transform.PipelineImageTransform.<init>(PipelineImageTransform.java:93)
 at org.datavec.image.transform.PipelineImageTransform.<init>(PipelineImageTransform.java:85)
 at org.datavec.image.transform.PipelineImageTransform.<init>(PipelineImageTransform.java:73)
 at examples.AnimalClassifier.main(AnimalClassifier.java:72)
 Caused by: java.lang.ClassNotFoundException: org.nd4j.linalg.api.complex.IComplexDouble

One possible reason for NoClassDefFoundError is the absence of required dependencies in the Maven local repository. So, we remove the repository contents and rebuild the project so that Maven downloads the dependencies again. Any dependencies that were not downloaded previously because of an interruption should be downloaded now.
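Interrupted or failed downloads usually leave marker files ending in .lastUpdated in the local repository. The following is a minimal sketch that lists such markers for the DL4J-related groups before you decide to delete anything. The class IncompleteDownloadScan is a hypothetical helper, and the repository location assumes the default ~/.m2/repository.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of DL4J or Maven.
class IncompleteDownloadScan {

    /** Lists *.lastUpdated marker files below the given directory, sorted by path. */
    static List<Path> findMarkers(Path root) {
        if (!Files.isDirectory(root)) {
            return Collections.emptyList();
        }
        try (Stream<Path> files = Files.walk(root)) {
            return files.filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".lastUpdated"))
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumes the default local repository location.
        Path repo = Paths.get(System.getProperty("user.home"), ".m2", "repository");
        for (String group : new String[] {"org/deeplearning4j", "org/nd4j", "org/datavec"}) {
            for (Path marker : findMarkers(repo.resolve(group))) {
                System.out.println("Incomplete download: " + marker);
            }
        }
    }
}
```

An empty result does not rule out other causes, such as mismatched versions, so the rebuild described in step 4 is still worth trying.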

Here is an example of ClassNotFoundException during DL4J training:

Again, this suggests version issues or redundant dependencies.

There's more...

In addition to the common runtime issues discussed previously, Windows users may face cuDNN-specific errors while training a CNN. The actual root cause can vary, but the error is reported as an UnsatisfiedLinkError:

o.d.n.l.c.ConvolutionLayer - Could not load CudnnConvolutionHelper
 java.lang.UnsatisfiedLinkError: no jnicudnn in java.library.path
 at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1867) ~[na:1.8.0_102]
 at java.lang.Runtime.loadLibrary0(Runtime.java:870) ~[na:1.8.0_102]
 at java.lang.System.loadLibrary(System.java:1122) ~[na:1.8.0_102]
 at org.bytedeco.javacpp.Loader.loadLibrary(Loader.java:945) ~[javacpp-1.3.1.jar:1.3.1]
 at org.bytedeco.javacpp.Loader.load(Loader.java:750) ~[javacpp-1.3.1.jar:1.3.1]
 Caused by: java.lang.UnsatisfiedLinkError: C:\Users\Jürgen.javacpp\cache\cuda-7.5-1.3-windows-x86_64.jar\org\bytedeco\javacpp\windows-x86_64\jnicudnn.dll: Can't find dependent libraries
 at java.lang.ClassLoader$NativeLibrary.load(Native Method) ~[na:1.8.0_102]

Perform the following steps to fix the issue:

Download the latest dependency walker here: https://github.com/lucasg/Dependencies/.
Add the following code to your DL4J main() method:

try {
    Loader.load(<module>.class);
} catch (UnsatisfiedLinkError e) {
    String path = Loader.cacheResource(<module>.class, "windows-x86_64/jni<module>.dll").getPath();
    new ProcessBuilder("c:/path/to/DependenciesGui.exe", path).start().waitFor();
}

Replace <module> with the name of the JavaCPP preset module that is experiencing the problem; for example, cudnn. For newer DL4J versions, the necessary CUDA libraries are bundled with DL4J. Hence, you should not face this issue.
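As a complementary check, you can look for the cuDNN library files in the directories listed on the PATH, since a "Can't find dependent libraries" message often means a required DLL is not in any searched location. The following is a minimal sketch; the class NativeLibrarySearch is a hypothetical helper, and the assumption that the cuDNN files start with cudnn and end with .dll should be verified against your own installation.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of DL4J or JavaCPP.
class NativeLibrarySearch {

    /** Finds files with the given name prefix and extension in the directories of a PATH-style string. */
    static List<File> find(String searchPath, String prefix, String extension) {
        List<File> hits = new ArrayList<>();
        if (searchPath == null) {
            return hits;
        }
        String wantedPrefix = prefix.toLowerCase(Locale.ROOT);
        String wantedExtension = extension.toLowerCase(Locale.ROOT);
        for (String dir : searchPath.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;
            }
            File[] matches = new File(dir).listFiles((d, name) -> {
                String lower = name.toLowerCase(Locale.ROOT);
                return lower.startsWith(wantedPrefix) && lower.endsWith(wantedExtension);
            });
            if (matches != null) { // null when the entry is not a readable directory
                for (File match : matches) {
                    hits.add(match);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Assumed naming pattern for the cuDNN DLLs; adjust if your files differ.
        List<File> found = find(System.getenv("PATH"), "cudnn", ".dll");
        if (found.isEmpty()) {
            System.out.println("No cuDNN DLL found on the PATH.");
        } else {
            found.forEach(f -> System.out.println("Found: " + f.getAbsolutePath()));
        }
    }
}
```

Finding a file only shows that it is reachable; a version mismatch between cuDNN and CUDA can still cause the same error.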

If you think you have found a bug or functional error in DL4J, feel free to create an issue on the issue tracker at https://github.com/eclipse/deeplearning4j.

You're also welcome to initiate a discussion with the Deeplearning4j community here: https://gitter.im/deeplearning4j/deeplearning4j.
