The following are possible deep learning solutions to the problem types previously discussed:
- Fraud detection problems: The best solution depends on the data. We previously mentioned two data sources: credit card transactions, and user metadata based on login/logoff activities. In the first case, we have labeled data and a transaction sequence to analyze. Recurrent networks are usually well suited to sequential data. You can add LSTM (https://deeplearning4j.org/api/latest/org/deeplearning4j/nn/layers/recurrent/LSTM.html) recurrent layers, which DL4J implements. In the second case, we have unlabeled data, and a good choice is a variational autoencoder (https://deeplearning4j.org/api/latest/org/deeplearning4j/nn/layers/variational/VariationalAutoencoder.html) to compress the unlabeled data.
- Prediction problems: For classification problems that use CSV records, a feed-forward neural network will do. For time series data, the best choice would be recurrent networks because of the nature of sequential data. For image classification problems, you would need a CNN (https://deeplearning4j.org/api/latest/org/deeplearning4j/nn/conf/layers/ConvolutionLayer.Builder.html).
- Recommendation problems: We can employ Reinforcement Learning (RL) to solve recommendation problems. RL is very often used for such use cases and might be a better option. RL4J was specifically developed for this purpose. We will introduce RL4J in Chapter 9, Using RL4J for Reinforcement Learning, as it would be an advanced topic at this point. We can also go for simpler options, such as feed-forward networks (or RNNs), with a different approach. We can feed an unlabeled data sequence to recurrent or convolutional layers, depending on the data type (image/text/video). Once the recommended content/product is classified, you can apply further logic to pull random products from the list based on customer preferences.
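The guidance in the list above can be condensed into a small lookup table. The following is only a sketch in plain Java: the class, the enum names, and the way the data is categorized are all hypothetical and are not part of DL4J. It records the rules of thumb from this section and is not a substitute for examining your data.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical helper: none of these names exist in DL4J.
final class NetworkTypeChooser {

    enum DataKind { LABELED_CSV_RECORDS, LABELED_SEQUENCE, LABELED_IMAGES, UNLABELED }

    enum NetworkKind { FEED_FORWARD, RECURRENT_LSTM, CONVOLUTIONAL, VARIATIONAL_AUTOENCODER }

    private static final Map<DataKind, NetworkKind> SUGGESTIONS = new EnumMap<>(DataKind.class);

    static {
        // Rules of thumb taken from the list above.
        SUGGESTIONS.put(DataKind.LABELED_CSV_RECORDS, NetworkKind.FEED_FORWARD);
        SUGGESTIONS.put(DataKind.LABELED_SEQUENCE, NetworkKind.RECURRENT_LSTM);
        SUGGESTIONS.put(DataKind.LABELED_IMAGES, NetworkKind.CONVOLUTIONAL);
        SUGGESTIONS.put(DataKind.UNLABELED, NetworkKind.VARIATIONAL_AUTOENCODER);
    }

    /** Returns the starting-point network type suggested for the given kind of data. */
    static NetworkKind suggest(DataKind kind) {
        return SUGGESTIONS.get(kind);
    }

    public static void main(String[] args) {
        for (DataKind kind : DataKind.values()) {
            System.out.println(kind + " -> " + suggest(kind));
        }
    }
}
```

Treat the result as a first candidate to benchmark, not a final answer.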
To choose the right network type, you need to understand the type of data you have and the problem you are trying to solve. The most basic neural network that you could construct is a feed-forward network, also known as a multilayer perceptron. You can create multilayer network architectures using NeuralNetConfiguration in DL4J.
Refer to the following sample neural network configuration in DL4J:
MultiLayerConfiguration configuration = new NeuralNetConfiguration.Builder()
    .weightInit(WeightInit.RELU_UNIFORM)
    .updater(new Nesterovs(0.008,0.9))
    .list()
    .layer(new DenseLayer.Builder().nIn(layerOneInputNeurons).nOut(layerOneOutputNeurons).activation(Activation.RELU).dropOut(dropOutRatio).build())
    .layer(new DenseLayer.Builder().nIn(layerTwoInputNeurons).nOut(layerTwoOutputNeurons).activation(Activation.RELU).dropOut(0.9).build())
    .layer(new OutputLayer.Builder(new LossMCXENT(weightsArray))
        .nIn(layerThreeInputNeurons).nOut(numberOfLabels).activation(Activation.SOFTMAX).build())
    .backprop(true).pretrain(false)
    .build();
We specify an activation function for every layer in a neural network, and nIn() and nOut() represent the number of inputs to and outputs from the layer of neurons. The dropOut() setting is a regularization technique that reduces overfitting. We mentioned it in Chapter 3, Building Deep Neural Networks for Binary Classification. Essentially, we ignore some neurons at random during training so that the network does not blindly memorize patterns in the training data. Activation functions will be discussed in the Determining the right activation function recipe in this chapter. The other attributes control how weights are initialized and how they are updated from the errors calculated during training.
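To make nIn(), nOut(), and dropout more concrete, here is a minimal sketch in plain Java of what a single dense layer computes. It is not DL4J code: the class and method names are hypothetical, and the dropout shown is the common "inverted dropout" variant. It assumes the dropout value is the probability of retaining an activation, which, to our understanding, is also how DL4J interprets dropOut(); check the Javadoc of your DL4J version to confirm.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical illustration only; DL4J performs these steps internally on INDArrays.
final class DenseLayerSketch {

    /** Dense layer forward pass with ReLU: weights has shape [nIn][nOut], bias has length nOut. */
    static double[] forward(double[] input, double[][] weights, double[] bias) {
        int nIn = weights.length;
        int nOut = bias.length;
        if (input.length != nIn) {
            throw new IllegalArgumentException("Expected " + nIn + " inputs, got " + input.length);
        }
        double[] output = new double[nOut];
        for (int j = 0; j < nOut; j++) {
            double sum = bias[j];
            for (int i = 0; i < nIn; i++) {
                sum += input[i] * weights[i][j];
            }
            output[j] = Math.max(0.0, sum); // ReLU activation
        }
        return output;
    }

    /** Inverted dropout (training time only): keep each value with retainProbability and rescale it. */
    static double[] dropOut(double[] activations, double retainProbability, Random rng) {
        double[] result = new double[activations.length];
        for (int i = 0; i < activations.length; i++) {
            boolean keep = rng.nextDouble() < retainProbability;
            result[i] = keep ? activations[i] / retainProbability : 0.0;
        }
        return result;
    }

    public static void main(String[] args) {
        double[] input = {1.0, 2.0};                                   // nIn = 2
        double[][] weights = {{0.5, -1.0, 0.25}, {0.5, -1.0, 0.25}};   // nIn x nOut = 2 x 3
        double[] bias = {0.0, 0.0, 0.5};                               // nOut = 3

        double[] activations = forward(input, weights, bias);
        System.out.println(Arrays.toString(activations)); // prints [1.5, 0.0, 1.25]

        // Which values are zeroed depends on the random seed.
        System.out.println(Arrays.toString(dropOut(activations, 0.9, new Random(42))));
    }
}
```

Because each layer's output feeds the next layer, the nOut of one layer has to match the nIn of the layer that follows it.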
Let's focus on a specific decision-making process: choosing the right network type. Sometimes, a custom architecture yields better results. For example, you can perform sentence classification using word vectors combined with a CNN. DL4J offers the ComputationGraph (https://deeplearning4j.org/api/latest/org/deeplearning4j/nn/graph/ComputationGraph.html) implementation to accommodate such architectures.
ComputationGraph allows an arbitrary (custom) neural network architecture. Here is its constructor in DL4J:
public ComputationGraph(ComputationGraphConfiguration configuration) {
    this.configuration = configuration;
    this.numInputArrays = configuration.getNetworkInputs().size();
    this.numOutputArrays = configuration.getNetworkOutputs().size();
    this.inputs = new INDArray[numInputArrays];
    this.labels = new INDArray[numOutputArrays];
    this.defaultConfiguration = configuration.getDefaultConfiguration();
    //Additional source is omitted from here. Refer to https://github.com/deeplearning4j/deeplearning4j
}
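To illustrate what "arbitrary architecture" means, the following plain-Java sketch wires vertices together by name so that one input can feed two branches that are later merged, something a strictly sequential layer list cannot express. This is a toy model with hypothetical names operating on single numbers; it is not how ComputationGraph is implemented. DL4J's own graph builder connects layers by name in a similar spirit, through methods such as addInputs(), addLayer(), and setOutputs().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration only; this is not DL4J's ComputationGraph.
final class TinyGraphSketch {

    /** A vertex applies an operation to the values produced by its named inputs. */
    private static final class Vertex {
        final Function<List<Double>, Double> op;
        final List<String> inputNames;

        Vertex(Function<List<Double>, Double> op, List<String> inputNames) {
            this.op = op;
            this.inputNames = inputNames;
        }
    }

    // Insertion order is used as evaluation order, so add each vertex after its inputs.
    private final Map<String, Vertex> vertices = new LinkedHashMap<>();

    TinyGraphSketch addVertex(String name, Function<List<Double>, Double> op, String... inputNames) {
        vertices.put(name, new Vertex(op, Arrays.asList(inputNames)));
        return this;
    }

    /** Evaluates every vertex and returns all values keyed by vertex (or network input) name. */
    Map<String, Double> evaluate(Map<String, Double> networkInputs) {
        Map<String, Double> values = new HashMap<>(networkInputs);
        for (Map.Entry<String, Vertex> entry : vertices.entrySet()) {
            List<Double> args = new ArrayList<>();
            for (String inputName : entry.getValue().inputNames) {
                Double value = values.get(inputName);
                if (value == null) {
                    throw new IllegalStateException("No value available for input: " + inputName);
                }
                args.add(value);
            }
            values.put(entry.getKey(), entry.getValue().op.apply(args));
        }
        return values;
    }

    public static void main(String[] args) {
        TinyGraphSketch graph = new TinyGraphSketch()
                .addVertex("branchA", in -> in.get(0) * 2.0, "x")
                .addVertex("branchB", in -> in.get(0) + 3.0, "x")
                .addVertex("merge", in -> in.get(0) + in.get(1), "branchA", "branchB");

        Map<String, Double> inputs = new HashMap<>();
        inputs.put("x", 1.0);
        System.out.println(graph.evaluate(inputs).get("merge")); // prints 6.0
    }
}
```

In a sentence-classification CNN, the parallel branches would typically be convolution layers with different kernel sizes applied to the same word vectors, and the merge step would concatenate their outputs.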
Implementing a CNN is just like constructing network layers for a feed-forward network:
public class ConvolutionLayer extends FeedForwardLayer
A CNN has ConvolutionLayer and SubsamplingLayer in addition to DenseLayer and OutputLayer.
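As a rough intuition for what those two extra layer types compute, here is a sketch in plain Java of a single-filter convolution followed by max pooling (subsampling) on a small matrix. It is a simplified illustration under stated assumptions: one channel, no padding, stride 1 for the convolution, and a non-overlapping pooling window. The class and method names are hypothetical and do not come from DL4J.

```java
import java.util.Arrays;

// Hypothetical illustration only; DL4J's layers handle batches, channels, padding, and strides.
final class ConvPoolSketch {

    /** Slides the kernel over the image (no padding, stride 1) and sums the element-wise products. */
    static double[][] convolve(double[][] image, double[][] kernel) {
        int outRows = image.length - kernel.length + 1;
        int outCols = image[0].length - kernel[0].length + 1;
        double[][] out = new double[outRows][outCols];
        for (int r = 0; r < outRows; r++) {
            for (int c = 0; c < outCols; c++) {
                double sum = 0.0;
                for (int kr = 0; kr < kernel.length; kr++) {
                    for (int kc = 0; kc < kernel[0].length; kc++) {
                        sum += image[r + kr][c + kc] * kernel[kr][kc];
                    }
                }
                out[r][c] = sum;
            }
        }
        return out;
    }

    /** Non-overlapping max pooling: keeps the largest value in each window x window block. */
    static double[][] maxPool(double[][] featureMap, int window) {
        int outRows = featureMap.length / window;
        int outCols = featureMap[0].length / window;
        double[][] out = new double[outRows][outCols];
        for (int r = 0; r < outRows; r++) {
            for (int c = 0; c < outCols; c++) {
                double max = Double.NEGATIVE_INFINITY;
                for (int wr = 0; wr < window; wr++) {
                    for (int wc = 0; wc < window; wc++) {
                        max = Math.max(max, featureMap[r * window + wr][c * window + wc]);
                    }
                }
                out[r][c] = max;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] image = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
        double[][] kernel = {{1, 0}, {0, 1}};

        double[][] featureMap = convolve(image, kernel);
        System.out.println(Arrays.deepToString(featureMap));             // prints [[6.0, 8.0], [12.0, 14.0]]
        System.out.println(Arrays.deepToString(maxPool(featureMap, 2))); // prints [[14.0]]
    }
}
```

The convolution extracts local features, and the subsampling step shrinks the feature map before it reaches the dense and output layers.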