Back in Chapter 3, Modern Neural Networks, we introduced convolutional layers, the operations they perform, and how their hyperparameters (kernel size k, input depth D, number of kernels N, padding p, and stride s) affect the dimensions of their output (Figure 6-3 serves as a reminder). For an input tensor of shape (H, W, D), we presented the following equations to evaluate the output shape (Ho, Wo, N):
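Written out in their standard form, these equations read as follows. This is a reconstruction that assumes the same kernel size k, padding p, and stride s along both spatial dimensions, and a division by s that falls exactly (otherwise the quotient is rounded down):

$$
H_o = \frac{H - k + 2p}{s} + 1, \qquad W_o = \frac{W - k + 2p}{s} + 1
$$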
Now, let's assume that we want to develop a layer that reverses the spatial transformation of convolutions. In other words, given a feature map of shape (Ho, Wo, N) and the same hyperparameters k, D, N, p, and s, we would like a convolution-like operation that recovers a tensor of shape (H, W, D). Isolating H and W in the previous equations, we thus want an operation that satisfies the following properties:
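As a quick sanity check, the shape arithmetic can be sketched in plain Python. The sketch assumes the standard relation Ho = (H - k + 2p) / s + 1 for convolutions, which gives H = (Ho - 1) * s + k - 2p once H is isolated (and likewise for W). The helper names `conv_output_size` and `transposed_conv_output_size` are hypothetical and chosen for illustration only; they do not belong to any library.

```python
def conv_output_size(size, k, p, s):
    """Output size of a convolution along one spatial dimension.

    Assumes the standard formula, with the division rounded down.
    """
    return (size - k + 2 * p) // s + 1


def transposed_conv_output_size(size_o, k, p, s):
    """Size recovered by isolating the input size in the formula above."""
    return (size_o - 1) * s + k - 2 * p


# Hypothetical hyperparameters, for illustration only.
k, p, s = 3, 1, 2

# Case 1: (H - k + 2p) is divisible by s, so the round trip is exact.
H = 33
Ho = conv_output_size(H, k, p, s)
H_back = transposed_conv_output_size(Ho, k, p, s)
print(H, "->", Ho, "->", H_back)

# Case 2: the division is not exact, so the rounding loses information
# and the recovered size differs from the original one.
H2 = 32
Ho2 = conv_output_size(H2, k, p, s)
H2_back = transposed_conv_output_size(Ho2, k, p, s)
print(H2, "->", Ho2, "->", H2_back)
```

With these values, the first case recovers the original size, while the second does not: the inverse relation only holds exactly when H - k + 2p is a multiple of the stride s.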
This is how transposed convolutions were defined. As we briefly mentioned in Chapter 4, Influential Classification Tools, this...