Learning the fundamentals of parallelism strategies
In the previous section, we learned that distributed training divides the whole training process into smaller parts. Because each of these parts is executed simultaneously on a distinct computing resource, the entire training process can run in parallel.
The parallelism strategy defines how to divide the training process into these smaller parts. There are two main parallelism strategies: model parallelism and data parallelism. The following sections explain both.
Model parallelism
Model parallelism divides the set of operations executed during the training process into smaller subsets of computing tasks. The distributed process can then run these subsets on distinct computing resources, accelerating the entire training process.
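The idea can be sketched with a toy example. This is a minimal sketch, not a real framework implementation: the "model" is a chain of plain Python functions, the layer names and the two-stage split are hypothetical, and the "devices" are simulated; a real setup (for example, PyTorch) would place actual layers on different GPUs and transfer activations between them.

```python
# Toy "model": a chain of simple numeric operations (hypothetical layers).
layer1 = lambda x: x * 2   # imagine this running on device 0
layer2 = lambda x: x + 3   # imagine this running on device 0
layer3 = lambda x: x ** 2  # imagine this running on device 1
layer4 = lambda x: x - 1   # imagine this running on device 1

# Model parallelism: partition the operations into subsets,
# each assigned to a distinct (here, simulated) computing resource.
device0_stage = [layer1, layer2]
device1_stage = [layer3, layer4]

def forward(x):
    # Each stage runs on its own device; the intermediate activation
    # would be transferred between devices (here, simply passed along).
    for op in device0_stage:
        x = op(x)
    for op in device1_stage:
        x = op(x)
    return x

print(forward(5))  # ((5 * 2 + 3) ** 2) - 1 = 168
```

Note that the two stages still run one after the other for a single input, which hints at the dependency problem discussed next.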
It turns out that the operations executed in the forward and backward phases are not independent of each other. In other words, the execution...