Training with Multiple CPUs
When we think about accelerating the model-building process, machines equipped with GPU devices immediately come to mind. What if I told you that running distributed training on machines equipped only with multicore processors is both possible and advantageous?
Although CPUs cannot match the performance improvement obtained from GPUs, we should not dismiss the computing power provided by modern CPUs. Processor vendors have steadily increased the number of computing cores per CPU and have built sophisticated mechanisms to handle contention for shared resources.
Using CPUs to run distributed training is especially attractive when we do not have easy access to GPU devices. Learning this topic is therefore an important part of building our knowledge of distributed training.
In this chapter, we show how to run the distributed training process on multiple CPUs in a single machine, both by adopting a general approach and by using the Intel oneCCL backend.
Here is what...