Getting faster with Intel oneCCL
The results shown in Table 9.2 indicate that Gloo performs very well as the communication backend for distributed training in PyTorch.
Even so, there is another communication backend that can go even faster on Intel platforms: the Intel oneCCL collective communication library. In this section, we will learn what this library is and how to use it as a communication backend for PyTorch.
What is Intel oneCCL?
Intel oneCCL is a collective communication library created and maintained by Intel. Like Gloo, oneCCL provides collective communication primitives such as All-reduce.
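To make the All-reduce primitive concrete, the following minimal sketch simulates its semantics in a single Python process. It is only an illustration of what the operation computes, not how oneCCL or Gloo implement it: the function name all_reduce_sum and the list-of-lists representation of per-process buffers are hypothetical, and real backends exchange data between separate processes rather than looping over local lists.

```python
from typing import List


def all_reduce_sum(rank_buffers: List[List[float]]) -> List[List[float]]:
    """Simulate a sum All-reduce (hypothetical helper, not a oneCCL API).

    rank_buffers[i] is the local buffer held by process (rank) i.
    After the operation, every rank holds the same element-wise sum.
    """
    if not rank_buffers:
        return []

    length = len(rank_buffers[0])
    if any(len(buf) != length for buf in rank_buffers):
        raise ValueError("all ranks must contribute buffers of the same length")

    # Reduce step: combine the contributions of all ranks element by element.
    reduced = [sum(values) for values in zip(*rank_buffers)]

    # "All" step: every rank receives its own copy of the reduced result.
    return [list(reduced) for _ in rank_buffers]


if __name__ == "__main__":
    # Three simulated processes, each holding a local gradient-like buffer.
    local_buffers = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    for rank, buf in enumerate(all_reduce_sum(local_buffers)):
        print(f"rank {rank}: {buf}")
```

In distributed training, this is the pattern used to combine the gradients computed by each process, so that all processes end the step with the same aggregated values.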
Naturally, Intel oneCCL is optimized for Intel platforms, though this does not necessarily rule out other platforms. We can use this library to provide collective communication among the processes executing in the same machine (intraprocess communication) or the...