Executing tasks in parallel on a cluster of computers
By using the parallel package, we are not limited to running parallel code on a single computer; we can also do it on a cluster of computers. This allows much larger computational tasks to be performed, irrespective of whether we use data parallelism or task parallelism. Only socket-based clusters can be used for this purpose, as processes cannot be forked onto a different computer.
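As a minimal sketch of what a socket-based cluster looks like in code, the following passes a character vector of host names to makeCluster(), which starts one worker process per entry. The host names here are placeholders: "localhost" is used twice so that the sketch runs on a single machine, and in a real cluster they would be replaced by the addresses of the worker nodes.

```r
library(parallel)

# Placeholder worker addresses (an assumption for illustration).
# Replace with the host names or IP addresses of the worker nodes.
workers <- c("localhost", "localhost")

# A character vector of hosts creates a socket (PSOCK) cluster
# with one worker process per entry.
cl <- makeCluster(workers, type = "PSOCK")

# Data parallelism: the inputs are split among the workers.
squares <- parLapply(cl, 1:6, function(x) x^2)

# Ask each worker for the name of the machine it is running on.
hosts <- clusterCall(cl, function() Sys.info()[["nodename"]])

# Always shut the workers down when finished.
stopCluster(cl)
```

With real worker addresses, the names returned in hosts should differ from the master's, which is a quick way to confirm that the work is actually running on other machines.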
There are many ways to set up a cluster of computers to work with R. To keep things simple, all computers in the cluster should have the same configuration for R—the same version of R, installed in the same directories, installed with the same versions of any packages required, and running on the same operating system. The examples in this section have been tested on a cluster of three computers running Ubuntu 14.04—one master node and two worker nodes.
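One way to check that the nodes are configured consistently is to ask every worker for its R version and compare the answers with the master's. The following is only a sketch under the same assumption as before: the host names are placeholders, with "localhost" standing in for the real worker nodes.

```r
library(parallel)

# Placeholder worker addresses (an assumption for illustration).
workers <- c("localhost", "localhost")
cl <- makeCluster(workers, type = "PSOCK")

# R version string on the master and on each worker.
master_version <- R.version.string
worker_versions <- unlist(clusterCall(cl, function() R.version.string))

# TRUE only if every worker reports the same version as the master.
versions_match <- all(worker_versions == master_version)

stopCluster(cl)

if (!versions_match) {
  warning("Worker nodes are running a different version of R")
}
```

The same pattern can be applied to package versions by calling packageVersion() on each worker for the packages a task depends on.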
The master and worker nodes should be on the same network and able to communicate with each other via SSH...