Optimizing parallel performance
Throughout the examples in this chapter, we saw various factors that affect the performance of parallel code.
One overhead in running parallel R code is setting up the cluster. By default, makeCluster() instructs the worker processes to load the methods package when they start. This can take a noticeable amount of time, so if the task to be run does not require methods, this behavior can be disabled by passing methods=FALSE to makeCluster().
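As a rough illustration of this option, the following sketch starts two small local clusters, one with the default settings and one with methods=FALSE, and times each startup. The cluster size of two workers and the squaring task are arbitrary choices made for this example, and the measured times will vary from machine to machine, so the difference may be small or hard to see on a fast system.

```r
library(parallel)

# Time the startup of a default cluster (workers load the methods package).
t_default <- system.time(cl_default <- makeCluster(2))

# Time the startup of a cluster whose workers skip loading methods.
t_lean <- system.time(cl_lean <- makeCluster(2, methods = FALSE))

print(t_default["elapsed"])
print(t_lean["elapsed"])

# A task that relies only on base R still runs on the lean cluster.
res <- parLapply(cl_lean, 1:4, function(x) x^2)

# Always shut the workers down when finished.
stopCluster(cl_default)
stopCluster(cl_lean)
```

This only makes sense when the worker code avoids S4 classes and other features that depend on methods; otherwise the package has to be loaded on the workers anyway.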
One of the biggest obstacles to parallel performance is the copying and transmission of data between the master process and the worker processes. This obstacle can be especially large when you run parallel tasks on a cluster of computers, because factors such as limited network bandwidth and data encryption slow down the transmission of data before any computation can begin. Even on a single computer, unnecessary copying of data in memory takes up precious seconds, and this cost multiplies as the data grows. This can also happen the...