Shared memory versus distributed memory parallelism
In the examples that we have seen so far, data is copied from the master process or node to each worker. This is called distributed memory parallelism, where each process has its own memory space. In other words, each process must hold its own copy of the data it works on, even when several processes operate on the same data. This is the typical way to distribute data across a cluster of computers, because the workers in the cluster cannot access one another's RAM and therefore need their own copies of the data.
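As a minimal sketch of this behavior, assuming R's parallel package is the toolchain in use (the object names cl and x are just placeholders), a socket (PSOCK) cluster requires the data to be exported to every worker, so each worker ends up with its own copy:

library(parallel)

# A socket (PSOCK) cluster is a distributed-memory setup: every worker is a
# separate R process with its own memory space.
cl <- makeCluster(4)       # four worker processes

x <- rnorm(1e6)            # data held in the master's memory

# clusterExport() ships a copy of x to each worker, so with four workers
# there are now five copies of the data in total (master plus workers).
clusterExport(cl, "x")

# Each worker computes on its own private copy of x.
res <- parSapply(cl, 1:4, function(i) mean(x) + i)

stopCluster(cl)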
However, this can result in huge redundancies when you run parallel code on multiple processes on a single computer. If a dataset takes up 5 GB of memory, then running four parallel processes could result in five copies of the data in memory (one for the master and four for the workers), occupying a total of 25 GB. Earlier, we saw that forked clusters might not suffer from this problem, as most operating systems do not make physical copies of the parent's memory for forked child processes; instead, memory pages are shared and only copied when a process modifies them (copy-on-write).
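For contrast, a fork-based version of the same sketch (Unix-alikes only; the names here are again illustrative) shows why forked workers can avoid the extra copies: they inherit the master's memory, and the operating system only duplicates a page when a worker writes to it.

library(parallel)

x <- rnorm(1e6)            # data created in the master before forking

# Forked workers start as copies of the master process, so x is visible to
# them without clusterExport(); copy-on-write means no physical duplicate
# is made unless a worker modifies the data. Not available on Windows.
cl <- makeForkCluster(4)

res <- parSapply(cl, 1:4, function(i) mean(x) + i)

stopCluster(cl)

# mclapply() offers the same fork-based behaviour without an explicit cluster.
res2 <- mclapply(1:4, function(i) mean(x) + i, mc.cores = 4)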