14.3 Using multiprocessing pools and tasks
Python’s multiprocessing
package introduces the concept of a Pool
object. A Pool
object contains a number of worker processes and expects these processes to be executed concurrently. This package allows OS scheduling and time slicing to interleave execution of multiple processes. The intention is to keep the overall system as busy as possible.
To make the most of this capability, we need to decompose our application into components for which non-strict, concurrent execution is beneficial. The overall application must be built from discrete tasks that can be processed in an indefinite order.
An application that gathers data from the internet through web scraping, for example, is often optimized through concurrent processing. A number of individual processes can be waiting for the data to download, while others are performing the scraping operation on data that’s been received. We can create a Pool
object of several identical workers...