Setting the stage for high-performance computing
In this recipe, we set the stage for computing with multiple cores, on clusters, and in MapReduce frameworks. We will use a simple example: computing the minor allele frequency (MAF) of loci across the human genome using the TSI ("Toscani in Italia") HapMap population. Refer to Chapter 6, Phylogenetics, for details on the HapMap data.
We will perform two kinds of tasks here. The first is preparing the data; the second is structuring the computation as if we were using a parallel computing framework. Sequential execution is a safe and predictable environment in which to introduce parallel programming concepts, even though nothing in this recipe actually runs concurrently. We will also use this recipe to introduce some pitfalls of big data processing and a few basic functional programming techniques.
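To make the idea of structuring a sequential computation as if it were parallel more concrete, here is a minimal sketch. It is not the code of this recipe. The toy records, the "0" missing-allele marker, and the helper names (chunks, map_chunk, merge, maf) are all assumptions made for illustration, and the MAF calculation assumes biallelic loci. The input is read lazily in chunks, each chunk is mapped to partial allele counts without touching shared state, and the partial results are then reduced into totals.

```python
from collections import Counter
from functools import reduce
from itertools import islice

# Hypothetical toy data: (locus_id, genotype) pairs, one per individual.
# Real PLINK/HapMap files are laid out differently; "0" marks a missing allele here.
RECORDS = [
    ("rs1", "AA"), ("rs1", "AG"), ("rs1", "GG"), ("rs1", "AA"),
    ("rs2", "CC"), ("rs2", "CT"), ("rs2", "00"), ("rs2", "CC"),
]


def chunks(iterable, size):
    """Lazily yield lists of at most `size` items, so the input is never fully in memory."""
    iterator = iter(iterable)
    while True:
        block = list(islice(iterator, size))
        if not block:
            return
        yield block


def map_chunk(block):
    """Map step: count alleles per locus within a single chunk (no shared state)."""
    counts = {}
    for locus, genotype in block:
        alleles = [a for a in genotype if a != "0"]  # skip missing calls
        counts.setdefault(locus, Counter()).update(alleles)
    return counts


def merge(left, right):
    """Reduce step: combine two partial results into a new dictionary."""
    merged = {locus: Counter(c) for locus, c in left.items()}
    for locus, c in right.items():
        merged.setdefault(locus, Counter()).update(c)
    return merged


def maf(counter):
    """Frequency of the less common allele, assuming a biallelic locus."""
    total = sum(counter.values())
    if total == 0:
        return None  # no called alleles at this locus
    if len(counter) < 2:
        return 0.0  # monomorphic in this sample
    return min(counter.values()) / total


# Sequential here, but map() could later be swapped for a parallel map.
partials = map(map_chunk, chunks(RECORDS, 3))
totals = reduce(merge, partials, {})
mafs = {locus: maf(c) for locus, c in totals.items()}
print(mafs)
```

Because map_chunk depends only on its own chunk and merge returns a new dictionary rather than mutating its arguments, the built-in map could in principle be replaced by a parallel one without changing the rest of the code.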
Getting ready
Here, we will need the data we used in the Managing datasets with PLINK recipe in Chapter 6, Phylogenetics...