What you need for this book
All the code in this book was developed in R 3.1.1 64-bit on Mac OS X 10.9. Wherever possible, it has also been tested on Ubuntu Desktop 14.04 LTS and Windows 8.1. All code examples can be downloaded from https://github.com/r-high-performance-programming/rhpp-2015.
To follow along with the code examples, we recommend that you install R 3.1.1 64-bit or a later version in your environment.
We also recommend that you run R in a Unix environment (this includes Linux and Mac OS X). While R runs on Windows, some packages that we will use, for example, bigmemory, run only in a Unix environment. Whenever there are differences between Unix and Windows in our code examples, we will indicate them.
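As a quick check of the recommendation above, the following minimal sketch reports whether the running R session meets the suggested version and whether it is a 64-bit build. The variable names (min_version, version_ok, is_64bit) are illustrative and do not come from the book's code.

```r
# Minimum version recommended in the text
min_version <- "3.1.1"

# getRversion() returns the running R version as a comparable object
version_ok <- getRversion() >= min_version

# Pointers are 8 bytes wide in a 64-bit build of R
is_64bit <- .Machine$sizeof.pointer == 8

cat("R version:", as.character(getRversion()), "\n")
cat("Meets minimum version:", version_ok, "\n")
cat("64-bit build:", is_64bit, "\n")
```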
You will need the 64-bit version of R, as certain operations (for example, creating a vector with 2^31 or more elements) are not possible in the 32-bit version. Also, the 64-bit version of R can make use of as much memory as is available on your system, whereas the 32-bit version is limited to at most 4 GB of memory (on some operating systems, the limit can be as low as 2 GB).
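To see where the 2^31 figure comes from, the short sketch below compares it with the largest value of R's integer type. It is only an illustration of the arithmetic; the variable names are our own.

```r
# Largest value representable by R's 32-bit integer type
max_int <- .Machine$integer.max

# 2^31 is computed as a double, so it does not overflow
threshold <- 2^31

cat("Largest integer:", max_int, "\n")
cat("2^31:", format(threshold, scientific = FALSE), "\n")
cat("Difference:", threshold - max_int, "\n")
```

The threshold is exactly one more than the largest integer, which is why vectors of that length are out of reach for a 32-bit build.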
You will also need to install packages in your R environment, as the examples in several chapters depend on additional packages.
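One possible way to find out which packages still need installing is sketched below. The helper missing_packages() is hypothetical (it is not part of the book's code), and the package list passed to it is only an example.

```r
# Hypothetical helper: return the names of packages that are not yet installed
missing_packages <- function(pkgs) {
  # installed.packages() returns a matrix whose row names are package names
  setdiff(pkgs, rownames(installed.packages()))
}

# Illustrative list; each chapter names the packages it actually needs
needed <- c("bigmemory")
to_install <- missing_packages(needed)
cat("Packages still to install:", to_install, "\n")

# To install them, one could then run:
# install.packages(to_install)
```

The install.packages() call is left as a comment so that running the sketch does not change your library without your say-so.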
The examples in some chapters require other software or packages to run. These will be listed in the respective chapters along with installation instructions.
If you do not have access to some of the software and tools required for the examples, you can run them on Amazon Web Services (AWS). In particular, the examples in Chapter 5, Using GPUs to Run R Even Faster, require a computer with an NVIDIA GPU with CUDA capabilities; those in Chapter 9, Offloading Data Processing to Database Systems, require various database systems; and those in Chapter 10, R and Big Data, require Hadoop.
To use AWS, log in to http://aws.amazon.com/ with your Amazon account. Create an account if you do not have one. Creating an account is free, but there are charges for using servers, storage, and other resources. Consult the AWS website for the latest prices in your preferred region.
AWS services are provided in different regions around the world. At the time of writing this book, there are eight regions: three in the United States, one in Europe, three in Asia Pacific, and one in South America. Pick any region you like, such as the one closest to where you are or the one with the lowest prices. To select a region, go to the AWS Console (http://console.aws.amazon.com) and select the region in the upper-right corner. Once you have selected a region, use the same region for all the AWS resources you need for the examples in this book.
Before setting up any compute resource, such as a server or Hadoop cluster, you need a key pair to log in to the server. If you do not already have an AWS Elastic Compute Cloud (EC2) key pair, follow these steps to generate one:
- Go to the AWS Console and click on EC2.
- Click on Key Pairs in the menu on the left.
- Click on Create Key Pair.
- Enter a name for the new key pair (for example, mykey).
- Once you click on Create, the private key (for example, mykey.pem) will be downloaded to your computer.
On Linux and Mac OS X, change the permissions of the private key file so that only the owner has read access; this can be done with chmod 400 mykey.pem in a Terminal window.
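If you prefer to stay inside R, the same permission change can be sketched with Sys.chmod() on Linux or Mac OS X. The example below uses an empty temporary file as a stand-in for the key so that it can be run safely; the path and variable names are assumptions for illustration.

```r
# Stand-in for the downloaded key; replace with the real path to mykey.pem
key_file <- file.path(tempdir(), "mykey.pem")
file.create(key_file)

# Equivalent of `chmod 400`: the owner may read, nobody else has access
Sys.chmod(key_file, mode = "0400", use_umask = FALSE)

# file.info() reports the file mode as an octal value
key_mode <- format(file.info(key_file)$mode)
cat("Permissions:", key_mode, "\n")
```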