Preface
In a world where data is becoming increasingly important, business people and scientists need tools to analyze and process large volumes of data efficiently. R is one of the tools that have become increasingly popular in recent years for data processing, statistical analysis, and data science. While R has its roots in academia, it is now used by organizations across a wide range of industries and geographical areas.
But the design of R imposes some inherent limits on the size of the data and the complexity of computations that it can manage efficiently. This can be a huge obstacle for R users who need to process the ever-growing volume of data in their organizations.
This book, R High Performance Programming, will help you understand the situations that often pose performance difficulties in R, such as memory and computational limits. It will also show you a range of techniques to overcome these performance limits. You can choose to use these techniques alone, or in various combinations that best fit your needs and your computing environment.
This book is designed to be a practical guide on how to improve the performance of R programs, with just enough explanation of why, so that you understand the reasoning behind each solution. As such, we will provide code examples for every technique that we cover in this book, along with performance profiling results that we generated on our machines to demonstrate the performance improvements. We encourage you to follow along by entering and running the code in your own environment to see the performance improvements for yourself.
If you would like to understand how R is designed and why it has performance limitations, the R Internals documentation (http://cran.r-project.org/doc/manuals/r-release/R-ints.html) will provide helpful clues.
This book is written based on open source R because it is the most widely used version of R and is freely available to anybody. If you are using a commercial version of R, check with your software vendor to see what performance improvements they might have made available to you.
The R community has created many new packages to improve the performance of R, which are available on the Comprehensive R Archive Network (CRAN) (http://cran.r-project.org/). We cannot analyze every package on CRAN—there are thousands of them—to see if they provide performance enhancements for specific operations. Instead, this book focuses on the most common tasks for R programmers and introduces techniques that you can use on any R project.