Preprocessing data in a relational database using SQL
We will start by learning how to run SQL statements in the database from R. The first few examples show how processing data in a database instead of moving all the data into R can result in faster performance even for simple operations.
To run the examples in this chapter, you will need a database server supported by R. The CRAN package RJDBC provides an interface to the JDBC drivers that most databases come with. Alternatively, search CRAN for packages such as RPostgreSQL, RMySQL, and ROracle, which offer features and optimizations specific to each database.
The following examples are based on a PostgreSQL database and the RPostgreSQL package, as we will need them later in this chapter when we learn about the PivotalR package and the MADlib software. Feel free, however, to adapt the code to the database that you use.
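As a rough preview of what running SQL from R looks like with RPostgreSQL, the following sketch opens a connection, lets the database compute an aggregate, and then does the same work in R for comparison. It assumes a PostgreSQL server is already running and reachable. The database name, credentials, and the sales table with its customer_id and amount columns are hypothetical placeholders, not names used elsewhere in this chapter.

```r
# Sketch only: the database name, user, password, and the "sales" table
# are hypothetical; replace them with values from your own setup.
library(DBI)
library(RPostgreSQL)

# Load the PostgreSQL driver and open a connection.
drv <- dbDriver("PostgreSQL")
con <- dbConnect(drv,
                 dbname   = "mydb",       # hypothetical database
                 host     = "localhost",
                 port     = 5432,         # PostgreSQL default port
                 user     = "ruser",      # hypothetical user
                 password = "changeme")   # hypothetical password

# Option 1: aggregate inside the database.
# Only one summary row per customer is transferred to R.
totals_db <- dbGetQuery(con, "
  SELECT customer_id, SUM(amount) AS total
  FROM sales
  GROUP BY customer_id")

# Option 2: pull every row into R, then aggregate there.
# All rows cross the connection before any work is done.
sales <- dbGetQuery(con, "SELECT customer_id, amount FROM sales")
totals_r <- aggregate(amount ~ customer_id, data = sales, FUN = sum)

# Release the connection when finished.
dbDisconnect(con)
```

Both options should yield the same totals per customer. The difference is where the work happens: in the first, the server returns only the aggregated rows, which is typically what makes in-database processing faster on large tables.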
Configuring PostgreSQL to work with R involves setting up both the server and the client. First, we need to set up the PostgreSQL...