You're reading from R Data Visualization Recipes A cookbook with 65+ data visualization recipes for smarter decision-making

Product type Paperback

Published in Nov 2017

Publisher Packt

ISBN-13 9781788398312

Length 366 pages

Edition 1st Edition

Languages

Tools

ggplot

Concepts

Data Visualization

Author (1):

Vitor Bianchi Lanzetta

View More author details

Using ggplot2, plotly, and ggvis

ggplot2, ggvis, and plotly have proven to be very useful graphical packages in the R universe. Each of them gained a respectful sum of popularity among R users, being recalled for the several graphical tasks each of them can handle in very elegant manners.

The purpose of this section is to give a brief introduction on the general framework of ggplot2 via some basic examples, and relate how to tackle similar quests using ggvis and plotly. Along the way, some pros and cons from each package will be highlighted.

Whenever you need to choose between some packages (and base R), it's important to balance the tasks each one were designed to handle, the amount of work it will require for you to achieve your goal (learning time included), and the time you actually have. It's also good to consider scale gains in future uses. For example, mastering ggplot2 may not seem a smart choice for a single time task but might pay-off if you're expecting lots of graphical challenges in the future.

Keep in mind that all the three packages are eligible for a large convoy of tasks. There are some jobs that a specific package is more suitable for and even some tasks that can be considered almost impracticable for others. This point will become clearer as the book goes on.

Getting ready

The only requirement this section holds is to have the ggplot2, ggvis, and plotly packages properly installed. Go back to Installing and loading graphics packages recipe if that is not the case. Once the installation is checked, it's time to know ggplot2 framework.

How to do it...

First things first, in order to plot using ggplot2, data must come from a data frame object. Data can come from more than one data frame but it's mandatory to have it arranged into objects from the data frame class.

We took the cars data set to fit this first graphic. It's good to actually get to know the data before plotting, so let's do it using the ?, class(), and head() functions:

> ?cars
> class(cars)
> head(cars)

Plots coming from ggplot2 can be stored by objects. They would fit two classes at same time, gg and ggplot:

> library(ggplot2)
> plot1 <- ggplot(cars, aes(x = speed,y = dist))

Objects created by the ggplot() function get to be from classes gg and ggplot at the same time. That said, you can to refer to a plot crafted by ggplot2 as a ggplot.

The three packages work more or less in a layered way. To add what we call layers to a ggplot, we can use the + operator:

 > plot1 + geom_point()

The + operator is in reality a function.

Result is shown by the following figure:

Figure 1.2 - Simple ggplot2 scatterplot.

Once you learn this framework, getting to know how ggvis works becomes much easier, and vice-versa. A similar graphic can be crafted with the following code:

> library(ggvis)
> ggvis(data = cars, x = ~speed, y = ~dist) %>% layer_points()

plotly would feel a little bit different, but it's not difficult at all to grasp how it works:

> library(plotly)
> plot_ly(data = cars, x = ~speed, y = ~dist, type = 'scatter', mode = 'markers')

Let's give these nuts and bolts some explanations.

How it works...

In order to have a brief data introduction, step 1 starts by calling ?cars. This is a very useful way to get to meet variables and background related to almost every data set coming from a package. Once ggplot2 requires data coming from data frames, class() function is checking if is that the case, answer is affirmative. At the end of this step head() function is checking upon the first six observations.

Moving on to step 2, after loading ggplot2, it demonstrates how to store the basic coordinate mapping and aesthetics into an object called plot1 (try it on the class() function). In order to set the basics, it uses a function (ggplot()) that initializes every single ggplot.

Storing a plot coming from ggplot2, ggvis, or plotly package into an object is optional, though very useful way to proceed.

To properly set ggplot(), start by declaring data set using data argument. After that, some basic aesthetics and coordinates are assigned. Different figures can ask and work along with different aesthetics, for the majority of cases those are named inside the aes() function.

As the books goes on you're going to get used to the ways how aesthetics can be declared-in or outside the aes() function. For now, let's acknowledged that inside aes() it's possible to call data frame variables by name and they may be displayed in legends.

Checking ?aes() shows "..." as argument, popularly known as three-dots but technically named ellipsis. It allows the user to pass an arbitrary number and variety of arguments. So as ggplot2 does lazy-evaluation (only evaluates arguments as they are requested, you could make up arguments and pass them into the aes() function with zero or only little trouble to the function. Perceive the following:

> plot1 <- ggplot(cars, aes(x = speed,y = dist, gorillaTroubleShooter = T, sight = 'Legolas'))

It would work as good as the earlier version. Just don't forget to name the arguments and you got yourself a good way to create some Easter eggs at your code (also a good way to confuse unaware developers). Both aes() and ggplot() play core roles in building graphics within this package.

Until step 2, only coordinate mapping was set at object named plot1, calling for it alone displays an empty graphic. Step 3 uses %+% to add a layer, the layer called (geom_point()) took care of fixing a geometry to the graphic. Besides the plus sign, ggplots are usually constructed by two families of functions (layers): geom_*and stat_*. While the first family comes with a fixed geometry and a default statistical transformation, the second one comes with fixed statistical transformations and a default geometry (this is grammar of graphics for real), defaults can be tweaked.

plot1 + stat_identity(geom = 'point') works just the same as step 3. Argument geom is set for 'point' as default for stat_identity(), it's fine to skip it. The reason I declared it was to reinforce that if you call for a statistical transformation you can pick the geometry and it goes the other way round (if you call for a geometry you can change the statistical transformation).

Behind the scene, geom_point() called the layer() function, which set a couple of arguments that culminated in the creation of a scatterplot. One may want to modify the axis labels and add a regression line. It can be done by simply adding more layers to the plot using the plus sign. One can stack as many layers desired, as shown next:

> plot1 + geom_point() +
> labs(x = "Speed (mpg)", y = "Distance (ft)") +
> geom_smooth(method = "lm", se = F) +
> scale_y_continuous(breaks = seq(0, 125, 25))

Result is exhibited by figure 1.3:

Figure 1.3 - Adding up several layers to a ggplot.

Combining ggplot2's sum operator (that is actually a function) and functions allows the user to make plots in a layered, iterative way. It splits complex graphics construction into several simple steps. It's also very intuitive and does not get any harder as you practice.

Yet, there are limitations. The difficulty to make interactive graphics by itselft may be one. These tasks, in the majority of the cases, are very well handled by both ggvis and plotly as stand alone packages. This leads us to steps 4 and 5.

Calling plotly::ggplotly() after bringing a ggplot up will coerce it into an interactive plot. It may fail sometimes. Do not forget to have plotly installed.

Step 4 loads ggvis package using library() and then gives birth to an interactive plot. It holds many similarities with ggplot2. Function ggvis() handles basic coordinating mapping while pipe operator (%>%) is used to add up a layer called by the layer_points() function. Remember, pipe operator and not plus sign.

ggvis understands different arguments declared using = (ever scaled) and := (never scaled). Also, ~ must come before the variable names.

Function names may change and also does the operator used to add up layers from ggplot2 to ggvis, but essentially the underlying logic keeps still. Layers coming from ggvis has several correspondences with ggplot2's ones; refer to the See also section to track some. In comparison with ggplot2, ggvis is much younger and some utilities may be yet to come, also data don't need to come from a data frame object.

Step 5 draws an interactive plotly graph. A single function (plot_ly()) takes care of coordinate mapping and geometry. It can be designed a little more layered using the add_traces() function, but there is no real need for that when the plot is too simple. Instead of having many functions demanding statistical transformations and geometries those are declared by arguments inside the main function.

These three packages, ggplot2, ggvis, and plotly, are well coded and powerful graphic packages. Right before picking one of them to handle a task do ever consider some points like:

What the package is able to do
Time needed to master the skill set required
Time required to handle the task
Amount of time available
Time to be saved later by the thing that you learned

Base R is also a feasible possibility. Whenever you face new challenges, it is a good thing to think through these points.

There's more

To have data coming solely from data frames is a strong restriction, but it does obligate the user to be explicit about the data and also draw a very clear line on what is ggplot2's concern (data visualization) and what is not (model visualization). In order to avoid headaches that come from downloading spreadsheets, setting up working directories, and loading data from files, we're taking an alternative way: getting data from packages instead.

data.frame() may be the most convenient function to coerce vectors into data frames in R.

By doing this, we ensure that the readers only need to reach the R's console to reproduce recipes; we want nothing to do with web browsers (we're too cool for school, school meaning web browsers). We shall follow this approach to the end of the book. This recipe look over datasets base packages to do so. ggplot2 has some data frames of its own.

Enter library(help = 'datasets') to general information on the other data sets.

It's also important to outline that the gg in the ggplot2 and ggvis refer to the Grammar of Graphics. That's a very important and inspiring theory that in had influenced ggplot2, ggvis, and plotly. The layered/iterative way that these packages handle plots might come from the Grammar of Graphics and makes graphics building much easier and reasonable. Learning this theory may give you heads into the process of learning these packages while learning these packages may give you heads when it comes to learn the Grammar of Graphics.