For a brief introduction to the data, step 1 starts by calling ?cars. This is a very useful way to get to know the variables and background of almost any data set that comes from a package. Since ggplot2 requires data in the form of a data frame, the class() function checks whether that is the case, and the answer is affirmative. At the end of this step, the head() function displays the first six observations.
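As a rough sketch of what step 1 amounts to (the object names data_class and first_rows are only illustrative and are not part of the recipe):

```r
# `cars` ships with base R's datasets package, so nothing needs installing.
# help("cars") opens the same page as ?cars; it is skipped outside a console.
if (interactive()) help("cars")

data_class <- class(cars)  # ggplot2 expects this to be "data.frame"
first_rows <- head(cars)   # head() returns six rows unless told otherwise

print(data_class)
print(first_rows)
```

If class() returned something else, as.data.frame() would usually be the first thing to try before handing the data to ggplot2.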
Moving on to step 2, after loading ggplot2, the recipe demonstrates how to store the basic coordinate mapping and aesthetics in an object called plot1 (try calling class() on it). To set these basics, it uses ggplot(), the function that initializes every single ggplot.
Storing a plot from the ggplot2, ggvis, or plotly package in an object is optional, though it is a very useful way to proceed.
To properly set up ggplot(), start by declaring the data set using the data argument. After that, some basic aesthetics and coordinates are assigned. Different figures can require and work with different aesthetics; in the majority of cases, those are named inside the aes() function.
As the book goes on, you're going to get used to the ways aesthetics can be declared, inside or outside the aes() function. For now, let's acknowledge that inside aes() it's possible to call data frame variables by name, and that they may be displayed in legends.
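A minimal sketch of the initialization described above, assuming ggplot2 is installed. The exact class vector printed depends on the ggplot2 version, so no specific output is shown here:

```r
library(ggplot2)

# Declare the data set first, then the basic aesthetics inside aes().
plot1 <- ggplot(data = cars, mapping = aes(x = speed, y = dist))

# The stored object carries the data and mappings but no geometry yet.
print(class(plot1))
```

Because plot1 is just an object, it can be reused as the starting point for several different figures.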
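To make the inside/outside distinction concrete, here is a small sketch. The object names base, mapped, and fixed are hypothetical, and layer_data() is used only to inspect what each layer ends up drawing:

```r
library(ggplot2)

base <- ggplot(data = cars, aes(x = speed, y = dist))

# Inside aes(): colour is mapped to a variable, scaled, and gets a legend.
mapped <- base + geom_point(aes(colour = dist))

# Outside aes(): colour is a constant applied to every point, no legend.
fixed <- base + geom_point(colour = "blue")

# Inspect the colours each layer will actually use.
mapped_colours <- unique(layer_data(mapped)$colour)
fixed_colours  <- unique(layer_data(fixed)$colour)
```

The mapped version yields many different colours, one per value of dist, while the fixed version yields only the single colour that was requested.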
Checking ?aes() shows "..." as an argument, popularly known as three-dots but technically named ellipsis. It allows the user to pass an arbitrary number and variety of arguments. Since ggplot2 relies on lazy evaluation (arguments are only evaluated as they are requested), you could make up arguments and pass them into the aes() function with little or no trouble. Consider the following:
> plot1 <- ggplot(cars, aes(x = speed, y = dist, gorillaTroubleShooter = TRUE, sight = 'Legolas'))
It would work just as well as the earlier version. Just don't forget to name the arguments and you've got yourself a good way to hide some Easter eggs in your code (also a good way to confuse unaware developers). Both aes() and ggplot() play core roles in building graphics within this package.
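The two R features at play here, the ellipsis and lazy evaluation, can be seen in isolation with a couple of toy functions. This is only a sketch: collect_names() and first_only() are made-up helpers, and aes() itself is implemented differently (it captures its arguments unevaluated rather than turning them into a list).

```r
# The ellipsis accepts any number of arguments, named however you like.
collect_names <- function(...) names(list(...))
arg_names <- collect_names(x = 1, sight = "Legolas")

# Lazy evaluation: `b` is never used, so its expression is never run
# and the call to stop() does not raise an error.
first_only <- function(a, b) a
lazy_result <- first_only(1, stop("never evaluated"))

print(arg_names)
print(lazy_result)
```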
Up to step 2, only the coordinate mapping was set in the object named plot1; calling it alone displays an empty graphic. Step 3 uses the plus sign (+) to add a layer; that layer, geom_point(), took care of fixing a geometry to the graphic. Besides the plus sign, ggplots are usually constructed with two families of functions (layers): geom_* and stat_*. While the first family comes with a fixed geometry and a default statistical transformation, the second one comes with a fixed statistical transformation and a default geometry (this is grammar of graphics for real). The defaults can be tweaked.
plot1 + stat_identity(geom = 'point') works just the same as step 3. The geom argument already defaults to 'point' for stat_identity(), so it's fine to skip it. The reason I declared it was to reinforce that if you call a statistical transformation you can pick the geometry, and it goes the other way round too (if you call a geometry you can change the statistical transformation).
Behind the scenes, geom_point() called the layer() function, which set a couple of arguments that culminated in the creation of a scatterplot. One may want to modify the axis labels and add a regression line. That can be done by simply adding more layers to the plot using the plus sign. One can stack as many layers as desired, as shown next:
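A quick way to convince yourself of this equivalence is to build the plot both ways and compare what each layer would draw. In this sketch, via_geom, via_stat, and same_points are illustrative names only:

```r
library(ggplot2)

plot1 <- ggplot(cars, aes(x = speed, y = dist))

# Start from the geometry and spell out its default statistic...
via_geom <- plot1 + geom_point(stat = "identity")

# ...or start from the statistic and spell out its default geometry.
via_stat <- plot1 + stat_identity(geom = "point")

# Compare the computed x/y positions of the two layers.
same_points <- isTRUE(all.equal(
  layer_data(via_geom)[c("x", "y")],
  layer_data(via_stat)[c("x", "y")]
))
print(same_points)
```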
> plot1 + geom_point() +
    labs(x = "Speed (mph)", y = "Distance (ft)") +
    geom_smooth(method = "lm", se = FALSE) +
    scale_y_continuous(breaks = seq(0, 125, 25))
The result is shown in figure 1.3:
Figure 1.3 - Adding up several layers to a ggplot.
Combining ggplot2's sum operator (which is actually a function) with layer functions allows the user to make plots in a layered, iterative way. It splits the construction of complex graphics into several simple steps. It's also very intuitive and gets easier as you practice.
Yet, there are limitations. The difficulty of making interactive graphics with ggplot2 alone may be one. These tasks are, in the majority of cases, very well handled by both ggvis and plotly as standalone packages. This leads us to steps 4 and 5.
Calling plotly::ggplotly() after creating a ggplot will coerce it into an interactive plot. The conversion may fail sometimes. Do not forget to have plotly installed.
Step 4 loads the ggvis package using library() and then gives birth to an interactive plot. ggvis holds many similarities with ggplot2. The ggvis() function handles the basic coordinate mapping, while the pipe operator (%>%) is used to add a layer created by the layer_points() function. Remember: the pipe operator, not the plus sign.
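A minimal sketch of that conversion, assuming both ggplot2 and plotly are installed. The names static_plot and interactive_plot are illustrative:

```r
library(ggplot2)
library(plotly)

# Build an ordinary (static) ggplot first.
static_plot <- ggplot(cars, aes(x = speed, y = dist)) + geom_point()

# Hand it over to plotly; the result is an htmlwidget that can be
# printed in RStudio or embedded in an HTML document.
interactive_plot <- ggplotly(static_plot)
```

Called with no arguments, ggplotly() converts the last ggplot that was displayed, which is the usage described above.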
ggvis distinguishes between arguments declared using = (always scaled) and := (never scaled). Also, ~ must come before variable names.
Function names may change, and so does the operator used to add layers, when moving from ggplot2 to ggvis, but essentially the underlying logic stays the same. Layers from ggvis have several counterparts in ggplot2; refer to the See also section to track some. In comparison with ggplot2, ggvis is much younger and some utilities may be yet to come. Also, the data doesn't need to come from a data frame object.
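The following sketch, which assumes the ggvis package is installed, contrasts the two operators. The object names scaled_fill and raw_fill are hypothetical:

```r
library(ggvis)

# `=` with a ~variable: the value is mapped through a scale.
scaled_fill <- cars %>%
  ggvis(x = ~speed, y = ~dist) %>%
  layer_points(fill = ~dist)

# `:=` sets the raw value directly: every point is literally red.
raw_fill <- cars %>%
  ggvis(x = ~speed, y = ~dist) %>%
  layer_points(fill := "red")
```

Printing either object (for example, typing raw_fill at the console) renders the interactive plot.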
Step 5 draws an interactive plotly graph. A single function, plot_ly(), takes care of both coordinate mapping and geometry. The plot can be designed in a more layered fashion using the add_trace() function, but there is no real need for that when the plot is this simple. Instead of having many functions for statistical transformations and geometries, those are declared by arguments inside the main function.
These three packages, ggplot2, ggvis, and plotly, are well-coded and powerful graphics packages. Before picking one of them to handle a task, always consider points like these:
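As a hedged illustration of both styles, assuming a recent plotly release (where variables are referenced with the ~ formula syntax, which may differ from the version used in the recipe), the single-call and layered versions could look like this. The names single_call and layered are illustrative:

```r
library(plotly)

# Everything in one call: mapping via x/y, geometry via type and mode.
single_call <- plot_ly(
  data = cars, x = ~speed, y = ~dist,
  type = "scatter", mode = "markers"
)

# A slightly more layered variant: plot_ly() holds the mapping and
# add_trace() supplies the geometry.
layered <- plot_ly(data = cars, x = ~speed, y = ~dist) %>%
  add_trace(type = "scatter", mode = "markers")
```

For a plot this small, the single call is shorter; the layered form starts to pay off once several traces share the same mapping.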
- What the package is able to do
- Time needed to master the skill set required
- Time required to handle the task
- Amount of time available
- Time to be saved later by the thing that you learned
Base R is also a feasible possibility. Whenever you face new challenges, it is a good thing to think through these points.