Functions
Functions are pieces of code that perform a particular task and either print their output or return it so that it can be stored in an object. Writing functions is particularly useful because it saves you from rewriting the same code over and over in your program; instead, you write the function once and call it every time you need to perform that task. In fact, all the code we have used so far in our examples calls built-in or third-party R package functions.
For example, we ask for the mean of x
using the following code:
> x <- c(2, 6, 7, 12)
> mean(x)
[1] 6.75
In the preceding code, we are actually asking R to call the mean()
function. Each function takes arguments. If you would like to know what arguments could be passed to a particular R function, you can consult the help page. There are several ways to access the help documentation in R. First, you can use the help()
function as follows:
> help(mean)

Description
Generic function for the (trimmed) arithmetic mean.

Usage
mean(x, ...)

## Default S3 method:
mean(x, trim = 0, na.rm = FALSE, ...)

Arguments
x       An R object. Currently there are methods for numeric/logical vectors and date, date-time, and time interval objects. Complex vectors are allowed for trim = 0, only.
trim    the fraction (0 to 0.5) of observations to be trimmed from each end of x before the mean is computed. Values of trim outside that range are taken as the nearest endpoint.
na.rm   a logical value indicating whether NA values should be stripped before the computation proceeds.
...     further arguments passed to or from other methods.

[…]
Alternatively, you can use the ?
symbol to obtain the documentation page for the mean function as follows:
> ?mean #Returns the same output as above
You can also search all the help topics for the word mean with the ?? operator, as shown in the following screenshot:
> ??mean
As you can see in the preceding screenshot, R returns a table of all the search results matching the word "mean"
for all the packages you have installed on your computer.
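The argument list from the Usage section can also be inspected at the console without opening the help page. The following is a minimal sketch using the base R functions args() and formals() on mean.default, the default method named in the help output above; the variable name mean_args is only illustrative.

```r
# Print the argument list of the default method of mean()
args(mean.default)

# formals() returns the arguments as a named list, so defaults can be read off
mean_args <- formals(mean.default)   # mean_args is an illustrative name
names(mean_args)                     # "x" "trim" "na.rm" "..."
mean_args$trim                       # 0
mean_args$na.rm                      # FALSE
```

This is handy for a quick check, but the help page remains the place to find out what each argument actually means.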
The help page is very useful because it tells you what type of object the function takes as input and lists all of its arguments along with their default settings. By consulting the help page for the mean()
function, you learn that the default settings are trim=0
and na.rm=FALSE
. With trim
set to 0
, no observations or values are removed prior to calculating the mean, and with na.rm set to FALSE, NA entries are not removed before calculating the mean, so a single NA in the input makes the result NA. Consider the following example:
> x <- c(2, 6, 7, 12, NA, NA)
> mean(x)
[1] NA
If we specify na.rm=TRUE
, the NA
entries are ignored as follows:
> mean(x, na.rm=TRUE)
[1] 6.75
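To see what the trim argument does alongside na.rm, here is a small sketch on made-up data (the vector values is an assumption for illustration, not taken from the examples above). With five non-missing values and trim = 0.2, one observation is dropped from each end before the mean is computed.

```r
# Illustrative data: one large outlier and one missing value
values <- c(1, 2, 3, 4, 100, NA)

m_default <- mean(values)                            # NA: na.rm defaults to FALSE
m_no_na   <- mean(values, na.rm = TRUE)              # 22: mean of 1, 2, 3, 4, 100
m_trimmed <- mean(values, trim = 0.2, na.rm = TRUE)  # 3: mean of 2, 3, 4

m_default
m_no_na
m_trimmed
```

The trimmed mean is much less affected by the outlier 100 than the ordinary mean.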
So far, we have overridden default values by naming the argument we want to change, that is, na.rm=TRUE. However, R also lets you override defaults by argument position alone, so we can rewrite the last command as follows:
> # Notice that an empty position between commas leaves that argument (here, trim)
> # at its default; positions follow the order of the arguments on the help page
> mean(x, ,TRUE)
[1] 6.75
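The difference between matching by name and matching by position can be illustrated with two other base R functions, round() and seq(). This is only a sketch; the variable names are illustrative. Named arguments can be given in any order, whereas positional arguments must follow the order in which they are documented.

```r
# round(): digits is the second argument, so both calls are equivalent
r_named      <- round(3.14159, digits = 2)
r_positional <- round(3.14159, 2)

# seq(): from, to and by can be matched by position or by name
s_positional <- seq(1, 10, 3)
s_named      <- seq(from = 1, to = 10, by = 3)
s_reordered  <- seq(by = 3, to = 10, from = 1)  # order is free when arguments are named

r_named        # 3.14
s_positional   # 1 4 7 10
identical(s_positional, s_reordered)  # TRUE
```

Positional calls are shorter, but named arguments are usually easier to read and less fragile.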
This also holds true for functions you write yourself. Let's write a simple function called vectorContains() to test whether a vector contains the number 3. To define a function in R, we write the keyword function followed by the list of arguments in parentheses () and then curly braces {} containing the sequence of commands we want the function to execute. Here are the steps we will take to check whether a value (in this case, 3) is present in an input vector:
- We create a function called vectorContains that takes the input vector v1 and uses an argument (variable) value.to.check, with a default of 3, to store the value we want to check.
- We check that the input object type is numeric using the is.numeric() function.
- We ensure that there are no missing (NA) values using the any() and is.na() functions. The is.na() function returns TRUE for each entry that is NA, and the any() function returns TRUE if at least one of those results is TRUE. Because we want TRUE when no NA is present instead of when an NA is present, we use the ! sign before the any(is.na()) command.
- We use an if else {} statement to return an error message, using the stop() function, if the vector isn't numeric and/or contains NA values.
- We create an object value.found to keep track of whether the value to be checked is found. We initially set value.found to "no" because we assume the value is not present.
- We check each value of our input vector using a for() loop. If an element (i) of our vector matches value.to.check, we set value.found to "yes" and break out of the for() loop.
- Depending on whether value.found is set to "yes" or "no", we return TRUE or FALSE as follows:

> vectorContains <- function(v1, value.to.check=3){
    if(is.numeric(v1) && !any(is.na(v1))) {
      value.found <- "no"
      for (i in v1){
        if(i == value.to.check) {
          value.found <- "yes"
          break
        }
      }
      if(value.found == "yes") {
        return(TRUE)
      } else {
        return(FALSE)
      }
    } else {
      # stop() exits the function and prints the following error message
      stop("This function takes a numeric vector without NAs as input.")
    }
  }
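The explicit for() loop makes each step visible, but the same check can be written more compactly because comparisons in R are vectorized. The following is a sketch of an alternative, not the version used in the rest of this section; the name vectorContains2 is hypothetical.

```r
# vectorContains2 is a hypothetical name for a vectorized variant
vectorContains2 <- function(v1, value.to.check = 3) {
  # Reject non-numeric input and input containing NA, as before
  if (!is.numeric(v1) || any(is.na(v1))) {
    stop("This function takes a numeric vector without NAs as input.")
  }
  # v1 == value.to.check compares every element at once;
  # any() is TRUE if at least one comparison is TRUE
  any(v1 == value.to.check)
}

vectorContains2(c(1, 4, 6, 8, 3, 12, 15))      # TRUE
vectorContains2(c(1, 4, 6, 8, 3, 12, 15), 17)  # FALSE
```

The input validation is unchanged; only the loop and the value.found bookkeeping are replaced by a single vectorized comparison.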
Now, let's test our function as follows:
> x <- c(2, 6, 7, 12, NA, NA)
> vectorContains(x)
Error in vectorContains(x) :
  This function takes a numeric vector without NAs as input.
> y <- c(1, 4, 6, 8, 3, 12, 15)
> vectorContains(y)
[1] TRUE
If we want to test whether a vector contains a value other than 3, such as 6, we can override the default value.to.check by passing the new value, either by position or by name, as follows:
> vectorContains(y, 6)
[1] TRUE
> vectorContains(y, value.to.check=17)
[1] FALSE
Hopefully, the preceding example shows the beauty of writing functions instead of individual commands: you can reuse this function to check whether a vector contains any particular value. Moreover, by saving these lines of code to a text file (for example, vectorfunction.R), you can reload this function in a later session using the source() function instead of rewriting it, as follows:
> source("/PathToFile/vectorfunction.R")
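The save-and-reload cycle can be tried end to end with a temporary file. In this sketch, the location under tempdir() is an assumption standing in for wherever you keep vectorfunction.R, and the function body is shortened for brevity.

```r
# Hypothetical location: a file in the session's temporary directory
path <- file.path(tempdir(), "vectorfunction.R")

# Save a (shortened) function definition to the file as plain text
writeLines(c(
  "vectorContains <- function(v1, value.to.check = 3) {",
  "  any(v1 == value.to.check)",
  "}"
), path)

# source() reads the file and evaluates its contents, defining the function
source(path)

vectorContains(c(1, 4, 6, 8, 3, 12, 15))  # TRUE
```

In practice you would write the file once in a text editor and only call source() at the start of each session.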