3 Custom Functions

Learn to write custom functions and include them in your package.

Packages required for this chapter

library(usethis)  # for setting up data sets in a package
library(glue)     # for text joining
library(devtools) # for loading your package

3.1 Function Overview

First, let’s create a very basic function to learn about custom functions. Functions need a name (like any R object). They are created with the function() function. We’re going to make a function that rounds numbers and keeps the trailing zeroes, so let’s call it round0.

R/round0.R

round0 <- function() {
  # function code goes here
}

Most functions have arguments that set inputs to the function or options for how the function can work. These arguments can be required for the function to work, or have default values.

Check the help for the round function; we want our function to work almost the same, so it will be easier for users if the arguments have the same names in the same order, with the same default values. In the example below, the argument digits defaults to 0 unless you change it.

R/round0.R

round0 <- function(x, digits = 0) {
  # function code goes here
}

Functions use these arguments in their code to produce some kind of output (or side effect). Here, we use the value of digits to create a formatting string, and format the value of x with it using sprintf(). Finally, we use the return() function to return the value.

R/round0.R

round0 <- function(x, digits = 0) {
  fmt <- paste0("%.", digits, "f")
  x0 <- sprintf(fmt, x)
  
  return(x0)
}

Note

You technically don’t have to use the return() function. The last object created in the function code will be automatically returned. Most people don’t use return(), but that can sometimes make it hard to figure out exactly what is being returned if you have a lot of if/else logic.

The return() function also stops all subsequent code from being run.

Run the code above to define the function. After it’s defined, if you type the function name into the console, without parentheses, it will show you the code for the function.

run in the console

round0

function(x, digits = 0) {
  fmt <- paste0("%.", digits, "f")
  x0 <- sprintf(fmt, x)
  
  return(x0)
}

Note

You can do this for any function; try a few! Many of the base R functions, like mean have unsatisfying code like UseMethod("mean"), which you can lean about in the S3 Chapter of Advanced R, but other functions like sd will show you the code they use.

Use your function like any other R function.

run in the console

round0(7/3, 3)
round0(1.9999, 2)

[1] "2.333"
[1] "2.00"

However, you’ll have to define it at the top of any script that uses it, unless you add it to a package.

3.2 Function Development

For demopkg, we’re going to create a function that produces the APA-formatted text for the results of a paired-samples t-test. Here’s an example of APA format.

A paired-samples t-test was conducted to compare {dv} between {level1} (M = {mean1}, SD = {sd1}) and {level2} (M = {mean2}, SD = {sd2}). There was a {non}significant difference; t({df}) = {t_value}, p = {p_value}.

3.2.1 Specific Instance

The first step is to sort out a specific instance of your code. You can put this in a new R script for working out your code and delete it later. We’ll load in the data we added to demopkg in Section 2.1.3.

# load example data
data("self_res_att", package = "demopkg")

If you haven’t added the data to your package yet, use this code:

self_res_att <- read.csv("https://raw.githubusercontent.com/PsyTeachR/intro-r-pkgs/main/data-raw/DeBruine_2004_PRSLB_att.csv")

Next, compare preferences for self-resembling female faces (f_self) to others’ preferences for those same faces f_non using a paired-samples t-test.

# analysis
t_results <- t.test(
  x = self_res_att$f_self, 
  y = self_res_att$f_non,
  paired = TRUE)

t_results


    Paired t-test

data:  self_res_att$f_self and self_res_att$f_non
t = 3.5996, df = 107, p-value = 0.0004845
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
 0.2557008 0.8825708
sample estimates:
mean difference 
      0.5691358

Now we set up the text template with variables inside curly brackets where we want to insert values from the analysis. Just make up the variable names, keeping them short but meaningful.

template <- "A paired-samples t-test was conducted to compare {dv} between {level1} (M = {mean1}, SD = {sd1}) and {level2} (M = {mean2}, SD = {sd2}). There was a {non}significant difference; t({df}) = {t_value}, p = {p_value}."

The object t_results prints out like above, but the object is actually a list, Use the str() function to see the structure of the list.

str(t_results)

List of 10
 $ statistic  : Named num 3.6
  ..- attr(*, "names")= chr "t"
 $ parameter  : Named num 107
  ..- attr(*, "names")= chr "df"
 $ p.value    : num 0.000485
 $ conf.int   : num [1:2] 0.256 0.883
  ..- attr(*, "conf.level")= num 0.95
 $ estimate   : Named num 0.569
  ..- attr(*, "names")= chr "mean difference"
 $ null.value : Named num 0
  ..- attr(*, "names")= chr "mean difference"
 $ stderr     : num 0.158
 $ alternative: chr "two.sided"
 $ method     : chr "Paired t-test"
 $ data.name  : chr "self_res_att$f_self and self_res_att$f_non"
 - attr(*, "class")= chr "htest"

Now we can set the value of each variable from the t_results object. You can’t get the means and standard deviations from the t_results object, so we’ll calculate those from the data. Round each numeric value to the appropriate level using the round0 function we created above.

glue::glue(
  template,
  dv      = "preferences for female faces",
  level1  = "participants who resembled those faces",
  level2  = "non-self participants",
  mean1   = round0(mean(self_res_att$f_self), 1), 
  sd1     = round0(sd(self_res_att$f_self), 1),
  mean2   = round0(mean(self_res_att$f_non), 1),
  sd2     = round0(sd(self_res_att$f_non), 1),
  non     = ifelse(t_results$p.value < .05, "", "non-"),
  df      = round0(t_results$parameter, 0),
  t_value = round0(t_results$statistic, 2),
  p_value = round0(t_results$p.value, 3)
)

A paired-samples t-test was conducted to compare preferences for female faces between participants who resembled those faces (M = 3.5, SD = 1.6) and non-self participants (M = 3.0, SD = 1.2). There was a significant difference; t(107) = 3.60, p = 0.000.

Note

Don’t worry about p = 0 just now. A further practice exercise is for you to add code to the function to change it to p < .001 where appropriate.

3.2.2 Set up Function

Now we’re ready to abstract this into a function. The function will need a name. This function will (for now) only give you the APA-style text for a paired-samples t-test, so let’s call it apa_t_pair.

You can create an R script in the R directory called apa_t_pair.R, but the code below does this for you.

run in the console

usethis::use_r("apa_t_pair")

We’ll develop our function in this file. To start, set up a blank function definition.

R/apa_t_pair.R

apa_t_pair <- function() {
  
}

Then add the code from your example script above inside the curly brackets.

R/apa_t_pair.R

apa_t_pair <- function() {
  t_results <- t.test(self_res_att$f_self, 
                      self_res_att$f_non,
                      paired = TRUE)
  
  template <- "A paired-samples t-test was conducted to compare {dv} between {level1} (M = {mean1}, SD = {sd1}) and {level2} (M = {mean2}, SD = {sd2}). There was a {non}significant difference; t({df}) = {t_value}, p = {p_value}."
  
  glue::glue(
    template,
    dv      = "preferences for female faces",
    level1  = "participants who resembled those faces",
    level2  = "non-self participants",
    mean1   = round0(mean(self_res_att$f_self), 1), 
    sd1     = round0(sd(self_res_att$f_self), 1),
    mean2   = round0(mean(self_res_att$f_non), 1),
    sd2     = round0(sd(self_res_att$f_non), 1),
    non     = ifelse(t_results$p.value < .05, "", "non-"),
    df      = round0(t_results$parameter, 0),
    t_value = round0(t_results$statistic, 2),
    p_value = round0(t_results$p.value, 3)
  )
}

Skip a few lines and copy the round0 function definition below this one. You don’t have to define functions in any particular order, as long as all function definitions are run before you try to use them. Run all the code in this file and test that the function works by running it once in the console.

run in the console

apa_t_pair()

A paired-samples t-test was conducted to compare preferences for female faces between participants who resembled those faces (M = 3.5, SD = 1.6) and non-self participants (M = 3.0, SD = 1.2). There was a significant difference; t(107) = 3.60, p = 0.000.

Warning

If you get the message: Error in apa_t_pair() : could not find function "apa_t_pair", this means you didn’t run the code that created the function.

3.2.3 Add Arguments

We probably want this function to work for any pair of vectors we give it, not just the value of f_self and f_non. So we need to add arguments to the function. Add arguments x and y to the function and replace self_res_att$f_self with x and self_res_att$f_self with y everywhere in the function.

R/apa_t_pair.R

apa_t_pair <- function(x, y) {
  t_results <- t.test(x, y, paired = TRUE)
  
  template <- "A paired-samples t-test was conducted to compare {dv} between {level1} (M = {mean1}, SD = {sd1}) and {level2} (M = {mean2}, SD = {sd2}). There was a {non}significant difference; t({df}) = {t_value}, p = {p_value}."
  
  glue::glue(
    template,
    dv      = "preferences for female faces",
    level1  = "participants who resembled those faces",
    level2  = "non-self participants",
    mean1   = round0(mean(x), 1), 
    sd1     = round0(sd(x), 1),
    mean2   = round0(mean(y), 1),
    sd2     = round0(sd(y), 1),
    non     = ifelse(t_results$p.value < .05, "", "non-"),
    df      = round0(t_results$parameter, 0),
    t_value = round0(t_results$statistic,2),
    p_value = round0(t_results$p.value, 3)
  )
}

Now, if you try to run the function without any arguments, you’ll get an error message. This is because there are no default values for x and y.

apa_t_pair()

Error in t.test(x, y, paired = TRUE): argument "x" is missing, with no default

This also won’t work.

x <- self_res_att$f_self
y <- self_res_att$f_non
apa_t_pair()

Error in t.test(x, y, paired = TRUE): argument "x" is missing, with no default

This is because the x and y inside of the function are in a different environment to any x and y outside of the function. This can seem confusing at first, but it’s good that you don’t need to worry about objects that exist outside of your function.

You can specify the vectors as arguments.

apa_t_pair(x = self_res_att$f_self, 
           y = self_res_att$f_non)

A paired-samples t-test was conducted to compare preferences for female faces between participants who resembled those faces (M = 3.5, SD = 1.6) and non-self participants (M = 3.0, SD = 1.2). There was a significant difference; t(107) = 3.60, p = 0.000.

3.2.4 Further Arguments

Now we can further customise our function. You probably won’t always be comparing “preferences for female faces” between “participants who resembled those faces” and “non-self participants”, so let’s add three new arguments to the function. We can set generic default values for these new arguments so that you don’t have to specify them if the defaults are OK.

Since the values are defined with the variable names used in the glue template, we don’t need to specify those in the glue() function anymore.

R/apa_t_pair.R

apa_t_pair <- function(x, y, 
                       dv = "the DV", 
                       level1 = "level 1", 
                       level2 = "level 2") {
  t_results <- t.test(x, y, paired = TRUE)
  
  template <- "A paired-samples t-test was conducted to compare {dv} between {level1} (M = {mean1}, SD = {sd1}) and {level2} (M = {mean2}, SD = {sd2}). There was a {non}significant difference; t({df}) = {t_value}, p = {p_value}."
  
  glue::glue(
    template,
    mean1   = round0(mean(x), 1), 
    sd1     = round0(sd(x), 1),
    mean2   = round0(mean(y), 1),
    sd2     = round0(sd(y), 1),
    non     = ifelse(t_results$p.value < .05, "", "non-"),
    df      = round0(t_results$parameter, 0),
    t_value = round0(t_results$statistic,2),
    p_value = round0(t_results$p.value, 3)
  )
}

Try running the function both with and without the new arguments.

apa_t_pair(x = self_res_att$f_self, 
           y = self_res_att$f_non)

A paired-samples t-test was conducted to compare the DV between level 1 (M = 3.5, SD = 1.6) and level 2 (M = 3.0, SD = 1.2). There was a significant difference; t(107) = 3.60, p = 0.000.

apa_t_pair(x = self_res_att$f_self, 
           y = self_res_att$f_non,
           dv = "preferences for female faces",
           level1 = "participants who resembled those faces",
           level2 = "non-self participants")

A paired-samples t-test was conducted to compare preferences for female faces between participants who resembled those faces (M = 3.5, SD = 1.6) and non-self participants (M = 3.0, SD = 1.2). There was a significant difference; t(107) = 3.60, p = 0.000.

Warning

If the output doesn’t change, this usually means that you forgot to run the code that re-defined apa_t_pair.

3.3 Load in package

3.3.1 Import dependencies

You need to “import” any non-base packages that you use in a function. These are called “dependencies” because your function depends on them. Our function above uses glue() from the glue package. The function below is a quick way to add a dependency.

run in the console

usethis::use_package("glue")

You should see this output:

✔ Setting active project to '/Users/lisad/rproj/demopkg'
✔ Adding 'glue' to Imports field in DESCRIPTION
• Refer to functions with `glue::fun()`

This means that you should always use the full form glue::glue(), rather than loading a package with the library() function and using just the function name glue().

You can open the DESCRIPTION file to see what has changed. Alternatively, you can manually add dependencies to this file under “Imports:”.

3.3.2 Load

Now restart R and make sure that your environment is clear. Run the following code to load your package. You can also use a keyboard shortcut to run this function (Mac: cmd-shift-L, Windows: ctl-shift-L).

run in the console

devtools::load_all(".")

The function should now be available.

apa_t_pair(x = self_res_att$f_self, 
           y = self_res_att$f_non,
           dv = "preferences for female faces",
           level1 = "participants who resembled those faces",
           level2 = "non-self participants")

A paired-samples t-test was conducted to compare preferences for female faces between participants who resembled those faces (M = 3.5, SD = 1.6) and non-self participants (M = 3.0, SD = 1.2). There was a significant difference; t(107) = 3.60, p = 0.000.

And it should be easy to adapt for other pairs of values, such as the equivalent analysis for male faces.

apa_t_pair(x = self_res_att$m_self, 
           y = self_res_att$m_non,
           dv = "preferences for male faces",
           level1 = "participants who resembled those faces",
           level2 = "non-self participants")

A paired-samples t-test was conducted to compare preferences for male faces between participants who resembled those faces (M = 3.5, SD = 1.4) and non-self participants (M = 2.9, SD = 1.0). There was a significant difference; t(107) = 3.89, p = 0.000.

3.3.3 Check

Run the CMD check using devtools::check() or by clicking the Check icon in the Build tab. There will be a lot of grey output text, and hopefully a lot of green checkmarks. But at the end, you’ll probably get a message like this:

❯ checking R code for possible problems ... NOTE
  apa_t_pair: no visible global function definition for ‘t.test’
  apa_t_pair: no visible global function definition for ‘sd’
  Undefined global functions or variables:
    sd t.test
  Consider adding
    importFrom("stats", "sd", "t.test")
  to your NAMESPACE file.

0 errors ✔ | 0 warnings ✔ | 1 note ✖

R CMD check succeeded

The no visible global function definition note means that we’ve used some functions that aren’t from our own package and we haven’t specified what package they are from. t.test and sd are functions from the stats package, which is automatically loaded when you start up R, but still needs to be added as a dependency.

You can add the stats:: prefix to t.test and sd and add the dependency using usethis::use_package("stats").

Note

The part that suggests you add importFrom("stats", "sd", "t.test") to the NAMESAPCE file is a way for you to add specific functions from another package to your package so you can use them without specifying the package name first. However, that specific instruction should be ignored because you should never edit the NAMESPACE file yourself. Instead use roxygen to set this up, which will be explained in Section 4.3.

3.3.4 Install

When you’re developing a package, you usually “load” it using devtools::load_all(".") to be able to access all of the functions in the package for testing and development. This way, the development package is only available during the current session in this project. You will load it every time you make some changes to the package.

If you want your package to be available outside of project sessions where you’ve explicitly loaded it, you need to install the package using devtools::install() or the Install button in the Build pane.

After your package is installed, make sure the environment is clear, and try the following code:

round0(1, 3)

Error in round0(1, 3): could not find function "round0"

This is because round0 is an internal, non-exported function, so only the developer (you) is supposed to be able to use it. Technically, you can also access internal functions using the triple-colon.

demopkg:::round0(1, 3)

[1] "1.000"

3.4 Glossary

term	definition
argument	A variable that provides input to a function.
default-value	A value that a function uses for an argument if it is skipped.
environment	A data structure that contains R objects such as variables and functions
function	A named section of code that can be reused.
object	A word that identifies and stores the value of some data for later use.

3.5 Further Resources

Functions from Wickham (2019)

3.6 Further Practice

Add an argument called alpha that allows the user to set an alpha criterion for determining significance. Make the default value 0.05.
Edit the function to handle p-values < .001.
Add 95% confidence intervals to the output.
Allow the user to set a custom confidence interval. Give this a sensible default value.
Create another function to produce the text for a different analysis you’re familiar with in R, such as an ANOVA or correlation.