3 Custom Functions
Learn to write custom functions and include them in your package.
3.1 Function Overview
First, let’s create a very basic function to learn about custom functions. Functions need a name (like any R object). They are created with the function() function. We’re going to make a function that rounds numbers and keeps the trailing zeroes, so let’s call it round0.
Most functions have arguments that set inputs to the function or options for how the function can work. These arguments can be required for the function to work, or have default values.
Check the help for the round function; we want our function to work almost the same, so it will be easier for users if the arguments have the same names in the same order, with the same default values. In the example below, the argument digits defaults to 0 unless you change it.
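A minimal sketch of the definition, reconstructed from the function body printed later in this section (the assignment to the name round0 is inferred from the surrounding text):

```r
# Round a number and keep trailing zeroes by formatting it as text
round0 <- function(x, digits = 0) {
  fmt <- paste0("%.", digits, "f")  # e.g. "%.2f" when digits = 2
  x0 <- sprintf(fmt, x)             # returns a character vector
  return(x0)
}
```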
Functions use these arguments in their code to produce some kind of output (or side effect). Here, we use the value of digits to create a formatting string, and format the value of x with it using sprintf(). Finally, we use the return() function to return the value.
You technically don’t have to use the return() function. The value of the last expression evaluated in the function is returned automatically. Many people don’t use return(), but that can sometimes make it hard to figure out exactly what is being returned if you have a lot of if/else logic.
The return() function also exits the function immediately, so none of the subsequent code is run.
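As an illustration of both behaviours, here is a small sketch using a hypothetical helper called sign_label (not part of demopkg). Explicit return() calls exit early, while the final expression is returned implicitly.

```r
# Hypothetical helper, for illustration only
sign_label <- function(x) {
  if (x < 0) {
    return("negative")  # exits here; nothing below is run
  }
  if (x == 0) {
    return("zero")
  }
  "positive"            # last evaluated expression, returned implicitly
}

sign_label(-2)
sign_label(0)
sign_label(5)
```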
Run the code above to define the function. After it’s defined, if you type the function name into the console, without parentheses, it will show you the code for the function.
function(x, digits = 0) {
  fmt <- paste0("%.", digits, "f")
  x0 <- sprintf(fmt, x)
  return(x0)
}
You can do this for any function; try a few! Many of the base R functions, like mean, have unsatisfying code like UseMethod("mean"), which you can learn about in the S3 chapter of Advanced R, but other functions like sd will show you the code they use.
Use your function like any other R function.
However, you’ll have to define it at the top of any script that uses it, unless you add it to a package.
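A short usage sketch, with the definition repeated so the snippet runs on its own. The input values are arbitrary examples.

```r
# Definition repeated from above so this snippet is self-contained
round0 <- function(x, digits = 0) {
  fmt <- paste0("%.", digits, "f")
  x0 <- sprintf(fmt, x)
  return(x0)
}

round(10, 2)     # base R drops the trailing zeroes when printing: 10
round0(10, 2)    # "10.00"
round0(0.1, 3)   # "0.100"
```

Note that round0() returns text rather than a number, which is what you want for reporting but not for further calculation.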
3.2 Function Development
For demopkg, we’re going to create a function that produces the APA-formatted text for the results of a paired-samples t-test. Here’s an example of APA format.
A paired-samples t-test was conducted to compare {dv} between {level1} (M = {mean1}, SD = {sd1}) and {level2} (M = {mean2}, SD = {sd2}). There was a {non}significant difference; t({df}) = {t_value}, p = {p_value}.
3.2.1 Specific Instance
The first step is to sort out a specific instance of your code. You can put this in a new R script for working out your code and delete it later. We’ll load in the data we added to demopkg in Section 2.1.3.
If you haven’t added the data to your package yet, use this code:
Next, compare preferences for self-resembling female faces (f_self) to others’ preferences for those same faces (f_non) using a paired-samples t-test.
# analysis
t_results <- t.test(
  x = self_res_att$f_self,
  y = self_res_att$f_non,
  paired = TRUE
)
t_results
Paired t-test
data: self_res_att$f_self and self_res_att$f_non
t = 3.5996, df = 107, p-value = 0.0004845
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
0.2557008 0.8825708
sample estimates:
mean difference
0.5691358
Now we set up the text template with variables inside curly brackets where we want to insert values from the analysis. Just make up the variable names, keeping them short but meaningful.
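The template string below is the one used inside the function later in this chapter, stored here in an object called template:

```r
# Placeholders in curly brackets are filled in later by glue::glue()
template <- "A paired-samples t-test was conducted to compare {dv} between {level1} (M = {mean1}, SD = {sd1}) and {level2} (M = {mean2}, SD = {sd2}). There was a {non}significant difference; t({df}) = {t_value}, p = {p_value}."
```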
The object t_results prints out like above, but the object is actually a list. Use the str() function to see the structure of the list.
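In your own script the call is simply str(t_results). The sketch below is self-contained, so it builds a stand-in t_results from R's built-in sleep data instead of self_res_att (an assumption made purely for illustration; the numbers will differ from the output shown in this chapter).

```r
# Stand-in data: the built-in `sleep` dataset, NOT self_res_att
x <- sleep$extra[sleep$group == "1"]
y <- sleep$extra[sleep$group == "2"]
t_results <- t.test(x, y, paired = TRUE)

# Show the structure of the list
str(t_results)

# Individual elements are accessed with $
t_results$statistic  # t-value
t_results$parameter  # degrees of freedom
t_results$p.value    # p-value
```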
List of 10
$ statistic : Named num 3.6
..- attr(*, "names")= chr "t"
$ parameter : Named num 107
..- attr(*, "names")= chr "df"
$ p.value : num 0.000485
$ conf.int : num [1:2] 0.256 0.883
..- attr(*, "conf.level")= num 0.95
$ estimate : Named num 0.569
..- attr(*, "names")= chr "mean difference"
$ null.value : Named num 0
..- attr(*, "names")= chr "mean difference"
$ stderr : num 0.158
$ alternative: chr "two.sided"
$ method : chr "Paired t-test"
$ data.name : chr "self_res_att$f_self and self_res_att$f_non"
- attr(*, "class")= chr "htest"
Now we can set the value of each variable from the t_results object. You can’t get the means and standard deviations from the t_results object, so we’ll calculate those from the data. Round each numeric value to the appropriate level using the round0 function we created above.
glue::glue(
  template,
  dv = "preferences for female faces",
  level1 = "participants who resembled those faces",
  level2 = "non-self participants",
  mean1 = round0(mean(self_res_att$f_self), 1),
  sd1 = round0(sd(self_res_att$f_self), 1),
  mean2 = round0(mean(self_res_att$f_non), 1),
  sd2 = round0(sd(self_res_att$f_non), 1),
  non = ifelse(t_results$p.value < .05, "", "non-"),
  df = round0(t_results$parameter, 0),
  t_value = round0(t_results$statistic, 2),
  p_value = round0(t_results$p.value, 3)
)
A paired-samples t-test was conducted to compare preferences for female faces between participants who resembled those faces (M = 3.5, SD = 1.6) and non-self participants (M = 3.0, SD = 1.2). There was a significant difference; t(107) = 3.60, p = 0.000.
Don’t worry about p = 0.000 just now. A further practice exercise is for you to add code to the function to change it to p < .001 where appropriate.
3.2.2 Set up Function
Now we’re ready to abstract this into a function. The function will need a name. This function will (for now) only give you the APA-style text for a paired-samples t-test, so let’s call it apa_t_pair.
You can create an R script in the R directory called apa_t_pair.R, but the code below does this for you.
We’ll develop our function in this file. To start, set up a blank function definition.
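A sketch of both steps. The file-creation call is assumed to be usethis::use_r(), which creates or opens a script in the R directory. It is commented out here because it only works when run from inside the package project.

```r
# Assumed helper call: creates (or opens) R/apa_t_pair.R in the package project
# usethis::use_r("apa_t_pair")

# Blank function definition to start from
apa_t_pair <- function() {

}
```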
Then add the code from your example script above inside the curly brackets.
R/apa_t_pair.R
apa_t_pair <- function() {
  t_results <- t.test(self_res_att$f_self,
                      self_res_att$f_non,
                      paired = TRUE)
  template <- "A paired-samples t-test was conducted to compare {dv} between {level1} (M = {mean1}, SD = {sd1}) and {level2} (M = {mean2}, SD = {sd2}). There was a {non}significant difference; t({df}) = {t_value}, p = {p_value}."
  glue::glue(
    template,
    dv = "preferences for female faces",
    level1 = "participants who resembled those faces",
    level2 = "non-self participants",
    mean1 = round0(mean(self_res_att$f_self), 1),
    sd1 = round0(sd(self_res_att$f_self), 1),
    mean2 = round0(mean(self_res_att$f_non), 1),
    sd2 = round0(sd(self_res_att$f_non), 1),
    non = ifelse(t_results$p.value < .05, "", "non-"),
    df = round0(t_results$parameter, 0),
    t_value = round0(t_results$statistic, 2),
    p_value = round0(t_results$p.value, 3)
  )
}
Skip a few lines and copy the round0 function definition below this one. You don’t have to define functions in any particular order, as long as all function definitions are run before you try to use them. Run all the code in this file and test that the function works by running it once in the console.
A paired-samples t-test was conducted to compare preferences for female faces between participants who resembled those faces (M = 3.5, SD = 1.6) and non-self participants (M = 3.0, SD = 1.2). There was a significant difference; t(107) = 3.60, p = 0.000.
If you get the message: Error in apa_t_pair() : could not find function "apa_t_pair", this means you didn’t run the code that created the function.
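The point about definition order can be seen in a small sketch with two hypothetical functions (shout and add_bang are made-up names, not part of demopkg). The first calls the second, which is defined further down, and this works because both definitions have been run before the call.

```r
# shout() uses add_bang(), which is defined below it
shout <- function(x) {
  add_bang(toupper(x))
}

add_bang <- function(x) {
  paste0(x, "!")
}

# Both definitions have been run by this point, so the call succeeds
shout("hello")  # "HELLO!"
```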
3.2.3 Add Arguments
We probably want this function to work for any pair of vectors we give it, not just the values of f_self and f_non. So we need to add arguments to the function. Add arguments x and y to the function and replace self_res_att$f_self with x and self_res_att$f_non with y everywhere in the function.
R/apa_t_pair.R
apa_t_pair <- function(x, y) {
  t_results <- t.test(x, y, paired = TRUE)
  template <- "A paired-samples t-test was conducted to compare {dv} between {level1} (M = {mean1}, SD = {sd1}) and {level2} (M = {mean2}, SD = {sd2}). There was a {non}significant difference; t({df}) = {t_value}, p = {p_value}."
  glue::glue(
    template,
    dv = "preferences for female faces",
    level1 = "participants who resembled those faces",
    level2 = "non-self participants",
    mean1 = round0(mean(x), 1),
    sd1 = round0(sd(x), 1),
    mean2 = round0(mean(y), 1),
    sd2 = round0(sd(y), 1),
    non = ifelse(t_results$p.value < .05, "", "non-"),
    df = round0(t_results$parameter, 0),
    t_value = round0(t_results$statistic, 2),
    p_value = round0(t_results$p.value, 3)
  )
}
Now, if you try to run the function without any arguments, you’ll get an error message. This is because there are no default values for x and y.
This also won’t work.
Error in t.test(x, y, paired = TRUE): argument "x" is missing, with no default
This is because the x and y inside of the function are in a different environment to any x and y outside of the function. This can seem confusing at first, but it’s good that you don’t need to worry about objects that exist outside of your function.
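The same behaviour can be reproduced with a much smaller function. This sketch uses a hypothetical helper, mean_diff, and made-up vectors, so it does not depend on the package data.

```r
# Hypothetical helper, used only to illustrate scoping
mean_diff <- function(x, y) {
  mean(x) - mean(y)
}

# These objects live in the global environment, not inside the function
x <- c(4, 5, 6)
y <- c(1, 2, 3)

# Calling without arguments fails even though x and y exist globally;
# tryCatch() captures the error message so the script keeps running
msg <- tryCatch(mean_diff(), error = function(e) conditionMessage(e))
msg

# Passing the vectors explicitly works
mean_diff(x = x, y = y)  # 3
```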
You can specify the vectors as arguments.
A paired-samples t-test was conducted to compare preferences for female faces between participants who resembled those faces (M = 3.5, SD = 1.6) and non-self participants (M = 3.0, SD = 1.2). There was a significant difference; t(107) = 3.60, p = 0.000.
3.2.4 Further Arguments
Now we can further customise our function. You probably won’t always be comparing “preferences for female faces” between “participants who resembled those faces” and “non-self participants”, so let’s add three new arguments to the function. We can set generic default values for these new arguments so that you don’t have to specify them if the defaults are OK.
Since the values are defined with the variable names used in the glue template, we don’t need to specify those in the glue() function anymore.
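This works because glue() looks up any placeholder that isn't passed to it explicitly in the environment it was called from, which here is the inside of the function. A minimal sketch with a hypothetical function, greet, assuming the glue package is installed:

```r
# Hypothetical function, for illustration only
greet <- function(name = "world") {
  # {name} is found among the function's arguments;
  # {n} is supplied directly to glue()
  glue::glue("Hello, {name}! That name has {n} letters.", n = nchar(name))
}

greet()       # Hello, world! That name has 5 letters.
greet("Lisa") # Hello, Lisa! That name has 4 letters.
```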
R/apa_t_pair.R
apa_t_pair <- function(x, y,
                       dv = "the DV",
                       level1 = "level 1",
                       level2 = "level 2") {
  t_results <- t.test(x, y, paired = TRUE)
  template <- "A paired-samples t-test was conducted to compare {dv} between {level1} (M = {mean1}, SD = {sd1}) and {level2} (M = {mean2}, SD = {sd2}). There was a {non}significant difference; t({df}) = {t_value}, p = {p_value}."
  glue::glue(
    template,
    mean1 = round0(mean(x), 1),
    sd1 = round0(sd(x), 1),
    mean2 = round0(mean(y), 1),
    sd2 = round0(sd(y), 1),
    non = ifelse(t_results$p.value < .05, "", "non-"),
    df = round0(t_results$parameter, 0),
    t_value = round0(t_results$statistic, 2),
    p_value = round0(t_results$p.value, 3)
  )
}
Try running the function both with and without the new arguments.
A paired-samples t-test was conducted to compare the DV between level 1 (M = 3.5, SD = 1.6) and level 2 (M = 3.0, SD = 1.2). There was a significant difference; t(107) = 3.60, p = 0.000.
apa_t_pair(x = self_res_att$f_self,
           y = self_res_att$f_non,
           dv = "preferences for female faces",
           level1 = "participants who resembled those faces",
           level2 = "non-self participants")
A paired-samples t-test was conducted to compare preferences for female faces between participants who resembled those faces (M = 3.5, SD = 1.6) and non-self participants (M = 3.0, SD = 1.2). There was a significant difference; t(107) = 3.60, p = 0.000.
If the output doesn’t change, this usually means that you forgot to run the code that re-defined apa_t_pair.
3.3 Load in package
3.3.1 Import dependencies
You need to “import” any non-base packages that you use in a function. These are called “dependencies” because your function depends on them. Our function above uses glue() from the glue package. The function below is a quick way to add a dependency.
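Judging by the output that follows and by the use_package() call mentioned later in this chapter, the function is usethis::use_package(). A sketch, to be run from inside the package project:

```r
# Adds glue to the Imports field of DESCRIPTION.
# Run this from the demopkg project; it edits the DESCRIPTION file.
usethis::use_package("glue")
```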
You should see this output:
✔ Setting active project to '/Users/lisad/rproj/demopkg'
✔ Adding 'glue' to Imports field in DESCRIPTION
• Refer to functions with `glue::fun()`
This means that you should always use the full form glue::glue(), rather than loading a package with the library() function and using just the function name glue().
You can open the DESCRIPTION file to see what has changed. Alternatively, you can manually add dependencies to this file under “Imports:”.
3.3.2 Load
Now restart R and make sure that your environment is clear. Run the following code to load your package. You can also use a keyboard shortcut to run this function (Mac: cmd-shift-L, Windows: ctrl-shift-L).
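The loading call, as it is named in the Install section of this chapter, is sketched below. It needs to be run from the package project.

```r
# Loads all functions in the package for the current session.
# "." means the current working directory, assumed to be the demopkg project.
devtools::load_all(".")
```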
The function should now be available.
apa_t_pair(x = self_res_att$f_self,
           y = self_res_att$f_non,
           dv = "preferences for female faces",
           level1 = "participants who resembled those faces",
           level2 = "non-self participants")
A paired-samples t-test was conducted to compare preferences for female faces between participants who resembled those faces (M = 3.5, SD = 1.6) and non-self participants (M = 3.0, SD = 1.2). There was a significant difference; t(107) = 3.60, p = 0.000.
And it should be easy to adapt for other pairs of values, such as the equivalent analysis for male faces.
apa_t_pair(x = self_res_att$m_self,
           y = self_res_att$m_non,
           dv = "preferences for male faces",
           level1 = "participants who resembled those faces",
           level2 = "non-self participants")
A paired-samples t-test was conducted to compare preferences for male faces between participants who resembled those faces (M = 3.5, SD = 1.4) and non-self participants (M = 2.9, SD = 1.0). There was a significant difference; t(107) = 3.89, p = 0.000.
3.3.3 Check
Run the CMD check using devtools::check() or by clicking the Check icon in the Build tab. There will be a lot of grey output text, and hopefully a lot of green checkmarks. But at the end, you’ll probably get a message like this:
❯ checking R code for possible problems ... NOTE
apa_t_pair: no visible global function definition for ‘t.test’
apa_t_pair: no visible global function definition for ‘sd’
Undefined global functions or variables:
sd t.test
Consider adding
importFrom("stats", "sd", "t.test")
to your NAMESPACE file.
0 errors ✔ | 0 warnings ✔ | 1 note ✖
R CMD check succeeded
The “no visible global function definition” note means that we’ve used some functions that aren’t from our own package and we haven’t specified what package they are from. t.test and sd are functions from the stats package, which is automatically loaded when you start up R, but still needs to be added as a dependency.
You can add the stats:: prefix to t.test and sd and add the dependency using usethis::use_package("stats").
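A sketch of what the prefixed calls look like, using two short made-up vectors in place of the package data:

```r
# Made-up example values, for illustration only
x <- c(5.1, 4.8, 6.0, 5.5, 5.9)
y <- c(4.7, 4.9, 5.2, 5.0, 5.4)

# Same functions as before, now with the package named explicitly
t_results <- stats::t.test(x, y, paired = TRUE)
sd_x <- stats::sd(x)

t_results$parameter  # degrees of freedom: 4
```

Inside apa_t_pair(), the same change applies to every call to t.test() and sd().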
The part that suggests you add importFrom("stats", "sd", "t.test") to the NAMESPACE file is a way for you to add specific functions from another package to your package so you can use them without specifying the package name first. However, that specific instruction should be ignored because you should never edit the NAMESPACE file yourself. Instead, use roxygen to set this up, which will be explained in Section 4.3.
3.3.4 Install
When you’re developing a package, you usually “load” it using devtools::load_all(".") to be able to access all of the functions in the package for testing and development. This way, the development package is only available during the current session in this project. You will load it every time you make some changes to the package.
If you want your package to be available outside of project sessions where you’ve explicitly loaded it, you need to install the package using devtools::install() or the Install button in the Build pane.
After your package is installed, make sure the environment is clear, and try the following code:
This is because round0 is an internal, non-exported function, so only the developer (you) is supposed to be able to use it. Technically, you can also access internal functions using the triple-colon.
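A sketch of the triple-colon syntax, assuming demopkg has been installed as described above. The requireNamespace() guard is only there so the snippet does nothing on a machine where the package is missing.

```r
# Only run if demopkg is installed on this machine
if (requireNamespace("demopkg", quietly = TRUE)) {
  # pkg:::fun reaches functions that the package does not export
  demopkg:::round0(0.5, 2)  # "0.50"
}
```

Relying on another package's internal functions this way is generally discouraged, since they can change without notice.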
3.4 Glossary
term | definition |
---|---|
argument | A variable that provides input to a function. |
default-value | A value that a function uses for an argument if it is skipped. |
environment | A data structure that contains R objects such as variables and functions. |
function | A named section of code that can be reused. |
object | A word that identifies and stores the value of some data for later use. |
3.5 Further Resources
3.6 Further Practice
Add an argument called alpha that allows the user to set an alpha criterion for determining significance. Make the default value 0.05.
Edit the function to handle p-values < .001.
Add 95% confidence intervals to the output.
Allow the user to set a custom confidence interval. Give this a sensible default value.
Create another function to produce the text for a different analysis you’re familiar with in R, such as an ANOVA or correlation.