# Chapter 7 Iteration & Functions

## 7.1 Learning Objectives

You will learn about functions and iteration by using simulation to calculate a power analysis for an independent samples t-test.

### 7.1.1 Basic

- Work with basic iteration functions
`rep`

,`seq`

,`replicate`

(video) - Use
`map()`

and`apply()`

functions (video) - Write your own custom functions with
`function()`

(video) - Set default values for the arguments in your functions

### 7.1.2 Intermediate

- Understand scope
- Use error handling and warnings in a function

### 7.1.3 Advanced

The topics below are not (yet) covered in these materials, but they are directions for independent learning.

- Repeat commands having multiple arguments using
`purrr::map2_*()`

and`purrr::pmap_*()`

- Create
**nested data frames**using`dplyr::group_by()`

and`tidyr::nest()`

- Work with
**nested data frames**in`dplyr`

- Capture and deal with errors using ‘adverb’ functions
`purrr::safely()`

and`purrr::possibly()`

## 7.2 Resources

- Chapters 19 and 21 of R for Data Science
- RStudio Apply Functions Cheatsheet

In the next two lectures, we are going to learn more about iteration (doing the same commands over and over) and custom functions through a data simulation exercise, which will also lead us into more traditional statistical topics.

## 7.3 Setup

```
# libraries needed for these examples
library(tidyverse) ## contains purrr, tidyr, dplyr
library(broom) ## converts test output to tidy tables
set.seed(8675309) # makes sure random numbers are reproducible
```

## 7.4 Iteration functions

We first learned about the two basic iteration functions, `rep()`

and `seq()`

in the Working with Data chapter.

### 7.4.1 rep()

The function `rep()`

lets you repeat the first argument a number of times.

Use `rep()`

to create a vector of alternating `"A"`

and `"B"`

values of length 24.

`rep(c("A", "B"), 12)`

```
## [1] "A" "B" "A" "B" "A" "B" "A" "B" "A" "B" "A" "B" "A" "B" "A" "B" "A" "B" "A"
## [20] "B" "A" "B" "A" "B"
```

If you don’t specify what the second argument is, it defaults to `times`

, repeating the vector in the first argument that many times. Make the same vector as above, setting the second argument explicitly.

`rep(c("A", "B"), times = 12)`

```
## [1] "A" "B" "A" "B" "A" "B" "A" "B" "A" "B" "A" "B" "A" "B" "A" "B" "A" "B" "A"
## [20] "B" "A" "B" "A" "B"
```

If the second argument is a vector that is the same length as the first argument, each element in the first vector is repeated than many times. Use `rep()`

to create a vector of 11 `"A"`

values followed by 3 `"B"`

values.

`rep(c("A", "B"), c(11, 3))`

`## [1] "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "B" "B" "B"`

You can repeat each element of the vector a sepcified number of times using the `each`

argument, Use `rep()`

to create a vector of 12 `"A"`

values followed by 12 `"B"`

values.

`rep(c("A", "B"), each = 12)`

```
## [1] "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "B" "B" "B" "B" "B" "B" "B"
## [20] "B" "B" "B" "B" "B"
```

What do you think will happen if you set both `times`

to 3 and `each`

to 2?

`rep(c("A", "B"), times = 3, each = 2)`

`## [1] "A" "A" "B" "B" "A" "A" "B" "B" "A" "A" "B" "B"`

### 7.4.2 seq()

The function `seq()`

is useful for generating a sequence of numbers with some pattern.

Use `seq()`

to create a vector of the integers 0 to 10.

`seq(0, 10)`

`## [1] 0 1 2 3 4 5 6 7 8 9 10`

You can set the `by`

argument to count by numbers other than 1 (the default). Use `seq()`

to create a vector of the numbers 0 to 100 by 10s.

`seq(0, 100, by = 10)`

`## [1] 0 10 20 30 40 50 60 70 80 90 100`

The argument `length.out`

is useful if you know how many steps you want to divide something into. Use `seq()`

to create a vector that starts with 0, ends with 100, and has 12 equally spaced steps (hint: how many numbers would be in a vector with 2 *steps*?).

`seq(0, 100, length.out = 13)`

```
## [1] 0.000000 8.333333 16.666667 25.000000 33.333333 41.666667
## [7] 50.000000 58.333333 66.666667 75.000000 83.333333 91.666667
## [13] 100.000000
```

### 7.4.3 replicate()

You can use the `replicate()`

function to run a function `n`

times.

For example, you can get 3 sets of 5 numbers from a random normal distribution by setting `n`

to `3`

and `expr`

to `rnorm(5)`

.

`replicate(n = 3, expr = rnorm(5))`

```
## [,1] [,2] [,3]
## [1,] -0.9965824 0.98721974 -1.5495524
## [2,] 0.7218241 0.02745393 1.0226378
## [3,] -0.6172088 0.67287232 0.1500832
## [4,] 2.0293916 0.57206650 -0.6599640
## [5,] 1.0654161 0.90367770 -0.9945890
```

By default, `replicate()`

simplifies your result into a matrix that is easy to convert into a table if your function returns vectors that are the same length. If you’d rather have a list of vectors, set `simplify = FALSE`

.

`replicate(n = 3, expr = rnorm(5), simplify = FALSE)`

```
## [[1]]
## [1] 1.9724587 -0.4418016 -0.9006372 -0.1505882 -0.8278942
##
## [[2]]
## [1] 1.98582582 0.04400503 -0.40428231 -0.47299855 -0.41482324
##
## [[3]]
## [1] 0.6832342 0.6902011 0.5334919 -0.1861048 0.3829458
```

### 7.4.4 map() and apply() functions

`purrr::map()`

and `lapply()`

return a list of the same length as a vector or list, each element of which is the result of applying a function to the corresponding element. They function much the same, but purrr functions have some optimisations for working with the tidyverse. We’ll be working mostly with purrr functions in this course, but apply functions are very common in code that you might see in examples on the web.

Imagine you want to calculate the power for a two-sample t-test with a mean difference of 0.2 and SD of 1, for all the sample sizes 100 to 1000 (by 100s). You could run the `power.t.test()`

function 20 times and extract the values for “power” from the resulting list and put it in a table.

```
<- power.t.test(n = 100, delta = 0.2, sd = 1, type="two.sample")
p100 # 18 more lines
<- power.t.test(n = 500, delta = 0.2, sd = 1, type="two.sample")
p1000
tibble(
n = c(100, "...", 1000),
power = c(p100$power, "...", p1000$power)
)
```

n | power |
---|---|

100 | 0.290266404572217 |

… | … |

1000 | 0.884788352886661 |

However, the `apply()`

and `map()`

functions allow you to perform a function on each item in a vector or list. First make an object `n`

that is the vector of the sample sizes you want to test, then use `lapply()`

or `map()`

to run the function `power.t.test()`

on each item. You can set other arguments to `power.t.test()`

after the function argument.

```
<- seq(100, 1000, 100)
n <- lapply(n, power.t.test,
pcalc delta = 0.2, sd = 1, type="two.sample")
# or
<- purrr::map(n, power.t.test,
pcalc delta = 0.2, sd = 1, type="two.sample")
```

These functions return a list where each item is the result of `power.t.test()`

, which returns a list of results that includes the named item “power.” This is a special list that has a summary format if you just print it directly:

`1]] pcalc[[`

```
##
## Two-sample t test power calculation
##
## n = 100
## delta = 0.2
## sd = 1
## sig.level = 0.05
## power = 0.2902664
## alternative = two.sided
##
## NOTE: n is number in *each* group
```

But you can see the individual items using the `str()`

function.

`1]] %>% str() pcalc[[`

```
## List of 8
## $ n : num 100
## $ delta : num 0.2
## $ sd : num 1
## $ sig.level : num 0.05
## $ power : num 0.29
## $ alternative: chr "two.sided"
## $ note : chr "n is number in *each* group"
## $ method : chr "Two-sample t test power calculation"
## - attr(*, "class")= chr "power.htest"
```

`sapply()`

is a version of `lapply()`

that returns a vector or array instead of a list, where appropriate. The corresponding purrr functions are `map_dbl()`

, `map_chr()`

, `map_int()`

and `map_lgl()`

, which return vectors with the corresponding data type.

You can extract a value from a list with the function `[[`

. You usually see this written as `pcalc[[1]]`

, but if you put it inside backticks, you can use it in apply and map functions.

`sapply(pcalc, `[[`, "power")`

```
## [1] 0.2902664 0.5140434 0.6863712 0.8064964 0.8847884 0.9333687 0.9623901
## [8] 0.9792066 0.9887083 0.9939638
```

We use `map_dbl()`

here because the value for “power” is a double.

`::map_dbl(pcalc, `[[`, "power") purrr`

```
## [1] 0.2902664 0.5140434 0.6863712 0.8064964 0.8847884 0.9333687 0.9623901
## [8] 0.9792066 0.9887083 0.9939638
```

We can use the `map()`

functions inside a `mutate()`

function to run the `power.t.test()`

function on the value of `n`

from each row of a table, then extract the value for “power,” and delete the column with the power calculations.

```
<- tibble(
mypower n = seq(100, 1000, 100)) %>%
mutate(pcalc = purrr::map(n, power.t.test,
delta = 0.2,
sd = 1,
type="two.sample"),
power = purrr::map_dbl(pcalc, `[[`, "power")) %>%
select(-pcalc)
```

## 7.5 Custom functions

In addition to the built-in functions and functions you can access from packages, you can also write your own functions (and eventually even packages!).

### 7.5.1 Structuring a function

The general structure of a function is as follows:

```
<- function(my_args) {
function_name # process the arguments
# return some value
}
```

Here is a very simple function. Can you guess what it does?

```
<- function(my_number) {
add1 + 1
my_number
}
add1(10)
```

`## [1] 11`

Let’s make a function that reports p-values in APA format (with “p = [rounded value]” when p >= .001 and “p < .001” when p < .001).

First, we have to name the function. You can name it anything, but try not to duplicate existing functions or you will overwrite them. For example, if you call your function `rep`

, then you will need to use `base::rep()`

to access the normal `rep`

function. Let’s call our p-value function `report_p`

and set up the framework of the function.

```
<- function() {
report_p }
```

### 7.5.2 Arguments

We need to add one argument, the p-value you want to report. The names you choose for the arguments are private to that argument, so it is not a problem if they conflict with other variables in your script. You put the arguments in the parentheses of `function()`

in the order you want them to default (just like the built-in functions you’ve used before).

```
<- function(p) {
report_p }
```

### 7.5.3 Argument defaults

You can add a default value to any argument. If that argument is skipped, then the function uses the default argument. It probably doesn’t make sense to run this function without specifying the p-value, but we can add a second argument called `digits`

that defaults to 3, so we can round p-values to any number of digits.

```
<- function(p, digits = 3) {
report_p }
```

Now we need to write some code inside the function to process the input arguments and turn them into a **return**ed output. Put the output as the last item in function.

```
<- function(p, digits = 3) {
report_p if (p < .001) {
= "p < .001"
reported else {
} <- round(p, digits)
roundp = paste("p =", roundp)
reported
}
reported }
```

You might also see the returned output inside of the `return()`

function. This does the same thing.

```
<- function(p, digits = 3) {
report_p if (p < .001) {
= "p < .001"
reported else {
} <- round(p, digits)
roundp = paste("p =", roundp)
reported
}
return(reported)
}
```

When you run the code defining your function, it doesn’t output anything, but makes a new object in the Environment tab under ** Functions**. Now you can run the function.

```
report_p(0.04869)
report_p(0.0000023)
```

```
## [1] "p = 0.049"
## [1] "p < .001"
```

### 7.5.4 Scope

What happens in a function stays in a function. You can change the value of a variable passed to a function, but that won’t change the value of the variable outside of the function, even if that variable has the same name as the one in the function.

```
<- "not changed"
reported
# inside this function, reported == "p = 0.002"
report_p(0.0023)
# still "not changed" reported
```

```
## [1] "p = 0.002"
## [1] "not changed"
```

### 7.5.5 Warnings and errors

`p`

? Or if you set `p`

to 1.5 or “a?”
You might want to add a more specific warning and stop running the function code if someone enters a value that isn’t a number. You can do this with the `stop()`

function.

If someone enters a number that isn’t possible for a p-value (0-1), you might want to warn them that this is probably not what they intended, but still continue with the function. You can do this with `warning()`

.

```
<- function(p, digits = 3) {
report_p if (!is.numeric(p)) stop("p must be a number")
if (p <= 0) warning("p-values are normally greater than 0")
if (p >= 1) warning("p-values are normally less than 1")
if (p < .001) {
= "p < .001"
reported else {
} <- round(p, digits)
roundp = paste("p =", roundp)
reported
}
reported }
```

`report_p()`

`## Error in report_p(): argument "p" is missing, with no default`

`report_p("a")`

`## Error in report_p("a"): p must be a number`

`report_p(-2)`

`## Warning in report_p(-2): p-values are normally greater than 0`

`report_p(2)`

`## Warning in report_p(2): p-values are normally less than 1`

```
## [1] "p < .001"
## [1] "p = 2"
```

## 7.6 Iterating your own functions

First, let’s build up the code that we want to iterate.

### 7.6.1 rnorm()

Create a vector of 20 random numbers drawn from a normal distribution with a mean of 5 and standard deviation of 1 using the `rnorm()`

function and store them in the variable `A`

.

`<- rnorm(20, mean = 5, sd = 1) A `

### 7.6.2 tibble::tibble()

A `tibble`

is a type of table or `data.frame`

. The function `tibble::tibble()`

creates a tibble with a column for each argument. Each argument takes the form `column_name = data_vector`

.

Create a table called `dat`

including two vectors: `A`

that is a vector of 20 random normally distributed numbers with a mean of 5 and SD of 1, and `B`

that is a vector of 20 random normally distributed numbers with a mean of 5.5 and SD of 1.

```
<- tibble(
dat A = rnorm(20, 5, 1),
B = rnorm(20, 5.5, 1)
)
```

### 7.6.3 t.test()

You can run a Welch two-sample t-test by including the two samples you made as the first two arguments to the function `t.test`

. You can reference one column of a table by its names using the format `table_name$column_name`

`t.test(dat$A, dat$B)`

```
##
## Welch Two Sample t-test
##
## data: dat$A and dat$B
## t = -1.7716, df = 36.244, p-value = 0.08487
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1.2445818 0.0838683
## sample estimates:
## mean of x mean of y
## 4.886096 5.466453
```

You can also convert the table to long format using the `gather`

function and specify the t-test using the format `dv_column~grouping_column`

.

```
<- gather(dat, group, score, A:B)
longdat
t.test(score~group, data = longdat)
```

```
##
## Welch Two Sample t-test
##
## data: score by group
## t = -1.7716, df = 36.244, p-value = 0.08487
## alternative hypothesis: true difference in means between group A and group B is not equal to 0
## 95 percent confidence interval:
## -1.2445818 0.0838683
## sample estimates:
## mean in group A mean in group B
## 4.886096 5.466453
```

### 7.6.4 broom::tidy()

You can use the function `broom::tidy()`

to extract the data from a statistical test in a table format. The example below pipes everything together.

```
tibble(
A = rnorm(20, 5, 1),
B = rnorm(20, 5.5, 1)
%>%
) gather(group, score, A:B) %>%
t.test(score~group, data = .) %>%
::tidy() broom
```

estimate | estimate1 | estimate2 | statistic | p.value | parameter | conf.low | conf.high | method | alternative |
---|---|---|---|---|---|---|---|---|---|

-0.6422108 | 5.044009 | 5.68622 | -2.310591 | 0.0264905 | 37.27083 | -1.205237 | -0.0791844 | Welch Two Sample t-test | two.sided |

In the pipeline above, `t.test(score~group, data = .)`

uses the `.`

notation to change the location of the piped-in data table from it’s default position as the first argument to a different position.

Finally, we can extract a single value from this results table using `pull()`

.

```
tibble(
A = rnorm(20, 5, 1),
B = rnorm(20, 5.5, 1)
%>%
) gather(group, score, A:B) %>%
t.test(score~group, data = .) %>%
::tidy() %>%
broompull(p.value)
```

`## [1] 0.7075268`

### 7.6.5 Custom function: t_sim()

First, name your function `t_sim`

and wrap the code above in a function with no arguments.

```
<- function() {
t_sim tibble(
A = rnorm(20, 5, 1),
B = rnorm(20, 5.5, 1)
%>%
) gather(group, score, A:B) %>%
t.test(score~group, data = .) %>%
::tidy() %>%
broompull(p.value)
}
```

Run it a few times to see what happens.

`t_sim()`

`## [1] 0.00997552`

### 7.6.6 Iterate t_sim()

Let’s run the `t_sim`

function 1000 times, assign the resulting p-values to a vector called `reps`

, and check what proportion of p-values are lower than alpha (e.g., .05). This number is the power for this analysis.

```
<- replicate(1000, t_sim())
reps <- .05
alpha <- mean(reps < alpha)
power power
```

`## [1] 0.328`

### 7.6.7 Set seed

You can use the `set.seed`

function before you run a function that uses random numbers to make sure that you get the same random data back each time. You can use any integer you like as the seed.

`set.seed(90201)`

Make sure you don’t ever use `set.seed()`

**inside** of a simulation function, or you will just simulate the exact same data over and over again.

### 7.6.8 Add arguments

You can just edit your function each time you want to cacluate power for a different sample n, but it is more efficent to build this into your fuction as an arguments. Redefine `t_sim`

, setting arguments for the mean and SD of group A, the mean and SD of group B, and the number of subjects per group. Give them all default values.

```
<- function(n = 10, m1=0, sd1=1, m2=0, sd2=1) {
t_sim tibble(
A = rnorm(n, m1, sd1),
B = rnorm(n, m2, sd2)
%>%
) gather(group, score, A:B) %>%
t.test(score~group, data = .) %>%
::tidy() %>%
broompull(p.value)
}
```

Test your function with some different values to see if the results make sense.

```
t_sim(100)
t_sim(100, 0, 1, 0.5, 1)
```

```
## [1] 0.5065619
## [1] 0.001844064
```

Use `replicate`

to calculate power for 100 subjects/group with an effect size of 0.2 (e.g., A: m = 0, SD = 1; B: m = 0.2, SD = 1). Use 1000 replications.

```
<- replicate(1000, t_sim(100, 0, 1, 0.2, 1))
reps <- mean(reps < .05)
power power
```

`## [1] 0.268`

Compare this to power calculated from the `power.t.test`

function.

`power.t.test(n = 100, delta = 0.2, sd = 1, type="two.sample")`

```
##
## Two-sample t test power calculation
##
## n = 100
## delta = 0.2
## sd = 1
## sig.level = 0.05
## power = 0.2902664
## alternative = two.sided
##
## NOTE: n is number in *each* group
```

Calculate power via simulation and `power.t.test`

for the following tests:

- 20 subjects/group, A: m = 0, SD = 1; B: m = 0.2, SD = 1
- 40 subjects/group, A: m = 0, SD = 1; B: m = 0.2, SD = 1
- 20 subjects/group, A: m = 10, SD = 1; B: m = 12, SD = 1.5

## 7.7 Glossary

term | definition |
---|---|

argument | A variable that provides input to a function. |

data type | The kind of data represented by an object. |

double | A data type representing a real decimal number |

function | A named section of code that can be reused. |

iteration | Repeating a process or function |

matrix | A container data type consisting of numbers arranged into a fixed number of rows and columns |