PsyTeachR
Teaching case studies from Glasgow

psyteachr.github.io/ews24

Sean Westwood, Tobias Thejll-Madsen & Lisa DeBruine

Overview

Demystifying Functions

Why Demystification?

Programming as a beginner is mysterious and scary
Things almost seem to work by magic

How can we fix something magical when it breaks??

Why Demystification?

A similar panic can be induced by statistical formulas
Mapping intuition onto an equation is not easy
What if we can find a common solution?

The Goal

Crack open the black box of functions in coding
Systematically defang scary formulas
Instill a sense of confidence and independence

The `function()` Function

Input: We use the function function to specify our inputs

function(`input`)

Process: We put the process inside the curly brackets {}

function(`input`){
  # inside the curly brackets goes the process
}

Output: We specify the output using return()

function(`input`){
  `process` # inside the curly brackets goes the process
  return(`output`) # the outcome of the process goes here
}
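To make the template concrete, here is a minimal sketch with a made-up function, `double_it()`. It is a hypothetical name and not part of the original materials. The sketch fills in each of the three parts:

```r
# hypothetical example: input, process and output in one small function
double_it <- function(x){          # input: a vector called x
  doubled <- x * 2                 # process: multiply every value by 2
  return(doubled)                  # output: hand the result back
}

double_it(c(1, 2, 3))
#> [1] 2 4 6
```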

Step 1: I Make a Mean Function

We need a non-intimidating example to ease into things
The mean is a simple & familiar statistical concept
An ideal starting point for demystification!!

\[ \bar{x} = \frac{\sum x}{n} \]
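Before writing any code, it can help to push a few numbers through the formula by hand. Here is a small made-up vector, \(x = (2, 4, 6, 8)\), using illustrative values only:

\[ \bar{x} = \frac{\sum x}{n} = \frac{2 + 4 + 6 + 8}{4} = \frac{20}{4} = 5 \]

The sum on top, the count underneath and the division are the three steps the function needs to carry out.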

Mean Function: Input

A vector of numeric values (x)

mean_function <- function(x){
  
  
  
  
  
  
}

\[ \bar{x} = \frac {\sum \color{red}{x}} {n} \]

Mean Function: Process

Add up all the values within x

mean_function <- function(x){
  
  numer  <- sum(x)
  
  
  
  
}

\[ \bar{x} = \frac {\color{red}{\sum x}} {n} \]

Mean Function: Process

Add up all the values within x
Find the number of values within x

mean_function <- function(x){
  
  numer  <- sum(x)
  denom  <- length(x)
  
  
  
}

\[ \bar{x} = \frac {\sum x} {\color{red}{n}} \]

Mean Function: Process

Add up all the values within x
Find the number of values within x
Divide the sum of values by the number of values

mean_function <- function(x){
  
  numer  <- sum(x)
  denom  <- length(x)
  output <- numer/denom
  
  
}

\[ \bar{x} = \color{red}{\frac {\sum x} {n}} \]

Mean Function: Output

Return the value contained in output

mean_function <- function(x){
  
  numer  <- sum(x)
  denom  <- length(x)
  output <- numer/denom
  
  return(output)
}

\[ \color{red}{\bar{x}} = \frac {\sum x} {n} \]

Mean Function: Testing

Now that we have made our function, we can test it by comparing the output it gives us to the regular old mean() function in base R!

Let’s simulate some random values to test our function with:

test_data <- rnorm(n = 10, mean = 0, sd = 1)
test_data

 [1]  0.11148599  0.09301212 -1.31551523 -1.58759382 -1.26621010 -0.41885440
 [7]  1.59791447  0.62133804 -0.20493488  0.96448323

Mean Function: Testing

Let’s see how our function compares to mean() in base R

We can use test_data as a test case:

# print the mean that our function calculates
mean_function(test_data)

[1] -0.1404875

# print the mean that the base R function calculates
mean(test_data)

[1] -0.1404875

# return TRUE if the two means are equivalent
mean_function(test_data) == mean(test_data)

[1] TRUE
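There is one caveat about using `==` to compare decimal numbers. Base R's mean() does not simply compute sum(x)/length(x); it adds an extra correcting pass for accuracy. As a result, the two values can occasionally differ in the last few decimal places even when both are correct.

A more forgiving check is sketched below. It uses all.equal(), which allows a tiny numerical tolerance. The set.seed() value is an arbitrary choice, added only so the simulated data are reproducible.

```r
# our mean function from the slides
mean_function <- function(x){
  numer  <- sum(x)
  denom  <- length(x)
  output <- numer/denom
  return(output)
}

set.seed(2024)  # arbitrary seed so the simulated values are reproducible
test_data <- rnorm(n = 10, mean = 0, sd = 1)

# all.equal() tolerates tiny floating-point differences;
# isTRUE() turns its result into a single TRUE or FALSE
isTRUE(all.equal(mean_function(test_data), mean(test_data)))
#> [1] TRUE
```

The same check works for the variance comparison later on, e.g. `isTRUE(all.equal(var_function(test_data), var(test_data)))`.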

It wasn’t magic after all!!

Step 2: Variance is the Spice of Life

Now that we have dipped a toe in, let’s raise the complexity
Variance is a little trickier, but not totally alien
We can use our new mean function too!

\[\sigma^2 = \frac{\sum(x - \bar{x})^2}{n-1}\]
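To see what each part of the formula does, the sketch below pushes a small made-up vector through it one piece at a time. The values in `x = c(2, 4, 6)` are illustrative only.

```r
x <- c(2, 4, 6)                  # illustrative values

x_bar      <- sum(x) / length(x) # the mean: 12 / 3 = 4
deviations <- x - x_bar          # x minus the mean: -2, 0, 2
squared    <- deviations^2       # squared deviations: 4, 0, 4
numer      <- sum(squared)       # top of the formula: 8
denom      <- length(x) - 1      # bottom of the formula: n - 1 = 2

numer / denom
#> [1] 4
var(x)                           # base R agrees
#> [1] 4
```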

Variance Function: Input

A vector of numeric values (x)

var_function <- function(x){
  
  
  
  
  
  
  
}

\[ \sigma^2 = \frac {\sum(\color{red}{x} - \bar{x})^2} {n-1} \]

Variance Function: Process

Calculate the mean of x using our mean_function()

var_function <- function(x){
  
  av     <- mean_function(x)
  
  
  
  
  
}

\[ \sigma^2 = \frac {\sum(x - \color{red}{\bar{x}})^2} {n-1} \]

Variance Function: Process

Calculate the top part (numerator) of the formula
1. subtract the mean from each value of x

var_function <- function(x){
  
  av     <- mean_function(x)
  numer  <- (x - av)
  
  
  
  
}

\[ \sigma^2 = \frac {\sum(x \color{red}{-} \bar{x})^2} {n-1} \]

Variance Function: Process

Calculate the top part (numerator) of the formula
1. subtract the mean from each value of x
2. square each of the resulting values

var_function <- function(x){
  
  av     <- mean_function(x)
  numer  <- (x - av)^2
  
  
  
  
}

\[ \sigma^2 = \frac {\sum(x - \bar{x})^\color{red}{2}} {n-1} \]

Variance Function: Process

Calculate the top part (numerator) of the formula
1. subtract the mean from each value of x
2. square each of the resulting values
3. sum all of the squared values together

var_function <- function(x){
  
  av     <- mean_function(x)
  numer  <- sum((x - av)^2) 
  
  
  
  
}

\[ \sigma^2 = \frac {\color{red}{\sum}(x - \bar{x})^2} {n-1} \]

Variance Function: Process

Calculate the bottom part (denominator) of the formula

var_function <- function(x){
  
  av     <- mean_function(x)
  numer  <- sum((x - av)^2)
  denom  <- length(x) - 1
  
  
  
}

\[ \sigma^2 = \frac {\sum(x - \bar{x})^2} {\color{red}{n-1}} \]

Variance Function: Process

Calculate the bottom part (denominator) of the formula
Divide the numerator by the denominator

var_function <- function(x){
  
  av     <- mean_function(x)
  numer  <- sum((x - av)^2)
  denom  <- length(x) - 1
  output <- numer/denom

  
}

\[ \sigma^2 = \color{red}{ \frac {\sum (x - \bar{x})^2} {n-1} } \]

Variance Function: Output

Return the value contained in output (the numerator divided by the denominator)

var_function <- function(x){
  
  av     <- mean_function(x)
  numer  <- sum((x - av)^2)
  denom  <- length(x) - 1
  output <- numer/denom
  
  return(output)
}

\[ \color{red}{\sigma^2} = \frac {\sum (x - \bar{x})^2} {n-1} \]

Variance Function: Testing

Let’s use our test_data again to see if our function works:

# print the variance that our function calculates
var_function(test_data)

[1] 1.08501

# print the variance that the base R function calculates
var(test_data)

[1] 1.08501

# return TRUE if the two variances are equivalent
var_function(test_data) == var(test_data)

[1] TRUE

The mystery is disappearing before our eyes!!

Step 3: Making a Standard Error (but in a good way)

We are now ready to put it all together for our SEM function
This is a nice practical end goal, as there is no SEM function in base R
A simple equation that neatly reuses our mean and variance functions

\[ SE = \frac{s}{\sqrt{n}} \]

Where \(s\) is the standard deviation of the sample (i.e. the square root of the variance)
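Here is a quick worked example with a small made-up vector, \(x = (2, 4, 6)\), using illustrative values only. Its sample variance is \(\frac{(2-4)^2 + (4-4)^2 + (6-4)^2}{3-1} = \frac{8}{2} = 4\), so:

\[ s = \sqrt{4} = 2, \qquad SE = \frac{s}{\sqrt{n}} = \frac{2}{\sqrt{3}} \approx 1.155 \]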

SEM Function: Input

A vector of numeric values (x)

sem_function <- function(x){
  
  
  
  
  
  
}

\[ SE = \frac{s}{\sqrt{n}} \]

SEM Function: Process

For our numerator, calculate the standard deviation of x by taking the square root of our var_function()

sem_function <- function(x){
  
  numer  <- sqrt(var_function(x))
  
  
  
  
}

\[ SE = \frac {\color{red}{s}} {\sqrt{n}} \]

SEM Function: Process

For our numerator, calculate the standard deviation of x by taking the square root of our var_function()
For our denominator, take the square root of the number of values in x

sem_function <- function(x){
  
  numer  <- sqrt(var_function(x))
  denom  <- sqrt(length(x))
  
  
  
}

\[ SE = \frac {s} {\color{red}{\sqrt{n}}} \]

SEM Function: Process

For our numerator, calculate the standard deviation of x by taking the square root of our var_function()
For our denominator, take the square root of the number of values in x
Divide the numerator by the denominator

sem_function <- function(x){
  
  numer  <- sqrt(var_function(x))
  denom  <- sqrt(length(x))
  output <- numer/denom
  
  
}

\[ SE = \color{red}{\frac {s} {\sqrt{n}} } \]

SEM Function: Output

Return the resulting value from this division

sem_function <- function(x){
  
  numer  <- sqrt(var_function(x))
  denom  <- sqrt(length(x))
  output <- numer/denom
  
  return(output)
}

\[ \color{red}{SE} = \frac {s} {\sqrt{n}} \]

Once the logic is clear, you can start to simplify things, e.g. by using base R’s sd():

sem_function_mini <- function(x){
  sd(x)/sqrt(length(x))
}
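As a quick sanity check (a sketch, not part of the original materials), the step-by-step sem_function() and the shortened sem_function_mini() should agree on the same data. The functions below are copied from the slides, and the seed is an arbitrary choice.

```r
# functions from the slides
mean_function <- function(x){
  numer  <- sum(x)
  denom  <- length(x)
  output <- numer/denom
  return(output)
}

var_function <- function(x){
  av     <- mean_function(x)
  numer  <- sum((x - av)^2)
  denom  <- length(x) - 1
  output <- numer/denom
  return(output)
}

sem_function <- function(x){
  numer  <- sqrt(var_function(x))
  denom  <- sqrt(length(x))
  output <- numer/denom
  return(output)
}

sem_function_mini <- function(x){
  sd(x)/sqrt(length(x))
}

set.seed(2024)  # arbitrary seed for reproducibility
test_data <- rnorm(n = 10, mean = 0, sd = 1)

# allow for tiny floating-point differences rather than using ==
isTRUE(all.equal(sem_function(test_data), sem_function_mini(test_data)))
#> [1] TRUE
```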

Of course many mysteries remain…

Demystification is a philosophy, not a lesson
The core idea is emboldening and empowering learners
Programming has a special capacity to overwhelm:
- Software and package installation/dependencies
- Project & file management
- Unintuitive logic & conventions
- \(\color{red}{\textbf{ERRORS}}\)

Errors - Help!

A familiar scenario

You go to check in on a student and see the following:

# simulate data
d <- tibble( 
  "Group" = rep(c("G1", "G2"), each = 10), 
  "ReactionT" = rnorm(20, 500, 150)) 

head(d, 2) # see first two rows

Error in tibble(Group = rep(c("G1", "G2"), each = 10), ReactionT = rnorm(20, : could not find function "tibble"

A familiar scenario

We realise that they did not load ‘tidyverse’, so

library(tidyverse)

# simulate data
d <- tibble( 
  "Group" = rep(c("G1", "G2"), each = 10), 
  "ReactionT" = rnorm(20, 500, 150)) 

head(d, 2) # see first two rows

# A tibble: 2 × 2
  Group ReactionT
  <chr>     <dbl>
1 G1         245.
2 G1         564.

Programming errors

Student programming errors come in two main types (Becker et al., 2019):
1. Language specification errors (the code fails with an error message)
2. Program specification errors (the program runs, but doesn’t do what was intended)
Both matter when debugging, but language specification errors are the most immediate hindrance

But why focus on errors?

There is some evidence that using errors in teaching increases students’ programming ability and self-efficacy (Hoffman & Elmi, 2021; Keohler, 2020)
Effectively debugging code can allow students to progress independently
An excellent framing device for fostering students as self-regulated learners (for self-regulated learning, see e.g., Zimmerman, 2002)

Errors for non-programmers

Students are used to high-stakes assessment, not a ‘move fast and break things’ mentality
Red is scary!
Language is often convoluted and technical
We need to actively help create the right relationship with errors

Goal

Students’ first encounter with errors should be planned and taught, not left to chance
We want the first thought when seeing an error to be:
- “Great, that’s information about how I can do this cool thing”
rather than:
- “Oh no, I’ve done something wrong - maybe coding is not for me”

How can we meaningfully use errors?

Make errors an explicit part of your teaching and not something that just happens
- Introduce errors just like you introduce a function
- Error-full live coding
- Fix errors to get code to run
- “My favourite error”
- Write functions with error handling

Introduce errors just like you introduce a function

Just as we introduce a function and its output:

t.test(formula = ReactionT ~ Group, 
       data = d, paired = FALSE)


    Welch Two Sample t-test

data:  ReactionT by Group
t = 0.78859, df = 17.395, p-value = 0.441
alternative hypothesis: true difference in means between group G1 and group G2 is not equal to 0
95 percent confidence interval:
 -77.8362 171.0075
sample estimates:
mean in group G1 mean in group G2 
        554.0498         507.4641

Introduce errors just like you introduce a function

We also want to spend time on the errors it can produce:

t.test(formula = Group ~ ReactionT, 
       data = d, paired = FALSE)

Error in t.test.formula(formula = Group ~ ReactionT, data = d, paired = FALSE) : grouping factor must have exactly 2 levels
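Unpacking this error is a good demystification exercise. In a formula of the form `outcome ~ group`, whatever sits to the right of `~` is treated as the grouping variable, so `Group ~ ReactionT` asks t.test() to split the data by reaction time.

The minimal sketch below counts the distinct values in each column. It uses base R's data.frame() in place of tibble() so it runs without tidyverse.

```r
# same simulated data as above, built with base R's data.frame()
d <- data.frame(
  Group     = rep(c("G1", "G2"), each = 10),
  ReactionT = rnorm(20, 500, 150)
)

length(unique(d$Group))      # 2 levels: a valid grouping variable
#> [1] 2
length(unique(d$ReactionT))  # 20 distinct values, far more than 2 levels
#> [1] 20
```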

Error-full live coding

Pre-plan helpful errors for live coding
For instance:
- syntax/spelling errors you can easily fix

librarry(tidyverse)

Error in librarry(tidyverse) : could not find function "librarry"

Error-full live coding

Pre-plan helpful errors for live coding
For instance:
- syntax/spelling errors you can easily fix
- error that requires you to look at the ?help (documentation)

rnorm(mean = 0, sd = 1)

Error in rnorm(mean = 0, sd = 1) : argument "n" is missing, with no default
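Here the fix comes from the documentation. Both ?rnorm and args(rnorm) show that `n` has no default value, while `mean` and `sd` default to 0 and 1. Below is a sketch of the corrected call, with n = 5 as an arbitrary choice:

```r
args(rnorm)   # shows the arguments of rnorm() and their defaults

rnorm(n = 5, mean = 0, sd = 1)  # supplying n fixes the error
```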

Error-full live coding

Pre-plan helpful errors for live coding
For instance:
- syntax/spelling errors you can easily fix
- error that requires you to look at the ?help (documentation)
- error that requires you to google (exact answer)
- error that requires you to google (adaptation required)

Fix errors to get code to run

Write code with errors and have students fix them:

library(tidyverse)

# simulate data
d <- tibble( 
  "Group" = rep(c("G1", "G2"), each = 10), 
  "ReactionT" = rnorm(20, 500, 150)) %>
  mutate("Group" = as.factor(Group))
  
head(d, 2) # see first two rows

Error: unexpected input in: " "Group" = rep(c("G1", "G2"), each = 10), "ReactionT" = rnorm(20, 500, 150)) %>"

“My favourite error”

But we can only think of so many errors…
… so get help from your students!
“My favourite error”-activity:
- students submit errors they come across during their coding
- you review before class
- choose a particularly interesting error and go through it in class
This can form the foundation of a community error library

Write functions with error handling

Error messages help us think about how programming works, and what function calls do

For example, when we try the off-the-shelf mean() function:

test_chr <- c(1,2, "hello")

mean(test_chr)

Warning: argument is not numeric or logical: returning NA
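The warning makes more sense once we look at what test_chr actually contains. A vector in R can hold only one type, so c() converts the numbers to text to sit alongside "hello". A quick check:

```r
test_chr <- c(1,2, "hello")

test_chr         # the numbers have become text
#> [1] "1"     "2"     "hello"
class(test_chr)  # so the whole vector is character, not numeric
#> [1] "character"
```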

Write functions with error handling

Let’s compare to our function from earlier:

mean_function <- function(x){
  mean_sum    <- sum(x)
  mean_n      <- length(x)
  mean_output <- mean_sum/mean_n
  
  return(mean_output)
}

test_chr <- c(1,2, "hello")

mean_function(test_chr)

Error in sum(x) : invalid 'type' (character) of argument

Write functions with error handling

Let’s write an error message:

mean_function <- function(x){
  
  if(!is.numeric(x)){
    stop("mean_function must take an array of numbers")
  }
  
  mean_sum    <- sum(x)
  mean_n      <- length(x)
  mean_output <- mean_sum/mean_n
  
  return(mean_output)
}

Write functions with error handling

Now let’s test it:

test_num <- c(1,2,3)

mean_function(test_num)

[1] 2

test_chr <- c(1,2,"hello")

mean_function(test_chr)

Error in mean_function(test_chr) : mean_function must take an array of numbers
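The same pattern can be extended one check at a time. The following is a hypothetical extension, not from the original slides. An empty numeric vector passes is.numeric() but leads to 0/0, which R returns as NaN, so a second check could catch that case with its own message:

```r
mean_function <- function(x){

  # check 1: the input must be numeric
  if(!is.numeric(x)){
    stop("mean_function must take an array of numbers")
  }

  # check 2 (hypothetical addition): the input must not be empty
  if(length(x) == 0){
    stop("mean_function needs at least one number")
  }

  mean_sum    <- sum(x)
  mean_n      <- length(x)
  mean_output <- mean_sum/mean_n

  return(mean_output)
}

mean_function(c(1, 2, 3))
#> [1] 2

# try() reports the error message without stopping the rest of the script
try(mean_function(numeric(0)))
```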

Write functions with error handling

Writing error messages forces students to think about functions as a series of steps, each with its own requirements
It demystifies why error messages exist and opens up the way we think about functions

Takeaway

Errors will be a part of a student’s coding journey, so we need to think about how we help students make the most of them
Reflection: What is one way you could incorporate errors in your teaching?

PsyTeachR

Embedding Data Skills in Research Methods Education:

Preparing Students for Reproducible Research

Phil McAleer, Niamh Stack, Heather Cleland Woods, Lisa DeBruine, Helena Paterson, Emily Nordmann, Carolina Kuepper-Tetzel, Dale Barr

10.31234/osf.io/hq68s

When starting from realistic raw data, nearly 80% of the data analytic effort for this task involves skills not commonly taught—namely, importing, manipulating, and transforming tabular data.