PsyTeachR
Teaching case studies from Glasgow

psyteachr.github.io/ews24

Sean Westwood, Tobias Thejll-Madsen & Lisa DeBruine

Overview

Demystifying Functions

Errors
Help!

PsyTeachR Resources

Demystifying Functions

Why Demystification?

  • Programming as a beginner is mysterious and scary
  • Things almost seem to work by magic

  • How can we fix something magical when it breaks??

Why Demystification?

  • Similar panic can be induced by stats formulas
  • Mapping intuition onto an equation is not easy
  • What if we can find a common solution?

The Goal

  • Crack open the black box of functions in coding
  • Systematically defang scary formulas
  • Instill a sense of confidence and independence

The function() Function

  • Input: We use the function function to specify our inputs
function(`input`)
  • Process: We put the process inside the curly brackets {}
function(`input`){
  # inside the curly brackets goes the process
}
  • Output: We specify the output using return()
function(`input`){
  `process` # inside the curly brackets goes the process
  return(`output`) # the outcome of the process goes here
}
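
As a concrete illustration of the input, process, output pattern, here is a minimal sketch. The name add_one and its argument are hypothetical, chosen only for illustration:

```r
# Hypothetical example: a function that adds 1 to its input
add_one <- function(x){
  output <- x + 1   # process: add 1 to every value of x
  return(output)    # output: hand the result back to the caller
}

add_one(c(1, 2, 3))
```

Calling add_one() on a vector shows that the process is applied to every value of the input.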

Step 1: I Make a Mean Function

  • We need a non-intimidating example to ease into things
  • The mean is a simple & familiar statistical concept
  • An ideal starting point for demystification!!

\[ \bar{x} = \frac{\sum x}{n} \]

Mean Function: Input

A vector of numeric values (x)

mean_function <- function(x){
  
  
  
  
  
  
}

\[ \bar{x} = \frac {\sum \color{red}{x}} {n} \]

Mean Function: Process

  1. Add up all the values within x
mean_function <- function(x){
  
  numer  <- sum(x)
  
  
  
  
}

\[ \bar{x} = \frac {\color{red}{\sum x}} {n} \]

Mean Function: Process

  1. Add up all the values within x
  2. Find the number of values within x
mean_function <- function(x){
  
  numer  <- sum(x)
  denom  <- length(x)
  
  
  
}

\[ \bar{x} = \frac {\sum x} {\color{red}{n}} \]

Mean Function: Process

  1. Add up all the values within x
  2. Find the number of values within x
  3. Divide the sum of values by the number of values
mean_function <- function(x){
  
  numer  <- sum(x)
  denom  <- length(x)
  output <- numer/denom
  
  
}

\[ \bar{x} = \color{red}{\frac {\sum x} {n}} \]

Mean Function: Output

Return the value contained in output

mean_function <- function(x){
  
  numer  <- sum(x)
  denom  <- length(x)
  output <- numer/denom
  
  return(output)
}

\[ \color{red}{\bar{x}} = \frac {\sum x} {n} \]

Mean Function: Testing

Now that we have made our function, we can test it by comparing the output it gives us to the regular old mean() function in base R!

Let’s simulate some random values to test our function with:

test_data <- rnorm(n = 10, mean = 0, sd = 1)
test_data
 [1]  0.11148599  0.09301212 -1.31551523 -1.58759382 -1.26621010 -0.41885440
 [7]  1.59791447  0.62133804 -0.20493488  0.96448323

Mean Function: Testing

Let’s see how our function compares to mean() in base R

We can use test_data as a test case:

# print the mean that our function calculates
mean_function(test_data)
[1] -0.1404875
# print the mean that the base R function calculates
mean(test_data)
[1] -0.1404875
# return TRUE if the two means are equivalent
mean_function(test_data) == mean(test_data)
[1] TRUE

It wasn’t magic after all!!
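
One caveat when testing this way: == compares decimal numbers exactly, so two correct calculations can occasionally differ by a tiny rounding error. A more forgiving check, sketched here with base R's all.equal(), compares within a small tolerance. The function is redefined so the block runs on its own, and the seed value is an arbitrary choice:

```r
mean_function <- function(x){
  numer  <- sum(x)
  denom  <- length(x)
  output <- numer/denom
  return(output)
}

set.seed(1)  # arbitrary seed so the example is repeatable
test_data <- rnorm(n = 10, mean = 0, sd = 1)

# TRUE if the two means agree within a small numerical tolerance
isTRUE(all.equal(mean_function(test_data), mean(test_data)))
```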

Step 2: Variance is the Spice of Life

  • Now we have dipped our toe in, let’s up the complexity
  • Variance is a little trickier, but not totally alien
  • We can use our new mean function too!

\[\sigma^2 = \frac{\sum(x - \bar{x})^2}{n-1}\]

Variance Function: Input

A vector of numeric values (x)

var_function <- function(x){
  
  
  
  
  
  
  
}

\[ \sigma^2 = \frac {\sum(\color{red}{x} - \bar{x})^2} {n-1} \]

Variance Function: Process

  1. Calculate the mean of x using our mean_function()
var_function <- function(x){
  
  av     <- mean_function(x)
  
  
  
  
  
}

\[ \sigma^2 = \frac {\sum(x - \color{red}{\bar{x}})^2} {n-1} \]

Variance Function: Process

  1. Calculate the top part (numerator) of the formula

    1. subtract the mean from each value of x
var_function <- function(x){
  
  av     <- mean_function(x)
  numer  <- (x - av)
  
  
  
  
}

\[ \sigma^2 = \frac {\sum(x \color{red}{-} \bar{x})^2} {n-1} \]

Variance Function: Process

  1. Calculate the top part (numerator) of the formula

    1. subtract the mean from each value of x
    2. square each of the resulting values
var_function <- function(x){
  
  av     <- mean_function(x)
  numer  <- (x - av)^2
  
  
  
  
}

\[ \sigma^2 = \frac {\sum(x - \bar{x})^\color{red}{2}} {n-1} \]

Variance Function: Process

  1. Calculate the top part (numerator) of the formula

    1. subtract the mean from each value of x
    2. square each of the resulting values
    3. sum all of the squared values together
var_function <- function(x){
  
  av     <- mean_function(x)
  numer  <- sum((x - av)^2) 
  
  
  
  
}

\[ \sigma^2 = \frac {\color{red}{\sum}(x - \bar{x})^2} {n-1} \]

Variance Function: Process

  1. Calculate the bottom part (denominator) of the formula
var_function <- function(x){
  
  av     <- mean_function(x)
  numer  <- sum((x - av)^2)
  denom  <- length(x) - 1
  
  
  
}

\[ \sigma^2 = \frac {\sum(x - \bar{x})^2} {\color{red}{n-1}} \]

Variance Function: Process

  1. Calculate the bottom part (denominator) of the formula
  2. Divide the numerator by the denominator
var_function <- function(x){
  
  av     <- mean_function(x)
  numer  <- sum((x - av)^2)
  denom  <- length(x) - 1
  output <- numer/denom

  
}

\[ \sigma^2 = \color{red}{ \frac {\sum (x - \bar{x})^2} {n-1} } \]

Variance Function: Output

Return the value contained in output (the result of the final division)

var_function <- function(x){
  
  av     <- mean_function(x)
  numer  <- sum((x - av)^2)
  denom  <- length(x) - 1
  output <- numer/denom
  
  return(output)
}

\[ \color{red}{\sigma^2} = \frac {\sum (x - \bar{x})^2} {n-1} \]

Variance Function: Testing

Let’s use our test_data again to see if our function works:

# print the variance that our function calculates
var_function(test_data)
[1] 1.08501
# print the variance that the base R function calculates
var(test_data)
[1] 1.08501
# return TRUE if the two variances are equivalent
var_function(test_data) == var(test_data)
[1] TRUE

The mystery is disappearing before our eyes!!
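
To see each step of the formula at work, it can help to run the process by hand on a tiny vector. This is a sketch using made-up values (2, 4, 6) and intermediate names chosen for illustration:

```r
x <- c(2, 4, 6)

av        <- sum(x) / length(x)  # mean: 4
deviation <- x - av              # -2  0  2
squared   <- deviation^2         #  4  0  4
numer     <- sum(squared)        #  8
denom     <- length(x) - 1       #  2
numer / denom                    #  4, the same value as var(x)
```

Printing each intermediate object lets students map every line of code back onto one part of the equation.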

Step 3: Making a Standard Error (but in a good way)

  • We are now ready to put it all together for our SEM function
  • This is a nice practical end goal as there is no SEM function in base R
  • A simple equation that neatly applies our mean and var

\[ SE = \frac{s}{\sqrt{n}} \]

Where \(s\) is the standard deviation of the sample (i.e. the square root of the variance)

SEM Function: Input

A vector of numeric values (x)

sem_function <- function(x){
  
  
  
  
  
  
}

\[ SE = \frac{s}{\sqrt{n}} \]

SEM Function: Process

  1. For our numerator, calculate the standard deviation of x by taking the square root of our var_function()
sem_function <- function(x){
  
  numer  <- sqrt(var_function(x))
  
  
  
  
}

\[ SE = \frac {\color{red}{s}} {\sqrt{n}} \]

SEM Function: Process

  1. For our numerator, calculate the standard deviation of x by taking the square root of our var_function()
  2. For our denominator, take the square root of the number of values in x
sem_function <- function(x){
  
  numer  <- sqrt(var_function(x))
  denom  <- sqrt(length(x))
  
  
  
}

\[ SE = \frac {s} {\color{red}{\sqrt{n}}} \]

SEM Function: Process

  1. For our numerator, calculate the standard deviation of x by taking the square root of our var_function()
  2. For our denominator, take the square root of the number of values in x
  3. Divide the numerator by the denominator
sem_function <- function(x){
  
  numer  <- sqrt(var_function(x))
  denom  <- sqrt(length(x))
  output <- numer/denom
  
  
}

\[ SE = \color{red}{\frac {s} {\sqrt{n}} } \]

SEM Function: Output

Return the resulting value from this division

sem_function <- function(x){
  
  numer  <- sqrt(var_function(x))
  denom  <- sqrt(length(x))
  output <- numer/denom
  
  return(output)
}

\[ \color{red}{SE} = \frac {s} {\sqrt{n}} \]

And of course you can start to simplify things, e.g.:

sem_function_mini <- function(x){
  sd(x)/sqrt(length(x))
}
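
A quick way to confirm that the simplified version does the same job is to compare the two on the same input. This sketch redefines the functions from the previous steps in compact form so it runs on its own; the test vector is made up:

```r
mean_function <- function(x){
  return(sum(x)/length(x))
}

var_function <- function(x){
  av    <- mean_function(x)
  numer <- sum((x - av)^2)
  denom <- length(x) - 1
  return(numer/denom)
}

sem_function <- function(x){
  numer <- sqrt(var_function(x))
  denom <- sqrt(length(x))
  return(numer/denom)
}

sem_function_mini <- function(x){
  sd(x)/sqrt(length(x))
}

x <- c(2, 4, 6, 8)  # made-up test values

# TRUE if the step-by-step and simplified versions agree
isTRUE(all.equal(sem_function(x), sem_function_mini(x)))
```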

Of course many mysteries remain…

  • Demystification is a philosophy, not a lesson

  • The core idea is emboldening and empowering learners

  • Programming has a special capacity to overwhelm:

    • Software and package installation/dependencies
    • Project & file management
    • Unintuitive logic & conventions
    • \(\color{red}{\textbf{ERRORS}}\)

Errors - Help!

A familiar scenario

You go to check in on a student and see the following:

# simulate data
d <- tibble( 
  "Group" = rep(c("G1", "G2"), each = 10), 
  "ReactionT" = rnorm(20, 500, 150)) 

head(d, 2) # see first two rows

Error in tibble(Group = rep(c("G1", "G2"), each = 10), ReactionT = rnorm(20, : could not find function "tibble"

A familiar scenario

We realise that they did not load the tidyverse, so we add the library() call:

library(tidyverse)

# simulate data
d <- tibble( 
  "Group" = rep(c("G1", "G2"), each = 10), 
  "ReactionT" = rnorm(20, 500, 150)) 

head(d, 2) # see first two rows
# A tibble: 2 × 2
  Group ReactionT
  <chr>     <dbl>
1 G1         245.
2 G1         564.
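
The key phrase in that error message is "could not find function": R does not recognise the name, usually because the package that provides it has not been loaded. The same kind of message can be reproduced without any packages. In this sketch the function name is deliberately made up, and tryCatch() is used only to keep the message rather than halt the script:

```r
# not_a_loaded_function() is a hypothetical name that does not exist anywhere
msg <- tryCatch(
  not_a_loaded_function(1),
  error = function(e) conditionMessage(e)  # keep the message instead of stopping
)
msg
```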

Programming errors

  • Student programming errors come in two main types (Becker et al., 2019):
    1. Language specification errors (these come with an error message)
    2. Program specification errors (the program runs, but doesn’t do what was intended)
  • Both matter when debugging code, but language specification errors are the more immediate hindrance
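
A small sketch of the two types, using made-up values. The first snippet stops with a message; the second runs without complaint but divides by n instead of n - 1, so it does not compute the sample variance it was meant to:

```r
x <- c(2, 4, 6)

# 1. Language specification error: R stops and reports a message
lang_error <- tryCatch(
  x + "a",
  error = function(e) conditionMessage(e)
)
lang_error

# 2. Program specification error: the code runs without complaint,
#    but divides by n instead of n - 1
wrong_var <- sum((x - mean(x))^2) / length(x)
wrong_var   # runs fine, yet differs from var(x)
var(x)
```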

But why focus on errors?

  • There is some evidence that using errors in teaching increases students’ programming ability and self-efficacy (Hoffman & Elmi, 2021; Keohler, 2020)
  • Effectively debugging code can allow students to progress independently
  • An excellent framing device for fostering students as self-regulated learners (for self-regulated learning, see e.g., Zimmerman, 2002)

Errors for non-programmers

  • Students are used to high-stakes assessments (and not to a ‘move fast and break things’ mentality)

  • Red is scary!

  • Language is often convoluted and technical

  • We need to actively help create the right relationship with errors

Goal

  • The first time students engage with errors should be taught, not left to chance
  • We want the first thought when seeing an error to be:
    • “Great, that’s information about how I can do this cool thing”
  • and not:
    • “Oh no, I’ve done something wrong - maybe coding is not for me”

How can we meaningfully use errors?

  • Make errors an explicit part of your teaching and not something that just happens
    • Introduce errors just like you introduce a function
    • Error-full live coding
    • Fix errors to get code to run
    • “My favourite error”
    • Write functions with error handling

Introduce errors just like you introduce a function

  • Just as we introduce a function and its output:
t.test(formula = ReactionT ~ Group, 
       data = d, paired = FALSE)

    Welch Two Sample t-test

data:  ReactionT by Group
t = 0.78859, df = 17.395, p-value = 0.441
alternative hypothesis: true difference in means between group G1 and group G2 is not equal to 0
95 percent confidence interval:
 -77.8362 171.0075
sample estimates:
mean in group G1 mean in group G2 
        554.0498         507.4641 

Introduce errors just like you introduce a function

  • We also want to spend time on the errors it can produce:
t.test(formula = Group ~ ReactionT, 
       data = d, paired = FALSE)

Error in t.test.formula(formula = Group ~ ReactionT, data = d, paired = FALSE) : grouping factor must have exactly 2 levels
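
Walking through why this error appears is a useful exercise: the variable on the right of ~ is treated as the grouping factor, and ReactionT has far more than two distinct values. The sketch below uses a base R data.frame in place of the tibble so that it runs without extra packages; the seed is arbitrary:

```r
set.seed(1)  # arbitrary seed so the example is repeatable
d <- data.frame(
  Group     = rep(c("G1", "G2"), each = 10),
  ReactionT = rnorm(20, 500, 150)
)

# The variable on the right of ~ is treated as the grouping factor
length(unique(d$Group))      # two distinct values: a valid grouping factor
length(unique(d$ReactionT))  # many distinct values: not a valid grouping factor

# Swapping the formula therefore stops with an error, which we capture here
res <- tryCatch(
  t.test(Group ~ ReactionT, data = d),
  error = function(e) e
)
inherits(res, "error")
```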

Error-full live coding

  • Pre-plan helpful errors for live coding
  • For instance:
    • syntax/spelling errors you can easily fix
librarry(tidyverse)

Error in librarry(tidyverse) : could not find function "librarry"

Error-full live coding

  • Pre-plan helpful errors for live coding
  • For instance:
    • syntax/spelling errors you can easily fix
    • errors that require you to look at the documentation (?help)
rnorm(mean = 0, sd = 1)

Error in rnorm(mean = 0, sd = 1) : argument "n" is missing, with no default
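
When live coding this error, one way to connect the message to the documentation is to inspect the function's arguments directly. A short sketch using base R's args() and formals(), which show that mean and sd have defaults while n does not:

```r
# Show the arguments of rnorm(): mean and sd have defaults, n does not
args(rnorm)

# The argument names, in order
names(formals(rnorm))

# Supplying n resolves the error
rnorm(n = 5, mean = 0, sd = 1)
```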

Error-full live coding

  • Pre-plan helpful errors for live coding

  • For instance:

    • syntax/spelling errors you can easily fix
    • errors that require you to look at the documentation (?help)
    • errors that require you to google (exact answer)
    • errors that require you to google (adaptation required)

Fix errors to get code to run

  • Write code with errors and have students fix them:
library(tidyverse)

# simulate data
d <- tibble( 
  "Group" = rep(c("G1", "G2"), each = 10), 
  "ReactionT" = rnorm(20, 500, 150)) %>
  mutate("Group" = as.factor(Group))
  
head(d, 2) # see first two rows

Error: unexpected input in: " "Group" = rep(c("G1", "G2"), each = 10), "ReactionT" = rnorm(20, 500, 150)) %>"
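
For instructors preparing this kind of exercise, base R's parse() can check whether a snippet is syntactically valid without running it, so no packages need to be loaded. The snippet text below is a simplified, made-up stand-in for the student code, not the exercise itself:

```r
# parse() checks whether text is valid R without running it
broken <- tryCatch(
  parse(text = "total <- c(1, 2, 3) %>\n  sum()"),
  error = function(e) conditionMessage(e)
)
broken  # the message reports unexpected input

# With the closing % restored, the same text parses cleanly
fixed <- parse(text = "total <- c(1, 2, 3) %>%\n  sum()")
length(fixed)
```

Because parse() only checks syntax, the second call succeeds even though the pipe operator itself has not been loaded.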

“My favourite error”

  • But we can only think of so many errors…

  • … so get help from your students!

  • “My favourite error” activity:

    • students submit errors they come across during their coding
    • you review before class
    • choose a particularly interesting error and go through it in class
  • This can help form the foundation for a community error library

Write functions with error handling

  • Error messages help us think about how programming works, and what function calls do

So when trying the off-the-shelf mean() function:

test_chr <- c(1,2, "hello")

mean(test_chr)

Warning: argument is not numeric or logical: returning NA

Write functions with error handling

Let’s compare to our function from earlier:

mean_function <- function(x){
  mean_sum    <- sum(x)
  mean_n      <- length(x)
  mean_output <- mean_sum/mean_n
  
  return(mean_output)}

test_chr <- c(1,2, "hello")

mean_function(test_chr)

Error in sum(x) : invalid ‘type’ (character) of argument

Write functions with error handling

Let’s write an error message:

mean_function <- function(x){
  
  if(!is.numeric(x)){
    stop("mean_function must take an array of numbers")}
  
  mean_sum    <- sum(x)
  mean_n      <- length(x)
  mean_output <- mean_sum/mean_n
  
  return(mean_output)}

Write functions with error handling

Now let’s test it:

test_num <- c(1,2,3)

mean_function(test_num)

[1] 2

test_chr <- c(1,2,"hello")

mean_function(test_chr)

Error in mean_function(test_chr) : mean_function must take an array of numbers
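
The same pattern extends to other requirements. The sketch below adds a second check for empty input and uses tryCatch() to confirm that each message appears when expected. The names mean_function_checked and get_message, and the wording of the second message, are assumptions made for illustration:

```r
mean_function_checked <- function(x){
  if(!is.numeric(x)){
    stop("mean_function must take an array of numbers")
  }
  if(length(x) == 0){
    stop("mean_function needs at least one value")  # hypothetical extra check
  }
  return(sum(x)/length(x))
}

# Helper that returns the error message instead of halting the script
get_message <- function(expr){
  tryCatch(expr, error = function(e) conditionMessage(e))
}

get_message(mean_function_checked(c(1, 2, "hello")))
get_message(mean_function_checked(numeric(0)))
mean_function_checked(c(1, 2, 3))
```

Capturing the message this way lets students test their own error handling in the same way they tested the function's output.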

Write functions with error handling

  • Writing error messages forces students to think about functions as a series of steps each with its own requirements
  • It demystifies why error messages are there and helps open up the way we think about functions

Takeaway

  • Errors will be a part of a student’s coding journey, so we need to think about how we help students make the most of them
  • Reflection: What is one way you could incorporate errors in your teaching?

PsyTeachR

Embedding Data Skills in Research Methods Education:

Preparing Students for Reproducible Research

Phil McAleer, Niamh Stack, Heather Cleland Woods, Lisa DeBruine, Helena Paterson, Emily Nordmann, Carolina Kuepper-Tetzel, Dale Barr

10.31234/osf.io/hq68s

When starting from realistic raw data, nearly 80% of the data analytic effort for this task involves skills not commonly taught—namely, importing, manipulating, and transforming tabular data.

Resources

psyteachr.github.io

Undergraduate Textbooks

Postgraduate/CPD Textbooks

Workshops

R Packages

Quarto Books

booktem

Thank You!

psyteachr.github.io/ews24