Chapter 9 Introduction to GLM

9.1 Learning Objectives

9.1.1 Basic

  1. Define the components of the GLM
  2. Simulate data using GLM equations
  3. Identify the model parameters that correspond to the data-generation parameters
  4. Understand and plot residuals
  5. Predict new values using the model
  6. Explain the differences among coding schemes

9.1.2 Intermediate

  1. Demonstrate the relationships among two-sample t-test, one-way ANOVA, and linear regression
  2. Given data and a GLM, generate a decomposition matrix and calculate sums of squares, mean squares, and F ratios for a one-way ANOVA

9.4 GLM

9.4.1 What is the GLM?

The General Linear Model (GLM) a general mathematical framework for expressing relationships among variables that can express or test linear relationships between a numerical dependent variable and any combination of categorical or continuous independent variables.

9.4.2 Components

There are some mathematical conventions that you need to learn to understand the equations representing linear models. Once you understand those, learning about the GLM will get much easier.

Component of GLM Notation
Dependent Variable (DV) \(Y\)
Grand Average \(\mu\) (the Greek letter “mu”)
Main Effects \(A, B, C, \ldots\)
Interactions \(AB, AC, BC, ABC, \ldots\)
Random Error \(S(Group)\)

The linear equation predicts the dependent variable (\(Y\)) as the sum of the grand average value of \(Y\) (\(\mu\), also called the intercept), the main effects of all the predictor variables (\(A+B+C+ \ldots\)), the interactions among all the predictor variables (\(AB, AC, BC, ABC, \ldots\)), and some random error (\(S(Group)\)). The equation for a model with two predictor variables (\(A\) and \(B\)) and their interaction (\(AB\)) is written like this:

\(Y\) ~ \(\mu+A+B+AB+S(Group)\)

Don’t worry if this doesn’t make sense until we walk through a concrete example.

9.4.3 Simulating data from GLM

A good way to learn about linear models is to simulate data where you know exactly how the variables are related, and then analyse this simulated data to see where the parameters show up in the analysis.

We’ll start with a very simple linear model that just has a single categorical factor with two levels. Let’s say we’re predicting reaction times for congruent and incongruent trials in a Stroop task for a single participant. Average reaction time (mu) is 800ms, and is 50ms faster for congruent than incongruent trials (effect).

A factor is a categorical variable that is used to divide subjects into groups, usually to draw some comparison. Factors are composed of different levels. Do not confuse factors with levels!

In the example above, trial type is a , incongrunt is a , and congruent is a .

You need to represent categorical factors with numbers. The numbers, or coding you choose will affect the numbers you get out of the analysis and how you need to interpret them. Here, we will effect code the trial types so that congruent trials are coded as +0.5, and incongruent trials are coded as -0.5.

A person won’t always respond exactly the same way. They might be a little faster on some trials than others, due to random fluctuations in attention, learning about the task, or fatigue. So we can add an error term to each trial. We can’t know how much any specific trial will differ, but we can characterise the distribution of how much trials differ from average and then sample from this distribution.

Here, we’ll assume the error term is sampled from a normal distribution with a standard deviation of 100 ms (the mean of the error term distribution is always 0). We’ll also sample 100 trials of each type, so we can see a range of variation.

So first create variables for all of the parameters that describe your data.

Then simulate the data by creating a data table with a row for each trial and columns for the trial type and the error term (random numbers samples from a normal distribution with the SD specified by error_sd). For categorical variables, include both a column with the text labels (trial_type) and another column with the coded version (trial_type.e) to make it easier to check what the codings mean and to use for graphing. Calculate the dependent variable (RT) as the sum of the grand mean (mu), the coefficient (effect) multiplied by the effect-coded predictor variable (trial_type.e), and the error term.

Last but not least, always plot simulated data to make sure it looks like you expect.

Simulated Data

Figure 9.1: Simulated Data

9.4.4 Linear Regression

Now we can analyse the data we simulated using the function lm(). It takes the formula as the first argument. This is the same as the data-generating equation, but you can omit the error term (this is implied), and takes the data table as the second argument. Use the summary() function to see the statistical summary.

## 
## Call:
## lm(formula = RT ~ trial_type.e, data = dat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -302.110  -70.052    0.948   68.262  246.220 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   788.192      7.206 109.376  < 2e-16 ***
## trial_type.e   61.938     14.413   4.297 2.71e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 101.9 on 198 degrees of freedom
## Multiple R-squared:  0.08532,    Adjusted R-squared:  0.0807 
## F-statistic: 18.47 on 1 and 198 DF,  p-value: 2.707e-05

Notice how the estimate for the (Intercept) is close to the value we set for mu and the estimate for trial_type.e is close to the value we set for effect.

Change the values of mu and effect, resimulate the data, and re-run the linear model. What happens to the estimates?

9.4.5 Residuals

You can use the residuals() function to extract the error term for each each data point. This is the Y values, minus the estimates for the intercept and trial type. We’ll make a density plot of the residuals below and compare it to the normal distribution we used for the error term.

Model residuals should be approximately normally distributed for each group

Figure 9.2: Model residuals should be approximately normally distributed for each group

You can also compare the model residuals to the simulated error values. If the model is accurate, they should be almost identical. If the intercept estimate is slightly off, the points will be slightly above or below the black line. If the estimate for the effect of trial type is slightly off, there will be a small, systematic difference between residuals for congruent and incongruent trials.

Model residuals should be very similar to the simulated error

Figure 9.3: Model residuals should be very similar to the simulated error

What happens to the residuals if you fit a model that ignores trial type (e.g., lm(Y ~ 1, data = dat))?

9.4.6 Predict New Values

You can use the estimates from your model to predict new data points, given values for the model parameters. For this simple example, we just need to now the trial type to make a prediction.

For congruent trials, you would predict that a new data point would be equal to the intercept estimate plus the trial type estimate multiplied by 0.5 (the effect code for congruent trials).

## [1] 819.1605

You can also use the predict() function to do this more easily.

##        1 
## 819.1605

9.4.7 Coding Categorical Variables

In the example above, we used effect coding for trial type. You can also use sum coding, which assigns +1 and -1 to the levels instead of +0.5 and -0.5. More commonly, you might want to use treatment coding, which assigns 0 to one level (usually a baseline or control condition) and 1 to the other level (usually a treatment or experimental condition).

Here we will add sum-coded and treatment-coded versions of trial_type to the dataset.

Here are the coefficients for the effect-coded version. They should be the same as those from the last analysis.

##  (Intercept) trial_type.e 
##    788.19166     61.93773

Here are the coefficients for the sum-coded version. This give the same results as effect coding, except the estimate for the categorical factor will be exactly half as large, as it represents the difference between each trial type and the hypothetical condition of 0 (the overall mean RT), rather than the difference between the two trial types.

##    (Intercept) trial_type.sum 
##      788.19166       30.96887

Here are the coefficients for the treatment-coded version. The estimate for the categorical factor will be the same as in the effect-coded version, but the intercept will decrease. It will be equal to the intercept minus the estimate for trial type from the sum-coded version.

##   (Intercept) trial_type.tr 
##     757.22279      61.93773

9.5 Relationships among tests

9.5.1 T-test

The t-test is just a special, limited example of a general linear model.

## 
##  Two Sample t-test
## 
## data:  RT by trial_type.e
## t = -4.2975, df = 198, p-value = 2.707e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -90.35945 -33.51601
## sample estimates:
## mean in group -0.5  mean in group 0.5 
##           757.2228           819.1605

What happens when you use other codings for trial type in the t-test above? Which coding maps onto the results of the t-test best?

9.5.2 ANOVA

ANOVA is also a special, limited version of the linear model.

##               Df    Sum Sq   Mean Sq  F value   Pr(>F)    
## (Intercept)    1 124249219 124249219 11963.12  < 2e-16 ***
## trial_type.e   1    191814    191814    18.47 2.71e-05 ***
## Residuals    198   2056432     10386                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The easiest way to get parameters out of an analysis is to use the broom::tidy() function. This returns a tidy table that you can extract numbers of interest from. Here, we just want to get the F-value for the effect of trial_type. Compare the square root of this value to the t-value from the t-tests above.

## [1] 4.297498

9.6 Understanding ANOVA

We’ll walk through an example of a one-way ANOVA with the following equation:

\(Y_{ij} = \mu + A_i + S(A)_{ij}\)

This means that each data point (\(Y_{ij}\)) is predicted to be the sum of the grand mean (\(\mu\)), plus the effect of factor A (\(A_i\)), plus some residual error (\(S(A)_{ij}\)).

9.6.1 Means, Variability, and Deviation Scores

Let’s create a simple simulation function so you can quickly create a two-sample dataset with specified Ns, means, and SDs.

Now we will use two_sample() to create a dataset dat with N=5 per group, means of -2 and +2, and SDs of 1 and 1 (yes, this is an effect size of d = 4).

You can calculate how each data point (Y) deviates from the overall sample mean (\(\hat{\mu}\)), which is represented by the horizontal grey line below and the deviations are the vertical grey lines. You can also calculate how different each point is from its group-specific mean (\(\hat{A_i}\)), which are represented by the horizontal coloured lines below and the deviations are the coloured vertical lines.

Deviations of each data point (Y) from the overall and group means

Figure 9.4: Deviations of each data point (Y) from the overall and group means

You can use these deviations to calculate variability between groups and within groups. ANOVA tests whether the variability between groups is larger than that within groups, accounting for the number of groups and observations.

9.6.2 Decomposition matrices

We can use the estimation equations for a one-factor ANOVA to calculate the model components.

  • mu is the overall mean
  • a is how different each group mean is from the overall mean
  • err is residual error, calculated by subtracting mu and a from Y

This produces a decomposition matrix, a table with columns for Y, mu, a, and err.

Y gr p mu a err
-1.4770938 A 0.1207513 -1.533501 -0.0643443
-2.9508741 A 0.1207513 -1.533501 -1.5381246
-0.6376736 A 0.1207513 -1.533501 0.7750759
-1.7579084 A 0.1207513 -1.533501 -0.3451589
-0.2401977 A 0.1207513 -1.533501 1.1725518
0.1968155 B 0.1207513 1.533501 -1.4574367
2.6308008 B 0.1207513 1.533501 0.9765486
2.0293297 B 0.1207513 1.533501 0.3750775
2.1629037 B 0.1207513 1.533501 0.5086516
1.2514112 B 0.1207513 1.533501 -0.4028410

Calculate sums of squares for mu, a, and err.

mu a err
0.1458088 23.51625 8.104182

If you’ve done everything right, SS$mu + SS$a + SS$err should equal the sum of squares for Y.

## [1] TRUE

Divide each sum of squares by its corresponding degrees of freedom (df) to calculate mean squares. The df for mu is 1, the df for factor a is K-1 (K is the number of groups), and the df for err is N - K (N is the number of observations).

mu a err
0.1458088 23.51625 1.013023

Then calculate an F-ratio for mu and a by dividing their mean squares by the error term mean square. Get the p-values that correspond to these F-values using the pf() function.

Put everything into a data frame to display it in the same way as the ANOVA summary function.

te rm D f SS MS F p
mu Intercept 1 0.146 0.146 0.144 0.714
a grp 1 23.516 23.516 23.214 0.001
err Residuals 8 8.104 1.013 NA NA

Now run a one-way ANOVA on your results and compare it to what you obtained in your calculations.

##             Df Sum Sq Mean Sq F value  Pr(>F)   
## (Intercept)  1  0.146   0.146   0.144 0.71427   
## grp          1 23.516  23.516  23.214 0.00132 **
## Residuals    8  8.104   1.013                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Using the code above, write your own function that takes a table of data and returns the ANOVA results table like above.

9.7 Exercises

Download the exercises. See the answers only after you’ve attempted all the questions.