14 Oneway ANOVA
14.0.1 Background: Intrusive memories
In the lecture reading materials you have worked through calculating an ANOVA by hand in order to gain a conceptual understanding. However, when you run an ANOVA, typically the computer does all of these calculations for you. In this chapter we'll show you how to run a onefactor and factorial ANOVA using the afex
package and posthoc tests using a package called emmeans
.
In this example we will be using data from experiment 2 of James, E. L., Bonsall, M. B., Hoppitt, L., Tunbridge, E. M., Geddes, J. R., Milton, A. L., & Holmes, E. A. (2015). Computer game play reduces intrusive memories of experimental trauma via reconsolidationupdate mechanisms. Psychological Science, 26, 12011215.
The abstract for the paper is as follows:
Memory of a traumatic event becomes consolidated within hours. Intrusive memories can then flash back repeatedly into the mind's eye and cause distress. We investigated whether reconsolidation  the process during which memories become malleable when recalled  can be blocked using a cognitive task and whether such an approach can reduce these unbidden intrusions. We predicted that reconsolidation of a reactivated visual memory of experimental trauma could be disrupted by engaging in a visuospatial task that would compete for visual working memory resources. We showed that intrusive memories were virtually abolished by playing the computer game Tetris following a memoryreactivation task 24 hr after initial exposure to experimental trauma. Furthermore, both memory reactivation and playing Tetris were required to reduce subsequent intrusions (Experiment 2), consistent with reconsolidationupdate mechanisms. A simple, noninvasive cognitivetask procedure administered after emotional memory has already consolidated (i.e., > 24 hours after exposure to experimental trauma) may prevent the recurrence of intrusive memories of those emotional events.
14.0.2 Activity 1: Setup
Do the following:
 Open R Studio and set the working directory to your chapter folder. Ensure the environment is clear.
 Open a new R Markdown document and save it in your working directory. Call the file "Oneway ANOVA".
 Download James Holmes_Expt 2_DATA.csv and save it in your chapter folder.
 If you're on the server, avoid a number of issues by restarting the session  click
Session
Restart R
 In a new code chunk, type and run the code that loads
pwr
,lsr
,car
,broom
,afex
,emmeans
,performance
andtidyverse
using thelibrary()
function and loads the data into an object nameddat
usingread_csv()
. If you are working on your own machine you may need to installafex
andemmeans
but as always do not install packages on university machines.
 Add (hint: mutate) a column to
dat
calledsubject
that equalsrow_number()
to act as a participant ID which is currently missing from the data set.
14.0.3 Activity 2: Data wrangling
There are a lot of columns in this data set that we don't need for this analysis and the names of the variable are also long and difficult to work with.
 Create a new object called
dat2
that just has the three columns we need  useselect()
to select the columnssubject
,Condition
, andDays_One_to_Seven_Image_Based_Intrusions_in_Intrusion_Diary
 Use
rename()
to renameDays_One_to_Seven_Image_Based_Intrusions_in_Intrusion_Diary
tointrusions
 See if you can do this all in one pipeline
 Hint: in rename,
new_name = old_name
14.0.4 Activity 3: Numbers and factors
In addition to the names of the variables being too long, the levels of Condition
are named 1,2,3,4 which R will think is a number rather than a category. We're going to overwrite the column Condition
with a column that recodes these numbers as a factor. Copy and paste the code below into your Markdown and then run it.
This is a really important step. If you forget to recode variables as factors and R treats them as numbers, a lot of things won't work. Trust us, we've spent a lot of time trying to figure out what was wrong because we forgot to do this step!
14.0.5 Activity 4: Create summary statistics
Next we want to calculate some descriptive statistics. We're really interested in the scores from each experimental group rather than overall.
 Create an object called
sum_dat
that contains the mean, standard deviation and standard error for the number of intrusions grouped byCondition
 Use the pipe to achieve this in one pipeline
 Your table should have four columns,
Condition
,mean
,sd
, andse
.
##
## * Use group_by(some_grouping_variable) %>% summarise(...)
## * standard error = sd/sqrt(n) = sd/sqrt(length(some_variable_name)
14.0.6 Activity 5: Visualisation
Now we can visualise the data. In the original paper they use a bar plot, which we will reproduce later but for now let's use a better plot that gives us more information about the data.
 Create the below violinboxplot with number of intrusions on the yaxis and condition on the xaxis (see the Visualisation chapter for more info).
 Change the labels on the xaxis to something more informative (hint:
scale_x_discrete(labels = c("label names")
)
We can see from this plot that there are outliers in each of the groups. This information isn't present in the bar plot, which is why it's not a good idea to use them but we will reproduce it anyway.The below code shows you how to produce the bar plot that is presented in the paper. Try and figure out what each bit of code is doing in the plot (remember to use the help documentation for each function) and see what happens when you change the values for each argument.
ggplot(sum_dat, aes(x = Condition, y = mean, fill = Condition))+
stat_summary(fun = "mean", geom = "bar", show.legend = FALSE)+
geom_errorbar(aes(ymin = mean  se, ymax = mean + se), width = 0.25)+
scale_y_continuous(limits = c(0,7),
breaks = c(0,1,2,3,4,5,6,7),
name = "IntrusiveMemory Frequency (Mean for the Week")+
scale_x_discrete(labels = c("Notask control", "Reactivation plus Tetris", "Tetris only",
"Reactivation only"))
14.0.7 Activity 6: Oneway ANOVA
Now we can run the oneway ANOVA using aov_ez
from the afex
package and save it to the object mod
. As well as running the ANOVA, the aov_ez
function also conducts a Levene's test for homogeneity of variance so that we can test our final assumption.
aov_ez()
will likely produce some messages that look like errors, do not worry about these, they are just letting you know what it's done.
 Copy and paste the below code to run and then view the results of the ANOVA using
anova(mod)
.
mod < aov_ez(id = "subject", # the column containing the subject IDs
dv = "intrusions", # the DV
between = "Condition", # the betweensubject variable
es = "pes", # sets effect size to partial etasquared
type = 3, # this affects how the sum of squares is calculated, set this to 3
include_aov = TRUE,
data = dat2)
anova(mod)
Just like with the ttests and correlations, we can use tidy()
to make the output easier to work with.
 Run the below code to transform the output. Don't worry about the warning message, it is just telling you it doesn't know how to automatically rename the columns so it will keep the original names.
mod_output < (mod$anova_table) %>% tidy()
## Warning in tidy.anova(.): The following column names in ANOVA output were not
## recognized or transformed: num.Df, den.Df, MSE, ges

term
= the IV

num.Df
= degrees of freedom effect 
den.Df
= degrees of freedom residuals 
MSE
= Meansquared errors 
statistic
= Fstatistic 
ges
= effect size

p.value
= p.value
You should refer to the lecture for more information on what each variable means and how it is calculated.
 Is the overall effect of Condition significant?
 What is the Fstatistics to 2 decimal places?
 According to the rules of thumb, the effect size is
14.0.8 Activity 7: Assumption checking
You may be wondering why we haven't yet checked the assumptions. Well, unlike the ttests and correlations, in order to test the assumptions we need to use the model we created with aov_ez()
, so we couldn't assess them all until this point. For a oneway independent ANOVA, the assumptions are the same as for a Student's ttest:
 The DV is interval or ratio data
 The observations should be independent
 The residuals should be normally distributed
 There should be homogeneity of variance between the groups
We know that 1 and 2 are met because of the design of our study. To test 3, we can look at the QQplot of the residuals and test for normality with the ShapiroWilk test. The residuals have been stored as one of the components of mod
. To access them we specify mod$aov$residuals
.
qqPlot(mod$aov$residuals)
shapiro.test(mod$aov$residuals)
## [1] 11 60
##
## ShapiroWilk normality test
##
## data: mod$aov$residuals
## W = 0.87739, pvalue = 4.252e06
There are a few things to note about the assumption test results. First, look at the pvalue for the ShapiroWilk test  4.252e06
. Whenever you see the e
at the end of a number it means that R is using scientific notation. Scientific notation is a way of writing very large or very small numbers. Because the number after the e
is negative it means the number should be divided by 10 to the power of six. Put simply, move the decimal place six places to the left and you will get the standard number. When reporting pvalues in your results section, you should not use scientific notation, instead you should round to 3 decimal places.
 What is the value of
4.252e06
?
If you want R to round this for you to make it easier to read, you could use the below code to save it to an object, tidy it and then round the p.value. Just remember that in APA style you should never write "p = 0", instead, you should write "p < .001" (because p will never equal actual zero, it can just be very, very, very small).
shapiro < shapiro.test(mod$aov$residuals) %>% #run the test
tidy() %>% # tidy the output
mutate(p.value = round(p.value, digits = 3)) # overwrite the pvalue with one rounded to 3 decimal places
If you find scientific notation difficult to read, you can paste and run the following code in your console to turn it off
options(scipen=999)
The second thing to note is that from both the qqplot and the ShapiroWilk test it is clear that the assumption of normality has not been met. Is this a problem? Well, Field et al. (2009) say that if the sample sizes for each group are equal then ANOVA is robust to violations of both normality and of homogeneity of variance. There's also a good discussion of this here but it is a bit technical. We can check how many participants are in each condition using count()
:
dat2 %>% count(Condition)
Condition  n 

1  18 
2  18 
3  18 
4  18 
Thankfully the sample sizes are equal so we should be OK to proceed with the ANOVA. It is not clear whether normality was checked in the original paper.
For the last assumption, we can test homogeneity of variance with Levene's test and the function test_levene()
from performance
. The code for this is very simple, you just need to supply the ANOVA model we created earlier mod
.
test_levene(mod)
## Warning: Functionality has moved to the 'performance' package.
## Calling 'performance::check_homogeneity()'.
## Warning: Variances differ between groups (Levene's Test, p = 0.039).
The results from Levene's test show that the assumption of homogeneity of variance has also not been met. The paper does indicate this might be the case as it specifies that the ANOVAs do not assume equal variance, however, the results of the ANOVA that are reported are identical to our results above where no correction has been made although the posthoc tests are Welch tests (you can tell this because the degrees of freedom have been adjusted and are not whole numbers).
Whilst all of this might seem very confusing  we imagine you might be wondering what the point of assumption testing is given that it seems to be ignored  we're showing you this for three reasons:
 To reassure you that sometimes the data can fail to meet the assumptions and it is still ok to use the test. To put this in statistical terms, many tests are robust to mild deviations of normality and unequal variance, particularly with equal sample sizes.
 As a critical thinking point, to remind you that just because a piece of research has been published does not mean it is perfect and you should always evaluate whether the methods used are appropriate.
 To reinforce the importance of preregistration where these decisions could be made in advance, and/or open data and code so that analyses can be reproduced exactly to avoid any ambiguity about exactly what was done. In this example, given the equal sample sizes and the difference in variance between the groups isn't too extreme, it looks like it is still appropriate to use an ANOVA but the decisions and justification for those decisions could have been more transparent.
14.0.9 Activity 8: Posthoc tests
For posthoc comparisons, as mentioned, the paper appears to have computed Welch ttests but there is no mention of any multiple comparison correction. We could reproduce these results by using t.test
for each of the contrasts.
For example, to compare condition 1 (the control group) with condition 2 (the reactivation plus tetris group) we could run:
dat2 %>%
filter(Condition %in% c("1", "2")) %>%
droplevels() %>%
t.test(intrusions ~ Condition, data = .)
Because Condition
has four levels, we can’t just specify intrustion ~ Condition
because a ttest compares two groups and it wouldn’t know which of the four to compare so first we have to filter the data and use a new function droplevels()
. It's important to remember that when it comes to R there are two things to consider, the data you can see and the underlying structure of that data. In the above code we use filter()
to select only conditions 1 and 2 so that we can compare them. However, that doesn't change the fact that R "knows" that Condition
has four levels  it doesn't matter if two of those levels don't have any observations any more, the underlying structure still says there are four groups. droplevels()
tells R to remove any unused levels from a factor. Try running the above code but without droplevels()
and see what happens.
However, a quicker and better way of doing this that allows you apply a correction for multiple comparisons easily is to use emmeans()
which computes all possible pairwise comparison ttests and applies a correction to the pvalue.
First, we use emmeans()
to run the comparisons and then we can pull out the contrasts and use tidy()
to make it easier to work with.
 Run the code below. Which conditions are significantly different from each other? Are any of the comparisons different from the ones reported in the paper now that a correction for multiple comparisons has been applied?
mod_pairwise <emmeans(mod, pairwise ~ Condition, adjust = "bonferroni")
mod_contrasts < mod_pairwise$contrasts %>% tidy()
The inquisitive amongst you may have noticed that mod
is a list of 5 and seemingly contains the same thing three times: anova_table
, aov
and Anova
. The reasons behind the differences are too complex to go into detail on this course (see here for more info) but the simple version is that anova_table
and Anova
use one method of calculating the results (type 3 sum of squares) and aov
uses a different method (type 1 sum of squares). What's important for your purposes is that you need to use anova_table
to view the overall results (and replicate the results from papers) and aov
to run the followup tests and to get access to the residuals (or lm()
for factorial ANOVA). As always, precision and attention to detail is key.
14.0.10 Activity 9: Power and effect size
Finally, we can replicate their power analysis using pwr.anova.test
.
On the basis of the effect size of d = 1.14 from Experiment 1, we assumed a large effect size of f = 0.4. A sample size of 18 per condition was required in order to ensure an 80% power to detect this difference at the 5% significance level.
pwr.anova.test(k = 4, f = .4, sig.level = .05, power = .8)
##
## Balanced oneway analysis of variance power calculation
##
## k = 4
## n = 18.04262
## f = 0.4
## sig.level = 0.05
## power = 0.8
##
## NOTE: n is number in each group
We've already got the effect size for the overall ANOVA, however, we should also really calculate Cohen's D using cohensD
from lsr
for each of the pairwise comparisons. This code is a little complicated because you need to do it separately for each comparison, bind them all together and then add them to mod_contrasts
 just make sure your understand which bits of the code you would need to change to run this on different data.
d_1_2 < cohensD(intrusions ~ Condition,
data = filter(dat2, Condition %in% c(1,2)) %>%
droplevels())
d_1_3 < cohensD(intrusions ~ Condition,
data = filter(dat2, Condition %in% c(1,3)) %>%
droplevels())
d_1_4 < cohensD(intrusions ~ Condition,
data = filter(dat2, Condition %in% c(1,4)) %>%
droplevels())
d_2_3 < cohensD(intrusions ~ Condition,
data = filter(dat2, Condition %in% c(2,3)) %>%
droplevels())
d_2_4 < cohensD(intrusions ~ Condition,
data = filter(dat2, Condition %in% c(2,4)) %>%
droplevels())
d_3_4 < cohensD(intrusions ~ Condition,
data = filter(dat2, Condition %in% c(3,4)) %>%
droplevels())
pairwise_ds < c(d_1_2,d_1_3,d_1_4,d_2_3,d_2_4,d_3_4)
mod_contrasts < mod_contrasts %>%
mutate(eff_size = pairwise_ds)
What are your options if the data don't meet the assumptions and it's really not appropriate to continue with a regular oneway ANOVA? As always, there are multiple options and it is a judgement call.
 You could run a nonparametric test, the KruskalWallis for betweensubject designs and the Friedman test for withinsubject designs.
 If normality is the problem, you could try transforming the data. Field et al. (2009) has a good section on data transformation.
 You could use bootstrapping, which is not something we will cover in this course at all. Again, Field et al. (2009) covers this although it is a little complicated.
14.0.11 Activity 10: Writeup
The below code replicates the writeup in the paper, although has changed the Welch ttest to the pairwise comparisons from emmeans
.
for the 7day diary postintervention, there was a significant difference between groups in overall intrusion frequency in daily life, F(`r mod_output$num.Df`, `r mod_output$den.Df`) = `r mod_output$statistic %>% round(2)`, p = `r mod_output$p.value %>% round(3)`, ηp2 = .`r mod_output$ges %>% round(2)`. Planned comparisons demonstrated that relative to the notask control group, only those in the reactivationplusTetris group, t(`r mod_contrasts$df[1]`) = `r mod_contrasts$statistic[1] %>% round(2)`, p = `r mod_contrasts$adj.p.value[1] %>% round(2)`, d = `r mod_contrasts$eff_size[1] %>% round(2)`, experienced significantly fewer intrusive memories; this finding replicated Experiment 1. The reactivationplusTetris group had significantly fewer intrusive thoughts than the reactivationonly group, t(`r mod_contrasts$df[5]`) = `r mod_contrasts$statistic[5] %>% round(2)`, p = `r mod_contrasts$adj.p.value[5] %>% round(2)`, d = `r mod_contrasts$eff_size[5] %>% round(2)`. Further, there were no significant differences between the reactivationplusTetris group and the Tetrisonly group, t(`r mod_contrasts$df[4]`) = `r mod_contrasts$statistic[4] %>% round(2)`, p = `r mod_contrasts$adj.p.value[4] %>% round(2)`, d = `r mod_contrasts$eff_size[4] %>% round(2)`, the notask control group and the reactivationonly group, t(`r mod_contrasts$df[3]`) = `r mod_contrasts$statistic[3] %>% round(2)`, p = `r mod_contrasts$adj.p.value[3] %>% round(2)`, or between the notask control group and the Tetrisonly group, t(`r mod_contrasts$df[2]`) = `r mod_contrasts$statistic[2] %>% round(2)`, p = `r mod_contrasts$adj.p.value[2] %>% round(2)` Second, and critically,
Second, and critically, for the 7day diary postintervention, there was a significant difference between groups in overall intrusion frequency in daily life, F(3, 68) = 3.79, p = 0.014, ηp2 = .0.14. Planned comparisons demonstrated that relative to the notask control group, only those in the reactivationplusTetris group, t(68) = 3.04, p = 0.02, d = 1, experienced significantly fewer intrusive memories; this finding replicated Experiment 1. Critically, as predicted by reconsolidation theory, the reactivationplusTetris group had significantly fewer intrusive memories than the Tetrisonly group, t(68) = 1.89, p = 0.38, d = 0.84, as well as the reactivationonly group, t(68) = 2.78, p = 0.04, d = 1.11. Further, there were no significant differences between the notask control group and the reactivationonly group, t(68) = 0.26, p = 1, or between the notask control group and the Tetrisonly group, t(68) = 1.15, p = 1
14.0.12 Activity solutions
Below this line you will find the solutions to the above tasks. Only look at them after giving the tasks a good try yourself!
14.0.12.1 Activity 1
library("pwr")
library("lsr")
library("car")
library("broom")
library("afex")
library("emmeans")
ry("performance")
libr\library("tidyverse")
< read_csv("James Holmes_Expt 2_DATA.csv")%>%
dat mutate(subject = row_number())
** Click tab to see solution **