This course provides an overview of skills needed for reproducible research and open science using the statistical programming language R. Students will learn about data visualisation, data tidying and wrangling, archiving, iteration and functions, probability and data simulations, general linear models, and reproducible workflows. Learning is reinforced through weekly assignments that involve working with different types of data.

0.1 Course Aims

This course aims to teach students the basic principles of reproducible research and to provide practical training in data processing and analysis in the statistical programming language R.

0.2 Intended Learning Outcomes

By the end of this course students will be able to:

  • Draw on a range of specialised skills and techniques to formulate a research design appropriate to various kinds of questions in psychology and neuroscience
  • Write scripts in R to organise and transform data sets using accepted best practices
  • Explain the basics of probability and its role in statistical inference
  • Critically analyse data and report descriptive and inferential statistics in a reproducible manner

0.3 Course Outline

The overview below lists the beginner learning outcomes only. Some lessons have additional learning outcomes for intermediate or advanced students.

  1. Getting Started
    1. Understand the components of the RStudio IDE
    2. Type commands into the console
    3. Understand function syntax
    4. Install a package
    5. Organise a project
    6. Appropriately structure an R script or R Markdown file
    7. Create and compile an R Markdown document
  2. Working with Data
    1. Understand and use the basic data types
    2. Understand and use the basic container types (list, vector)
    3. Create vectors and store them as variables
    4. Understand vectorised operations
    5. Create a data table
    6. Import data from CSV and Excel files
  3. Data Visualisation
    1. Understand what types of graphs are best for different types of data
    2. Create common types of graphs with ggplot2: geom_bar(), geom_density(), geom_freqpoly(), geom_histogram(), geom_violin(), geom_boxplot(), geom_col(), geom_point(), geom_smooth()
    3. Set custom labels and colours
    4. Represent factorial designs with different colours or facets
    5. Save plots as an image file
  4. Tidy Data
    1. Understand the concept of tidy data
    2. Be able to use the 4 basic tidyr verbs: gather(), separate(), spread(), unite()
    3. Be able to chain functions using pipes
  5. Data Wrangling
    1. Be able to use the 6 main dplyr one-table verbs: select(), filter(), arrange(), mutate(), summarise(), group_by()
  6. Data Relations
    1. Be able to use the 4 mutating join verbs: left_join(), right_join(), inner_join(), full_join()
    2. Use the by argument to set the join columns
  7. Iteration & Functions
    1. Work with iteration functions: rep(), seq(), and replicate()
    2. Use arguments by order or name
    3. Write your own custom functions with function()
    4. Set default values for the arguments in your functions
  8. Probability & Simulation
    1. Understand what types of data are best modelled by different distributions: uniform, binomial, normal, Poisson
    2. Generate and plot data randomly sampled from the above distributions
    3. Test sampled distributions against a null hypothesis using: exact binomial test, t-test (one-sample, independent samples, paired samples), correlation (Pearson, Kendall and Spearman)
    4. Define the following statistical terms: p-value, alpha, power, smallest effect size of interest (SESOI), false positive (type I error), false negative (type II error), confidence interval (CI)
    5. Calculate power using iteration and a sampling function
  9. Introduction to GLM
    1. Define the components of the GLM
    2. Simulate data using GLM equations
    3. Identify the model parameters that correspond to the data-generation parameters
    4. Understand and plot residuals
    5. Predict new values using the model
    6. Explain the differences among coding schemes
  10. Reproducible Workflows
    1. Create a reproducible script in R Markdown
    2. Edit the YAML header to add table of contents and other options
    3. Include a table
    4. Include a figure
    5. Use source() to include code from an external file
    6. Report the output of an analysis using inline R

0.4 Formative Exercises

Exercises are available at the end of each lesson's webpage. They are neither marked nor mandatory, but if you can work through each of them (using web resources, of course), you will be well prepared for the marked assessments.

All exercises and data files can be downloaded as a single ZIP archive.

0.5 Packages used in this book

  • tidyverse
  • broom
  • cowsay
  • goodshirt
  • ukbabynames
  • cowplot
  • plotly
  • MASS
  • ggExtra
  • faux

0.6 Resources

Miscellaneous materials added throughout the semester, such as installation tips or the results of live-coding demos, can be found in the Appendices.

0.6.2 Cheat sheets

  • You can access several cheat sheets in RStudio under the Help menu
  • Or get the most recent RStudio Cheat Sheets

0.6.3 Other