Overview
This book provides an overview of the skills needed for reproducible and open research using the statistical programming language R and tidyverse packages. It covers reproducible workflows, data visualisation, data tidying and wrangling, archiving, iteration and functions, probability and data simulations.
While this book mainly focusses on technical data skills, reproducible and open research is the reason for learning these skills. The following papers provide a great overview of these concepts if you are not already familiar with them.
- Easing into open science: A guide for graduate students and their advisors (Kathawalla et al., 2021)
- An open science workflow for more credible, rigorous research (Corker, 2021)
- Seven easy steps to open science (Crüwell et al., 2019)
- A community-sourced glossary of open scholarship terms (Parsons et al., 2022)
Resources
Videos: Each chapter has several short video lectures for the main learning outcomes. The videos are captioned, and watching with the captions on is a useful way to learn the jargon of computational reproducibility. If you cannot access YouTube, the videos are available by request. The videos were created in 2020, so a few aspects of the RStudio interface or the book text have changed since then.
reprores: This is a custom R package for this course. You can install it with the code below. It will download all of the packages that are used in the book, along with an offline copy of this book, the shiny apps used in the book, and the exercises.
devtools::install_github("psyteachr/reprores-v3")
glossary: Coding and statistics both have a lot of specialist terms. Throughout this book, jargon will be linked to the glossary. Each chapter will end with a table of glossary terms relevant to the chapter.
How to learn data skills
Learning data skills is kind of like having a gym membership (HT to Phil McAleer for the analogy). You'll be given state-of-the-art equipment and instructions for how to use it, but your data skills won't get any stronger unless you practise.
Data skills do not require you to memorise lots of code. You will be introduced to many different functions, but the main skill to learn is how to efficiently find the information you need. This will require getting used to the structure of help files and cheat sheets, learning how to Google your problem and choose a helpful solution, and learning how to read error messages.
Learning to code involves making a lot of mistakes. These mistakes are essential to the process, so try not to feel too frustrated. Many of the chapter exercises will give you broken code to fix so you get experience seeing what common errors look like. As you become a more experienced coder, you might not make fewer errors, but you'll recover from them much faster.
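As a rough illustration of the habits described above, the base R sketch below looks up a help file, searches the help system by keyword, and captures an error message so it can be read or pasted into a search engine. It is not taken from the book's exercises: the `scores` vector and the deliberately misspelled `score` are made up for this example.

```r
# Help pages open in a viewer, so these calls are only run in an interactive session
if (interactive()) {
  help("sd")                          # same as typing ?sd in the console
  help.search("standard deviation")   # keyword search; same as ??"standard deviation"
}

# A made-up vector for the example (hypothetical data)
scores <- c(4, 8, 15)

# Broken on purpose: the object name is misspelled (score instead of scores).
# tryCatch() catches the error and returns its message as text instead of stopping.
msg <- tryCatch(
  mean(score),
  error = function(e) conditionMessage(e)
)
print(msg)  # the message names the object R could not find

# Fixed: use the name exactly as it was defined
fixed <- mean(scores)
print(fixed)
```

Reading the message first usually tells you which name or argument to check before you go searching for a solution.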
I found a bug!
This book is a work in progress, so you might find errors. Please help me fix them! The best way is to open an issue on GitHub that describes the error, but you can also email Lisa.
Other Resources
- RStudio Cheat Sheets
- Improving Pedagogy through Registered Reports
- Learning Statistics with R by Navarro
- R for Data Science by Grolemund and Wickham
- Improving your statistical inferences on Coursera
- swirl
- R for Reproducible Scientific Analysis
- codeschool.com
- datacamp
- Style guide for R programming
- #rstats on Twitter (highly recommended!)