2 Reports with R Markdown
Intended Learning Outcomes
- Be able to structure a project
- Be able to knit a simple reproducible report with R Markdown
- Be able to create code chunks, tables, images, and inline R in an R Markdown document
Download the R Markdown Cheat Sheet
Walkthrough video
There is a walkthrough video of this chapter available via Echo360. Please note that there may have been minor edits to the book since the video was recorded. Where there are differences, the book should always take precedence.
2.1 Setup
For reference, here are the packages we will use in this chapter. You may need to install them, as explained in Section 1.3.1, if running the code below in the console pane gives you the error `Error in library(package_name) : there is no package called ‘packagename’`.
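As a quick check before loading anything, the sketch below reports which packages are not yet installed. The package list is an assumption based on the functions used later in this chapter (tidyverse for `read_csv()`, knitr for `include_graphics()`, kableExtra for tables); adjust it if your setup differs.

```r
# Packages assumed for this chapter (an assumption, not an official list)
chapter_packages <- c("tidyverse", "knitr", "kableExtra")

# requireNamespace() returns FALSE, rather than stopping with an error,
# when a package is not installed
is_installed <- vapply(chapter_packages, requireNamespace, logical(1), quietly = TRUE)

# Names of any packages that still need install.packages()
missing_packages <- chapter_packages[!is_installed]
missing_packages
```

An empty result means all of the packages are available.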
2.2 Organising a project
Before we write any code, we need to get organised. Projects in RStudio are a way to group all the files you need for one project. Most projects include scripts, data files, and output files, such as images or the PDF report created by a script.
2.2.1 Default working directory
First, make a new directory (i.e., folder) on your computer where you will keep all of your R projects. Name it something like “R-projects” (avoid spaces and other special characters). Make sure you know how to get to this directory using your computer’s Finder or Explorer.
Next, open
You can set the working directory to another location manually with menu commands:
2.2.2 Start a Project
Start by making a directory inside your default project directory where you will keep all of your materials for this class; we’d suggest naming it something like `ADS-23`.
To create a new project for the work we’ll do in this chapter:
- File > New Project…
- Name the project `02-reports`
- Save it inside the `ADS-23` directory
RStudio will restart itself and open with this new project directory as the working directory.
Click on the Files tab in the lower right pane to see the contents of the project directory. You will see a file called `02-reports.Rproj`, which is a file that contains all of the project information. When you’re in the Finder/Explorer, you can double-click on it to open up the project.
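To confirm that the project opened correctly, you could run something like the following in the console. This is only a sketch: the folder and file names you see will depend on where you saved your project.

```r
# Show the current working directory; inside the project this
# should be the path to your project folder
current_dir <- getwd()
current_dir

# List any RStudio project files (*.Rproj) in the working directory
project_files <- list.files(pattern = "\\.Rproj$")
project_files
```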
2.2.3 Naming Things
Before we start creating new files, it’s important to review how to name your files. This might seem a bit pedantic, but following clear naming rules so that both people and computers can easily find things will make your life much easier in the long run. Here are some important principles:
- file and directory names should only contain letters, numbers, dashes, and underscores, with a full stop (`.`) between the file name and extension (that means no spaces!)
- be consistent with capitalisation (set a rule to make it easy to remember, like always use lowercase)
- use underscores (`_`) to separate parts of the file name, like the title and date, and dashes (`-`) to separate words in each part (e.g., `social-media-report_2021-10.Rmd`)
- name files with a pattern that alphabetises in a sensible order and makes it easy for you to find the file you’re looking for
- prefix a file name with an underscore to move it to the top of the list, or prefix all files with numbers to control their order
For example, these file names are a mess:
report.doc
report final.doc
Data (Customers) 11-15.xls
Customers Data Nov 12.xls
final report2.doc
project notes.txt
Vendor Data November 15.xls
Here is one way to structure them so that similar files have the same structure and it’s easy for a human to scan the list or to use code to find relevant files. See if you can figure out what the last one should be.
_project-notes.txt
report_v1.doc
report_v2.doc
report_v3.doc
data_customer_2021-11-12.xls
data_customer_2021-11-15.xls
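To illustrate why consistent names help when you use code to find files, here is a small sketch in base R. It works on the file names listed above as plain text (no files need to exist) and assumes the `part_part_date.extension` pattern shown in the list.

```r
files <- c(
  "_project-notes.txt",
  "report_v1.doc", "report_v2.doc", "report_v3.doc",
  "data_customer_2021-11-12.xls", "data_customer_2021-11-15.xls"
)

# Because every data file starts with "data_", one test finds them all
data_files <- files[startsWith(files, "data_")]

# Drop the extension, then split each name at the underscores
parts <- strsplit(tools::file_path_sans_ext(data_files), "_")

# The third part of each name is the date
file_dates <- as.Date(vapply(parts, `[`, character(1), 3))
file_dates
```

With the messy names shown earlier, the same task would need a separate rule for almost every file.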
2.3 R Markdown
Throughout this course we will use R Markdown to create reproducible reports with a table of contents, text, tables, images, and code. The text can be written using markdown, which is a way to specify formatting, such as headers, paragraphs, lists, bolding, and links.
2.3.1 New document
Open a new R Markdown document and give it the title Important Info. You can also change the author name. Keep the output format as HTML.
Once you’ve opened a new document be sure to save it, naming it `important_info` (if you are on a Mac and can see the file extension, name it `important_info.Rmd`). This file will automatically be saved in your project folder (i.e., your working directory) so you should now see this file appear in your file viewer pane.
When you first open a new R Markdown document you will see a bunch of welcome text that looks like this:
Do the following steps:
- Change the title to “Important Information” and the author to your name
- Delete everything after the setup chunk
- Skip a line after the setup chunk and type “## My info” (with the hashes but without the quotation marks); make sure there are no spaces before the hashes and at least one space after the hashes before the subtitle
- Skip a line and click the insert new code menu (a green box with a C and a plus sign) then choose R
Your Markdown document should now look something like this:
2.3.2 Code chunks
What you have created is a subtitle and a code chunk. In R Markdown, anything written in a grey code chunk is assumed to be code, and anything written in the white space (between the code chunks) is regarded as normal text (the actual colours will depend on which theme you have applied, but we will refer to the default white and grey). This makes it easy to combine both text and code in one document.
In your code chunk, write the code you created in Section 1.4.
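If you do not have that code to hand, the sketch below creates the four objects that the rest of this section refers to. The values are assumptions for illustration (the name and age match the example sentence shown in Section 2.3.5); use your own.

```r
name <- "Emily"                      # a character string
age <- 36                            # a number
today <- Sys.Date()                  # today's date, taken from your computer
christmas <- as.Date("2022-12-25")   # assumed date; change the year as needed
```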
2.3.3 Running code
When you’re working in an R Markdown document, there are several ways to run your lines of code.
First, you can highlight the code you want to run and then click
Alternatively, you can press the green “play” button at the top-right of the code chunk and this will run all lines of code in that chunk.
Even better is to learn some of the keyboard shortcuts for RStudio. To run a single line of code, make sure that the cursor is in the line of code you want to run (it can be anywhere) and press
Run your code using each of the methods above. You should see the variables `name`, `age`, `today`, and `christmas` appear in the environment pane. (Restart R to reset.)
2.3.4 Inline code
We keep talking about using R Markdown for reproducible reports, but it’s easier to show you than tell you why this is so powerful and to give you an insight into how this course will (hopefully!) change the way you work with data forever!
One important feature of R Markdown is that you can combine text and code to insert values into your writing using inline coding. If you’ve ever had to copy and paste a value or text from one file to another, you’ll know how easy it can be to make mistakes. Inline code avoids this. Again it’s easier to show you what inline code does rather than to explain it so let’s have a go.
First, copy and paste this text to the white space underneath your code chunk. If you used a different variable name than `christmas`, you should update this with the name of the object you created, but otherwise don’t change anything else.
2.3.5 Knitting your file
Now we are going to knit, or compile, the file into a document type of our choosing. In this case we’ll create a default html file, but you will learn how to create other files like Word and PDF throughout this course. To knit your file, click
R Markdown will create and display a new HTML document, but it will also automatically save this file in your working directory.
As if by magic, that slightly odd bit of text you copied and pasted now appears as a normal sentence with the values pulled in from the objects you created.
My name is Emily and I am 36 years old. It is 325 days until Christmas, which is my favourite holiday.
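To see where those values come from, the sketch below evaluates the same kind of expressions in the console. It assumes the object names listed in Section 2.3.3 and fixes `today` to an illustrative date so that the result is reproducible; in a real report you would use `Sys.Date()`, and each expression would sit in the text as inline code.

```r
name <- "Emily"
age <- 36
today <- as.Date("2022-02-03")       # fixed here for illustration only
christmas <- as.Date("2022-12-25")

# Subtracting one date from another gives a time difference in days
days_left <- as.numeric(christmas - today)

sentence <- paste0(
  "My name is ", name, " and I am ", age, " years old. ",
  "It is ", days_left, " days until Christmas, which is my favourite holiday."
)
sentence
```

Because the values are computed each time the document is knitted, changing `age` or the dates updates the sentence without any copying and pasting.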
2.4 Loading data
Now let’s try another example of using Markdown, but this time rather than using objects we have created from scratch, we will read in a data file.
Save and close your `important_info.Rmd` document. Then open and save a new Markdown document, this time named `sales_data.Rmd`. You can again get rid of everything after the setup chunk. Add `library(tidyverse)` to the setup chunk so that tidyverse functions are available to your script.
2.4.1 Online sources
First, let’s try loading data that is stored online. Create a code chunk in your document and copy, paste, and run the below code. This code loads some simulated sales data.
- The data is stored in a `.csv` file so we’re going to use the `read_csv()` function to load it in.
- Note that the url is contained within double quotation marks - it won’t work without this.
This dataset is simulated sales data for different types of vehicles (originally from Kaggle) where each line of data is a single order. There are multiple ways to view and check a dataset in R. Do each of the following and make a note of what information each approach seems to give you. If you’d like more information about each of these functions, you can look up the help documentation with `?function`:
- Click on the `sales_online` object in the environment pane
- Run `head(sales_online)` in the console
- Run `summary(sales_online)` in the console
- Run `str(sales_online)` in the console
- Run `View(sales_online)` in the console
2.4.2 Local data files
More commonly, you will be working from data files that are stored locally on your computer. But where should you put all of your files? You usually want to have all your scripts and data files for a single project inside one folder on your computer, that project’s working directory, and we have already set up the main directory `02-reports` for this chapter.
You can organise files in subdirectories inside this main project directory, such as putting all raw data files in a subdirectory called `data` and saving any image files to a subdirectory called `images`. Using subdirectories helps avoid one single folder becoming too cluttered, which is important if you’re working on big projects.
In your `02-reports` directory, create a new folder named `data`, download a copy of the sales data file, and save it in this new subdirectory.
To load in data from a local file, again we can use the `read_csv()` function, but this time rather than specifying a url, give it the subdirectory and file name.
Things to note:
- You must include the file extension (in this case `.csv`)
- The subdirectory folder name (`data`) and the file name are separated by a forward slash `/`
- Precision is important: if you have a typo in the file name, R won’t be able to find your file. Remember that R is case sensitive: `Sales_Data.csv` is a completely different file to `sales_data.csv` as far as R is concerned.
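The sketch below shows the same idea end to end. Because the downloaded file is not available here, it first writes a tiny stand-in CSV into a temporary `data` folder; the file name `sales_data.csv` and its contents are made up for illustration. Inside your own project you would skip that step and pass the relative path (the subdirectory, a forward slash, then the file name) straight to `read_csv()`.

```r
library(readr)

# Stand-in project folder with a data subdirectory (hypothetical, for illustration)
project_dir <- file.path(tempdir(), "02-reports")
dir.create(file.path(project_dir, "data"), recursive = TRUE, showWarnings = FALSE)

# A made-up two-row file standing in for the downloaded sales data
example_path <- file.path(project_dir, "data", "sales_data.csv")
write_csv(data.frame(PRODUCTLINE = c("Planes", "Motorcycles")), example_path)

# Read the file back in; the path includes the subdirectory and the .csv extension
sales_local <- read_csv(example_path, show_col_types = FALSE)
sales_local
```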
2.5 Writing a report
We’re going to write a basic report for this sales dataset using R Markdown to show you some of the features. We’ll be expanding on almost every bit of what we’re about to show you throughout this course; the most important outcome is that you start to get comfortable with how R Markdown works and what you can use it to do.
2.5.1 Data analysis
For this report we’re just going to present some simple sales stats for three types of vehicles: planes, motorcycles, and classic cars. We’ll come back to how to write this kind of code yourself in Chapter 5. For now, see if you can follow the logic of what the code is doing via the code comments.
Create a new code chunk, then copy, paste and run the following code and then view `sales_counts` by clicking on the object in the environment pane. Note that it doesn’t really matter whether you use `sales_local` or `sales_online` in the first line as they’re identical.
Because each row of the dataset is a sale, this code gives us a nice and easy way of seeing how many sales were made of each type of vehicle; it just counts the number of rows in each group.
PRODUCTLINE | n |
---|---|
Classic Cars | 967 |
Motorcycles | 331 |
Planes | 306 |
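A minimal sketch of code that would produce a table like the one above is shown below. It is not the original analysis code: it builds a stand-in dataset with one row per sale (the `Other` category and its size are invented) and assumes dplyr's `filter()`, `group_by()`, and `count()`.

```r
library(dplyr)

# Stand-in for the sales data: one row per order
# (the "Other" rows are hypothetical and only there to be filtered out)
sales_online <- tibble(
  PRODUCTLINE = rep(
    c("Classic Cars", "Motorcycles", "Planes", "Other"),
    times = c(967, 331, 306, 100)
  )
)

sales_counts <- sales_online %>%
  # keep only the three vehicle types used in the report
  filter(PRODUCTLINE %in% c("Planes", "Motorcycles", "Classic Cars")) %>%
  # count the number of rows (sales) for each type
  group_by(PRODUCTLINE) %>%
  count()

sales_counts
```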
2.5.2 Text formatting
You can use the visual markdown editor if you have RStudio version 1.4 or higher. This will be a button at the top of the source pane and the menu options should be very familiar to anyone who has worked with software like Microsoft Word.
This is useful for complex styling, but you can also use these common plain-text style markups:
- Headers are created by prefacing subtitles with one or more hashes (`#`) and a space (do not exclude the space). If you include a table of contents, this will be created from your document headers.
- Format text with italics or bold by surrounding the text with one or two asterisks or underscores.
- Make lists using numbers, asterisks or dashes before items. Indent items to make nested lists.
- Make links like this: `[psyTeachR](https://psyteachr.github.io/)`
- Download the R Markdown Cheat Sheet to learn more.
Copy and paste the below text into the white space below the code chunk that loads in the data. Save the file and then click knit to view the results. It will look a bit messy for now as it contains the code and messages from loading the data but don’t worry, we’ll get rid of that soon.
## Sample sales report
This report summarises the sales data for different types of vehicles sold between 2003 and 2005. This data is from [Kaggle](https://www.kaggle.com/kyanyoga/sample-sales-data).
### Sales by type
The *total* number of **planes** sold was `r sales_counts$n[3]`.
The *total* number of **classic cars** sold was `r sales_counts$n[1]`.
Try and match up the inline code with what is in the `sales_counts` table. Of note:
- The `$` sign is used to indicate specific variables (or columns) in an object using the `object$variable` syntax.
- Square brackets with a number, e.g., `[3]`, indicate a particular observation.
- So `sales_counts$n[3]` asks the inline code to display the third observation of the variable `n` in the dataset `sales_counts`.
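You can try this indexing in the console. The sketch below rebuilds the counts table from the values in Section 2.5.1 so that it runs on its own; the last two lines show an alternative lookup by name, which is an addition for illustration rather than something the report text uses.

```r
sales_counts <- data.frame(
  PRODUCTLINE = c("Classic Cars", "Motorcycles", "Planes"),
  n = c(967, 331, 306)
)

sales_counts$n        # the whole n column
sales_counts$n[3]     # third value of n: the Planes row
sales_counts$n[1]     # first value of n: the Classic Cars row

# Looking the row up by name is less fragile than relying on its position
planes_sold <- sales_counts$n[sales_counts$PRODUCTLINE == "Planes"]
planes_sold
```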
2.5.3 Code comments
In the above code we’ve used code comments and it’s important to highlight how useful these are. You can add comments inside R chunks with the hash symbol (`#`). R will ignore characters from the hash to the end of the line.
It’s usually good practice to start a code chunk with a comment that explains what you’re doing there, especially if the code is not explained in the text of the report.
If you name your objects clearly, you often don’t need to add clarifying comments. For example, if I’d named the three objects above `total_number_of_sales`, `first_year` and `last_year`, I would omit the comments. It’s a bit of an art to comment your code well, but try to add comments as you’re working through this book - it will help consolidate your learning and when future you comes to review your code, you’ll thank past you for being so clear.
2.5.4 Images
As the saying goes, a picture paints a thousand words and sometimes you will want to communicate your data using visualisations.
Create a code chunk to display a graph of the data in your document after the text we’ve written so far. We’ll use some code that you’ll learn more about in Chapter 3 to make a simple bar chart that represents the sales data – focus on trying to follow how bits of the code map on to the plot that is created.
Copy and paste the below code. Run the code in your Markdown to see the plot it creates and then knit the file to see how it is displayed in your document.
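The original plotting code is not reproduced here, so the following is a sketch of a bar chart of this kind. It assumes the ggplot2 package (part of the tidyverse) and rebuilds `sales_counts` from the values in Section 2.5.1 so that it runs on its own; the axis labels and title are illustrative choices.

```r
library(ggplot2)

sales_counts <- data.frame(
  PRODUCTLINE = c("Classic Cars", "Motorcycles", "Planes"),
  n = c(967, 331, 306)
)

sales_plot <- ggplot(sales_counts, aes(x = PRODUCTLINE, y = n, fill = PRODUCTLINE)) +
  geom_col(show.legend = FALSE) +   # one bar per vehicle type, no legend
  labs(
    x = "Type of vehicle",
    y = "Number of sales",
    title = "Sales by vehicle type"
  )

sales_plot
```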
You can also include images that you did not create in R using the markdown syntax for images or `knitr::include_graphics()`. This is very similar to loading data in that you can either use an image that is stored on your computer, or via a url.
Create a new code chunk underneath each of the sales figures for planes, classic cars, and motorcycles and add in an image from Google or Wikipedia for each (right click on an image and select copy image address to get a url). See the section on chunk defaults to see how to change the display size.
Alternatively, you can use the markdown notation `![caption](url)` to show an image. This goes in the markdown text section of the document, not inside a grey code block. The caption is optional; you can omit it like this:
![](images/reports/google-images.png)
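For comparison, a sketch of the `knitr::include_graphics()` route is shown below. Unlike the markdown notation, it goes inside a code chunk. It reuses the logo URL from the exercises in Section 2.7.5 as an example image; swap in your own file path or URL.

```r
library(knitr)

# Web image: pass the URL as a string
logo <- include_graphics("https://psyteachr.github.io/ads-v2/images/logos/logo.png")
logo

# For a local image you would pass a relative path instead,
# for example a file saved in your project's images folder
```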
2.5.5 Tables
Rather than a figure, we might want to display our data in a table.
- Add a new level 2 heading (two hashtags) to your document, name the heading “Data in table form” and then create a new code chunk below this.
First, let’s see what the table looks like if we don’t make any edits. Simply write the name of the table you want to display in the code chunk (in our case `sales_counts`) and then click knit to see what it looks like.
## # A tibble: 3 × 2
## # Groups:   PRODUCTLINE [3]
##   PRODUCTLINE      n
##   <chr>        <int>
## 1 Classic Cars   967
## 2 Motorcycles    331
## 3 Planes         306
It’s just about readable but it’s not great.
Another way to customise tables uses the function `kable()` from the `kableExtra` package.
Amend your code to load the `kableExtra` package and apply the `kable()` function to the table. Once you’ve done this, knit the file again to see the output.
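A minimal sketch of that step is shown below. It rebuilds `sales_counts` from the values in Section 2.5.1 so that it runs on its own, and loads knitr alongside kableExtra as a precaution so that `kable()` is available whichever package provides it in your installed versions.

```r
library(knitr)
library(kableExtra)

sales_counts <- data.frame(
  PRODUCTLINE = c("Classic Cars", "Motorcycles", "Planes"),
  n = c(967, 331, 306)
)

# kable() turns the data frame into a formatted table
basic_table <- kable(sales_counts)
basic_table
```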
PRODUCTLINE | n |
---|---|
Classic Cars | 967 |
Motorcycles | 331 |
Planes | 306 |
It’s better, but it’s still not amazing. So let’s make a few adjustments. We can change the names of the columns, add a caption, and also change the alignment of the cell contents using arguments to `kable()`.
We can also add a theme to change the overall style. In this example we’ve used `kable_classic` but there are 5 others: `kable_paper`, `kable_classic_2`, `kable_minimal`, `kable_material` and `kable_material_dark`. Try them all and see which one you prefer.
Finally, we can change the formatting of the first row using `row_spec`. Look up the help documentation for `row_spec` to see what other options are available. Try changing the value of any of the arguments below to figure out what they do.
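The three adjustments can be sketched as follows. The caption text, the `full_width` setting, and the header colour are illustrative assumptions rather than the original values, and `sales_counts` is rebuilt from Section 2.5.1 so the code runs on its own.

```r
library(knitr)
library(kableExtra)

sales_counts <- data.frame(
  PRODUCTLINE = c("Classic Cars", "Motorcycles", "Planes"),
  n = c(967, 331, 306)
)

# 1. Column names, caption, and centred cell contents via kable() arguments
k <- kable(
  sales_counts,
  col.names = c("Product", "Sales"),
  caption = "Number of sales per product line.",
  align = "c"
)

# 2. Apply a theme; full_width = FALSE stops the table spanning the page
k_style <- kable_classic(k, full_width = FALSE)

# 3. Row 0 is the header row: make it bold and red
k_highlighted <- row_spec(k_style, row = 0, bold = TRUE, color = "red")
k_highlighted
```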
Product | Sales |
---|---|
Classic Cars | 967 |
Motorcycles | 331 |
Planes | 306 |
2.6 Refining your report
2.6.1 Chunk defaults
Let’s finish by tidying up the report and organising our code a bit better. When you create a new R Markdown file in RStudio, a setup chunk is automatically created - we’ve mostly ignored this chunk until now.
You can set more default options for your document here. Type the following code into the console to see the full list of options that you can set and their default values. However, the most useful and common options to change for the purposes of writing reports revolve around whether you want to show your code and the size of your images.
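One way to print the full list of chunk options mentioned above is `knitr::opts_chunk$get()`; the sketch below wraps it in `str()` for a compact overview. The exact set of options you see may vary with your knitr version.

```r
# Retrieve all chunk options and their current default values
chunk_defaults <- knitr::opts_chunk$get()

# Compact overview: one line per option
str(chunk_defaults)
```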
Replace the code in your setup chunk with the below code and then try changing each option from `FALSE` to `TRUE` and changing the numeric values then knit the file again to see the difference it makes.
```{r setup, include=FALSE}
knitr::opts_chunk$set(
echo = FALSE, # whether to show code chunks
message = FALSE, # whether to show messages from your code
warning = FALSE, # whether to show warnings from your code
fig.width = 8, # figure width in inches (at 96 dpi)
fig.height = 5, # figure height in inches (at 96 dpi)
out.width = "50%" # figures/images span 50% of the page width
)
```
2.6.2 Override defaults
These setup options change the behaviour for the entire document; however, you can override the behaviour for individual code chunks.
For example, by default you might want to hide your code but there also might be an occasion where you want to show the code you used to analyse your data. You can set `echo = FALSE` in your setup chunk to make hiding code the default but in the individual code chunk for your plot set `echo = TRUE`. Try this now and knit the file to see the results.
You can also override the default image display size or dimensions.
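For instance, a chunk header along these lines would show the code and resize the figure for that chunk only, whatever the setup chunk says. The chunk label `sales-plot` and the specific values are illustrative assumptions, and the base R `barplot()` call is just a stand-in for your own plotting code.

```{r sales-plot, echo=TRUE, fig.width=6, fig.height=4, out.width="100%"}
# Options set in the curly brackets apply to this chunk only
sales <- c("Classic Cars" = 967, "Motorcycles" = 331, "Planes" = 306)
barplot(sales, ylab = "Number of sales")
```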
2.6.3 Loading packages
You should add the packages you need in your setup chunk using `library()`. Often when you are working on a script, you will realise that you need to load another add-on package. Don’t bury the call to `library(package_I_need)` way down in the script. Put it in the setup chunk so the user has an overview of what packages are needed.
2.6.4 YAML header
Finally, the YAML header is the bit at the very top of your Markdown document. You can set several options here as well.
---
title: "Sales Data Report"
author: "Your name"
output:
  html_document:
    df_print: paged
    theme:
      version: 4
      bootswatch: yeti
    toc: true
    toc_float:
      collapsed: false
      smooth_scroll: false
    toc_depth: 3
    number_sections: false
---
The `df_print: paged` option prints data frames using `rmarkdown::paged_table()` automatically. You can use `df_print: kable` to default to the simple kable style, but you will need the code from Section 2.5.5 for more complex tables with kableExtra.
The built-in bootswatch themes are: default, cerulean, cosmo, darkly, flatly, journal, lumen, paper, readable, sandstone, simplex, spacelab, united, and yeti. You can view and download more themes. Try changing the theme to see which one you like best.
2.6.5 Table of Contents
The table of contents is created by setting `toc: true`. It will be displayed at the top of your document unless you set `toc_float: true` or include `toc_float:` with its options `collapsed` and `smooth_scroll` (options for a setting are indented under it).
---
output:
  html_document:
    toc: true
    toc_float:
      collapsed: false
      smooth_scroll: false
    toc_depth: 3
---
This will use the markdown header structure to create the table of contents. `toc_depth: 3` means that the table of contents will only display headers up to level 3 (i.e., those that start with three hashes: `###`). Add `{-}` after the header title to remove it from the table of contents (e.g., `### Overview {-}`).
2.6.6 Formats
So far we’ve just knitted to html. To generate PDF reports, you need to install
Once you’ve done this, update your YAML heading to add a `pdf_document` section and knit a PDF document. The options for PDFs are more limited than for HTML documents, so if you just replace `html_document` with `pdf_document`, you may need to remove some options, such as `toc_float`, if you get an error that looks like “Functions that produce HTML output found in document targeting PDF output.”
---
output:
  pdf_document:
    toc: TRUE
  html_document:
    toc: TRUE
    toc_float: TRUE
---
As an alternative, you can also knit to a Word document. When you click the Knit button, the first format will knit by default, but you can use the drop-down menu under the Knit button to choose another format.
---
output:
  pdf_document:
    toc: TRUE
  html_document:
    toc: TRUE
    toc_float: TRUE
  word_document:
    toc: TRUE
---
2.6.7 Summary
This chapter has covered a lot but hopefully now you have a much better idea of what Markdown is able to do. Whilst working in Markdown takes longer in the initial set-up stage, once you have a fully reproducible report you can plug in new data each week or month and simply click knit, reducing duplication of effort, and the human error that comes with it.
You can access a working R Markdown file with the code from the example above to compare to your own code.
As you continue to work through the book you will learn how to wrangle and analyse your data and how to use Markdown to present it. We’ll slowly build on the available customisation options so over the course of the next few weeks, you’ll find your Markdown reports start to look more polished and professional.
2.7 Exercises
Below are some additional exercises that will let you apply what you have learned in this chapter. We would suggest taking a break before you do these - it might feel slightly more effortful, but spreading out your practice will help you learn more in the long run.
2.7.1 New project
Create a new project called “demo_report” (Section 2.2).
2.7.2 New script
In the “demo_report” project, create a new R Markdown document called “job.Rmd” (Section 2.3). Edit the YAML header to output tables using kable and set a custom theme (Section 2.6.4).
---
title: "My Job"
author: "Me"
output:
  html_document:
    df_print: kable
    theme:
      version: 4
      bootswatch: sandstone
---
2.7.3 R Markdown
Write a short paragraph describing your job or a job you might like to have in the future (Section 2.5.2). Include a bullet-point list of links to websites that are useful for that job (Section 2.5.2).
I am a research psychologist who is interested in open science
and teaching computational skills.
* [psyTeachR books](https://psyteachr.github.io/)
* [Google Scholar](https://scholar.google.com/)
2.7.4 Tables
Use the following code to load a small table of tasks (Section 2.3.2). Edit it to be relevant to your job (you can change the categories entirely if you want).
Figure out how to make it so that code chunks don’t show in your knitted document (Section 2.6.1).
You can set the default to `echo = FALSE` in the setup chunk at the top of the script.
To set visibility for a specific code chunk, put `echo = FALSE` inside the curly brackets.
Display the table with purple italic column headers. Try different styles using
2.7.5 Images
Add an image of anything relevant (Section 2.5.4).
You can add an image from the web using its URL:
![Applied Data Skills](https://psyteachr.github.io/ads-v2/images/logos/logo.png)
Or save an image into your project directory (e.g., in the images folder) and add it using the relative path:
![Applied Data Skills](images/logos/logo.png)
2.7.6 Inline R
Use inline R to include the version of R you are using in the following sentence: “This report was created using R version 4.2.1 (2022-06-23).” You can get the version using the object `R.version.string` (Section 2.3.4).
This report was created using R version 4.2.1 (2022-06-23).
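To preview what the inline code will insert, you can build the same sentence in the console. This is only a sketch of the idea; in the R Markdown text itself you would place `R.version.string` in inline code rather than using `paste0()`.

```r
# R.version.string is a built-in character string describing your R version
version_sentence <- paste0("This report was created using ", R.version.string, ".")
version_sentence
```

The version shown will match whichever copy of R knits the document, so it may differ from the example sentence above.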
2.7.7 Knit
Knit this document to html (Section 2.3.5).
2.8 Glossary
term | definition |
---|---|
directory | A collection or “folder” of files on a computer. |
extension | The end part of a file name that tells you what type of file it is (e.g., .R or .Rmd). |
knit | To create an HTML, PDF, or Word document from an R Markdown (Rmd) document. |
markdown | A way to specify formatting, such as headers, paragraphs, lists, bolding, and links. |
project | A way to organise related files in RStudio. |
r markdown | The R-specific version of markdown: a way to specify formatting, such as headers, paragraphs, lists, bolding, and links, as well as code blocks and inline code. |
script | A plain-text file that contains commands in a coding language, such as R. |
working directory | The filepath where R is currently reading and writing files. |
yaml | A structured format for information. |
2.9 Further Resources
- R Markdown Cheat Sheet
- R Markdown Tutorial
- R Markdown: The Definitive Guide by Yihui Xie, J. J. Allaire, & Garrett Grolemund
- Chapter 27: R Markdown of R for Data Science
- Project Structure by Danielle Navarro
- How to name files by Jenny Bryan
- kableExtra
- gt