Appendix F — Debugging

The best advice for dealing with errors (debugging) is to check your work early and often. This means setting up your report outline and getting all of the YAML header bugs fixed before you deal with adding code. Once the general structure and look of the report is right, start adding code and testing that everything is working after each code block and knitting after every section. This way, when you encounter the inevitable errors, there is only a small amount of new code to check.

F.1 Report Setup

Create a new R Markdown file and delete everything below the setup chunk. Edit the YAML header to use a floating table of contents and add the outline of your report.

---
title: "Report"
date: "2022-12-21"
output: 
  html_document:
    toc: true
    toc_float: true
---



## Introduction

## Data

### Term 1

### Term 2

## Analysis

## References

Save this file and knit it. Ideally, this will generate some output in a new tab in the console pane called “Render” that starts with processing file: report-demo.Rmd and ends with Output created: report-demo.html. There will be a lot of output in between, but you don’t need to worry about it until something goes wrong.

F.2 YAML Errors

One of the more frequent problems is errors in the YAML header. Let’s create a few to see how to deal with them.

F.2.1 YAML borders

Delete the last dash below the header and knit.

---
title: "Report"
date: "2022-12-21"
output: 
  html_document:
    toc: true
    toc_float: true
--

This will actually knit without error (and look odd), but you’ll get a warning about the empty title. This is because R Markdown doesn’t recognise that there even is a YAML header if the three dashes to start and end it aren’t right.

F.2.2 Spaces

Unlike R and markdown, YAML is extremely picky about spaces. Try removing the space after the colon after “toc”.

---
title: "Report"
date: "2022-12-21"
output: 
  html_document:
    toc:true
    toc_float: true
---

You should get an error that looks like this:

Error in yaml::yaml.load(..., eval.expr = TRUE) : 
  Scanner error: mapping values are not allowed in this context at line 6, column 14
Calls: <Anonymous> ... parse_yaml_front_matter -> yaml_load -> <Anonymous>
Execution halted

If you see Error in yaml and it gives you a line and column number, this refers to the YAML line, so start counting with 1 at the title line. Sometimes the actual problem is in the line above or below the reference. Here, the problem is a missing space in the toc line, but that doesn’t cause an error in the YAML parsing until it gets to the next line.

F.2.3 Indenting

YAML is also extremely picky about indenting. A common error is not putting html_document: on a separate line when adding options like a table of contents.

---
title: "Report"
date: "2022-12-21"
output: html_document:
    toc: true
    toc_float: true
---

You should get an error that looks like this:

Error in yaml::yaml.load(..., eval.expr = TRUE) : 
  Scanner error: mapping values are not allowed in this context at line 3, column 22

Some indenting problems don’t cause an error, but result in an output that isn’t doing what you expect. Try removing the indent for the table of contents lines and knitting.

---
title: "Report"
date: "2022-12-21"
output: 
  html_document:
  toc: true
  toc_float: true
---

F.3 Common Errors

The best way to learn to deal with errors is to make a lot of them. That way, the next time you encounter a similar error, you’ll have some experience solving it.

Run the following code in the console; don’t add it to the report script.

Run in the console
library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ tibble  3.1.8     ✔ forcats 0.5.2
✔ stringr 1.4.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ faux::%||%()             masks purrr::%||%()
✖ dplyr::filter()          masks stats::filter()
✖ kableExtra::group_rows() masks dplyr::group_rows()
✖ rvest::guess_encoding()  masks readr::guess_encoding()
✖ dplyr::lag()             masks stats::lag()

Now, make a code chunk somewhere in your report like this and run it interactively (not by knitting). It should create a new table called droids with 6 rows.

droids <- starwars %>% filter(species == "Droid")

Now try to knit the report. Because you didn’t load the tidyverse package bundle in the script, you’ll get an error about not being able to find the function %>% (you’ll learn about the pipe in Section 5.3.2). When you knit, any objects in your global environment or packages that you’ve loaded are unavailable and the script only has access to objects it creates and packages it loads.

Add library(tidyverse) to the setup chunk and knit to confirm this works.

F.3.1 Could not find function

title <- pasteO("Lavendar", "Haze")
Error in pasteO("Lavendar", "Haze"): could not find function "pasteO"

When you get the message could not find function "func", usually one of two things has happened: you haven’t loaded the package that the function is from or you’ve made a typo in the function name. In this example, the function is actually paste0() with a zero.

F.3.2 Unused argument

rnorm(N = 10)
Error in rnorm(N = 10): unused argument (N = 10)

When you get the error “unused argument”, it usually means either that you’ve made a typo in an argument name, or the function doesn’t have that argument. Remember that argument, like functions and objects, are case-sensitive. Check the arguments with tab-autocomplete or checking the help for that function.

F.3.3 Non-numeric argument to binary operator

1 + "A"
Error in 1 + "A": non-numeric argument to binary operator

When you try to apply mathematical operations to objects that aren’t numbers, you get this error. You might see this from a function that internally applies these operators; it just means that the person who wrote the function didn’t specifically check that the arguments you input were numeric and write a more specific error message, they just used what you provided and relied on the error messages from the binary operators. Either way, to solve this you need to figure out what should be numeric, but isn’t.

F.3.4 Tibble columns must have compatible sizes

tibble::tibble(
  x = 1:2,
  y = 1:3
)
Error:
! Tibble columns must have compatible sizes.
• Size 2: Existing data.
• Size 3: Column `y`.
ℹ Only values of size one are recycled.

This error occurs when you’re creating a table using tibble() and the columns have different lengths. You can set a column to a single value (i.e., a vector with length 1) and it will be “recycled” for every row, but you can’t give two columns values with different lengths if their lengths are greater than 1.

The same problem occurs if the function you’re using adds columns to a tibble. The tidyverse error messages are generall very useful in this case.

mtcars3 <- mutate(mtcars, newcol = 1:3)
Error in `mutate()`:
! Problem while computing `newcol = 1:3`.
✖ `newcol` must be size 32 or 1, not 3.

F.3.5 Arguments imply differing number of rows

data.frame(
  x = 1:2,
  y = 1:3
)
Error in data.frame(x = 1:2, y = 1:3): arguments imply differing number of rows: 2, 3

A similar problem occurs if you’re using the base R function data.frame() (or the function you’re using does). The error message is different, but it’s the same problem. You will also see a related error message if you use base R techniques to add a column with a different length to the data frame.

mtcars$newcol <- 1:3
Error in `$<-.data.frame`(`*tmp*`, newcol, value = 1:3): replacement has 3 rows, data has 32

F.4 Debugging methods

F.4.1 Restart and rerun

It’s very useful to be able to run code interactively, but this can sometimes lead to confusion about what objects are available in your code. You might have made a data table called profits, and then decided to edit the code to make it slightly differently. If you forgot to re-run the code, you’ll be using the old table in your interactive code, but the new table when you knit.

Restart R (under the Session menu) and run the code in order up to the chunk where you’re having a problem. You can use the Run menu in the upper right of the source pane to run all chunks above your cursor position.

F.4.2 Comment out

A useful method of debugging a tricky error is commenting out parts of your code and re-running the code to figure out exactly which code is causing the problem. Try

dat <- starwars %>%
  select(name, height, mass, species) %>%
  filter(Species == "Droid") %>%
  select(-species) %>%
  filter(mass < 100)
Error in `filter()`:
! Problem while computing `..1 = Species == "Droid"`.
Caused by error in `mask$eval_all_filter()`:
! object 'Species' not found

Imagine the error message was a bit less helpful. You can try running the code line by line. Either select just the code you want to run, or comment out the code you don’t want to run. Remember to also comment out linking functions at the end of lines, like the pipe (%>%) or the ggplot plus (+).

dat <- starwars #%>%
  # select(name, height, mass, species) %>%
  # filter(Species == "Droid") %>%
  # select(-species) %>%
  # filter(mass < 100)
Tip

You can comment out multiple lines by selecting them with your cursor and choosing Code > Comment/Uncomment Lines (or using the keyboard shortcut).

Select more code or delete the comments until you locate the error.

dat <- starwars %>%
  select(name, height, mass, species) %>%
  filter(Species == "Droid") #%>%
Error in `filter()`:
! Problem while computing `..1 = Species == "Droid"`.
Caused by error in `mask$eval_all_filter()`:
! object 'Species' not found
  # select(-species) %>%
  # filter(mass < 100)

F.4.3 Google the error

Many error messages seem incomprehensible. Googling this message can often lead you to solutions. Take the famous example of “object of type ‘closure’ is not subsettable”.

data$x <- 1
Error in data$x <- 1: object of type 'closure' is not subsettable

A Google search will show several sources explaining this confounding message and how to fix it. Although you may also find Jenny Bryan’s famous talk of the same name, which is an excellent discussion of troubleshooting in R.

Note

An “object of type ‘closure’” is coding jargon for a function (like the type of 1 is numeric or the type of "A" is character). And “subsetting” is accessing part of a table using $ or square brackets. Here, it means that data isn’t a table, but actually a function, so you can’t add a column to it.

F.4.4 Reproducible examples

You might see people in coding forums like StackOverflow asking for a “reprex”, or a reproducible example. This is the smallest, completely self-contained example of your problem or question.

For example, you may have a question about how to figure out how to select rows that contain the value “test” in a certain column, but it isn’t working. It’s clearer if you can provide a concrete example, but you don’t want to have to type out the whole table you’re using or all the code that got you to this point in your script.

You can include a very small table with just the basics or a smaller version of your problem. Make comments at each step about what you expect and what you actually got.

Which version is easier for you to figure out the solution?

# this doesn't work
no_test_data <- data |>
  filter(!str_detect(type, "test"))

… OR …

library(tidyverse)

# with a minimal example table
data <- tribble(
  ~id, ~type, ~x,
  1, "test", 12,
  2, "testosterone", 15,
  3, "estrogen", 10
)

# this should keep IDs 2 and 3, but removes ID 2
no_test_data <- data |>
  filter(!str_detect(type, "test"))

# expected to be true
all(no_test_data$type == c("testosterone", "estrogen"))

One of the big benefits to creating a reprex is that you often solve your own problem while you’re trying to break it down to explain to someone else.

If you really want to go down the rabbit hole, you can create a reproducible example using the reprex package from tidyverse.