4 Detecting "runs" in a sequence
- Dale Barr (October 30, 2019)
Let's say you have a table like below, and you want to find the start and end frames where you have a run of Z amidst a, b, c, d. Here is code that sets up this kind of situation. Don't worry if you don't understand this code; just run it to create the example data in runsdata, and have a look at that table.
library("tidyverse")
create_run_vec <- function() {
  ## create a random string of letters with two runs
  c(rep(sample(letters[1:4]), sample(2:4, 4, TRUE)),
               rep("Z", 3),
               rep(sample(letters[1:4]), sample(2:4, 4, TRUE)),
               rep("Z", 3),
               rep(sample(letters[1:4], 2), sample(2:4, 2, TRUE)))
}
## 5 subjects, 3 trials each
runsdata <- tibble(
  subject = rep(1:5, each = 3),
  trial = rep(1:3, 5),
  stimulus = rerun(15, create_run_vec())) %>%
  unnest(stimulus) %>%
  group_by(subject, trial) %>%
  ungroup() %>%
  select(subject, trial, stimulus)Let's say you want to find the start and stop frames where Z appears in stimulus, and do this independently for each combination of subject and trial. Here's how stimulus looks for subject 1 and trial 1.
##  [1] "b" "b" "b" "d" "d" "d" "c" "c" "a" "a" "Z" "Z" "Z" "d" "d" "d" "a" "a" "a"
## [20] "a" "c" "c" "c" "b" "b" "Z" "Z" "Z" "c" "c" "c" "c" "b" "b" "b"So here you can see that the first run of Zs is from frame 11 to 13, 28 and the second is from 26 to 28. We want to write a function that processes the data for each trial and results in a table like this:
| subject | trial | run | start_frame | end_frame | 
|---|---|---|---|---|
| 1 | 1 | 1 | 11 | 13 | 
| 1 | 1 | 2 | 26 | 28 | 
The first thing to do is to add a logical vector to your tibble whose value is TRUE when the target value (e.g., Z) is present in the sequence, false otherwise.
runsdata_tgt <- runsdata %>%
  mutate(is_target = (stimulus == "Z"))
head(runsdata_tgt)| subject | trial | stimulus | is_target | 
|---|---|---|---|
| 1 | 1 | b | FALSE | 
| 1 | 1 | b | FALSE | 
| 1 | 1 | b | FALSE | 
| 1 | 1 | d | FALSE | 
| 1 | 1 | d | FALSE | 
| 1 | 1 | d | FALSE | 
We want to iterate over subjects and trials. We'll start by creating a tibble with columns is_target nested into a column called subtbl.
runs_nest <- runsdata_tgt %>%
  select(-stimulus) %>% # don't need it anymore
  nest(subtbl = c(is_target))We want to iterate over the little subtables stored within subtbl in each row of the table, passing the table to a function that will find the runs and return another table, which we'll store in new column. Let's write a function to detect the runs. That function will need the function rle() (Run-Length Encoding) from base R. We'll run that on the logical vector we created (is_target). Before creating the function, let's see what rle() does on the values in is_target for subject 1, trial 1.
##  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE
## [13]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [25] FALSE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## Run Length Encoding
##   lengths: int [1:5] 10 3 12 3 7
##   values : logi [1:5] FALSE TRUE FALSE TRUE FALSEIf that doesn't make sense, look at the help for rle() (type ?rle in the console). Now we're ready to write our function, detect_runs().
detect_runs <- function(x) {  
  if (!is.logical(x[[1]])) stop("'x' must be a tibble whose first column is of type 'logical'")
  runs <- rle(x[[1]])
  run_start_fr <- c(1L, cumsum(runs$lengths[-length(runs$lengths)]) + 1L)
  run_end_fr <- run_start_fr + (runs$lengths - 1L)
  
  tgt_start <- run_start_fr[runs$values]
  tgt_end <- run_end_fr[runs$value]
  tibble(run = seq_along(tgt_start),
         start_fr = tgt_start,
         end_fr = tgt_end)
}We can test the function on s1t1 just to make sure it works.
detect_runs(tibble(lvec = s1t1))| run | start_fr | end_fr | 
|---|---|---|
| 1 | 11 | 13 | 
| 2 | 26 | 28 | 
OK, now we're ready to run the function.
result <- runs_nest %>%
  mutate(runstbl = map(subtbl, detect_runs))
head(result)| subject | trial | subtbl | runstbl | 
|---|---|---|---|
| 1 | 1 | FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE | 1, 2, 11, 26, 13, 28 | 
| 1 | 2 | FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE | 1, 2, 15, 30, 17, 32 | 
| 1 | 3 | FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE | 1, 2, 12, 29, 14, 31 | 
| 2 | 1 | FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE | 1, 2, 12, 29, 14, 31 | 
| 2 | 2 | FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE | 1, 2, 15, 32, 17, 34 | 
| 2 | 3 | FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE | 1, 2, 17, 32, 19, 34 | 
Now we just have to unnest and we're done!
data <- result %>%
  select(-subtbl) %>%
  unnest(runstbl)
head(data)| subject | trial | run | start_fr | end_fr | 
|---|---|---|---|---|
| 1 | 1 | 1 | 11 | 13 | 
| 1 | 1 | 2 | 26 | 28 | 
| 1 | 2 | 1 | 15 | 17 | 
| 1 | 2 | 2 | 30 | 32 | 
| 1 | 3 | 1 | 12 | 14 | 
| 1 | 3 | 2 | 29 | 31 |