Formative Exercise 04: Data

Load built-in datasets

List the datasets in dplyr.

Load the built-in dataset starwars and use glimpse() to see an overview.

Convert the built-in base R mtcars dataset to a tibble (you will need to find the function for this; it isn’t in the chapter), and store it in the object mt.

mt <- NULL

Import data from CSV and Excel files

Using the data directory created by reprores::getdata() (or download the zip file, read “disgust_scores.csv” into a table.

disgust <- NULL

Override the default column specifications to skip the id column.

disgust_skip <- NULL

How many rows and columns are in the disgust dataset?

disgust_rows <- NULL
disgust_cols <- NULL

Load the data in “data/stroop.csv” as stroop1 and “data/stroop.xlsx” as stroop2.

stroop1 <- NULL
stroop2 <- NULL

Use glimpse() to figure out the difference between the two data tables and fix the problem.

stroop2b <- NULL

Create a data table

Create a tibble with the columns name, age, and country of origin for 2 people you know.

people <- NULL

Create a tibble that has the structure of the table below, using the minimum typing possible. (Hint: rep()). Store it in the variable my_tbl.

ID	A	B	C
1	A1	B1	C1
2	A1	B2	C1
3	A1	B1	C1
4	A1	B2	C1
5	A2	B1	C1
6	A2	B2	C1
7	A2	B1	C1
8	A2	B2	C1

my_tbl <- NULL

Understand the use the basic data types

Set the following objects to the number 1 with the indicated data type:

one_int (integer)
one_dbl (double)
one_chr (character)

one_int <- NULL
one_dbl <- NULL
one_chr <- NULL

Set the objects T_log, T_chr, T_int and T_dbl to logical, character, integer and double values that will all be equal to TRUE.

T_log <- NULL
T_chr <- NULL
T_int <- NULL
T_dbl <- NULL

Check your answers with this code:

# these should all evaluate to TRUE
tests <- list(
  T_log_is_TRUE = T_log == TRUE,
  T_chr_is_TRUE = T_chr == TRUE,
  T_int_is_TRUE = T_int == TRUE,
  T_dbl_is_TRUE = T_dbl == TRUE,
  T_log_is_log = is.logical(T_log),
  T_chr_is_chr = is.character(T_chr),
  T_int_is_int = is.integer(T_int),
  T_dbl_is_dbl = is.double(T_dbl)
)

str(tests) # this shows a condensed version of the list

## List of 8
##  $ T_log_is_TRUE: logi(0) 
##  $ T_chr_is_TRUE: logi(0) 
##  $ T_int_is_TRUE: logi(0) 
##  $ T_dbl_is_TRUE: logi(0) 
##  $ T_log_is_log : logi FALSE
##  $ T_chr_is_chr : logi FALSE
##  $ T_int_is_int : logi FALSE
##  $ T_dbl_is_dbl : logi FALSE

Understand and use the basic container types

Create a vector of the numbers 3, 6, and 9.

threes <- NULL

The built-in vector letters contains the letters of the English alphabet. Use an indexing vector of integers to extract the letters that spell ‘cat’.

cat <- NULL

The function colors() returns all of the color names that R is aware of. What is the length of the vector returned by this function? (Use code to find the answer.)

col_length <- NULL

Create a named list called col_types where the name is each column in the built-in dataset table1 and the value is the column data type (e.g., “double”, “character”, “integer”, “logical”).

col_types <- NULL

Use vectorized operations

Set the object x to the integers 1 to 100. Use vectorised operations to set y to x squared. Use plot(x, y) to visualise the relationship between these two numbers.

x <- NULL
y <- NULL

Set t to the numbers 0 to 100 in increments of 0.1. Set x to the sine of t and y to the cosine of t (you will need to find the functions for sine and cosine). Plot x against y.

t <- NULL
x <- NULL
y <- NULL

The function call runif(n, min, max) will draw n numbers from a uniform distribution from min to max. If you set n to 10000, min to 0 and max to 1, this simulates the p-values that you would get from 10000 experiments where the null hypothesis is true. Create the following objects:

pvals: 10000 simulated p-values using runif()
is_sig: a logical vector that is TRUE if the corresponding element of pvals is less than .05, FALSE otherwise
sig_vals: a vector of just the significant p-values
prop_sig: the proportion of those p-values that were significant

set.seed(8675309) # ensures you get the same random numbers each time you run this code chunk

pvals    <- NULL
is_sig   <- NULL
sig_vals <- NULL
prop_sig <- NULL