23 V

23.1 value

A single number or piece of data.

In a tidy dataset, each cell contains only one value.

In this example, each cell of the age column contains three values: the mean, minimum, and maximum.

library(dplyr)
library(tidyr)

untidy <- data.frame(
  group = c("A", "B"),
  age = c("20.4 [18-25]", "19.9 [18-24]")
)
group | age
A     | 20.4 [18-25]
B     | 19.9 [18-24]
tidy <- untidy %>%
  separate(age, c("mean", "min", "max"), 
           sep = "( \\[|-|\\])",
           extra = "drop")
group | mean | min | max
A     | 20.4 | 18  | 25
B     | 19.9 | 18  | 24
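The same split can also be sketched in base R without tidyr. This assumes every age string has the form "mean [min-max]" with no negative numbers. `strsplit()` does the splitting and `as.numeric()` converts the pieces, so the tidy columns end up numeric. By contrast, tidyr's `separate()` leaves them as character unless you set `convert = TRUE`. The name `tidy_base` is just an illustrative object name.

```r
# Base R sketch of the same tidying step (no tidyr required)
untidy <- data.frame(
  group = c("A", "B"),
  age = c("20.4 [18-25]", "19.9 [18-24]")
)

# Split each age string on " [", "-" or "]"; strsplit() drops the empty
# piece that the trailing "]" would otherwise leave, so each gives 3 pieces
pieces <- strsplit(untidy$age, "( \\[|-|\\])")

# Turn the list of 3-element character vectors into a numeric matrix
m <- matrix(as.numeric(unlist(pieces)), ncol = 3, byrow = TRUE,
            dimnames = list(NULL, c("mean", "min", "max")))

# One value per cell: group, mean, min and max in separate columns
tidy_base <- data.frame(group = untidy$group, m)
tidy_base
```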

23.2 variable

(coding): A word that identifies and stores the value of some data for later use; (stats): An attribute or characteristic of an observation that you can measure, count, or describe

Columns in data frames are also often called variables.

23.3 variable (coding)

A word that identifies and stores the value of some data for later use.

Variables in R are usually referred to as objects. See the definition for object.
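The minimal example below shows a variable storing a value for later use. The name `my_age` and its value are made up for illustration.

```r
# Assign the value 27 to a variable (object) called my_age
my_age <- 27

# Use the stored value later in a calculation
my_age + 1
#> [1] 28

# Assigning again replaces the stored value
my_age <- 28
my_age
#> [1] 28
```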

23.4 variable (stats)

An attribute or characteristic of an observation that you can measure, count, or describe.

For example, age, name, and weight are variables you could collect about a person.
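In a data frame, each of these variables usually becomes a column, and each person becomes a row (an observation). The names and numbers below are invented purely for illustration.

```r
# Made-up data: each column is a variable, each row is one person
people <- data.frame(
  name   = c("Ana", "Ben", "Cai"),
  age    = c(34, 27, 45),
  weight = c(61.5, 80.2, 72.0)
)

names(people)  # the variables collected about each person
#> [1] "name"   "age"    "weight"
nrow(people)   # the number of observations
#> [1] 3
```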


23.5 variance

A descriptive statistic for how spread out data points are around their mean: roughly, the average squared distance of each data point from the mean.

data <- c(1,2,3,4,5,6,7,8,9)
var(data)
#> [1] 7.5

Variance is equal to the standard deviation squared.

sd(data)^2
#> [1] 7.5

You calculate the variance by summing the squared differences between each data point and the mean (sum(diff^2)) and dividing this total by the number of data points minus 1 (n - 1).

m <- mean(data)   # calculate the mean
diff <- data - m  # difference between each data point and the mean
n <- length(data) # number of data points
sum(diff^2)/(n-1) # variance
#> [1] 7.5
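Because the formula divides by n - 1, `var()` returns the sample variance. The sketch below wraps the same formula in a function and checks it against `var()` on a second, made-up vector. The name `sample_var` is hypothetical and is not a base R function. Dividing by n instead gives the smaller population variance.

```r
# Hypothetical helper implementing the formula above
sample_var <- function(x) {
  diff <- x - mean(x)             # deviation of each point from the mean
  sum(diff^2) / (length(x) - 1)   # sum of squared deviations over n - 1
}

scores <- c(2, 4, 4, 4, 5, 5, 7, 9)
sample_var(scores)
#> [1] 4.571429
var(scores)
#> [1] 4.571429

# Dividing by n instead of n - 1 gives the population variance
sum((scores - mean(scores))^2) / length(scores)
#> [1] 4
```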

23.6 vector

A type of data structure that collects values with the same data type, like logical (TRUE/FALSE) values, numbers, or strings.

The following things are examples of vectors:

# use the c() function to make a vector of strings or numbers
ingredients <- c("vodka", "gin", "rum", "tequila", "triple sec",
                 "orange juice", "coke", "sour mix")

fun_to_play_at <- c(25, 13, 3, 1)

# the colon between two integers gives you all the numbers from the first to the last integer
likert <- 1:7
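You can check which data type a vector holds with `typeof()`. The sketch below re-creates two of the vectors above and adds a logical vector. `1:7` produces integers, while numbers typed into `c()` are stored as doubles by default. The vector `is_ready` is a hypothetical example.

```r
fun_to_play_at <- c(25, 13, 3, 1)
likert <- 1:7
is_ready <- c(TRUE, FALSE, TRUE)   # hypothetical logical vector

typeof(fun_to_play_at)
#> [1] "double"
typeof(likert)
#> [1] "integer"
typeof(is_ready)
#> [1] "logical"
typeof(c("vodka", "gin"))
#> [1] "character"
```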

Elements are always the same data type. If you try to combine values with different data types, they are coerced into a common data type. Use a list to combine values with different types without coercion.
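A short illustration of coercion follows. When you mix types in `c()`, R converts every element to the most flexible type present. A list keeps each element's original type.

```r
# Mixing numbers, strings and logicals in c() coerces everything to character
mixed <- c(1, "two", TRUE)
mixed
#> [1] "1"    "two"  "TRUE"

# Numbers and logicals alone become numeric (TRUE -> 1, FALSE -> 0)
c(5, TRUE, FALSE)
#> [1] 5 1 0

# A list keeps each element's original type
kept <- list(1, "two", TRUE)
sapply(kept, class)
#> [1] "numeric"   "character" "logical"
```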

The built-in constant letters is a vector of the 26 lowercase letters of the Latin alphabet, in order. You can select part of a vector by putting the numeric position (or index) of the element you want inside square brackets after the vector's name. You can also put a vector of numbers inside the square brackets to select several elements.

letters[26]
#> [1] "z"
letters[1:5]
#> [1] "a" "b" "c" "d" "e"
letters[fun_to_play_at]
#> [1] "y" "m" "c" "a"

See Ch 20 of R for Data Science for a thorough explanation of vectors.

23.7 vectorized

An operator or function that acts on each element of a vector.

# Add 10 to each element in the vector a
a <- 1:5
a + 10
#> [1] 11 12 13 14 15

# paste "!" to the end of each element in the vector b
b <- c("Hey", "You")
paste0(b, "!")
#> [1] "Hey!" "You!"
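To see what "acts on each element" saves you, compare the vectorized `a + 10` with an explicit loop that does the same thing one element at a time. This is a sketch for comparison only; in R the vectorized form is the idiomatic choice.

```r
a <- 1:5

# Non-vectorized approach: loop over the elements one at a time
looped <- numeric(length(a))
for (i in seq_along(a)) {
  looped[i] <- a[i] + 10
}

# Vectorized approach: + acts on every element at once
vectorized <- a + 10

identical(looped, vectorized)
#> [1] TRUE
```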

23.8 version control

A way to save a record of changes to your files.

Git is the version control system most commonly used with RStudio. GitHub is a cloud-based service for storing and sharing your version-controlled files.

Set up Git and GitHub with RStudio.