# 23 V

## 23.1 value

A single number or piece of data.

In a tidy dataset, each cell contains only one value.

In this example, each cell in the `age` column contains three values: the mean, the minimum, and the maximum.

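```r
library(dplyr) # provides the %>% pipe
library(tidyr) # provides separate()

# each age cell holds three values in the form "mean [min-max]"
untidy <- data.frame(
  group = c("A", "B"),
  age = c("20.4 [18-25]", "19.9 [18-24]")
)
```

| group | age          |
|:------|:-------------|
| A     | 20.4 [18-25] |
| B     | 19.9 [18-24] |

```r
# split age wherever " [", "-" or "]" appears so each cell holds one value;
# extra = "drop" discards the empty piece left after the closing "]"
tidy <- untidy %>%
  separate(age, c("mean", "min", "max"),
           sep = "( \\[|-|\\])",
           extra = "drop")
```

| group | mean | min | max |
|:------|:-----|:----|:----|
| A     | 20.4 | 18  | 25  |
| B     | 19.9 | 18  | 24  |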
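If you would rather not load tidyr, the same split can be sketched in base R. This is an illustrative alternative, not the approach above: it assumes every `age` cell follows the `mean [min-max]` pattern exactly, and the names `pieces`, `nums` and `tidy_base` are made up for this example. Unlike `separate()`, it also converts the new columns to numbers.

```r
# Base R sketch (assumes every cell looks like "mean [min-max]")
untidy <- data.frame(
  group = c("A", "B"),
  age = c("20.4 [18-25]", "19.9 [18-24]")
)

# gregexpr() finds every run of digits and decimal points;
# regmatches() pulls those runs out as a list of character vectors
pieces <- regmatches(untidy$age, gregexpr("[0-9.]+", untidy$age))

# one row per original cell, one column per value, converted to numbers
nums <- matrix(as.numeric(unlist(pieces)), ncol = 3, byrow = TRUE)

tidy_base <- data.frame(
  group = untidy$group,
  mean = nums[, 1],
  min = nums[, 2],
  max = nums[, 3]
)
tidy_base
```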

## 23.2 variable

(coding): A word that identifies and stores the value of some data for later use; (stats): An attribute or characteristic of an observation that you can measure, count, or describe.

Columns in data frames are also often called variables.

## 23.3 variable (coding)

A word that identifies and stores the value of some data for later use.

Variables in R are usually referred to as objects. See the definition for object.
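A minimal illustration: the name `age` below is chosen only for the example. Assigning with `<-` stores a value under a name, and using that name later retrieves the stored value.

```r
# store the value 21 under the name `age`
age <- 21

# use the name later to retrieve and work with the stored value
age + 1
#>  22

# assigning to the same name again replaces the stored value
age <- 22
age
#>  22
```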

## 23.4 variable (stats)

An attribute or characteristic of an observation that you can measure, count, or describe.

For example, age, name, and weight are variables you could collect about a person.
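As a sketch, the hypothetical data frame below (names and numbers invented for illustration) records three variables about three people. In tidy data, each variable gets its own column and each observation (here, each person) its own row.

```r
# hypothetical data: each row is one person (an observation),
# each column is one variable recorded about that person
people <- data.frame(
  name = c("Ana", "Ben", "Chloe"),
  age = c(25, 31, 47),
  weight_kg = c(61.2, 80.5, 72.0)
)

names(people) # the variables
#>  "name" "age" "weight_kg"
nrow(people)  # the number of observations
#>  3
```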

## 23.5 variance

A descriptive statistic for how spread out data points are around their mean: roughly, the average squared difference between each data point and the mean.

```r
data <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
var(data)
#>  7.5
```

Variance is equal to standard deviation squared.

```r
sd(data)^2
#>  7.5
```

You calculate variance by summing the squared differences between each data point and the mean (`sum(diff^2)`) and dividing this sum by the number of data points minus 1 (`n - 1`).

```r
m <- mean(data)       # calculate the mean
diff <- data - m      # difference between each data point and the mean
n <- length(data)     # number of data points
sum(diff^2) / (n - 1) # variance
#>  7.5
```
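Written as a formula, with $x_i$ the individual data points, $\bar{x}$ their mean and $n$ the number of data points, the sample variance that `var()` returns is the following. Worked through for `data`, the mean is 5 and the squared differences are 16, 9, 4, 1, 0, 1, 4, 9, 16.

$$
s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}
    = \frac{16 + 9 + 4 + 1 + 0 + 1 + 4 + 9 + 16}{9 - 1}
    = \frac{60}{8} = 7.5
$$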

## 23.6 vector

A type of data structure that holds a sequence of values of the same data type, such as logical (TRUE/FALSE) values, numbers, or strings.

The following things are examples of vectors:

```r
# use the c() function to make a vector of strings or numbers
ingredients <- c("vodka", "gin", "rum", "tequila", "triple sec",
                 "orange juice", "coke", "sour mix")

fun_to_play_at <- c(25, 13, 3, 1)

# the colon between two integers gives you all the integers from the first to the last
likert <- 1:7
```

All elements of a vector have the same data type. If you try to combine values with different data types, they are coerced into a common data type. Use a list to combine values with different types without coercion.
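A short base R sketch of this coercion; the objects `mixed` and `lst` are illustrative names only. Combining a number, a string and a logical value with `c()` turns everything into strings, combining numbers and logicals turns `TRUE`/`FALSE` into `1`/`0`, while a list keeps each element's original type.

```r
# mixing numbers, strings and logicals: everything becomes character
mixed <- c(1, "two", TRUE)
mixed
#>  "1" "two" "TRUE"
class(mixed)
#>  "character"

# mixing numbers and logicals: TRUE becomes 1 and FALSE becomes 0
c(1, TRUE, FALSE)
#>  1 1 0

# a list keeps each element's own type
lst <- list(1, "two", TRUE)
sapply(lst, class)
#>  "numeric" "character" "logical"
```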

The built-in object `letters` is a character vector of the 26 lowercase Latin letters in order. You can select part of a vector by putting the numeric location (or index) of the element you want inside square brackets after the vector. You can even put a vector of numbers inside the square brackets to select several elements.

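```r
letters
#>  "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"
letters[1:5]
#>  "a" "b" "c" "d" "e"
# select the 25th, 13th, 3rd and 1st letters
letters[fun_to_play_at]
#>  "y" "m" "c" "a"
```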

See Chapter 20 of R for Data Science for a thorough explanation of vectors.

## 23.7 vectorized

An operator or function that acts on each element of a vector.

```r
# add 10 to each element in the vector a
a <- 1:5
a + 10
#>  11 12 13 14 15

# paste "!" to the end of each element in the vector b
b <- c("Hey", "You")
paste0(b, "!")
#>  "Hey!" "You!"
```
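To make the idea concrete, the sketch below (with illustrative names `vec_result` and `loop_result`) computes the same thing twice: once with a single vectorized expression and once with an explicit `for` loop that visits each element in turn. It also shows that two vectors of the same length are combined element by element.

```r
a <- 1:5

# vectorized: one expression handles every element
vec_result <- a * 10

# the same computation written as an explicit loop over each element
loop_result <- numeric(length(a))
for (i in seq_along(a)) {
  loop_result[i] <- a[i] * 10
}

vec_result
#>  10 20 30 40 50
all.equal(vec_result, loop_result)
#>  TRUE

# two vectors of the same length are combined element by element
c(1, 2, 3) * c(10, 20, 30)
#>  10 40 90
```

The vectorized version is shorter and is usually how R code is written; the loop is shown only to spell out what "acts on each element" means.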

## 23.8 version control

A way to save a record of changes to your files.

Git is the version control system most commonly used with RStudio. GitHub is a cloud-based service for storing and sharing version-controlled files.

Set up Git and GitHub with RStudio.