15 N
15.1 NA
A missing value that is "Not Available"
You can use NA
to represent missing values in a vector. Use the function is.na()
to check if values are missing.
If the results of a calculation like mean()
or sd()
is NA
, this usually means that you have some missing values in your vector. You can remove NA
values using na.rm = TRUE
in many functions.
Dealing with missing values when calculating correlations is a little trickier.
dat <- tribble(
~x, ~y, ~z,
1, 3, NA, # x-y included when p.c.o
2, 1, 4,
3, 5, 3,
4, 1, 2,
NA, 5, 1 # y-z included when p.c.o
)
# uses only rows 2:4 for all correlations
cor(dat, use = "complete.obs")
#> x y z
#> x 1 0 -1
#> y 0 1 0
#> z -1 0 1
# uses rows 1:4 for x-y, 2:5 for y-z, and 2:4 for x-z
cor(dat, use = "pairwise.complete.obs")
#> x y z
#> x 1.00000 -0.1348400 -1.0000000
#> y -0.13484 1.0000000 -0.4472136
#> z -1.00000 -0.4472136 1.0000000
You can filter a table down to only rows with no NA values using na.omit()
.
complete_dat <- na.omit(dat)
x | y | z |
---|---|---|
2 | 1 | 4 |
3 | 5 | 3 |
4 | 1 | 2 |
15.2 NaN
An impossible number that is "Not a number"
In R
impossible numbers are represented with the symbol NaN
. Use the function is.nan()
to check if values are impossible numbers.
value <- 0/0
value
#> [1] NaN
is.nan(value)
#> [1] TRUE
15.4 nominal
Categorical variables that don't have an inherent order, such as types of animal.
15.5 normal distribution
A symmetric distribution of data where values near the centre are most probable.
A normal distribution is characterised by its mean and standard deviation. You can sample numbers from a simulated normal distribution with the function rnorm()
.
# sample 1 million numbers from a normal distribution with
# a mean of 0 and a standard deviation of 1
x <- rnorm(1000000, mean = 0, sd = 1)
About 68% of the values are within 1 SD of the mean.
# proportion between -1 and 1
mean(x > -1 & x < 1)
#> [1] 0.682617
About 95% of the values are within 2 SDs of the mean.
# proportion between -2 and 2
mean(x > -2 & x < 2)
#> [1] 0.954465
15.6 null effect
An outcome that does not show an otherwise expected effect.
A null effect could be a difference of 0 between two groups, or a chance value, such as 50% in a two-alternative forced choice task.
15.7 null hypothesis
The hypothesis that an observed difference between groups or from a specific value is due to chance alone.
The null hypothesis is also commonly referred to as H0. This is contrasted with H1, the alternate hypothesis in a null hypothesis significance testing (NHST) framework.
15.8 numeric
A data type representing a real decimal number or integer.
The integer and double data types are numeric.
You can check if a variable is numeric using the function is.numeric
and you can convert a variable to its numeric representation using the function as.numeric
.
is.numeric(2.4)
#> [1] TRUE
is.numeric(2L)
#> [1] TRUE
# complex numbers are not numeric
is.numeric(2i)
#> [1] FALSE
is.numeric("A")
#> [1] FALSE
# numbers represented as strings are not numeric
is.numeric("3")
#> [1] FALSE
as.numeric(2.4)
#> [1] 2.4
as.numeric(2L)
#> [1] 2
# the imaginary part of complex numbers is discarded when converting to numeric
as.numeric(3+2i)
#> Warning: imaginary parts discarded in coercion
#> [1] 3
# strings that do not represent numbers are converted to NA
as.numeric("A")
#> Warning: NAs introduced by coercion
#> [1] NA
# numbers represented as strings can be convertd to their numeric version
as.numeric("3")
#> [1] 3