A missing value that is "Not Available"
You can use NA to represent missing values in a vector. Use the function is.na() to check whether values are missing.

If a calculation returns NA, this usually means that your vector contains some missing values. You can remove NA values by setting na.rm = TRUE in many functions.
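The short sketch below shows how is.na() and na.rm = TRUE behave on a small vector. The vector scores is a hypothetical example. It uses mean() as a typical function that accepts na.rm.

```r
# hypothetical vector of scores with one missing value
scores <- c(10, 20, NA, 40)

# which values are missing?
is.na(scores)
#> [1] FALSE FALSE  TRUE FALSE

# count the missing values (each TRUE counts as 1)
sum(is.na(scores))
#> [1] 1

# a single NA in the input makes the result NA
mean(scores)
#> [1] NA

# na.rm = TRUE drops the NA before calculating
mean(scores, na.rm = TRUE)
#> [1] 23.33333
```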
Dealing with missing values when calculating correlations is a little trickier.
```r
library(tibble) # provides tribble()

dat <- tribble(
  ~x, ~y, ~z,
   1,  3, NA, # used for x-y with pairwise.complete.obs
   2,  1,  4,
   3,  5,  3,
   4,  1,  2,
  NA,  5,  1  # used for y-z with pairwise.complete.obs
)

# uses only rows 2:4 for all correlations
cor(dat, use = "complete.obs")
#>    x y  z
#> x  1 0 -1
#> y  0 1  0
#> z -1 0  1

# uses rows 1:4 for x-y, 2:5 for y-z, and 2:4 for x-z
cor(dat, use = "pairwise.complete.obs")
#>          x          y          z
#> x  1.00000 -0.1348400 -1.0000000
#> y -0.13484  1.0000000 -0.4472136
#> z -1.00000 -0.4472136  1.0000000
```
You can filter a table down to only the rows with no NA values using na.omit():

```r
complete_dat <- na.omit(dat)
```
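As a quick check, the sketch below rebuilds the same x, y and z values with base R's data.frame(), so it needs no packages. It then shows which rows na.omit() keeps. It also compares the result with use = "complete.obs". Using complete.cases() here is an illustrative addition, not something the text above relies on.

```r
# the same x, y and z values as above, built with base R
dat <- data.frame(
  x = c(1, 2, 3, 4, NA),
  y = c(3, 1, 5, 1, 5),
  z = c(NA, 4, 3, 2, 1)
)

# TRUE for rows that contain no NA values
complete.cases(dat)
#> [1] FALSE  TRUE  TRUE  TRUE FALSE

complete_dat <- na.omit(dat)
nrow(complete_dat)
#> [1] 3

# correlating only the complete rows matches use = "complete.obs"
all.equal(cor(complete_dat), cor(dat, use = "complete.obs"))
#> [1] TRUE
```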
An impossible number that is "Not a number"
In R, impossible numbers are represented with the symbol NaN. Use the function is.nan() to check whether values are impossible numbers.
```r
value <- 0/0
value
#> [1] NaN

is.nan(value)
#> [1] TRUE
```
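NaN and NA are easy to confuse. The sketch below shows how the two checks treat each value. NaN counts as missing, but NA is not an impossible number. The last line is one more calculation with no defined answer.

```r
# NaN counts as missing, so is.na() is TRUE for it
is.na(NaN)
#> [1] TRUE

# NA is not an impossible number, so is.nan() is FALSE for it
is.nan(NA)
#> [1] FALSE

# other calculations with no defined answer also give NaN
Inf - Inf
#> [1] NaN
```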
Categorical variables that don't have an inherent order, such as types of animal.
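In R, a nominal variable is usually stored as a factor without an order. The sketch below uses a hypothetical vector called animals to show this.

```r
# hypothetical nominal variable stored as an unordered factor
animals <- factor(c("cat", "dog", "cat", "fish"))

# the categories (levels) are listed alphabetically by default
levels(animals)
#> [1] "cat"  "dog"  "fish"

# factor() creates unordered factors unless ordered = TRUE is set
is.ordered(animals)
#> [1] FALSE
```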
15.5 normal distribution
A symmetric distribution of data where values near the centre are most probable.
A normal distribution is characterised by its mean and standard deviation. You can sample numbers from a simulated normal distribution with the function rnorm().
```r
# sample 1 million numbers from a normal distribution with
# a mean of 0 and a standard deviation of 1
x <- rnorm(1000000, mean = 0, sd = 1)
```
About 68% of the values are within 1 SD of the mean.
```r
# proportion between -1 and 1
mean(x > -1 & x < 1)
#> [1] 0.682617
```
About 95% of the values are within 2 SDs of the mean.
```r
# proportion between -2 and 2
mean(x > -2 & x < 2)
#> [1] 0.954465
```
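The simulated proportions above are estimates, so they vary slightly from run to run. The sketch below calculates the exact values from the standard normal cumulative distribution function, pnorm(). It also shows that "about 95% within 2 SDs" is a rounding of a cut-off just under 2.

```r
# exact proportion within 1 SD of the mean
pnorm(1) - pnorm(-1)
#> [1] 0.6826895

# exact proportion within 2 SDs of the mean
pnorm(2) - pnorm(-2)
#> [1] 0.9544997

# the exact cut-off for the central 95% is slightly less than 2 SDs
qnorm(0.975)
#> [1] 1.959964
```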
15.6 null effect
An outcome that does not show an otherwise expected effect.
A null effect could be a difference of 0 between two groups, or a chance value, such as 50% in a two-alternative forced choice task.
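The simulation below sketches both kinds of null effect. The first part models a hypothetical participant who guesses on every trial of a two-alternative forced choice task. The second part draws two hypothetical groups from the same distribution. The seed, the sample sizes and the names correct, group_a and group_b are arbitrary choices for illustration. Expect results near the null values rather than exactly equal to them.

```r
set.seed(42) # arbitrary seed so the simulation is reproducible

# 1000 trials of pure guessing: 1 = correct, 0 = incorrect
correct <- rbinom(n = 1000, size = 1, prob = 0.5)

# proportion correct is close to the chance value of 0.5
mean(correct)

# two groups sampled from the same distribution
group_a <- rnorm(1000, mean = 0, sd = 1)
group_b <- rnorm(1000, mean = 0, sd = 1)

# the difference between group means is close to 0
mean(group_a) - mean(group_b)
```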
15.7 null hypothesis
The hypothesis that an observed difference between groups or from a specific value is due to chance alone.
The null hypothesis is also commonly referred to as H0. In a null hypothesis significance testing (NHST) framework, it is contrasted with H1, the alternative hypothesis.
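The sketch below makes the null hypothesis true by construction. Both hypothetical groups are drawn from the same population, so any difference between them is due to chance alone. The example repeats this 1000 times and runs a t.test() each time. It then checks how often p falls below .05. The population values, group size and seed are arbitrary assumptions. With H0 true, that proportion should be close to .05.

```r
set.seed(8675309) # arbitrary seed so the simulation is reproducible

# each replicate: two groups from the same population, so H0 is true
p_values <- replicate(1000, {
  group_a <- rnorm(30, mean = 100, sd = 15)
  group_b <- rnorm(30, mean = 100, sd = 15)
  t.test(group_a, group_b)$p.value
})

# proportion of simulated studies with p < .05
mean(p_values < .05)
```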
A data type representing a real decimal number or integer.
The integer and double data types are numeric.
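You can see the two underlying types with typeof(). A plain number such as 2 is stored as a double. The L suffix creates an integer. Both count as numeric.

```r
# numbers without the L suffix are stored as doubles
typeof(2.4)
#> [1] "double"
typeof(2)
#> [1] "double"

# the L suffix creates an integer
typeof(2L)
#> [1] "integer"

# both types are numeric
is.numeric(2) && is.numeric(2L)
#> [1] TRUE
```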
You can check whether a variable is numeric using the function is.numeric(), and you can convert a variable to its numeric representation using the function as.numeric().
```r
is.numeric(2.4)
#> [1] TRUE

is.numeric(2L)
#> [1] TRUE

# complex numbers are not numeric
is.numeric(2i)
#> [1] FALSE

is.numeric("A")
#> [1] FALSE

# numbers represented as strings are not numeric
is.numeric("3")
#> [1] FALSE
```
```r
as.numeric(2.4)
#> [1] 2.4

as.numeric(2L)
#> [1] 2

# the imaginary part of complex numbers is discarded when converting to numeric
as.numeric(3+2i)
#> Warning: imaginary parts discarded in coercion
#> [1] 3

# strings that do not represent numbers are converted to NA
as.numeric("A")
#> Warning: NAs introduced by coercion
#> [1] NA

# numbers represented as strings can be converted to their numeric version
as.numeric("3")
#> [1] 3
```