A word that identifies and stores the value of some data for later use.
Sometimes objects are also called variables. An object name in R:
- contains only letters, numbers, full stops, and underscores
- starts with a letter, or with a full stop followed by a letter
- distinguishes uppercase and lowercase letters (rickastley is not the same as a name spelled with different capitalisation)
The following are valid and different objects:
The following are not valid objects:
- song data
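The naming rules above can be checked directly in R. The sketch below uses hypothetical object names chosen for illustration, and a small helper, is_valid_name(), that is an assumption of this example rather than a base R function. It relies on make.names(), which returns a string unchanged only if it is already a syntactically valid name.

```r
# Hypothetical names, chosen only for illustration
song_data  <- "lowercase"
Song_Data  <- "capitalised"       # a different object: names are case-sensitive
song.data  <- "with a full stop"
.song_data <- "leading full stop"

# Assumed helper: make.names() rewrites invalid names, so a string that
# comes back unchanged was already a valid name
is_valid_name <- function(x) x == make.names(x)

is_valid_name(c("song_data", "Song_Data", "song.data", ".song_data"))  # all TRUE
is_valid_name(c("song data", "1song", "_song_data"))                   # all FALSE
```

Names containing spaces can still be used if they are wrapped in backticks, but they are awkward to type and best avoided.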
All of the data about a single trial or question.
In a tidy dataset, each row contains only one observation.
Each row contains 3 observations:
    library(dplyr)
    library(tidyr)

    # Wide format: one row per participant, with three trials in each row
    untidy <- data.frame(
      id = 1:5,
      score_1 = sample(1:7, 5),
      score_2 = sample(1:7, 5),
      score_3 = sample(1:7, 5),
      rt_1 = rnorm(5, 800, 100) %>% round(),
      rt_2 = rnorm(5, 800, 100) %>% round(),
      rt_3 = rnorm(5, 800, 100) %>% round()
    )
Now each row contains 1 observation:
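    # Stack the score and rt columns, split each column name into the
    # measure and the trial number, then give each measure its own column
    tidy <- untidy %>%
      gather(var, val, score_1:rt_3) %>%
      separate(var, c("var", "trial")) %>%
      spread(var, val)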
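Current versions of tidyr describe gather() and spread() as superseded by pivot_longer() and pivot_wider(). As a minimal sketch, assuming tidyr 1.0.0 or later and a small made-up table (the values below are invented for illustration), the same reshaping can be done in one step. The special ".value" name tells pivot_longer() to use the first part of each column name as the output column.

```r
library(tidyr)

# Made-up wide data: two participants, two trials each
untidy_small <- data.frame(
  id = 1:2,
  score_1 = c(3, 5),
  score_2 = c(4, 6),
  rt_1 = c(810, 790),
  rt_2 = c(805, 770)
)

# Split names like "score_1" at the underscore: the part before it
# becomes a column name (".value"), the part after it fills `trial`
tidy_small <- pivot_longer(
  untidy_small,
  cols = -id,
  names_to = c(".value", "trial"),
  names_sep = "_"
)

tidy_small  # one row per participant per trial: id, trial, score, rt
```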
A statistical test for which the critical region lies entirely in one tail of the distribution: either all values of the test statistic greater than a given value, or all values less than a given value.
See p-value for a comparison of one-tailed and two-tailed tests.
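In R, many test functions choose the tail through an alternative argument. The sketch below uses t.test() on simulated data (the seed, sample size, and effect are arbitrary assumptions) to show how the two one-tailed p-values relate to the two-tailed one.

```r
set.seed(8675309)                        # arbitrary seed for reproducibility
scores <- rnorm(30, mean = 0.3, sd = 1)  # simulated data

# One-tailed tests: the critical region is in a single tail
greater <- t.test(scores, mu = 0, alternative = "greater")
less    <- t.test(scores, mu = 0, alternative = "less")

# Two-tailed test: the critical region is split across both tails
two_sided <- t.test(scores, mu = 0, alternative = "two.sided")

greater$p.value + less$p.value   # the two one-tailed p-values sum to 1
two_sided$p.value                # twice the smaller one-tailed p-value
```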
A symbol that performs some mathematical or comparative process.
Arithmetic operators in R

| Operator | Definition |
|---|---|
| `+` | Addition: adds two numbers |
| `-` | Subtraction: subtracts the second number from the first |
| `*` | Multiplication: multiplies two numbers |
| `/` | Division: divides the first number by the second |
| `%%` | Modulus: returns the remainder after dividing the first number by the second |
| `^` | Exponent: raises the first number to the power of the second |
Relational operators in R

| Operator | Definition |
|---|---|
| `!=` | Not equal to |
| `>=` | Greater than or equal to |
| `<=` | Less than or equal to |
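Logical operators in R

| Operator | Definition |
|---|---|
| `&` | AND (compares each element of vectors) |
| `\|` | OR (compares each element of vectors) |
| `&&` | AND (compares single values; before R 4.3.0 it used only the first element of longer vectors, and it now raises an error for them) |
| `\|\|` | OR (compares single values; before R 4.3.0 it used only the first element of longer vectors, and it now raises an error for them) |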
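A short sketch of the three operator families in use. The values of x, y, a, and b are arbitrary and chosen only for illustration.

```r
x <- 7
y <- 2

# Arithmetic operators
x + y    # 9
x - y    # 5
x * y    # 14
x / y    # 3.5
x %% y   # 1 (remainder of 7 divided by 2)
x ^ y    # 49

# Relational operators
x != y   # TRUE
x >= y   # TRUE
x <= y   # FALSE

# Element-wise logical operators work along whole vectors
a <- c(TRUE, TRUE, FALSE)
b <- c(TRUE, FALSE, FALSE)
a & b    # TRUE FALSE FALSE
a | b    # TRUE  TRUE FALSE

# The doubled forms are meant for single values, e.g. inside if()
a[1] && b[1]   # TRUE
a[2] || b[2]   # TRUE
```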
Discrete variables that have an inherent order, such as level of education or dislike/like.
Ordinal variables are not necessarily evenly spaced: the difference between one pair of consecutive values may be bigger or smaller than the difference between another pair. For example, you can assume that 3 is higher than 2 on a Likert scale, but not by how much, and you cannot assume that 2 is as far from 1 as it is from 3. Therefore, think carefully before averaging ordinal values, since the point midway between 2 and 4 on the underlying scale is not necessarily 3.
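In R, ordinal data can be stored as an ordered factor, which supports comparisons but not arithmetic. The sketch below uses made-up responses and level labels. It summarises them with the median of the level codes rather than a mean, which is one common choice but not the only defensible one.

```r
# Made-up responses on a five-point agreement scale
level_order <- c("strongly disagree", "disagree", "neutral",
                 "agree", "strongly agree")
responses <- c("disagree", "agree", "neutral", "strongly agree", "agree")

likert <- factor(responses, levels = level_order, ordered = TRUE)

# Order comparisons are meaningful for ordered factors
likert[2] > likert[1]   # TRUE: "agree" is higher than "disagree"

# Frequencies per level, including levels nobody chose
table(likert)

# The median depends only on order, not on spacing between levels
median_code <- median(as.integer(likert))
levels(likert)[median_code]   # "agree"
```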
16.6 outer join
A mutating join that combines rows from two tables while keeping all of the information from both tables (full_join).
The term "outer join" is more commonly used in SQL. See full_join for the R version.
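A minimal sketch of an outer join using dplyr's full_join(). The two tables and their column names are made up for illustration. Rows that appear in only one table are kept, with NA filling the columns that come from the other table.

```r
library(dplyr)

# Made-up tables that share an id column but cover different ids
subjects <- data.frame(id = 1:3, age = c(20, 25, 30))
scores   <- data.frame(id = 2:4, score = c(10, 15, 20))

joined <- full_join(subjects, scores, by = "id")

joined
# id 1 has no score and id 4 has no age, so those cells are NA
```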
A data point that is extremely distant from most of the other data points.
Outliers can be clear errors (e.g., a value of 1.56 cm for human height), fully random extreme values (e.g., 0.27% of values from a normal distribution are expected to be more than 3 SD from the mean), or reflections of potential moderators (e.g., reaction times when paying attention versus being distracted).
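The 0.27% figure can be reproduced from the normal distribution function. This is a small worked check, not an outlier-detection method.

```r
# Proportion of a normal distribution more than 3 SD from the mean:
# the area below -3 SD, doubled to include the area above +3 SD
p_beyond_3sd <- 2 * pnorm(-3)

round(100 * p_beyond_3sd, 2)   # 0.27 (percent)
```

With a sample of 1,000 values from a normal distribution, this means that two or three values would be expected to fall beyond 3 SD by chance alone.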
The unthinking "rule" of labelling every data point more than 3 SD from the mean as an outlier is not considered a good way to deal with outliers. The paper below contains useful suggestions.
Leys, C., Delacre, M., Mora, Y. L., Lakens, D., & Ley, C. (2019). How to Classify, Detect, and Manage Univariate and Multivariate Outliers, With Emphasis on Pre-Registration. International Review of Social Psychology, 32(1), 5. DOI: 10.5334/irsp.289