16 O
16.1 object
A word that identifies and stores the value of some data for later use.
Objects are sometimes also called variables. A valid object name in R:
- contains only letters, numbers, full stops, and underscores
- starts with a letter, or with a full stop that is not followed by a number
- distinguishes uppercase and lowercase letters (rickastley is not the same as RickAstley)
The following are valid and different objects:
- songdata
- SongData
- song_data
- song.data
- .song.data
- never_gonna_give_you_up_never_gonna_let_you_down
The following are not valid objects:
- _song_data
- 1song
- .1song
- song data
- song-data
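
The naming rules above can be checked in R itself. The sketch below is one possible approach, not the only one: it relies on base R's make.names(), which rewrites any string that is not a syntactically valid name, so a name that comes back unchanged is already valid. The variable names candidates and is_valid are chosen here for illustration.

```r
# Candidate names taken from the two lists above
candidates <- c(
  "songdata", "SongData", "song_data", "song.data", ".song.data",
  "_song_data", "1song", ".1song", "song data", "song-data"
)

# make.names() changes invalid names, so an unchanged name is valid
is_valid <- make.names(candidates) == candidates
data.frame(name = candidates, valid = is_valid)

# Names are case-sensitive, so these are two separate objects
rickastley <- 1
RickAstley <- 2
rickastley == RickAstley  # FALSE
```

The first five names are reported as valid and the last five as invalid, matching the lists above.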
16.2 observation
All of the data about a single trial or question.
In a tidy dataset, each row contains only one observation.
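In this untidy table, each row contains 3 observations (one per trial):
library(dplyr)  # provides the %>% pipe
library(tidyr)  # provides gather(), separate(), and spread()
untidy <- data.frame(
  id = 1:5,
  score_1 = sample(1:7, 5),  # 5 ratings drawn from 1 to 7
  score_2 = sample(1:7, 5),
  score_3 = sample(1:7, 5),
  rt_1 = rnorm(5, 800, 100) %>% round(),  # simulated reaction times (mean 800, SD 100)
  rt_2 = rnorm(5, 800, 100) %>% round(),
  rt_3 = rnorm(5, 800, 100) %>% round()
)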
id | score_1 | score_2 | score_3 | rt_1 | rt_2 | rt_3 |
---|---|---|---|---|---|---|
1 | 6 | 4 | 2 | 679 | 923 | 908 |
2 | 7 | 5 | 6 | 884 | 821 | 701 |
3 | 5 | 1 | 7 | 696 | 951 | 1011 |
4 | 4 | 3 | 5 | 774 | 751 | 805 |
5 | 3 | 6 | 3 | 814 | 1047 | 636 |
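Now each row contains 1 observation (one participant on one trial):
tidy <- untidy %>%
  gather(var, val, score_1:rt_3) %>%    # stack the score and rt columns into var/val pairs
  separate(var, c("var", "trial")) %>%  # split e.g. "score_1" into "score" and "1"
  spread(var, val)                      # give rt and score their own columns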
id | trial | rt | score |
---|---|---|---|
1 | 1 | 679 | 6 |
1 | 2 | 923 | 4 |
1 | 3 | 908 | 2 |
2 | 1 | 884 | 7 |
2 | 2 | 821 | 5 |
2 | 3 | 701 | 6 |
3 | 1 | 696 | 5 |
3 | 2 | 951 | 1 |
3 | 3 | 1011 | 7 |
4 | 1 | 774 | 4 |
4 | 2 | 751 | 3 |
4 | 3 | 805 | 5 |
5 | 1 | 814 | 3 |
5 | 2 | 1047 | 6 |
5 | 3 | 636 | 3 |
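
The same reshaping can also be written with tidyr's newer pivot_longer() function (available from tidyr 1.0.0). This is a minimal sketch rather than the glossary's own method; it uses a small hand-typed data frame so that the result is reproducible, and the special ".value" entry in names_to tells pivot_longer() that the first part of each column name (score or rt) should become its own column.

```r
library(tidyr)

# Small fixed example (illustrative values): 2 participants, 2 trials each
untidy <- data.frame(
  id = c(1, 2),
  score_1 = c(6, 7),
  score_2 = c(4, 5),
  rt_1 = c(679, 884),
  rt_2 = c(923, 821)
)

# Split each column name at "_": the first part names the new column,
# the second part goes into the trial column
tidy <- pivot_longer(
  untidy,
  cols = -id,
  names_to = c(".value", "trial"),
  names_sep = "_"
)

tidy  # one row per participant per trial
```

Note that trial is stored as text ("1", "2") because it comes from the column names.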
16.3 one-tailed
A statistical test for which the critical region consists of all values of the test statistic greater than a given value, or all values less than a given value, but not both.
See p-value for a comparison of one-tailed and two-tailed tests.
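
To make the idea of a critical region concrete, the sketch below compares critical values for a one-tailed and a two-tailed test. It assumes a z statistic (standard normal distribution) and an alpha of 0.05; both are illustrative choices, as is the observed value z.

```r
alpha <- 0.05

# One-tailed (upper tail): all of alpha sits in one tail
crit_one_tailed <- qnorm(1 - alpha)      # about 1.64

# Two-tailed: alpha is split between both tails
crit_two_tailed <- qnorm(1 - alpha / 2)  # about 1.96

# A hypothetical observed test statistic
z <- 1.8
z > crit_one_tailed       # TRUE: inside the one-tailed critical region
abs(z) > crit_two_tailed  # FALSE: outside the two-tailed critical region
```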
16.4 operator
A symbol that performs some mathematical or comparative process.
Arithmetic operators in R
Operator | Definition | Example |
---|---|---|
+ | Addition: adds two numbers | 3+2 = 5 |
- | Subtraction: subtracts the second number from the first | 3-2 = 1 |
* | Multiplication: multiplies two numbers | 3*2 = 6 |
/ | Division: divides the first number by the second | 3/2 = 1.5 |
%% | Modulus: returns the remainder after dividing the first number by the second | 3%%2 = 1 |
^ | Exponent: raises the first number to the power of the second | 3^2 = 9 |
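
As a short illustration, the arithmetic operators also work element by element on vectors. The object names below are made up for this example.

```r
3 %% 2  # 1
3 ^ 2   # 9

# Applied to a vector, the operator is used on each element in turn
remainders <- 1:6 %% 2    # 1 0 1 0 1 0
is_even <- 1:6 %% 2 == 0  # FALSE TRUE FALSE TRUE FALSE TRUE
```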
Relational operators in R
Operator | Definition | Example |
---|---|---|
== | Equal to | 1 == 1 or "A" == "A" |
!= | Not equal to | 1 != 2 or "A" != "B" |
> | Greater than | 2 > 1 or "B" > "A" |
>= | Greater than or equal to | 2 >= 1 or "B" >= "A" |
< | Less than | 1 < 2 or "A" < "B" |
<= | Less than or equal to | 1 <= 2 or "A" <= "B" |
%in% | Match operator | "A" %in% LETTERS |
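
Relational operators return TRUE or FALSE for each element they compare. A minimal sketch with made-up values:

```r
scores <- c(3, 5, 7)

above_4 <- scores > 4   # FALSE TRUE TRUE
is_five <- scores == 5  # FALSE TRUE FALSE

# %in% checks each element on the left against the set on the right;
# LETTERS is R's built-in vector of uppercase letters
in_alphabet <- c("A", "z", "1") %in% LETTERS  # TRUE FALSE FALSE
```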
Logical operators in R
Operator | Definition | Example |
---|---|---|
& | AND (compares each element of vectors) | c(T, T, F, F) & c(T, F, T, F) returns c(T, F, F, F) |
\| | OR (compares each element of vectors) | c(T, T, F, F) \| c(T, F, T, F) returns c(T, T, T, F) |
&& | AND (for single values; vectors longer than one element give an error from R 4.3.0 onwards) | TRUE && FALSE returns FALSE |
\|\| | OR (for single values; vectors longer than one element give an error from R 4.3.0 onwards) | TRUE \|\| FALSE returns TRUE |
! | NOT | !TRUE returns FALSE |
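
The difference between the element-wise operators and the single-value operators can be sketched as follows. The vectors a and b and the value x are illustrative; any() and all() are base R functions that reduce a logical vector to a single value, which is one way to prepare vectors for use with && or ||.

```r
a <- c(TRUE, TRUE, FALSE, FALSE)
b <- c(TRUE, FALSE, TRUE, FALSE)

both <- a & b    # TRUE FALSE FALSE FALSE
either <- a | b  # TRUE TRUE TRUE FALSE
not_a <- !a      # FALSE FALSE TRUE TRUE

# && and || are meant for single values, e.g. inside if()
x <- 5
in_range <- x > 0 && x < 10  # TRUE

# Reduce a vector to one value before combining with && or ||
any_true <- any(a)  # TRUE
all_true <- all(a)  # FALSE
```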
16.5 ordinal
Discrete variables that have an inherent order, such as level of education or dislike/like.
Ordinal variables are not necessarily evenly spaced: the distance between consecutive items can differ from one pair to the next. For example, you can assume that 3 is higher than 2 on a Likert scale, but not by how much, and you cannot assume that 2 is just as far from 1 as it is from 3. Therefore, think carefully before averaging ordinal values, since the average of 2 and 4 is not necessarily equal to 3.
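
In R, one common way to represent an ordinal variable is an ordered factor. The sketch below uses made-up education levels; the level names and their order are assumptions for illustration only.

```r
education <- factor(
  c("secondary", "primary", "degree", "secondary"),
  levels = c("primary", "secondary", "degree"),
  ordered = TRUE
)

# Order comparisons are meaningful for ordered factors
above_primary <- education > "primary"  # TRUE FALSE TRUE TRUE
highest <- max(education)               # degree

# The underlying codes are ranks only; they say nothing about
# how far apart the levels are
codes <- as.integer(education)          # 2 1 3 2
```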
16.6 outer join
A mutating join that joins rows from two tables while keeping all of the information from both tables (full_join).
The term "outer join" is more commonly used in SQL. See full_join for the R version.
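
A minimal sketch of an outer join using dplyr's full_join(). The two tables and their column names are invented for this example.

```r
library(dplyr)

# Hypothetical tables that share an id column but cover different ids
subjects <- data.frame(id = c(1, 2, 3), age = c(20, 25, 31))
scores <- data.frame(id = c(2, 3, 4), score = c(7, 4, 6))

# Keep every row from both tables; cells with no match are filled with NA
all_rows <- full_join(subjects, scores, by = "id")
all_rows
```

The result has one row for each of ids 1 to 4: id 1 has no score and id 4 has no age, so those cells are NA.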
16.7 outlier
A data point that is extremely distant from most of the other data points.
Outliers can be clear errors (e.g., a value of 1.56 cm for human height), fully random extreme values (e.g., 0.27% of values from a normal distribution are expected to be more than 3 SD from the mean), or reflect potential moderators (e.g., reaction times when paying attention versus being distracted).
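
The 0.27% figure for a normal distribution can be checked with base R's pnorm(), which gives the cumulative probability of the standard normal distribution. The object name is chosen here for illustration.

```r
# Probability of falling more than 3 SD from the mean, in either direction
p_beyond_3sd <- 2 * pnorm(-3)

round(100 * p_beyond_3sd, 2)  # 0.27 (percent)
```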
The unthinking "rule" of labelling every data point more than 3 SD from the mean as an outlier is not considered a good way to deal with outliers. The paper below contains useful suggestions.
Leys, C., Delacre, M., Mora, Y. L., Lakens, D., & Ley, C. (2019). How to Classify, Detect, and Manage Univariate and Multivariate Outliers, With Emphasis on Pre-Registration. International Review of Social Psychology, 32(1), 5. DOI: 10.5334/irsp.289