16 O

16.1 object

A word that identifies and stores the value of some data for later use.

Sometimes objects are also called variables. An object name in R:

  • contains only letters, numbers, full stops, and underscores
  • starts with a letter, or with a full stop that is not followed by a number
  • distinguishes uppercase and lowercase letters (rickastley is not the same as RickAstley)

The following are valid and different objects:

The following are not valid objects:

  • _song_data
  • 1song
  • .1song
  • song data
  • song-data
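
The naming rules above can be checked in R itself. The sketch below uses base R's make.names(), which returns a name unchanged only if it is already syntactically valid. The four valid names in the vector are illustrative assumptions, chosen as counterparts to the invalid names listed above.

```r
# Candidate names: the first four are illustrative valid names (assumptions),
# the last five are the invalid names listed above
candidates <- c("song_data", "song.data", ".song.data", "SongData",
                "_song_data", "1song", ".1song", "song data", "song-data")

# make.names() repairs invalid names, so a name is valid
# only if it comes back unchanged
is_valid <- make.names(candidates) == candidates
data.frame(name = candidates, valid = is_valid)

# Names are case-sensitive: these are two different objects
rickastley <- "lowercase"
RickAstley <- "mixed case"
same_object <- identical(rickastley, RickAstley)
```

Note that make.names() also treats reserved words such as if or TRUE as invalid names, which goes slightly beyond the three rules listed here.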

16.2 observation

All of the data about a single trial or question.

In a tidy dataset, each row contains only one observation.

In this untidy table, each row contains 3 observations (one per trial):

library(dplyr)
library(tidyr)

# one row per id, with three scores and three reaction times (values are random)
untidy <- data.frame(
  id = 1:5,
  score_1 = sample(1:7, 5),
  score_2 = sample(1:7, 5),
  score_3 = sample(1:7, 5),
  rt_1 = rnorm(5, 800, 100) %>% round(),
  rt_2 = rnorm(5, 800, 100) %>% round(),
  rt_3 = rnorm(5, 800, 100) %>% round()
)

id  score_1  score_2  score_3  rt_1  rt_2  rt_3
1   6        4        2        679   923   908
2   7        5        6        884   821   701
3   5        1        7        696   951   1011
4   4        3        5        774   751   805
5   3        6        3        814   1047  636

Now each row contains 1 observation:

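tidy <- untidy %>%
  # stack all score and rt columns into name-value pairs
  gather(var, val, score_1:rt_3) %>%
  # split names such as "score_1" into the variable and the trial number
  separate(var, c("var", "trial")) %>%
  # give score and rt their own columns again
  spread(var, val)

id  trial  rt    score
1   1      679   6
1   2      923   4
1   3      908   2
2   1      884   7
2   2      821   5
2   3      701   6
3   1      696   5
3   2      951   1
3   3      1011  7
4   1      774   4
4   2      751   3
4   3      805   5
5   1      814   3
5   2      1047  6
5   3      636   3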
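
In current versions of tidyr, gather() and spread() are superseded by pivot_longer() and pivot_wider(). As a minimal sketch of the same reshaping in one step, assuming a small fixed table (the values are copied from the first two rows of the example output above so that the result is reproducible):

```r
library(tidyr)

# Fixed values taken from the first two rows of the untidy example
untidy <- data.frame(
  id = 1:2,
  score_1 = c(6, 7), score_2 = c(4, 5), score_3 = c(2, 6),
  rt_1 = c(679, 884), rt_2 = c(923, 821), rt_3 = c(908, 701)
)

# ".value" tells pivot_longer() that the first part of each column name
# (score or rt) becomes an output column; the second part becomes trial
tidy <- pivot_longer(
  untidy,
  cols = score_1:rt_3,
  names_to = c(".value", "trial"),
  names_sep = "_"
)
tidy
```

Each row of tidy now holds one observation: one id, one trial, and that trial's score and rt.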

16.3 one-tailed

A statistical test for which the critical region consists of all values of the test statistic greater than a given value, or all values less than a given value, but not both.

See p-value for a comparison of one-tailed and two-tailed tests.
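
As an illustration, base R's t.test() runs a one-tailed test when the alternative argument is set to "greater" or "less". The sketch below uses simulated data; the seed, sample size, and effect size are arbitrary assumptions.

```r
set.seed(8675309)  # arbitrary seed so the simulated data are reproducible

# Simulated sample with an assumed true mean of 0.3
x <- rnorm(50, mean = 0.3, sd = 1)

# Two-tailed test: the critical region lies in both tails
two_tailed <- t.test(x, mu = 0, alternative = "two.sided")

# One-tailed test: the critical region lies only in the upper tail
one_tailed <- t.test(x, mu = 0, alternative = "greater")

c(two_tailed = two_tailed$p.value, one_tailed = one_tailed$p.value)
```

When the observed effect is in the predicted direction, the one-tailed p-value is half the two-tailed p-value.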

16.4 operator

A symbol that performs some mathematical or comparative process.

Arithmetic operators in R

Operator  Definition                                                                    Example
+         Addition: adds two numbers                                                    3+2 = 5
-         Subtraction: subtracts the second number from the first                       3-2 = 1
*         Multiplication: multiplies two numbers                                        3*2 = 6
/         Division: divides the first number by the second                              3/2 = 1.5
%%        Modulus: returns the remainder after dividing the first number by the second  3%%2 = 1
^         Exponent: raises the first number to the power of the second                  3^2 = 9

Relational operators in R

Operator  Definition                Example
==        Equal to                  1 == 1 or "A" == "A"
!=        Not equal to              1 != 2 or "A" != "B"
>         Greater than              2 > 1 or "B" > "A"
>=        Greater than or equal to  2 >= 1 or "B" >= "A"
<         Less than                 1 < 2 or "A" < "B"
<=        Less than or equal to     1 <= 2 or "A" <= "B"
%in%      Match operator            "A" %in% LETTERS
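
Relational operators work element by element on vectors, returning one TRUE or FALSE per element. A short sketch with made-up values:

```r
scores <- c(3, 5, 7)  # made-up example values

# Each element is compared with 5 in turn
at_least_5 <- scores >= 5   # FALSE TRUE TRUE
exactly_5  <- scores == 5   # FALSE TRUE FALSE

# %in% checks whether each element on the left appears in the vector on the right
in_letters <- c("A", "z", "1") %in% LETTERS   # TRUE FALSE FALSE
```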

Logical operators in R

Operator  Definition                                                        Example
&         AND (compares each element of vectors)                            c(T, T, F, F) & c(T, F, T, F) gives c(T, F, F, F)
|         OR (compares each element of vectors)                             c(T, T, F, F) | c(T, F, T, F) gives c(T, T, T, F)
&&        AND (single values only; errors on longer vectors since R 4.3.0)  TRUE && FALSE gives FALSE
||        OR (single values only; errors on longer vectors since R 4.3.0)   TRUE || FALSE gives TRUE
!         NOT                                                               !TRUE gives FALSE
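
The difference between the single and double forms is easiest to see in code. In the sketch below, & and | work element by element, while && works on single values and stops evaluating as soon as the answer is known. Note that from R 4.3.0 onwards, && and || raise an error when given a vector longer than one element; older versions used only the first element. The variable names are illustrative.

```r
x <- c(TRUE, TRUE, FALSE, FALSE)
y <- c(TRUE, FALSE, TRUE, FALSE)

both    <- x & y      # element-wise AND: TRUE FALSE FALSE FALSE
either  <- x | y      # element-wise OR:  TRUE TRUE TRUE FALSE
neither <- !(x | y)   # NOT applied to each element: FALSE FALSE FALSE TRUE

# && short-circuits: because the left side is FALSE,
# the right side (a > 0) is never evaluated
a <- NULL
is_positive <- !is.null(a) && a > 0   # FALSE
```

The short-circuit behaviour is why && and || are the usual choice inside if() conditions, where a single TRUE or FALSE is needed.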

16.5 ordinal

Discrete variables that have an inherent order, such as level of education or dislike/like.

Ordinal variables are not necessarily evenly spaced: the distance between consecutive values can vary. For example, on a Likert scale you can assume that 3 is higher than 2, but not by how much, and you cannot assume that 2 is just as far from 1 as it is from 3. Therefore, think carefully before averaging ordinal values, since the point midway between 2 and 4 is not necessarily 3.
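
In R, ordinal variables can be stored as ordered factors, which keep the order of the levels without implying any spacing between them. A minimal sketch, assuming a three-level dislike/neutral/like scale (the "neutral" level and the responses are made up for illustration):

```r
responses <- c("like", "dislike", "neutral", "like")  # made-up responses

# ordered = TRUE records that dislike < neutral < like
rating <- factor(responses,
                 levels = c("dislike", "neutral", "like"),
                 ordered = TRUE)

# Order comparisons are meaningful for ordered factors
first_higher <- rating[1] > rating[2]   # TRUE: "like" is above "dislike"
highest <- max(rating)                  # like

# The integer codes are only ranks, not evenly spaced measurements
codes <- as.integer(rating)             # 3 1 2 3
```

Calling mean() directly on a factor returns NA with a warning, which is a useful reminder that averaging ordinal values needs a deliberate decision.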

16.6 outer join

A mutating join that joins rows from two tables while keeping all of the information from both tables (full_join).

The term "outer join" is more commonly used in SQL. See full_join for the R version.
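
A minimal sketch of an outer join using dplyr's full_join(). The two tables and their column names are hypothetical and exist only for this illustration.

```r
library(dplyr)

# Hypothetical tables that share ids 2 and 3 only
songs <- data.frame(id = c(1, 2, 3), title = c("A", "B", "C"))
plays <- data.frame(id = c(2, 3, 4), n_plays = c(10, 20, 30))

# Keep every row from both tables; unmatched cells are filled with NA
all_rows <- full_join(songs, plays, by = "id")
all_rows
```

The result has four rows: id 1 has no n_plays and id 4 has no title, so those cells are NA.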

16.7 outlier

A data point that is extremely distant from most of the other data points.

Outliers can be clear errors (e.g., a value of 1.56 cm for human height), fully random extreme values (e.g., 0.27% of values from a normal distribution are expected to be more than 3 SD from the mean), or reflect potential moderators (e.g., reaction times when paying attention versus being distracted).
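
The 0.27% figure can be verified with pnorm(), the cumulative distribution function of the normal distribution. The sample size of 10,000 below is an arbitrary value chosen for illustration.

```r
# Probability of a value more than 3 SD below the mean, doubled for both tails
p_beyond_3sd <- 2 * pnorm(-3)
percent_beyond <- round(p_beyond_3sd * 100, 2)   # 0.27

# In an assumed sample of 10,000 normally distributed values,
# about 27 are expected beyond 3 SD by chance alone
expected_in_10000 <- p_beyond_3sd * 10000
```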

The unthinking "rule" to label all data points more than 3 SD from the mean as outliers is not considered to be a good way to deal with outliers. The paper below contains useful suggestions.

Leys, C., Delacre, M., Mora, Y. L., Lakens, D., & Ley, C. (2019). How to Classify, Detect, and Manage Univariate and Multivariate Outliers, With Emphasis on Pre-Registration. International Review of Social Psychology, 32(1), 5. DOI: 10.5334/irsp.289