16 O

16.1 object

A word that identifies and stores the value of some data for later use.

Sometimes objects are also called variables. An object in R:

contains only letters, numbers, full stops, and underscores
starts with a letter, or with a full stop that is not followed by a number
distinguishes uppercase and lowercase letters (rickastley is not the same as RickAstley)

The following are valid and different objects:

songdata
SongData
song_data
song.data
.song.data
never_gonna_give_you_up_never_gonna_let_you_down

The following are not valid objects:

_song_data
1song
.1song
song data
song-data
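These rules can be checked in R itself. The sketch below relies on base R's make.names(), which converts a string into a syntactically valid name; a string that comes back unchanged was already valid. The helper is_valid_name() is a hypothetical name introduced for illustration. It also rejects reserved words such as if, which the rules above do not mention.

```r
# Hypothetical helper: a name is syntactically valid if make.names()
# returns it unchanged
is_valid_name <- function(x) make.names(x) == x

valid   <- c("songdata", "SongData", "song_data", "song.data", ".song.data")
invalid <- c("_song_data", "1song", ".1song", "song data", "song-data")

is_valid_name(valid)    # all TRUE
is_valid_name(invalid)  # all FALSE
```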

16.2 observation

All of the data about a single trial or question.

In a tidy dataset, each row contains only one observation.

Each row contains 3 observations:

library(dplyr)
library(tidyr)

untidy <- data.frame(
  id = 1:5,
  score_1 = sample(1:7, 5),
  score_2 = sample(1:7, 5),
  score_3 = sample(1:7, 5),
  rt_1 = rnorm(5, 800, 100) %>% round(),
  rt_2 = rnorm(5, 800, 100) %>% round(),
  rt_3 = rnorm(5, 800, 100) %>% round()
)

id	score_1	score_2	score_3	rt_1	rt_2	rt_3
1	6	4	2	679	923	908
2	7	5	6	884	821	701
3	5	1	7	696	951	1011
4	4	3	5	774	751	805
5	3	6	3	814	1047	636

Now each row contains 1 observation:

tidy <- untidy %>%
  gather(var, val, score_1:rt_3) %>%
  separate(var, c("var", "trial")) %>%
  spread(var, val)

id	trial	rt	score
1	1	679	6
1	2	923	4
1	3	908	2
2	1	884	7
2	2	821	5
2	3	701	6
3	1	696	5
3	2	951	1
3	3	1011	7
4	1	774	4
4	2	751	3
4	3	805	5
5	1	814	3
5	2	1047	6
5	3	636	3
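The tidyr documentation marks gather() and spread() as superseded by pivot_longer() and pivot_wider(). As a sketch of the same reshaping in the newer style, the example below types in the values from the untidy table above so that it is reproducible; the object name tidy2 is an assumption made for illustration. Note that this version puts the score column before rt.

```r
library(tidyr)

# Same values as the untidy table above, entered by hand
untidy <- data.frame(
  id = 1:5,
  score_1 = c(6, 7, 5, 4, 3),
  score_2 = c(4, 5, 1, 3, 6),
  score_3 = c(2, 6, 7, 5, 3),
  rt_1 = c(679, 884, 696, 774, 814),
  rt_2 = c(923, 821, 951, 751, 1047),
  rt_3 = c(908, 701, 1011, 805, 636)
)

# ".value" keeps the part before "_" (score, rt) as column names;
# the part after "_" goes into the new trial column
tidy2 <- pivot_longer(
  untidy,
  cols = score_1:rt_3,
  names_to = c(".value", "trial"),
  names_sep = "_"
)
```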

16.3 one-tailed

A statistical test for which the critical region consists of all values of the test statistic greater than a given value, or all values less than a given value, but not both.

See p-value for a comparison of one-tailed and two-tailed tests.
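As a minimal sketch of the difference, the example below uses a standard normal test statistic. The observed value z and the alpha level are hypothetical numbers chosen for illustration. The same statistic falls inside the one-tailed critical region but outside the two-tailed one.

```r
z <- 1.8        # hypothetical observed test statistic
alpha <- 0.05   # hypothetical significance level

crit_one <- qnorm(1 - alpha)      # one-tailed critical value, about 1.645
crit_two <- qnorm(1 - alpha / 2)  # two-tailed critical value, about 1.96

z > crit_one       # TRUE: inside the one-tailed (upper) critical region
abs(z) > crit_two  # FALSE: outside the two-tailed critical region

p_one <- pnorm(z, lower.tail = FALSE)  # area in the upper tail only
p_two <- 2 * p_one                     # area in both tails
```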

16.4 operator

A symbol that performs some mathematical or comparative process.

Arithmetic operators in R

Operator	Definition	Example
`+`	Addition: adds two numbers	`3+2 = 5`
`-`	Subtraction: subtracts the second number from the first	`3-2 = 1`
`*`	Multiplication: multiplies two numbers	`3*2 = 6`
`/`	Division: divides the first number by the second	`3/2 = 1.5`
`%%`	Modulus: returns the remainder after dividing the first number by the second	`3%%2 = 1`
`^`	Exponent: raises the first number to the power of the second	`3^2 = 9`

Relational operators in R

Operator	Definition	Example
`==`	Equal to	`1 == 1` or `"A" == "A"`
`!=`	Not equal to	`1 != 2` or `"A" != "B"`
`>`	Greater than	`2 > 1` or `"B" > "A"`
`>=`	Greater than or equal to	`2 >= 1` or `"B" >= "A"`
`<`	Less than	`1 < 2` or `"A" < "B"`
`<=`	Less than or equal to	`1 <= 2` or `"A" <= "B"`
`%in%`	Match operator	`"A" %in% LETTERS`

Logical operators in R

Operator	Definition	Example
`&`	AND (compares each element of vectors)	`(c(T, T, F, F) & c(T, F, T, F)) == c(T, F, F, F)`
`|`	OR (compares each element of vectors)	`(c(T, T, F, F) | c(T, F, T, F)) == c(T, T, T, F)`
`&&`	AND (for single values only; longer vectors are an error from R 4.3.0)	`(TRUE && FALSE) == FALSE`
`||`	OR (for single values only; longer vectors are an error from R 4.3.0)	`(TRUE || FALSE) == TRUE`
`!`	NOT	`(!TRUE) == FALSE`
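A few of the operators from the tables above can be tried together, as in this short sketch. The vector x holds hypothetical values chosen for illustration.

```r
x <- c(1, 5, 10)  # hypothetical values

7 %% 3         # 1: remainder after dividing 7 by 3
2^3            # 8: 2 raised to the power of 3

x > 2 & x < 8  # element-wise AND: FALSE TRUE FALSE
x < 2 | x > 8  # element-wise OR: TRUE FALSE TRUE
!(x > 2)       # NOT: TRUE FALSE FALSE

5 %in% x       # TRUE: 5 is one of the values in x

# && and || work on single values, which suits conditions in if()
if (length(x) == 3 && x[1] == 1) {
  message("x has three values and starts with 1")
}
```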

16.5 ordinal

Discrete variables that have an inherent order, such as level of education or dislike/like.

Ordinal variables are not necessarily evenly spaced: the difference between two consecutive values may be bigger or smaller than the difference between another two. For example, on a Likert scale you can assume that 3 is higher than 2, but not by how much, and you cannot assume that 2 is just as far from 1 as it is from 3. Therefore, think carefully before averaging ordinal values, since the value halfway between 2 and 4 is not necessarily 3.
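In R, ordinal data can be stored as an ordered factor, which supports order comparisons but not arithmetic. The sketch below uses hypothetical responses and level labels.

```r
# Hypothetical responses on a three-point dislike/like scale
responses <- c("like", "dislike", "neutral", "like")

liking <- factor(
  responses,
  levels = c("dislike", "neutral", "like"),
  ordered = TRUE
)

liking[1] > liking[2]  # TRUE: "like" is higher than "dislike"
max(liking)            # like

# The integer codes give the rank of each level only;
# they say nothing about the distance between levels
as.integer(liking)     # 3 1 2 3
```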

16.6 outer join

A mutating join that lets you join up rows in two tables while keeping all of the information from both tables (full_join).

The term "outer join" is more commonly used in SQL. See full_join for the R version.
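As a small sketch of what keeping all of the information means, the example below joins two hypothetical tables (scores and ages) with dplyr's full_join(). Rows that appear in only one table are kept, with NA filling the columns that come from the other table.

```r
library(dplyr)

# Hypothetical tables that share only some ids
scores <- data.frame(id = c(1, 2, 3), score = c(10, 20, 30))
ages   <- data.frame(id = c(2, 3, 4), age = c(25, 31, 47))

# Keep every id from both tables
both <- full_join(scores, ages, by = "id")

both
#   id score age
# 1  1    10  NA
# 2  2    20  25
# 3  3    30  31
# 4  4    NA  47
```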

16.7 outlier

A data point that is extremely distant from most of the other data points.

Outliers can be clear errors (e.g., a value of 1.56 cm for human height), fully random extreme values (e.g., 0.27% of values from a normal distribution are expected to be more than 3 SD from the mean), or reflect potential moderators (e.g., reaction times when paying attention versus being distracted).

The unthinking "rule" to label all data points more than 3 SD from the mean as outliers is not considered to be a good way to deal with outliers. The paper below contains useful suggestions.

Leys, C., Delacre, M., Mora, Y. L., Lakens, D., & Ley, C. (2019). How to Classify, Detect, and Manage Univariate and Multivariate Outliers, With Emphasis on Pre-Registration. International Review of Social Psychology, 32(1), 5. DOI: 10.5334/irsp.289
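The 0.27% figure above, and one weakness of the 3 SD rule, can be checked with a short sketch. The data in x are hypothetical values made up for illustration. Because an extreme value inflates both the mean and the SD, a single wild value in a small sample can never be more than 3 SD from the mean.

```r
# Proportion of a normal distribution more than 3 SD from the mean
2 * pnorm(-3)  # about 0.0027, i.e. 0.27%

# Hypothetical sample of 10 values with one obviously extreme value
x <- c(9, 10, 10, 11, 10, 9, 11, 10, 10, 100)

# Distance of each value from the mean, in SD units
z <- (x - mean(x)) / sd(x)

max(abs(z))      # about 2.85, so even the value 100 is within 3 SD
any(abs(z) > 3)  # FALSE: the 3 SD rule flags nothing here
```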
