Generate 50 data points from a normal distribution with a mean of 0 and SD of 1.
a <- rnorm(50, 0, 1)
Generate another variable (b
) that is equal to the sum
of a
and another 50 data points from a normal distribution
with a mean of 0.5 and SD of 1.
b <- a + rnorm(50, 0.5, 1)
Run a two-tailed, paired-samples t-test comparing a
and
b
.
t <- t.test(a, b, paired = TRUE) |> print()
##
## Paired t-test
##
## data: a and b
## t = -4.4433, df = 49, p-value = 5.059e-05
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## -0.8573186 -0.3233440
## sample estimates:
## mean difference
## -0.5903313
Use at least 1e4 replications to estimate power accurately.
Calculate power for a two-tailed t-test with an alpha (sig.level) of .05 for detecting a difference between two independent samples of 50 with an effect size of 0.3.
Hint: You can use the sim_t_ind
function from the T-Test
Class Notes.
sim_t_ind <- function(n, m1, sd1, m2, sd2) {
v1 <- rnorm(n, m1, sd1)
v2 <- rnorm(n, m2, sd2)
t_ind <- t.test(v1, v2, paired = FALSE)
return(t_ind$p.value)
}
my_reps <- replicate(1e4, sim_t_ind(50, 0, 1, 0.3, 1))
alpha <- 0.05
power.sim <- mean(my_reps < alpha) |> print()
## [1] 0.319
Compare this to the result of power.t.test
for the same
design.
power.analytic <- power.t.test(n = 50,
delta = 0.3,
sig.level = 0.05,
type = "two.sample")$power
power.analytic # prints the value
## [1] 0.3175171
Modify the sim_t_ind
function to handle different sample
sizes. Use it to calculate the power of the following design:
sim_t_ind <- function(n1, m1, sd1, n2, m2, sd2,
alternative = "two.sided") {
v1 <- rnorm(n1, m1, sd1)
v2 <- rnorm(n2, m2, sd2)
t_ind <- t.test(v1, v2,
alternative = alternative,
paired = FALSE)
return(t_ind$p.value)
}
my_reps <- replicate(1e4, sim_t_ind(20, 10, 4, 30, 13, 4.5))
power6 <- mean(my_reps < 0.05)
power6 # prints the value
## [1] 0.671
Do noisy environments slow down reaction times for a dot-probe task? Calculate power for a one-tailed t-test with an alpha (sig.level) of .005 for detecting a difference of at least 50ms, where participants in the quiet condition have a mean reaction time of 800ms. Assume both groups have 80 participants and both SD = 100.
my_reps <- replicate(1e4, sim_t_ind(80, 800, 100, 80, 850, 100, "less"))
power7 <- mean(my_reps < 0.005)
power7 # prints the value
## [1] 0.7031
The [poisson distribution(https://en.wikipedia.org/wiki/Poisson_distribution) is good for modeling the rate of something, like the number of texts you receive per day. Then you can test hypotheses like you receive more texts on weekends than weekdays. The poisson distribution gets more like a normal distribution when the rate gets higher, so it’s most useful for low-rate events.
Use ggplot
to create a histogram of 1000 random numbers
from a poisson distribution with a lambda
of 4. Values can
only be integers, so set an appropriate binwidth.
lambda <- 4 # lambda sets the mean of the poisson distribution
pois <- rpois(1000, lambda)
ggplot() +
geom_histogram(aes(pois), fill="white", color="black", binwidth = 1)
Demonstrate to yourself that the binomial distribution looks like the normal distribution when the number of trials is greater than 10.
Hint: You can calculate the equivalent mean for the normal
distribution as the number of trials times the probability of success
(binomial_mean <- trials * prob
) and the equivalent SD
as the square root of the mean times one minus probability of success
(binomial_sd <- sqrt(binomial_mean * (1 - prob))
).
n <- 1e5 # use a large n to get good estimates of the distributions
trials <- 50
prob <- 0.8
binomial_mean <- trials * prob
binomial_sd <- sqrt(binomial_mean * (1 - prob))
# sample numbers from binomial and normal distributions
norm_bin_comp <- tibble (
normal = rnorm(n, binomial_mean, binomial_sd),
binomial = rbinom(n, trials, prob)
) |>
gather("distribution", "value", normal:binomial)
# plot the sampled numbers to compare
ggplot(norm_bin_comp, aes(value, color=distribution)) +
geom_freqpoly(binwidth = 1)
Remember, there are many, many ways to do things in R. The important thing is to create your functions step-by-step, checking the accuracy at each step.
Write a function to create a pair of variables of any size with any
specified correlation. Have the function return a tibble with columns
X1
and X2
. Make sure all of the arguments have
a default value.
Hint: modify the code from the Bivariate Normal section from the class notes.
bvn2 <- function(n = 10, rho = 0, m1 = 0, m2 = 0, sd1 = 1, sd2 = 1) {
# n = number of random samples
# rho = population correlation between the two variables
mu <- c(m1, m2) # the means of the samples
stdevs <- c(sd1, sd2) # the SDs of the samples
# correlation matrix
cor_mat <- matrix(c( 1, rho,
rho, 1), 2)
# create the covariance matrix
sigma <- (stdevs %*% t(stdevs)) * cor_mat
# sample from bivariate normal distribution
m <- MASS::mvrnorm(n, mu, sigma)
# convert to a tibble to make it easier to deal with in further steps
colnames(m) <- c("X1", "X2")
as_tibble(m)
}
Use the function for create a table of 10 pairs of values with the default values for other parameters.
dat10 <- bvn2(n = 10) |> print() # prints the table for default arguments
## # A tibble: 10 × 2
## X1 X2
## <dbl> <dbl>
## 1 -0.113 1.82
## 2 -2.07 0.331
## 3 2.91 -0.356
## 4 -0.899 -0.508
## 5 -1.48 -0.609
## 6 -0.352 0.475
## 7 1.88 0.381
## 8 1.02 0.405
## 9 0.426 1.55
## 10 -0.576 0.102
Use faux::rnorm_multi()
to make the same table as
above.
dat_faux <- rnorm_multi(n = 10, vars = 2) |> print()
## X1 X2
## 1 -0.05610811 1.3828794
## 2 -1.32204027 1.3588010
## 3 -0.60711341 0.2402648
## 4 0.08218137 -0.4738538
## 5 1.05465450 -1.8098317
## 6 -0.31603522 -0.5331290
## 7 0.05029671 -0.4963899
## 8 0.52761529 -2.7835103
## 9 2.38606969 0.9769623
## 10 0.92189126 -0.2379080
Use cor.test
to test the Pearson’s product-moment
correlation between two variables generated with either your function,
having an n
of 50 and a rho
of 0.45.
my_vars <- bvn2(50, 0.45)
my_cor <- cor.test(my_vars$X1, my_vars$X2)
Test your function from Question 10 by calculating the correlation
between your two variables many times for a range of values and plotting
the results. Hint: the purrr::map()
functions might be
useful here.
# set up all values you want to test
sims_bvn2 <- crossing(
rep = 1:1000, # how many replications per combination
n = c(10, 100, 1000),
rho = c(-0.5, 0, .5)
) |>
mutate(r = purrr::map2_dbl(n, rho, function(n, rho) {
b <- bvn2(n = n, rho = rho)
cor(b$X1, b$X2)
}),
n = as.factor(n), # factorise for plotting
rho = as.factor(rho) # factorise for plotting
)
ggplot(sims_bvn2, aes(r, color=rho)) +
geom_density() +
facet_wrap(~n, ncol = 1, scales = "free_y", labeller = label_both)
Compare that to the same test and plot for
rnorm_multi()
.
# set up all values you want to test
sims_faux <- crossing(
rep = 1:1000, # how many replications per combination
n = c(10, 100, 1000),
rho = c(-0.5, 0, .5)
) |>
mutate(r = purrr::map2_dbl(n, rho, function(n, rho) {
b <- rnorm_multi(n = n, vars = 2, r = rho)
cor(b$X1, b$X2)
}),
n = as.factor(n), # factorise for plotting
rho = as.factor(rho) # factorise for plotting
)
ggplot(sims_faux, aes(r, color=rho)) +
geom_density() +
facet_wrap(~n, ncol = 1, scales = "free_y", labeller = label_both)
Make a new function that calculates power for cor.test
through simulation.
power.cor.test <- function(reps = 1000, n = 50, rho = 0.5, alpha = 0.05, method = "pearson"){
power <- tibble(
data = rerun(reps, bvn2(n = n, rho = rho))
# alternatively --
#data = rerun(reps, rnorm_multi(n = n, vars = 2, r = rho))
) |>
mutate(power = map(data, function(data) {
cor.test(data$X1, data$X2, method = method) |>
broom::tidy()
})) |>
unnest(power) |>
mutate(sig = p.value < alpha)
tibble(
n = n,
rho = rho,
alpha = alpha,
power = mean(power$sig),
method = method,
reps = reps
)
}
power_cor <- power.cor.test(1000, 30, 0.5, method = "spearman") |> print()
## # A tibble: 1 × 6
## n rho alpha power method reps
## <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
## 1 30 0.5 0.05 0.763 spearman 1000