Formative Exercise 08: GLM

The `personality_scores` dataset

Load the dataset reprores::personality_scores.

library(ggplot2) # provides ggplot() for the plots below
data("personality_scores", package = "reprores")

Question 1

Use ggplot2 to visualise the relationship between extraversion (Ex) on the horizontal axis and neuroticism (Ne) on the vertical axis.

ggplot(personality_scores, aes(x = Ex, y = Ne)) +
  geom_density_2d_filled(show.legend = FALSE, alpha = 0.5) +
  geom_smooth(method = "lm", formula = y ~ x)

Question 2

Fit a linear regression model that predicts neuroticism from extraversion, and store the model object in the variable personality_mod. End the block by printing the model summary.

personality_mod <- lm(Ne ~ Ex, data = personality_scores)

summary(personality_mod) # print out the model summary

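The summary of a fitted model bundles several quantities that can also be pulled out individually. The following is a minimal sketch of how that might look. It uses the built-in `mtcars` data as a stand-in (an assumption, so that the example runs without reprores), and the names `demo_mod`, `b0`, `b1`, `coef_table` and `r_squared` are hypothetical.

```r
# Stand-in example (assumption): predict fuel economy (mpg) from weight (wt)
demo_mod <- lm(mpg ~ wt, data = mtcars)

# coef() returns a named vector: "(Intercept)" plus one entry per predictor
b0 <- coef(demo_mod)[["(Intercept)"]]
b1 <- coef(demo_mod)[["wt"]]

# summary() returns a list whose elements can be extracted by name
demo_sum   <- summary(demo_mod)
coef_table <- demo_sum$coefficients # estimate, std. error, t value, p value
r_squared  <- demo_sum$r.squared    # proportion of variance explained

b0
b1
coef_table
r_squared
```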

Question 3

Make a histogram of the residuals of the model using ggplot2.

residuals <- residuals(personality_mod)

ggplot() +
  geom_histogram(aes(residuals), color = "black", binwidth = 0.25)

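It can help to check what the residuals actually are: each one is the observed outcome minus the value fitted by the model. The sketch below illustrates this with the built-in `mtcars` data as a stand-in (an assumption, so that it runs without reprores). The names `demo_mod` and `by_hand` are hypothetical.

```r
# Stand-in example (assumption): mpg predicted from wt in mtcars
demo_mod <- lm(mpg ~ wt, data = mtcars)

# residual = observed value - fitted value
by_hand <- mtcars$mpg - fitted(demo_mod)

# compare with residuals(); unname() drops the row names before comparing
same_values <- all.equal(unname(by_hand), unname(residuals(demo_mod)))
same_values

# with an intercept in the model, least-squares residuals average to zero
# (up to floating-point error)
resid_mean <- mean(residuals(demo_mod))
resid_mean
```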

Question 4

Write code to predict the neuroticism score for the minimum, mean, and maximum extraversion scores. Store the vector of predictions in the variable personality_pred.

scores <- data.frame(
  Ex = c(
    min(personality_scores$Ex, na.rm = TRUE),
    mean(personality_scores$Ex, na.rm = TRUE),
    max(personality_scores$Ex, na.rm = TRUE)
  ),
  # adding the row names makes the output of predict easier to read
  row.names = c("min", "mean", "max")
)

personality_pred <- predict(personality_mod, newdata = scores)

personality_pred # print the predicted values

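For a simple regression, predict() is doing nothing more than plugging each new predictor value into the fitted line, intercept + slope * x. The sketch below checks this by hand. It uses the built-in `mtcars` data as a stand-in (an assumption, so that it runs without reprores), and `demo_mod`, `new_wt`, `from_predict` and `by_hand` are hypothetical names.

```r
# Stand-in example (assumption): mpg predicted from wt in mtcars
demo_mod <- lm(mpg ~ wt, data = mtcars)

# new data must use the same column name as the predictor in the model
new_wt <- data.frame(
  wt = c(min(mtcars$wt), mean(mtcars$wt), max(mtcars$wt)),
  row.names = c("min", "mean", "max")
)

from_predict <- predict(demo_mod, newdata = new_wt)

# the same predictions computed directly from the coefficients
b <- coef(demo_mod)
by_hand <- b[["(Intercept)"]] + b[["wt"]] * new_wt$wt

from_predict
by_hand

# a least-squares line with an intercept passes through (mean of x, mean of y),
# so the prediction at the mean predictor value equals the mean outcome
at_mean <- from_predict[["mean"]]
at_mean
mean(mtcars$mpg)
```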

Simulating data from the linear model

Question 5

NOTE: You can knit this file to HTML to see formatted versions of the equations below (which are enclosed in $ characters). Alternatively, if you find it easier, hover your mouse pointer over the $ of an equation in the source to see the formatted version.

Write code to randomly generate 10 Y values from a simple linear regression model with an intercept of 3 and a slope of -7. Recall the form of the linear model:

$Y_i = \beta_0 + \beta_1 X_i + e_i$

The residuals ($e_i$) are drawn from a normal distribution with mean 0 and variance $\sigma^2 = 4$, and $X$ is the vector of integer values from 1 to 10. Store the 10 observations in the variable Yi below. (NOTE: rnorm() takes the standard deviation, not the variance, as its third argument. The standard deviation is the square root of the variance, so here $\sigma = \sqrt{4} = 2$.)

X <- 1:10
err <- rnorm(10, sd = 2)
Yi <- 3 - 7 * X + err

Yi # print the values of Yi

##  [1]  -1.964926  -9.243591 -14.011918 -24.223399 -32.112792 -37.240136 -44.785966
##  [8] -54.105806 -61.179889 -66.301258
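
One way to check a simulation like this is to fit a model to the simulated data and see whether the estimates land near the values that generated it. With only 10 observations the estimates are noisy, so the sketch below uses a larger sample. The sample size of 1000, the seed, and the names `X_sim`, `err_sim`, `Y_sim` and `sim_mod` are all assumptions made for illustration and are not part of the exercise.

```r
set.seed(8) # arbitrary seed (assumption) so the sketch is reproducible

n <- 1000                               # larger sample than the exercise asks for
X_sim <- rep(1:10, length.out = n)      # the integers 1 to 10, recycled
err_sim <- rnorm(n, mean = 0, sd = 2)   # sd = sqrt(variance) = sqrt(4)
Y_sim <- 3 - 7 * X_sim + err_sim        # intercept 3, slope -7

sim_mod <- lm(Y_sim ~ X_sim)

coef(sim_mod)  # estimates should be close to 3 and -7
sigma(sim_mod) # residual standard deviation, should be close to 2
```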