
13 Multiple Linear Regression

Let's say we want to make a prediction about the following four people who have these scores on the five OCEAN scales:

                Openness  Conscientiousness  Extraversion  Agreeableness  Neuroticism
Participant 1   9         8                  6             6              5
Participant 2   8         9                  5             5              6
Participant 3   5         6                  8             8              9
Participant 4   5         6                  9             9              8

And let's say we have this output from the multiple linear regression model we created using all of the five OCEAN scales as predictors:

                    Unstandardised Coefficient  t-value  p-value
Intercept           197.65                      22.43    .0001
Openness            -5.28                       -7.68    .0001
Conscientiousness   -5.68                       -7.44    .0001
Extraversion        -0.4                        -0.65    .522
Agreeableness       0.64                        0.69     .495
Neuroticism         1.89                        2.56     .014

Multiple R²: .7619    Adjusted R²: .7348
F-statistic: 28.16 on 5 and 44 DF, p-value: .001
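
As a quick sanity check on the output above, the Adjusted R² and the F-statistic can be recomputed from the Multiple R² and the degrees of freedom using the standard formulas. This is only a sketch: the sample size of 50 is inferred from the reported degrees of freedom (5 and 44) rather than stated in the text, and the variable names are illustrative.

```python
# Values copied from the model output above.
r_squared = 0.7619
n_predictors = 5      # numerator degrees of freedom
df_residual = 44      # denominator degrees of freedom

# Assumption: n is inferred as residual DF + number of predictors + 1 (intercept).
n = df_residual + n_predictors + 1

# Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / df_residual

# F = (R^2 / p) / ((1 - R^2) / (n - p - 1))
f_statistic = (r_squared / n_predictors) / ((1 - r_squared) / df_residual)

print(n)                             # 50
print(round(adjusted_r_squared, 4))  # 0.7348
print(round(f_statistic, 2))         # 28.16
```

Both recomputed values agree with the reported output to the precision shown.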

13.1 Making a Prediction

Based on the model output above, we can now make predictions about a person we have not measured, using the formula:

$$\hat{Y} = b_0 + b_1X_1 + b_2X_2 + b_3X_3 + b_4X_4 + b_5X_5$$

Technically, the full model for an observed value Y also includes an error term:

$$Y = b_0 + b_1X_1 + b_2X_2 + b_3X_3 + b_4X_4 + b_5X_5 + error$$

But we will disregard the error for now. You might also have noticed the small hat above the Y, making it $\hat{Y}$ (pronounced Y-hat). This means we are making a prediction of Y ($\hat{Y}$) as opposed to an actually measured (i.e. observed) value (Y).

And to break that formula down a bit we can say:

  • $b_0$ is the intercept of the model. In this case $b_0 = 197.65$
  • $b_1X_1$, for example, is read as the unstandardised coefficient of the first predictor, $b_1$, multiplied by the measured value of the first predictor, $X_1$
  • $b_1$, $b_2$, $b_3$, $b_4$, $b_5$ are the unstandardised coefficient values of the different predictors in the model. There is one unstandardised coefficient for each predictor. For example, let's say Openness is our first predictor and as such $b_1 = -5.28$
  • $X_1$, $X_2$, $X_3$, $X_4$, $X_5$ are the measured values of the different predictors for a participant. For example, for Participant 1, again assuming Openness is our first predictor, then $X_1 = 9$. If Conscientiousness is our second predictor, Extraversion our third predictor, Agreeableness our fourth predictor, and Neuroticism our fifth predictor, then $X_2 = 8$, $X_3 = 6$, $X_4 = 6$ and $X_5 = 5$.
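
The breakdown above can also be expressed as a short Python sketch. The function name `predict` and the dictionary layout are illustrative choices, not part of the original analysis; the coefficient values are copied from the model output.

```python
# Unstandardised coefficients copied from the model output.
INTERCEPT = 197.65  # b0
COEFFICIENTS = {    # b1 ... b5, one per predictor
    "Openness": -5.28,
    "Conscientiousness": -5.68,
    "Extraversion": -0.40,
    "Agreeableness": 0.64,
    "Neuroticism": 1.89,
}


def predict(scores):
    """Return Y-hat = b0 + b1*X1 + ... + b5*X5 (error term disregarded)."""
    return INTERCEPT + sum(
        COEFFICIENTS[name] * scores[name] for name in COEFFICIENTS
    )


# Measured values X1 ... X5 for Participant 1.
participant_1 = {
    "Openness": 9,
    "Conscientiousness": 8,
    "Extraversion": 6,
    "Agreeableness": 6,
    "Neuroticism": 5,
}

print(round(predict(participant_1), 2))  # 115.58
```

Keying both the coefficients and the scores by predictor name means each coefficient is always paired with the correct measured value, whatever order the predictors are listed in.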

Then, using the information above, we can now start to fill in the values for Participant 1 as follows:

$$\hat{Y} = 197.65 + (-5.28 \times 9) + (-5.68 \times 8) + (-0.4 \times 6) + (0.64 \times 6) + (1.89 \times 5)$$

And if we start to work that through, dealing with the multiplications first, we see:

$$\hat{Y} = 197.65 + (-47.52) + (-45.44) + (-2.4) + (3.84) + (9.45)$$

Which then becomes:

$$\hat{Y} = 115.58$$

Giving a predicted value of $\hat{Y} = 115.58$, to two decimal places.
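
The same two steps for Participant 1, multiplications first and then the sum, can be sketched in Python. The list names `b` and `x` are illustrative. Note that the signs of the coefficients carry through to the products, so the three negative coefficients reduce the prediction.

```python
b0 = 197.65                            # intercept
b = [-5.28, -5.68, -0.40, 0.64, 1.89]  # coefficients b1 ... b5
x = [9, 8, 6, 6, 5]                    # Participant 1's scores X1 ... X5

# Step 1: deal with the multiplications first (rounded to 2 decimal places).
products = [round(bi * xi, 2) for bi, xi in zip(b, x)]
print(products)  # [-47.52, -45.44, -2.4, 3.84, 9.45]

# Step 2: add the intercept and the products together.
y_hat = round(b0 + sum(products), 2)
print(y_hat)  # 115.58
```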

Likewise for Participant 2 we would have:

$$\hat{Y} = 197.65 + (-5.28 \times 8) + (-5.68 \times 9) + (-0.4 \times 5) + (0.64 \times 5) + (1.89 \times 6)$$

And if we start to work that through, dealing with the multiplications first, we see:

$$\hat{Y} = 197.65 + (-42.24) + (-51.12) + (-2) + (3.2) + (11.34)$$

Which then becomes:

$$\hat{Y} = 116.83$$

Giving a predicted value of $\hat{Y} = 116.83$, to two decimal places.

Likewise for Participant 3 we would have:

$$\hat{Y} = 197.65 + (-5.28 \times 5) + (-5.68 \times 6) + (-0.4 \times 8) + (0.64 \times 8) + (1.89 \times 9)$$

And if we start to work that through, dealing with the multiplications first, we see:

$$\hat{Y} = 197.65 + (-26.4) + (-34.08) + (-3.2) + (5.12) + (17.01)$$

Which then becomes:

$$\hat{Y} = 156.10$$

Giving a predicted value of $\hat{Y} = 156.10$, to two decimal places.

And finally for Participant 4 we would have:

$$\hat{Y} = 197.65 + (-5.28 \times 5) + (-5.68 \times 6) + (-0.4 \times 9) + (0.64 \times 9) + (1.89 \times 8)$$

And if we start to work that through, dealing with the multiplications first, we see:

$$\hat{Y} = 197.65 + (-26.4) + (-34.08) + (-3.6) + (5.76) + (15.12)$$

Which then becomes:

$$\hat{Y} = 154.45$$

Giving a predicted value of $\hat{Y} = 154.45$, to two decimal places.

In summary, based on the model and the measured values above, we would predict the following values:

  • Participant 1, $\hat{Y} = 115.58$
  • Participant 2, $\hat{Y} = 116.83$
  • Participant 3, $\hat{Y} = 156.10$
  • Participant 4, $\hat{Y} = 154.45$
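
All four predictions can be reproduced in one pass with a short Python sketch. The names `participants` and `predictions` are illustrative; the scores and coefficients are copied from the tables at the start of the chapter, with predictors in the order Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism.

```python
b0 = 197.65                            # intercept
b = [-5.28, -5.68, -0.40, 0.64, 1.89]  # coefficients b1 ... b5

# Scores X1 ... X5 for each participant, in the same predictor order as b.
participants = {
    "Participant 1": [9, 8, 6, 6, 5],
    "Participant 2": [8, 9, 5, 5, 6],
    "Participant 3": [5, 6, 8, 8, 9],
    "Participant 4": [5, 6, 9, 9, 8],
}

# Y-hat = b0 + b1*X1 + ... + b5*X5 for every participant.
predictions = {
    name: b0 + sum(bi * xi for bi, xi in zip(b, scores))
    for name, scores in participants.items()
}

for name, y_hat in predictions.items():
    print(f"{name}: {y_hat:.2f}")
# Participant 1: 115.58
# Participant 2: 116.83
# Participant 3: 156.10
# Participant 4: 154.45
```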