13 Multiple Linear Regression
Let's say we want to make a prediction about the following four people who have these scores on the five OCEAN scales:
Openness | Conscientousness | Extraversion | Agreeableness | Neuroticism | |
---|---|---|---|---|---|
Participant 1 | 9 | 8 | 6 | 6 | 5 |
Participant 2 | 8 | 9 | 5 | 5 | 6 |
Participant 3 | 5 | 6 | 8 | 8 | 9 |
Participant 4 | 5 | 6 | 9 | 9 | 8 |
And let's say we have this output from the multiple linear regression model we created using all of the five OCEAN scales as predictors:
Unstandardised Coefficient | t-value | p-value | |
---|---|---|---|
Intercept | 197.65 | 22.43 | .0001 |
Openness | -5.28 | -7.68 | .0001 |
Conscientiousness | -5.68 | -7.44 | .0001 |
Extraversion | -0.4 | -0.65 | .522 |
Agreeableness | 0.64 | 0.69 | .495 |
Neuroticism | 1.89 | 2.56 | .014 |
--- | --- | --- | --- |
Multiple R2 | .7619 | Adjusted R2 | .7348 |
F-statistic | 28.16 on 5 and 44 DF | p-value | .001 |
13.1 Making a Prediction
Now based on the information we have gained above we can actually use that to make predictions about a person we have not measured based on the formula:
ˆY=b0+b1X1+b2X2+b3X3+b4X4+b5X5
Well technically it is
ˆY=b0+b1X1+b2X2+b3X3+b4X4+b5X5+error
But we will disregard the error for now. Also you might have noticed the small hat about the Y making it ˆY (pronounced Y-hat). This means we are making a prediction of Y (ˆY) as opposed to an actually measured (i.e. observed) value (Y).
And to break that formula down a bit we can say:
- b0 is the intercept of the model. In this case b0 = 197.65
- b1X1, for example, is read at the non-standardised coeffecient of the first predictor b1 multiplied by the measured value of the first predictor X1
- b1, b2, b3, b4, b5 are the non-standardised coefficient values of the different predictors in the model. There is one non-standardised coefficient for each predictor. For example, let's say Openness is our first predictor and as such b1 = -5.28
- X1, X2, X3, X4, X5 are the measured values of the different predictors for a participant. For example, for Participant 1, again assuming Openness is our first predictor, then X1 = 9. If Conscientousness is our second predictor, Extraversion our third predictor, Agreeableness our fourth predictor, and Neuroticism our fifth predictor, then X2 = 8, X3 = 6, X4 = 6 and X5 = 5.
Then, using the information above, we know can start to fill in the information for Participant 1 as follows:
ˆY=197.65+(−5.28×9)+(−5.68×8)+(−0.4×6)+(0.64×6)+(1.89×5)
And if we start to work that through, dealing with the multiplications first, we see:
ˆY=197.65+(−47.52)+(−45.44)+(−2.4)+(3.84)+(9.45)
Which then becomes:
ˆY=115.58
Giving a predicted value of ˆY = 115.58, to two decimal places.
Likewise for Participant 2 we would have:
ˆY=197.65+(−5.28×8)+(−5.68×9)+(−0.4×5)+(0.64×5)+(1.89×6)
And if we start to work that through, dealing with the multiplications first, we see:
ˆY=197.65+(−42.24)+(−51.12)+(−2)+(3.2)+(11.34)
Which then becomes:
ˆY=116.83
Giving a predicted value of ˆY = 116.83, to two decimal places.
Likewise for Participant 3 we would have:
ˆY=197.65+(−5.28×5)+(−5.68×6)+(−0.4×8)+(0.64×8)+(1.89×9)
And if we start to work that through, dealing with the multiplications first, we see:
ˆY=197.65+(−26.4)+(−34.08)+(−3.2)+(5.12)+(17.01)
Which then becomes:
ˆY=156.1
Giving a predicted value of ˆY = 156.1, to two decimal places.
And finally for Participant 4 we would have:
ˆY=197.65+(−5.28×5)+(−5.68×6)+(−0.4×9)+(0.64×9)+(1.89×8)
And if we start to work that through, dealing with the multiplications first, we see:
ˆY=197.65+(−26.4)+(−34.08)+(−3.6)+(5.76)+(15.12)
Which then becomes:
ˆY=154.45
Giving a predicted value of ˆY = 154.45, to two decimal places.
And so in summary, based on our above model, and the above measured values, we would predict the following values:
- Participant 1, ˆY = 115.58
- Participant 2, ˆY = 116.83
- Participant 3, ˆY = 156.1
- Participant 4, ˆY = 154.45