1 Point Estimates
1.1 A note on Descriptives and Inferentials
In the first part of this book we are going to look at descriptive statistics, which we use to summarise and describe data we have collected - normally from a sample of the population of interest. Remember that the population is all the members of a group that we wish to generalise our findings to, and the sample is the subset of that population we have gathered data from. We draw our sample from the population, so to speak. The population is what we determine it to be based on our research question - everyone studying Psychology at Glasgow; everyone studying Psychology in Scotland; everyone studying Psychology in the world! Most of the time, testing everyone in a population is an unrealistic goal, so we take a sample of that population and make inferences from that sample to the population using statistics. And the first step we take is to describe our sample.
Descriptive statistics can be broken down into three kinds:
- point estimates - these tend to be single values, or points, that summarise a set of data. Most common would be ones that summarise the center of the data such as the mean and the median.
- interval estimates - these tend to be a range of values that summarise a set of data. Most common would be ones that summarise the spread of the data such as the variance and the standard deviation.
- visualisations - these are figures that help display the point and interval estimates of the data and can take various forms depending on the data type.
Once we have described the data in our sample we then use inferential statistics to make predictions about, or comparisons between, our data. We infer how the general population would perform based on the sample we have collected and described. We will look at inferential statistics later in this book but first we will start with descriptive statistics, looking at point estimates first, before moving on to interval estimates.
Point Estimates
We will start by looking at three point estimates. They are:
- the Mean
- the Median
- the Mode
1.2 The Mean
The mean is a descriptive statistic that measures the average value of a set of numbers.
The symbol for the mean is generally written as \(\overline{X}\) (pronounced as X-bar) and the formula for the mean is:
\[\overline{X} = \frac{\sum_{i=1}^{n}{x_i}}{n}\]
Which reads as: sum (\(\sum\)) all the values from the first value (\(i = 1\)) to the last value (\(i = n\)) and then divide that number by the total number of values (\(n\)). Let's look at an example.
Say we are interested in how years of driving experience impact some metric of performance. We recruit 25 participants and ask them how many years they have been driving. Here are their responses:
\[7, 1, 2, 6, 3, 4, 3, 4, 3, 4, 5, 4, 7, 5, 6, 5, 5, 4, 5, 6, 5, 6, 3, 2, 5\]
If we start to fill in the information from above we get:
\[\overline{X} = \frac{\begin{array}{c} 7 + 1 + 2 + 6 + 3 + \\ 4 + 3 + 4 + 3 + 4 + \\ 5 + 4 + 7 + 5 + 6 + \\ 5 + 5 + 4 + 5 + 6 + \\ 5 + 6 + 3 + 2 + 5 \end{array}}{25}\]
So the first step is to add all the values on the top half of the formula together, which is called the numerator, giving us:
\[\overline{X} =\frac{110}{25}\]
Which can also be read as:
\[\overline{X} = 110/25\]
And if we then divide the top half of the formula, the numerator, by the bottom half, called the denominator, we would get:
\[\overline{X} = 4.4\]
And so, when using the standard APA write-up format of M = ..., we would write M = 4.4.
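The same calculation can be sketched in a few lines of Python. This is only an illustration of the formula above: the variable names (`years`, `total`, `n`, `mean`) are our own choices, and the standard-library function `statistics.mean` is used purely as a cross-check.

```python
from statistics import mean as library_mean

# Years of driving experience reported by the 25 participants
years = [7, 1, 2, 6, 3, 4, 3, 4, 3, 4, 5, 4, 7,
         5, 6, 5, 5, 4, 5, 6, 5, 6, 3, 2, 5]

total = sum(years)   # the numerator: all of the values added together
n = len(years)       # the denominator: how many values there are
mean = total / n     # the numerator divided by the denominator

print(f"Sum = {total}, n = {n}, M = {mean}")

# Cross-check against Python's built-in implementation
print(library_mean(years) == mean)
```

Writing out the sum and the count separately mirrors the numerator and denominator steps in the worked example, which makes it easier to spot where a hand calculation has gone wrong.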
1.2.1 Test Yourself - The Mean
Here are 5 sets of data to practice calculating the mean on. Calculate the mean on each one and then answer the questions below to see if you were correct.
- DataSet 1 - \(22, 7, 26, 1, 2, 12, 13, 26, 23, 29\)
- DataSet 2 - \(17, 26, 12, 30, 26, 15, 21, 24, 23, 26\)
- DataSet 3 - \(10, 24, 25, 10, 2, 24, 11, 4, 25, 9\)
- DataSet 4 - \(6, 5, 8, 10, 27, 10, 29, 15, 4, 9\)
- DataSet 5 - \(4, 27, 15, 19, 23, 22, 11, 16, 8, 29\)
- What is the mean of DataSet 1?
- What is the mean of DataSet 2?
- What is the mean of DataSet 3?
- What is the mean of DataSet 4?
- What is the mean of DataSet 5?
1.3 The Median
The median is the next point estimate we will look at and is the middle number in a distribution where half of the values in the data are larger and half are smaller. It is literally the value that divides the data in half.
The standard way of calculating the Median involves two steps:
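- Sort all values from lowest to highest (i.e. from the first value to the last value)
- The median is the value at position \(\frac{(n + 1)}{2}\). When \(n\) is even this position falls halfway between two values, and the median is the mean of those two values.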
So if we look at our driving data again, the responses were:
\[7, 1, 2, 6, 3, 4, 3, 4, 3, 4, 5, 4, 7, 5, 6, 5, 5, 4, 5, 6, 5, 6, 3, 2, 5\]
And if we sort them from lowest value to highest value we get:
\[1, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7\]
Now we need to figure out what the value would be at the median position of our dataset, where the median position is \(\frac{(n + 1)}{2}\) and we have n = 25 participants.
\[\text{Median Position} = \frac{(n + 1)}{2} = \frac{(25 + 1)}{2} = \frac{26}{2} = 13\]
Meaning that the Median Position for n = 25 is the 13th position and so the Median is the value at position 13 after we have sorted the data from smallest to largest:
\[1, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7\]
And if we count along the line until we get to the Median Position, we can see that the value in position 13 is 5, meaning that for this data the Median = 5. And so, when using the standard APA write-up format of Mdn = ..., we would write Mdn = 5.
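The two steps can also be sketched in Python. The function name `median_by_position` is hypothetical, and the branch for an even number of values follows the common convention of averaging the two middle values, which the driving example (with n = 25) does not need. The sketch assumes the list is not empty.

```python
from statistics import median as library_median


def median_by_position(values):
    """Return the median using the (n + 1) / 2 position rule."""
    ordered = sorted(values)      # Step 1: sort from lowest to highest
    n = len(ordered)
    position = (n + 1) / 2        # Step 2: median position, counting from 1

    if position.is_integer():
        # Odd n: the position lands exactly on one value.
        # Subtract 1 because Python counts list positions from 0.
        return ordered[int(position) - 1]

    # Even n: the position falls halfway between two values,
    # so take the mean of the values either side of it.
    lower = ordered[int(position) - 1]
    upper = ordered[int(position)]
    return (lower + upper) / 2


years = [7, 1, 2, 6, 3, 4, 3, 4, 3, 4, 5, 4, 7,
         5, 6, 5, 5, 4, 5, 6, 5, 6, 3, 2, 5]

print(f"Mdn = {median_by_position(years)}")

# Cross-check against Python's built-in implementation
print(library_median(years) == median_by_position(years))
```

For a data set with an even number of values, such as `[4, 1, 3, 2]`, the median position is 2.5, so the function averages the 2nd and 3rd sorted values.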
1.3.1 Test Yourself - The Median
Here are 5 sets of data to practice calculating the median on. Calculate the median on each one and then answer the questions below to see if you were correct.
- DataSet 1 - \(7, 8, 10, 10, 7, 6, 9, 6, 7, 8, 10, 6, 10\)
- DataSet 2 - \(5, 4, 3, 1, 5, 4, 3, 3, 5, 1, 3, 3, 5, 4, 2\)
- DataSet 3 - \(18, 18, 17, 39, 39, 18, 15, 15, 18, 15, 33, 17, 18, 18, 39\)
- DataSet 4 - \(3, 4, 6, 4, 6, 6, 2, 2, 3, 6, 6\)
- DataSet 5 - \(20, 20, 10, 12, 10, 14, 12, 20, 14, 14, 14, 13\)
- What is the median of DataSet 1?
- What is the median of DataSet 2?
- What is the median of DataSet 3?
- What is the median of DataSet 4?
- What is the median of DataSet 5?
1.4 The Mode
The mode is the last point estimate we will consider here and is the value or category that appears most often, most frequently, in your data set. There is no formula for the mode and it involves counting how many of each category or value you have and finding the most common one. Here are our values again:
\[7, 1, 2, 6, 3, 4, 3, 4, 3, 4, 5, 4, 7, 5, 6, 5, 5, 4, 5, 6, 5, 6, 3, 2, 5\]
And if we sort them from smallest to largest for ease of reading we get:
\[1, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7\]
And if we start to count all the different values we see that we have:
Years | n |
---|---|
1 | 1 |
2 | 2 |
3 | 4 |
4 | 5 |
5 | 7 |
6 | 4 |
7 | 2 |
Meaning, for example, we have 4 people with three years of driving experience, and 5 people with four years of driving experience. Looking at our table, we can see the most common number of years of driving experience is 5, as there are 7 people with that number of years of experience. We would therefore write Mode = 5.
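The counting can be sketched in Python using `collections.Counter` from the standard library. The variable names (`counts`, `highest`, `modes`) are our own, and returning a list of modes is a design choice of this sketch so that ties, where two or more values share the highest count, remain visible.

```python
from collections import Counter

years = [7, 1, 2, 6, 3, 4, 3, 4, 3, 4, 5, 4, 7,
         5, 6, 5, 5, 4, 5, 6, 5, 6, 3, 2, 5]

# Count how many times each value appears
counts = Counter(years)

# Print the frequency table, ordered by years of experience
print("Years | n")
for value in sorted(counts):
    print(f"{value} | {counts[value]}")

# The mode is the value (or values) with the highest count
highest = max(counts.values())
modes = sorted(value for value, count in counts.items() if count == highest)

print(f"Mode = {modes}")
```

For the driving data only one value reaches the highest count, so the list contains a single mode.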
1.4.1 Test Yourself - The Mode
Here are 5 sets of data to practice calculating the mode on. Calculate the mode on each one and then answer the questions below to see if you were correct.
- DataSet 1 - \(5, 4, 3, 1, 5, 4, 3, 3, 5, 1, 3, 3, 5, 4, 2\)
- DataSet 2 - \(7, 8, 10, 10, 7, 6, 9, 6, 7, 8, 10, 6, 10\)
- DataSet 3 - \(18, 18, 17, 39, 39, 18, 15, 15, 18, 15, 33, 17, 18, 18, 39\)
- DataSet 4 - \(20, 20, 10, 12, 10, 14, 12, 20, 14, 14, 14, 13\)
- DataSet 5 - \(3, 4, 6, 4, 6, 6, 2, 2, 3, 6, 6\)
- What is the mode of DataSet 1?
- What is the mode of DataSet 2?
- What is the mode of DataSet 3?
- What is the mode of DataSet 4?
- What is the mode of DataSet 5?
1.5 Section glossary
term | definition |
---|---|
denominator | name for the bottom half of a fraction in a formula (the number you divide by) |
descriptive | Statistics that describe an aspect of data (e.g., mean, median, mode, variance, range) |
inferential | Statistics that allow you to make predictions about or comparisons between data (e.g., t-value, F-value, rho) |
interval estimates | a range of values that summarise an aspect of a data set. Examples include the range, variance, standard deviation, standard error and confidence intervals. |
mean | A descriptive statistic that measures the average value of a set of numbers. |
median | The middle number in a distribution where half of the values are larger and half are smaller. |
numerator | name for the top half of a fraction in a formula (the number being divided) |
participant | the word used to describe someone who has taken part in a study. Note that subject is outdated and no longer used. |
point estimates | a single value that summarises an aspect of a data set. Examples include the mean, median, and the mode. |
population | All members of a group that we wish to generalise our findings to. E.g. all students taking Psychology at the University of Glasgow. We draw our testing sample from the population. |
sample | A subset of the population that you collect data from in order to make an inference about that population. |