Visualisation Exercise

Let’s practice your data visualisation skills. First, you should save this file in a folder you have access to. Then, set your working directory to the folder containing this file and the datafile - remember both the .Rmd file and the datafile need to be in the same folder. If you are using the RServer then you will need to upload the .Rmd file and .csv file to the server.

Try knitting this file. The .Rmd file will currently knit because there are no errors in the code. Knitting is a good test to repeat from time to time, and before you submit any future assignments, to make sure that your code is still free of errors. This doesn’t mean that all your answers are correct; it only means that the code runs without errors.

There are a number of code chunks already set up. Some of the code chunks will require entering just a number whilst other tasks will require entering code that you have been learning up to this point. Follow the instructions of each task and pay close attention to what is asked. Do only what is asked. Do not change the names of any variables or dataframes given to you. Do not change any of the code chunk rules or names. In this exercise the names are T01 to T11. Do not change these names. Nearly all the tasks will involve replacing NULLs with either a number or code relating to wrangling and/or visualising, e.g.:

From:

number <- NULL

to

number <- 1

or, for data wrangling, from:

mydata <- NULL

to

mydata <- group_by(data, variable) %>% summarise(mean_score = mean(score))

or, for visualisation, from:

ggplot(NULL, aes(NULL, NULL)) + NULL

to

ggplot(my_data, aes(x_axis, y_axis)) + geom_something()

Note that plots are currently showing as grey squares as only the most basic information has been added. Once you add the code into the code chunks, and knit again, they will appear as proper figures.

The dataset we are using for this assignment is from the 2019 World Happiness Survey, available through the Open Data Science website Kaggle. This is a very large dataset with a number of variables. We have reduced it slightly to make it easier to work with for this assignment, and you will find the data you need in your assignment zip folder, in a file called Happiness2019.csv. If you want to explore the full dataset, you can do so later through Kaggle.

There are 11 tasks in total to attempt and answer.

Before starting let’s check:

  1. The Happiness2019.csv file is saved into a folder on your computer and you have manually set this folder as your working directory. Manually set the working directory using the Session >> Set Working Directory options. If you are using the RServer you may not have to set your working directory if the .Rmd file and .csv file are uploaded to the same directory.

  2. The .Rmd file is saved in the same folder as the Happiness2019.csv file.

  3. Explore your data a little and become familiar with the variable names using View() or glimpse(). Note any capital letters and full-stops. Type these functions only in the console and not in the exercise .Rmd file. It is imperative that you check the spelling of variables. For example, Social.Support, social.Support, and social.support are all different names. Likewise, Spain and spain are different. Check everything, as it is normally the case that only one option is correct. Being reproducible means giving people the variable names, etc, as they expect them to be. Consistency is important, so be careful.
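
As a quick console check of the point about exact spelling, the sketch below uses a small made-up tibble to show that R treats differently capitalised names and values as different. The object toy_data and its values are hypothetical and are not taken from the real dataset.

```r
library(tidyverse)

# Hypothetical toy data standing in for the real dataset
toy_data <- tibble(
  Country = c("Spain", "Ghana"),
  Social.Support = c(1.48, 1.10)
)

# glimpse() lists every column with its type and first few values
glimpse(toy_data)

# Column names are case sensitive: only the exact spelling matches
has_exact <- "Social.Support" %in% names(toy_data)  # TRUE
has_lower <- "social.support" %in% names(toy_data)  # FALSE

# The same applies to values stored in a column
n_spain_upper <- nrow(filter(toy_data, Country == "Spain"))  # 1 row
n_spain_lower <- nrow(filter(toy_data, Country == "spain"))  # 0 rows
```

Run checks like these in the console only, not in the exercise .Rmd file.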

Let’s Begin

Task 1 - Library

In the T01 code chunk, type in code that will load the tidyverse into your library.

  • hint: use library() to load the tidyverse in this code chunk
library(tidyverse)

Task 2 - Read in data

In the T02 code chunk, using read_csv, replace the NULL to read in the Happiness2019.csv dataset. Store the data in happy19.

  • hint: Use read_csv() and not read.csv()! You must use read_csv()!
  • hint: Remember the csv file should be in the same folder as the Rmd file. Do not use an absolute path; name just the csv file and not any folders.
happy19 <- read_csv("Happiness2019.csv")

Task 3 - Scatterplot 1

In the T03 code chunk, replace the NULLs to create a scatterplot depicting Happiness Score (y-axis) as a function of Social Support (x-axis). Create this as a scatterplot with a black and white themed background to the figure, and with every data point shown as a red filled dot. Each data point must be the same shade of color and not one lighter or darker than the other.

  • hint: The figure has been started for you; replace the NULLs
  • hint: No quotes on the variables names. E.g. Social.Support, not “Social.Support”
  • hint: geom_point
  • hint: color with aes or color without aes - which makes them all the same?
  • hint: ?_bw is a nice theme
ggplot(happy19, aes(x = Social.Support, y = Happiness.Score)) + geom_point(color = "red") + theme_bw()
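
The hint about color with or without aes() can be illustrated with a minimal sketch on made-up data. Here toy_data, p_fixed and p_mapped are hypothetical names used only for this illustration; layer_data() is the ggplot2 helper that returns the values a layer actually draws.

```r
library(tidyverse)

# Hypothetical toy data, not the real dataset
toy_data <- tibble(
  Social.Support = c(0.5, 1.0, 1.5),
  Happiness.Score = c(4, 5.5, 7)
)

# Color set outside aes(): a fixed setting applied to every point
p_fixed <- ggplot(toy_data, aes(x = Social.Support, y = Happiness.Score)) +
  geom_point(color = "red") +
  theme_bw()

# Color set inside aes(): "red" is treated as a data value and mapped
# through a color scale, so ggplot2 chooses the color and adds a legend
p_mapped <- ggplot(toy_data, aes(x = Social.Support, y = Happiness.Score)) +
  geom_point(aes(color = "red")) +
  theme_bw()

# Inspect the colors each layer would draw
fixed_colours <- layer_data(p_fixed)$colour
mapped_colours <- layer_data(p_mapped)$colour
```

In this sketch only p_fixed draws its points in the literal color "red"; p_mapped draws every point in a single color taken from the default scale.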

Task 4 - Interpretation 1

In the T04 code chunk, replace the NULL with the number of the statement below that is true, storing your answer in answers_t4. Using your skills of interpretation of figures, it would appear that:

  1. In 2019, in general, overall happiness scores increase as social support increases.
  2. In 2019, in general, overall happiness scores increase as social support decreases.
  3. In 2019, in general, overall happiness scores decrease as social support increases.
  4. In 2019, in general, there is no relationship between social support and happiness.
answers_t4 <- 1

Task 5 - Replicating Figures

In the T05 code chunk, replace the NULLs to exactly replicate the figure shown below depicting Happiness Score as a function of Social Support. Each individual country should be shown as a downward-pointing triangle with each region represented by a separate color (i.e. countries from the same region have the same color).

  • hint: geom_point
  • hint: color with aes or color without aes - which makes them all the same?
  • hint: the shape is less than 10
  • hint: The figure has been started for you; replace the NULLs
ggplot(happy19, 
       aes(x = Social.Support, y = Happiness.Score)) + 
  geom_point(aes(color = Region), shape = 6) + 
  theme_bw()

Task 6 - Individual panels

In the T06 code chunk, using one line of code, adapt the figure in Task 5 so that this time you have an individual scatterplot for each region. You should have 10 scatterplots in your figure, each in its own panel. Don’t worry if the names don’t fit the panel size. Keep the same theme and color as stated in Task 5, but feel free to change the shape.

  • hint: Task5 + facet_?
  • hint: The figure has been started for you; replace the NULLs
ggplot(happy19, aes(x = Social.Support, y = Happiness.Score)) + 
  geom_point(aes(color = Region), shape = 6) + 
  theme_bw() + 
  facet_wrap(~Region)
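
To see how facet_wrap() splits a figure, here is a small sketch on made-up data with three regions. The tibble toy_data and its values are hypothetical. The panel count is read from layer_data(), which includes a PANEL column for each plotted row.

```r
library(tidyverse)

# Hypothetical toy data with three regions
toy_data <- tibble(
  Region = c("North", "North", "South", "South", "East", "East"),
  Social.Support = c(1.2, 1.4, 0.8, 1.0, 1.1, 1.3),
  Happiness.Score = c(6.5, 7.0, 4.5, 5.0, 5.5, 6.0)
)

p_facets <- ggplot(toy_data, aes(x = Social.Support, y = Happiness.Score)) +
  geom_point(aes(color = Region), shape = 6) +
  theme_bw() +
  facet_wrap(~Region)

# One panel is created per distinct value of the faceting variable
n_panels <- n_distinct(layer_data(p_facets)$PANEL)
n_regions <- n_distinct(toy_data$Region)
```

With three regions in the toy data, the figure has three panels; the real data has 10 regions and therefore 10 panels.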

Task 7 - Regions with not so many countries

In the T07 code chunk, replace the NULL with three lines of code that are piped together (i.e. %>%) so that happy19_big_regions contains the mean_happiness scores for only the Regions with fewer than 8 countries in them. The three columns of happy19_big_regions MUST be titled Region, n, and mean_happiness, but they can be in any order for now. The column n must show the number of countries in each region.

  • hint: group_by
  • hint: summarise
  • hint: filter
  • hint: two pipes (%>%) for three lines
  • hint: check the names of the columns match what is asked for. Check your output!
happy19_big_regions <- group_by(happy19, Region) %>% 
  summarise(n = n(), mean_happiness = mean(Happiness.Score)) %>% 
  filter(n < 8) 

happy19_big_regions
## # A tibble: 4 x 3
##   Region                        n mean_happiness
##   <chr>                     <int>          <dbl>
## 1 Australia and New Zealand     2           7.27
## 2 Eastern Asia                  6           5.69
## 3 North America                 2           7.08
## 4 Southern Asia                 7           4.53
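
The group_by, summarise, filter pattern can be checked by hand on a small made-up tibble. In this sketch, toy_data, small_regions and the cut-off of 3 are hypothetical and chosen only so the result is easy to verify.

```r
library(tidyverse)

# Hypothetical toy data: three regions with 3, 2 and 1 countries
toy_data <- tibble(
  Region = c("North", "North", "North", "South", "South", "East"),
  Happiness.Score = c(6, 7, 8, 4, 5, 5.5)
)

small_regions <- toy_data %>%
  group_by(Region) %>%                                   # one group per region
  summarise(n = n(),                                     # countries per region
            mean_happiness = mean(Happiness.Score)) %>%  # mean score per region
  filter(n < 3)                                          # keep only small regions

small_regions
```

North has three countries and is dropped; East (n = 1, mean 5.5) and South (n = 2, mean 4.5) remain.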

Task 8 - Interpretation 2

In the T08 code chunk, replace the NULL with the number, from the list below, of the region with the lowest mean_happiness score in happy19_big_regions, storing your answer in answers_t8. For example, if you thought Central and Eastern Europe had the lowest mean happiness then you would change the NULL to 1.

  1. Central and Eastern Europe
  2. Eastern Asia
  3. Latin America and Caribbean
  4. Middle East and Northern Africa
  5. Australia and New Zealand
  6. North America
  7. Southeastern Asia
  8. Southern Asia
  9. Sub-Saharan Africa
  10. Western Europe
answers_t8 <- 8

Task 9 - Happy Neighbours Near and Far

Let’s finish by looking at the Happiness Score of some neighbouring countries and compare them to the UK. Using one line of code with two functions joined by a pipe (%>%), in the T09 code chunk, replace the NULL to first filter the five countries named below into happy19_neighs, and then select only the columns Country and Happiness.Score in that order.

Keep only the following five countries and in this order: Australia, Canada, New Zealand, United Kingdom, United States

  • hint: filter %in%
  • hint: filter() %>% select()
  • hint: if you have done this correctly your resulting tibble should have two columns (Country, Happiness.Score) and five rows, one for each country.
happy19_neighs <- filter(happy19, Country %in% c("Australia", 
                                             "Canada",
                                             "New Zealand", 
                                             "United Kingdom",
                                             "United States")) %>%
  select(Country, Happiness.Score)

happy19_neighs
## # A tibble: 5 x 2
##   Country        Happiness.Score
##   <chr>                    <dbl>
## 1 Australia                 7.23
## 2 Canada                    7.28
## 3 New Zealand               7.31
## 4 United Kingdom            7.05
## 5 United States             6.89
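
One point worth checking when filtering with %in%: the rows come back in the order they appear in the data, not in the order of the names inside c(). The sketch below shows this on made-up data. The names toy_data, wanted, kept and kept_ordered are hypothetical, and the arrange(match()) step is just one possible way to impose a chosen order, not a required part of the task.

```r
library(tidyverse)

# Hypothetical toy data, deliberately not in alphabetical order
toy_data <- tibble(
  Country = c("Canada", "France", "Australia"),
  Happiness.Score = c(7.2, 6.6, 7.1)
)

wanted <- c("Australia", "Canada")

# filter() keeps matching rows in the data's own row order
kept <- toy_data %>%
  filter(Country %in% wanted) %>%
  select(Country, Happiness.Score)

# If a specific order is needed, sort by position in the wanted vector
kept_ordered <- arrange(kept, match(Country, wanted))
```

Here kept lists Canada before Australia, while kept_ordered lists Australia first.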

Task 10 - Plotting Happy Neighbours

In the T10 code chunk, replace the NULLs so that the figure uses the happy19_neighs tibble to show a barchart depicting Happiness Score (y-axis) versus Country (x-axis). Feel free to color the bars and theme the figure if you wish, but make sure it is a barchart with one column for each country.

  • hint: which geom will you use? ?_col
  • hint: make sure that the country names are on the X axis and that you have only 5 bars in your figure - one for each country.
  • hint: The figure has been started for you; replace the NULLs
ggplot(happy19_neighs, aes(x = Country, y = Happiness.Score)) + geom_col()
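
As a sketch of why geom_col() suits this task: it draws each bar at the height of the y value in the data, whereas geom_bar() by default counts rows. The example uses made-up data; toy_neighs and p_bars are hypothetical names.

```r
library(tidyverse)

# Hypothetical toy data: one row, and so one bar, per country
toy_neighs <- tibble(
  Country = c("A", "B", "C"),
  Happiness.Score = c(7.0, 6.5, 7.5)
)

p_bars <- ggplot(toy_neighs, aes(x = Country, y = Happiness.Score)) +
  geom_col()

# The bar heights drawn by geom_col() are the scores themselves
bar_heights <- layer_data(p_bars)$y
```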

Task 11 - Interpretation 3

In the T11 code chunk, replace the NULL with the absolute difference in Happiness Score between the United Kingdom and Canada as shown in Task 9 - happy19_neighs. “Absolute difference” means to ignore whether the difference is positive or negative and enter just the value, not the sign. State the answer to two decimal places and store it in answers_t11.

You can answer this question by entering a single value (e.g. answers_t11 <- 7), by entering a sum (e.g. answers_t11 <- 10-5), or by using reproducible code. Keep in mind that the output must be a single value, so if you use code you will need the pull() function.

answers_t11 <- happy19_neighs %>% 
  filter(Country %in% c("United Kingdom", "Canada")) %>%
  select(Happiness.Score) %>% 
  mutate(diff = max(Happiness.Score) - Happiness.Score) %>%
  filter(Happiness.Score == min(Happiness.Score)) %>%
  pull(diff) %>%
  round(2)

answers_t11
## [1] 0.22
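
A shorter route to an absolute difference is to pull() each score and use abs(). This is an alternative sketch on made-up values, not the approach used above; toy_neighs, score_a, score_b and toy_diff are hypothetical names. Note also that a printed tibble rounds scores for display, so subtracting the printed values by hand can differ in the last decimal place from the value that code computes on the stored scores.

```r
library(tidyverse)

# Hypothetical toy data, not the real scores
toy_neighs <- tibble(
  Country = c("Country A", "Country B"),
  Happiness.Score = c(6.5, 7.25)
)

# pull() extracts a column as a plain vector, here a single number each
score_a <- toy_neighs %>%
  filter(Country == "Country A") %>%
  pull(Happiness.Score)

score_b <- toy_neighs %>%
  filter(Country == "Country B") %>%
  pull(Happiness.Score)

# abs() drops the sign; round() gives two decimal places
toy_diff <- round(abs(score_a - score_b), 2)
```

In this sketch toy_diff is a single number, 0.75, whichever country is subtracted from the other.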

Job Done