End hunger, achieve food security and improved nutrition and promote sustainable agriculture

Setup Code
library(tidyverse)
library(readxl)
theme_set(theme_bw())

Source: The Global Hunger Index

Each country’s GHI score is calculated based on a formula that combines four indicators (undernourishment, child stunting, child wasting, and child mortality) that together capture the multidimensional nature of hunger.

Read more about the methods and measures.

2.1 Simplified Subsets

2.1.1 Clean Data

This dataset needs to be cleaned a bit before you can use it. This can be a good exercise for students, or you can use the pre-cleaned version below.

Some of the things to fix are:

  • the desired data is on the third sheet
  • the original column names are not ideal
  • the spreadsheet has notes below the data
  • missing values are represented by “—”
  • numeric columns contain text cells like “<5”
  • some numeric values import like “35.700000000000003”
  • the country names are unlikely to match other data

Here, we choose to use the value of 5 for all cells with “<5”, but you might choose another solution.

Data Cleaning Code
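# column names to use in place of the spreadsheet's own headers
cn <- c("country", "2000", "2008", "2016", "2024")
ghi_2024 <- readxl::read_xlsx("data/02/ghi-2024.xlsx", 
                             sheet = 3, range = "A4:E139", 
                             col_names = cn, na = "—") |>
  # go long so all four year columns can be cleaned in one step
  pivot_longer(-country) |>
  # replace "<5" with 5, then convert the text to numbers rounded to 1 decimal
  mutate(value = ifelse(value == "<5", "5", value),
         value = as.numeric(value) |> round(1)) |>
  # back to one column per year
  pivot_wider()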
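
If you would rather not hard-code 5 for the “<5” cells, here is a minimal sketch of two alternatives on toy values (not real GHI data): keep a flag recording which cells were censored, and compare the upper bound with the midpoint of the 0 to 5 range. The column names below_5, ghi_upper and ghi_mid are made up for this illustration.

```r
library(dplyr)
library(tibble)

# Toy values mimicking the raw text cells (not real GHI data)
raw <- tibble(
  country = c("A", "B", "C", "D"),
  value   = c("35.700000000000003", "<5", NA, "12.4")
)

cleaned <- raw |>
  mutate(
    # record which cells were "<5" before replacing them
    below_5   = !is.na(value) & value == "<5",
    # option 1: the upper bound, as used in the text
    ghi_upper = as.numeric(if_else(below_5, "5", value)) |> round(1),
    # option 2: the midpoint of the 0-5 range
    ghi_mid   = as.numeric(if_else(below_5, "2.5", value)) |> round(1)
  )

cleaned
```

Keeping the flag means you can later drop or re-code the censored cells without going back to the spreadsheet.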

One thing generative AI is pretty good at is helping with tedious rote tasks. We need to get the 3-letter country code for all of the countries. We can use the code below to get a quoted and comma-separated list of the country names, and ask genAI:

Give me the 3-letter ISO country code for each country in this list. Return the data in R vector format, like c('AFG', 'ALB', …)

Get Country Names
paste0("'", ghi_2024$country, "'", collapse = ", ") |> cat()
'Afghanistan', 'Albania', 'Algeria', 'Angola', 'Argentina', 'Armenia', 'Azerbaijan', 'Bahrain', 'Bangladesh', 'Belarus', 'Benin', 'Bhutan', 'Bolivia (Plurinat. State of)', 'Bosnia & Herzegovina', 'Botswana', 'Brazil', 'Bulgaria', 'Burkina Faso', 'Burundi', 'Cabo Verde', 'Cambodia', 'Cameroon', 'Central African Republic', 'Chad', 'Chile', 'China', 'Colombia', 'Comoros', 'Congo (Republic of)', 'Costa Rica', 'Côte d'Ivoire', 'Croatia', 'Dem. Rep. of the Congo', 'Djibouti', 'Dominican Republic', 'Ecuador', 'Egypt', 'El Salvador', 'Equatorial Guinea', 'Eritrea', 'Estonia', 'Eswatini', 'Ethiopia', 'Fiji', 'Gabon', 'Gambia', 'Georgia', 'Ghana', 'Guatemala', 'Guinea', 'Guinea-Bissau', 'Guyana', 'Haiti', 'Honduras', 'Hungary', 'India', 'Indonesia', 'Iran (Islamic Republic of)', 'Iraq', 'Jamaica', 'Jordan', 'Kazakhstan', 'Kenya', 'Korea (DPR)', 'Kuwait', 'Kyrgyzstan', 'Lao PDR', 'Latvia', 'Lebanon', 'Lesotho', 'Liberia', 'Libya', 'Lithuania', 'Madagascar', 'Malawi', 'Malaysia', 'Maldives', 'Mali', 'Mauritania', 'Mauritius', 'Mexico', 'Moldova (Rep. of)', 'Mongolia', 'Montenegro', 'Morocco', 'Mozambique', 'Myanmar', 'Namibia', 'Nepal', 'Nicaragua', 'Niger', 'Nigeria', 'North Macedonia', 'Oman', 'Pakistan', 'Panama', 'Papua New Guinea', 'Paraguay', 'Peru', 'Philippines', 'Qatar', 'Romania', 'Russian Federation', 'Rwanda', 'Saudi Arabia', 'Senegal', 'Serbia', 'Sierra Leone', 'Slovakia', 'Solomon Islands', 'Somalia', 'South Africa', 'South Sudan', 'Sri Lanka', 'Sudan', 'Suriname', 'Syrian Arab Republic', 'Tajikistan', 'Tanzania (United Rep. of)', 'Thailand', 'Timor-Leste', 'Togo', 'Trinidad & Tobago', 'Tunisia', 'Türkiye', 'Turkmenistan', 'Uganda', 'Ukraine', 'United Arab Emirates', 'Uruguay', 'Uzbekistan', 'Venezuela (Boliv. Rep. of)', 'Viet Nam', 'Yemen', 'Zambia', 'Zimbabwe'

ChatGPT returned the following, which I added as a new column to the data:

Add Country Codes
ghi_2024$country_code <- c(
  'AFG', 'ALB', 'DZA', 'AGO', 'ARG', 'ARM', 'AZE', 'BHR', 'BGD', 'BLR', 'BEN', 'BTN',
  'BOL', 'BIH', 'BWA', 'BRA', 'BGR', 'BFA', 'BDI', 'CPV', 'KHM', 'CMR', 'CAF', 'TCD',
  'CHL', 'CHN', 'COL', 'COM', 'COG', 'CRI', 'CIV', 'HRV', 'COD', 'DJI', 'DOM', 'ECU',
  'EGY', 'SLV', 'GNQ', 'ERI', 'EST', 'SWZ', 'ETH', 'FJI', 'GAB', 'GMB', 'GEO', 'GHA',
  'GTM', 'GIN', 'GNB', 'GUY', 'HTI', 'HND', 'HUN', 'IND', 'IDN', 'IRN', 'IRQ', 'JAM',
  'JOR', 'KAZ', 'KEN', 'PRK', 'KWT', 'KGZ', 'LAO', 'LVA', 'LBN', 'LSO', 'LBR', 'LBY',
  'LTU', 'MDG', 'MWI', 'MYS', 'MDV', 'MLI', 'MRT', 'MUS', 'MEX', 'MDA', 'MNG', 'MNE',
  'MAR', 'MOZ', 'MMR', 'NAM', 'NPL', 'NIC', 'NER', 'NGA', 'MKD', 'OMN', 'PAK', 'PAN',
  'PNG', 'PRY', 'PER', 'PHL', 'QAT', 'ROU', 'RUS', 'RWA', 'SAU', 'SEN', 'SRB', 'SLE',
  'SVK', 'SLB', 'SOM', 'ZAF', 'SSD', 'LKA', 'SDN', 'SUR', 'SYR', 'TJK', 'TZA', 'THA',
  'TLS', 'TGO', 'TTO', 'TUN', 'TUR', 'TKM', 'UGA', 'UKR', 'ARE', 'URY', 'UZB', 'VEN',
  'VNM', 'YEM', 'ZMB', 'ZWE'
)
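
Because the codes are attached by position, a few cheap structural checks are worth running before trusting the new column. This is only a sketch: country and codes below are short stand-ins for ghi_2024$country and the pasted vector.

```r
# Hypothetical stand-ins for ghi_2024$country and the pasted codes
country <- c("Afghanistan", "Albania", "Algeria")
codes   <- c("AFG", "ALB", "DZA")

checks <- c(
  same_length = length(codes) == length(country),   # one code per country
  no_repeats  = !anyDuplicated(codes),              # no code used twice
  well_formed = all(grepl("^[A-Z]{3}$", codes))     # three capital letters each
)

checks
```

These checks only catch structural slips, such as a dropped or repeated entry. They cannot tell you whether a code belongs to the right country.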

Since I never blindly trust genAI, I’m going to sanity-check this against the country codes from a trusted source, and visually inspect every row where the country name in ghi_2024 differs from the name attached to that code in the trusted list.

Check Country Codes
countries <- read_csv("data/countries.csv", show_col_types = FALSE)

check <- left_join(ghi_2024, countries, by = c(country_code = "alpha-3")) |>
  select(country, country_code, name)

filter(check, country != name)
country                       country_code  name
Bolivia (Plurinat. State of)  BOL           Bolivia, Plurinational State of
Bosnia & Herzegovina          BIH           Bosnia and Herzegovina
Congo (Republic of)           COG           Congo
Dem. Rep. of the Congo        COD           Congo, Democratic Republic of the
Iran (Islamic Republic of)    IRN           Iran, Islamic Republic of
Korea (DPR)                   PRK           Korea, Democratic People’s Republic of
Lao PDR                       LAO           Lao People’s Democratic Republic
Moldova (Rep. of)             MDA           Moldova, Republic of
Tanzania (United Rep. of)     TZA           Tanzania, United Republic of
Trinidad & Tobago             TTO           Trinidad and Tobago
Venezuela (Boliv. Rep. of)    VEN           Venezuela, Bolivarian Republic of

You can see that they’re all just alternative ways of writing the country name.
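
One caveat with this kind of check: filter() drops rows where the condition evaluates to NA, so a code with no match in the lookup table (and therefore a missing name) would not show up in the comparison. Here is a small sketch on toy data, where the “Somewhere”/“XXX” row is invented to mimic an unmatched code.

```r
library(dplyr)
library(tibble)

# Toy version of `check`; the last row mimics a code with no match in the lookup table
check_toy <- tibble(
  country      = c("Albania", "Lao PDR", "Somewhere"),
  country_code = c("ALB", "LAO", "XXX"),
  name         = c("Albania", "Lao People's Democratic Republic", NA)
)

# The plain comparison silently drops the unmatched row ("Somewhere" != NA is NA)
name_differs <- filter(check_toy, country != name)

# Adding is.na(name) keeps unmatched codes in view as well
to_inspect <- filter(check_toy, is.na(name) | country != name)

to_inspect
```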

This dataset is good for practicing data reshaping from wide to long. You can also join it with other datasets that have 3-letter ISO country codes.

Table 2.1: The first rows of ghi_2024
country 2000 2008 2016 2024 country_code
Afghanistan 49.6 35.7 27.1 30.8 AFG
Albania 16.0 15.5 6.2 7.9 ALB
Algeria 14.5 11.0 8.5 6.7 DZA
Angola 63.8 42.7 25.9 26.6 AGO
Argentina 6.6 5.4 5.2 6.6 ARG
Armenia 19.2 11.7 6.4 5.1 ARM
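
As a rough sketch of the joining mentioned above, the code below matches three rows from Table 2.1 to a second table by country code. The other table, its iso3 column and its region values are all made up for illustration.

```r
library(dplyr)
library(tibble)

# Three rows from Table 2.1 (2024 column only)
ghi_toy <- tibble(
  country      = c("Afghanistan", "Albania", "Algeria"),
  country_code = c("AFG", "ALB", "DZA"),
  `2024`       = c(30.8, 7.9, 6.7)
)

# Hypothetical second dataset keyed by ISO alpha-3 code; the values are invented
other <- tibble(
  iso3   = c("AFG", "DZA", "ZZZ"),
  region = c("region_1", "region_2", "region_3")
)

# Keep every GHI row and attach `region` where the codes match
joined <- left_join(ghi_toy, other, by = c(country_code = "iso3"))

# List the GHI rows that found no partner in `other`
unmatched <- anti_join(ghi_toy, other, by = c(country_code = "iso3"))
```

Joining on the code rather than the country name avoids the spelling differences seen in the check above.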

2.1.2 Long format

The original data is in wide format, with one column per year. We can also provide the data in long format. It is almost always better to have data in long format for things like wrangling and plotting.

Code to create ghi_long
ghi_long <- ghi_2024 |>
  pivot_longer(-c(country, country_code), 
               names_to = "year",
               names_transform = as.integer,
               values_to = "ghi")

Table 2.2: The first rows of ghi_long
country country_code year ghi
Afghanistan AFG 2000 49.6
Afghanistan AFG 2008 35.7
Afghanistan AFG 2016 27.1
Afghanistan AFG 2024 30.8
Albania ALB 2000 16.0
Albania ALB 2008 15.5

Plot Code
ggplot(ghi_long, aes(x = as.factor(year), y = ghi)) +
  geom_violin(fill = "#DDA63A") +
  scale_y_continuous(breaks = seq(0, 100, 10), expand = c(0,0)) +
  coord_cartesian(ylim = c(0, 70)) +
  labs(x = NULL, y = "Global Hunger Index")

Figure 2.1: The distribution of the global hunger index for four years
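
Long format also keeps grouped summaries short. Here is a minimal sketch using only the two countries shown in Tables 2.1 and 2.2; the object names ghi_long_toy, by_year and change are just for this example.

```r
library(dplyr)
library(tibble)

# Afghanistan and Albania, values taken from Table 2.1
ghi_long_toy <- tibble(
  country      = rep(c("Afghanistan", "Albania"), each = 4),
  country_code = rep(c("AFG", "ALB"), each = 4),
  year         = rep(c(2000L, 2008L, 2016L, 2024L), times = 2),
  ghi          = c(49.6, 35.7, 27.1, 30.8, 16.0, 15.5, 6.2, 7.9)
)

# Median GHI per year (na.rm = TRUE because the full data has missing values)
by_year <- ghi_long_toy |>
  group_by(year) |>
  summarise(median_ghi = median(ghi, na.rm = TRUE))

# Change from 2000 to 2024 for each country
change <- ghi_long_toy |>
  group_by(country) |>
  summarise(change = ghi[year == 2024] - ghi[year == 2000])
```

In wide format, the same summaries would need the year columns named one by one.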

2.1.3 Indicators

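The second sheet of the Excel file contains the indicators used to calculate the GHI. This also needs a fair bit of cleaning: the code below skips the extra columns between the wasting and stunting measures and replaces “< 2.5” with 2.5.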

Code
cn <- c("country", 
        "undernourishment_2000", 
        "undernourishment_2008", 
        "undernourishment_2016", 
        "undernourishment_2024", 
        "wasting_2000", NA,
        "wasting_2008",  NA,
        "wasting_2016",  NA,
        "wasting_2024",  NA,
        "stunting_2000",  NA,
        "stunting_2008",  NA,
        "stunting_2016",  NA,
        "stunting_2024",  NA,
        "mortality_2000", 
        "mortality_2008", 
        "mortality_2016", 
        "mortality_2024")
ct <- c("text",
        rep("text", 4),
        rep(c("text", "skip"), 8),
        rep("text", 4))
  
ghi_indicators <- readxl::read_xlsx(
  "data/02/ghi-2024.xlsx",
  sheet = 2, range = "B4:Z139", na = "—", 
  col_names = cn, col_types = ct) |>
  pivot_longer(-country) |>
  separate(name, c("indicator", "year")) |>
  mutate(value = ifelse(value == "< 2.5", "2.5", value),
         value = as.numeric(value) |> round(1)) |>
  pivot_wider(names_from = indicator)

This dataset is in intermediate format (long for years and wide for measures) and is good for teaching about data reshaping and multivariate correlations.

Table 2.3: The first rows of ghi_indicators
country year undernourishment wasting stunting mortality
Afghanistan 2000 46.0 8.9 54.4 13.2
Afghanistan 2008 25.1 7.2 50.8 9.6
Afghanistan 2016 20.5 5.1 38.2 7.0
Afghanistan 2024 30.4 3.6 44.6 5.8
Albania 2000 4.9 6.5 32.8 2.7
Albania 2008 7.4 9.6 23.2 1.6
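
For reshaping practice, the intermediate format can be pushed fully long and brought back again. This is only a sketch on the first two rows of Table 2.3; year is kept as text here on the assumption that separate() leaves it that way, and ind_toy, ind_long and ind_wide are names invented for the example.

```r
library(tidyr)
library(tibble)

# First two rows of Table 2.3
ind_toy <- tibble(
  country          = c("Afghanistan", "Afghanistan"),
  year             = c("2000", "2008"),
  undernourishment = c(46.0, 25.1),
  wasting          = c(8.9, 7.2),
  stunting         = c(54.4, 50.8),
  mortality        = c(13.2, 9.6)
)

# Fully long: one row per country, year and indicator
ind_long <- ind_toy |>
  pivot_longer(undernourishment:mortality,
               names_to = "indicator",
               values_to = "value")

# And back to one column per indicator
ind_wide <- ind_long |>
  pivot_wider(names_from = indicator, values_from = value)
```

The fully long version is handy for faceting a plot by indicator, while the wide version suits scatterplots of one indicator against another.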

Plot Code
ggplot(ghi_indicators, aes(x = undernourishment, y = mortality)) +
  facet_wrap(~ year) +
  geom_point(alpha = 0.5) +
  theme(strip.background = element_rect(fill = "#DDA63A"))

Figure 2.2: The relationship between undernourishment and child mortality for four years of data across all countries
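
To put a number on the relationship in each panel, one option is a correlation per year. Here is a sketch on made-up values (not real GHI data), with one missing cell to show how incomplete pairs can be handled; toy and cors are names invented for the example.

```r
library(dplyr)
library(tibble)

# Made-up values for two years, with one missing undernourishment value
toy <- tibble(
  year             = rep(c("2000", "2024"), each = 4),
  undernourishment = c(40, 5, 20, NA, 30, 4, 12, 8),
  mortality        = c(13, 3, 8, 5, 6, 1, 3, 2)
)

cors <- toy |>
  group_by(year) |>
  summarise(
    # number of complete pairs behind each correlation
    n = sum(!is.na(undernourishment) & !is.na(mortality)),
    # Pearson correlation using complete pairs only
    r = cor(undernourishment, mortality, use = "complete.obs")
  )

cors
```

Reporting n next to r makes it clear how many countries each correlation rests on once missing values are dropped.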

2.2 Resources