Appendix H — Dates and Times
Working with dates and times can be a little tricky, but the lubridate package is there to help. The package website has a helpful [cheatsheet](https://rawgit.com/rstudio/cheatsheets/main/lubridate.pdf) and you can view a tutorial by typing `vignette("lubridate")` in the console pane. The Dates and Times chapter of R for Data Science also gives a helpful overview.
This appendix is a quick intro to some of the most useful functions for making reproducible reports.
H.1 Parsing
Dates can be in many formats. The `ymd()` family of functions can deal with almost all of them, regardless of the punctuation used in the format. All of the examples below produce a date in the standard format “2022-01-03”.
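As a rough sketch of the kinds of input these functions accept, each call below should return the same date. The input strings are illustrative assumptions rather than the original examples; the function name tells lubridate the order of the year, month, and day.

```r
library(lubridate)

# Illustrative inputs (assumed, not taken from the original text).
# The function name gives the order of year (y), month (m), and day (d).
parsed <- c(
  ymd("2022-01-03"),
  ymd(20220103),           # numbers work as well as strings
  ymd("2022/1/3"),
  dmy("03.01.2022"),
  mdy("January 3, 2022")   # month names are read in the current locale
)
parsed
```

If a string cannot be parsed in the stated order, the result is `NA` with a warning, so it is worth checking for missing values after parsing a whole column.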
There are similar functions for date/times, too.
[1] "2022-01-03 18:05:20 UTC"
[1] "2022-01-03 18:00:00 UTC"
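Output like the two lines above can be produced by calls along these lines. This is a minimal sketch with assumed input strings; the suffix after `ymd_` names the time components present in the input.

```r
library(lubridate)

# Assumed example inputs; the suffix lists the time parts that are present
# (_hms = hours, minutes, and seconds; _h = hours only).
dt_full <- ymd_hms("2022-01-03 18:05:20")
dt_hour <- ymd_h("2022-01-03 18")

dt_full
dt_hour
```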
The date/time functions can also take a timezone argument. If you don’t specify it, it defaults to “UTC”.
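To illustrate the timezone argument, the sketch below reads the same clock time twice, once with the default and once with `tz` set. The zone name "America/New_York" is just an example; `OlsonNames()` lists the names available on your system.

```r
library(lubridate)

# The same clock time read in two time zones (zone name is illustrative)
utc <- ymd_hms("2022-01-03 18:05:20")                         # defaults to UTC
nyc <- ymd_hms("2022-01-03 18:05:20", tz = "America/New_York")

tz(utc)
tz(nyc)

# These are different instants: 18:05 in New York is five hours after
# 18:05 UTC in January, so the difference is -5 hours
difftime(utc, nyc, units = "hours")
```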
H.2 Get Parts
You frequently need to extract parts of a date/time for plotting. The following functions extract specific parts of a date or datetime object. This is a godsend for those of us who never have a clue what week of the year it is today.
# get the date and time when this function is run
now <- now(tzone = "GMT")
# get separate parts
time_parts <- list(
  second = second(now),
  minute = minute(now),
  hour = hour(now),
  day = day(now), # day of the month (same as mday())
  wday = wday(now), # day of the week
  yday = yday(now), # day of the year
  week = week(now),
  isoweek = isoweek(now), # ISO 8601 week calendar (Monday start)
  epiweek = epiweek(now), # CDC epidemiological week (Sunday start)
  month = month(now),
  year = year(now),
  tz = tz(now)
)
str(time_parts)
List of 12
$ second : num 49.2
$ minute : int 49
$ hour : int 14
$ day : int 21
$ wday : num 4
$ yday : num 355
$ week : num 51
$ isoweek: num 51
$ epiweek: num 51
$ month : num 12
$ year : num 2022
$ tz : chr "GMT"
The `month()` and `wday()` functions can return factor labels.
jan1 <- ymd(20220101)
wday(jan1, label = TRUE)
wday(jan1, label = TRUE, abbr = TRUE)
month(jan1, label = TRUE)
month(jan1, label = TRUE, abbr = TRUE)
[1] Sat
Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat
[1] Sat
Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat
[1] Jan
12 Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < ... < Dec
[1] Jan
12 Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < ... < Dec
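Because `abbr = TRUE` is the default, both calls in each pair above return the abbreviated label. The sketch below shows how you might ask for full names with `abbr = FALSE`, and how the `week_start` argument changes which day comes first in the factor levels. The label text itself depends on your locale, so no output is shown.

```r
library(lubridate)

jan1 <- ymd(20220101)

# abbr = TRUE is the default, so ask for full names explicitly
day_name   <- wday(jan1, label = TRUE, abbr = FALSE)
month_name <- month(jan1, label = TRUE, abbr = FALSE)
day_name
month_name

# week_start = 1 puts Monday first in the factor levels
day_mon1 <- wday(jan1, label = TRUE, week_start = 1)
levels(day_mon1)
```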
H.3 Date Arithmetic
You can add and subtract dates. For example, you can get the date two weeks from today by adding `weeks(2)` to `today()`. You can probably guess how to add and subtract seconds, minutes, days, months, and years.
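A minimal sketch of this arithmetic follows. It uses a fixed date instead of `today()` so the results are reproducible; the specific dates and offsets are arbitrary choices.

```r
library(lubridate)

d <- ymd("2022-01-24")        # fixed date so the results are reproducible

two_weeks  <- d + weeks(2)    # "2022-02-07"
ten_before <- d - days(10)    # "2022-01-14"
next_month <- d + months(1)   # "2022-02-24"
last_year  <- d - years(1)    # "2021-01-24"

# Periods can be combined, and work on date/times too
dt    <- ymd_hms("2022-01-24 09:30:00")
later <- dt + hours(3) + minutes(15)   # "2022-01-24 12:45:00 UTC"
```

Swapping `d` for `today()` gives dates relative to whenever the report is rendered.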
H.3.1 First and last of month
For things like billing, you might need to find the first or last days of the current, previous, or next month. The `rollback()` and `rollforward()` functions are easier than trying to parse dates.
d <- ymd("2022-01-24")
rollback(d) # last day of the previous month
rollforward(d) # last day of the current month
rollback(d, roll_to_first = TRUE) # first day of the current month
rollforward(d, roll_to_first = TRUE) # first day of the next month
[1] "2021-12-31"
[1] "2022-01-31"
[1] "2022-01-01"
[1] "2022-02-01"
H.3.2 Rounding
You can round dates and times to the nearest unit. This can be useful when you have, for example, time measured to the nearest second, but want to group data by the nearest hour, rather than extract the hour component.
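The rounding functions in lubridate are `round_date()`, `floor_date()`, and `ceiling_date()`. The sketch below applies them to a single assumed timestamp; in practice you would use them inside `mutate()` to create a grouping column.

```r
library(lubridate)

x <- ymd_hms("2022-01-03 18:35:20")   # illustrative timestamp

nearest_hour <- round_date(x, unit = "hour")          # 19:00:00
hour_start   <- floor_date(x, unit = "hour")          # 18:00:00
hour_end     <- ceiling_date(x, unit = "hour")        # 19:00:00
nearest_15   <- round_date(x, unit = "15 minutes")    # 18:30:00
```

Unlike `hour(x)`, which returns only the number 18, the rounded values keep the date, so observations from the same hour on different days stay in separate groups.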
H.4 Internationalisation
You may need to work with dates from a different locale than your computer’s defaults, such as dates written in French or Russian. Or your computer may have a non-English locale. Set the `locale` argument to the relevant language code.
ymd("2022 January 24", locale = "en_GB")
ymd("2022 Janvier 24", locale = "fr_FR")
wday("2022-01-03", label = TRUE, locale = "ru_RU")
[1] "2022-01-24"
[1] "2022-01-24"
[1] пн
Levels: вс < пн < вт < ср < чт < пт < сб
Some of the locale functions only work on Unix-based machines, like Macs or machines running Linux.
<locale>
Numbers: 123,456.78
Formats: %AD / %AT
Timezone: UTC
Encoding: UTF-8
<date_names>
Days: Sunday (Sun), Monday (Mon), Tuesday (Tue), Wednesday (Wed), Thursday
(Thu), Friday (Fri), Saturday (Sat)
Months: January (Jan), February (Feb), March (Mar), April (Apr), May (May),
June (Jun), July (Jul), August (Aug), September (Sep), October
(Oct), November (Nov), December (Dec)
AM/PM: AM/PM
H.5 Example
Let’s work through some examples with downloaded tweets from the class data.
The `time` column is already in date/time (POSIXct) format, but what if we wanted to plot tweets by hour for each day of the week?
tweets %>%
  mutate(weekday = wday(time, label = TRUE),
         hour = hour(time)) %>%
  ggplot(aes(x = hour, fill = weekday)) +
  geom_bar(size = 1, alpha = 0.5, show.legend = FALSE) +
  facet_grid(~weekday) +
  scale_fill_manual(values = rainbow(7)) +
  scale_x_continuous(breaks = seq(0, 24, 4))
A nice side-effect of using the lubridate functions to get days of the week or months of the year is that the results are ordered factors, so they display in the correct order in a plot. Let’s display the months in Greek (if that’s available on your system).
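One way to get Greek month labels is sketched below. It uses a small made-up data frame in place of the tweets data, and the locale name "el_GR" is an assumption that may need a suffix such as ".UTF-8" on some systems. If the call fails, the code falls back to your default locale.

```r
library(lubridate)

# Hypothetical stand-in for the tweets data: one timestamp in each month
demo <- data.frame(time = ymd_hms("2022-01-15 12:00:00") + months(0:11))

# Try Greek month names; fall back to the system default if the call
# fails (the locale name "el_GR" is an assumption, see above)
demo$month <- tryCatch(
  month(demo$time, label = TRUE, locale = "el_GR"),
  error = function(e) month(demo$time, label = TRUE)
)

levels(demo$month)       # month names in calendar order
is.ordered(demo$month)   # an ordered factor, so plots keep this order
```

The same `month(time, label = TRUE, locale = ...)` expression can be used inside `mutate()` in a plotting pipeline like the one above, with the new column mapped to the x-axis.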