layout: true

<a class="footer-link" href="https://sisteranalyst.org">sisteranalyst.org</a>

---
class: title-slide, center, bottom

# Visualising Time Series

## Data Literacy in R

### Tatjana Kecojevic

---
class: freight-slide, center, middle, inverse

# .shadow-text[ecdc's covid-19 data]

**Covid-19 data** is a set of daily observations of the number of new cases and deaths in countries around the world.

## .emphasis[<https://www.ecdc.europa.eu/en/publications-data>]

---

# .shadow-text[Grammar of Graphics]

The [grammar of graphics](http://vita.had.co.nz/papers/layered-grammar.html) enables a structured way of creating a plot by adding the components as layers, making it look effective and attractive.

- data
- aesthetic mapping
- geometric object
- statistical transformations
- scales
- coordinate system
- position adjustments
- faceting

Imagine talking about baking a cake and adding a cherry on the top. 🎂🍒

This philosophy has been built into the [`ggplot2`](https://ggplot2.tidyverse.org/reference/) package by [Hadley Wickham](http://hadley.nz) for creating elegant and complex plots in R.

---

# .shadow-text[Build a plot layer by layer]

Built on the 'Grammar of Graphics' philosophy, the [`ggplot2`](https://cran.r-project.org/web/packages/ggplot2/index.html) package enables you to construct complex plots iteratively from discrete layers. Layers with different data sets and aesthetic mappings can be combined, at a high level of abstraction, into sophisticated plots that bring together data from multiple sources.

All **ggplot2 plots** begin with a call to `ggplot()`, supplying default data and aesthetic mappings, specified by `aes()`. You then add layers, scales, coords and facets with `+`.
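
As a minimal sketch of this pattern (illustrative only: it uses `economics`, a data set shipped with ggplot2, rather than the ECDC data used later in these slides), a plot can be started with `ggplot()` and then extended one component at a time:

```r
library(ggplot2)

# Illustrative only: `economics` ships with ggplot2 and is an assumption
# of this sketch, not part of the covid-19 example in the slides.
p <- ggplot(economics, aes(x = date, y = unemploy)) +  # default data and aesthetic mappings
  geom_line() +                                        # layer: geometric object
  scale_x_date(date_labels = "%Y") +                   # scale: show years on the x axis
  labs(x = "", y = "unemployed (thousands)")           # axis labels

p  # printing the object draws the plot
```

Each `+` adds one component to the plot object, so the plot can be stored, inspected and extended step by step.
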
To explore and learn more:

- [ggplot cheatsheet](https://rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf)
- [reference: layers](https://ggplot2.tidyverse.org/reference/)

.footnote[
💡 [ggplot2](https://ggplot2.tidyverse.org/) is a part of the [tidyverse](https://www.tidyverse.org/), an ecosystem of packages designed with common APIs and a shared philosophy.
]

---

# What we need to do

The best way to master it is by practising. So let us create an awesome `ggplot`. 😃

What we need to do is the following:

**i.** Read the [covid-19](https://en.wikipedia.org/wiki/COVID-19_pandemic) [ecdc](https://www.ecdc.europa.eu/en/publications-data) data.

**ii.** Wrangle the data into a format suitable for visualisation.

**iii.** "Initialise" a plot with `ggplot()`:

**ggplot(<span style="color:blue">dataframe</span>, aes(<span style="color:orangered">x = explanatory variable</span>, <span style="color:green">y = response variable</span>))**

and add a few layers to make it informative and captivating.

---
name: astroboy
background-image: url(images/astro_boy.jpg)
background-size: contain
background-color: #f6f6f6

# .emphasis[packages to install and load]

##### `install.packages("readxl")`
##### `install.packages("httr")`
##### `install.packages("lubridate")`
##### `install.packages("tidyr")`
##### `install.packages("ggplot2")`
##### `install.packages("dplyr")`

.footnote[
💡 Check the list of the packages assembled into the [tidyverse](https://www.tidyverse.org/packages/).
]

---

## **Task 1:** Get **ecdc** data

```r
library(readxl)
library(httr)
library(lubridate)
library(tidyr)
library(ggplot2)
library(dplyr)

# access ecdc data: download the spreadsheet to a temporary file, then read it
url2 <- "https://www.ecdc.europa.eu/sites/default/files/documents/COVID-19-geographic-disbtribution-worldwide-2020-05-06.xlsx"
GET(url2, write_disk(tf <- tempfile(fileext = ".xlsx")))
covid_ecdc <- read_excel(tf)
```

.footnote[
💡 Check <https://importdata2r.netlify.app> to learn how to read data in R.
]

---

## Create ex-YU data subset

```r
covid_yu <- covid_ecdc %>% 
  filter(countriesAndTerritories %in% c("Bosnia_and_Herzegovina", "Croatia", "Montenegro", 
                                        "North_Macedonia", "Serbia", "Slovenia"))
# remove unneeded columns
covid_yu <- covid_yu[, -c(2:4, 8)]
```

---

## **Task 2:** Data Wrangling

```r
covid_yu <- covid_yu %>% 
  separate(dateRep, c("dateRep"), sep = "T") %>%
  group_by(countriesAndTerritories) %>% 
  arrange(dateRep) %>% 
  mutate(total_cases = cumsum(cases), 
         total_deaths = cumsum(deaths)) %>%   # cumulative data
  mutate(Diff_cases = total_cases - lag(total_cases),
         Rate_pc_cases = round(Diff_cases/lag(total_cases) * 100, 2)) %>%   # growth rate
  mutate(second_der = Diff_cases - lag(Diff_cases)) %>%   # 2nd derivative
  rename(country = countriesAndTerritories) %>% 
  rename(country_code = countryterritoryCode)

covid_yu$dateRep <- as.Date(covid_yu$dateRep)   # set `dateRep` as date type

covid_sr <- covid_yu %>% 
  filter(country_code == "SRB")   # subset SRB data
```

---

## **Task 3:** ggplot: `\(F^{''}(x)\)` vs date of recording

.pull-left[

```r
covid_sr %>% 
  filter(!is.na(second_der)) %>% 
* ggplot(aes(x = dateRep, y = second_der))
```
]

.pull-right[
<img src="images/ggplot1.png" width="726" />
]

---

## add the time series

.pull-left[

```r
covid_sr %>% 
  filter(!is.na(second_der)) %>% 
  ggplot(aes(x = dateRep, y = second_der)) +
* geom_line()
```
]

.pull-right[
<img src="images/ggplot2.png" width="726" />
]

---

## add data points

.pull-left[

```r
covid_sr %>% 
  filter(!is.na(second_der)) %>% 
  ggplot(aes(x = dateRep, y = second_der)) +
* geom_line() +
  geom_point(col = "#00688B")
```
]

.pull-right[
<img src="images/ggplot3.png" width="726" />
]

---

## remove axis labels

.pull-left[

```r
covid_sr %>% 
  filter(!is.na(second_der)) %>% 
  ggplot(aes(x = dateRep, y = second_der)) +
  geom_line() +
  geom_point(col = "#00688B") +
* xlab("") + ylab("")
```
]

.pull-right[
<img src="images/ggplot4.png" width="726" />
]

---

## add title and caption

.pull-left[

```r
covid_sr %>% 
  filter(!is.na(second_der)) %>% 
  ggplot(aes(x = dateRep, y = second_der)) +
  geom_line() +
  geom_point(col = "#00688B") +
  xlab("") + ylab("") +
  labs(title = "2nd derivative of F(x)", 
*      caption = "Data from: https://www.ecdc.europa.eu")
```
]

.pull-right[
<img src="images/ggplot5.png" width="726" />
]

---

## use built-in [theme](https://ggplot2.tidyverse.org/reference/ggtheme.html)

.pull-left[

```r
covid_sr %>% 
  filter(!is.na(second_der)) %>% 
  ggplot(aes(x = dateRep, y = second_der)) +
  geom_line() +
  geom_point(col = "#00688B") +
  xlab("") + ylab("") +
  labs(title = "2nd derivative of F(x)", 
       caption = "Data from: https://www.ecdc.europa.eu") +
* theme_minimal()
```
]

.pull-right[
<img src="images/ggplot6.png" width="726" />
]

---

## align title and remove grid lines

.pull-left[

```r
covid_sr %>% 
  filter(!is.na(second_der)) %>% 
  ggplot(aes(x = dateRep, y = second_der)) +
  geom_line() +
  geom_point(col = "#00688B") +
  xlab("") + ylab("") +
  labs(title = "2nd derivative of F(x)", 
       caption = "Data from: https://www.ecdc.europa.eu") +
  theme_minimal() +
* theme(plot.title = element_text(size = 14, vjust = 2, hjust = 0.5),
*       panel.grid.major.x = element_blank(),
*       panel.grid.minor.x = element_blank())
```
]

.pull-right[
<img src="images/ggplot7.png" width="726" />
]

---

## add vertical reference lines

.pull-left[

```r
covid_sr %>% 
  filter(!is.na(second_der)) %>% 
  ggplot(aes(x = dateRep, y = second_der)) +
  geom_line() +
  geom_point(col = "#00688B") +
  xlab("") + ylab("") +
  labs(title = "2nd derivative of F(x)", 
       caption = "Data from: https://www.ecdc.europa.eu") +
  theme_minimal() +
  theme(plot.title = element_text(size = 14, vjust = 2, hjust = 0.5), 
        panel.grid.major.x = element_blank(), 
        panel.grid.minor.x = element_blank()) +
* geom_vline(xintercept = as.numeric(as.Date(c("2020-03-16", "2020-03-22", "2020-03-28", "2020-04-04",
*                                              "2020-04-10", "2020-04-17", "2020-04-24", "2020-04-30"))),
*            linetype = 4, colour = "red", alpha = 0.5)
```
]

.pull-right[
<img src="images/ggplot8.png" width="660" />
]

---

## annotate reference lines with text

.pull-left[

```r
ts_plot +
* annotate(geom = "text", x = as.Date("2020-03-16"), y = 150, 
*          label = "state of\nemergency", col = "dodgerblue4") +
* annotate(geom = "text", x = as.Date("2020-03-22"), y = 70, 
*          label = "curfew\n5pm-5am", col = "dodgerblue4") +
* annotate(geom = "text", x = as.Date("2020-03-28"), y = -140, 
*          label = "weekend curfew\n3pm-5am", col = "dodgerblue4") +
* annotate(geom = "text", x = as.Date("2020-04-04"), y = 220, 
*          label = "weekend curfew\n1pm Sat-5am Mon ", col = "dodgerblue4") +
* annotate(geom = "text", x = as.Date("2020-04-10"), y = -90, 
*          label = "Easter curfew\n5pm Fri-5am Mon ", col = "dodgerblue4") +
* annotate(geom = "text", x = as.Date("2020-04-17"), y = 200, 
*          label = "Easter curfew\n5pm Fri-5am Tue ", col = "dodgerblue4") +
* annotate(geom = "text", x = as.Date("2020-04-24"), y = -120, 
*          label = "weekend curfew\n5pm Fri-5am Mon ", col = "dodgerblue4") +
* annotate(geom = "text", x = as.Date("2020-04-30"), y = 160, 
*          label = "May Day curfew\n6pm Thu-5am Sat ", col = "dodgerblue4")
```
]

.pull-right[
<img src="images/ggplot9.png" width="726" />
]

.footnote[
💡 Note that the plot from the previous slide has been saved as `ts_plot`, i.e. the whole pipeline was assigned with `ts_plot <- covid_sr %>% ...`!
]

---
name: astroboy
background-image: url(images/ggplot9.png)
background-size: contain
background-color: #f6f6f6

# .shadow-text[Nice... 😃]

---
class: freight-slide, center, middle, inverse

# .shadow-text[To learn more visit: <https://dataliteracy.rbind.io>]

.emphasis[To see it in action visit: <http://covid19sr.rbind.io>]

[
@Tatjana_Kec](https://twitter.com/Tatjana_Kec) [
@TanjaKec](https://github.com/TanjaKec)
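
---

## Appendix: the wrangling arithmetic on a toy series

To make the columns created in **Task 2** easier to follow, here is a small sketch that applies the same `cumsum()` and `lag()` steps to four made-up daily counts. The data frame `toy` and its values are hypothetical and are not taken from the ECDC data.

```r
library(dplyr)

# Hypothetical toy data: four days of made-up daily case counts
toy <- tibble(cases = c(1, 3, 2, 6)) %>%
  mutate(total_cases   = cumsum(cases),                                  # 1, 4, 6, 12
         Diff_cases    = total_cases - lag(total_cases),                 # NA, 3, 2, 6
         Rate_pc_cases = round(Diff_cases / lag(total_cases) * 100, 2),  # NA, 300, 50, 100
         second_der    = Diff_cases - lag(Diff_cases))                   # NA, NA, -1, 4

toy
```

Because `total_cases` is the cumulative sum of `cases`, `Diff_cases` recovers the daily counts from the second day on, and `second_der` is the day-to-day change in those counts. The leading `NA` values are the reason the plots filter on `!is.na(second_der)`.
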