Mirai Solutions :: Live COVID-19 Swiss vaccination analysis

A new live R Shiny application in our gallery: COVID-19 vaccination breakthroughs in Switzerland.

We previously wrote two articles (on 20/10/2021 and 06/12/2021) about COVID-19 vaccination breakthroughs in Switzerland, promising to publish updates periodically on our site. We have decided instead to turn this analysis into a live dashboard article integrated in our gallery, which reads the data from BAG (the Swiss Federal Office of Public Health) every day so that it always reports the most up-to-date vaccination figures.

Compared to the previous articles, the Vaccinated group is now split into 3 categories to account for the addition of Booster vaccinations:

Fully Vaccinated with Booster
Fully Vaccinated without Booster
Partially Vaccinated

The categories above are compared against the Unvaccinated group to evaluate the vaccination benefit.

Hospitalization and Death rates within the 4 populations are compared to assess which group is more at risk. The following measures are shown in the article:

Hospitalizations and Deaths counts
Hospitalizations and Deaths counts per 100’000 people
Ratio of the per 100’000 counts between the Unvaccinated and Vaccinated groups
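To make the three measures concrete, here is a minimal sketch in R. The numbers and the group labels `unvaccinated` / `vaccinated` are invented for illustration and are not BAG figures; only the column names `entries` and `pop` mirror the BAG data shown later in this post.

```r
# Toy numbers, invented purely for illustration (not BAG data).
toy <- data.frame(
  group   = c("unvaccinated", "vaccinated"),  # hypothetical labels
  entries = c(120, 45),                       # hospitalizations in the period
  pop     = c(1500000, 4500000)               # people in each group
)

# Counts per 100'000 people in each group
toy$per_100k <- toy$entries / toy$pop * 1e5

# Ratio of the unvaccinated rate to the vaccinated rate
rate <- setNames(toy$per_100k, toy$group)
ratio_unvacc_vs_vacc <- unname(rate["unvaccinated"] / rate["vaccinated"])

print(toy)
print(ratio_unvacc_vs_vacc)
```

A ratio above 1 means that, relative to the size of each group, the unvaccinated group had more hospitalizations than the vaccinated one.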

Rather than focusing on the content of the article, in this post we would like to describe the process and architecture of the deployment that allows us to:

update data constantly in a controlled way
use interactive Shiny components in an R Markdown document
use shinyapps.io for hosting the live version of the article
safely deploy with a process orchestrated by CI/CD workflow using GitHub Actions.

For a better illustration and understanding, the source code is publicly available in our GitHub repository covid19-vaccination-ch.

Reading BAG data

We are interested in collecting the weekly BAG reports about vaccination breakthroughs.

Thanks to the well-maintained data documentation, we can easily identify what we want to read. The R package jsonlite is all we need to read from the exposed API.

bag_api_url <- "https://www.covid19.admin.ch/api/data/context/"
bag_sources <- jsonlite::fromJSON(bag_api_url)
str(bag_sources, max.level = 2, strict.width = "cut")
## List of 3
##  $ sourceDate : chr "2022-03-08T06:04:50.000+01:00"
##  $ dataVersion: chr "20220308-cyc99ifc"
##  $ sources    :List of 6
##   ..$ comment   : chr "OpenData DCAT-AP-CH metadata is now available as well"..
##   ..$ opendata  :List of 3
##   ..$ schema    :List of 2
##   ..$ readme    : chr "https://www.covid19.admin.ch/api/data/documentation/"
##   ..$ zip       :List of 2
##   ..$ individual:List of 2

The object bag_sources is an R list containing all links to the JSON sources mentioned in the documentation. As an example, the code below shows how to read weekly Hospitalizations by vaccination status for different age classes, which can be found in ...$sources$individual$json$weekly$byAge$hospVaccPersons.

source_weekly_by_age <- bag_sources$sources$individual$json$weekly$byAge
str(source_weekly_by_age, strict.width = "cut")
## List of 8
##  $ cases           : chr "https://www.covid19.admin.ch/api/data/20220308-cyc"..
##  $ casesVaccPersons: chr "https://www.covid19.admin.ch/api/data/20220308-cyc"..
##  $ hosp            : chr "https://www.covid19.admin.ch/api/data/20220308-cyc"..
##  $ hospReason      : chr "https://www.covid19.admin.ch/api/data/20220308-cyc"..
##  $ hospVaccPersons : chr "https://www.covid19.admin.ch/api/data/20220308-cyc"..
##  $ death           : chr "https://www.covid19.admin.ch/api/data/20220308-cyc"..
##  $ deathVaccPersons: chr "https://www.covid19.admin.ch/api/data/20220308-cyc"..
##  $ test            : chr "https://www.covid19.admin.ch/api/data/20220308-cyc"..

source_weekly_hosp_by_age_vacc <- source_weekly_by_age$hospVaccPersons
weekly_hosp_by_age_vacc <- jsonlite::fromJSON(source_weekly_hosp_by_age_vacc)
str(weekly_hosp_by_age_vacc, strict.width = "cut")
## 'data.frame':	3828 obs. of  14 variables:
##  $ date                : int  202104 202104 202104 202104 202104 202104 20210..
##  $ altersklasse_covid19: chr  "0 - 9" "0 - 9" "0 - 9" "0 - 9" ...
##  $ vaccination_status  : chr  "fully_vaccinated" "partially_vaccinated" "not"..
##  $ entries             : int  0 0 6 2 0 0 1 1 0 0 ...
##  $ sumTotal            : int  0 0 6 2 0 0 1 1 0 0 ...
##  $ pop                 : int  6 10 880571 NA 51 988 852336 NA 419 7590 ...
##  $ inz_entries         : num  0 0 0.68 NA 0 0 0.12 NA 0 0 ...
##  $ geoRegion           : chr  "CHFL" "CHFL" "CHFL" "CHFL" ...
##  $ type                : chr  "COVID19Hosp" "COVID19Hosp" "COVID19Hosp" "COV"..
##  $ type_variant        : chr  "vaccine" "vaccine" "vaccine" "vaccine" ...
##  $ vaccine             : chr  "all" "all" "all" "all" ...
##  $ data_completeness   : chr  "limited" "limited" "limited" "limited" ...
##  $ version             : chr  "2022-03-08_06-04-50" "2022-03-08_06-04-50" "2"..
##  $ timeframe_all       : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...

For our purposes we must also read the Infections per age group and the Deaths per vaccination status and age group. They are available from other elements of the bag_sources list.

The article should show the data from the latest weekly reports from BAG, updated as of today and then aggregated over the past 4 weeks.
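The 4-week aggregation can be sketched as follows. This is not the package's actual implementation: it assumes, as in the structure printed above, that `date` is an integer of the form year followed by week number, and it uses a small invented data frame instead of the real BAG data.

```r
# Toy weekly data with the same column names as the BAG data frame above.
# The values are invented for illustration only.
toy_weekly <- data.frame(
  date = rep(c(202205L, 202206L, 202207L, 202208L, 202209L), each = 2),
  vaccination_status = rep(c("fully_vaccinated", "partially_vaccinated"),
                           times = 5),
  entries = c(9, 1, 8, 2, 7, 3, 6, 4, 5, 5)
)

# Year-week integers sort chronologically, so the last 4 unique values
# are the 4 most recent reporting weeks.
last_weeks <- utils::tail(sort(unique(toy_weekly$date)), 4)

# Keep only those weeks and sum the entries per vaccination status.
recent <- toy_weekly[toy_weekly$date %in% last_weeks, ]
agg <- aggregate(entries ~ vaccination_status, data = recent, FUN = sum)

print(agg)
```

Selecting the weeks from the data itself, rather than from the system date, keeps the result tied to what BAG has actually published.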

The R package `covid19vaccinationch`

The source code is structured as an R package called covid19vaccinationch. The R Markdown article (inst/report/index.Rmd) is part of the installed package and utilizes its functions.

The package can be installed locally by executing

remotes::install_github("miraisolutions/covid19-vaccination-ch")

and exposes a function run_report() that renders index.Rmd via rmarkdown::run() and generates the HTML report

covid19vaccinationch::run_report()

The data are stored in 3 RDS files in the inst/bag_data source folder, and installed alongside the R Markdown article as part of the package. covid19vaccinationch uses renv to control the set of package dependencies.

Data update and CI/CD

BAG releases new data every day around 1:30pm CET/CEST. Each daily update can also include delayed reports of older cases from past weeks, which changes the results of our article. The data therefore need to be queried from the source every day, so that the report always shows the most up-to-date figures. At the same time, we would like to avoid repeating the data reading and processing steps on every page load, so that the report loads faster for users.

The package contains a function build_data() that constructs the 3 main data sets required by the article storing them in inst/bag_data as RDS files.
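The idea of pre-building the data and storing it as RDS files can be sketched like this. The helper `save_data_sets()` and the data set names are hypothetical and only illustrate the pattern; the real build_data() lives in the package repository.

```r
# Hypothetical helper: write each element of a named list to <dir>/<name>.rds
save_data_sets <- function(data_sets, dir) {
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  paths <- file.path(dir, paste0(names(data_sets), ".rds"))
  for (i in seq_along(data_sets)) {
    saveRDS(data_sets[[i]], paths[i])
  }
  invisible(paths)
}

# Toy stand-ins for the 3 data sets (names and values are assumptions).
toy_sets <- list(
  cases = data.frame(date = 202209L, entries = 100L),
  hosp  = data.frame(date = 202209L, entries = 10L),
  death = data.frame(date = 202209L, entries = 1L)
)

# Write to a temporary folder instead of inst/bag_data for this sketch.
out_dir <- file.path(tempdir(), "bag_data")
paths <- save_data_sets(toy_sets, out_dir)

# The report only has to read the prepared files back.
hosp_back <- readRDS(file.path(out_dir, "hosp.rds"))
```

Reading a small RDS file is much cheaper than querying the API and reshaping the data on every page load.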

The GitHub Actions workflow (defined in .github/workflows/workflow.yml) executes build_data() on the main branch every weekday at 1PM UTC (GitHub Actions scheduling is based on UTC time). If new data from the past weeks are found, the updated RDS files are pushed to the repository, making the latest data available to the deployed application. We must also consider that a non-backwards-compatible change in the BAG data structure may break the rendering of the article. For this reason, whenever new data are introduced, the package is checked as part of the Continuous Integration / Deployment GitHub Actions workflow before the data are pushed to the repository. If the new data break the package, the “R CMD check” step of the workflow fails and prevents any deployment to shinyapps.io, so the report keeps showing the latest working data until the package has been made compatible with the new data structure.

The main steps executed sequentially by the workflow are:

Execute covid19vaccinationch::build_data() on schedule to fetch and build updated data
Continuous Integration: tests via R CMD check, verifying that new data are compatible and work as expected
Continuous Deployment upon successful R CMD check:
- Commit and push RDS files if changes are found
- Deploy to shinyapps.io
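The two decisions in these steps, "is the new data still compatible?" and "did anything change?", can be sketched in plain R. Both helpers below are hypothetical and are not the package's actual tests; the required column names are taken from the data frame structure shown earlier.

```r
# Columns the report is assumed to rely on (taken from the str() output above).
required_cols <- c("date", "altersklasse_covid19", "vaccination_status",
                   "entries", "pop")

# Hypothetical compatibility check: fail loudly if a column has disappeared.
check_structure <- function(df, required = required_cols) {
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing expected columns: ", paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

# Hypothetical change detection: TRUE if there is no stored file yet
# or the stored data differ from the freshly built data.
has_changed <- function(new_data, rds_path) {
  !file.exists(rds_path) || !identical(new_data, readRDS(rds_path))
}

# Toy usage with invented values.
good <- data.frame(date = 202209L, altersklasse_covid19 = "0 - 9",
                   vaccination_status = "fully_vaccinated",
                   entries = 0L, pop = 6L)
rds_path <- tempfile(fileext = ".rds")

check_structure(good)
changed_before <- has_changed(good, rds_path)  # no file stored yet
saveRDS(good, rds_path)
changed_after <- has_changed(good, rds_path)   # same data already stored
```

In a CI setting, an error from a check like `check_structure()` inside the package tests is what would make R CMD check fail and stop the deployment.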

Going into more detail, the steps “Fetch and rebuild latest BAG data” and “Commit and push updated BAG data” of the GitHub Actions workflow react to a schedule event:

on:
  schedule:
    - cron: "0 13 * * 1-5"  # 13:00 UTC, which corresponds to 14:00 CET (15:00 CEST)

The 5 required cron fields define the minute, hour, day of month, month and day of week of the scheduled event, where an * places no constraint on that field. Our schedule triggers the workflow at 1PM UTC every day excluding Saturday and Sunday (6 and 0 in cron), when BAG provides no update. More patterns can be created with the schedule event, see the corresponding guide.
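As a small illustration of how the five fields line up, the expression can be split and labelled in R, and the UTC trigger time converted to Swiss local time. The field labels are our own naming, and the example date is simply the data version date shown earlier in this post.

```r
cron <- "0 13 * * 1-5"

# Split the expression and label the 5 fields in cron order.
fields <- setNames(
  strsplit(cron, " ", fixed = TRUE)[[1]],
  c("minute", "hour", "day_of_month", "month", "day_of_week")
)
print(fields)

# A run triggered at 13:00 UTC on Tuesday 8 March 2022 ...
run_utc <- as.POSIXct("2022-03-08 13:00:00", tz = "UTC")

# ... shown in Swiss local time (CET in winter, CEST in summer).
run_local <- format(run_utc, "%H:%M", tz = "Europe/Zurich")
print(run_local)

# Day of week as cron counts it (0 = Sunday, ..., 6 = Saturday).
run_wday <- as.POSIXlt(run_utc)$wday
is_scheduled_day <- run_wday %in% 1:5
```

Because the schedule is fixed in UTC, the local trigger time shifts by one hour when Switzerland switches between CET and CEST.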

Rendering R Markdown

The article contains both ggplot2 / plotly graphs and shiny interactive charts (the line plots). R Markdown allows using Shiny widgets to create an interactive report using runtime: shiny. However, this requires the full re-rendering of the document (including the non-interactive parts) for each user session, and can therefore result in slow performance.

The special runtime: shiny_prerendered is available since v1.2 of rmarkdown and has major performance advantages compared to runtime: shiny. Using shiny_prerendered makes it possible to separate the rendering of UI elements and the loading / manipulation of data from the interactive server logic for end users. As a result, most of the code is run only once when the document is (pre-)rendered (R Markdown, UI elements, data caching) and only some code is run for every user interaction (Shiny server logic).
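To show what this split looks like in a document, the R script below writes a minimal prerendered R Markdown file to a temporary folder. It is only a sketch and not the article's index.Rmd: the title, the widget, the toy data and the output name `trend` are all invented, while the chunk option `context` with the values "data" and "server" is the mechanism rmarkdown provides for prerendered documents.

```r
# Build the chunk delimiter programmatically to keep this script easy to embed.
fence <- strrep("`", 3)

rmd_lines <- c(
  "---",
  "title: \"Prerendered sketch\"",
  "output: html_document",
  "runtime: shiny_prerendered",
  "---",
  "",
  paste0(fence, "{r, context=\"data\", include=FALSE}"),
  "# Runs once at pre-render time; the objects are made available to the server",
  "toy <- data.frame(week = 1:10, entries = (1:10)^2)",
  fence,
  "",
  paste0(fence, "{r, echo=FALSE}"),
  "# Default render context: the UI is built once, when the document is rendered",
  "shiny::sliderInput(\"n\", \"Weeks to show\", min = 2, max = 10, value = 5)",
  "shiny::plotOutput(\"trend\")",
  fence,
  "",
  paste0(fence, "{r, context=\"server\"}"),
  "# Runs for every user session",
  "output$trend <- shiny::renderPlot(plot(utils::head(toy, input$n), type = \"l\"))",
  fence
)

rmd_path <- file.path(tempdir(), "index.Rmd")
writeLines(rmd_lines, rmd_path)

# Serving the document requires the rmarkdown and shiny packages:
# rmarkdown::run(rmd_path)
```

Only the chunk marked with the server context is executed per session; the data and UI chunks are evaluated when the document is pre-rendered.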

Deployment to shinyapps.io

Deploying to shinyapps.io usually requires an app.R file in the project directory that runs the Shiny app. However, it is also possible to deploy an Rmd file called index.Rmd, which shinyapps.io recognizes and serves as the default document for the directory (see documentation).
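A deployment step along these lines could be sketched with the rsconnect package. This is not the workflow's actual deployment script: the function name `deploy_report()`, the environment variable names and the default application name are assumptions made for illustration, while `rsconnect::setAccountInfo()` and `rsconnect::deployApp()` are the standard rsconnect calls for authenticating and deploying.

```r
# Hypothetical wrapper around rsconnect; credentials are read from
# environment variables (for example set from CI secrets). The variable
# names below are assumptions.
deploy_report <- function(app_dir,
                          account  = Sys.getenv("SHINYAPPS_ACCOUNT"),
                          token    = Sys.getenv("SHINYAPPS_TOKEN"),
                          secret   = Sys.getenv("SHINYAPPS_SECRET"),
                          app_name = "covid19-vaccination-ch") {
  # shinyapps.io serves index.Rmd as the default document of the directory,
  # so refuse to deploy a folder that does not contain it.
  stopifnot(file.exists(file.path(app_dir, "index.Rmd")))

  rsconnect::setAccountInfo(name = account, token = token, secret = secret)
  rsconnect::deployApp(appDir = app_dir, appName = app_name, account = account)
}

# Example call (not run here, it needs valid credentials and rsconnect):
# deploy_report("inst/report")
```

Keeping the credentials in environment variables avoids committing them to the repository.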

Conclusion

We have provided a public repository showing an example of how to safely deploy to shinyapps.io the automated analysis of COVID-19 vaccination breakthroughs in Switzerland, by means of an R package containing an R Markdown document and up-to-date data. We have highlighted the benefits of using the shiny_prerendered runtime for R Markdown, and of programmatically fetching / updating the data as part of a GitHub Actions CI/CD workflow. The goals are to reduce the loading time of the page and to always have the latest compatible data available in a controlled fashion.