Automated Reporting on Lacking Rainfall in Germany

Climate change causes severe disturbances in what we used to call a more or less stable climate. Data journalists are, thus, increasingly focusing on quantifying the effects of the climate crisis. This is an example on how to do so.

Yannik yannikbuhl.de
06-13-2021

Climate change causes severe disturbances in what we used to call a more or less stable climate. Germany, just as many other countries around the globe, suffers from an increasing lack of rainfall, which in turn causes situations close to what is called a draught. Data journalists are, thus, increasingly focusing on quantifying the effects of the climate crisis. When I used to work as an Editor at Stuttgarter Zeitung, I contributed to this journalistic goal by writing and automating a script that would help us tell the readers which region of Baden-Württemberg, Germany, has suffered from the longest absence of rainfall.

It is a more or less easy way of automating reports on the climate’s effects on our weather. In the following, I demonstrate how my script looks like. The data come from the German National Weather Service (DWD, Deutscher Wetterdienst).

First, what you need is a bunch of packages:

library(pacman)
pacman::p_load(rdwd, magrittr, dplyr, here, bit64, lubridate, 
               readxl, purrr, backports, remotes)
remotes::install_github("munichrocker/DatawRappr")
library(DatawRappr)

Second, you need a list of all the DWD stations (including their IDs) that provide the desired measures (in this case, amount of rainfall, but this could also be temperature, etc.):

# Set path
path <- "_posts/2021-06-13-automated-reporting-on-lacking-rainfall-in-germany/"

# Get stations
stationen <- readxl::read_excel(here::here(path, "stationen_bw.xlsx"))
head(stationen)
# A tibble: 6 x 6
  stations_id stationshoehe stationsname          lufttemperatur  wind
        <dbl>         <dbl> <chr>                          <dbl> <dbl>
1          11           680 Donaueschingen Lande…              0     1
2        1013           309 Dogern                             0     1
3        1346          1490 Feldberg/Schwarzwald               0     1
4        1443           237 Freiburg                           0     1
5        1468           797 Freudenstadt                       0     1
6        1490           394 Friedrichshafen                    0     1
# … with 1 more variable: niederschlag <dbl>

For an analysis on rainfall, we need to extract those stations that measure it:

# Get all stations
niederschlag <- stationen$stations_id[stationen$niederschlag == 1]

In case you want to automate this script, create a folder where you want to store the data that the {rdwd} package downloads:

# Check if folder for zip data exists & create it if necessary
if (dir.exists(here::here(path, "wetter")) == FALSE) {
  dir.create(here::here(path, "wetter"))}

After these initial steps, we can start to download and process recent rainfall data from all stations involved. I download the data on a daily resolution first. Doing so, we can simply count the days since the last rainfall:

# Get URLs for DWD zip files
download <- rdwd::selectDWD(id = niederschlag, 
                            res = "daily", 
                            outvec = TRUE, 
                            var = "more_precip", 
                            per = "recent")

# Download actual zip files and extract data as lists
res1 <- rdwd::dataDWD(url = download, 
                      dir = here::here(path, "wetter"), 
                      force = TRUE, 
                      quiet = TRUE, 
                      overwrite = TRUE)

# Create general data frame from all lists
res1 %>% purrr::map_dfr(as.data.frame) -> res2
res2 %>% filter(MESS_DATUM > as.POSIXct("2020-01-01")) -> res2

# Process and find last day of niederschlag
results <- res2 %>% group_by(STATIONS_ID) %>% 
  mutate(MESS_DATUM = as.POSIXct(MESS_DATUM)) %>% 
  filter(RS > 0) %>% 
  summarise(niederschlag = last(RS), time = last(MESS_DATUM)) %>% 
  mutate(days = as.integer(Sys.Date() - as.Date(time))) %>% 
  arrange(desc(days)) %>% 
  rename(stations_id = STATIONS_ID) %>% 
  mutate(time = format(time, format = "%d.%m.%Y"))

# Check if folder with zip files exists and delete if present
if (dir.exists(here::here(path, "wetter")) == TRUE) {
      unlink(here::here(path, "wetter"), recursive = TRUE)
}

What is necessary now is to check whether there has been rainfall on this present day. If so, we have to set the counter of days without rain to 0:

recent1 <- rdwd::selectDWD(id = niederschlag, 
                           res = "10_minutes", 
                           outvec = TRUE, 
                           var = "precipitation", 
                           per = "now")

if (dir.exists(here::here(path, "wetter")) == FALSE) {
  dir.create(here::here(path, "wetter"))}

# Download actual zip files and extract data as lists
recent2 <- rdwd::dataDWD(url = recent1, 
                         dir = here::here(path, "wetter"), 
                         force = TRUE, 
                         quiet = TRUE, 
                         overwrite = TRUE)

# Create general data frame from all lists
recent2 %>% purrr::map_dfr(as.data.frame) -> recent3

# Delete unused columns and delete all data before yesterday to prevent errors
recent3 %>% filter(MESS_DATUM > Sys.Date() - 1) %>% 
  dplyr::select(STATIONS_ID, MESS_DATUM, RWS_10) -> recent3

# Wrangle data frame with weather data to get last value
plausible <- recent3 %>% group_by(STATIONS_ID) %>% 
  summarise(sum = sum(RWS_10)) %>% 
  filter(sum > 0)

if (sum(results$stations_id %in% plausible$STATIONS_ID) != 0) {
  
  for (i in seq_along(plausible$STATIONS_ID)) {
    
    id <- plausible$STATIONS_ID[i]
    
    results$time[results$stations_id == id] <- 
      format(Sys.Date(), format = "%d.%m.%Y")
    
    results$days[results$stations_id == id] <- 0
    
    results$niederschlag[results$stations_id == id] <- 
      plausible$sum[plausible$STATIONS_ID == id]
    
    rm(id)
    
  }
  
}

results <- results %>% arrange(desc(days))
results_regen <- results %>% arrange(desc(niederschlag))

What is left now is to create the final data set:

# Get names and station information
stationen2 <- stationen %>% filter(niederschlag == 1) %>% 
  select(-lufttemperatur, -niederschlag, -wind)

# Join weather dataset with stations dataset to get stations' names
results2 <- inner_join(results, stationen2, by = "stations_id") %>% 
  select(stationsname, days, niederschlag, time, stationshoehe) %>% 
  mutate(days = paste(days, "Tag/en")) %>% 
  rename(Station = stationsname, 
         `Letzter Niederschlag vor` = days,
         `Niederschlag (in Litern)` = niederschlag,
         `Letzter Niederschlag` = time,
         `Stationshöhe (Meter)` = stationshoehe)

if (dir.exists(here::here(path, "wetter")) == TRUE) {
  unlink(here::here(path, "wetter"), recursive = TRUE)
}

The result looks like this:

head(results2)
# A tibble: 6 x 5
  Station     `Letzter Niedersch… `Niederschlag (in… `Letzter Nieders…
  <chr>       <chr>                            <dbl> <chr>            
1 Dietenheim  8 Tag/en                           4   07.06.2021       
2 Ihringen    7 Tag/en                          13.9 08.06.2021       
3 Konstanz    7 Tag/en                          20.7 08.06.2021       
4 Stuttgart … 7 Tag/en                           0.7 08.06.2021       
5 Aulendorf-… 7 Tag/en                           0.8 08.06.2021       
6 Baden-Bade… 6 Tag/en                           2.1 09.06.2021       
# … with 1 more variable: Stationshöhe (Meter) <dbl>

As a data journalist, I sent these data to a Datawrapper chart (which is essentially a table) that displays all weather stations involved, the days since the last rainfall and the amount of rainfall the last time it rained:

# Capture output which tells the URL of the chart
capture.output(dw_data_to_chart(x = results2, chart_id = "abcdef"), 
               file = "/dev/null")

## Send data to datawrapper
log <- capture.output(dw_publish_chart(chart_id = "abcdef"))

On a side note, my R script triggers a python script on my Raspberry Pi if an error occurred and no valid URL was put out; the script will send an E-mail notifying me of the error:

# If there was no valid URL in the output, trigger python script.
if (grepl(pattern = "https://datawrapper.dwcdn.net/abcdef", log[6]) == FALSE) {

    system("python3 ~Documents/scripts/duerre.py")

} else {

    print(paste0(log[6], " //// ", Sys.time()))

}

This whole script runs on my Raspberry Pi 3 using a Cronjob.