Climate change is causing severe disturbances in what we used to consider a more or less stable climate. Data journalists are thus increasingly focusing on quantifying the effects of the climate crisis. This post is an example of how to do so.
Germany, like many other countries around the globe, suffers from an increasing lack of rainfall, which in turn causes situations close to drought. When I worked as an editor at Stuttgarter Zeitung, I contributed to quantifying these effects of the climate crisis by writing and automating a script that tells readers which region of Baden-Württemberg, Germany, has gone longest without rainfall.
It is a fairly easy way of automating reports on the climate's effects on our weather. In the following, I demonstrate what my script looks like. The data come from the German National Weather Service (DWD, Deutscher Wetterdienst).
First, you need a handful of packages:
library(pacman)
pacman::p_load(rdwd, magrittr, dplyr, here, bit64, lubridate,
               readxl, purrr, backports, remotes)
remotes::install_github("munichrocker/DatawRappr")
library(DatawRappr)
Second, you need a list of all the DWD stations (including their IDs) that provide the desired measures (in this case, amount of rainfall, but this could also be temperature, etc.):
# Set path
path <- "_posts/2021-06-13-automated-reporting-on-lacking-rainfall-in-germany/"

# Get stations
stationen <- readxl::read_excel(here::here(path, "stationen_bw.xlsx"))
head(stationen)
# A tibble: 6 x 6
stations_id stationshoehe stationsname lufttemperatur wind
<dbl> <dbl> <chr> <dbl> <dbl>
1 11 680 Donaueschingen Lande… 0 1
2 1013 309 Dogern 0 1
3 1346 1490 Feldberg/Schwarzwald 0 1
4 1443 237 Freiburg 0 1
5 1468 797 Freudenstadt 0 1
6 1490 394 Friedrichshafen 0 1
# … with 1 more variable: niederschlag <dbl>
For an analysis of rainfall, we need to extract those stations that measure it:
# Get all stations
niederschlag <- stationen$stations_id[stationen$niederschlag == 1]
In case you want to automate this script, create a folder where you want to store the data that the {rdwd} package downloads:
# Check if folder for zip data exists & create it if necessary
if (!dir.exists(here::here(path, "wetter"))) {
  dir.create(here::here(path, "wetter"))
}
After these initial steps, we can start downloading and processing recent rainfall data from all stations involved. I first download the data at daily resolution; that way, we can simply count the days since the last rainfall:
# Get URLs for DWD zip files
download <- rdwd::selectDWD(id = niederschlag,
                            res = "daily",
                            outvec = TRUE,
                            var = "more_precip",
                            per = "recent")

# Download actual zip files and extract data as lists
res1 <- rdwd::dataDWD(url = download,
                      dir = here::here(path, "wetter"),
                      force = TRUE,
                      quiet = TRUE,
                      overwrite = TRUE)
# Create general data frame from all lists
res1 %>% purrr::map_dfr(as.data.frame) -> res2
res2 %>% filter(MESS_DATUM > as.POSIXct("2020-01-01")) -> res2
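As a quick aside, `purrr::map_dfr()` applies a function to each element of a list and row-binds the results into one data frame, which is exactly what the list returned by `rdwd::dataDWD()` needs. A minimal sketch on made-up data (station IDs and values are invented for illustration):

```r
library(purrr)

# Two tiny data frames standing in for per-station downloads
lst <- list(data.frame(STATIONS_ID = 11,   RS = 2.1),
            data.frame(STATIONS_ID = 1013, RS = 0.4))

# map_dfr() applies as.data.frame to each element and row-binds the results
combined <- purrr::map_dfr(lst, as.data.frame)
nrow(combined)  # one row per station in this toy case
```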
# Process and find last day of niederschlag
results <- res2 %>% group_by(STATIONS_ID) %>%
  mutate(MESS_DATUM = as.POSIXct(MESS_DATUM)) %>%
  filter(RS > 0) %>%
  summarise(niederschlag = last(RS), time = last(MESS_DATUM)) %>%
  mutate(days = as.integer(Sys.Date() - as.Date(time))) %>%
  arrange(desc(days)) %>%
  rename(stations_id = STATIONS_ID) %>%
  mutate(time = format(time, format = "%d.%m.%Y"))
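To make the core logic transparent, here is the same days-since-last-rain idea on a tiny invented data set (station IDs, dates, and values are made up; note that `last()` relies on the rows being ordered by date):

```r
library(dplyr)

# Invented toy data: three observations for station 11, two for station 1013
toy <- tibble::tibble(
  STATIONS_ID = c(11, 11, 11, 1013, 1013),
  MESS_DATUM  = as.Date(c("2021-06-01", "2021-06-05", "2021-06-07",
                          "2021-06-02", "2021-06-10")),
  RS          = c(2.1, 0, 4.3, 1.2, 6.0)
)

last_rain <- toy %>%
  filter(RS > 0) %>%                  # keep only days with rain
  group_by(STATIONS_ID) %>%
  summarise(niederschlag = last(RS),  # amount on the last rainy day
            time = last(MESS_DATUM)) %>%
  mutate(days = as.integer(Sys.Date() - time))
```

For station 11, the zero-rain day on 2021-06-05 is dropped by the filter, so its last rainy day is 2021-06-07 with 4.3 litres.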
# Check if folder with zip files exists and delete if present
if (dir.exists(here::here(path, "wetter"))) {
  unlink(here::here(path, "wetter"), recursive = TRUE)
}
Now we need to check whether there has been rainfall on the present day. If so, we have to set the counter of days without rain to 0:
recent1 <- rdwd::selectDWD(id = niederschlag,
                           res = "10_minutes",
                           outvec = TRUE,
                           var = "precipitation",
                           per = "now")

if (!dir.exists(here::here(path, "wetter"))) {
  dir.create(here::here(path, "wetter"))
}

# Download actual zip files and extract data as lists
recent2 <- rdwd::dataDWD(url = recent1,
                         dir = here::here(path, "wetter"),
                         force = TRUE,
                         quiet = TRUE,
                         overwrite = TRUE)
# Create general data frame from all lists
recent2 %>% purrr::map_dfr(as.data.frame) -> recent3

# Delete unused columns and delete all data before yesterday to prevent errors
recent3 %>% filter(MESS_DATUM > Sys.Date() - 1) %>%
  dplyr::select(STATIONS_ID, MESS_DATUM, RWS_10) -> recent3
# Wrangle data frame with weather data to get last value
plausible <- recent3 %>% group_by(STATIONS_ID) %>%
  summarise(sum = sum(RWS_10)) %>%
  filter(sum > 0)
if (sum(results$stations_id %in% plausible$STATIONS_ID) != 0) {
  for (i in seq_along(plausible$STATIONS_ID)) {
    id <- plausible$STATIONS_ID[i]

    results$time[results$stations_id == id] <-
      format(Sys.Date(), format = "%d.%m.%Y")

    results$days[results$stations_id == id] <- 0

    results$niederschlag[results$stations_id == id] <-
      plausible$sum[plausible$STATIONS_ID == id]

    rm(id)
  }
}
results <- results %>% arrange(desc(days))
results_regen <- results %>% arrange(desc(niederschlag))
What is left now is to create the final data set:
# Get names and station information
stationen2 <- stationen %>% filter(niederschlag == 1) %>%
  select(-lufttemperatur, -niederschlag, -wind)
# Join weather dataset with stations dataset to get stations' names
results2 <- inner_join(results, stationen2, by = "stations_id") %>%
  select(stationsname, days, niederschlag, time, stationshoehe) %>%
  mutate(days = paste(days, "Tag/en")) %>%
  rename(Station = stationsname,
         `Letzter Niederschlag vor` = days,
         `Niederschlag (in Litern)` = niederschlag,
         `Letzter Niederschlag` = time,
         `Stationshöhe (Meter)` = stationshoehe)
if (dir.exists(here::here(path, "wetter"))) {
  unlink(here::here(path, "wetter"), recursive = TRUE)
}
The result looks like this:
head(results2)
# A tibble: 6 x 5
Station `Letzter Niedersch… `Niederschlag (in… `Letzter Nieders…
<chr> <chr> <dbl> <chr>
1 Dietenheim 8 Tag/en 4 07.06.2021
2 Ihringen 7 Tag/en 13.9 08.06.2021
3 Konstanz 7 Tag/en 20.7 08.06.2021
4 Stuttgart … 7 Tag/en 0.7 08.06.2021
5 Aulendorf-… 7 Tag/en 0.8 08.06.2021
6 Baden-Bade… 6 Tag/en 2.1 09.06.2021
# … with 1 more variable: Stationshöhe (Meter) <dbl>
As a data journalist, I sent these data to a Datawrapper chart (which is essentially a table) that displays all weather stations involved, the days since the last rainfall, and the amount of rain that fell the last time it rained:
# Capture output which tells the URL of the chart
capture.output(dw_data_to_chart(x = results2, chart_id = "abcdef"),
               file = "/dev/null")

## Send data to datawrapper
log <- capture.output(dw_publish_chart(chart_id = "abcdef"))
On a side note, if an error occurs and no valid URL is put out, my R script triggers a Python script on my Raspberry Pi that sends an e-mail notifying me of the error:
# If there was no valid URL in the output, trigger python script.
if (grepl(pattern = "https://datawrapper.dwcdn.net/abcdef", log[6]) == FALSE) {
  system("python3 ~/Documents/scripts/duerre.py")
} else {
  print(paste0(log[6], " //// ", Sys.time()))
}
This whole script runs on my Raspberry Pi 3 via a cron job.
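For reference, the scheduling boils down to a single crontab entry on the Pi. A minimal sketch; the 12:00 run time and the path /home/pi/duerre.R are placeholders I made up, not the actual setup:

```
# m h dom mon dow  command   (edit with: crontab -e)
0 12 * * * Rscript /home/pi/duerre.R
```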