Supporting tools for the integrated management of drinking water reservoirs contaminated by Cyanobacteria and cyanotoxins

Progress in testing machine learning methods for predicting algal blooms in Lake Erken

Bloowater project partner Uppsala University presented its progress in testing machine learning methods for predicting algal blooms in Lake Erken.

The work was presented at the 2nd Workshop on Knowledge Guided Machine Learning (KGML2021), a three-day virtual workshop held on August 9-11, 2021.

Chlorophyll prediction in Lake Erken via a two-step data-driven approach based on the machine-learning model (GBR) and the deep learning model (LSTM)

By Lin S, Pierson D, Mesman J


The development of harmful algal blooms is a highly interactive process involving multiple environmental factors and complex biogeochemical processes. As a first step in developing methods to predict harmful algal blooms, we tested several approaches to predicting seasonal changes in chlorophyll concentration using data from Lake Erken, Sweden. A two-step approach was developed and applied with two different machine learning models: a Gradient Boosting Regressor (GBR) and a deep learning model, the Long Short-Term Memory (LSTM) recurrent neural network.

In the first step, the lake nutrient concentrations used for algal bloom prediction were estimated with both the GBR and LSTM models from daily environmental factors: meteorological data, thermal structure data generated by a process-based model, ice duration, time since the ice-off date, and inflow data.
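One of the inputs listed above, time since the ice-off date, can be derived from a list of recorded ice-off dates. The following is a minimal sketch, not the authors' preprocessing: the function name `days_since_ice_off` and the example dates are hypothetical, and the choice to keep counting from the previous ice-off until the next one occurs is an assumption.

```python
from datetime import date

def days_since_ice_off(day, ice_off_dates):
    """Days elapsed since the most recent ice-off on or before `day`.

    Returns None when no ice-off has been recorded yet (assumed convention).
    """
    earlier = [d for d in ice_off_dates if d <= day]
    if not earlier:
        return None
    return (day - max(earlier)).days

# Hypothetical ice-off dates, for illustration only (not Lake Erken records).
ice_off_dates = [date(2019, 4, 2), date(2020, 2, 15)]

for day in (date(2019, 1, 1), date(2019, 5, 1), date(2020, 2, 15)):
    print(day, days_since_ice_off(day, ice_off_dates))
```

A feature of this kind gives a model a way to place each day relative to the start of the open-water season rather than relative to the calendar.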

In the second step, an input dataset comprising the daily meteorological and hydrothermal data and the pre-generated daily lake nutrient concentrations was fed into the GBR and LSTM models to predict chlorophyll concentration in surface waters.
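The chaining of the two steps can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the data are synthetic, the two driver columns are placeholders, and the regressor is a deliberately small gradient boosting routine built from one-split trees (stumps) so that the example has no dependencies. All function names here are hypothetical.

```python
def fit_stump(X, residuals):
    """Find the single feature/threshold split that minimises squared error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    return best[1:] if best else None

def fit_gbr(X, y, n_rounds=50, learning_rate=0.1):
    """Gradient boosting for squared error: each stump fits the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        j, t, lm, rm = stump
        stumps.append(stump)
        pred = [p + learning_rate * (lm if row[j] <= t else rm)
                for p, row in zip(pred, X)]
    return base, learning_rate, stumps

def predict_gbr(model, X):
    base, lr, stumps = model
    return [base + lr * sum(lm if row[j] <= t else rm for j, t, lm, rm in stumps)
            for row in X]

# Synthetic toy data (not Lake Erken observations).
# Two placeholder driver columns stand in for the daily environmental inputs.
drivers = [[float(t % 20), float((t * 7) % 11)] for t in range(60)]
nutrient = [0.5 * d[1] + 2.0 for d in drivers]
chl = [0.3 * d[0] + 1.5 * n for d, n in zip(drivers, nutrient)]

# Step 1: estimate the nutrient concentration from the drivers.
nutrient_model = fit_gbr(drivers, nutrient)
nutrient_hat = predict_gbr(nutrient_model, drivers)

# Step 2: predict chlorophyll from the drivers plus the step-1 nutrient estimate.
step2_inputs = [d + [n] for d, n in zip(drivers, nutrient_hat)]
chl_model = fit_gbr(step2_inputs, chl)
chl_hat = predict_gbr(chl_model, step2_inputs)

print(len(step2_inputs[0]), "input columns in step 2")
```

The point of the structure is that step 2 consumes modelled rather than measured nutrient values, so chlorophyll can still be predicted on days without nutrient samples.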

Both models were trained and cross-validated with data from 1999 to 2014 and tested with data from 2015 to 2020. The GBR model predicted the timing of peaks in chlorophyll concentration well but underestimated their magnitude.
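The chronological split described above can be sketched as below. The year ranges come from the text; everything else is an assumption. In particular, the text does not say how the cross-validation folds were formed, so the blocking by whole years in the hypothetical `year_block_folds` helper is only one plausible choice, and the records are placeholders.

```python
from datetime import date

TRAIN_YEARS = range(1999, 2015)  # 1999 to 2014 inclusive
TEST_YEARS = range(2015, 2021)   # 2015 to 2020 inclusive

def split_by_year(records):
    """Split records chronologically so the test years are never seen in training."""
    train = [r for r in records if r["date"].year in TRAIN_YEARS]
    test = [r for r in records if r["date"].year in TEST_YEARS]
    return train, test

def year_block_folds(records, years, n_folds=4):
    """Yield (fit, validate) pairs, holding out one block of whole years per fold.

    Assumes len(years) is divisible by n_folds.
    """
    years = sorted(years)
    size = len(years) // n_folds
    for k in range(n_folds):
        held_out = set(years[k * size:(k + 1) * size])
        fit = [r for r in records if r["date"].year not in held_out]
        validate = [r for r in records if r["date"].year in held_out]
        yield fit, validate

# Placeholder records: one per year, including years outside both ranges.
records = [{"date": date(y, 6, 1), "chl": 0.0} for y in range(1998, 2022)]

train, test = split_by_year(records)
folds = list(year_block_folds(train, TRAIN_YEARS))
print(len(train), len(test), len(folds))
```

Splitting by year rather than at random keeps each season intact, which matters for time series with strong seasonal structure.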

In contrast, the LSTM model predicted bloom magnitudes more accurately, but in 2020 the forecasted spring bloom came earlier than the observed one, presumably because of the abnormal ice duration that year.
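The two kinds of error described for the models, peak timing and peak magnitude, can be measured separately. The sketch below is one simple way to do so and is not taken from the study: the function names, the metric definitions, and the short series are all hypothetical.

```python
def peak(series):
    """Return (index, value) of the maximum of a daily series."""
    idx = max(range(len(series)), key=lambda i: series[i])
    return idx, series[idx]

def peak_errors(observed, predicted):
    obs_day, obs_val = peak(observed)
    pred_day, pred_val = peak(predicted)
    return {
        "timing_error_days": pred_day - obs_day,  # negative: predicted peak is early
        "magnitude_ratio": pred_val / obs_val,    # below 1: peak is underestimated
    }

# Made-up daily chlorophyll series, for illustration only.
observed = [1.0, 2.0, 6.0, 10.0, 5.0, 2.0]
right_time_too_low = [1.0, 2.0, 4.0, 6.0, 3.0, 2.0]
right_size_too_early = [1.0, 10.0, 5.0, 3.0, 2.0, 1.0]

print(peak_errors(observed, right_time_too_low))
print(peak_errors(observed, right_size_too_early))
```

Reporting the two quantities side by side makes it possible to distinguish a model that finds the right date but damps the bloom from one that gets the size right but shifts it in time.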

Both data-driven models based on this two-step approach showed accuracy comparable to that of the process-based model (i.e., GOTM-SELMA).

We expect that this approach, which includes a step to predict lake nutrient conditions, will be valuable for algal bloom prediction in lakes with limited nutrient monitoring.