HGS RESEARCH HIGHLIGHT – Predicting Watershed Scale Surface Water Quality Targets With a Combined Fully-Integrated Groundwater-Surface Water Model and Machine Learning Approach
Frey, S.K., Shamalisham, N., Khader, O., Lapen, D.R., Russell, H.A.J., Stonebridge, G., Erler, A.R. & Sudicky, E.A. Predicting Watershed Scale Surface Water Quality Targets With a Combined Fully-Integrated Groundwater-Surface Water Model and Machine Learning Approach. Poster presented at: The 2021 American Geophysical Union Fall Meeting; December 15, 2021; online.
Did you miss us at the annual AGU Fall Meeting back in December? Don’t worry, our poster presentation is still available for viewing!
The poster highlights some very interesting research at the nexus of physics-based integrated hydrologic modelling and machine learning/artificial intelligence techniques. Here the authors have paired a HydroGeoSphere model of the South Nation Watershed (SNW) with a Random Forest (RF) algorithm trained to predict spatially varying concentrations of nitrate and E. coli throughout the watershed. For a completely novel approach toward large-scale water quality prediction, the results were very encouraging!
Abstract:
Predicting surface water quality at large scales with numerical flow and transport models is difficult, in part because of challenges with: i) solving the advection-dispersion equation on relatively coarse finite element meshes, ii) defining source-loading boundary conditions, and iii) establishing initial concentration conditions. However, relationships between hydrologic conditions and water quality risks are well established, suggesting that water flow solutions from numerical hydrologic models can provide indirect insight on water quality. In this work, a fully-integrated groundwater – surface water (GW–SW) model of a 3830 km² mixed use watershed in eastern Canada is used with a Random Forest (RF) machine learning (ML) model to predict nitrate and Escherichia coli (E. coli) concentrations at 24 stream monitoring locations. The GW–SW model is constructed with seven subsurface layers (three soil, three Quaternary, and one bedrock), Strahler order 2 (and greater) rivers and streams, and horizontal spatial resolution that varies from 100 to 300 m. The GW–SW model was validated for its ability to reproduce daily surface water flow and groundwater levels over the 2003 to 2018 interval. GW–SW model output, weather variables, physiographic and landcover data, seasonality, and 3300 water quality measurements were assembled into a database that consisted of 81 preliminary training features for the RF model. After feature importance was assessed, 16 of the original features were retained for final RF model training, of which seven were derived from the GW–SW model for both the nitrate and E. coli RF models, including those related to surface water flow, soil moisture, and groundwater. For RF model training, 80% of the water quality observation data were used, while the remaining 20% were used for validation. The nitrate RF model performance during training and validation yielded respective R² values of 0.97 and 0.73. In comparison, the E. coli RF model provided training and validation R² values of 0.94 and 0.51, respectively. This work demonstrates a novel and robust approach towards large scale prediction of spatially varying water quality indicators via a combination of fully-integrated GW–SW modeling and ML.
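For readers less familiar with the evaluation described in the abstract, here is a minimal sketch of an 80/20 training/validation split and an R² calculation. It is not the authors' code. It assumes the split was random and that R² means the coefficient of determination, neither of which the abstract states. The function names, the seed, and the placeholder records are hypothetical.

```python
import random


def train_validation_split(records, train_fraction=0.8, seed=0):
    """Shuffle records and split them into training and validation subsets.

    Assumption: a simple random split; the abstract does not say how the
    80/20 partition was actually drawn.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumption: this is the R² definition behind the reported scores.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all observations are identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical placeholder IDs standing in for the 3300 water quality measurements.
records = list(range(3300))
train, validation = train_validation_split(records)
```

With 3300 records, this split leaves 2640 for training and 660 for validation. A lower R² on the validation subset than on the training subset, as reported for both the nitrate and E. coli models, shows that the model fits the data it was trained on more closely than data it has not seen.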