Improving the robustness of beach water quality modeling using an ensemble machine learning approach Journal Article


Authors: Wang, L.; Zhu, Z.; Sassoubre, L.; Yu, G.; Liao, C.; Hu, Q.; Wang, Y.
Article Title: Improving the robustness of beach water quality modeling using an ensemble machine learning approach
Abstract: Microbial pollution of beach water can expose swimmers to harmful pathogens. Predictive modeling provides an alternative method for beach management that addresses several limitations associated with traditional culture-based methods of assessing water quality. Widely-used machine learning methods often suffer from high variability in performance from one year or beach to another. Therefore, the best machine learning method varies between beaches and years, making method selection difficult. This study proposes an ensemble machine learning approach referred to as model stacking that has a two-layered learning structure, where the outputs of five widely-used individual machine learning models (multiple linear regression, partial least square, sparse partial least square, random forest, and Bayesian network) are taken as input features for another model that produces the final prediction. Applying this approach to three beaches along eastern Lake Erie, New York, USA, we show that generally the model stacking approach was able to generate reliably good predictions compared to all of the five base models. The accuracy rankings of the stacking model consistently stayed 1st or 2nd every year, with yearly-average accuracy of 78%, 81%, and 82.3% at the three studied beaches, respectively. This study highlights the value of the model stacking approach in predicting beach water quality and solving other pressing environmental problems. © 2020 Elsevier B.V.
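The two-layer stacking structure described in the abstract can be illustrated with a small sketch. It is not the authors' implementation. The paper stacks five base models (multiple linear regression, partial least squares, sparse partial least squares, random forest and a Bayesian network). This sketch uses only two stand-ins: least-squares linear regression, and a k-nearest-neighbours regressor swapped in purely for illustration. It also assumes a linear meta-model, because the abstract does not name the second-layer model. Training uses synthetic data from a hypothetical `make_data` helper rather than Lake Erie observations. The base models' out-of-fold predictions become the input features of the meta-model, so the meta-model never learns from a prediction that a base model made on its own training rows.

```python
# Minimal two-layer model-stacking sketch in pure Python (no third-party packages).
# Assumptions, not from the paper: synthetic data, two stand-in base learners
# (least squares and k-nearest neighbours), and a linear meta-model.
import random


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


class LinearRegression:
    """Least squares with intercept; a tiny ridge term keeps X^T X invertible."""
    def __init__(self, ridge=1e-8):
        self.ridge = ridge
    def fit(self, X, y):
        Z = [[1.0] + list(row) for row in X]  # prepend intercept column
        p = len(Z[0])
        xtx = [[sum(z[i] * z[j] for z in Z) + (self.ridge if i == j else 0.0)
                for j in range(p)] for i in range(p)]
        xty = [sum(z[i] * t for z, t in zip(Z, y)) for i in range(p)]
        self.coef = solve(xtx, xty)
        return self
    def predict(self, X):
        return [self.coef[0] + sum(c * v for c, v in zip(self.coef[1:], row)) for row in X]


class KNNRegressor:
    """Average target of the k nearest training rows (squared Euclidean distance)."""
    def __init__(self, k=5):
        self.k = k
    def fit(self, X, y):
        self.X, self.y = [list(r) for r in X], list(y)
        return self
    def predict(self, X):
        out = []
        for row in X:
            order = sorted(range(len(self.X)),
                           key=lambda i: sum((a - b) ** 2 for a, b in zip(self.X[i], row)))
            out.append(sum(self.y[i] for i in order[:self.k]) / self.k)
        return out


class StackingRegressor:
    """Layer 1: base models; layer 2: a meta-model fit on their out-of-fold outputs."""
    def __init__(self, base_factories, meta_factory, n_folds=5):
        self.base_factories, self.meta_factory, self.n_folds = base_factories, meta_factory, n_folds
    def fit(self, X, y):
        n = len(X)
        level1 = [[0.0] * len(self.base_factories) for _ in range(n)]
        for fold in range(self.n_folds):
            test_idx = list(range(fold, n, self.n_folds))  # interleaved K-fold split
            held_out = set(test_idx)
            train_idx = [i for i in range(n) if i not in held_out]
            for j, make in enumerate(self.base_factories):
                model = make().fit([X[i] for i in train_idx], [y[i] for i in train_idx])
                for i, p in zip(test_idx, model.predict([X[i] for i in test_idx])):
                    level1[i][j] = p  # prediction for a row this model never saw
        self.bases = [make().fit(X, y) for make in self.base_factories]  # refit on all rows
        self.meta = self.meta_factory().fit(level1, y)
        return self
    def predict(self, X):
        columns = [m.predict(X) for m in self.bases]
        return self.meta.predict([list(r) for r in zip(*columns)])


def make_data(n, seed):
    """Hypothetical stand-in data: two covariates and a mildly nonlinear response."""
    rng = random.Random(seed)
    X = [[rng.uniform(0, 10), rng.uniform(0, 10)] for _ in range(n)]
    y = [1.0 + 0.5 * a + 0.2 * b ** 1.5 + rng.gauss(0, 0.3) for a, b in X]
    return X, y


def mse(y, p):
    return sum((a - b) ** 2 for a, b in zip(y, p)) / len(y)


if __name__ == "__main__":
    X_tr, y_tr = make_data(80, seed=1)
    X_te, y_te = make_data(40, seed=2)
    stack = StackingRegressor([LinearRegression, lambda: KNNRegressor(k=5)],
                              LinearRegression).fit(X_tr, y_tr)
    print("stacked test MSE:", round(mse(y_te, stack.predict(X_te)), 3))
```

Out-of-fold predictions are the usual design choice for stacking. If the meta-model were fit on in-sample base predictions, it would reward whichever base model overfits most. The sketch is a regression and does not reproduce the accuracy metric reported in the abstract.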
Keywords: prediction; e. coli; escherichia coli; forecasting; new york; feces; decision trees; seashore; water quality; linear regression; machine learning; water pollution; human; article; bayesian networks; random forest; lakes; machine learning approaches; machine learning methods; partial least square (pls); bayesian network; predictive analytics; machine learning models; fecal indicator bacteria; machine learning model; model stacking; beaches; beach water qualities; environmental problems; multiple linear regressions; sparse partial least squares; least square analysis; indicator indicator
Journal Title: Science of the Total Environment
Volume: 765
ISSN: 0048-9697
Publisher: Elsevier B.V.  
Date Published: 2021-04-15
Start Page: 142760
Language: English
DOI: 10.1016/j.scitotenv.2020.142760
PUBMED: 33131841
PROVIDER: scopus
Notes: Article -- Export Date: 1 March 2021 -- Source: Scopus
MSK Authors
  1. Chen Liao