Estimating nitrogen and phosphorus concentrations in streams and rivers across the contiguous United States: a machine learning framework

Department of Zoology, University of Cambridge, Cambridge, United Kingdom
School of Forestry & Environmental Studies - Center for Research Computing, Yale University, New Haven, CT, United States
Spatial Ecology, Redruth, UK
School of Forestry & Environmental Studies, Yale University, New Haven, CT, United States
Department of Ecosystem Research, Leibniz-Institute of Freshwater Ecology and Inland Fisheries, Berlin, Germany
DOI
10.7287/peerj.preprints.27585v1
Subject Areas
Biochemistry, Freshwater Biology, Aquatic and Marine Chemistry, Environmental Contamination and Remediation, Spatial and Geographic Information Science
Keywords
Nitrogen, Phosphorus, freshwater quality, freshwater biochemistry, nutrients, machine learning
Copyright
© 2019 Shen et al.
Licence
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
Cite this article
Shen L, Amatulli G, Sethi T, Raymond P, Domisch S. 2019. Estimating nitrogen and phosphorus concentrations in streams and rivers across the contiguous United States: a machine learning framework. PeerJ Preprints 7:e27585v1

Abstract

Nitrogen (N) and phosphorus (P) are essential nutrients for life processes in water bodies, but in excessive quantities they are a significant source of aquatic pollution. Eutrophication has become widespread as a result of such nutrient imbalances and is largely attributed to anthropogenic activity. In view of this phenomenon, we present a new dataset and statistical method for estimating and mapping elemental and compound concentrations of N and P at a resolution of 30 arc-seconds (∼1 km) for the conterminous US. The model is based on the Random Forest (RF) machine learning algorithm, fitted with environmental variables and seasonal N and P concentration observations from 230,000 stations across US stream networks. Accounting for spatial and temporal variability improves accuracy in the analysis of N and P cycles. The model was validated with internal and external validation procedures and explains 70-83% of the variance. The dataset is ready for use as input to a variety of environmental models and analyses, and the methodological framework can be applied to large-scale studies of N and P pollution, including water quality, species distribution and water ecology research worldwide.
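The abstract reports validation in terms of the share of variance explained. The sketch below shows one common way to compute that figure: the coefficient of determination, R² = 1 − SS_res / SS_tot, evaluated on observations held out from model fitting. It rests on an assumption. The abstract does not state whether "variance explained" refers to this R² or to a squared correlation, so the function is illustrative rather than the authors' exact metric. The concentration values are hypothetical.

```python
def variance_explained(observed, predicted):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot.

    Assumption: 'variance explained' is taken to mean R^2 computed on
    held-out (validation) observations, not on the training data.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_obs = sum(observed) / len(observed)
    # Total sum of squares: spread of the observations around their mean
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    # Residual sum of squares: squared prediction errors
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - ss_res / ss_tot


# Hypothetical held-out concentrations (e.g. total N, mg/L) and model predictions
observed = [0.5, 1.0, 1.5, 2.0, 3.0]
predicted = [0.6, 0.9, 1.4, 2.3, 2.8]
print(round(variance_explained(observed, predicted), 3))
```

In an external validation, `observed` would come from stations never used to fit the Random Forest. Under this assumed metric, a value of 0.70 to 0.83 corresponds to the 70-83% reported above.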

Author Comment

Preprint of a manuscript submitted to Nature Scientific Data

Estimation of nitrogen (N) and phosphorus (P) concentrations at a resolution of 30 arc-seconds (∼1 km) for the conterminous US, with an assessment of predictor importance and of the spatial correlation between N and P concentrations and agricultural and urban land cover.
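The "30 arc-seconds (∼1 km)" resolution is a nominal figure. On a geographic (latitude/longitude) grid, a cell's north-south extent is roughly constant, but its east-west extent shrinks with the cosine of latitude. The sketch below makes that arithmetic concrete. It assumes a geographic grid and a spherical Earth with the mean radius of about 6371 km. Neither assumption comes from the source, and the function name is hypothetical.

```python
import math

# Assumption: spherical Earth with the IUGG mean radius (km)
EARTH_RADIUS_KM = 6371.0088


def arcsec_cell_size_km(arcsec, latitude_deg):
    """Approximate (north-south, east-west) extent in km of a square
    lat/lon grid cell of `arcsec` arc-seconds centred at `latitude_deg`."""
    angle_rad = math.radians(arcsec / 3600.0)  # arc-seconds -> degrees -> radians
    ns_km = EARTH_RADIUS_KM * angle_rad         # arc length along a meridian
    # Parallels shrink with cos(latitude), so east-west extent does too
    ew_km = ns_km * math.cos(math.radians(latitude_deg))
    return ns_km, ew_km


# Approximate southern, central and northern latitudes of the conterminous US
for lat in (25, 37, 49):
    ns, ew = arcsec_cell_size_km(30, lat)
    print(f"lat {lat} deg: N-S {ns:.3f} km, E-W {ew:.3f} km")
```

Under these assumptions a 30 arc-second cell is about 0.93 km north-south everywhere, while its east-west width falls from roughly 0.84 km in the south of the conterminous US to roughly 0.61 km along the northern border. This is why the resolution is given as approximately 1 km.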