This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
Cite this article
Shen L, Amatulli G, Sethi T, Raymond P, Domisch S. 2019. Estimating nitrogen and phosphorus concentrations in streams and rivers across the contiguous United States: a machine learning framework. PeerJ Preprints 7:e27585v1 https://doi.org/10.7287/peerj.preprints.27585v1
Nitrogen (N) and phosphorus (P) are essential nutrients for life processes in water bodies, but in excessive quantities they are a significant source of aquatic pollution. Eutrophication has become widespread as a result of this imbalance and is largely attributed to anthropogenic activity. We therefore present a new dataset and statistical method for estimating and mapping elemental and compound concentrations of N and P at a resolution of 30 arc-seconds (∼1 km) for the conterminous US. The model is based on a Random Forest (RF) machine learning algorithm fitted with environmental variables and seasonal N and P concentration observations from 230,000 stations across US stream networks. Accounting for spatial and temporal variability improves accuracy in the analysis of N and P cycles. The model was validated with internal and external validation procedures, in which it explains 70-83% of the variance. The dataset is ready for use as input to a variety of environmental models and analyses, and the methodological framework can be applied to large-scale studies of N and P pollution, including water quality, species distribution and water ecology research worldwide.
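To make the "variance explained" figure concrete, the following is a minimal sketch of how such a score can be computed on held-out observations. It assumes the reported percentage is the coefficient of determination (R^2), which the abstract does not state explicitly; the function name and the sample values are hypothetical and are not taken from the dataset.

```python
def explained_variance(observed, predicted):
    """Return R^2 = 1 - SS_res / SS_tot for paired observations and predictions.

    Assumption: "variance explained" is read as the coefficient of
    determination; the source does not name the exact metric.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if len(observed) < 2:
        raise ValueError("at least two observations are required")

    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: error left over after prediction.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


# Hypothetical held-out concentrations (arbitrary units) and model predictions.
held_out_observed = [1.0, 2.0, 3.0, 4.0]
held_out_predicted = [1.1, 1.9, 3.2, 3.8]
r2 = explained_variance(held_out_observed, held_out_predicted)
print(f"variance explained: {r2:.0%}")
```

Under this reading, a score of 0.70 to 0.83 on stations withheld from fitting would correspond to the 70-83% range quoted above; a score of 0 would mean the model does no better than predicting the mean observed concentration.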
Preprint of a manuscript submitted to Nature Scientific Data
Estimation of nitrogen (N) and phosphorus (P) concentrations at a resolution of 30 arc-seconds (∼1 km) for the conterminous US. Assessment of predictor importance and of the spatial correlation of N and P concentrations with agricultural and urban land cover.