Dissolved oxygen (DO) is a critical water quality constituent that governs habitat suitability for aquatic biota, biogeochemical reactions and solubility of metals in streams. Recently introduced high‐frequency sensors have increased our ability to measure DO, but we still lack the capacity to understand and predict DO concentrations at high spatial resolutions or in unmonitored locations. Machine learning (ML) has been a commonly used approach for modelling DO; however, conventional ML models have no representation of the limnological processes governing DO dynamics. Here we implement and evaluate two process‐guided deep learning (PGDL) approaches for predicting daily minimum, mean and maximum DO concentrations in rivers from the Delaware River Basin, USA. In both cases, a multi‐task approach was taken in which the PGDL models predicted stream metabolism and gas exchange rates in addition to the DO concentrations themselves. Our results showed that for these sites, the PGDL approaches did not improve upon baseline predictions in temporally and spatially similar holdout experiments. One of the approaches did, however, improve predictions when applied to spatially dissimilar sites. Although this particular PGDL approach did not improve predictive accuracy in most cases, our results suggest that process guidance, perhaps in a more constrained form, could benefit a data‐driven DO model.
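The multi‐task idea described above can be illustrated with a minimal sketch. It assumes the training objective is a weighted sum of mean squared errors over the DO targets and the auxiliary process variables; the abstract does not state the form of the loss, so this weighting scheme, the function names (`mse`, `multitask_loss`) and the variable keys (`do_mean`, `gpp`) are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Optional, Sequence


def mse(pred: Sequence[float], obs: Sequence[Optional[float]]) -> float:
    """Mean squared error over time steps that have an observation.

    Missing observations are encoded as None and skipped, since sensor
    records are often incomplete (an assumption of this sketch).
    """
    pairs = [(p, o) for p, o in zip(pred, obs) if o is not None]
    if not pairs:
        return 0.0
    return sum((p - o) ** 2 for p, o in pairs) / len(pairs)


def multitask_loss(
    pred: Dict[str, Sequence[float]],
    obs: Dict[str, Sequence[Optional[float]]],
    weights: Dict[str, float],
) -> float:
    """Weighted sum of per-variable MSEs (hypothetical loss form).

    Each key in `weights` names one task, e.g. a DO target or an
    auxiliary process variable such as a metabolism rate.
    """
    return sum(w * mse(pred[name], obs[name]) for name, w in weights.items())


# Illustrative values only; they are not data from the study.
predictions = {"do_mean": [8.0, 9.0], "gpp": [2.0, 3.0]}
observations = {"do_mean": [8.5, None], "gpp": [2.0, 5.0]}
task_weights = {"do_mean": 1.0, "gpp": 0.5}

loss = multitask_loss(predictions, observations, task_weights)
```

In a sketch like this, setting an auxiliary weight to zero recovers a single‐task baseline, which is one simple way to compare a process‐guided model against a purely data‐driven one.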
Cyberinfrastructure needs to be advanced to enable open and reproducible environmental modeling research. Recent efforts toward this goal have focused on advancing online repositories for data and model sharing, online computational environments along with containerization technology and notebooks for capturing reproducible computational studies, and Application Programming Interfaces (APIs) for simulation models to foster intuitive programmatic control. The objective of this research is to show how these efforts can be integrated to support reproducible environmental modeling. We first present the high-level concept and general approach for integrating these three components. We then present one possible implementation that integrates HydroShare (an online repository), CUAHSI JupyterHub and CyberGIS-Jupyter for Water (computational environments), and pySUMMA (a model API) to support open and reproducible hydrologic modeling. We apply the example implementation to a hydrologic modeling use case to demonstrate how the approach can advance reproducible environmental modeling through the seamless integration of cyberinfrastructure services.
As lake and reservoir ecosystems transition across major environmental regimes (e.g., mixing regime) resulting from anthropogenic change, setting predictive expectations is imperative. We tested the hypothesis that (dissolved) oxygen is more predictable in monomictic reservoirs that thermally stratify throughout the summer (warm) season compared to polymictic reservoirs that stratify intermittently. Using two‐hourly vertical profiles of oxygen, we compared daily‐aggregated errors of oxygen predictions from random forests across and within two monomictic and two polymictic reservoirs in the south‐central (subtropical) USA. Although one monomictic reservoir was typically more predictable than the polymictic reservoirs, the hypereutrophic, small monomictic reservoir had less predictable oxygen patterns, potentially related to rapid oxygen cycling and intrusions of oxygenated waters in the hypolimnion without mixing. Daily mixing did not relate strongly to model errors. Water temperature, depth, and wind were the most important predictors, but were not clearly related to season or mixing. Lastly, we compared multiple model types (regression, neural network, and process‐based) in one polymictic reservoir to test how sensitive our interpretations of oxygen predictability were to model type, finding that the models generally agreed; however, the process‐based model poorly predicted oxygen in the middle of the vertical profiles (5 m), where most models performed poorly due to a temporally unstable, vacillating metalimnion. Our results suggest predicting reservoir oxygen dynamics may be easier in stratified reservoirs, but eutrophication and complex hydrodynamics may cause forecasting surprises, especially for those who use or manage water resources in mono‐ or dimictic reservoirs.
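The daily aggregation of two‐hourly prediction errors mentioned above could be computed along the following lines. This is a minimal sketch that assumes root mean squared error (RMSE) as the error metric; the abstract does not name the metric, and the function name `daily_rmse` and the record layout are hypothetical.

```python
from collections import defaultdict
from datetime import date, datetime
from math import sqrt
from typing import Dict, Iterable, List, Tuple

# One record per sub-daily time step: (timestamp, observed, predicted).
Record = Tuple[datetime, float, float]


def daily_rmse(records: Iterable[Record]) -> Dict[date, float]:
    """Group squared errors by calendar day and return one RMSE per day."""
    squared_errors: Dict[date, List[float]] = defaultdict(list)
    for timestamp, observed, predicted in records:
        squared_errors[timestamp.date()].append((predicted - observed) ** 2)
    return {
        day: sqrt(sum(errors) / len(errors))
        for day, errors in sorted(squared_errors.items())
    }


# Illustrative values only (oxygen in mg/L); they are not data from the study.
example_records = [
    (datetime(2020, 7, 1, 0), 8.0, 7.0),
    (datetime(2020, 7, 1, 2), 6.0, 7.0),
    (datetime(2020, 7, 2, 0), 5.0, 5.0),
]

daily_errors = daily_rmse(example_records)
```

Aggregating to one value per day makes the errors directly comparable with daily covariates such as whether the water column mixed on that day.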
The Consortium of Universities for the Advancement of Hydrologic Science Inc. (CUAHSI) hydrologic information system (HIS) is a widely used service-oriented system for time series data management. While this system is intended to empower the hydrologic sciences community with better data storage and distribution, it lacks support for the kind of ‘Web 2.0’ collaboration and social-networking capabilities being used in other fields. This paper presents the design, development, and testing of a software extension of CUAHSI's newest product, HydroShare. The extension integrates the existing CUAHSI HIS into HydroShare's social hydrology architecture. With this extension, HydroShare provides integrated HIS time series with efficient archiving, discovery, and retrieval of the data, extensive creator and science metadata, scientific discussion and collaboration around the data, and other basic social media features. HydroShare provides functionality for online social interaction and collaboration, while the existing HIS provides the distributed data management and web services framework. The extension is expected to enable scientists to access and share both national- and laboratory-scale hydrologic time series datasets in a standards-based web services architecture combined with social media functionality developed specifically for the hydrologic sciences.