Soil health has gained increasing attention amid rapid industrialization and the growing demand for green agriculture. Up-to-date information on soil health is therefore urgently needed to ensure food security and protect biodiversity. Previous studies have shown the potential of proximal soil sensing for measuring soil properties, but it remains challenging to obtain cost-efficient and robust estimates of multiple soil health indicators simultaneously via sensor fusion. In this study, we investigated the potential of visible near-infrared (vis-NIR) and mid-infrared (MIR) spectroscopy, together with three model averaging methods, for predicting three soil health properties: soil organic matter (SOM), pH and cation exchange capacity (CEC). The model averaging methods, namely Granger-Ramanathan (GR) averaging, Bayesian Model Averaging and Spectral-Guided Ensemble Modelling (S-GEM), were used not only for model fusion but also for high-level sensor fusion. Here, S-GEM is a recently proposed algorithm that can improve soil spectroscopic prediction by including spectral information in ensemble modelling. Four widely used prediction models were evaluated: partial least squares regression, Cubist, memory-based learning and a convolutional neural network. For SOM, sensor fusion based on model averaging performed comparably to Sensor_single + Model_multiple (MIR alone with S-GEM), with an R² of 0.86; however, MIR alone with S-GEM performed best among all methods (LCCC of 0.92, RMSE of 3.66 g kg⁻¹ and RPIQ of 3.68). The 10-fold cross-validation results indicated that Sensor_single + Model_multiple (MIR alone with S-GEM) also performed best among all methods for pH, with an R² of 0.84, LCCC of 0.90, RMSE of 0.45 and RPIQ of 3.65. For CEC, Sensor_multiple + Model_multiple based on GR performed best, with an R² of 0.66, LCCC of 0.80, RMSE of 3.48 cmol(+) kg⁻¹ and RPIQ of 2.22.
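The four evaluation metrics quoted above can be computed as in the following minimal sketch. It is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, R² is taken as the coefficient of determination (some soil spectroscopy studies report the squared Pearson correlation instead), LCCC is Lin's concordance correlation coefficient with population variances, and RPIQ is the interquartile range of the observations divided by the RMSE.

```python
import numpy as np


def _as_arrays(obs, pred):
    """Convert inputs to float arrays (hypothetical helper)."""
    return np.asarray(obs, dtype=float), np.asarray(pred, dtype=float)


def rmse(obs, pred):
    """Root mean squared error."""
    obs, pred = _as_arrays(obs, pred)
    return float(np.sqrt(np.mean((obs - pred) ** 2)))


def r2(obs, pred):
    """Coefficient of determination (assumed definition of R2)."""
    obs, pred = _as_arrays(obs, pred)
    ss_res = np.sum((obs - pred) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def lccc(obs, pred):
    """Lin's concordance correlation coefficient (population moments)."""
    obs, pred = _as_arrays(obs, pred)
    cov = np.mean((obs - obs.mean()) * (pred - pred.mean()))
    denom = obs.var() + pred.var() + (obs.mean() - pred.mean()) ** 2
    return float(2.0 * cov / denom)


def rpiq(obs, pred):
    """Ratio of performance to interquartile range: IQR(obs) / RMSE."""
    obs, pred = _as_arrays(obs, pred)
    q1, q3 = np.percentile(obs, [25, 75])
    return float((q3 - q1) / rmse(obs, pred))
```

In a cross-validation setting such as the 10-fold scheme mentioned above, these functions would typically be applied to the pooled held-out predictions; whether the study pooled predictions or averaged per-fold metrics is not stated here.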
Our results also showed that sensor fusion failed to improve the spectral prediction of soil properties when performance differed substantially among sensors (ΔR² > 0.2); in this case, using the single best sensor is suggested. When the sensors perform similarly (ΔR² < 0.2), Sensor_multiple + Model_multiple based on GR is recommended. The outcome of this study can provide a reference for determining the validity domain of sensor fusion methods in improving the accuracy of soil health prediction.
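The recommendation above can be sketched in code. The example below is a hedged illustration only: `gr_fit`, `gr_predict` and `choose_strategy` are hypothetical names, and the GR step uses one common variant of Granger-Ramanathan averaging (unconstrained least squares of the observations on the individual model predictions, with an intercept). The abstract does not specify which GR variant was used, nor how the boundary case ΔR² = 0.2 is handled, so the sketch assigns it to fusion by assumption.

```python
import numpy as np


def gr_fit(preds, obs):
    """Fit Granger-Ramanathan weights by ordinary least squares.

    preds: array of shape (n_samples, n_models) holding the predictions
           of each model (or each sensor-model pair) on calibration data.
    obs:   observed values, shape (n_samples,).
    Returns [intercept, w_1, ..., w_k] (assumed unconstrained variant).
    """
    X = np.asarray(preds, dtype=float)
    y = np.asarray(obs, dtype=float)
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


def gr_predict(preds, coef):
    """Combine new model predictions with previously fitted GR weights."""
    X = np.asarray(preds, dtype=float)
    return coef[0] + X @ coef[1:]


def choose_strategy(r2_by_sensor, threshold=0.2):
    """Apply the Delta-R2 rule of thumb described in the text.

    r2_by_sensor: dict mapping a sensor name to the R2 of its best model.
    Returns ("single", best_sensor) when the gap exceeds the threshold,
    otherwise ("fusion", None). Equality is treated as fusion (assumption).
    """
    best = max(r2_by_sensor, key=r2_by_sensor.get)
    delta = max(r2_by_sensor.values()) - min(r2_by_sensor.values())
    if delta > threshold:
        return ("single", best)
    return ("fusion", None)


if __name__ == "__main__":
    # Hypothetical R2 values, for illustration only.
    print(choose_strategy({"vis-NIR": 0.55, "MIR": 0.86}))
    print(choose_strategy({"vis-NIR": 0.60, "MIR": 0.66}))
```

For high-level sensor fusion, the columns of `preds` would hold predictions from models trained on different sensors; for model fusion on a single sensor, they would hold predictions from different models trained on the same spectra. The weights should be fitted on calibration data only and then applied to held-out predictions.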
Abstract
Various machine‐learning models have been extensively applied to predict soil properties using infrared spectroscopy. Beyond the interpretability and transparency of these models, there is an ongoing discussion on the reliability of the prediction of soil properties generated from soil spectra. In this review, we contribute to this discussion by advocating for the integration of soil knowledge into machine‐learning models. By doing so, researchers can delve deeper into the underlying soil constituents, ultimately enhancing prediction accuracy. Our review explores the soil information present in spectral data, the fallacy of model interpretability, methods to incorporate soil knowledge into machine‐learning techniques, and the ways in which machine learning and soil spectroscopy can assist soil science. The combination of machine learning and domain knowledge is recommended to develop more meaningful models for predicting soil properties within the field of soil science.
Pedology focuses on understanding soil genesis in the field and includes soil classification and mapping. Digital soil mapping (DSM) has evolved from traditional soil classification and mapping to the creation and population of spatial soil information systems by using field and laboratory observations coupled with environmental covariates. Pedological knowledge of soil distribution and processes can be useful for digital soil mapping. Conversely, digital soil mapping can bring new insights to pedogenesis, detailed information on vertical and lateral soil variation, and can generate research questions that were not considered in traditional pedology. This review highlights the relevance and synergy of pedology in soil spatial prediction through the expansion of pedological knowledge. We also discuss how DSM can support further advances in pedology through improved representation of spatial soil information. Some major findings of this review are as follows: (a) soil classes can be mapped accurately using DSM, (b) the occurrence and thickness of soil horizons, whole soil profiles and soil parent material can be predicted successfully with DSM techniques, (c) DSM can provide valuable information on pedogenic processes (e.g. addition, removal, transformation and translocation), (d) pedological knowledge can be incorporated into DSM, but DSM can also lead to the discovery of knowledge, and (e) there is the potential to use process‐based soil–landscape evolution modelling in DSM. Based on these findings, the combination of data‐driven and knowledge‐based methods promotes even greater interactions between pedology and DSM.
Highlights
Demonstrates relevance and synergy of pedology in soil spatial prediction, and links pedology and DSM.
Indicates the successful application of DSM in mapping soil classes, profiles, pedological features and processes.
Shows how DSM can help in forming new hypotheses and gaining new insights about soil and soil processes.
Combination of data‐driven and knowledge‐based methods recommended to promote greater interactions between DSM and pedology.