Abstract
Spatial information on soil organic carbon (SOC) in the Indian Himalayan region is vital for sustainable land management and conservation efforts, as it helps to identify areas vulnerable to erosion and degradation. However, the difficult Himalayan terrain is a major obstacle to mapping SOC at very high spatial resolution. Previous studies mapped SOC in the Himalayan region at resolutions ranging from 250 m to 90 m. Finer spatial resolution improves data quality and better supports decision-making for sustainable soil management. The present study addressed this challenge by mapping SOC at a resolution of 30 m using three machine learning (ML) techniques, namely random forest regression (RF), support vector regression (SVR) and extreme gradient boosting (XGB). Surface soil samples were strategically collected from 421 georeferenced locations representing the dominant elevation zones, geology and land use/land cover (LULC) types to develop spatial models for predicting SOC. Environmental covariates representing various pedogenic factors, namely climatic variables, terrain attributes, spectral indices, LULC and lithological information, were generated from diverse data sources using geospatial data analysis and the Google Earth Engine (GEE) platform. A feature ranking and variable selection protocol was used to select an optimal set of covariates prior to model development. The three validated models were then used to map the spatial distribution of SOC in the study area. The pixel-wise uncertainty of the SOC predictions from each model was also mapped by generating the lower (LL) and upper (UL) limits of the 90% prediction interval.
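The pixel-wise LL and UL maps can be illustrated with a minimal sketch. The abstract does not state how the 90% prediction interval was derived, so this sketch assumes one common option for tree ensembles: taking the 5th and 95th percentiles of the individual tree predictions at each pixel (quantile regression forests are a more rigorous alternative). The function name `prediction_interval` and the synthetic input array are hypothetical.

```python
import numpy as np


def prediction_interval(tree_preds, level=0.90):
    """Pixel-wise mean, lower limit (LL) and upper limit (UL) of an ensemble.

    Assumption (not from the study): tree_preds has shape (n_trees, n_pixels),
    holding one SOC prediction per tree for every pixel.
    """
    tree_preds = np.asarray(tree_preds, dtype=float)
    alpha = (1.0 - level) / 2.0  # 0.05 for a 90% interval
    mean = tree_preds.mean(axis=0)  # ensemble SOC prediction per pixel
    lower = np.quantile(tree_preds, alpha, axis=0)  # LL: 5th percentile
    upper = np.quantile(tree_preds, 1.0 - alpha, axis=0)  # UL: 95th percentile
    return mean, lower, upper


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Synthetic stand-in: 500 trees x 4 pixels of SOC (%) predictions
    demo = rng.normal(loc=2.0, scale=0.3, size=(500, 4))
    soc_mean, ll, ul = prediction_interval(demo)
    print(soc_mean.round(2), ll.round(2), ul.round(2))
```

Under this approach the width of the interval (UL minus LL) at each pixel serves as a simple uncertainty map: wider intervals flag locations where the trees disagree more about SOC.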
Results revealed that the RF model (R² train: 0.92, R² test: 0.72) outperformed the XGB (R² train: 0.59, R² test: 0.37) and SVR (R² train: 0.53, R² test: 0.35) models in both the training and testing phases, as indicated by various evaluation metrics. Covariates representing vegetation, climate and topography were equally dominant among the ten most important predictors of the RF model, with three covariates from each group. The SOC maps indicated that soils in the northern, north-eastern and north-western parts of the study area had comparatively higher SOC contents than the rest of the area. Among the predominant land use types, evergreen forest exhibited the highest SOC content (2.88%). The study demonstrated the potential of digital soil mapping (DSM) techniques, enhanced by remote sensing (RS) and ML, for mapping SOC in the hilly tracts of the Himalayas at 30 m spatial resolution. This detailed database can support effective land management and resource conservation strategies in the fragile mountain ecosystems of the Indian Himalayas.
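The training and testing scores quoted above are coefficients of determination (R²). A minimal sketch of how R², together with RMSE as a commonly reported companion metric, is typically computed from observed and predicted SOC values follows. The abstract does not list its full set of evaluation metrics, so the inclusion of RMSE and the function names are illustrative assumptions.

```python
import numpy as np


def r2_score(y_obs, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_obs - y_pred) ** 2)  # residual sum of squares
    ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot


def rmse(y_obs, y_pred):
    """Root mean squared error, in the units of SOC (e.g. %)."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_obs - y_pred) ** 2)))
```

Computing the same metric separately on the training samples and on held-out test samples, as the abstract reports, shows how much of a model's in-sample fit carries over to unseen locations.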