

Creators/Authors contains: "Kumar, Vipin"


  1. Free, publicly-accessible full text available August 1, 2023
  2. Machine learning is beginning to provide state-of-the-art performance in a range of environmental applications, such as streamflow prediction in a hydrologic basin. However, building accurate broad-scale streamflow models remains challenging in practice because the dominant hydrologic processes vary from basin to basin, and this variability is best captured by sets of process-related basin characteristics. Existing basin characteristics suffer from noise and uncertainty, among other issues, which degrade model performance. To tackle these challenges, we propose a novel Knowledge-guided Self-Supervised Learning (KGSSL) inverse framework that extracts system characteristics from driver (input) and response (output) data. This first-of-its-kind framework achieves robust performance even when characteristics are corrupted or missing. We evaluate KGSSL in the context of streamflow modeling using CAMELS (Catchment Attributes and MEteorology for Large-sample Studies), a widely used hydrology benchmark dataset. Specifically, KGSSL outperforms the baseline by 16% in predicting missing characteristics. Furthermore, in forward modeling, the characteristics inferred by KGSSL yield a 35% improvement over a standard baseline when the static characteristics are unknown.
    Free, publicly-accessible full text available August 14, 2023
  3. ABSTRACT

    The star-forming activity in the H II region RCW 42 is investigated using multiple wavebands, from near-infrared to radio wavelengths. Located at a distance of 5.8 kpc, this southern region has a bolometric luminosity of 1.8 × 10⁶ L⊙. The ionized gas emission has been imaged at the low radio frequencies of 610 and 1280 MHz using the Giant Metrewave Radio Telescope, India, and shows that the H II region is extensive, spanning 20 × 15 pc². The average electron number density in the region is estimated to be ∼70 cm⁻³, which suggests an average ionization fraction of about 11% for the cloud. An extended green object, EGO G274.0649-01.1460, and several young stellar objects have been identified in the region using data from the 2MASS and Spitzer surveys. The dust emission from the associated molecular cloud is probed using the Herschel Space Observatory, which reveals five clumps, C1-C5, in this region. In the ALMA 1.4 mm map, two millimetre emission cores with masses of 380 and 390 M⊙ have been identified in C1, towards the radio emission peak. The clumps are classified by evolutionary stage based on their association with various star-formation tracers, and we find that all of them are in an active or evolved stage.

  4. Abstract

    Lakes and reservoirs, as most humans experience and use them, are dynamic bodies of water whose surface extents grow and shrink with seasonal precipitation patterns, long-term changes in climate, and human management decisions. This paper presents a new global dataset containing the locations and surface-area variations of 681,137 lakes and reservoirs larger than 0.1 square kilometers (and south of 50°N) from 1984 to 2015, to enable the study of how human actions and climate change affect freshwater availability. Within its size and regional scope, this dataset is far more comprehensive than existing datasets such as HydroLAKES. While HydroLAKES provides only a static shape for each lake, the proposed dataset also includes a surface-area time series and a shapefile with monthly shapes for each lake. The paper describes the development and evaluation of the dataset and highlights the utility of novel machine learning techniques in addressing the inherent challenges of transforming satellite imagery into dynamic global surface water maps.

  5. There is a growing consensus that solutions to complex science and engineering problems require novel methodologies that integrate traditional physics-based modeling approaches with state-of-the-art machine learning (ML) techniques. This paper provides a structured overview of such techniques. We first summarize the application-centric objectives for which these approaches have been used, and then describe the classes of methodologies used to construct physics-guided ML models and hybrid physics-ML frameworks. Finally, we provide a taxonomy of the existing techniques, which uncovers knowledge gaps and potential crossovers of methods between disciplines that can serve as ideas for future research.
    Free, publicly-accessible full text available January 3, 2023
  6. Mapping and monitoring crops is a key step towards the sustainable intensification of agriculture and addressing global food security. A dataset like ImageNet, which revolutionized computer vision, could accelerate the development of novel crop mapping techniques. Currently, the United States Department of Agriculture (USDA) annually releases the Cropland Data Layer (CDL), which contains crop labels at 30 m resolution for the entire United States of America. While CDL is state-of-the-art and widely used in agricultural applications, it has a number of limitations (e.g., pixelated errors, labels carried over from previous years, and misclassification of minor crops). In this work, we create a new semantic segmentation benchmark dataset, CalCROP21, covering the diverse crops of the Central Valley region of California at 10 m spatial resolution, using a robust image processing pipeline based on Google Earth Engine and a novel attention-based spatio-temporal semantic segmentation algorithm, STATT. STATT is trained on re-sampled (interpolated) CDL labels but generates better predictions than CDL by leveraging spatial and temporal patterns in Sentinel-2 multi-spectral image series to capture phenologic differences among crops, and it uses attention to reduce the impact of clouds and other atmospheric disturbances. We also present a comprehensive evaluation showing that STATT produces significantly better results than the resampled CDL labels. We have released the dataset and the processing pipeline code used to generate it.
    Free, publicly-accessible full text available December 15, 2022
  7. The availability of massive Earth-observing satellite data provides huge opportunities for land use and land cover mapping. However, such mapping is challenging because of the many land cover classes, noisy data, and the lack of reliable labels. In addition, each land cover class typically has its own temporal pattern and can be identified only during certain periods. In this article, we introduce a novel architecture that combines the UNet structure with a Bidirectional LSTM and an attention mechanism to jointly exploit the spatial and temporal nature of satellite data and to better identify the unique temporal pattern of each land cover class. We compare our method with other state-of-the-art methods, both quantitatively and qualitatively, on two real-world datasets involving multiple land cover classes. We also visualize the attention weights to study their effectiveness in mitigating noise and in identifying the discriminative time periods of different classes. The code and datasets used in this work are publicly available for reproducibility.
    Free, publicly-accessible full text available December 15, 2022
  8. Free, publicly-accessible full text available April 1, 2023
  9. Collecting large annotated datasets in remote sensing is often expensive and can therefore become a major obstacle to training advanced machine learning models. Common techniques for addressing this issue, which pre-train deep neural networks (DNNs) on freely available large datasets, cannot be used in remote sensing because such large-scale labeled datasets are unavailable and because data sources are heterogeneous, owing to the varying spatial and spectral resolutions of different sensors. Self-supervised learning is an alternative approach that learns feature representations from unlabeled images without any human annotations. In this paper, we introduce a new land cover mapping method that uses a clustering-based pretext task for self-supervised learning. We demonstrate the effectiveness of the method on two societally relevant applications in terms of segmentation performance, discriminative feature representation learning, and the underlying cluster structure. We also show that active sampling using the clusters obtained from our method improves mapping accuracy under a limited annotation budget.
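The idea of cluster-based active sampling mentioned in abstract 9 can be illustrated with a minimal sketch. It is not the authors' method. It assumes each unlabeled image has already been reduced to a feature vector (for example, by a self-supervised encoder), and it uses plain k-means with deterministic initialization as a stand-in for the paper's clustering. For annotation it picks the sample closest to each cluster centroid, so a limited labeling budget is spread across the cluster structure. The names `kmeans` and `select_for_annotation` are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def _dist2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(features: List[Vector], k: int, iters: int = 20) -> List[List[float]]:
    """Plain Lloyd's k-means (a stand-in clustering, not the paper's).
    Centroids start at the first k points so results are deterministic."""
    if not 0 < k <= len(features):
        raise ValueError("k must be between 1 and the number of samples")
    centroids = [list(f) for f in features[:k]]
    for _ in range(iters):
        groups: List[List[Vector]] = [[] for _ in range(k)]
        # Assignment step: attach every sample to its nearest centroid.
        for f in features:
            nearest = min(range(k), key=lambda c: _dist2(f, centroids[c]))
            groups[nearest].append(f)
        # Update step: move each centroid to the mean of its members.
        for c, members in enumerate(groups):
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


def select_for_annotation(features: List[Vector], budget: int) -> List[int]:
    """Hypothetical helper: cluster the unlabeled features into `budget`
    groups and return the indices of the samples closest to each centroid.
    Duplicates are dropped, so fewer than `budget` indices may be returned."""
    centroids = kmeans(features, min(budget, len(features)))
    chosen: List[int] = []
    for centroid in centroids:
        idx = min(range(len(features)), key=lambda i: _dist2(features[i], centroid))
        if idx not in chosen:
            chosen.append(idx)
    return sorted(chosen)


if __name__ == "__main__":
    # Toy 2-D "features" forming three well-separated groups.
    feats = [(1.0, 0.0), (11.0, 10.0), (21.0, 0.0), (0.0, 0.0), (0.0, 1.0),
             (10.0, 10.0), (10.0, 11.0), (20.0, 0.0), (20.0, 1.0)]
    print(select_for_annotation(feats, 3))  # → [3, 5, 7]
```

Choosing the most central member of each cluster favors representative samples over near-duplicates. A real pipeline would more likely use learned embeddings, a library k-means with multiple restarts, and possibly several samples per cluster.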