Search for: All records

Award ID contains: 1838159

« Prev Next »

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Combining Satellite and Weather Data for Crop Type Mapping: An Inverse Modelling Approach

Ravirathinam, Praveen; Ghosh, Rahul; Khandelwal, Ankush; Jia, Xiaowei; Mulla, David; Kumar, Vipin (April 2024, SIAM)
Shekhar, Shashi; Papalexakis, Vagelis; Gao, Jing; Jiang, Zhe; Riondato, Matteo (Ed.)
Accurate and timely crop mapping is essential for yield estimation, insurance claims, and conservation efforts. Over the years, many successful machine learning models for crop mapping have been developed that use just the multispectral imagery from satellites to predict crop type over the area of interest. However, these traditional methods do not account for the physical processes that govern crop growth. At a high level, crop growth can be envisioned as physical parameters, such as weather and soil type, acting upon the plant, leading to crop growth which can be observed via satellites. In this paper, we propose a weather-based Spatio-Temporal segmentation network with ATTention (WSTATT), a deep learning model that leverages this understanding of crop growth by formulating it as an inverse model that combines weather (Daymet) and satellite imagery (Sentinel-2) to generate accurate crop maps. We show that our approach provides significant improvements over existing algorithms that solely rely on spectral imagery by comparing segmentation maps and F1 classification scores. Furthermore, effective use of attention in WSTATT architecture enables the detection of crop types earlier in the season (up to 5 months in advance), which is very useful for improving food supply projections. We finally discuss the impact of weather by correlating our results with crop phenology to show that WST
more » « less
Full Text Available
Uncertainty Quantification in Inverse Models in Hydrology

Chatterjee, Somya Sharma; Ghosh, Rahul; Renganathan, Arvind; Li, Xiang; Chatterjee, Snigdhansu; Nieber, John; Duffy, Christopher; Kumar, Vipin (August 2023, ACM)

In hydrology, modeling streamflow remains a challenging task due to the limited availability of basin characteristics information such as soil geology and geomorphology. These characteristics may be noisy due to measurement errors or may be missing altogether. To overcome this challenge, we propose a knowledge-guided, probabilistic inverse modeling method for recovering physical characteristics from streamflow and weather data, which are more readily available. We compare our framework with state-of-the-art inverse models for estimating river basin characteristics. We also show that these estimates offer improvement in streamflow modeling as opposed to using the original basin characteristic values. Our inverse model offers a 3% improvement in R2 for the inverse model (basin characteristic estimation) and 6% for the forward model (streamflow prediction). Our framework also offers improved explainability since it can quantify uncertainty in both the inverse and the forward model. Uncertainty quantification plays a pivotal role in improving the explainability of machine learning models by providing additional insights into the reliability and limitations of model predictions. In our analysis, we assess the quality of the uncertainty estimates. Compared to baseline uncertainty quantification methods, our framework offers a 10% improvement in the dispersion of epistemic uncertainty and a 13% improvement in coverage rate. This information can help stakeholders understand the level of uncertainty associated with the predictions and provide a more comprehensive view of the potential outcomes.
more » « less
Full Text Available
Spatiotemporal classification with limited labels using constrained clustering for large datasets

https://doi.org/10.1137/1.9781611977653.ch55

Ravirathinam, Praveen; Ghosh, Rahul; Wang, Ke; Xuan, Keyang; Khandelwal, Ankush; Dugan, Hilary; Hanson, Paul; Kumar, Vipin (April 2023, SIAM)
Shekhar, Shashi; Zhou, Zhi-Hua; Chiang, Yao-Yi; Stiglic, Gregor (Ed.)
Creating separable representations via representation learning and clustering is critical in analyzing large unstructured datasets with only a few labels. Separable representations can lead to supervised models with better classification capabilities and additionally aid in generating new labeled samples. Most unsupervised and semisupervised methods to analyze large datasets do not leverage the existing small amounts of labels to get better representations. In this paper, we propose a spatiotemporal clustering paradigm that uses spatial and temporal features combined with a constrained loss to produce separable representations. We show the working of this method on the newly published dataset ReaLSAT, a dataset of surface water dynamics for over 680,000 lakes across the world, making it an essential dataset in terms of ecology and sustainability. Using this large unlabelled dataset, we first show how a spatiotemporal representation is better compared to just spatial or temporal representation. We then show how we can learn even better representations using a constrained loss with few labels. We conclude by showing how our method, using few labels, can pick out new labeled samples from the unlabeled data, which can be used to augment supervised methods leading to better classification.
more » « less
ReaLSAT, a global dataset of reservoir and lake surface area variations

https://doi.org/10.1038/s41597-022-01449-5

Khandelwal, Ankush; Karpatne, Anuj; Ravirathinam, Praveen; Ghosh, Rahul; Wei, Zhihao; Dugan, Hilary A.; Hanson, Paul C.; Kumar, Vipin (December 2022, Scientific Data)

Abstract Lakes and reservoirs, as most humans experience and use them, are dynamic bodies of water, with surface extents that increase and decrease with seasonal precipitation patterns, long-term changes in climate, and human management decisions. This paper presents a new global dataset that contains the location and surface area variations of 681,137 lakes and reservoirs larger than 0.1 square kilometers (and south of 50 degree N) from 1984 to 2015, to enable the study of the impact of human actions and climate change on freshwater availability. Within its scope for size and region covered, this dataset is far more comprehensive than existing datasets such as HydroLakes. While HydroLAKES only provides a static shape, the proposed dataset also has a timeseries of surface area and a shapefile containing monthly shapes for each lake. The paper presents the development and evaluation of this dataset and highlights the utility of novel machine learning techniques in addressing the inherent challenges in transforming satellite imagery to dynamic global surface water maps.
more » « less
Full Text Available
Big data, data privacy, and plant and animal disease research using GEMS

https://doi.org/10.1002/agj2.20933

Senay, Senait D.; Shurson, Gerald C.; Cardona, Carol; Silverstein, Kevin A.T. (October 2022, Agronomy journal)

One of the major challenges in ensuring global food security is the ever-changing biotic risk affecting the productivity and efficiency of the global food supply system. Biotic risks that threaten food security include pests and diseases that affect pre- and postharvest terrestrial agriculture and aquaculture. Strategies to minimize this risk depend heavily on plant and animal disease research. As data collected at high spatial and temporal resolutions become increasingly available, epidemiological models used to assess and predict biotic risks have become more accurate and, thus, more useful. However, with the advent of Big Data opportunities, a number of challenges have arisen that limit researchers’ access to complex, multi-sourced, multi-scaled data collected on pathogens, and their associated environments and hosts. Among these challenges, one of the most limiting factors is data privacy concerns from data owners and collectors. While solutions, such as the use of de-identifying and anonymizing tools that protect sensitive information are recognized as effective practices for use by plant and animal disease researchers, there are comparatively few platforms that include data privacy by design that are accessible to researchers. We describe how the general thinking and design used for data sharing and analysis platforms can intrinsically address a number of these data privacy-related challenges that are a barrier to researchers wanting to access data. We also describe how some of the data privacy concerns confronting plant and animal disease researchers are addressed by way of the GEMS informatics platform.
more » « less
Full Text Available
CalCROP21: A Georeferenced multi-spectral dataset of Satellite Imagery and Crop Labels

https://doi.org/10.1109/BigData52589.2021.9671569

Ghosh, Rahul; Ravirathinam, Praveen; Jia, Xiaowei; Khandelwal, Ankush; Mulla, David; Kumar, Vipin (December 2021, 2021 IEEE International Conference on Big Data (Big Data))

Mapping and monitoring crops is a key step towards the sustainable intensification of agriculture and addressing global food security. A dataset like ImageNet that revolutionized computer vision applications can accelerate the development of novel crop mapping techniques. Currently, the United States Department of Agriculture (USDA) annually releases the Cropland Data Layer (CDL) which contains crop labels at 30m resolution for the entire United States of America. While CDL is state of the art and is widely used for a number of agricultural applications, it has a number of limitations (e.g., pixelated errors, labels carried over from previous years, and errors in the classification of minor crops). In this work, we create a new semantic segmentation benchmark dataset, which we call CalCROP21, for the diverse crops in the Central Valley region of California at 10m spatial resolution using a Google Earth Engine based robust image processing pipeline and a novel attention-based spatio-temporal semantic segmentation algorithm STATT. STATT uses re-sampled (interpolated) CDL labels for training but is able to generate a better prediction than CDL by leveraging spatial and temporal patterns in Sentinel2 multi-spectral image series to effectively capture phenologic differences amongst crops and uses attention to reduce the impact of clouds and other atmospheric disturbances. We also present a comprehensive evaluation to show that STATT has significantly better results when compared to the resampled CDL labels. We have released the dataset and the processing pipeline code for generating the benchmark dataset.
more » « less
Full Text Available
Attention-augmented Spatio-Temporal Segmentation for Land Cover Mapping

https://doi.org/10.1109/BigData52589.2021.9671974

Ghosh, Rahul; Ravirathinam, Praveen; Jia, Xiaowei; Lin, Chenxi; Jin, Zhenong; Kumar, Vipin (December 2021, 2021 IEEE International Conference on Big Data (Big Data))

The availability of massive earth observing satellite data provides huge opportunities for land use and land cover mapping. However, such mapping effort is challenging due to the existence of various land cover classes, noisy data, and the lack of proper labels. Also, each land cover class typically has its own unique temporal pattern and can be identified only during certain periods. In this article, we introduce a novel architecture that incorporates the UNet structure with a Bidirectional LSTM and Attention mechanism to jointly exploit the spatial and temporal nature of satellite data and to better identify the unique temporal patterns of each land cover class. We compare our method with other state-of-the-art methods both quantitatively and qualitatively on two real-world datasets which involve multiple land cover classes. We also visualize the attention weights to study its effectiveness in mitigating noise and in identifying discriminative time periods of different classes. The code and dataset used in this work are made publicly available for reproducibility.
more » « less
Full Text Available
Clustering Augmented Self-Supervised Learning: An Application to Land Cover Mapping

Ghosh, Rahul; Jia, Xiaowei; Chenxi, Lin; Jin, Zhenong; Kumar, Vipin (August 2021, DEEPSPATIA 2021: 2nd ACM SIGKDD Workshop on Deep Learning for Spatiotemporal Data, Applications, and Systems)
null (Ed.)
Collecting large annotated datasets in Remote Sensing is often expensive and thus can become a major obstacle for training advanced machine learning models. Common techniques of addressing this issue, based on the underlying idea of pre-training the Deep Neural Networks (DNN) on freely available large datasets, cannot be used for Remote Sensing due to the unavailability of such large-scale labeled datasets and the heterogeneity of data sources caused by the varying spatial and spectral resolution of different sensors. Self-supervised learning is an alternative approach that learns feature representation from unlabeled images without using any human annotations. In this paper, we introduce a new method for land cover mapping by using a clustering-based pretext task for self-supervised learning. We demonstrate the effectiveness of the method on two societally relevant applications from the aspect of segmentation performance, discriminative feature representation learning, and the underlying cluster structure. We also show the effectiveness of the active sampling using the clusters obtained from our method in improving the mapping accuracy given a limited budget of annotating.
more » « less
Full Text Available
Attention-augmented Spatio-Temporal Segmentation for Land Cover Mapping

Ghosh, Rahul; Ravirathinam, Praveen; Jia, Xiaowei; Lin, Chenxi; Lin, Zhenong; Kumar, Vipin (August 2021, DSSG-21: The 3rd Workshop on Data Science for Social Good)
null (Ed.)
The availability of massive earth observing satellite data provides huge opportunities for land use and land cover mapping. However, such mapping effort is challenging due to the existence of various land cover classes, noisy data, and the lack of proper labels. Also, each land cover class typically has its own unique temporal pattern and can be identified only during certain periods. In this article, we introduce a novel architecture that incorporates the UNet structure with Bidirectional LSTM and Attention mechanism to jointly exploit the spatial and temporal nature of satellite data and to better identify the unique temporal patterns of each land cover class. We compare our method with other state-of-the-art methods both quantitatively and qualitatively on two real-world datasets which involve multiple land cover classes. We also visualize the attention weights to study its effectiveness in mitigating noise and in identifying discriminative time periods of different classes. The code and dataset used in this work are made publicly available for reproducibility.
more » « less
Full Text Available
Model-agnostic Methods for Text Classification with Inherent Noise

https://doi.org/10.18653/v1/2020.coling-industry.19

Tayal, Kshitij; Ghosh, Rahul; Kumar, Vipin (December 2020, 28th International Conference on Computational Linguistics: Industry Track)
null (Ed.)
Text classification is a fundamental problem, and recently, deep neural networks (DNN) have shown promising results in many natural language tasks. However, their human-level performance relies on high-quality annotations, which are time-consuming and expensive to collect. As we move towards large inexpensive datasets, the inherent label noise degrades the generalization of DNN. While most machine learning literature focuses on building complex networks to handle noise, in this work, we evaluate model-agnostic methods to handle inherent noise in large scale text classification that can be easily incorporated into existing machine learning workflows with minimal interruption. Specifically, we conduct a point-by-point comparative study between several noise-robust methods on three datasets encompassing three popular classification models. To our knowledge, this is the first time such a comprehensive study in text classification encircling popular models and model-agnostic loss methods has been conducted. In this study, we describe our learning and demonstrate the application of our approach, which outperformed baselines by up to 10% in classification accuracy while requiring no network modifications.
more » « less
Full Text Available

« Prev Next »