
Title: CalCROP21: A Georeferenced multi-spectral dataset of Satellite Imagery and Crop Labels
Mapping and monitoring crops is a key step toward the sustainable intensification of agriculture and addressing global food security. A dataset like ImageNet, which revolutionized computer vision applications, could similarly accelerate the development of novel crop mapping techniques. Currently, the United States Department of Agriculture (USDA) annually releases the Cropland Data Layer (CDL), which contains crop labels at 30 m resolution for the entire United States. While CDL is state of the art and widely used for a number of agricultural applications, it has several limitations (e.g., pixelated errors, labels carried over from previous years, and errors in the classification of minor crops). In this work, we create a new semantic segmentation benchmark dataset, which we call CalCROP21, for the diverse crops in the Central Valley region of California at 10 m spatial resolution, using a robust Google Earth Engine based image processing pipeline and a novel attention-based spatio-temporal semantic segmentation algorithm, STATT. STATT uses resampled (interpolated) CDL labels for training but is able to generate better predictions than CDL by leveraging spatial and temporal patterns in Sentinel-2 multi-spectral image series to effectively capture phenological differences among crops, and it uses attention to reduce the impact of clouds and other atmospheric disturbances. We also present a comprehensive evaluation showing that STATT produces significantly better results than the resampled CDL labels. We have released the dataset and the processing pipeline code for generating the benchmark dataset.
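The attention idea summarized above can be sketched in a few lines: given per-timestep pixel features and a per-timestep cloud score, a softmax over the negated cloud scores down-weights contaminated observations before fusing the image series. This is a minimal numpy illustration of the concept only, not the actual STATT implementation; the function name and its inputs are hypothetical.

```python
import numpy as np

def temporal_attention(features, cloud_scores):
    """Fuse a multi-spectral image time series with attention weights.

    features:     (T, H, W, C) per-timestep pixel features
    cloud_scores: (T, H, W), higher = more cloud contamination
    Returns an (H, W, C) fused feature map in which cloudy timesteps
    contribute less (softmax over the negated cloud scores).
    """
    logits = -cloud_scores                       # clearer scenes get higher logits
    logits -= logits.max(axis=0, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=0, keepdims=True)
    return (weights[..., None] * features).sum(axis=0)
```

In practice the attention weights would be learned from the data rather than derived directly from a cloud mask, but the fusion step has this shape.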
Award ID(s):
1838159
NSF-PAR ID:
10346433
Journal Name:
2021 IEEE International Conference on Big Data (Big Data)
Page Range / eLocation ID:
1625 to 1632
Sponsoring Org:
National Science Foundation
More Like this
  1. A GIS data layer of crop field boundaries has many applications in agricultural research, ecosystem studies, crop monitoring, and land management. Mapping crop field boundaries through field surveys is neither time- nor cost-effective for vast agricultural areas, and on-screen digitization of fine-resolution satellite images is also labor-intensive and error-prone. Recent developments in image segmentation based on spectral characteristics are promising for cropland boundary detection; however, processing large volumes of multi-band satellite images often requires high-performance computation systems. This study instead utilized crop rotation information to delineate field boundaries. Crop field boundaries for Iowa in the United States are extracted using multi-year (2007-2018) CDL data. The process is simple compared to boundary extraction from multi-date remote sensing data. Although it was unable to distinguish some adjacent fields, the overall accuracy is promising, and applying advanced geoprocessing algorithms and tools for polygon correction may improve the result significantly. Extracted field boundaries are validated by superimposing them on fine-resolution Google Earth images. The results show that crop field boundaries can easily be extracted with reasonable accuracy using crop rotation information.
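The rotation-based delineation described above can be sketched as: stack the multi-year CDL rasters, encode each pixel's year-by-year label sequence as a single key, and group touching pixels that share a key into fields. The encoding and function below are our own illustration, not code from the study; note how two adjacent fields with the same rotation would merge, which is exactly the limitation the abstract mentions.

```python
import numpy as np
from collections import deque

def rotation_fields(cdl_stack):
    """Segment fields from a multi-year CDL stack of shape (years, H, W).

    Pixels sharing the same crop-rotation sequence AND touching
    (4-connectivity) receive one field id.
    """
    years, H, W = cdl_stack.shape
    # Encode each pixel's rotation history as one hashable integer key.
    keys = np.zeros((H, W), dtype=np.int64)
    for y in range(years):
        keys = keys * 256 + cdl_stack[y]        # assumes CDL codes < 256
    field = -np.ones((H, W), dtype=np.int32)
    next_id = 0
    for i in range(H):
        for j in range(W):
            if field[i, j] >= 0:
                continue
            # Breadth-first flood fill over same-rotation neighbors.
            q = deque([(i, j)])
            field[i, j] = next_id
            while q:
                r, c = q.popleft()
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < H and 0 <= nc < W
                            and field[nr, nc] < 0
                            and keys[nr, nc] == keys[r, c]):
                        field[nr, nc] = next_id
                        q.append((nr, nc))
            next_id += 1
    return field
```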
  2. Plant diseases are one of the grand challenges facing the agriculture sector worldwide. In the United States, crop diseases cause the loss of one-third of crop production annually. Despite its importance, crop disease diagnosis is challenging for limited-resource farmers when performed through optical observation of symptoms on plant leaves. There is therefore an urgent need for markedly improved detection, monitoring, and prediction of crop diseases to reduce agricultural losses. Computer vision empowered with Machine Learning (ML) holds tremendous promise for improving crop monitoring at scale in this context. This paper presents an ML-powered mobile-based system that automates the plant leaf disease diagnosis process. The developed system uses a Convolutional Neural Network (CNN) as its underlying deep learning engine for classifying 38 disease categories. We collected an imagery dataset containing 96,206 images of healthy and infected plant leaves for training, validating, and testing the CNN model. The user interface is developed as an Android mobile app that allows farmers to capture a photo of infected plant leaves; it then displays the disease category along with a confidence percentage. This system should create a better opportunity for farmers to keep their crops healthy and eliminate the use of wrong fertilizers that could stress the plants. Finally, we evaluated the system using performance metrics such as classification accuracy and processing time, and found that the model achieves an overall classification accuracy of 94% in recognizing the 38 most common disease classes across 14 crop species.
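To make the shape of such a classifier concrete, here is a toy numpy forward pass: one 3x3 convolution, ReLU, global average pooling, and a 38-way softmax head. The paper's CNN is far deeper; this sketch with random weights only demonstrates the structure of the pipeline, and every name in it is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(img, kernels):
    """Valid 3x3 convolution: img (H, W, Cin), kernels (3, 3, Cin, Cout)."""
    H, W, _ = img.shape
    Cout = kernels.shape[-1]
    out = np.zeros((H - 2, W - 2, Cout))
    for i in range(H - 2):
        for j in range(W - 2):
            patch = img[i:i + 3, j:j + 3, :]            # (3, 3, Cin)
            out[i, j] = np.tensordot(patch, kernels, axes=3)
    return out

def classify_leaf(img, kernels, dense_w):
    """Toy forward pass: conv -> ReLU -> global average pool -> softmax.

    Stands in for the paper's much deeper CNN; with random weights this
    only demonstrates shapes and the 38-way softmax output.
    """
    x = np.maximum(conv2d(img, kernels), 0.0)           # ReLU
    x = x.mean(axis=(0, 1))                             # global average pool
    logits = x @ dense_w                                # (38,)
    e = np.exp(logits - logits.max())
    return e / e.sum()
```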
  3. Crop type information at the field level is vital for many kinds of research and applications. The United States Department of Agriculture (USDA) provides crop type information for US cropland as the Cropland Data Layer (CDL). However, CDL is only available at the end of the year, after the crop growing season, so it cannot support in-season research and decision-making on crop loss estimation, yield estimation, and grain pricing. The USDA relies mostly on field surveys and farmers' reports for the ground truth used to train image classification models, which is one of the major reasons for the delayed release of CDL. This research aims to use trusted pixels (pixels that follow a specific crop rotation pattern) as ground truth to train models for classifying in-season Landsat images and identifying major crop types. Six classification algorithms were investigated and tested, with Random Forest performing best. This study classified Landsat scenes of Iowa acquired between May and mid-August. The overall agreements of the classification results with the 2017 CDL are 84%, 94%, and 96% for May, June, and July, respectively. Classification accuracy was also assessed against 683 ground truth points collected in the field: the overall accuracies of single-date multi-band image classification are 84%, 89%, and 92% for May, June, and July, respectively. The results further show that higher accuracy (94-95%) can be achieved through multi-date image classification compared to single-date classification.
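The trusted-pixel idea can be sketched as: flag pixels whose multi-year CDL history follows a known rotation (here, strict corn-soybean alternation as an illustrative rule), then use their labels as training data for an in-season classifier. As a dependency-free stand-in for the Random Forest used in the study, the sketch below uses a nearest-centroid classifier; the rotation rule, class codes, and function names are our own illustration.

```python
import numpy as np

CORN, SOY = 1, 5  # CDL class codes for corn and soybeans

def trusted_pixels(history):
    """Flag pixels whose multi-year label sequence strictly alternates
    between corn and soybeans, a simple proxy for a rotation-based
    'trusted pixel' test.  history: (years, N) past CDL labels."""
    a = history
    alternating = np.all((a[1:] != a[:-1]) & np.isin(a[1:], [CORN, SOY]),
                         axis=0)
    return alternating & np.isin(a[0], [CORN, SOY])

def predict_inseason(spectra, trusted_spectra, trusted_labels):
    """Nearest-centroid classifier over in-season band values,
    standing in for the study's Random Forest.  spectra: (N, bands)."""
    classes = np.unique(trusted_labels)
    centroids = np.stack([trusted_spectra[trusted_labels == c].mean(axis=0)
                          for c in classes])
    d = ((spectra[:, None, :] - centroids[None]) ** 2).sum(axis=2)
    return classes[d.argmin(axis=1)]
```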
  4. Agriculture is the largest user of water in the United States, yet we do not understand the spatially resolved sources of irrigation water use (IWU) by crop. The goal of this study is to estimate crop-specific IWU from surface water withdrawals (SWW), total groundwater withdrawals (GWW), and nonrenewable groundwater depletion (GWD). To do this, we employ the PCR-GLOBWB 2 global hydrology model to partition irrigation information from the U.S. Geological Survey Water Use Database to specific crops across the Continental United States (CONUS). We incorporate high-resolution input data on agricultural production and climate within the CONUS to obtain crop-specific irrigation estimates of SWW, GWW, and GWD for 20 crops and crop groups from 2008 to 2020 at county spatial resolution. Over the study period, SWW decreased by 20%, while both GWW and GWD increased by 3%. On average, animal feed (alfalfa/hay) uses the most irrigation water across all sources: 33 km3/yr from SWW, 13 km3/yr from GWW, and 10 km3/yr from GWD. Produce used less SWW (43%) but more GWW (57%) and GWD (27%) over the study period. The largest changes in IWU for each water source between 2008 and 2020 are: rice (SWW decreased by 71%), sugar beets (GWW increased by 232%), and rapeseed (GWD increased by 405%). These results present the first national-scale assessment of irrigation by crop, water source, and year. In total, we contribute nearly 2.5 million data points to the literature (3,142 counties; 13 years; 3 water sources; and 20 crops).
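The partitioning step can be illustrated with a simple proportional allocation: a county's reported withdrawal for one source is split across crops in proportion to each crop's modeled irrigation demand. This is a highly simplified stand-in for the PCR-GLOBWB 2 based partitioning the study actually performs; the function and its inputs are hypothetical.

```python
import numpy as np

def allocate_withdrawals(county_total, crop_demand):
    """Split one county's reported withdrawal (e.g., USGS SWW in km^3/yr)
    across crops in proportion to each crop's modeled irrigation demand.

    crop_demand: (n_crops,) modeled demand; zeros are allowed.
    Returns per-crop withdrawals that sum to county_total.
    """
    demand = np.asarray(crop_demand, dtype=float)
    total = demand.sum()
    if total == 0.0:
        return np.zeros_like(demand)   # nothing irrigated in this county
    return county_total * demand / total
```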

     
  5. Osteoarthritis of the knee is increasingly prevalent as our population ages, representing a growing financial burden and severely impacting quality of life. The invasiveness of in vivo procedures and the high cost of cadaveric studies have left computational tools uniquely suited to study knee biomechanics. Developments in deep learning have great potential for efficiently generating large-scale datasets that enable researchers to perform population-sized investigations, but the time and effort associated with producing robust hexahedral meshes has been a limiting factor in expanding finite element studies to encompass a population. Here we developed a fully automated pipeline capable of taking magnetic resonance knee images and producing a working finite element simulation. We trained an encoder-decoder convolutional neural network to perform semantic image segmentation on the Imorphics dataset provided through the Osteoarthritis Initiative, which contains 176 image sequences with varying levels of cartilage degradation. Starting from an open-source swept-extrusion meshing algorithm, we further developed it until it could produce high-quality meshes for every sequence, and we applied a template-mapping procedure to automatically place soft-tissue attachment points. The meshing algorithm produced simulation-ready meshes for all 176 sequences, regardless of whether provided (manually reconstructed) or predicted (automatically generated) segmentation labels were used. The average time to mesh all bones and cartilage tissues was less than 2 min per knee on an AMD Ryzen 5600X processor, using a parallel pool of three workers for bone meshing followed by a pool of four workers meshing the four cartilage tissues. Of the 176 sequences with provided segmentation labels, 86% of the resulting meshes completed a simulated flexion-extension activity.
We used a reserved testing dataset of 28 sequences unseen during network training to produce simulations derived from predicted labels. We compared tibiofemoral contact mechanics between manual and automated reconstructions for the 24 pairs of successful finite element simulations from this set, resulting in mean root-mean-squared differences under 20% of their respective min-max norms. In combination with further advancements in deep learning, this framework represents a feasible pipeline for producing population-sized finite element studies of the natural knee from subject-specific models.
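The comparison metric reported above (root-mean-squared difference expressed as a percentage of a min-max norm) can be computed as below. The choice of normalizing by the manual reconstruction's range is our assumption of one reasonable reading of "their respective min-max norms".

```python
import numpy as np

def nrmsd_percent(manual, automated):
    """Root-mean-squared difference between two simulation outputs
    (e.g., tibiofemoral contact-pressure curves), expressed as a
    percentage of the manual reconstruction's min-max range.
    """
    manual = np.asarray(manual, dtype=float)
    automated = np.asarray(automated, dtype=float)
    rmsd = np.sqrt(np.mean((manual - automated) ** 2))
    span = manual.max() - manual.min()
    return 100.0 * rmsd / span
```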