Title: CalCROP21: A Georeferenced multi-spectral dataset of Satellite Imagery and Crop Labels
Mapping and monitoring crops is a key step towards the sustainable intensification of agriculture and addressing global food security. A benchmark dataset comparable to ImageNet, which revolutionized computer vision applications, can accelerate the development of novel crop mapping techniques. Currently, the United States Department of Agriculture (USDA) annually releases the Cropland Data Layer (CDL), which contains crop labels at 30m resolution for the entire United States of America. While CDL is state of the art and is widely used for a number of agricultural applications, it has several limitations (e.g., pixelated errors, labels carried over from previous years, and errors in the classification of minor crops). In this work, we create a new semantic segmentation benchmark dataset, which we call CalCROP21, for the diverse crops in the Central Valley region of California at 10m spatial resolution, using a robust Google Earth Engine based image processing pipeline and a novel attention-based spatio-temporal semantic segmentation algorithm, STATT. STATT uses re-sampled (interpolated) CDL labels for training but is able to generate better predictions than CDL by leveraging spatial and temporal patterns in Sentinel-2 multi-spectral image series to effectively capture phenological differences among crops, and it uses attention to reduce the impact of clouds and other atmospheric disturbances. We also present a comprehensive evaluation to show that STATT has significantly better results when compared to the resampled CDL labels. We have released the dataset and the processing pipeline code for generating the benchmark dataset.
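The abstract mentions re-sampling 30m CDL labels to the 10m grid used for training. As a minimal sketch of what such a step could look like, the snippet below upsamples a label raster by a factor of 3 with nearest-neighbour replication. This is an assumption for illustration, not the authors' pipeline: the function name `upsample_labels`, the toy class codes, and the premise that the 30m and 10m grids are already aligned are all hypothetical, and a real workflow would also need reprojection and georeferencing.

```python
import numpy as np


def upsample_labels(labels_30m: np.ndarray, factor: int = 3) -> np.ndarray:
    """Nearest-neighbour upsampling of a categorical label raster.

    Each coarse pixel is replicated into a factor x factor block, so class
    codes are copied rather than averaged (averaging would create
    meaningless intermediate labels).
    """
    rows_repeated = np.repeat(labels_30m, factor, axis=0)
    return np.repeat(rows_repeated, factor, axis=1)


# Toy 2 x 2 raster with made-up class codes (hypothetical values).
cdl_30m = np.array([[1, 5],
                    [36, 1]], dtype=np.uint8)
cdl_10m = upsample_labels(cdl_30m)
print(cdl_10m.shape)
```

Replication keeps every label a valid class code, which is why nearest-neighbour methods are commonly preferred over bilinear interpolation for categorical rasters.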
Award ID(s):
1838159
PAR ID:
10346433
Journal Name:
2021 IEEE International Conference on Big Data (Big Data)
Page Range / eLocation ID:
1625 to 1632
Sponsoring Org:
National Science Foundation
More Like this
  1. A GIS data layer of crop field boundaries has many applications in agricultural research, ecosystem studies, crop monitoring, and land management. Mapping crop field boundaries through field surveys is neither time- nor cost-effective for vast agricultural areas. On-screen digitization of fine-resolution satellite images is also labor-intensive and error-prone. Recent developments in image segmentation based on spectral characteristics are promising for cropland boundary detection. However, processing large volumes of multi-band satellite images often requires high-performance computing systems. This study uses crop rotation information to delineate field boundaries: crop field boundaries in Iowa, United States, are extracted from multi-year (2007-2018) CDL data. The process is simple compared to boundary extraction from multi-date remote sensing data. Although this process was unable to distinguish some adjacent fields, the overall accuracy is promising. Applying advanced geoprocessing algorithms and tools for polygon correction may improve the result significantly. Extracted field boundaries are validated by superimposing them on fine-resolution Google Earth images. The result shows that crop field boundaries can be extracted with reasonable accuracy using crop rotation information.
  2. The United States is a major producer and exporter of agricultural goods, fulfilling global demands for food, fiber, and fuel while generating substantial economic benefits. Agriculture in the U.S. not only dominates land use but also ranks as the largest water-consuming sector. High-resolution cropland mapping and insights into cultivation trends are essential to enhance sustainable management of land and water resources. Existing data sources present a trade-off between temporal breadth and spatial resolution, leading to gaps in detailed geographic crop distribution. To bridge this gap, we adopted a data-fusion methodology that leverages the advantages of various data sources, including county-level data from the U.S. Department of Agriculture, along with several gridded land use datasets. This approach enabled us to create annual maps, termed HarvestGRID, of irrigated and harvested areas for 30 key crops across the U.S. from 1981 to 2019 at a resolution of 2.5 arc minutes. Over the past four decades, irrigated harvested area has remained relatively stable nationally; however, several western states exhibit a declining trend, while some eastern states show an upward trend. Notably, more than 50% of the irrigated land in the U.S. lies above three major aquifers: the High Plains, Central Valley, and Mississippi Embayment Aquifers. We assessed the accuracy of HarvestGRID by comparing it with other large-scale gridded cropland databases, identifying both consistencies and discrepancies across different years, regions, and crops. This dataset is pivotal for analyzing long-term cropland use patterns and supports the advancement of more sustainable agricultural practices.
  3. The Cropland Data Layer (CDL) is currently the only subfield-level, high-resolution, crop-specific land cover data product covering the entire conterminous United States (CONUS). It has been widely used in the agricultural industry, business decision support, research, and education worldwide. However, CDL data has its limitations. It is an end-of-season land cover map that is not available within the growing season. Moreover, CDLs from early years have many misclassified pixels (relatively low accuracy) due to cloud cover and a lack of satellite images. This paper presents studies that use machine learning techniques to address these issues in CDL data. Specifically, we present the design and implementation of a machine learning model for agro-geoinformation discovery from CDL. Several application scenarios of the proposed model, including prediction of crop cover, crop acreage estimation, in-season crop mapping, and refinement of the early-year CDL data, are demonstrated and discussed.
  4. Crop type information at the field level is vital for many types of research and applications. The United States Department of Agriculture (USDA) provides information on crop types for US cropland as a Cropland Data Layer (CDL). However, CDL is only available at the end of the year after the crop growing season. Therefore, CDL is unable to support in-season research and decision-making regarding crop loss estimation, yield estimation, and grain pricing. The USDA mostly relies on field survey and farmers’ reports for the ground truth to train image classification models, which is one of the major reasons for the delayed release of CDL. This research aims to use trusted pixels as ground truth to train classification models. Trusted pixels are pixels which follow a specific crop rotation pattern. These trusted pixels are used to train image classification models for the classification of in-season Landsat images to identify major crop types. Six different classification algorithms are investigated and tested to select the best algorithm for this study. The Random Forest algorithm stands out among selected algorithms. This study classified Landsat scenes between May and mid-August for Iowa. The overall agreements of classification results with CDL in 2017 are 84%, 94%, and 96% for May, June, and July, respectively. The classification accuracies have been assessed through 683 ground truth data points collected from the fields. The overall accuracies of single date multi-band image classification are 84%, 89% and 92% for May, June, and July, respectively. The result also shows higher accuracy (94–95%) can be achieved through multi-date image classification compared to single date image classification. 
  5. This dataset provides estimates of total Irrigation Water Use (IWU) by crop, county, water source, and year for the Continental United States. Total irrigation from Surface Water Withdrawals (SWW), total Groundwater Withdrawals (GWW), and nonrenewable Groundwater Depletion (GWD) is provided for 20 crops and crop groups from 2008 to 2020 at the county spatial resolution. In total, there are nearly 2.5 million data points in this dataset (3,142 counties; 13 years; 3 water sources; and 20 crops). This dataset supports the paper by Ruess et al. (2024), "Total irrigation by crop in the Continental United States from 2008 to 2020", Scientific Data, doi: 10.1038/s41597-024-03244-w. When using, please cite as: Ruess, P.J., Konar, M., Wanders, N., and Bierkens, M.F.P. (2024) Total irrigation by crop in the Continental United States from 2008 to 2020, Scientific Data, doi: 10.1038/s41597-024-03244-w
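Item 4 in the list above describes "trusted pixels", pixels that follow a specific crop rotation pattern and can therefore serve as training labels before the current-year CDL is released. The snippet below is a minimal sketch of that idea under stated assumptions, not the cited study's implementation: it assumes a strict two-crop alternation, uses illustrative class codes `CORN` and `SOY`, and uses 0 as a hypothetical "no trusted label" value. The function name `trusted_pixels` is also made up for this example.

```python
import numpy as np

CORN, SOY = 1, 5   # illustrative class codes (assumption)
NO_LABEL = 0       # hypothetical marker for pixels that are not trusted


def trusted_pixels(stack: np.ndarray, a: int = CORN, b: int = SOY):
    """Find pixels that strictly alternate between crops a and b.

    stack has shape (years, height, width), oldest year first.
    Returns (mask, next_label): mask is True where the whole history
    alternates, and next_label holds the crop expected in the following
    year for those pixels (NO_LABEL elsewhere).
    """
    years = stack.shape[0]
    even_year = np.arange(years) % 2 == 0
    pattern_ab = np.where(even_year, a, b)  # a, b, a, b, ...
    pattern_ba = np.where(even_year, b, a)  # b, a, b, a, ...

    # Compare every year's labels with the expected label for that year.
    match_ab = (stack == pattern_ab[:, None, None]).all(axis=0)
    match_ba = (stack == pattern_ba[:, None, None]).all(axis=0)
    mask = match_ab | match_ba

    # The rotation continues, so next year is the opposite of the last year.
    flipped = np.where(stack[-1] == a, b, a)
    next_label = np.where(mask, flipped, NO_LABEL)
    return mask, next_label


# Four years of labels for a 1 x 3 strip of pixels (toy data).
history = np.array([
    [[1, 5, 1]],
    [[5, 1, 1]],
    [[1, 5, 5]],
    [[5, 1, 1]],
])
mask, next_label = trusted_pixels(history)
print(mask, next_label)
```

In this toy strip the first two pixels alternate every year and are kept, while the third breaks the pattern and is excluded from the training set.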