Title: Cyberinfrastructure for machine learning applications in agriculture: experiences, analysis, and vision
Introduction: Advancements in machine learning (ML) algorithms that make predictions from data without being explicitly programmed and the increased computational speeds of graphics processing units (GPUs) over the last decade have led to remarkable progress in the capabilities of ML. In many fields, including agriculture, this progress has outpaced the availability of sufficiently diverse and high-quality datasets, which now serve as a limiting factor. While many agricultural use cases appear feasible with current compute resources and ML algorithms, the lack of reusable hardware and software components, referred to as cyberinfrastructure (CI), for collecting, transmitting, cleaning, labeling, and training datasets is a major hindrance toward developing solutions to address agricultural use cases. This study focuses on addressing these challenges by exploring the collection, processing, and training of ML models using a multimodal dataset and providing a vision for agriculture-focused CI to accelerate innovation in the field.
Methods: Data were collected during the 2023 growing season from three agricultural research locations across Ohio. The dataset includes 1 terabyte (TB) of multimodal data, comprising Unmanned Aerial System (UAS) imagery (RGB and multispectral), as well as soil and weather sensor data. The two primary crops studied were corn and soybean, which are the state's most widely cultivated crops. The data collected and processed from this study were used to train ML models to make predictions of crop growth stage, soil moisture, and final yield.
Results: The exercise of processing this dataset resulted in four CI components that can be used to provide higher accuracy predictions in the agricultural domain. These components included (1) a UAS imagery pipeline that reduced processing time and improved image quality over standard methods, (2) a tabular data pipeline that aggregated data from multiple sources and temporal resolutions and aligned it with a common temporal resolution, (3) an approach to adapting the model architecture for a vision transformer (ViT) that incorporates agricultural domain expertise, and (4) a data visualization prototype that was used to identify outliers and improve trust in the data.
Discussion: Further work will be aimed at maturing the CI components and implementing them on high performance computing (HPC). There are open questions as to how CI components like these can best be leveraged to serve the needs of the agricultural community to accelerate the development of ML applications in agriculture.
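The tabular data pipeline described in the abstract aligns sources recorded at different temporal resolutions to a common one. The abstract does not say how this was done, so the following is only a minimal sketch of one common approach, assuming pandas, a daily target resolution, and mean aggregation. The function name `align_to_daily`, the column names, and the sample readings are all hypothetical.

```python
import pandas as pd


def align_to_daily(weather: pd.DataFrame, soil: pd.DataFrame) -> pd.DataFrame:
    """Aggregate two time-indexed sensor tables to one row per day and join them.

    Assumption: both inputs have a DatetimeIndex, and a daily mean is an
    acceptable summary of each variable.
    """
    weather_daily = weather.resample("D").mean()
    soil_daily = soil.resample("D").mean()
    # An outer join keeps days that appear in only one source (filled with NaN).
    return weather_daily.join(soil_daily, how="outer")


# Hypothetical readings: air temperature every 6 hours, soil moisture every 12 hours.
weather = pd.DataFrame(
    {"air_temp_c": [10.0, 14.0, 18.0, 14.0, 12.0, 16.0, 20.0, 16.0]},
    index=pd.date_range("2023-06-01", periods=8, freq=pd.Timedelta(hours=6)),
)
soil = pd.DataFrame(
    {"soil_moisture": [0.30, 0.32, 0.28, 0.26]},
    index=pd.date_range("2023-06-01", periods=4, freq=pd.Timedelta(hours=12)),
)

aligned = align_to_daily(weather, soil)
print(aligned)
```

Each source is aggregated separately before the join, so a table sampled more often does not dominate the daily value of a table sampled less often. A different target resolution or aggregation (sum for rainfall, for instance) would be a small change to the `resample` calls.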
Award ID(s):
2112606
PAR ID:
10577776
Publisher / Repository:
Frontiers in Artificial Intelligence, Sec. AI in Food, Agriculture and Water
Journal Name:
Frontiers in Artificial Intelligence
Volume:
7
ISSN:
2624-8212
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Mapping and monitoring crops is a key step towards the sustainable intensification of agriculture and addressing global food security. A dataset like ImageNet that revolutionized computer vision applications can accelerate the development of novel crop mapping techniques. Currently, the United States Department of Agriculture (USDA) annually releases the Cropland Data Layer (CDL) which contains crop labels at 30m resolution for the entire United States of America. While CDL is state of the art and is widely used for a number of agricultural applications, it has a number of limitations (e.g., pixelated errors, labels carried over from previous years, and errors in the classification of minor crops). In this work, we create a new semantic segmentation benchmark dataset, which we call CalCROP21, for the diverse crops in the Central Valley region of California at 10m spatial resolution using a Google Earth Engine based robust image processing pipeline and a novel attention-based spatio-temporal semantic segmentation algorithm STATT. STATT uses re-sampled (interpolated) CDL labels for training but is able to generate a better prediction than CDL by leveraging spatial and temporal patterns in Sentinel2 multi-spectral image series to effectively capture phenologic differences amongst crops and uses attention to reduce the impact of clouds and other atmospheric disturbances. We also present a comprehensive evaluation to show that STATT has significantly better results when compared to the resampled CDL labels. We have released the dataset and the processing pipeline code for generating the benchmark dataset. 
  2. Context: Unoccupied aerial systems/vehicles (UAS/UAV, a.k.a. drones) have become an increasingly popular tool for ecological research. But much of the recent research is concerned with developing mapping and detection approaches, with few studies attempting to link UAS data to ecosystem processes and function. Landscape ecologists have long used high resolution imagery and spatial analyses to address ecological questions and are therefore uniquely positioned to advance UAS research for ecological applications. Objectives: The review objectives are to: (1) provide background on how UAS are used in landscape ecological studies, (2) identify major advancements and research gaps, and (3) discuss ways to better facilitate the use of UAS in landscape ecology research. Methods: We conducted a systematic review based on PRISMA guidelines using key search terms that are unique to landscape ecology research. We reviewed only papers that applied UAS data to investigate questions about ecological patterns, processes, or function. Results: We summarize metadata from 161 papers that fit our review criteria. We highlight and discuss major research themes and applications, sensors and data collection techniques, image processing, feature extraction and spatial analysis, image fusion and satellite scaling, and open data and software. Conclusion: We observed a diversity of UAS methods, applications, and creative spatial modeling and analysis approaches. Key aspects of UAS research in landscape ecology include modeling wildlife micro-habitats, scaling of ecosystem functions, landscape and geomorphic change detection, integrating UAS with historical aerial and satellite imagery, and novel applications of spatial statistics.
  3. The successful implementation of vision-based navigation in agricultural fields hinges upon two critical components: 1) the accurate identification of key components within the scene, and 2) the identification of lanes through the detection of boundary lines that separate the crops from the traversable ground. We propose Agronav, an end-to-end vision-based autonomous navigation framework, which outputs the centerline from the input image by sequentially processing it through semantic segmentation and semantic line detection models. We also present Agroscapes, a pixel-level annotated dataset collected across six different crops, captured from varying heights and angles. This ensures that the framework trained on Agroscapes is generalizable across both ground and aerial robotic platforms. Codes, models and dataset will be publicly released. 
  4. Background: Predicting the likelihood of success of weight loss interventions using machine learning (ML) models may enhance intervention effectiveness by enabling timely and dynamic modification of intervention components for nonresponders to treatment. However, a lack of understanding and trust in these ML models impacts adoption among weight management experts. Recent advances in the field of explainable artificial intelligence enable the interpretation of ML models, yet it is unknown whether they enhance model understanding, trust, and adoption among weight management experts. Objective: This study aimed to build and evaluate an ML model that can predict 6-month weight loss success (ie, ≥7% weight loss) from 5 engagement and diet-related features collected over the initial 2 weeks of an intervention, to assess whether providing ML-based explanations increases weight management experts’ agreement with ML model predictions, and to inform factors that influence the understanding and trust of ML models to advance explainability in early prediction of weight loss among weight management experts. Methods: We trained an ML model using the random forest (RF) algorithm and data from a 6-month weight loss intervention (N=419). We leveraged findings from existing explainability metrics to develop Prime Implicant Maintenance of Outcome (PRIMO), an interactive tool to understand predictions made by the RF model. We asked 14 weight management experts to predict hypothetical participants’ weight loss success before and after using PRIMO. We compared PRIMO with 2 other explainability methods, one based on feature ranking and the other based on conditional probability. We used generalized linear mixed-effects models to evaluate participants’ agreement with ML predictions and conducted likelihood ratio tests to examine the relationship between explainability methods and outcomes for nested models. We conducted guided interviews and thematic analysis to study the impact of our tool on experts’ understanding and trust in the model. Results: Our RF model had 81% accuracy in the early prediction of weight loss success. Weight management experts were significantly more likely to agree with the model when using PRIMO (χ²=7.9; P=.02) compared with the other 2 methods with odds ratios of 2.52 (95% CI 0.91-7.69) and 3.95 (95% CI 1.50-11.76). From our study, we inferred that our software not only influenced experts’ understanding and trust but also impacted decision-making. Several themes were identified through interviews: preference for multiple explanation types, need to visualize uncertainty in explanations provided by PRIMO, and need for model performance metrics on similar participant test instances. Conclusions: Our results show the potential for weight management experts to agree with the ML-based early prediction of success in weight loss treatment programs, enabling timely and dynamic modification of intervention components to enhance intervention effectiveness. Our findings provide methods for advancing the understandability and trust of ML models among weight management experts.
  5. Context: Soil resource heterogeneity drives plant species diversity patterns at local and landscape scales. In drylands, biocrusts are patchily distributed and contribute to soil resource heterogeneity important for plant establishment and growth. Yet, we have a limited understanding of how such heterogeneity may relate to patterns of plant diversity and community structure. Objectives: We explored relationships between biocrust-associated soil cover heterogeneity and plant diversity patterns in a cool desert ecosystem. We asked: (1) does biocrust-associated soil cover heterogeneity predict plant diversity and community composition? and (2) can we use high-resolution remote sensing data to calculate soil cover heterogeneity metrics that could be used to extrapolate these patterns across landscapes? Methods: We tested associations among field-based measures of plant diversity and soil cover heterogeneity. We then used a Support Vector Machine classification to map soil, plant and biocrust cover from sub-centimeter resolution Unoccupied Aerial System (UAS) imagery and compared the mapped results to field-based measures. Results: Field-based soil cover heterogeneity and biocrust cover were positively associated with plant diversity and predicted community composition. The accuracy of UAS-mapped soil cover classes varied across sites due to variation in timing and quality of image collections, but the overall results suggest that UAS are a promising data source for generating detailed, spatially explicit soil cover heterogeneity metrics. Conclusions: Results improve understanding of relationships between biocrust-associated soil cover heterogeneity and plant diversity and highlight the promise of high-resolution UAS data to extrapolate these patterns over larger landscapes which could improve conservation planning and predictions of dryland responses to soil degradation under global change.