This content will become publicly available on December 1, 2026

Title: High-resolution phenomics dataset collected on a field-grown, EMS-mutagenized sorghum population evaluated in hot, arid conditions
Abstract:
Objectives: The University of Arizona Field Scanner (FS) is capable of generating massive amounts of data from a variety of instruments at high spatial and temporal resolution. The accompanying field infrastructure beneath the system offers capacity for controlled irrigation regimes in a hot, arid environment. Approximately 194 terabytes of raw and processed phenotypic image data were generated over two growing seasons (2020 and 2022) on a population of 434 sequence-indexed, EMS-mutagenized sorghum lines in the genetic background BTx623; the population was grown under well-watered and water-limited conditions. Collectively, these data enable links between genotype and dynamic, drought-responsive phenotypes, which can accelerate crop improvement efforts. However, analysis of these data can be challenging for researchers without background knowledge of the system and preliminary processing.
Data description: This dataset contains formatted tabular data generated from sensing system outputs suitable for a wide range of end-users and includes plant-level bounding areas, temperatures, and point cloud characteristics, as well as plot-level photosynthetic parameters and accompanying weather data. The dataset includes approximately 422 megabytes of tabular data totaling 1,903,412 unique unfiltered rows of FS data, 526,917 cleaned rows of FS data, and 285 rows of weather data from the two field seasons.
Award ID(s): 2102120
PAR ID: 10625326
Author(s) / Creator(s): ; ; ;
Publisher / Repository: Springer Nature
Date Published:
Journal Name: BMC Research Notes
Volume: 18
Issue: 1
ISSN: 1756-0500
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Introduction: AI fairness seeks to improve the transparency and explainability of AI systems by ensuring that their outcomes genuinely reflect the best interests of users. Data augmentation, which involves generating synthetic data from existing datasets, has gained significant attention as a solution to data scarcity. In particular, diffusion models have become a powerful technique for generating synthetic data, especially in fields like computer vision. Methods: This paper explores the potential of diffusion models to generate synthetic tabular data to improve AI fairness. The Tabular Denoising Diffusion Probabilistic Model (Tab-DDPM), a diffusion model adaptable to any tabular dataset and capable of handling various feature types, was utilized with different amounts of generated data for data augmentation. Additionally, reweighting samples from AIF360 was employed to further enhance AI fairness. Five traditional machine learning models (Decision Tree (DT), Gaussian Naive Bayes (GNB), K-Nearest Neighbors (KNN), Logistic Regression (LR), and Random Forest (RF)) were used to validate the proposed approach. Results and discussion: Experimental results demonstrate that the synthetic data generated by Tab-DDPM improves fairness in binary classification.
  2. Introduction: Advancements in machine learning (ML) algorithms that make predictions from data without being explicitly programmed and the increased computational speeds of graphics processing units (GPUs) over the last decade have led to remarkable progress in the capabilities of ML. In many fields, including agriculture, this progress has outpaced the availability of sufficiently diverse and high-quality datasets, which now serve as a limiting factor. While many agricultural use cases appear feasible with current compute resources and ML algorithms, the lack of reusable hardware and software components, referred to as cyberinfrastructure (CI), for collecting, transmitting, cleaning, labeling, and training datasets is a major hindrance toward developing solutions to address agricultural use cases. This study focuses on addressing these challenges by exploring the collection, processing, and training of ML models using a multimodal dataset and providing a vision for agriculture-focused CI to accelerate innovation in the field. Methods: Data were collected during the 2023 growing season from three agricultural research locations across Ohio. The dataset includes 1 terabyte (TB) of multimodal data, comprising Unmanned Aerial System (UAS) imagery (RGB and multispectral), as well as soil and weather sensor data. The two primary crops studied were corn and soybean, which are the state's most widely cultivated crops. The data collected and processed from this study were used to train ML models to make predictions of crop growth stage, soil moisture, and final yield. Results: The exercise of processing this dataset resulted in four CI components that can be used to provide higher accuracy predictions in the agricultural domain. These components included (1) a UAS imagery pipeline that reduced processing time and improved image quality over standard methods, (2) a tabular data pipeline that aggregated data from multiple sources and temporal resolutions and aligned it with a common temporal resolution, (3) an approach to adapting the model architecture for a vision transformer (ViT) that incorporates agricultural domain expertise, and (4) a data visualization prototype that was used to identify outliers and improve trust in the data. Discussion: Further work will be aimed at maturing the CI components and implementing them on high performance computing (HPC). There are open questions as to how CI components like these can best be leveraged to serve the needs of the agricultural community to accelerate the development of ML applications in agriculture.
  3. Many disease epidemics recur seasonally, and such seasonal epidemics can be shaped by species interactions among parasites, pathogens, or other microbes. Field experiments are a classic approach for understanding species interactions but are rarely used to study seasonal epidemics. Our research objective was to help fill this gap by manipulating the seasonal timing of the establishment of infectious diseases while tracking epidemics and other ecological responses. To do this, we conducted a multiyear field experiment in an old field in the Piedmont of North Carolina, USA, dominated by the grass species tall fescue (Lolium arundinaceum (Schreb.) Darbysh.). In the field, tall fescue experienced seasonal epidemics of multiple foliar fungal diseases: anthracnose in spring, brown patch in mid-summer, and crown rust in late summer to fall. In a fully randomized design, we applied four fungicide treatments to replicate plots of intact vegetation in specific seasons to manipulate the timing of disease epidemics. One treatment was designed to delay the establishment of anthracnose until mid-summer, and another to delay the establishment of both anthracnose and brown patch until fall. In a third treatment, fungicide was applied year-round, and, in a fourth treatment, fungicide was never applied. The experiment comprised 64 plots, each 2 m × 2 m, surveyed from May 2017 to February 2020. Here, we report a dataset documenting responses in the community structure of both plants and foliar fungi. To track disease prevalence in the host population across seasons and years, this dataset includes monthly leaf-level observations of the disease status of over 100,000 leaves. To quantify transmission and investigate within-host pathogen interactions, we longitudinally surveyed disease status in host individuals of known age at least weekly over two growing seasons. Finally, the dataset includes annual data on infection prevalence of the systemic fungal endophyte Epichloë coenophiala, community-level aboveground plant biomass, and plant community cover. These data can be used for meta-analyses, comparisons, and syntheses across systems as ecologists seek to predict and mechanistically understand seasonal disease epidemics. There are no copyrights on the dataset, and we request that users of this dataset cite this paper in all publications resulting from its use.
  4. Background: Since the 1980s, Pacific Black Brant (Branta bernicla nigricans, hereafter brant) have shifted their winter distribution northward from Mexico to Alaska (approximately 4500 km) with changes in climate. Alongside this shift, the primary breeding population of brant has declined. To understand the population-level implications of the changing migration strategy of brant, it is important to connect movement and demographic data. Our objectives were to calculate migratory connectivity, a measure of spatial and temporal overlap during the non-breeding period, for Arctic and subarctic breeding populations of brant, and to determine if variation in migration strategies affected nesting phenology and nest survival. Methods: We derived a migratory network using light-level geolocator migration tracks from an Arctic site (Colville River Delta) and a subarctic site (Tutakoke River) in Alaska. Using this network, we quantified the migratory connectivity of the two populations during the winter. We also compared nest success rates among brant that used different combinations of winter sites and breeding sites. Results: The two breeding populations were well mixed during the winter, as indicated by a migratory connectivity score close to 0 (−0.06) at the primary wintering sites of Izembek Lagoon, Alaska (n = 11 brant) and Baja California, Mexico (n = 48). However, Arctic birds were more likely to migrate the shorter distance to Izembek (transition probability = 0.24) compared to subarctic birds (transition probability = 0.09). Nest survival for both breeding populations was relatively high (0.88–0.92), and we did not detect an effect of wintering site on nest success the following year. Conclusions: Nest survival did not differ among brant that used different wintering sites, despite a 4500 km difference in migration distances. Our results also suggested that the growing Arctic breeding population is unlikely to compensate for declines in the larger breeding population of brant in the subarctic. However, this study took place in 2011–2014, and wintering at Izembek Lagoon may have greater implications for reproductive success under future climate conditions.
  5. Motivation: SNAPSHOT USA is an annual, multicontributor camera trap survey of mammals across the United States. The growing SNAPSHOT USA dataset is intended for tracking the spatial and temporal responses of mammal populations to changes in land use, land cover and climate. These data will be useful for exploring the drivers of spatial and temporal changes in relative abundance and distribution, as well as the impacts of species interactions on daily activity patterns. Main Types of Variables Contained: SNAPSHOT USA 2019–2023 contains 987,979 records of camera trap image sequence data and 9694 records of camera trap deployment metadata. Spatial Location and Grain: Data were collected across the United States of America in all 50 states, 12 ecoregions and many ecosystems. Time Period and Grain: Data were collected between 1st August and 29th December each year from 2019 to 2023. Major Taxa and Level of Measurement: The dataset includes a wide range of taxa but is primarily focused on medium to large mammals. Software Format: SNAPSHOT USA 2019–2023 comprises two .csv files. The original data can be found within the SNAPSHOT USA Initiative in the Wildlife Insights platform.