Abstract: Trait-based approaches are revolutionizing our understanding of high-diversity ecosystems by providing insights into the principles underlying key ecological processes, such as community assembly, species distribution, resilience, and the relationship between biodiversity and ecosystem functioning. In 2016, the Coral Trait Database advanced coral reef science by centralizing trait information for stony corals (i.e., Subphylum Anthozoa, Class Hexacorallia, Order Scleractinia). However, the absence of trait data for soft corals, gorgonians, and sea pens (i.e., Class Octocorallia) limits our understanding of ecosystems where these organisms are significant members and play pivotal roles. To address this gap, we introduce the Octocoral Trait Database, a global, open-source database of curated trait data for octocorals. This database houses species- and individual-level data, complemented by contextual information that provides a relevant framework for analyses. The inaugural dataset, OctocoralTraits v2.2, contains over 97,500 global trait observations across 98 traits and over 3,500 species. The database aims to evolve into a steadily growing, community-led resource that advances future marine science, with a particular emphasis on coral reef research.
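As a rough illustration of how a species-level trait table like this might be queried, here is a minimal pandas sketch. The file name and the `trait_name` and `species` columns are illustrative assumptions, not the database's actual schema or API.

```python
# A minimal pandas sketch (not an official API) of querying a hypothetical
# CSV export of OctocoralTraits v2.2. File and column names are assumptions.
import pandas as pd

df = pd.read_csv("octocoral_traits_v2_2.csv")

# Observations per trait, and how many species each trait covers.
per_trait = df.groupby("trait_name").agg(
    n_observations=("species", "size"),
    n_species=("species", "nunique"),
)
print(per_trait.sort_values("n_observations", ascending=False).head(10))
```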
This content will become publicly available on June 15, 2026
Fish-Vista: A Multi-Purpose Dataset for Understanding & Identification of Traits from Images
The availability of large datasets of organism images, combined with advances in artificial intelligence (AI), has significantly enhanced the study of organisms through images, unveiling biodiversity patterns and macro-evolutionary trends. However, existing machine learning (ML)-ready organism datasets have several limitations. First, these datasets often focus on species classification only, overlooking tasks involving visual traits of organisms. Second, they lack detailed visual trait annotations, like pixel-level segmentation, that are crucial for in-depth biological studies. Third, these datasets predominantly feature organisms in their natural habitats, posing challenges for aquatic species like fish, where underwater images often suffer from poor visual clarity, obscuring critical biological traits. This gap hampers the study of aquatic biodiversity patterns, which is necessary for assessing climate change impacts and for evolutionary research on aquatic species morphology. To address this, we introduce the Fish-Visual Trait Analysis (Fish-Vista) dataset—a large, annotated collection of about 80K fish images spanning 3000 different species, supporting several challenging and biologically relevant tasks including species classification, trait identification, and trait segmentation. These images have been curated through a sophisticated data processing pipeline applied to a cumulative set of images obtained from various museum collections. Fish-Vista ensures that the visual traits in each image are clearly visible, and it provides fine-grained labels of the various visual traits present in each image. It also offers pixel-level annotations of 9 different traits for about 7000 fish images, facilitating additional trait segmentation and localization tasks. The ultimate goal of Fish-Vista is to provide a clean, carefully curated, high-resolution dataset that can serve as a foundation for accelerating biological discoveries using advances in AI. Finally, we provide a comprehensive analysis of state-of-the-art deep learning techniques on Fish-Vista.
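To make the trait-identification task concrete, the sketch below shows how a multi-label loader over such a dataset might look in PyTorch. The file layout assumed here (an image directory plus a CSV with a `filename` column and one 0/1 column per trait) is an illustration, not Fish-Vista's actual release format.

```python
# A minimal PyTorch sketch of a multi-label trait-identification loader.
# The CSV layout and column names are assumptions for illustration only.
import pandas as pd
import torch
from torch.utils.data import Dataset
from torchvision.io import read_image

class FishTraitDataset(Dataset):
    def __init__(self, csv_path, image_dir, trait_columns):
        self.table = pd.read_csv(csv_path)
        self.image_dir = image_dir
        self.trait_columns = trait_columns  # e.g. ["eye", "dorsal_fin", "barbel"]

    def __len__(self):
        return len(self.table)

    def __getitem__(self, idx):
        row = self.table.iloc[idx]
        # Read the image as a CHW tensor and scale to [0, 1].
        image = read_image(f"{self.image_dir}/{row['filename']}").float() / 255.0
        # Multi-label target: one presence flag per annotated trait.
        target = torch.tensor(row[self.trait_columns].to_numpy(dtype="float32"))
        return image, target
```

A loader like this pairs naturally with a binary-cross-entropy objective, since each trait is an independent presence/absence label rather than a mutually exclusive class.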
- Award ID(s): 2118240
- PAR ID: 10611524
- Publisher / Repository: CVPR
- Date Published:
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- The Segment Anything Model (SAM) is a recently proposed prompt-based segmentation model that takes a generic zero-shot approach. With this zero-shot segmentation capacity, SAM achieves impressive flexibility and precision on various segmentation tasks. However, the current pipeline requires manual prompts during the inference stage, which is still resource-intensive for biomedical image segmentation. In this paper, instead of using prompts during the inference stage, we introduce a pipeline, called all-in-SAM, that utilizes SAM through the entire AI development workflow (from annotation generation to model finetuning) without requiring manual prompts at inference. Specifically, SAM is first employed to generate pixel-level annotations from weak prompts (e.g., points, bounding boxes). Then, the pixel-level annotations are used to finetune the SAM segmentation model rather than training from scratch. Our experimental results reveal two key findings: 1) the proposed pipeline surpasses state-of-the-art (SOTA) methods on a nuclei segmentation task on the public MoNuSeg dataset, and 2) using weak and few annotations for SAM finetuning achieves competitive performance compared to using strong pixel-wise annotated data.
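As a concrete illustration of the annotation-generation step this abstract describes, the sketch below turns a single weak point prompt into a pixel-level mask using the public `segment_anything` package. The checkpoint path, the stand-in image, and the click location are placeholders; this is not the authors' full all-in-SAM pipeline.

```python
# A sketch of generating a pixel-level mask from a weak point prompt with SAM.
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

image = np.zeros((512, 512, 3), dtype=np.uint8)  # stand-in for a real H x W x 3 RGB image
predictor.set_image(image)

# One foreground click (label 1) on the object of interest.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[256, 256]]),
    point_labels=np.array([1]),
    multimask_output=True,
)
best_mask = masks[np.argmax(scores)]  # boolean H x W array, usable as a pixel-level label
```

Masks produced this way from weak prompts are what the pipeline then uses as training labels for finetuning, in place of manually drawn pixel-wise annotations.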
- We introduce VISOR, a new dataset of pixel annotations and a benchmark suite for segmenting hands and active objects in egocentric video. VISOR annotates videos from EPIC-KITCHENS, which comes with a new set of challenges not encountered in current video segmentation datasets. Specifically, we need to ensure both short- and long-term consistency of pixel-level annotations as objects undergo transformative interactions, e.g. an onion is peeled, diced and cooked - where we aim to obtain accurate pixel-level annotations of the peel, onion pieces, chopping board, knife, pan, as well as the acting hands. VISOR introduces an annotation pipeline, AI-powered in parts, for scalability and quality. In total, we publicly release 272K manual semantic masks of 257 object classes, 9.9M interpolated dense masks, and 67K hand-object relations, covering 36 hours of 179 untrimmed videos. Along with the annotations, we introduce three challenges in video object segmentation, interaction understanding, and long-term reasoning. For data, code, and leaderboards: http://epic-kitchens.github.io/VISOR
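A common region-similarity score for segmentation benchmarks like this is the Jaccard index (mask IoU); the minimal sketch below computes it for a single frame. This is a generic illustration, not VISOR's official evaluation code, whose exact protocol is defined by its own leaderboards.

```python
# A minimal sketch of the Jaccard (IoU) region-similarity score for masks.
import numpy as np

def jaccard(pred: np.ndarray, gt: np.ndarray) -> float:
    """IoU between two boolean H x W masks; defined as 1.0 if both are empty."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0
    return np.logical_and(pred, gt).sum() / union
```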
- Abstract: Trait-based frameworks are increasingly used for predicting how ecological communities respond to ongoing global change. As species range shifts result in novel encounters between predators and prey, identifying prey 'guilds', based on a suite of shared traits, can distill complex species interactions and aid in predicting food web dynamics. To support advances in trait-based research in open-ocean systems, we present the Pelagic Species Trait Database, an extensive resource documenting functional traits of 529 pelagic fish and invertebrate species in a single, open-source repository. We synthesized literature sources and online resources, conducted morphometric analysis of species images, and performed laboratory analyses of trawl-captured specimens to collate traits describing 1) habitat use and behavior, 2) morphology, 3) nutritional quality, and 4) population status. Species in the dataset primarily inhabit the California Current system and the broader NE Pacific Ocean, but the dataset also includes pelagic species known to be consumed by top ocean predators in other ocean basins. The aim of this dataset is to enhance the use of trait-based approaches in marine ecosystems and for predator populations worldwide.
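As a hedged sketch of the guild idea this abstract describes, one could cluster species on standardized numeric traits, as below. The file name and trait columns are hypothetical placeholders, not the database's actual schema, and the choice of four clusters is arbitrary for illustration.

```python
# A sketch of deriving prey 'guilds' by clustering standardized traits.
# File and column names are hypothetical, not the real database schema.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

traits = pd.read_csv("pelagic_species_traits.csv")  # hypothetical export
numeric = traits[["body_length_cm", "energy_density_kj_g", "depth_range_m"]]

# Standardize traits so no single unit dominates the distance metric.
scaled = StandardScaler().fit_transform(numeric.fillna(numeric.mean()))
traits["guild"] = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(scaled)
```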
- Abstract: Premise: Quantitative plant traits play a crucial role in biological research. However, traditional methods for measuring plant morphology are time consuming and have limited scalability. We present LeafMachine2, a suite of modular machine learning and computer vision tools that can automatically extract a base set of leaf traits from digital plant data sets. Methods: LeafMachine2 was trained on 494,766 manually prepared annotations from 5648 herbarium images obtained from 288 institutions and representing 2663 species; it employs a set of plant component detection and segmentation algorithms to isolate individual leaves, petioles, fruits, flowers, wood samples, buds, and roots. Our landmarking network automatically identifies and measures nine pseudo-landmarks that occur on most broadleaf taxa. Text labels and barcodes are automatically identified by an archival component detector and are prepared for optical character recognition methods or natural language processing algorithms. Results: LeafMachine2 can extract trait data from at least 245 angiosperm families and calculate pixel-to-metric conversion factors for 26 commonly used ruler types. Discussion: LeafMachine2 is a highly efficient tool for generating large quantities of plant trait data, even from occluded or overlapping leaves, field images, and non-archival data sets. Our project, along with similar initiatives, has made significant progress in removing the bottleneck in plant trait data acquisition from herbarium specimens and shifted the focus toward the crucial task of data revision and quality control.
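The pixel-to-metric conversion mentioned in the results reduces to a simple ratio once a ruler detector has estimated pixels-per-centimeter for an image; the sketch below illustrates that step. The function name and values are illustrative, not LeafMachine2's actual API.

```python
# A minimal sketch of the pixel-to-metric conversion step: a measurement in
# pixels divided by the image's estimated pixels-per-centimeter scale factor.
def to_centimeters(length_px: float, pixels_per_cm: float) -> float:
    """Convert a pixel-space measurement to centimeters."""
    return length_px / pixels_per_cm

# e.g., a 1240-pixel leaf blade at 118.5 px/cm is about 10.46 cm long.
print(round(to_centimeters(1240, 118.5), 2))  # 10.46
```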