This content will become publicly available on June 22, 2026

Title: AI-Driven Mussel Behavior Monitoring Using An Accessible 3D Imaging System
Freshwater mussels are essential components of aquatic ecosystems and help reduce water pollution. As natural bio-filters, they remove pollutants such as heavy metals, providing a sustainable method for water decontamination. This project enables the use of Artificial Intelligence (AI) to monitor mussel behavior, particularly gaping activity, so that mussels can serve as bio-indicators for early detection of water contamination. In this paper, we employ advanced 3D reconstruction techniques to create detailed models of mussels and improve the accuracy of AI-based analysis. Specifically, we use a state-of-the-art 3D reconstruction tool, Neural Radiance Fields (NeRF), to build 3D models of mussel valve configurations and behavioral patterns. NeRF enables 3D reconstruction of scenes and objects from a sparse set of 2D images. To capture these images, we developed a data collection system capable of imaging mussels from multiple viewpoints. The system featured a turntable made of foam board with markers around the edges and a designated space in the center for mounting the mussels. The turntable was attached to a servo motor controlled by an ESP32 microcontroller. It rotated in increments of a few degrees, with the ESP32 camera capturing an image at each step. The images, along with angle information and timestamps, were stored on a Secure Digital (SD) memory card. Several components, such as the camera holder and turntable base, were 3D printed. The images were used to train a NeRF model with the Python-based Nerfstudio framework, and the resulting 3D models were viewed via the Nerfstudio API. The setup was designed to be user-friendly, making it well suited to educational outreach: secondary-school students can replicate the system and produce 3D reconstructions of objects of their own choosing. We validated the accessibility and impact of this platform in a STEM education summer program. A team of high school students from the Juntos Summer Academy at NC State University worked with the platform, gaining hands-on experience in embedded hardware development, basic machine learning principles, and 3D reconstruction from 2D images. We also report their feedback on the activity.
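As a rough illustration of the capture-and-train loop described above, the sketch below drives a turntable over a serial link and then invokes Nerfstudio's command-line tools. This is a minimal sketch, not the authors' firmware: the one-command serial protocol ("STEP"), port name, step size, and directory layout are assumptions for illustration; only the ns-process-data and ns-train commands come from Nerfstudio itself.

```python
# Illustrative sketch only: the serial command protocol ("STEP"), port name,
# and directory layout are hypothetical; adapt to the actual ESP32 firmware.
import subprocess
import time
from pathlib import Path

import serial  # pyserial

STEP_DEG = 5                 # rotate a few degrees per capture
PORT = "/dev/ttyUSB0"        # assumed serial port for the ESP32
RAW_DIR = Path("captures")   # images copied from the SD card end up here

def capture_sweep() -> None:
    """Ask the ESP32 to step the turntable through a full revolution.

    The firmware is assumed to rotate by STEP_DEG, take a photo, and save
    it to the SD card with the angle and a timestamp in the filename.
    """
    with serial.Serial(PORT, 115200, timeout=5) as link:
        for _angle in range(0, 360, STEP_DEG):
            link.write(b"STEP\n")   # hypothetical one-command protocol
            link.readline()         # wait for an acknowledgement line
            time.sleep(0.5)         # let vibrations settle before capture

def train_nerf() -> None:
    """Run the standard Nerfstudio pipeline on the captured images."""
    # Estimate camera poses and organize the dataset for training.
    subprocess.run(
        ["ns-process-data", "images", "--data", str(RAW_DIR),
         "--output-dir", "processed"],
        check=True,
    )
    # Train the default nerfacto model; progress appears in the web viewer.
    subprocess.run(["ns-train", "nerfacto", "--data", "processed"], check=True)

if __name__ == "__main__":
    capture_sweep()
    train_nerf()
```

In a classroom replication, the images might simply be copied off the SD card between the two steps rather than streamed, matching the abstract's description of on-device storage.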
Award ID(s):
2319389
PAR ID:
10610354
Author(s) / Creator(s):
Publisher / Repository:
2025 ASEE Annual Conference & Exposition, Montreal, Canada.
Date Published:
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like This
  1. Virtual reality is progressively more widely used to support embodied AI agents, such as robots, which frequently engage in 'sim-to-real' learning approaches. At the same time, tools such as large vision-and-language models offer new capabilities that tie into a wide variety of tasks. To understand how such agents can learn from simulated environments, we explore a language model's ability to recover the type of object represented by a photorealistic 3D model as a function of the 3D perspective from which the model is viewed. We used photogrammetry to create 3D models of commonplace objects and rendered 2D images of these models from a fixed set of 420 virtual camera perspectives. A well-studied image-and-language model (CLIP) was used to generate text (i.e., prompts) corresponding to these images. Using multiple instances of various object classes, we studied which camera perspectives were most likely to return accurate text categorizations for each class of object.
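The perspective study in (1) lends itself to a short zero-shot classification sketch. The snippet below is a minimal illustration using the openly available CLIP weights via Hugging Face transformers, scoring one rendered view against candidate object labels; the label list and image filename are placeholders, not artifacts from the paper.

```python
# Minimal zero-shot categorization sketch in the spirit of item 1: score one
# rendered view of a 3D model against candidate labels with CLIP.
# The label list and image path are illustrative placeholders.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

labels = ["a photo of a mug", "a photo of a shoe", "a photo of a chair"]
image = Image.open("render_azimuth030_elev15.png")  # one of the 420 viewpoints

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
logits = model(**inputs).logits_per_image     # similarity of image to each label
probs = logits.softmax(dim=1).squeeze().tolist()

for label, p in sorted(zip(labels, probs), key=lambda t: -t[1]):
    print(f"{p:.3f}  {label}")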
  2. Recent advances in Neural Radiance Field (NeRF)-based methods have enabled high-fidelity novel view synthesis for video with dynamic elements. However, these methods often require expensive hardware, take days to process a second-long video, and do not scale well to longer videos. We create an end-to-end pipeline for producing dynamic 3D video from a monocular video that can be run on consumer hardware in minutes per second of footage, not days. Our pipeline handles the estimation of the camera parameters, depth maps, 3D reconstruction of dynamic foreground and static background elements, and the rendering of the 3D video on a computer or VR headset. We use a state-of-the-art visual transformer model to estimate depth maps, which we use to scale COLMAP poses and enable RGB-D fusion with the estimated depth data. In our preliminary experiments, we rendered the output in a VR headset and visually compared the method against ground-truth datasets and state-of-the-art NeRF-based methods.
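Monocular depth estimation with a visual transformer, the key ingredient in (2), can be sketched in a few lines. The following uses the publicly available MiDaS DPT model from torch.hub as a stand-in for whichever model the paper used; the frame path is a placeholder, and the pose-scaling step is only indicated by a comment.

```python
# Sketch of transformer-based monocular depth estimation as in item 2,
# using the public MiDaS DPT_Large model from torch.hub. The frame path
# is a placeholder; COLMAP pose scaling is only hinted at in comments.
import numpy as np
import torch
from PIL import Image

model = torch.hub.load("intel-isl/MiDaS", "DPT_Large")
model.eval()
transform = torch.hub.load("intel-isl/MiDaS", "transforms").dpt_transform

frame = np.array(Image.open("frame_0001.png").convert("RGB"))
batch = transform(frame)                      # resize + normalize for DPT

with torch.no_grad():
    prediction = model(batch)                 # inverse-depth map, (1, H', W')
    depth = torch.nn.functional.interpolate(
        prediction.unsqueeze(1),
        size=frame.shape[:2],                 # back to original resolution
        mode="bicubic",
        align_corners=False,
    ).squeeze()

# A pipeline like the one described would align this relative depth with
# COLMAP's sparse points to recover scale before RGB-D fusion.
print(depth.shape, float(depth.min()), float(depth.max()))
```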
  3. Internet image collections containing photos captured by crowds of photographers show promise for enabling digital exploration of large-scale tourist landmarks. However, prior works focus primarily on geometric reconstruction and visualization, neglecting the key role of language in providing a semantic interface for navigation and fine-grained understanding. In more constrained 3D domains, recent methods have leveraged modern vision-and-language models as a strong prior of 2D visual semantics. While these models display an excellent understanding of broad visual semantics, they struggle with unconstrained photo collections depicting such tourist landmarks, as they lack expert knowledge of the architectural domain and fail to exploit the geometric consistency of images capturing multiple views of such scenes. In this work, we present a localization system that connects neural representations of scenes depicting large-scale landmarks with text describing a semantic region within the scene, by harnessing the power of state-of-the-art vision-and-language models with adaptations for understanding landmark scene semantics. To bolster such models with fine-grained knowledge, we leverage large-scale Internet data containing images of similar landmarks along with weakly-related textual information. Our approach is built upon the premise that images physically grounded in space can provide a powerful supervision signal for localizing new concepts, whose semantics may be unlocked from Internet textual metadata with large language models. We use correspondences between views of scenes to bootstrap spatial understanding of these semantics, providing guidance for 3D-compatible segmentation that ultimately lifts to a volumetric scene representation. To evaluate our method, we present a new benchmark dataset containing large-scale scenes with ground-truth segmentations for multiple semantic concepts. Our results show that HaLo-NeRF can accurately localize a variety of semantic concepts related to architectural landmarks, surpassing the results of other 3D models as well as strong 2D segmentation baselines. Our code and data are publicly available at https://tau-vailab.github.io/HaLo-NeRF/.
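A simplified 2D analogue of the text-to-region localization in (3) can be built from the same vision-and-language machinery: split a landmark photo into tiles and score each tile against a textual concept. The toy sketch below reuses off-the-shelf CLIP rather than the paper's domain-adapted model, and the photo, prompt, and tile size are all placeholders.

```python
# Toy 2D analogue of text-driven region localization as in item 3: score
# image tiles against a text concept with CLIP. This reuses off-the-shelf
# CLIP, not the paper's adapted model; all inputs are placeholders.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

photo = Image.open("cathedral.jpg")            # placeholder landmark photo
prompt = "the rose window of the cathedral"    # placeholder concept
tile = 224                                     # tile size in pixels

tiles, boxes = [], []
for top in range(0, photo.height - tile + 1, tile):
    for left in range(0, photo.width - tile + 1, tile):
        boxes.append((left, top))
        tiles.append(photo.crop((left, top, left + tile, top + tile)))

inputs = processor(text=[prompt], images=tiles, return_tensors="pt", padding=True)
with torch.no_grad():
    scores = model(**inputs).logits_per_text.squeeze(0)  # one score per tile

best = int(scores.argmax())
print(f"best-matching tile at (left, top) = {boxes[best]}")
```

The full method goes much further, lifting such 2D evidence into a volumetric scene representation via multi-view correspondences; this sketch only shows the single-image building block.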
  4. Neural Radiance Fields (NeRF) have become an increasingly popular representation for capturing the high-quality appearance and shape of scenes and objects. However, learning generalizable NeRF priors over categories of scenes or objects has been challenging due to the high dimensionality of the network weight space. To address the limitations of existing work on generalization and multi-view consistency, and to improve quality, we propose HyP-NeRF, a latent conditioning method for learning generalizable category-level NeRF priors using hypernetworks. Rather than using hypernetworks to estimate only the weights of a NeRF, we estimate both the weights and the multi-resolution hash encodings, resulting in significant quality gains. To improve quality even further, we incorporate a denoise-and-finetune strategy that denoises images rendered from NeRFs estimated by the hypernetwork and fine-tunes the model on them while retaining multi-view consistency. These improvements enable us to use HyP-NeRF as a generalizable prior for multiple downstream tasks, including NeRF reconstruction from single-view or cluttered scenes and text-to-NeRF. We provide qualitative comparisons and evaluate HyP-NeRF on three tasks: generalization, compression, and retrieval, demonstrating state-of-the-art results.
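The core idea in (4), a hypernetwork that emits the weights of another network from a latent code, can be sketched compactly in PyTorch. Everything below (layer sizes, the tiny target MLP) is an illustrative stand-in; HyP-NeRF additionally predicts multi-resolution hash encodings, which this sketch omits.

```python
# Compact sketch of the hypernetwork idea behind item 4: map a latent code
# to the weights of a small target MLP, applied functionally per sample.
# All sizes are illustrative; the real method also predicts hash encodings.
import torch
import torch.nn as nn
import torch.nn.functional as F

LATENT, IN, HIDDEN, OUT = 64, 3, 32, 4   # e.g. xyz -> (rgb, density)

class HyperNet(nn.Module):
    def __init__(self):
        super().__init__()
        # One linear head per weight/bias tensor of the target MLP.
        self.w1 = nn.Linear(LATENT, HIDDEN * IN)
        self.b1 = nn.Linear(LATENT, HIDDEN)
        self.w2 = nn.Linear(LATENT, OUT * HIDDEN)
        self.b2 = nn.Linear(LATENT, OUT)

    def forward(self, z, x):
        # Predict the target network's parameters from the latent code z...
        w1 = self.w1(z).view(HIDDEN, IN)
        w2 = self.w2(z).view(OUT, HIDDEN)
        # ...then run the target MLP on the query points x with those weights.
        h = F.relu(F.linear(x, w1, self.b1(z)))
        return F.linear(h, w2, self.b2(z))

hyper = HyperNet()
z = torch.randn(LATENT)          # latent code identifying one object instance
x = torch.rand(128, IN)          # a batch of 3D query points
print(hyper(z, x).shape)         # torch.Size([128, 4])
```

Training such a model amounts to optimizing the hypernetwork (and per-instance latents) so that the emitted target networks reproduce each instance's rendered views.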