Virtual reality is increasingly used to support embodied AI agents, such as robots, which frequently rely on ‘sim-to-real’ learning approaches. At the same time, tools such as large vision-and-language models offer new capabilities that apply to a wide variety of tasks. To understand how such agents can learn from simulated environments, we explore a language model’s ability to recover the type of object represented by a photorealistic 3D model as a function of the 3D perspective from which the model is viewed. We used photogrammetry to create 3D models of commonplace objects and rendered 2D images of these models from a fixed set of 420 virtual camera perspectives. A well-studied image and language model (CLIP) was used to generate text (i.e., prompts) corresponding to these images. Using multiple instances of various object classes, we studied which camera perspectives were most likely to return accurate text categorizations for each class of object.
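As a rough illustration of the zero-shot categorization step described above, the sketch below scores a rendered view against a handful of candidate object prompts with CLIP via the Hugging Face transformers library; the class labels, prompt template, and file layout are placeholders rather than the study's actual configuration.

```python
# Minimal sketch: zero-shot categorization of rendered views with CLIP.
# Class labels, prompt template, and file layout are illustrative only.
from pathlib import Path
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

classes = ["mug", "shoe", "potted plant", "stapler"]      # hypothetical object classes
prompts = [f"a photo of a {c}" for c in classes]

predictions = {}
for path in sorted(Path("renders").glob("view_*.png")):   # one image per virtual camera
    image = Image.open(path).convert("RGB")
    inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image         # shape: (1, num_classes)
    predictions[path.name] = classes[logits.softmax(dim=-1).argmax().item()]

print(predictions)  # compare against ground-truth class per camera perspective
```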
Programmatic 3D Printing of a Revolving Camera Track to Automatically Capture Dense Images for 3D Scanning of Objects
Low-cost 3D scanners and automatic photogrammetry software have brought digitization of objects into 3D models to the level of the consumer. However, existing digitization techniques are either tedious, disruptive to the scanned object, or expensive. We create a novel 3D scanning system using consumer-grade hardware that revolves a camera around the object of interest. Our approach does not disturb the object during capture and allows us to scan delicate objects that can deform under motion, such as potted plants. Our system consists of a Raspberry Pi camera and computer, a stepper motor, a 3D-printed camera track, and control software. Our 3D scanner allows the user to gather image sets for 3D model reconstruction using photogrammetry software with minimal effort. We scale 3D scanning to objects of varying sizes by designing our scanner with programmatic modeling, allowing the user to change the physical dimensions of the scanner without redrawing each part.
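A minimal sketch of the kind of revolve-and-capture loop such a scanner needs is shown below, assuming a stepper driver wired to two GPIO pins and the picamera library; the pin numbers, step counts, and timing are illustrative guesses, not the authors' control software.

```python
# Sketch of a revolve-and-capture loop (not the authors' control software).
# Pin numbers, step counts, and timing are assumptions for illustration.
import time
import RPi.GPIO as GPIO
from picamera import PiCamera

STEP_PIN, DIR_PIN = 17, 27          # hypothetical stepper-driver pins
STEPS_PER_STOP = 50                 # motor steps between consecutive photos
NUM_STOPS = 72                      # photos per full revolution

GPIO.setmode(GPIO.BCM)
GPIO.setup([STEP_PIN, DIR_PIN], GPIO.OUT)
GPIO.output(DIR_PIN, GPIO.HIGH)     # fixed direction of travel along the track

camera = PiCamera()
time.sleep(2)                       # let the sensor settle

for stop in range(NUM_STOPS):
    camera.capture(f"scan_{stop:03d}.jpg")
    for _ in range(STEPS_PER_STOP): # advance the camera to the next viewpoint
        GPIO.output(STEP_PIN, GPIO.HIGH)
        time.sleep(0.002)
        GPIO.output(STEP_PIN, GPIO.LOW)
        time.sleep(0.002)

camera.close()
GPIO.cleanup()
```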
- Award ID(s): 1730183
- PAR ID: 10056162
- Journal Name: Multimedia Modeling (MMM) 2018
- Sponsoring Org: National Science Foundation
More Like this
This protocol describes the process of phenotyping branching coral using the 3D model editing software MeshLab. MeshLab is a free, straightforward program for analyzing 3D models of corals that is especially useful for its ability to import color from Agisoft Metashape models. This protocol outlines the steps used by the Kenkel lab to noninvasively phenotype Acropora cervicornis colonies for total linear extension (TLE), surface area, volume, and volume of interstitial space. We incorporate Agisoft Metashape markers with our Tomahawk scaling system (see Image Capture Protocol) in our workflow, which is useful for scaling and improves model building. Other scaling objects can be used; however, these markers provide a consistent scale and do not obstruct the coral during image capture. MeshLab measurements of TLE have been ground-truthed against field measures of TLE. 3D surface area and volume have not yet been compared to the traditional methods of wax dipping (for surface area) and water displacement (for volume). However, in tests with shapes of known dimensions, e.g., cubes, MeshLab produced accurate measures of 3D surface area and volume compared with calculated values. For directions on photographing coral for 3D photogrammetry, see our Image Capture Protocol. For a walkthrough and scripts to run Agisoft Metashape on the command line, see https://github.com/wyattmillion/Coral3DPhotogram. These protocols, while created for branching coral, can be applied to 3D models of any coral morphology, or indeed any object. Our goal is to provide easy-to-use protocols based on accessible software in the hope of establishing a standardized method for 3D photogrammetry in coral biology. Go to http://www.meshlab.net/#download to download the appropriate software for your operating system. P. Cignoni, M. Callieri, M. Corsini, M. Dellepiane, F. Ganovelli, G. Ranzuglia. MeshLab: an Open-Source Mesh Processing Tool. Sixth Eurographics Italian Chapter Conference, pages 129-136, 2008. DOI: dx.doi.org/10.17504/protocols.io.bgbpjsmn
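For readers who prefer scripting the measurements, the sketch below computes surface area and volume with MeshLab's Python bindings (pymeshlab), which expose the same "Compute Geometric Measures" filter used in the GUI; the file name is hypothetical, and the volume figure is only meaningful for a closed (watertight) mesh.

```python
# Sketch: surface area and volume of a coral mesh via pymeshlab, mirroring the
# GUI measurements described above. File name and units are illustrative; the
# dictionary keys follow pymeshlab's geometric-measures output.
import pymeshlab

ms = pymeshlab.MeshSet()
ms.load_new_mesh("colony_scaled.ply")        # model already scaled (e.g., in Metashape)

measures = ms.get_geometric_measures()       # MeshLab's "Compute Geometric Measures"
print("surface area:", measures["surface_area"])
print("volume:", measures.get("mesh_volume"))  # reported only for watertight meshes
```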
Neural rendering is fuelling a unification of learning, 3D geometry and video understanding that has been waiting for more than two decades. Progress, however, is still hampered by a lack of suitable datasets and benchmarks. To address this gap, we introduce EPIC Fields, an augmentation of EPIC-KITCHENS with 3D camera information. Like other datasets for neural rendering, EPIC Fields removes the complex and expensive step of reconstructing cameras using photogrammetry, and allows researchers to focus on modelling problems. We illustrate the challenges of photogrammetry in egocentric videos of dynamic actions and propose innovations to address them. Compared to other neural rendering datasets, EPIC Fields is better tailored to video understanding because it is paired with labelled action segments and the recent VISOR segment annotations. To further motivate the community, we also evaluate three benchmark tasks in neural rendering and segmenting dynamic objects, with strong baselines that showcase what is not possible today. We also highlight the advantage of geometry in semi-supervised video object segmentation on the VISOR annotations. EPIC Fields reconstructs 96% of the videos in EPIC-KITCHENS, registering 19M frames across 99 hours recorded in 45 kitchens, and is available from: http://epic-kitchens.github.io/epic-fields
Synopsis: Acquiring accurate 3D biological models efficiently and economically is important for morphological data collection and analysis in organismal biology. In recent years, structure-from-motion (SFM) photogrammetry has become increasingly popular in biological research due to its flexibility and relatively low cost. SFM photogrammetry registers 2D images to reconstruct camera positions as the basis for 3D modeling and texturing. However, most studies in organismal biology still rely on commercial software to reconstruct 3D models from photographs, which has impeded the adoption of this workflow in our field due to issues such as cost and affordability. In addition, prior investigations of photogrammetry did not sufficiently assess the geometric accuracy of the reconstructed models. Consequently, this study has two goals. First, we present an affordable and highly flexible SFM photogrammetry pipeline based on the open-source package OpenDroneMap (ODM) and its user interface, WebODM. Second, we assess the geometric accuracy of the photogrammetric models acquired from the ODM pipeline by comparing them to models acquired via microCT scanning, the de facto method for imaging skeletons. Our sample comprised 15 Aplodontia rufa (mountain beaver) skulls. Using models derived from microCT scans of the samples as a reference, our results show that the geometry of the ODM-derived models is sufficiently accurate for gross metric and morphometric analysis: measurement errors are usually around or below 2%, and morphometric analysis captured consistent patterns of shape variation in both modalities. However, subtle but distinct differences between the photogrammetric and microCT-derived 3D models could affect landmark placement, which in turn affects downstream shape analysis, especially when the variance within a sample is relatively small. At a minimum, we strongly advise against combining 3D models derived from these two modalities for geometric morphometric analysis. Our findings are likely indicative of similar issues in other SFM photogrammetry tools, since the underlying pipelines are similar. We recommend that users run a pilot test of geometric accuracy before using photogrammetric models for morphometric analysis. For the research community, we provide detailed guidance on using our pipeline for building 3D models from photographs.
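As a sketch of how such an ODM-based pipeline can be driven from a script, the example below submits a photo set to a NodeODM instance through the official pyodm client; the host, port, processing options, and folder names are assumptions, and the paper's own workflow runs through the WebODM interface.

```python
# Sketch: submitting a skull photo set to an OpenDroneMap processing node via
# pyodm. Host, port, options, and paths are assumptions, not the paper's setup.
from pathlib import Path
from pyodm import Node

node = Node("localhost", 3000)                       # a running NodeODM instance
images = [str(p) for p in Path("skull_photos").glob("*.jpg")]

task = node.create_task(images, {"mesh-size": 300000, "pc-quality": "high"})
task.wait_for_completion()
task.download_assets("./odm_outputs")                # textured mesh, point cloud, report
```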
Full surround 3D imaging for shape acquisition is essential for generating digital replicas of real-world objects. Surrounding an object we seek to scan with a kaleidoscope, that is, a configuration of multiple planar mirrors, produces an image of the object that encodes information from a combinatorially large number of virtual viewpoints. This information is practically useful for the full surround 3D reconstruction of the object, but it cannot be used directly, as we do not know which virtual viewpoint each image pixel corresponds to (the pixel label). We introduce a structured light system that combines a projector and a camera with a kaleidoscope. We then prove that we can accurately determine the labels of projector and camera pixels, for arbitrary kaleidoscope configurations, using the projector-camera epipolar geometry. We use this result to show that our system can serve as a multi-view structured light system with hundreds of virtual projectors and cameras. This makes our system capable of scanning complex shapes precisely and with full coverage. We demonstrate the advantages of the kaleidoscopic structured light system by scanning objects that exhibit a large range of shapes and reflectances.
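The labeling algorithm itself is beyond a short snippet, but the epipolar constraint it rests on is easy to illustrate: given a fundamental matrix relating camera and projector, each camera pixel maps to a single line in the projector image. The sketch below uses OpenCV for this mapping; the fundamental matrix and pixel values are placeholders, not values from the paper.

```python
# Illustration of the projector-camera epipolar constraint used for labeling:
# a camera pixel maps to an epipolar line in the projector image. The fundamental
# matrix F and the pixel below are placeholders, not values from the paper.
import numpy as np
import cv2

F = np.eye(3)                                           # placeholder fundamental matrix
cam_pixel = np.array([[[320.0, 240.0]]], dtype=np.float32)  # one camera pixel, shape (1, 1, 2)

# Line a*x + b*y + c = 0 in projector image coordinates.
a, b, c = cv2.computeCorrespondEpilines(cam_pixel, 1, F).reshape(3)
print(f"projector epipolar line: {a:.3f}x + {b:.3f}y + {c:.3f} = 0")
```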