Event cameras capture the world at high time resolution and with minimal bandwidth requirements. However, event streams, which only encode changes in brightness, do not contain sufficient scene information to support a wide variety of downstream tasks. In this work, we design generalized event cameras that inherently preserve scene intensity in a bandwidth-efficient manner. We generalize event cameras in terms of when an event is generated and what information is transmitted. To implement our designs, we turn to single-photon sensors that provide digital access to individual photon detections; this modality gives us the flexibility to realize a rich space of generalized event cameras. Our single-photon event cameras are capable of high-speed, high-fidelity imaging at low readout rates. Consequently, these event cameras can support plug-and-play downstream inference, without capturing new event datasets or designing specialized event-vision models. As a practical implication, our designs, which involve lightweight and near-sensor-compatible computations, provide a way to use single-photon sensors without exorbitant bandwidth costs.
more »
« less
This content will become publicly available on June 10, 2026
Event fields: Capturing light fields at high speed, resolution, and dynamic range
Event cameras, which feature pixels that independently respond to changes in brightness, are becoming increasingly popular in high- speed applications due to their lower latency, reduced bandwidth requirements, and enhanced dynamic range compared to traditional frame- based cameras. Numerous imaging and vision techniques have leveraged event cameras for high- speed scene understanding by capturing high- framerate, high- dynamic range videos, primarily utilizing the temporal advantages inherent to event cameras. Additionally, imaging and vision techniques have utilized the light field—a complementary dimension to temporal information—for enhanced scene understanding.In this work, we propose "Event Fields", a new approach that utilizes innovative optical designs for event cameras to capture light fields at high speed. We develop the underlying mathematical framework for Event Fields and introduce two foundational frameworks to capture them practically: spatial multiplexing to capture temporal derivatives and temporal multiplexing to capture angular derivatives. To realize these, we design two complementary optical setups— one using a kaleidoscope for spatial multiplexing and another using a galvanometer for temporal multiplexing. We evaluate the performance of both designs using a custom-built simulator and real hardware prototypes, showcasing their distinct benefits. Our event fields unlock the full advantages of typical light fields—like post- capture refocusing and depth estimation—now supercharged for high- speed and high- dynamic range scenes. This novel light- sensing paradigm opens doors to new applications in photography, robotics, and AR/VR, and presents fresh challenges in rendering and machine learning.
more »
« less
- PAR ID:
- 10647882
- Publisher / Repository:
- IEEE
- Date Published:
- Page Range / eLocation ID:
- 26910 to 26920
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Event and frame cameras capture the complemen-tary spatial and temporal details of a scene providing an accuracy vs. latency trade-off. Fusing these processing modalities using convolutional (CNN) and spiking neural networks (SNN) respectively has been shown for target tracking. We present our heterogeneous RRAM compute-in-memory (CIM) and SRAM compute-near-memory (CNM) SoC for simultaneous processing of CNN and SNN. We will show the advantage of using fused vision over frame-only vision and demonstrate python programmable data streaming. The visitors will be able to see the processing-dependent dynamic power gating of non-volatile RRAM and in-memory error correction capability.more » « less
-
Abstract Multidimensional photography can capture optical fields beyond the capability of conventional image sensors that measure only two-dimensional (2D) spatial distribution of light. By mapping a high-dimensional datacube of incident light onto a 2D image sensor, multidimensional photography resolves the scene along with other information dimensions, such as wavelength and time. However, the application of current multidimensional imagers is fundamentally restricted by their static optical architectures and measurement schemes—the mapping relation between the light datacube voxels and image sensor pixels is fixed. To overcome this limitation, we propose tunable multidimensional photography through active optical mapping. A high-resolution spatial light modulator, referred to as an active optical mapper, permutes and maps the light datacube voxels onto sensor pixels in an arbitrary and programmed manner. The resultant system can readily adapt the acquisition scheme to the scene, thereby maximising the measurement flexibility. Through active optical mapping, we demonstrate our approach in two niche implementations: hyperspectral imaging and ultrafast imaging.more » « less
-
Evolution has honed predatory skills in the natural world where localizing and intercepting fast-moving prey is required. The current generation of robotic systems mimics these biological systems using deep learning. High-speed processing of the camera frames using convolutional neural networks (CNN) (frame pipeline) on such constrained aerial edge-robots gets resource-limited. Adding more compute resources also eventually limits the throughput at the frame rate of the camera as frame-only traditional systems fail to capture the detailed temporal dynamics of the environment. Bio-inspired event cameras and spiking neural networks (SNN) provide an asynchronous sensor-processor pair (event pipeline) capturing the continuous temporal details of the scene for high-speed but lag in terms of accuracy. In this work, we propose a target localization system combining event-camera and SNN-based high-speed target estimation and frame-based camera and CNN-driven reliable object detection by fusing complementary spatio-temporal prowess of event and frame pipelines. One of our main contributions involves the design of an SNN filter that borrows from the neural mechanism for ego-motion cancelation in houseflies. It fuses the vestibular sensors with the vision to cancel the activity corresponding to the predator's self-motion. We also integrate the neuro-inspired multi-pipeline processing with task-optimized multi-neuronal pathway structure in primates and insects. The system is validated to outperform CNN-only processing using prey-predator drone simulations in realistic 3D virtual environments. The system is then demonstrated in a real-world multi-drone set-up with emulated event data. Subsequently, we use recorded actual sensory data from multi-camera and inertial measurement unit (IMU) assembly to show desired working while tolerating the realistic noise in vision and IMU sensors. We analyze the design space to identify optimal parameters for spiking neurons, CNN models, and for checking their effect on the performance metrics of the fused system. Finally, we map the throughput controlling SNN and fusion network on edge-compatible Zynq-7000 FPGA to show a potential 264 outputs per second even at constrained resource availability. This work may open new research directions by coupling multiple sensing and processing modalities inspired by discoveries in neuroscience to break fundamental trade-offs in frame-based computer vision 1 .more » « less
-
Abstract—Accurately capturing dynamic scenes with wideranging motion and light intensity is crucial for many vision applications. However, acquiring high-speed high dynamic range (HDR) video is challenging because the camera’s frame rate restricts its dynamic range. Existing methods sacrifice speed to acquire multi-exposure frames. Yet, misaligned motion in these frames can still pose complications for HDR fusion algorithms, resulting in artifacts. Instead of frame-based exposures, we sample the videos using individual pixels at varying exposures and phase offsets. Implemented on a monochrome pixel-wise programmable image sensor, our sampling pattern captures fast motion at a high dynamic range. We then transform pixel-wise outputs into an HDR video using end-to-end learned weights from deep neural networks, achieving high spatiotemporal resolution with minimized motion blurring. We demonstrate aliasing-free HDR video acquisition at 1000 FPS, resolving fast motion under low-light conditions and against bright backgrounds — both challenging conditions for conventional cameras. By combining the versatility of pixel-wise sampling patterns with the strength of deep neural networks at decoding complex scenes, our method greatly enhances the vision system’s adaptability and performance in dynamic conditions. Index Terms—High-dynamic-range video, high-speed imaging, CMOS image sensors, programmable sensors, deep learning, convolutional neural networks.more » « less
An official website of the United States government
