

Title: EyeCoD: Eye Tracking System Acceleration via FlatCam-based Algorithm & Accelerator Co-design
Eye tracking has become an essential human-machine interaction modality for providing immersive experiences in numerous virtual and augmented reality (VR/AR) applications that demand high throughput (e.g., 240 FPS), small form factor, and enhanced visual privacy. However, existing eye tracking systems are still limited by their: (1) large form factor, largely due to the adopted bulky lens-based cameras; (2) high communication cost required between the camera and backend processor; and (3) potential visual privacy concerns, thus prohibiting their more extensive applications. To this end, we propose, develop, and validate a lensless FlatCam-based eye tracking algorithm and accelerator co-design framework dubbed EyeCoD to enable eye tracking systems with a much reduced form factor and boosted system efficiency without sacrificing tracking accuracy, paving the way for next-generation eye tracking solutions. On the system level, we advocate the use of lensless FlatCams instead of lens-based cameras to facilitate the small form-factor need of mobile eye tracking systems, which also leaves room for a dedicated sensing-processor co-design to reduce the required camera-processor communication latency. On the algorithm level, EyeCoD integrates a predict-then-focus pipeline that first predicts the region-of-interest (ROI) via segmentation and then focuses only on the ROI to estimate gaze directions, greatly reducing redundant computations and data movements. On the hardware level, we further develop a dedicated accelerator that (1) integrates a novel workload orchestration between the aforementioned segmentation and gaze estimation models, (2) leverages intra-channel reuse opportunities for depth-wise layers, (3) utilizes input feature-wise partition to save activation memory size, and (4) develops a sequential-write-parallel-read input buffer to alleviate the bandwidth requirement on the activation global buffer. On-silicon measurement and extensive experiments validate that EyeCoD consistently reduces both the communication and computation costs, leading to overall system speedups of 10.95×, 3.21×, and 12.85× over general computing platforms including CPUs and GPUs, and a prior-art eye tracking processor called CIS-GEP, respectively, while maintaining tracking accuracy. Code is available at https://github.com/RICE-EIC/EyeCoD.
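To make the predict-then-focus idea concrete, below is a minimal PyTorch sketch of how a segmentation-derived ROI can gate the gaze-estimation model. The module sizes, the 0.5 mask threshold, and the 64×64 crop are illustrative assumptions, not the authors' actual EyeCoD models.

```python
# Hedged sketch of a predict-then-focus eye tracking pipeline: a lightweight
# segmentation net predicts an eye-region mask, and the gaze regressor then
# runs only on the cropped ROI. All layer sizes and the crop heuristic are
# illustrative assumptions, not the EyeCoD models themselves.
import torch
import torch.nn as nn


class TinySegNet(nn.Module):
    """Stand-in ROI/segmentation model (assumed architecture)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
            nn.Conv2d(8, 1, 3, padding=1), nn.Sigmoid(),  # per-pixel eye-region mask
        )

    def forward(self, x):
        return self.net(x)


class TinyGazeNet(nn.Module):
    """Stand-in gaze regressor operating only on the cropped ROI."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, 2),  # (yaw, pitch) gaze direction
        )

    def forward(self, roi_crop):
        return self.backbone(roi_crop)


def predict_then_focus(frame, seg_net, gaze_net, roi=64):
    """frame: (1, 1, H, W) image; only the predicted ROI reaches the gaze model."""
    mask = seg_net(frame)                                   # coarse eye-region mask
    ys, xs = torch.nonzero(mask[0, 0] > 0.5, as_tuple=True)
    if len(xs) == 0:                                        # fall back to the image centre
        cy, cx = frame.shape[-2] // 2, frame.shape[-1] // 2
    else:
        cy, cx = int(ys.float().mean()), int(xs.float().mean())
    # Clamp the crop so it stays inside the frame.
    y0 = max(0, min(cy - roi // 2, frame.shape[-2] - roi))
    x0 = max(0, min(cx - roi // 2, frame.shape[-1] - roi))
    crop = frame[..., y0:y0 + roi, x0:x0 + roi]             # only this crop is processed further
    return gaze_net(crop)


if __name__ == "__main__":
    frame = torch.rand(1, 1, 128, 128)
    print(predict_then_focus(frame, TinySegNet(), TinyGazeNet()))
```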
Award ID(s):
1934767 1937592
NSF-PAR ID:
10357328
Author(s) / Creator(s):
Date Published:
Journal Name:
ISCA '22: Proceedings of the 49th Annual International Symposium on Computer Architecture
Page Range / eLocation ID:
610 to 622
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We present a first-of-its-kind ultra-compact intelligent camera system, dubbed i-FlatCam, comprising a lensless camera and a computational (Comp.) chip. It highlights (1) a predict-then-focus eye tracking pipeline for boosted efficiency without compromising accuracy, (2) a unified compression scheme for single-chip processing and an improved frame rate (frames per second, FPS), and (3) a dedicated intra-channel reuse design for depth-wise convolutional layers (DW-CONV) to increase utilization. i-FlatCam demonstrates the first eye tracking pipeline with a lensless camera and achieves 3.16 degrees of accuracy, 253 FPS, 91.49 µJ/frame, and a 6.7 mm × 8.9 mm × 1.2 mm camera form factor, paving the way for next-generation Augmented Reality (AR) and Virtual Reality (VR) devices.
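As a rough illustration of the intra-channel reuse opportunity exploited for depth-wise convolution, the following NumPy sketch (an assumption for exposition, not the i-FlatCam hardware design) shows that each output channel touches only its own input channel, so one channel can be held in a local buffer and reused across all overlapping kernel windows before the next channel is fetched.

```python
# Sketch of why depth-wise convolution invites intra-channel reuse: each
# output channel depends on a single input channel, so that channel's pixels
# can stay local and be reused by every sliding window (stride 1, no padding).
import numpy as np


def depthwise_conv_intra_channel(x, w):
    """x: (C, H, W) activations, w: (C, K, K) per-channel kernels."""
    C, H, W = x.shape
    _, K, _ = w.shape
    out = np.zeros((C, H - K + 1, W - K + 1))
    for c in range(C):          # fetch one channel into the "local buffer"
        chan = x[c]             # this slice is reused by every window below
        for i in range(H - K + 1):
            for j in range(W - K + 1):
                out[c, i, j] = np.sum(chan[i:i + K, j:j + K] * w[c])
    return out


x = np.random.rand(8, 16, 16)
w = np.random.rand(8, 3, 3)
print(depthwise_conv_intra_channel(x, w).shape)  # (8, 14, 14)
```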
  2.
    We present a personalized, comprehensive eye-tracking solution based on tracking higher-order Purkinje images, suited explicitly for eyeglasses-style AR and VR displays. Existing eye-tracking systems for near-eye applications are typically designed to work for an on-axis configuration and rely on pupil center and corneal reflections (PCCR) to estimate gaze with an accuracy of only about 0.5° to 1°. These are often expensive, bulky in form factor, and fail to estimate monocular accommodation, which is crucial for focus adjustment within the AR glasses. Our system independently measures the binocular vergence and monocular accommodation using higher-order Purkinje reflections from the eye, extending the PCCR-based methods. We demonstrate that these reflections are sensitive to both gaze rotation and lens accommodation and model the Purkinje images' behavior in simulation. We also design and fabricate a user-customized eye tracker using cheap off-the-shelf cameras and LEDs. We use an end-to-end convolutional neural network (CNN) for calibrating the eye tracker for the individual user, allowing for robust and simultaneous estimation of vergence and accommodation. Experimental results show that our solution, specifically catering to individual users, outperforms state-of-the-art methods for vergence and depth estimation, achieving an accuracy of 0.3782° and 1.108 cm respectively.
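The following is a hedged sketch of the end-to-end CNN calibration stage described above: a small two-headed regressor (an assumed architecture and input format, not the paper's exact network) that maps an eye-camera frame containing Purkinje reflections to vergence and accommodation estimates.

```python
# Assumed two-headed CNN: one head regresses binocular vergence (degrees),
# the other monocular accommodation depth (centimetres). Per-user calibration
# would fine-tune it on frames captured while the user fixates known targets.
import torch
import torch.nn as nn


class PurkinjeGazeNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.vergence_head = nn.Linear(32, 1)       # vergence angle
        self.accommodation_head = nn.Linear(32, 1)  # accommodation depth

    def forward(self, x):
        f = self.features(x)
        return self.vergence_head(f), self.accommodation_head(f)


model = PurkinjeGazeNet()
vergence, accommodation = model(torch.rand(1, 1, 120, 160))
print(vergence.shape, accommodation.shape)
```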
  3. The density and complexity of urban environments present significant challenges for autonomous vehicles. Moreover, ensuring pedestrians’ safety and protecting personal privacy are crucial considerations in these environments. Smart city intersections and AI-powered traffic management systems will be essential for addressing these challenges. Therefore, our research focuses on creating an experimental framework for the design of applications that support the secure and efficient management of traffic intersections in urban areas. We integrated two cameras (street-level and bird’s eye view), both viewing an intersection, and a programmable edge computing node, deployed within the COSMOS testbed in New York City, with a central management platform provided by Kentyou. We designed a pipeline to collect and analyze the video streams from both cameras and obtain real-time traffic/pedestrian-related information to support smart city applications. The obtained information from both cameras is merged, and the results are sent to a dedicated dashboard for real-time visualization and further assessment (e.g., accident prevention). The process does not require sending the raw videos in order to avoid violating pedestrians’ privacy. In this demo, we present the designed video analytic pipelines and their integration with Kentyou central management platform. Index Terms—object detection and tracking, camera networks, smart intersection, real-time visualization 
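To illustrate the privacy-preserving pipeline described above, here is a hedged Python sketch (hypothetical structure and names, not the COSMOS/Kentyou implementation) in which each edge node runs detection locally and only structured detection metadata, never raw video, is merged and published to the dashboard.

```python
# Each camera's edge node analyzes its own frames; only detection metadata
# (labels, boxes, timestamps) is merged and forwarded, keeping raw video local.
from dataclasses import dataclass, asdict
import json


@dataclass
class Detection:
    camera: str        # "street-level" or "birds-eye"
    label: str         # e.g. "pedestrian", "car"
    bbox: tuple        # (x, y, w, h) in that camera's image coordinates
    timestamp: float


def analyze_frame(camera_id, frame, detector):
    """Run the detector on the edge node; the frame never leaves this function."""
    return [Detection(camera_id, lbl, box, ts) for lbl, box, ts in detector(frame)]


def merge_and_publish(street_dets, birdseye_dets, publish):
    """Merge detections from both views and send only the metadata downstream."""
    merged = [asdict(d) for d in street_dets + birdseye_dets]
    publish(json.dumps({"intersection": "demo", "detections": merged}))


# Example with stub detectors and a print-based "dashboard":
fake_detector = lambda frame: [("pedestrian", (10, 20, 30, 60), 0.0)]
street = analyze_frame("street-level", frame=None, detector=fake_detector)
birds = analyze_frame("birds-eye", frame=None, detector=fake_detector)
merge_and_publish(street, birds, publish=print)
```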
  4.
    Lensless imaging has emerged as a potential solution toward realizing ultra-miniature cameras by eschewing the bulky lens of a traditional camera. Without a focusing lens, lensless cameras rely on computational algorithms to recover the scene from multiplexed measurements. However, current iterative-optimization-based reconstruction algorithms produce noisy and perceptually poor images. In this work, we propose a non-iterative deep learning-based reconstruction approach that yields orders-of-magnitude improvements in image quality for lensless reconstructions. Our approach, called FlatNet, lays down a framework for reconstructing high-quality photorealistic images from mask-based lensless cameras, where the camera's forward model formulation is known. FlatNet consists of two stages: (1) an inversion stage that maps the measurement into a space of intermediate reconstruction by learning parameters within the forward model formulation, and (2) a perceptual enhancement stage that improves the perceptual quality of this intermediate reconstruction. These stages are trained together in an end-to-end manner. We show high-quality reconstructions through extensive experiments on real and challenging scenes using two different types of lensless prototypes: one that uses a separable forward model and another that uses a more general non-separable cropped-convolution model. Our end-to-end approach is fast, produces photorealistic reconstructions, and is easy to adopt for other mask-based lensless cameras.
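As a condensed sketch of FlatNet's two-stage structure for the separable-model case (layer sizes and the random initialization are illustrative assumptions; the real inversion stage is initialized from the calibrated forward model and the enhancement stage is a full U-Net), the following PyTorch snippet shows a learnable inversion followed by a perceptual refinement network, trained end to end.

```python
# Stage 1: trainable left/right inversion matrices map the lensless
# measurement to an intermediate image. Stage 2: a small conv net (stand-in
# for the perceptual-enhancement U-Net) refines that intermediate image.
import torch
import torch.nn as nn


class FlatNetSketch(nn.Module):
    def __init__(self, meas_h=256, meas_w=256, img_h=128, img_w=128):
        super().__init__()
        # In the paper these would be initialised from the separable forward
        # model; random initialisation is used here purely for illustration.
        self.W_left = nn.Parameter(torch.randn(img_h, meas_h) * 0.01)
        self.W_right = nn.Parameter(torch.randn(meas_w, img_w) * 0.01)
        self.enhance = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1),
        )

    def forward(self, y):                       # y: (B, 1, meas_h, meas_w)
        x_int = self.W_left @ y @ self.W_right  # intermediate reconstruction
        return self.enhance(x_int)              # perceptually refined output


model = FlatNetSketch()
print(model(torch.rand(2, 1, 256, 256)).shape)  # (2, 1, 128, 128)
```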
    Ultra-large mesoscopic imaging advances in the cortex open new pathways to develop neuroprosthetics that restore foveal vision in blind patients. Using targeted optogenetic activation, an optical prosthetic can focally stimulate spatially localized lateral geniculate nucleus (LGN) synaptic boutons within the primary visual cortex (V1). If we localize a cluster within a specific hypercolumn's input layer, activation of a subset of these boutons is perceptually fungible with activation of a different subset of boutons from the same hypercolumn input module. Transducing these LGN neurons with light-sensitive proteins makes them responsive to light, so we can optogenetically stimulate them in a pattern that mimics naturalistic visual input. Optogenetic targeting of these purely glutamatergic inputs is free from unwanted co-activation of inhibitory neurons, a common problem in electrode-based prosthetic devices that diminishes contrast perception. Because the system must prosthetically account for rapidly changing cortical activity and gain control, it integrates a real-time cortical read-out mechanism to continually assess and provide feedback that modifies stimulation levels, just as the natural visual system does. We accomplish this by reading out a multi-colored array of genetically encoded and transduced bioluminescent calcium responses in V1 neurons; this hyperspectral array of colors can achieve single-cell resolution. By tracking eye movements in the blind patients, we account for oculomotor effects by adjusting the contemporaneous stimulation of the LGN boutons to mimic the effects of natural vision, including those from eye movements. This system, called the Optogenetic Brain System (OBServ), is designed to function by optimally activating visual responses in V1 from a fully implantable coplanar emitter array coupled with a video camera and a bioluminescent read-out system. It follows that if we stimulate the LGN input modules in the same pattern as natural vision, the recipient should perceive naturalistic prosthetic vision. As such, the system holds the promise of restoring vision in the blind at the highest attainable acuity, with maximal contrast sensitivity, using an integrated nanophotonic implantable device that receives eye-tracked video input from a head-mounted video camera, using relatively non-invasive prosthetic technology that does not cross the pia mater of the brain.
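Below is a highly simplified sketch of the closed-loop behavior described above (all signals, gains, and the read-out stub are assumptions, not the OBServ design): stimulation patterns derived from eye-tracked video are shifted to compensate for gaze and rescaled each iteration by feedback from the cortical read-out.

```python
# One control iteration: shift the input pattern by the measured gaze, scale
# it by the current gain, then update the gain so the measured cortical
# response moves toward a target activity level.
import numpy as np


def closed_loop_step(video_frame, gaze_shift, readout, gain, target=1.0, lr=0.1):
    pattern = np.roll(video_frame, shift=gaze_shift, axis=(0, 1))  # oculomotor compensation
    stimulus = gain * pattern                                      # light pattern for the LGN boutons
    response = readout(stimulus)                                   # stand-in for the calcium read-out
    gain += lr * (target - response)                               # feedback toward target activity
    return stimulus, gain


gain = 1.0
frame = np.random.rand(32, 32)
fake_readout = lambda stim: stim.mean() * 2.0   # hypothetical measured V1 response
for _ in range(5):
    _, gain = closed_loop_step(frame, gaze_shift=(1, 0), readout=fake_readout, gain=gain)
print(round(gain, 3))
```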