Title: A Semi-Automated Technique for Transcribing Accurate Crowd Motions
We present a novel technique for transcribing crowds in video scenes that allows extracting the positions of moving objects in video frames. The technique can be used as a more precise alternative to image processing methods, such as background-removal or automated pedestrian detection based on feature extraction and classification. By manually projecting pedestrian actors on a two-dimensional plane and translating screen coordinates to absolute real-world positions using the cross ratio, we provide highly accurate and complete results at the cost of increased processing time. We are able to completely avoid most errors found in other automated annotation techniques, resulting from sources such as noise, occlusion, shadows, view angle or the density of pedestrians. It is further possible to process scenes that are difficult or impossible to transcribe by automated image processing methods, such as low-contrast or low-light environments. We validate our model by comparing it to the results of both background-removal and feature extraction and classification in a variety of scenes.
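The abstract does not spell out how the cross ratio is applied, so the following is only a minimal sketch of the general idea, not the authors' procedure. It relies on the fact that the cross ratio of four collinear points is preserved under perspective projection: given three reference points on a line whose real-world positions are known, the real-world position of a fourth point on that line can be recovered from screen measurements alone. The function names and the scalar "position along the line" parameterization are assumptions made for illustration.

```python
def cross_ratio(a: float, b: float, c: float, d: float) -> float:
    """Cross ratio (a, b; c, d) of four collinear points given as scalar positions."""
    return ((c - a) * (d - b)) / ((c - b) * (d - a))


def world_position_from_cross_ratio(
    img_a: float, img_b: float, img_c: float, img_x: float,
    world_a: float, world_b: float, world_c: float,
) -> float:
    """Recover the world position of a point from three reference points.

    All arguments are scalar positions measured along one line: the img_*
    values in screen units (e.g. pixels), the world_* values in real-world
    units (e.g. metres). Hypothetical helper, assuming an ideal pinhole
    projection without lens distortion.
    """
    # The cross ratio measured on screen equals the cross ratio in the world.
    cr = cross_ratio(img_a, img_b, img_c, img_x)
    # Solve ((C - A)(X - B)) / ((C - B)(X - A)) = cr for X.
    k = cr * (world_c - world_b) / (world_c - world_a)
    if k == 1.0:
        raise ValueError("point maps to infinity along the world line")
    return (world_b - k * world_a) / (1.0 - k)
```

In practice the scalar screen positions could be obtained as distances along the line through the annotated pixel coordinates; a full ground-plane mapping would need this applied along two directions or a homography, which the sketch does not cover.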
Award ID(s):
1718139
PAR ID:
10205877
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
International Journal of Image and Graphics
Volume:
20
Issue:
02
ISSN:
0219-4678
Page Range / eLocation ID:
2050012
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. High-dimensional data is commonly encountered in various applications, including genomics, as well as image and video processing. Analyzing, computing, and visualizing such data pose significant challenges. Feature extraction methods are crucial in addressing these challenges by obtaining compressed representations that are suitable for analysis and downstream tasks. One effective technique along these lines is sparse coding, which involves representing data as a sparse linear combination of a set of exemplars. In this study, we propose a local sparse coding framework within the context of a classification problem. The objective is to predict the label of a given data point based on labeled training data. The primary optimization problem encourages the representation of each data point using nearby exemplars. We leverage the optimized sparse representation coefficients to predict the label of a test data point by assessing its similarity to the sparse representations of the training data. The proposed framework is computationally efficient and provides interpretable sparse representations. To illustrate the practicality of our proposed framework, we apply it to agriculture for the classification of crop diseases. 
  2. Hyperdimensional Computing (HDC) is a neurally-inspired computation model based on the observation that the human brain operates on high-dimensional representations of data, called hypervectors. Although HDC is powerful in reasoning about and associating abstract information, it is weak at feature extraction from complex data such as images and video. As a result, most existing HDC solutions rely on expensive pre-processing algorithms for feature extraction. In this paper, we propose StocHD, a novel end-to-end hyperdimensional system that supports accurate, efficient, and robust learning over raw data. Unlike prior work that used HDC for learning tasks, StocHD expands HDC functionality to the computing area by mathematically defining stochastic arithmetic over HDC hypervectors. StocHD enables an entire learning application (including the feature extractor) to be processed using the HDC data representation, enabling uniform, efficient, robust, and highly parallel computation. We also propose a novel fully digital and scalable Processing In-Memory (PIM) architecture that exploits the memory-centric nature of HDC to support extensively parallel computation. Our evaluation over a wide range of classification tasks shows that StocHD provides, on average, 3.3x and 6.4x (52.3x and 143.5x) faster and higher energy efficiency as compared to a state-of-the-art HDC algorithm running on PIM (NVIDIA GPU), while providing 16x higher computational robustness. 
  3. Vehicle-to-pedestrian communication could significantly improve pedestrian safety at signalized intersections. However, it is unlikely that pedestrians will always be carrying a low-latency, communication-enabled hand-held device with an activated pedestrian safety application. Because of this, multiple traffic cameras at a signalized intersection could be used to accurately detect and locate pedestrians using deep learning, and to broadcast pedestrian-related safety alerts to connected and automated vehicles around signalized intersections. However, the unavailability of high-performance roadside computing infrastructure and the limited network bandwidth between traffic cameras and the computing infrastructure limit the ability to stream and process data in real time for pedestrian detection. In this paper, we describe an edge computing-based real-time pedestrian detection strategy that combines a pedestrian detection algorithm using deep learning and an efficient data communication approach to reduce bandwidth requirements while maintaining high pedestrian detection accuracy. We utilize a lossy compression technique on traffic camera data to determine the tradeoff between the reduction of the communication bandwidth requirements and a defined pedestrian detection accuracy. The performance of the pedestrian detection strategy is measured in relation to pedestrian classification accuracy with varying peak signal-to-noise ratios. The analyses reveal that we detect pedestrians while maintaining a defined detection accuracy at a peak signal-to-noise ratio of 43 dB while reducing the communication bandwidth from 9.82 Mbits/sec to 0.31 Mbits/sec, a 31× reduction. 
  4. Chondrocyte viability is a crucial factor in evaluating cartilage health. Most cell viability assays rely on dyes and are not applicable for in vivo or longitudinal studies. We previously demonstrated that two-photon excited autofluorescence and second harmonic generation microscopy provided high-resolution images of cells and collagen structure; those images allowed us to distinguish live from dead chondrocytes by visual assessment or by the normalized autofluorescence ratio. However, both methods require human involvement and have low throughputs. Methods for automated cell-based image processing can improve throughput. Conventional image processing algorithms do not perform well on autofluorescence images acquired by nonlinear microscopes due to low image contrast. In this study, we compared conventional, machine learning, and deep learning methods in chondrocyte segmentation and classification. We demonstrated that deep learning significantly improved the outcome of the chondrocyte segmentation and classification. With appropriate training, the deep learning method can achieve 90% accuracy in chondrocyte viability measurement. The significance of this work is that automated imaging analysis is possible and should not become a major hurdle for the use of nonlinear optical imaging methods in biological or clinical studies. 
  5. Online lecture videos are increasingly important e-learning materials for students. Automated content extraction from lecture videos facilitates information retrieval applications that improve access to the lecture material. A significant number of lecture videos include the speaker in the image. Speakers perform various semantically meaningful actions during the process of teaching. Among all the movements of the speaker, key actions such as writing or erasing potentially indicate important features directly related to the lecture content. In this paper, we present a methodology for lecture video content extraction using the speaker actions. Each lecture video is divided into small temporal units called action segments. Using a pose estimator, body and hands skeleton data are extracted and used to compute motion-based features describing each action segment. Then, the dominant speaker action of each of these segments is classified using Random forests and the motion-based features. With the temporal and spatial range of these actions, we implement an alternative way to draw key-frames of handwritten content from the video. In addition, for our fixed camera videos, we also use the skeleton data to compute a mask of the speaker writing locations for the subtraction of the background noise from the binarized key-frames. Our method has been tested on a publicly available lecture video dataset, and it shows reasonable recall and precision results, with a very good compression ratio which is better than previous methods based on content analysis. 