Crowded metropolises present unique challenges for the deployment of autonomous vehicles: pedestrian safety cannot be compromised, and personal privacy must be preserved. Smart city intersections will be at the core of Artificial Intelligence (AI)-powered, citizen-friendly traffic management systems for such metropolises. Hence, the main objective of this work is to develop an experimentation framework for designing applications that support secure and efficient traffic intersections in urban areas. We integrated a camera and a programmable edge computing node, deployed within the COSMOS testbed in New York City, with the Eclipse sensiNact data platform provided by Kentyou, and we use this pipeline to collect and analyze video streams in real time in support of smart city applications. In this demo, we present a video analytics pipeline that analyzes the stream from a COSMOS street-level camera to extract traffic- and crowd-related information and sends it to a dedicated dashboard for real-time visualization and further assessment. The raw video is never transmitted, so pedestrians' privacy is not violated.
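The demo's actual implementation is not reproduced here, but the following minimal Python sketch illustrates the privacy-preserving pattern described above: the edge node analyzes each frame locally and forwards only aggregate counts, never raw video. The detector choice (OpenCV's stock HOG pedestrian detector), the dashboard endpoint URL, and the payload schema are all assumptions for illustration.

```python
import time

import cv2
import requests

DASHBOARD_URL = "https://example.org/dashboard/ingest"  # hypothetical endpoint

def run_pipeline(stream_uri: str) -> None:
    """Analyze a video stream on the edge and publish only aggregate counts."""
    hog = cv2.HOGDescriptor()
    hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
    cap = cv2.VideoCapture(stream_uri)
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # Detect pedestrians locally; the raw frame never leaves the node.
        boxes, _ = hog.detectMultiScale(frame, winStride=(8, 8))
        payload = {"timestamp": time.time(), "pedestrian_count": len(boxes)}
        # Only anonymized aggregates are forwarded for visualization.
        requests.post(DASHBOARD_URL, json=payload, timeout=2)
    cap.release()
```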
VisualWorldDB: A DBMS for the Visual World. Brandon Haynes, Maureen Daum, Amrita Mazumdar, Magdalena Balazinska, Alvin Cheung, and Luis Ceze. CIDR, 2020.
Many recent video applications, including autonomous driving, traffic monitoring, drone analytics, large-scale surveillance networks, and virtual reality, require reasoning about, combining, and operating over many video streams, each with distinct position and orientation. However, modern video data management systems are largely designed to process individual streams of video data as if they were independent and unrelated. In this paper, we present VisualWorldDB, a vision and an initial architecture for a new type of database management system optimized for multi-video applications. VisualWorldDB ingests video data from many perspectives and makes them queryable as a single multidimensional visual object. It incorporates new techniques for optimizing, executing, and storing multi-perspective video data. Our preliminary results suggest that this approach allows for faster queries and lower storage costs, improving the state of the art for applications that operate over this type of video data.
- Award ID(s): 1703051
- PAR ID: 10257106
- Journal Name: Conference on Innovative Data Systems Research (CIDR)
- Sponsoring Org: National Science Foundation
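As a purely hypothetical illustration of the VisualWorldDB vision above (the abstract does not describe its real interface, so every name below is invented), a multi-perspective store might register streams with shared world-frame poses and answer spatial queries over all of them at once:

```python
from dataclasses import dataclass

@dataclass
class StreamSource:
    uri: str
    position: tuple   # (x, y, z) camera position in a shared world frame
    heading: float    # camera orientation, in degrees

class VisualWorld:
    """Treats many registered streams as one queryable visual object."""

    def __init__(self):
        self.sources = []

    def ingest(self, source: StreamSource) -> None:
        self.sources.append(source)

    def select(self, x_range, y_range):
        # Return streams whose camera sits inside the query region; a real
        # system would also reason about view frustums, time, and overlap.
        return [s for s in self.sources
                if x_range[0] <= s.position[0] <= x_range[1]
                and y_range[0] <= s.position[1] <= y_range[1]]
```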
More Like this
With the fast development of Fifth/Sixth-Generation (5G/6G) communications and the Internet of Video Things (IoVT), a broad range of mega-scale data applications is emerging (e.g., all-weather, all-time video). These network-based applications depend heavily on reliable, secure, and real-time audio and/or video streams (AVS), which consequently become a target for attackers. While modern Artificial Intelligence (AI) technology is integrated with many multimedia applications to enhance their capabilities, the development of Generative Adversarial Networks (GANs) has also led to deepfake attacks that enable manipulation of audio or video streams to mimic any targeted person. Deepfake attacks are highly disturbing and can mislead the public, raising further policy, technological, social, and legal challenges. Instead of engaging in an endless AI arms race of “fighting fire with fire”, in which new Deep Learning (DL) algorithms keep making fake AVS more realistic, this paper proposes a novel approach that tackles the challenging problem of detecting deepfaked AVS data by leveraging the Electrical Network Frequency (ENF) signal embedded in the AVS data as a fingerprint. Under low Signal-to-Noise Ratio (SNR) conditions, Short-Time Fourier Transform (STFT) and Multiple Signal Classification (MUSIC) spectrum estimation techniques are investigated to detect the Instantaneous Frequency (IF) of interest. For reliable authentication, we enhanced the ENF signal embedded through an artificial power source in a noisy environment using a spectral combination technique and a Robust Filtering Algorithm (RFA). The proposed signal estimation workflow was deployed on continuous audio/video input for resilience against frame manipulation attacks. A Singular Spectrum Analysis (SSA) approach was selected to minimize the false positive rate of signal correlations. Extensive experimental analysis of reliable edge-based ENF estimation in deepfaked multimedia recordings is provided to demonstrate the feasibility of distinguishing artificially altered media content.
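The STFT stage of the ENF workflow above can be sketched as follows. This is an illustrative approximation only: the nominal 60 Hz grid frequency, the window length, and the search band are assumptions rather than the paper's exact parameters.

```python
import numpy as np
from scipy.signal import stft

def estimate_enf(audio: np.ndarray, fs: int, nominal: float = 60.0,
                 band: float = 0.5) -> np.ndarray:
    """Track the instantaneous frequency of the ENF signal embedded in audio."""
    # 4-second windows with 50% overlap give fine frequency resolution.
    f, t, Z = stft(audio, fs=fs, nperseg=fs * 4, noverlap=fs * 2)
    # Restrict attention to a narrow band around the nominal grid frequency.
    mask = (f >= nominal - band) & (f <= nominal + band)
    fb, Zb = f[mask], np.abs(Z[mask, :])
    # The peak frequency in each time slice approximates the IF trace.
    return fb[np.argmax(Zb, axis=0)]
```

A recording could then be authenticated by correlating the recovered IF trace against a reference ENF trace logged from the power grid; a deepfaked or frame-manipulated segment would break that correlation.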
Cloud computing infrastructures have become the de facto platform for data-driven machine learning applications. However, these centralized models of computing are ill-suited to dispersed, high-volume, real-time, edge-data-intensive applications such as real-time object detection, where video streams may be captured at multiple geographical locations. Although many recent advancements in object detection have been made using Convolutional Neural Networks (CNNs), these performance improvements focus only on a single contiguous object detection model. In this paper, we propose a distributed Edge-Cloud R-CNN that splits the model into components and dynamically distributes these components across the edge and cloud for optimal real-time object detection performance. As a proof of concept, we evaluate the performance of the proposed system on a distributed computing platform encompassing cloud servers and embedded edge devices for real-time object detection on video streams.
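A minimal sketch of the model-splitting idea, assuming a PyTorch backbone; the paper's actual R-CNN partitioning points, tensor serialization, and dynamic placement logic are not reproduced here.

```python
import torch
import torchvision

# Stand-in backbone; splitting an R-CNN follows the same pattern.
full = torchvision.models.resnet18(weights=None)
layers = list(full.children())

# The edge device runs the early feature-extraction layers...
edge_part = torch.nn.Sequential(*layers[:6])
# ...and ships the intermediate tensor to the cloud, which runs the rest.
cloud_part = torch.nn.Sequential(*layers[6:-1], torch.nn.Flatten(), layers[-1])

frame = torch.randn(1, 3, 224, 224)   # stand-in for a decoded video frame
features = edge_part(frame)           # computed on the edge device
scores = cloud_part(features)         # computed on a cloud server
```

The choice of split point determines how much computation stays on the edge and how large the intermediate tensor crossing the network is, which is the trade-off a dynamic distribution scheme can optimize.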
The density and complexity of urban environments present significant challenges for autonomous vehicles. Moreover, ensuring pedestrians' safety and protecting personal privacy are crucial considerations in these environments. Smart city intersections and AI-powered traffic management systems will be essential for addressing these challenges. Therefore, our research focuses on creating an experimental framework for the design of applications that support the secure and efficient management of traffic intersections in urban areas. We integrated two cameras (street-level and bird's-eye view), both viewing an intersection, and a programmable edge computing node, deployed within the COSMOS testbed in New York City, with a central management platform provided by Kentyou. We designed a pipeline to collect and analyze the video streams from both cameras and obtain real-time traffic/pedestrian-related information to support smart city applications. The information obtained from both cameras is merged, and the results are sent to a dedicated dashboard for real-time visualization and further assessment (e.g., accident prevention). The process does not require sending the raw videos, thereby avoiding violations of pedestrians' privacy. In this demo, we present the designed video analytics pipelines and their integration with the Kentyou central management platform.
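One plausible way to merge the two views is sketched below, assuming ground-plane homographies obtained from offline calibration; the demo's actual fusion logic may differ.

```python
import numpy as np
import cv2

# Homographies mapping each camera's image plane to a shared ground plane,
# obtained offline from calibration (identity placeholders here).
H_STREET = np.eye(3, dtype=np.float32)
H_AERIAL = np.eye(3, dtype=np.float32)

def to_ground(points_xy, H):
    """Project image-plane points (N, 2) onto the common ground plane."""
    if len(points_xy) == 0:
        return np.empty((0, 2), dtype=np.float32)
    pts = np.asarray(points_xy, dtype=np.float32).reshape(-1, 1, 2)
    return cv2.perspectiveTransform(pts, H).reshape(-1, 2)

def merge(street_dets, aerial_dets, radius=1.0):
    """Deduplicate detections that land near the same ground-plane location,
    so a pedestrian seen by both cameras is counted once."""
    merged = list(to_ground(street_dets, H_STREET))
    for p in to_ground(aerial_dets, H_AERIAL):
        if all(np.linalg.norm(p - q) >= radius for q in merged):
            merged.append(p)
    return merged
```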
In today's world, AI systems need to make sense of large amounts of data as they unfold in real time, whether video from surveillance and monitoring cameras, streams of egocentric footage, or sequences in other domains such as text or audio. The ability to break these continuous data streams into meaningful events, discover nested structures, and predict what might happen next at different levels of abstraction is crucial for applications ranging from passive surveillance systems to sensory-motor autonomous learning. However, most existing models rely heavily on large, annotated datasets with fixed data distributions and offline epoch-based training, which makes them impractical for handling the unpredictability and scale of dynamic real-world environments. This dissertation tackles these challenges by introducing a set of predictive models designed to process streaming data efficiently, segment events, and build sequential memory models without supervision or data storage.

First, we present a single-layer predictive model that segments long, unstructured video streams by detecting temporal events and spatially localizing objects in each frame. The model is applied to wildlife monitoring footage, where it processes continuous, high-frame-rate video and successfully detects and tracks events without supervision. It operates in an online streaming manner to perform simultaneous training and inference without storing or revisiting the processed data. This approach alleviates the need for manual labeling, making it ideal for handling long-duration, real-world video footage.

Building on this, we introduce STREAMER, a multi-layered architecture that extends the single-layer model into a hierarchical predictive framework. STREAMER segments events at different levels of abstraction, capturing the compositional structure of activities in egocentric videos. By dynamically adapting to various timescales, it creates a hierarchy of nested events and forms more complex and abstract representations of the input data.

Finally, we propose the Predictive Attractor Model (PAM), which builds biologically plausible memory models of sequential data. Inspired by neuroscience, PAM uses sparse distributed representations and local learning rules to avoid catastrophic forgetting, allowing it to continually learn and make predictions without overwriting previous knowledge. Unlike many traditional models, PAM can generate multiple potential future outcomes conditioned on the same context, which allows for handling uncertainty in generative tasks.

Together, these models form a unified framework of predictive learning that addresses multiple challenges in event understanding and temporal data analysis. By using prediction as the core mechanism, they segment continuous data streams into events, discover hierarchical structures across multiple levels of abstraction, learn semantic event representations, and model sequences without catastrophic forgetting.
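A toy sketch of the mechanism these models share, prediction-error-driven event segmentation: a boundary is declared when the model's surprise spikes relative to its recent baseline. The trivial last-frame predictor and the threshold below are placeholders for the learned predictive models the dissertation develops.

```python
import numpy as np

def segment_stream(frames: np.ndarray, threshold: float = 3.0) -> list:
    """Return frame indices where prediction error jumps above baseline."""
    boundaries, errors = [], []
    prediction = frames[0].astype(float)
    for t in range(1, len(frames)):
        err = float(np.mean((frames[t] - prediction) ** 2))
        # A surprise much larger than the recent average marks a new event.
        if errors and err > threshold * np.mean(errors[-50:]):
            boundaries.append(t)
        errors.append(err)
        # Trivial predictor: expect the next frame to resemble the current
        # one; the dissertation's models learn this mapping instead.
        prediction = frames[t].astype(float)
    return boundaries
```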