
Title: Cycling-Net: A Deep Learning Approach to Predicting Cyclist Behaviors from Geo-Referenced Egocentric Video Data
Cycling, as a green transportation mode, provides an environmentally friendly choice for short-distance travel. However, cyclists have been involved in fatal accidents more frequently in recent years. Thus, understanding and modeling their road behaviors is crucial to improving road safety laws and infrastructure. Traditionally, road user behavior has been studied using either purely spatial trajectory data or videos from fixed surveillance cameras, by tracking or predicting road users' paths. However, these data cover only limited areas and do not provide information from the cyclist's field of view. In this paper, we take advantage of geo-referenced egocentric video data collected from cyclists' handlebar cameras to learn how to predict their behaviors. This approach is technically more challenging because both the observer and the objects in the scene may be moving, and there are strong temporal dependencies in both the behaviors of cyclists and the video scenes. We propose Cycling-Net, a novel deep learning model that tracks different types of objects in consecutive scenes and learns the relationship between the movement of these objects and the behavior of the cyclist. Experimental results on a naturalistic trip dataset show that Cycling-Net is effective in behavior prediction and outperforms a baseline model.
Award ID(s):
1942680 1952085 1831140
Journal Name:
Proceedings of the 28th International Conference on Advances in Geographic Information Systems
Page Range / eLocation ID:
337 to 346
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Cycling as a mode of transportation has been on an upward trend in both the U.S. and Europe. Unfortunately, the safety of cyclists has been a growing concern. Data from the National Highway Traffic Safety Administration (NHTSA) show that crashes occurring while motorists overtake cyclists are one of the leading categories of fatal crashes involving cyclists. In support of efforts to understand the behavior of drivers of motorized vehicles while overtaking cyclists, this research project aims to develop an algorithm to identify overtaking events. Most existing quantitative studies on cycling safety leverage instrumented bicycles or vehicles with sensors to extract naturalistic driving trajectories. In contrast, we use data from recent research that provides naturalistic driving trajectories of road users collected at selected intersections in urban areas in Germany using drones equipped with cameras. Using these videos, with a data frequency of 25 Hz, the authors of that research produced the inD dataset. The inD dataset contains trajectories of road users captured as coordinates on a two-dimensional plane obtained from the aerial, or bird's-eye, view of the road. The data also capture velocity, acceleration, heading angles, the dimensions of the driver's vehicle, etc. Overtaking can be thought of as four phases: approaching, steering away, passing, and returning. Using the inD dataset, we have developed an algorithm to identify events in which the driver of a motor vehicle overtakes a cyclist. This work fits into our broader goal of contributing to the body of knowledge for improving the road safety of cyclists. The work is expected to inform governmental and traffic authorities on aspects such as the design of intersections and bicycle lanes by providing insights into overtaking events.
  2. Learning the human-mobility interaction (HMI) on interactive scenes (e.g., how a vehicle turns at an intersection in response to traffic lights and other oncoming vehicles) can enhance the safety, efficiency, and resilience of smart mobility systems (e.g., autonomous vehicles) and many other ubiquitous computing applications. Towards ubiquitous and understandable HMI learning, this paper considers both spoken language (e.g., human textual annotations) and unspoken language (e.g., visual and sensor-based behavioral mobility information related to the HMI scenes) as information modalities from real-world HMI scenarios. We aim to extract the important but possibly implicit HMI concepts (as named entities) from the textual annotations (provided by human annotators) through a novel human language and sensor data co-learning design.

    To this end, we propose CG-HMI, a novel Cross-modality Graph fusion approach for extracting important Human-Mobility Interaction concepts by co-learning from textual annotations as well as visual and behavioral sensor data. To fuse unspoken and spoken languages, we have designed a unified representation called the human-mobility interaction graph (HMIG) for each modality related to the HMI scenes, i.e., textual annotations, visual video frames, and behavioral sensor time-series (e.g., from on-board or smartphone inertial measurement units). The nodes of the HMIG in these modalities correspond to the textual words (tokenized for ease of processing) related to HMI concepts, the detected traffic participant/environment categories, and the vehicle maneuver behavior types determined from the behavioral sensor time-series. To extract the inter- and intra-modality semantic correspondences and interactions in the HMIG, we have designed a novel graph interaction fusion approach with differentiable pooling-based graph attention. The resulting graph embeddings are then processed to identify and retrieve the HMI concepts within the annotations, which can benefit downstream human-computer interaction and ubiquitous computing applications. We have developed and implemented CG-HMI as a system prototype and performed extensive studies on three real-world HMI datasets (two on car driving and the third on e-scooter riding). We have corroborated the excellent performance (on average 13.11% higher accuracy than the other baselines in terms of precision, recall, and F1 measure) and effectiveness of CG-HMI in recognizing and extracting the important HMI concepts through cross-modality learning. Our CG-HMI studies also provide real-world implications (e.g., for road safety and driving behaviors) about the interactions between drivers and other traffic participants.

  3. Cities, already densely populated areas whose land use includes housing, transportation, sanitation, utilities, and communication, tend to grow ever larger. New types of road users are emerging, with further technological developments to come. As city populations grow and roads become congested, government agencies such as the Department of Transportation (DOT), through the National Highway Traffic Safety Administration (NHTSA), are in pressing need of new, efficient technologies to improve their management systems. The challenge is to anticipate never-before-seen problems in the effort to save lives and implement sustainable, cost-effective management systems. To complicate matters further, self-driving cars will be authorized in the near future in crowded major cities, where roads are to be shared among pedestrians, cyclists, cars, and trucks. Road sizes and traffic signaling will need to be adapted constantly. Counting and classifying turning vehicles and pedestrians at an intersection is an exhausting task, and despite the use of traffic monitoring systems, counting still relies heavily on human involvement. Our approach to counting turning vehicles at intersections is less invasive and requires no road excavation or costly installation. Live or recorded video from cameras already installed throughout cities can be used, as can video from any other camera, including cellphones. Our system is based on neural networks and deep learning for object detection, along with computer vision technology and several methods and algorithms. Our approach works on still images, recorded videos, and real-time live video, and it detects, classifies, and tracks moving objects and computes their velocity and direction using a convolutional neural network. Built on a series of algorithms modeled after the human brain, our system uses NVIDIA video cards with GPUs, CUDA, OpenCV, and mathematical vector systems.
  4. Abstract

    Recent advancements in artificial intelligence (AI) have seen the emergence of smart video surveillance (SVS) in many practical applications, particularly for building safer and more secure communities in our urban environments. Cognitive tasks, such as identifying objects, recognizing actions, and detecting anomalous behaviors, can produce data capable of providing valuable insights to the community through statistical and analytical tools. However, the design of artificially intelligent surveillance systems requires special consideration of ethical challenges and concerns. The use and storage of personally identifiable information (PII) commonly pose an increased risk to personal privacy. To address these issues, this paper identifies the privacy concerns and requirements that must be addressed when designing AI-enabled smart video surveillance. Further, we propose the first end-to-end AI-enabled privacy-preserving smart video surveillance system that holistically combines computer vision analytics, statistical data analytics, cloud-native services, and end-user applications. Finally, we propose quantitative and qualitative metrics to evaluate intelligent video surveillance systems. The system achieves 17.8 frames-per-second (FPS) processing in extreme video scenes. However, designing such a system with privacy in mind means preferring the pose-based algorithm to the pixel-based one. This choice reduces accuracy in both the action and anomaly detection tasks: results drop from 97.48% to 73.72% in anomaly detection and from 96% to 83.07% in action detection. On average, the latency of the end-to-end system is 36.1 seconds.

  5. Video scene analysis is a well-investigated area in which researchers have devoted considerable effort to detecting and classifying people and objects in a scene. However, real-life scenes are more complex: the intrinsic states of objects (e.g., machine operating states or human vital signals) are often overlooked by vision-based scene analysis. Recent work has proposed a radio frequency (RF) sensing technique, wireless vibrometry, that employs wireless signals to sense subtle vibrations from objects and infer their internal states. We envision that combining video scene analysis with wireless vibrometry forms a more comprehensive understanding of the scene, namely "rich scene analysis". However, the RF sensors used in wireless vibrometry provide only time series, and it is challenging to associate these time series with multiple real-world objects. We propose a real-time RF-vision sensor fusion system, Capricorn, that efficiently builds a cross-modal correspondence between visual pixels and RF time series to better understand the complex nature of a scene. The vision sensors in Capricorn model the surrounding environment in 3D and obtain the distances of different objects. In the RF domain, distance is proportional to the signal time-of-flight (ToF), and we can leverage the ToF to separate the RF time series corresponding to each object. The RF-vision sensor fusion in Capricorn brings multiple benefits. The vision sensors provide environmental context to guide the processing of RF data, which helps us select the most appropriate algorithms and models. Meanwhile, the RF sensor yields additional information that is otherwise invisible to vision sensors, providing insight into objects' intrinsic states. Our extensive evaluations show that Capricorn monitors the operating status of multiple appliances in real time with an accuracy of 97%+ and recovers vital signals such as respiration from multiple people. A video demonstrates the capability of Capricorn.
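The time-of-flight relation mentioned in item 5 (Capricorn) can be illustrated with a minimal sketch. It is not taken from the Capricorn system: it assumes a monostatic RF sensor (so the signal travels to the object and back), and the function names, object labels, and range-bin width are hypothetical choices made only for illustration.

```python
# Illustrative sketch only; not the Capricorn implementation.
# Assumption: monostatic sensing, so the signal covers the distance twice.

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def round_trip_tof(distance_m: float) -> float:
    """Return the round-trip time of flight in seconds for a given distance."""
    return 2.0 * distance_m / SPEED_OF_LIGHT_M_PER_S


def assign_range_bins(object_distances_m: dict, bin_width_m: float) -> dict:
    """Map each object to the index of the range bin that contains its distance.

    object_distances_m: hypothetical vision-derived distances, keyed by label.
    bin_width_m: hypothetical range resolution of the RF sensor.
    """
    return {
        label: int(distance // bin_width_m)
        for label, distance in object_distances_m.items()
    }


if __name__ == "__main__":
    distances = {"fan": 1.3, "person": 2.9}  # metres, made-up values
    for label, d in distances.items():
        print(label, round_trip_tof(d))
    print(assign_range_bins(distances, bin_width_m=0.5))
```

Because distance and ToF are proportional, objects at sufficiently different distances fall into different range bins, which is what allows one RF time series per object to be separated in a scheme like this.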