Title: Integrating Heterogeneous Sources for Learned Prediction of Vehicular Data Consumption
In addition to the multiple sensors that measure parameters useful for improving both safety and efficiency, modern vehicles also gather external data (e.g., traffic conditions, weather) which, if properly used, could further improve the overall trip experience. Specifically, when it comes to navigation, one source that can provide increased context awareness, especially for autonomous driving, is High Definition (HD) maps, which have recently seen tremendous growth in popularity and use in vehicular technology. Because each map is limited to a particular geographic area, different portions need to be downloaded (and processed) on multiple occasions throughout a given trip, along with data from other internal and external sources. In this paper, we provide an effective deep learning approach for the recently introduced problem of Predicting Map Data Consumption (PMDC) at future time instants for a given trip. We propose a novel methodology that integrates multiple data sources (road network, traffic, historic trips, HD maps) and, for a given trip, enables prediction of the map data consumption. Our experimental observations demonstrate the benefits of the proposed approach over the candidate baselines.
Award ID(s):
2030249
NSF-PAR ID:
10403394
Journal Name:
23rd IEEE International Conference on Mobile Data Management, MDM 2022
Page Range / eLocation ID:
54 to 63
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Autonomous vehicles often rely on high-definition (HD) maps to navigate. However, lane markings (LMs) are not necessarily static objects, due to wear & tear from usage and road reconstruction & maintenance. Therefore, incorrect matching between LMs in the HD map and sensor readings may lead to erroneous localization or even cause traffic accidents. It is imperative to keep LMs up-to-date. However, frequently recollecting data with dedicated hardware and specialists to update HD maps is not only cost-prohibitive but also unviable. Here we propose to utilize crowdsourced images from multiple vehicles at different times to help verify LMs for HD map maintenance. We obtain the LM distribution in the image space by considering the camera pose uncertainty in perspective projection. Both LMs in the HD map and LMs in the image are treated as observations of LM distributions, which allows us to construct posterior conditional distributions (a.k.a. Bayesian belief functions) of LMs from either source. An LM is consistent if the belief functions from the map and the image satisfy statistical hypothesis testing. We further extend the Bayesian belief model into a sequential belief update using crowdsourced images. LMs with a higher probability of existence are kept in the HD map, whereas those with a lower probability of existence are removed from the HD map. We verify our approach using real data. Experimental results show that our method is capable of verifying and updating LMs in the HD map.
  2. The ability to accurately predict public transit ridership demand benefits passengers and transit agencies. Agencies will be able to reallocate buses to handle under- or over-utilized bus routes, improving resource utilization, and passengers will be able to adjust and plan their schedules to avoid overcrowded buses and maintain a certain level of comfort. However, accurately predicting occupancy is a non-trivial task. Factors such as heterogeneity, evolving ridership patterns, exogenous events like weather, and other stochastic variables make the task much more challenging. With the progress of big data, transit authorities now have access to real-time passenger occupancy information for their vehicles. The amount of data generated is staggering. While there is no shortage of data, it must still be cleaned, processed, augmented, and merged before any useful information can be generated. In this paper, we propose the use and fusion of data from multiple sources, cleaned, processed, and merged together, for use in training machine learning models to predict transit ridership. We use data that spans a 2-year period (2020-2022) incorporating transit, weather, traffic, and calendar data. The resulting data, which amounts to 17 million observations, is used to train separate models for trip-level and stop-level prediction. We evaluate our approach on real-world transit data provided by the public transit agency of Nashville, TN. We demonstrate that the trip-level model based on XGBoost and the stop-level model based on LSTM outperform the baseline statistical model across the entire transit service day.
  3. The estimation of malaria parasite migration can play a vital role in informing elimination strategies by pinpointing regions with higher parasite migration that act as transmission sources, and that could be the focus of elimination interventions. Gene flow simulation methods such as Estimated Effective Migration Surfaces (EEMS) and Migration and Population-Size Surfaces (MAPS) use a Markov Chain Monte Carlo simulation-based approach to visualize a species' migration and diversity. These methods utilize georeferenced genomic data and present output in the form of migration contour maps. Despite their potential, there is uncertainty in EEMS and MAPS outputs when sampling locations are sparse, an aspect that remains under-explored in current research. We present a framework designed to systematically assess the impact of sample locations and sample size on migration contours in gene flow simulations that goes beyond the posterior probability map available in EEMS. We test our framework using publicly available genomic data collected from Cambodia and border regions of Thailand, Vietnam, and Laos during 2008-2013. The methodology leverages kernel density estimation and topological skeletons in conjunction with other spatial analysis methods to quantify the impact of sparse sample locations on gene flow simulations. Multiple sample resolutions were tested against a baseline resolution, and the findings highlight how migration contours vary with sampling resolution and how our approach can be applied to guide the production and mapping of reliable migration contours. Our research provides valuable insights about both the reliability and precision of model outputs when employing gene flow simulation techniques, e.g., EEMS and MAPS, to estimate malaria parasite migration.
The findings revealed that by employing our approach, we were able to maintain approximately 67% consistency between the contours and the reference dataset, even when utilizing only half of the sample locations. This knowledge will improve both the reliability and precision of these model outputs in future studies. 
  4. Translating information between the domains of systematics and conservation requires novel information management designs. Such designs should improve interactions across the trading zone between the domains, herein understood as the model according to which knowledge and uncertainty are productively translated in both directions (cf. Collins et al. 2019). Two commonly held attitudes stand in the way of designing a well-functioning systematics-to-conservation trading zone. On one side, there are calls to unify the knowledge signal produced by systematics, underpinned by the argument that such unification is a necessary precondition for conservation policy to be reliably expressed and enacted (e.g., Garnett et al. 2020). As a matter of legal scholarship, the argument for systematic unity by legislative necessity is principally false (Weiss 2003, MacNeil 2009, Chromá 2011), but perhaps effective enough as a strategy to win over audiences unsure about robust law-making practices in light of variable and uncertain knowledge. On the other side, there is an attitude that conservation cannot ever restrict the academic freedom of systematics as a scientific discipline (e.g., Raposo et al. 2017). This otherwise sound argument misses the mark in the context of designing a productive trading zone with conservation. The central interactional challenge is not whether the systematic knowledge can vary at a given time and/or evolve over time, but whether these signal dynamics are tractable in ways that actors can translate into robust maxims for conservation. Redesigning the trading zone should rest on the (historically validated) projection that systematics will continue to attract generations of inspired, productive researchers and broad-based societal support, frequently leading to protracted conflicts and dramatic shifts in how practitioners in the field organize and identify organismal lineages subject to conservation.
This confident outlook for systematics' future, in turn, should refocus the challenge of designing the trading zone as one of building better information services to model the concurrent conflicts and longer-term evolution of systematic knowledge. It would seem unreasonable to expect the International Union for Conservation of Nature (IUCN) Red List Index to develop better data science models for the dynamics of systematic knowledge (cf. Hoffmann et al. 2011) than are operational in the most reputable information systems designed and used by domain experts (Burgin et al. 2018). The reasonable challenge from conservation to systematics is not to stop being a science but to be a better data science. In this paper, we will review advances in biodiversity data science in relation to representing and reasoning over changes in systematic knowledge with computational logic, i.e., modeling systematic intelligence (Franz et al. 2016). We stress-test this approach with a use case where rapid systematic signal change and high stakes for conservation action intersect, i.e., the Malagasy mouse lemurs (Microcebus É. Geoffroy, 1834 sec. Schüßler et al. 2020), where the number of recognized species-level concepts has risen from 2 to 25 in the span of 38 years (1982–2020). As much as scientifically defensible, we extend our modeling approach to the level of individual published occurrence records, where the inability to do so sometimes reflects substandard practice but more importantly reveals systemic inadequacies in biodiversity data science or informational modeling.
In the absence of shared, sound theoretical foundations to assess taxonomic congruence or incongruence across treatments, and in the absence of biodiversity data platforms capable of propagating logic-enabled, scalable occurrence-to-concept identification events to produce alternative and succeeding distribution maps, there is no robust way to provide a knowledge signal from systematics to conservation that is both consistent in its syntax and accurate in its semantics, in the sense of accurately reflecting the variation and uncertainty that exists across multiple systematic perspectives. Translating this diagnosis into new designs for the trading zone is only one "half" of the solution, i.e., a technical advancement that then would need to be socially endorsed and incentivized by systematic and conservation communities motivated to elevate their collaborative interactions and trade robustly in inherently variable and uncertain information.
  5. With the increasing availability of GPS trajectory data, map construction algorithms have been developed that automatically construct road maps from this data. In order to assess the quality of such (constructed) road maps, the need for meaningful road map comparison algorithms becomes increasingly important. Indeed, different approaches for map comparison have been recently proposed; however, most of these approaches assume that the road maps are modeled as undirected embedded planar graphs. In this paper, we study map comparison algorithms for more realistic models of road maps: directed roads as well as weighted roads. In particular, we address two main questions: how close are the graphs to each other, and how close is the information presented by the graphs (i.e., traffic times, trajectories, and road type)? We propose new road network comparisons and give illustrative examples. Furthermore, our approaches do not only apply to road maps but can be used to compare other kinds of graphs as well. 