

Title: Mover: Generalizability Verification of Human Mobility Models via Heterogeneous Use Cases

Human mobility models typically produce mobility data that capture human mobility patterns, individually or collectively, based on real-world observations or assumptions. These data are essential for many use cases in research and practice, e.g., mobile networking, autonomous driving, urban planning, and epidemic control. However, most existing mobility models suffer from practical issues such as unknown accuracy and uncertain parameters in new use cases because they are normally designed and verified on a particular use case (e.g., mobile phones, taxis, or mobile payments). This creates significant challenges for researchers trying to select a representative human mobility model with appropriate parameters for a new use case. In this paper, we introduce a MObility VERification framework called MOVER to systematically measure the performance of a set of representative mobility models, including both theoretical and empirical models, on a diverse set of use cases with various measures. Based on a taxonomy built upon spatial granularity and temporal continuity, we selected four representative mobility use cases (i.e., a vehicle tracking system, a camera-based system, a mobile payment system, and a cellular network system) to verify the generalizability of state-of-the-art human mobility models. MOVER methodically characterizes the accuracy of five different mobility models in these four use cases based on a comprehensive set of mobility measures and provides two key lessons learned: (i) for collective-level measures, the finer the spatial granularity of the use cases, the better the theoretical models generalize; (ii) for individual-level measures, the lower the periodic temporal continuity of the use cases, the better the theoretical models generalize relative to the empirical models. The verification results can help the research community select appropriate mobility models and parameters for different use cases.
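To make the measurement step concrete, below is a minimal sketch of one possible collective-level measure in the spirit of MOVER: pooling jump lengths across users and scoring a model's synthetic traces against real traces with a two-sample Kolmogorov–Smirnov statistic. The function names and the choice of measure are illustrative assumptions, not the paper's exact implementation.

```python
# Hypothetical sketch of one collective-level mobility measure: compare the
# jump-length distribution of a model's synthetic traces against real traces.
# Names and the specific measure are illustrative, not from the paper.
import numpy as np
from scipy.stats import ks_2samp

def jump_lengths(trajectory):
    """Euclidean displacement between consecutive location fixes.

    trajectory: array of shape (n, 2) holding (x, y) positions in meters.
    """
    steps = np.diff(trajectory, axis=0)
    return np.linalg.norm(steps, axis=1)

def collective_jump_measure(real_trajs, model_trajs):
    """Pool jump lengths over all users and score model fidelity with a
    two-sample Kolmogorov-Smirnov statistic (lower = closer match)."""
    real = np.concatenate([jump_lengths(t) for t in real_trajs])
    synth = np.concatenate([jump_lengths(t) for t in model_trajs])
    stat, _ = ks_2samp(real, synth)
    return stat
```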

Award ID(s):
1849238 1932223 1952096 1951890 2047822 2003874
NSF-PAR ID:
10484118
Publisher / Repository:
Association for Computing Machinery (ACM)
Journal Name:
Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies
Volume:
5
Issue:
4
ISSN:
2474-9567
Page Range / eLocation ID:
1 to 21
Sponsoring Org:
National Science Foundation
More Like this
1. Data from cellular networks have proven to be one of the most promising ways to understand large-scale human mobility for various ubiquitous computing applications, thanks to the high penetration of cellphones and low collection cost. However, existing mobility models driven by cellular network data suffer from sparse spatio-temporal observations because user locations are recorded only during cellphone activities, e.g., calls, texts, or internet access. In this paper, we design a human mobility recovery system called CellSense that takes sparse cellular billing records (CBR) as input and outputs dense, continuous records, closing the sensing gap that arises when cellular networks are used as sensing systems for human mobility. There is limited work on this kind of recovery system at large scale because, even though it is straightforward to design a recovery system based on regression models, it is very challenging to evaluate such models at scale due to the lack of ground truth data. In this paper, we explore a new opportunity created by the upgrade of cellular infrastructure: cellular network signaling data can serve as ground truth, since they log the interaction between cellphones and cellular towers at the signal level (e.g., attaching, detaching, paging) even without billable activities. Based on the signaling data, we design CellSense to recover human mobility by integrating collective mobility patterns with individual mobility modeling, which achieves a 35.3% improvement over state-of-the-art models. The key application of our recovery model is to take the regular sparse CBR data that a researcher already has and recover the data missing due to CBR sensing gaps, producing dense cellular data on which to train a machine learning model for downstream use cases, e.g., next-location prediction.
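As a rough illustration of the recovery task (not CellSense's actual algorithm, which combines collective patterns with individual models), a naive baseline simply time-interpolates tower coordinates across the sensing gaps and scores the result against dense signaling-data ground truth:

```python
# A minimal recovery baseline for the CellSense problem setup: fill sensing
# gaps in sparse billing records by time-interpolating tower coordinates,
# then score against dense signaling-data ground truth. Linear interpolation
# is only an illustrative stand-in for the paper's model.
import numpy as np

def recover_dense(times_sparse, locs_sparse, times_dense):
    """Interpolate (x, y) coordinates at every ground-truth timestamp.

    times_sparse must be sorted ascending; locs_sparse has shape (n, 2).
    """
    xs = np.interp(times_dense, times_sparse, locs_sparse[:, 0])
    ys = np.interp(times_dense, times_sparse, locs_sparse[:, 1])
    return np.column_stack([xs, ys])

def mean_recovery_error(recovered, ground_truth):
    """Mean Euclidean distance (meters) between recovered and true points."""
    return float(np.mean(np.linalg.norm(recovered - ground_truth, axis=1)))
```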
2. It takes great effort to manually or semi-automatically convert free-text phenotype narratives (e.g., morphological descriptions in taxonomic works) to a computable format before they can be used in large-scale analyses. We argue that neither a manual curation approach nor an information extraction approach based on machine learning is a sustainable solution for producing computable phenotypic data that are FAIR (Findable, Accessible, Interoperable, Reusable) (Wilkinson et al. 2016). These approaches do not scale to all of biodiversity, and they do not stop the publication of free-text phenotypes that would need post-publication curation. In addition, both approaches face great challenges: inter-curator variation (curators interpret/convert a phenotype differently from each other) in manual curation, and keyword-to-ontology-concept translation in automated information extraction, make it difficult for either approach to produce data that are truly FAIR. Our empirical studies show that inter-curator variation in translating phenotype characters to Entity-Quality statements (Mabee et al. 2007) is as high as 40% even within a single project. With this level of variation, curated data integrated from multiple curation projects may still not be FAIR. The key causes of this variation are semantic vagueness in the original phenotype descriptions and difficulties in using standardized vocabularies (ontologies). We argue that the authors describing characters are the key to the solution: given the right tools and appropriate attribution, the authors should be in charge of developing a project's semantics and ontology. This will speed up ontology development and improve the semantic clarity of descriptions from the moment of publication. In this presentation, we introduce the Platform for Author-Driven Computable Data and Ontology Production for Taxonomists, which consists of three components: a web-based, ontology-aware software application called 'Character Recorder,' which features a spreadsheet as the data entry platform and gives authors the flexibility of using their preferred terminology when recording characters for a set of specimens (the application also facilitates semantic clarity and consistency across species descriptions); a set of services that produces RDF graph data, collects terms added by authors, detects potential conflicts between terms, dispatches conflicts to the third component, and updates the ontology with resolutions; and an Android mobile application, 'Conflict Resolver,' which displays ontological conflicts and accepts solutions proposed by multiple experts. Fig. 1 shows the system diagram of the platform.
The presentation will consist of: a report on the findings from a recent survey of 90+ participants on the need for a tool like Character Recorder; a methods section that describes how we provide semantics to an existing vocabulary of quantitative characters through a set of properties that explain where and how a measurement (e.g., length of perigynium beak) is taken, and how a custom color palette of RGB values obtained from real specimens or high-quality specimen images can be used to help authors choose standardized color descriptions for plant specimens; and a software demonstration showing how Character Recorder and Conflict Resolver work together to construct both human-readable descriptions and RDF graphs using morphological data derived from species in the plant genus Carex (sedges). The key difference between this system and other ontology-aware systems is that authors can directly add needed terms to the ontology as they wish and can update their data according to ontology updates. The software modules currently incorporated in Character Recorder and Conflict Resolver have undergone formal usability studies. We are actively recruiting Carex experts to participate in a 3-day usability study of the entire platform, in which participants will use it to record 100 characters about one Carex species. In addition to usability data, we will collect the terms that participants submit to the underlying ontology and the data related to conflict resolution. Such data allow us to examine the types and quantities of logical conflicts that may result from user-added terms and to use Discrete Event Simulation models to understand if and how term additions and conflict resolutions converge. We look forward to a discussion on how the tools described in our presentation (Character Recorder is online at http://shark.sbs.arizona.edu/chrecorder/public) can contribute to producing and publishing FAIR data in taxonomic studies.
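For readers unfamiliar with RDF graph production, the sketch below shows roughly what such services must emit for a single spreadsheet record, using the rdflib library. The namespace and term IRIs are placeholders, not the project's actual ontology.

```python
# Illustrative sketch of turning one spreadsheet character record into an
# RDF Entity-Quality statement. All IRIs below are invented placeholders.
from rdflib import Graph, Namespace, Literal, RDF

EX = Namespace("http://example.org/carex-ontology#")

g = Graph()
g.bind("ex", EX)

statement = EX.statement_001
g.add((statement, RDF.type, EX.CharacterStatement))
g.add((statement, EX.entity, EX.PerigyniumBeak))   # Entity
g.add((statement, EX.quality, EX.Length))          # Quality
g.add((statement, EX.value, Literal("4.2")))
g.add((statement, EX.unit, Literal("mm")))

print(g.serialize(format="turtle"))
```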
3. In recent years, extreme shocks such as natural disasters have been increasing in both frequency and intensity, causing significant economic loss to many cities around the world. Quantifying the economic cost to local businesses after extreme shocks is important for post-disaster assessment and pre-disaster planning. Conventionally, surveys have been the primary source of data used to quantify damages inflicted on businesses by disasters. However, surveys often suffer from high costs, long implementation times, spatio-temporal sparsity in observations, and limited scalability. Recently, large-scale human mobility data (e.g., mobile phone GPS) have been used to observe and analyze human mobility patterns at unprecedented spatio-temporal granularity and scale. In this work, we use location data collected from mobile phones to estimate and analyze the causal impact of hurricanes on business performance. To quantify the causal impact of the disaster, we use a Bayesian structural time series model to predict the counterfactual performance of affected businesses (what if the disaster had not occurred?), which may use the performance of other businesses outside the disaster areas as covariates. The method is tested by quantifying the resilience of 635 businesses across 9 categories in Puerto Rico after Hurricane Maria. Furthermore, hierarchical Bayesian models are used to reveal the effect of business characteristics such as location and category on the long-term resilience of businesses. The study presents a novel and more efficient method to quantify business resilience, which could assist policy makers in disaster preparation and relief processes.
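A simplified sketch of the counterfactual step is shown below. The paper uses a Bayesian structural time series model; the ordinary least-squares projection here is only a stand-in that illustrates the "what if the disaster had not occurred?" logic of predicting an affected business from unaffected controls.

```python
# Conceptual sketch of counterfactual estimation: fit the pre-disaster
# relationship between an affected business and control businesses outside
# the disaster area, project it forward, and read the causal impact as the
# gap between observed and predicted performance. Plain least squares is a
# simplified stand-in for the paper's Bayesian structural time series model.
import numpy as np

def causal_impact(affected, controls, t_event):
    """affected: (T,) visit-count series; controls: (T, k) covariate series;
    t_event: index of the disaster. Returns the pointwise post-event effect."""
    X_pre = np.column_stack([np.ones(t_event), controls[:t_event]])
    beta, *_ = np.linalg.lstsq(X_pre, affected[:t_event], rcond=None)
    X_post = np.column_stack([np.ones(len(controls) - t_event),
                              controls[t_event:]])
    counterfactual = X_post @ beta      # predicted "no disaster" outcome
    return affected[t_event:] - counterfactual
```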
4. In addition to measuring forecast accuracy in terms of errors in a tropical system's forecast track and other meteorological characteristics, it is important to measure the impact of those errors on society. With this in mind, the authors designed a coupled natural–human modeling framework with high-level representations of the natural hazard (hurricane), the human system (information flow, evacuation decisions), the built environment (road infrastructure), and the connections between these elements (forecast and warning information, traffic). Using the model, this article begins exploring how tropical cyclone forecast errors impact evacuations and, in doing so, builds toward the development of new verification approaches. Specifically, the authors implement track errors representative of 2007 and 2022, create situations with unexpected rapid intensification and/or rapid onset, and evaluate their impact on evacuations across real and hypothetical forecast scenarios (e.g., Hurricane Irma, Hurricane Dorian making landfall across east Florida). The results provide first-order evidence that 1) reduced forecast track errors across the 2007–22 period translate to improvements in evacuation outcomes across these cases, and 2) unexpected rapid intensification and/or rapid onset scenarios can reduce evacuation rates and increase traffic across the most impacted areas. In exploring these relationships, the results demonstrate how experiments with coupled natural–human models can offer a societally relevant complement to traditional metrics of forecast accuracy. In doing so, this work points toward further development of natural–human models and associated methodologies to address these types of questions and improve forecast verification across the weather enterprise.
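To illustrate the kind of experiment such a coupled model enables, here is a deliberately toy sketch in which evacuation decisions depend on forecast track error and lead time. Every parameter is invented for illustration; the paper's framework (information flow, road network, traffic) is far richer.

```python
# Toy coupled-model sketch: agents evacuate when the forecast track places
# them inside a warning radius, so smaller track errors and longer lead
# times raise the evacuation rate among truly at-risk agents. All numbers
# below are invented assumptions, not values from the paper.
import numpy as np

rng = np.random.default_rng(0)

def evacuation_rate(true_landfall_km, track_error_km, lead_hours,
                    n_agents=10_000, warn_radius_km=150.0):
    homes = rng.uniform(0, 600, n_agents)         # positions along a coastline
    forecast = true_landfall_km + rng.normal(0, track_error_km)
    warned = np.abs(homes - forecast) < warn_radius_km
    p_comply = min(1.0, lead_hours / 72.0)        # more lead time, more compliance
    evacuated = warned & (rng.random(n_agents) < p_comply)
    truly_at_risk = np.abs(homes - true_landfall_km) < warn_radius_km
    return evacuated[truly_at_risk].mean()        # share of at-risk agents who left

# Smaller track error (2022-like) vs. larger track error (2007-like):
print(evacuation_rate(300, 60, 48), evacuation_rate(300, 150, 48))
```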
5. Skateboarding as a method of transportation has become prevalent, which has increased the occurrence and likelihood of pedestrian–skateboarder collisions and near-collision scenarios in shared-use roadway areas. Collisions between pedestrians and skateboarders can result in significant injury. New approaches are needed to evaluate shared-use areas prone to hazardous pedestrian–skateboarder interactions and to perform real-time, in situ (e.g., on-device) predictions of pedestrian–skateboarder collisions as road conditions vary due to changes in land usage and construction. Surrogate Safety Measures for skateboarder–pedestrian interaction can be computed to evaluate high-risk conditions on roads and sidewalks using deep learning object detection models. In this paper, we present the first skateboarder–pedestrian safety study leveraging deep learning architectures. We review and analyze state-of-the-art deep learning architectures, namely Faster R-CNN and two variants of the Single Shot MultiBox Detector (SSD), to select the model that best suits each of two different tasks: automated calculation of Post Encroachment Time (PET) and finding hazardous conflict zones in real time. We also contribute a new annotated data set of skateboarder–pedestrian interactions collected for this study. Both selected models can detect and classify pedestrians and skateboarders correctly and efficiently. However, given the differences in their architectures and the advantages and disadvantages of each model, the two models were used for different sets of tasks: owing to its higher accuracy, the Faster R-CNN model was used to automate the calculation of Post Encroachment Time, whereas, owing to its extremely fast inference rate, the SSD MobileNet V1 model was used to determine hazardous regions in real time. An outcome of this work is a model that can be deployed on low-cost, small-footprint mobile and IoT devices at traffic intersections with existing cameras to perform on-device inferencing for in situ Surrogate Safety Measures (SSM), such as Time-To-Collision (TTC) and Post Encroachment Time (PET). SSM values that exceed a hazard threshold can be published to a Message Queuing Telemetry Transport (MQTT) broker, where messages are received by an intersection traffic signal controller for real-time signal adjustment, thus contributing to state-of-the-art vehicle and pedestrian safety at hazard-prone intersections.
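The sketch below illustrates the post-detection step: computing PET from timestamped conflict-zone crossings and publishing a hazard message to an MQTT broker via the paho-mqtt package. The topic name and threshold value are assumptions for illustration, and the detector (Faster R-CNN / SSD) is assumed to have already produced the crossing timestamps.

```python
# Hypothetical sketch: compute Post Encroachment Time from conflict-zone
# crossing timestamps and publish hazards over MQTT. Broker, topic, and
# threshold are illustrative placeholders.
import json
from paho.mqtt import publish  # paho-mqtt package, assumed available

def post_encroachment_time(t_first_exit, t_second_enter):
    """PET = time gap between the first road user leaving a conflict zone
    and the second entering it; a small PET signals a near-collision."""
    return t_second_enter - t_first_exit

def publish_if_hazard(pet_seconds, zone_id, threshold=1.5,
                      broker="localhost", topic="intersection/ssm"):
    # Publish only when PET falls below the hazard threshold, so a traffic
    # signal controller subscribed to the topic can adjust in real time.
    if pet_seconds < threshold:
        publish.single(topic,
                       json.dumps({"zone": zone_id, "pet_s": pet_seconds}),
                       hostname=broker)
```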