NSF PAR Search | NSF Public Access Repository


An Infectious Disease Spread Simulation to Control Data Bias

https://doi.org/10.1145/3678717.3691293

Kong, Ruochen; Anderson, Taylor; Heslop, David; Zufle, Andreas (October 2024, ACM)

The increased availability of datasets during the COVID-19 pandemic enabled machine-learning approaches for modeling and forecasting infectious diseases. However, such approaches are known to amplify the bias in the data they are trained on. Bias in input data such as clinical case data for COVID-19 is difficult to measure due to disparities in testing availability, reporting standards, and healthcare access among different populations and regions. Furthermore, how such biases propagate through the modeling pipeline to decision-making is poorly understood. Therefore, we present a system that leverages a highly detailed agent-based model (ABM) of infectious disease spread in a city to simulate the collection of biased clinical case data where the bias is known. Our system allows users to load a pre-selected region or select their own (using OpenStreetMap data for the environment and census data for the population), to specify population and infectious disease parameters, and to set the degree(s) to which different populations are overrepresented or underrepresented in the case data. In addition to the system, we provide a large number of benchmark datasets containing case data at different levels of bias for different regions. We hope that infectious disease modelers will use these datasets to investigate how robust their models are to data bias or whether their models overfit to biased data.
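The core idea of the system, case data whose bias is known because it is sampled from a simulated ground truth, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `(agent_id, group)` tuples, the `base_rate` parameter, and the per-group `group_multipliers` (above 1 overrepresents a group, below 1 underrepresents it) are hypothetical stand-ins for the population and bias settings described in the abstract.

```python
import random
from collections import Counter


def sample_reported_cases(infected_agents, base_rate, group_multipliers, seed=None):
    """Return the subset of infected agents that appear in the (biased) case data.

    Hypothetical inputs:
      infected_agents   -- list of (agent_id, group) tuples from the simulation
      base_rate         -- reporting probability for a group with multiplier 1.0
      group_multipliers -- per-group factor; >1 overrepresents, <1 underrepresents
    """
    rng = random.Random(seed)
    reported = []
    for agent_id, group in infected_agents:
        # Per-group reporting probability, clamped to the valid range [0, 1].
        p = min(1.0, max(0.0, base_rate * group_multipliers.get(group, 1.0)))
        if rng.random() < p:
            reported.append((agent_id, group))
    return reported


def known_bias(infected_agents, reported):
    """Ratio of each group's share of reported cases to its share of true infections."""
    true_counts = Counter(g for _, g in infected_agents)
    rep_counts = Counter(g for _, g in reported)
    total_true = sum(true_counts.values())
    total_rep = sum(rep_counts.values()) or 1  # avoid division by zero
    return {
        g: (rep_counts[g] / total_rep) / (true_counts[g] / total_true)
        for g in true_counts
    }


infected = [(i, "A") for i in range(500)] + [(i, "B") for i in range(500, 1000)]
reported = sample_reported_cases(infected, 0.4, {"A": 1.5, "B": 0.5}, seed=1)
print(known_bias(infected, reported))  # group A above 1, group B below 1
```

Because every infection is known in the simulation, `known_bias` can compare each group's share of reported cases with its share of true infections; a ratio above 1 marks an overrepresented group and a ratio below 1 an underrepresented one.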
Full Text Available
Using Generative Adversarial Networks to Assist Synthetic Population Creation for Simulations

https://doi.org/10.23919/ANNSIM55834.2022.9859422

Kotnana, Srihan; Han, David; Anderson, Taylor; Zufle, Andreas; Kavak, Hamdi (July 2022, 2022 Annual Modeling and Simulation Conference (ANNSIM))

Full Text Available
PhyloView: A System to Visualize the Ecology of Infectious Diseases Using Phylogenetic Data

https://doi.org/10.1109/MDM55031.2022.00051

Le, Minh Tri; Attaway, David; Anderson, Taylor; Kavak, Hamdi; Roess, Amira; Zufle, Andreas (June 2022, 2022 23rd IEEE International Conference on Mobile Data Management (MDM))

Full Text Available
Traffic Flow Estimation using Probe Vehicle Data

https://doi.org/10.1109/DSAA49011.2020.00073

Gkountouna, Olga; Pfoser, Dieter; Zufle, Andreas (October 2020, 2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA))
Full Text Available
Vehicle Relocation for Ride-Hailing

https://doi.org/10.1109/DSAA49011.2020.00074

Kim, Joon-Seok; Pfoser, Dieter; Zufle, Andreas (October 2020, 2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA))
Full Text Available
Semantically Diverse Path Search

https://doi.org/10.1109/MDM48529.2020.00028

Teng, Xu; Trajcevski, Goce; Kim, Joon-Seok; Zufle, Andreas (June 2020, 21st IEEE International Conference on Mobile Data Management, MDM 2020, Versailles, France, June 30 - July 3, 2020)
Location-Based Services are often used to find proximal Points of Interest (PoIs), e.g., nearby restaurants, museums, police stations, and hospitals, in a plethora of applications. An important, recently addressed variant of the problem considers not only distance/proximity but also the semantic diversity of the locations in the answer-set. For instance, rather than picking several close-by attractions with similar features (e.g., restaurants with similar menus or museums with similar art exhibitions), a tourist may be more interested in a result set that offers more diverse types of experiences, as long as they are within an acceptable distance from a given (current) location. Towards that goal, in this work we propose a novel approach to efficiently retrieve a path along a given road network that maximizes the semantic diversity of the visited PoIs within distance limits. We introduce a novel indexing structure, the Diversity Aggregated R-tree, on which we base efficient algorithms that generate the answer-set (i.e., the recommended locations among a set of given PoIs) using a greedy search strategy. Our experimental evaluations on real datasets demonstrate the benefits of the proposed methodology over baseline approaches.
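The Python sketch below illustrates the general shape of greedy, diversity-aware path construction under a distance budget. It is a simplification built on stated assumptions, not the paper's algorithm: it does not use the Diversity Aggregated R-tree, it measures straight-line rather than road-network distance, and it models semantics as a hypothetical set of `tags` per PoI, scoring each candidate by the number of new tags gained per unit of extra travel.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PoI:
    name: str
    x: float
    y: float
    tags: frozenset  # hypothetical semantic labels, e.g. {"food", "art"}


def greedy_diverse_path(start, pois, max_length):
    """Greedily extend a path to cover as many distinct tags as possible.

    Assumption: Euclidean distance stands in for road-network distance.
    Returns (path, covered_tags, length_used).
    """
    path, covered = [], set()
    pos, used = start, 0.0
    remaining = list(pois)
    while remaining:
        best, best_score, best_d = None, 0.0, 0.0
        for p in remaining:
            gain = len(p.tags - covered)
            d = math.dist(pos, (p.x, p.y))
            # Skip PoIs that add nothing new or would exceed the distance budget.
            if gain == 0 or used + d > max_length:
                continue
            # New tags per unit of extra travel; epsilon avoids division by zero.
            score = gain / (d + 1e-9)
            if score > best_score:
                best, best_score, best_d = p, score, d
        if best is None:
            break
        path.append(best)
        covered |= best.tags
        used += best_d
        pos = (best.x, best.y)
        remaining.remove(best)
    return path, covered, used
```

For example, with two nearby restaurants and one museum within budget, the sketch visits one restaurant and the museum and skips the second restaurant, since it adds no new semantic category.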
Full Text Available
Managing Uncertainty in Evolving Geo-Spatial Data

https://doi.org/10.1109/MDM48529.2020.00021

Zufle, Andreas; Trajcevski, Goce; Pfoser, Dieter; Kim, Joon-Seok (June 2020, 21st IEEE International Conference on Mobile Data Management (MDM))

Our ability to extract knowledge from evolving spatial phenomena and make it actionable is often impaired by unreliable, erroneous, obsolete, imprecise, sparse, and noisy data. Integrating the impact of this uncertainty is paramount when estimating the reliability/confidence of any time-varying query result derived from the underlying input data. The goal of this advanced seminar is to survey solutions for managing, querying, and mining uncertain spatial and spatio-temporal data. We survey different models and show examples of how to efficiently enrich query results with reliability information. We discuss both analytical solutions and approximate solutions based on geosimulation.
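An approximate, simulation-based reliability estimate can be illustrated with a small Monte Carlo sketch in Python. The uncertainty model is an assumption chosen only for illustration: an object's true position is treated as an isotropic 2-D Gaussian around its last observed location, with a standard deviation that grows linearly with the age of the observation (hypothetical parameters `base_sigma`, `growth_rate`, and `elapsed`). The function estimates the probability that the object lies inside a rectangular query window.

```python
import random


def prob_in_rect(mean, rect, base_sigma, growth_rate=0.0, elapsed=0.0,
                 n=10_000, seed=None):
    """Monte Carlo estimate of P(object inside rect) under an assumed Gaussian model.

    mean        -- (x, y) last observed location
    rect        -- (xmin, ymin, xmax, ymax) query window
    base_sigma  -- positional std. dev. at observation time (hypothetical)
    growth_rate -- assumed linear growth of std. dev. per unit of elapsed time
    elapsed     -- time since the observation was made
    """
    rng = random.Random(seed)
    sigma = base_sigma + growth_rate * elapsed  # older data -> wider spread
    xmin, ymin, xmax, ymax = rect
    hits = 0
    for _ in range(n):
        # Draw one possible world: a plausible true position of the object.
        x = rng.gauss(mean[0], sigma)
        y = rng.gauss(mean[1], sigma)
        if xmin <= x <= xmax and ymin <= y <= ymax:
            hits += 1
    return hits / n


fresh = prob_in_rect((0, 0), (-1, -1, 1, 1), 1.0, n=20_000, seed=0)
stale = prob_in_rect((0, 0), (-1, -1, 1, 1), 1.0, growth_rate=1.0, elapsed=2.0,
                     n=20_000, seed=0)
print(fresh, stale)  # the stale observation yields a lower probability
```

Returning this probability alongside each result, instead of a bare yes/no answer, is one way to enrich a query result with reliability information.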
Full Text Available
Location-Based Social Network Data Generation Based on Patterns of Life

https://doi.org/10.1109/MDM48529.2020.00038

Kim, Joon-Seok; Jin, Hyunjee; Kavak, Hamdi; Rouly, Ovi Chris; Crooks, Andrew; Pfoser, Dieter; Wenk, Carola; Zufle, Andreas (June 2020, IEEE International Conference on Mobile Data Management (MDM’20))

Location-based social networks (LBSNs) have been studied extensively in recent years. However, real-world LBSN data sets have several weaknesses: they are sparse and small, raise privacy concerns, and lack authoritative ground truth. To overcome these weaknesses, we leverage a large-scale LBSN simulation to create a framework that simulates human behavior and generates synthetic but realistic LBSN data based on human patterns of life. Such data captures not only the location of users over time but also their interactions via social networks. Patterns of life are simulated by giving agents (i.e., people) an array of “needs” that they aim to satisfy, e.g., agents go home when they are tired, to restaurants when they are hungry, to work to cover their financial needs, and to recreational sites to meet friends and satisfy their social needs. While existing real-world LBSN data sets are very small, the proposed framework provides a source of massive LBSN benchmark data that closely mimics the real world. As such, it allows us to capture 100% of the (simulated) population without any data uncertainty, privacy-related concerns, or incompleteness. It allows researchers to see the (simulated) world through the lens of an omniscient entity with perfect data. Our framework is made available to the community. In addition, we provide a series of simulated benchmark LBSN data sets using different synthetic towns and real-world urban environments obtained from OpenStreetMap. The simulation software and data sets, which comprise gigabytes of spatio-temporal and temporal social network data, are made available to the research community.
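A needs-driven decision rule of the kind described above can be sketched in a few lines of Python. The need names, the need-to-place mapping, the linear decay rates, and the rule that an agent always visits the place for its most urgent need (which then resets to fully satisfied) are all hypothetical simplifications, not the framework's actual behavior model.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from each need to the kind of place that satisfies it.
PLACE_FOR_NEED = {
    "rest": "home",
    "food": "restaurant",
    "money": "work",
    "social": "recreational site",
}


@dataclass
class Agent:
    name: str
    # Satisfaction per need: 1.0 = fully satisfied, 0.0 = most urgent.
    needs: dict = field(default_factory=lambda: {n: 1.0 for n in PLACE_FOR_NEED})
    location: str = "home"

    def step(self, decay: dict) -> str:
        """Advance one tick: decay all needs, then go satisfy the most urgent one."""
        for need, rate in decay.items():
            self.needs[need] = max(0.0, self.needs[need] - rate)
        urgent = min(self.needs, key=self.needs.get)
        self.location = PLACE_FOR_NEED[urgent]
        self.needs[urgent] = 1.0  # assumption: one visit fully satisfies the need
        return self.location


alice = Agent("alice")
decay = {"rest": 0.05, "food": 0.3, "money": 0.1, "social": 0.2}
print([alice.step(decay) for _ in range(3)])
```

Logging each agent's `location` at every tick, together with the agents it meets there, is what would turn such a loop into check-in-style LBSN data.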
Full Text Available
Handling Uncertainty in Geo-Spatial Data

https://doi.org/10.1109/ICDE.2017.212

Zufle, Andreas; Trajcevski, Goce; Pfoser, Dieter; Renz, Matthias; Rice, Matthew T.; Leslie, Timothy; Delamater, Paul; Emrich, Tobias (April 2017, 33rd IEEE International Conference on Data Engineering (ICDE))

An inherent challenge arising in any dataset containing spatial and/or temporal information is uncertainty due to various sources of imprecision. Integrating the impact of this uncertainty is paramount when estimating the reliability (confidence) of any query result from the underlying input data. To deal with uncertainty, solutions have been proposed independently in the geo-science and data-science research communities. This interdisciplinary tutorial bridges the gap between the two communities by providing a comprehensive overview of the different challenges involved in dealing with uncertain geo-spatial data, by surveying solutions from both research communities, and by identifying similarities, synergies, and open research problems.
Full Text Available
