skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Search for: All records

Creators/Authors contains: "Zufle, Andreas"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. The increased availability of datasets during the COVID-19 pandemic enabled machine-learning approaches for modeling and forecasting infectious diseases. However, such approaches are known to amplify the bias in the data they are trained on. Bias in such input data like clinical case data for COVID-19 is difficult to measure due to disparities in testing availability, reporting standards, and healthcare access among different populations and regions. Furthermore, the way such biases may propagate through the modeling pipeline to decision-making is relatively unknown. Therefore, we present a system that leverages a highly detailed agent-based model (ABM) of infectious disease spread in a city to simulate the collection of biased clinical case data where the bias is known. Our system allows users to load either a pre-selected region or select their own (using OpenStreetMap data for the environment and census data for the population), specify population and infectious disease parameters, and the degree(s) to which different populations will be overrepresented or underrepresented in the case data. In addition to the system, we provide a large number of benchmark datasets that produce case data at different levels of bias for different regions. Wehope that infectious disease modelers will use these datasets to investigate how well their models are robust to data bias or whether their model is overfit to biased data. 
    more » « less
  2. null (Ed.)
  3. null (Ed.)
  4. null (Ed.)
    Location-Based Services are often used to find proximal Points of Interest PoI - e.g., nearby restaurants and museums, police stations, hospitals, etc. - in a plethora of applications. An important recently addressed variant of the problem not only considers the distance/proximity aspect, but also desires semantically diverse locations in the answer-set. For instance, rather than picking several close-by attractions with similar features - e.g., restaurants with similar menus; museums with similar art exhibitions - a tourist may be more interested in a result set that could potentially provide more diverse types of experiences, for as long as they are within an acceptable distance from a given (current) location. Towards that goal, in this work we propose a novel approach to efficiently retrieve a path that will maximize the semantic diversity of the visited PoIs that are within distance limits along a given road network. We introduce a novel indexing structure - the Diversity Aggregated R-tree, based on which we devise efficient algorithms to generate the answer-set - i.e., the recommended locations among a set of given PoIs - relying on a greedy search strategy. Our experimental evaluations conducted on real datasets demonstrate the benefits of proposed methodology over the baseline alternative approaches. 
    more » « less
  5. Our ability to extract knowledge from evolving spatial phenomena and make it actionable is often impaired by unreliable, erroneous, obsolete, imprecise, sparse, and noisy data. Integrating the impact of this uncertainty is a paramount when estimating the reliability/confidence of any time-varying query result from the underlying input data. The goal of this advanced seminar is to survey solutions for managing, querying and mining uncertain spatial and spatio-temporal data. We survey different models and show examples of how to efficiently enrich query results with reliability information. We discuss both analytical solutions as well as approximate solutions based on geosimulation. 
    more » « less
  6. Location-based social networks (LBSNs) have been studied extensively in recent years. However, utilizing real-world LBSN data sets yields several weaknesses: sparse and small data sets, privacy concerns, and a lack of authoritative ground-truth. To overcome these weaknesses, we leverage a large-scale LBSN simulation to create a framework to simulate human behavior and to create synthetic but realistic LBSN data based on human patterns of life. Such data not only captures the location of users over time but also their interactions via social networks. Patterns of life are simulated by giving agents (i.e., people) an array of “needs” that they aim to satisfy, e.g., agents go home when they are tired, to restaurants when they are hungry, to work to cover their financial needs, and to recreational sites to meet friends and satisfy their social needs. While existing real-world LBSN data sets are trivially small, the proposed framework provides a source for massive LBSN benchmark data that closely mimics the real-world. As such, it allows us to capture 100% of the (simulated) population without any data uncertainty, privacy-related concerns, or incompleteness. It allows researchers to see the (simulated) world through the lens of an omniscient entity having perfect data. Our framework is made available to the community. In addition, we provide a series of simulated benchmark LBSN data sets using different synthetic towns and real-world urban environments obtained from OpenStreetMap. The simulation software and data sets, which comprise gigabytes of spatio-temporal and temporal social network data, are made available to the research community. 
    more » « less
  7. An inherent challenge arising in any dataset containing information of space and/or time is uncertainty due to various sources of imprecision. Integrating the impact of the uncertainty is a paramount when estimating the reliability (confidence) of any query result from the underlying input data. To deal with uncertainty, solutions have been proposed independently in the geo-science and the data-science research community. This interdisciplinary tutorial bridges the gap between the two communities by providing a comprehensive overview of the different challenges involved in dealing with uncertain geo-spatial data, by surveying solutions from both research communities, and by identifying similarities, synergies and open research problems. 
    more » « less