Title: A Hierarchical Approach for Geocoding Birthplaces in Temporally Continuous Crowd-Sourced Family Tree Data
Geocoding is a fundamental yet complex step in temporal studies due to constantly evolving administrative borders and place names, and the uncertainty of geographic and temporal information. For example, identifying locations within crowd-sourced datasets, such as family trees, is complex because recorded place names may be uncertain or inaccurate, may vary in spelling (for instance, full names or abbreviations), and may appear in inconsistent formats, such as mentioning only the country or state, or a combination of city, county, state, and country. Moreover, place names and administrative boundaries change drastically over time, adding another layer of complexity to the geocoding of fine-scale places. This paper presents a workflow for geocoding birthplaces of US-born individuals from crowd-sourced genealogical files spanning 1789 to 1940. We introduce a method that geocodes birth locations at the finest possible level by matching places with the corresponding historical administrative boundaries within a range of individuals' birth years. Our preliminary study analyzing 72,335 trees with over 250 million individual records shows the potential of our approach for use in complex crowd-generated spatio-temporal datasets.
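The hierarchical, time-aware matching described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the gazetteer entries, coordinates, abbreviation table, the `Boundary` type, the `geocode` function, and the `tolerance` parameter are all hypothetical and chosen only for illustration. The sketch assumes a place string is written from finest to coarsest unit, separated by commas, and returns the finest unit whose validity period covers the birth year (plus or minus a tolerance).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Boundary:
    name: str    # normalized place name
    level: str   # "county", "state", or "country"
    parent: str  # name of the containing unit ("" for a country)
    start: int   # first year the unit is treated as valid
    end: int     # last year the unit is treated as valid
    lat: float   # representative point (toy value)
    lon: float


# Hypothetical toy gazetteer; a real workflow would load historical boundary data.
GAZETTEER = [
    Boundary("franklin", "county", "ohio", 1803, 1940, 39.97, -83.00),
    Boundary("ohio", "state", "united states", 1803, 1940, 40.40, -82.80),
    Boundary("united states", "country", "", 1789, 1940, 39.80, -98.60),
]

# Hypothetical abbreviation table for varying spellings.
ABBREVIATIONS = {"oh": "ohio", "us": "united states", "usa": "united states"}


def normalize(token: str) -> str:
    """Lowercase, trim, drop a trailing 'county', and expand abbreviations."""
    key = token.strip().lower()
    if key.endswith(" county"):
        key = key[: -len(" county")]
    return ABBREVIATIONS.get(key, key)


def geocode(place: str, birth_year: int, tolerance: int = 5) -> Optional[Boundary]:
    """Return the finest matching unit valid around birth_year, or None."""
    tokens = [normalize(t) for t in place.split(",") if t.strip()]
    for i, token in enumerate(tokens):  # finest unit first
        coarser = tokens[i + 1:]
        for b in GAZETTEER:
            if b.name != token:
                continue
            if not (b.start - tolerance <= birth_year <= b.end + tolerance):
                continue  # unit not valid near the birth year
            if coarser and b.parent and b.parent != coarser[0]:
                continue  # recorded parent contradicts the gazetteer
            return b
    return None  # nothing matched at any level
```

For example, `geocode("Franklin County, OH, USA", 1850)` resolves to the county entry, while the same place with a birth year of 1790 falls back to the country entry because neither the toy county nor the toy state is valid that early.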
Award ID(s):
2215568
PAR ID:
10514703
Publisher / Repository:
CAGIS+UCGIS Symposium 2024
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Accurate citywide crowd activity prediction (CAP) can enable proactive crowd mobility management and timely responses to urban events, which has become increasingly important for a myriad of smart city planning and management purposes. However, complex correlations across the crowd activities, spatial and temporal urban environment features and their interactive dependencies, and relevant external factors (e.g., weather conditions) make it highly challenging to predict crowd activities accurately in terms of different venue categories (for instance, venues related to dining, services, and residence) and varying degrees (e.g., daytime and nighttime). To address the above concerns, we propose STICAP, a citywide spatio-temporal interactive crowd activity prediction approach. In particular, STICAP takes in the location-based social network check-in data (e.g., from Foursquare/Gowalla) as the model inputs and forecasts the crowd activity within each time step for each venue category. Furthermore, we have integrated multiple levels of temporal discretization to interactively capture the relations with historical data. Then, three parallel Residual Spatial Attention Networks (RSAN) in the Spatial Attention Component exploit the hourly, daily, and weekly spatial features of crowd activities, which are further fused and processed by the Temporal Attention Component for interactive CAP. Along with other external factors such as weather conditions and holidays, STICAP adaptively and accurately forecasts the final crowd activities per venue category, enabling potential activity recommendation and other smart city applications. Extensive experimental studies based on three different real-world crowd activity datasets have demonstrated that our proposed STICAP outperforms the baseline and state-of-the-art algorithms in CAP accuracy, with an average error reduction of 35.02%.
  2. Seeking spatiotemporal patterns about how citizens interact with the urban space is critical for understanding how cities function. Such interactions were studied in various forms focusing on patterns of people’s presence, action, and transition in the urban environment, which are defined as human-urban interactions in this paper. Using human activity datasets that utilize mobile positioning technology for tracking the locations and movements of individuals, researchers developed stochastic models to uncover preferential return behaviors and recurrent transitional activity structures in human-urban interactions. Ad-hoc heuristics and spatial clustering methods were applied to derive meaningful activity places in those studies. However, the lack of semantic meaning in the recorded locations makes it difficult to examine the details about how people interact with different activity places. In this study, we utilized geographic context-aware Twitter data to investigate the spatiotemporal patterns of people’s interactions with their activity places in different urban settings. To test consistency of our findings, we used geo-located tweets to derive the activity places in Twitter users’ location histories over three major U.S. metropolitan areas: Greater Boston Area, Chicago, and San Diego, where the geographic context of each location was inferred from its closest land use parcel. The results showed striking spatial and temporal similarities in Twitter users’ interactions with their activity places among the three cities. By using entropy-based predictability measures, this study not only confirmed the preferential return behaviors as people tend to revisit a few highly frequented places but also revealed detailed characteristics of those activity places.
  3. Crowd mobility prediction, in particular, forecasting flows at and transitions across different locations, is essential for crowd analytics and management in spacious environments featuring large gatherings. We propose GAEFT, a novel crowd mobility analytics system based on the multi-task graph attention neural network to forecast crowd flows (inflows/outflows) and transitions. Specifically, we leverage the collective and sanitized campus Wi-Fi association data provided by our university information technology service and conduct a relatable case study. Our comprehensive data analysis reveals the important challenges of sparsity and skewness, as well as the complex spatio-temporal variations within the crowd mobility data. Therefore, we design a novel spatio-temporal clustering method to group Wi-Fi access points (APs) with similar transition features, and obtain more regular mobility features for model inputs. We then propose an attention-based graph embedding design to capture the correlations among the crowd flows and transitions, and jointly predict the AP-level flows as well as transitions across buildings and clusters through a multi-task formulation. Extensive experimental studies using more than 28 million association records collected during the 2020-2021 academic year validate the excellent accuracy of GAEFT in forecasting dynamic and complex crowd mobility.
  4. ABSTRACT Sustainability science seeks to understand human–nature interactions behind sustainability challenges, but has largely been place-based. Traditional sustainability efforts often solved problems in one place at the cost of other places, compromising global sustainability. The metacoupling framework offers a conceptual foundation and a holistic approach to integrating human–nature interactions within a place, as well as between adjacent places and between distant places worldwide. Its applications show broad utilities for advancing sustainability science with profound implications for global sustainable development. They have revealed effects of metacoupling on the performance, synergies, and trade-offs of United Nations Sustainable Development Goals (SDGs) across borders and across local to global scales; untangled complex interactions; identified new network attributes; unveiled spatio-temporal dynamics and effects of metacoupling; uncovered invisible feedbacks across metacoupled systems; expanded the nexus approach; detected and integrated hidden phenomena and overlooked issues; re-examined theories such as Tobler's First Law of Geography; and unfolded transformations among noncoupling, coupling, decoupling, and recoupling. Results from the applications are also helpful to achieve SDGs across space, amplify benefits of ecosystem restoration across boundaries and across scales, augment transboundary management, broaden spatial planning, boost supply chains, empower small agents in the large world, and shift from place-based to flow-based governance. Key topics for future research include cascading effects of an event in one place on other places both nearby and far away. Operationalizing the framework can benefit from further tracing flows across scales and space, uplifting the rigor of causal attribution, enlarging toolboxes, and elevating financial and human resources. 
Unleashing the full potential of the framework will generate more important scientific discoveries and more effective solutions for global justice and sustainable development. 
  5. Romanach, Stephanie S. (Ed.)
    Massive biological databases of species occurrences, or georeferenced locations where a species has been observed, are essential inputs for modeling present and future species distributions. Location accuracy is often assessed by determining whether the observation geocoordinates fall within the boundaries of the declared political divisions. This otherwise simple validation is complicated by the difficulty of matching political division names to the correct geospatial object. Spelling errors, abbreviations, alternative codes, and synonyms in multiple languages present daunting name disambiguation challenges. The inability to resolve political division names reduces usable data, and analysis of erroneous observations can lead to flawed results. Here, we present the Geographic Name Resolution Service (GNRS), an application for correcting, standardizing, and indexing world political division names. The GNRS resolves political division names against a reference database that combines names and codes from GeoNames with geospatial object identifiers from the Global Administrative Areas Database (GADM). In a trial resolution of political division names extracted from >270 million species occurrences, only 1.9%, representing just 6% of occurrences, matched exactly to GADM political divisions in their original form. The GNRS was able to resolve, completely or in part, 92% of the remaining 378,568 political division names, or 86% of the full biodiversity occurrence dataset. In assessing geocoordinate accuracy for >239 million species occurrences, resolution of political divisions by the GNRS enabled the detection of an order of magnitude more errors and an order of magnitude more error-free occurrences. By providing a novel solution to a significant data quality impediment, the GNRS liberates a tremendous amount of biodiversity data for quantitative biodiversity research. 
The GNRS runs as a web service and is accessible via an API, an R package, and a web-based graphical user interface. Its modular architecture is easily integrated into existing data validation workflows. 
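The kind of name resolution the GNRS abstract describes (handling spelling errors, abbreviations, and synonyms in multiple languages) can be sketched in a few lines. This is not the GNRS algorithm or its API: the reference table, the identifiers, the `resolve` function, and the `cutoff` value are hypothetical, and the fuzzy step uses Python's standard-library `difflib` purely as a stand-in matching technique.

```python
import difflib
from typing import Optional, Tuple

# Hypothetical reference table mapping known names and synonyms to placeholder IDs.
REFERENCE = {
    "germany": "DEU",
    "deutschland": "DEU",
    "brazil": "BRA",
    "brasil": "BRA",
    "united states": "USA",
}


def resolve(name: str, cutoff: float = 0.8) -> Tuple[Optional[str], str]:
    """Resolve a political division name: exact match first, then fuzzy match."""
    key = " ".join(name.lower().split())  # lowercase and collapse whitespace
    if key in REFERENCE:
        return REFERENCE[key], "exact"
    # Fall back to the single closest reference name above the similarity cutoff.
    close = difflib.get_close_matches(key, list(REFERENCE), n=1, cutoff=cutoff)
    if close:
        return REFERENCE[close[0]], "fuzzy"
    return None, "unresolved"
```

Returning the match type alongside the identifier lets a downstream validation step treat fuzzy matches with more caution than exact ones.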