NSF PAR Search | NSF Public Access Repository


American Community Survey (ACS) Data Uncertainty and the Analysis of Segregation Dynamics

https://doi.org/10.1007/s11113-023-09754-6

Wei, Ran; Knaap, Elijah; Rey, Sergio (February 2023, Population Research and Policy Review)

Abstract American Community Survey (ACS) data have become the workhorse for the empirical analysis of segregation in the U.S.A. during the past decade. The increased frequency the ACS offers over the 10-year Census, which is the main reason for its popularity, comes with an increased level of uncertainty in the published estimates due to the reduced sampling ratio of the ACS (1:40 households) relative to the Census (1:6 households). This paper introduces a new approach to integrate ACS data uncertainty into the analysis of segregation. Our method relies on variance replicate estimates for the 5-year ACS and advances over existing approaches by explicitly taking into account the covariance between ACS estimates when developing sampling distributions for segregation indices. We illustrate our approach with a study of comparative segregation dynamics for 29 metropolitan statistical areas in California, using the 2010–2014 and 2015–2019 5-year ACS estimates. Our methods yield different results than the simulation technique described by Napierala and Denton (Demography 54(1):285–309, 2017). Taking the ACS estimate covariance into account yields larger error margins than those generated with the simulation approach when the number of census tracts is large and the minority percentage is low, and the converse is true when the number of census tracts is small and the minority percentage is high.
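The replicate-based idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the dissimilarity index is used only as one common segregation index, the function names and the toy tract counts are hypothetical, and the variance formula is the standard successive-difference replication form (the ACS variance replicate tables publish 80 replicates, giving a factor of 4/80). Because the index is recomputed on each replicate using all tracts from that same replicate, covariance between tract estimates is carried through implicitly.

```python
from math import sqrt


def dissimilarity(minority, majority):
    """Dissimilarity index D = 0.5 * sum_i |m_i / M - n_i / N| over tracts."""
    total_min = sum(minority)
    total_maj = sum(majority)
    return 0.5 * sum(
        abs(m / total_min - n / total_maj)
        for m, n in zip(minority, majority)
    )


def replicate_standard_error(estimate, replicate_values):
    """Successive-difference replication: Var = (4 / R) * sum_r (rep_r - est)^2.

    R is the number of replicates (80 in the published ACS tables).
    """
    r = len(replicate_values)
    variance = 4.0 / r * sum((rep - estimate) ** 2 for rep in replicate_values)
    return sqrt(variance)


# Toy data (hypothetical): three tracts, four replicates instead of 80.
est_minority = [10, 20, 70]
est_majority = [50, 30, 20]
rep_minority = [[12, 18, 70], [9, 22, 69], [10, 20, 72], [11, 19, 68]]
rep_majority = [[48, 31, 21], [52, 29, 19], [50, 30, 20], [49, 32, 20]]

d_est = dissimilarity(est_minority, est_majority)
# Recompute the index once per replicate, keeping each replicate's tracts together.
d_reps = [dissimilarity(m, n) for m, n in zip(rep_minority, rep_majority)]
d_se = replicate_standard_error(d_est, d_reps)
```

With real data, the replicate columns for every tract would come from the same ACS variance replicate table, and a margin of error could then be derived from `d_se`.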
Full Text Available
Reverse spatial top-k keyword queries

https://doi.org/10.1007/s00778-022-00759-9

Ahmed, Pritom; Eldawy, Ahmed; Hristidis, Vagelis; Tsotras, Vassilis J. (July 2022, The VLDB Journal)

Abstract We introduce the Reverse Spatial Top-k Keyword (RSK) query, which is defined as: given a query term q, an integer k, and a neighborhood size, find all the neighborhoods of that size where q is in the top-k most frequent terms among the social posts in those neighborhoods. An obvious approach would be to partition the dataset with a uniform grid structure of a given cell size and identify the cells where this term is in the top-k most frequent keywords. However, this answer would be incomplete since it only checks for neighborhoods that are perfectly aligned with the grid. Furthermore, for every neighborhood (square) that is an answer, we can define infinitely more result neighborhoods by minimally shifting the square without including more posts in it. To address that, we need to identify contiguous regions where any point in the region can be the center of a neighborhood that satisfies the query. We propose an algorithm to efficiently answer an RSK query using an index structure consisting of a uniform grid augmented by materialized lists of term frequencies. We apply various optimizations that drastically improve query latency against baseline approaches. We also provide a theoretical model to choose the optimal cell size for the index to minimize query latency. We further examine a restricted version of the problem (RSKR) that limits the scope of the answer and propose efficient approximate algorithms. Finally, we examine how parallelism can improve performance by balancing the workload using a smart load slicing technique. Extensive experimental performance evaluation of the proposed methods using real Twitter datasets and crime report datasets shows the efficiency of our optimizations and the accuracy of the proposed theoretical model.
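The "obvious approach" that the abstract describes and then rejects as incomplete can be sketched as follows. This is only the grid-aligned baseline, not the paper's RSK algorithm or index. The function name, the `(x, y, terms)` post layout, and the tie rule (a term counts as top-k when fewer than k terms are strictly more frequent) are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from math import floor


def grid_cells_with_term_in_top_k(posts, q, k, cell_size):
    """Return the grid cells (col, row) where term q is among the k most frequent.

    posts: iterable of (x, y, terms), with terms a list of keywords.
    Only grid-aligned neighborhoods are checked, so the answer is incomplete
    in exactly the way the abstract points out.
    """
    counts = defaultdict(Counter)
    for x, y, terms in posts:
        cell = (floor(x / cell_size), floor(y / cell_size))
        counts[cell].update(terms)

    result = set()
    for cell, freq in counts.items():
        q_count = freq.get(q, 0)
        if q_count == 0:
            continue
        # Assumed tie rule: q is in the top-k if fewer than k terms beat it.
        strictly_higher = sum(1 for c in freq.values() if c > q_count)
        if strictly_higher < k:
            result.add(cell)
    return result


# Toy posts (hypothetical): two cells of size 1 along the x axis.
posts = [
    (0.5, 0.5, ["a", "b"]),
    (0.2, 0.8, ["a"]),
    (1.5, 0.5, ["b", "c", "c"]),
    (1.7, 0.1, ["c"]),
]
cells_for_a = grid_cells_with_term_in_top_k(posts, "a", 1, 1.0)
```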
The max‐p‐compact‐regions problem

https://doi.org/10.1111/tgis.12874

Feng, Xin; Rey, Sergio; Wei, Ran (November 2021, Transactions in GIS)

Abstract The max‐p‐compact‐regions problem involves the aggregation of a set of small areas into an unknown maximum number (p) of compact, homogeneous, and spatially contiguous regions such that a regional attribute value is higher than a predefined threshold. The max‐p‐compact‐regions problem is an extension of the max‐p‐regions problem accounting for compactness. The max‐p‐regions model has been widely used to define study regions in many application cases since it allows users to specify criteria and then to identify a regionalization scheme. However, the max‐p‐regions model does not consider compactness even though compactness is usually a desirable goal in regionalization, implying ideal accessibility and apparent homogeneity. This article discusses how to integrate a compactness measure into the max‐p regionalization process by constructing a multiobjective optimization model that maximizes the number of regions while optimizing the compactness of identified regions. An efficient heuristic algorithm is developed to address the computational intensity of the max‐p‐compact‐regions problem so that it can be applied to large‐scale practical regionalization problems. This new algorithm will be implemented in the open‐source Python Spatial Analysis Library. One hypothetical and one practical application of the max‐p‐compact‐regions problem are introduced to demonstrate the effectiveness and efficiency of the proposed algorithm.
SGPAC: Generalized Scalable Spatial GroupBy Aggregations over Complex Polygons

https://doi.org/10.1007/s10707-023-00491-8

Abdelhafeez, Laila; Magdy, Amr; Tsotras, Vassilis J. (October 2023, GeoInformatica)

This paper studies the spatial group-by query over complex polygons. Given a set of spatial points and a set of polygons, the spatial group-by query returns the number of points that lie within the boundaries of each polygon. Groups are selected from a set of non-overlapping complex polygons, typically on the order of thousands, while the input is a large-scale dataset that contains hundreds of millions or even billions of spatial points. This problem is challenging because real polygons (such as counties, cities, postal codes, and voting regions) are described by very complex boundaries. We propose a highly parallelized query processing framework to efficiently compute the spatial group-by query on highly skewed spatial data. We also propose an effective query optimizer that adaptively assigns the appropriate processing scheme based on the query polygons. Our experimental evaluation with real data and queries has shown significant superiority over all existing techniques.
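The query semantics stated in this abstract (count the points falling inside each of a set of non-overlapping polygons) can be written down as a brute-force sketch. This is a naive single-machine baseline with a textbook ray-casting test, not the SGPAC framework or its optimizer. The function names and the toy polygons are hypothetical, and points lying exactly on a boundary are not treated specially.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the simple polygon `ring`.

    ring is a list of (x, y) vertices; the closing edge is implied.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def spatial_group_by_count(points, polygons):
    """Return {polygon_name: number of points inside it}.

    polygons: dict mapping a name to its vertex ring. Polygons are assumed
    non-overlapping, so the search stops at the first containing polygon.
    """
    counts = {name: 0 for name in polygons}
    for x, y in points:
        for name, ring in polygons.items():
            if point_in_polygon(x, y, ring):
                counts[name] += 1
                break
    return counts


# Toy input (hypothetical): one square and one triangle.
polygons = {
    "A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "B": [(3, 0), (5, 0), (4, 3)],
}
points = [(1, 1), (0.5, 1.5), (4, 1), (10, 10)]
counts = spatial_group_by_count(points, polygons)
```

The cost of this baseline grows with points times polygons times vertices per polygon, which is the scaling problem the abstract attributes to complex boundaries.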
Full Text Available
Scalable Overlay Operations over DCEL Polygon Layers

https://doi.org/10.1145/3609956.3609964

Calderon-Romero, Andres; Tsotras, Vassilis J.; Magdy, Amr (August 2023, International Symposium on Spatial and Temporal Data)

ABSTRACT The Doubly Connected Edge List (DCEL) is an edge-list structure that has been widely utilized in spatial applications for planar topological computations. An important operation is the overlay, which combines the DCELs of two input layers and can easily support spatial queries like the intersection, union, and difference between these layers. However, existing sequential implementations for computing the overlay do not scale and fail to complete for large datasets (for example, the US census tracts). In this paper we propose a distributed and scalable way to compute the overlay operation and its related supported queries. We address the issues involved in efficiently distributing the overlay operator and offer various optimizations that improve performance. Our scalable solution can compute the overlay of very large real datasets (32M edges) in a few minutes.
Full Text Available
DDCEL: Efficient Distributed Doubly Connected Edge List for Large Spatial Networks

https://doi.org/10.1109/MDM58254.2023.00029

Abdelhafeez, Laila; Magdy, Amr; Tsotras, Vassilis J. (July 2023, IEEE International Conference on Mobile Data Management (MDM))

Abstract: The Doubly Connected Edge List (DCEL) is a popular data structure for representing planar subdivisions and is used to accelerate spatial applications like map overlay, graph simplification, and subdivision traversal. Current DCEL implementations assume a standalone machine environment, which does not scale when processing the large dataset sizes that abound in today’s spatial applications. This paper proposes a Distributed Doubly Connected Edge List (DDCEL) data structure extending the DCEL to a distributed environment. The DDCEL constructor undergoes a two-phase paradigm to generate the subdivision’s vertices, half-edges, and faces. After spatially partitioning the input data, the first phase runs the sequential DCEL construction algorithm on each data partition in parallel. The second phase then iteratively merges information from multiple data partitions to generate the shared data structure. Our experimental evaluation with real data of road networks of up to 563 million line segments shows significant performance advantages of the proposed approach over the existing techniques.
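For readers unfamiliar with the three record types the abstract names (vertices, half-edges, faces), here is a minimal standalone sketch of a textbook DCEL for a single simple polygon. It is not the DDCEL constructor and does nothing distributed. The class layout, the `build_polygon_dcel` and `face_boundary` helpers, and the simplification of giving the unbounded face a single boundary pointer are all assumptions for illustration.

```python
class Vertex:
    def __init__(self, x, y):
        self.x, self.y = x, y
        self.incident_edge = None  # one half-edge leaving this vertex


class Face:
    def __init__(self, name):
        self.name = name
        self.edge = None  # one half-edge on this face's boundary


class HalfEdge:
    def __init__(self, origin):
        self.origin = origin
        self.twin = None  # same segment, opposite direction
        self.next = None  # next half-edge around the same face
        self.prev = None
        self.face = None


def build_polygon_dcel(coords):
    """Build a DCEL for one simple polygon given in counter-clockwise order."""
    n = len(coords)
    vertices = [Vertex(x, y) for x, y in coords]
    inner, outer = Face("inner"), Face("outer")
    # inner_edges[i] runs v_i -> v_{i+1}; outer_edges[i] is its twin.
    inner_edges = [HalfEdge(vertices[i]) for i in range(n)]
    outer_edges = [HalfEdge(vertices[(i + 1) % n]) for i in range(n)]
    for i in range(n):
        e, t = inner_edges[i], outer_edges[i]
        e.twin, t.twin = t, e
        e.face, t.face = inner, outer
        e.next, e.prev = inner_edges[(i + 1) % n], inner_edges[(i - 1) % n]
        # Twins traverse the boundary in the opposite direction.
        t.next, t.prev = outer_edges[(i - 1) % n], outer_edges[(i + 1) % n]
        vertices[i].incident_edge = e
    inner.edge, outer.edge = inner_edges[0], outer_edges[0]
    return vertices, inner_edges + outer_edges, [inner, outer]


def face_boundary(face):
    """Walk `next` pointers once around a face, returning origin coordinates."""
    coords, e = [], face.edge
    while True:
        coords.append((e.origin.x, e.origin.y))
        e = e.next
        if e is face.edge:
            return coords


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
vertices, half_edges, faces = build_polygon_dcel(square)
```

Walking `face_boundary(faces[0])` returns the square in its input order, while the outer face is traversed in the opposite orientation.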
Full Text Available
Health disparity in the spread of COVID-19: Evidence from social distancing, risk of interactions, and access to testing

https://doi.org/10.1016/j.healthplace.2023.103031

Wei, Ran; Zhang, Yujia; Gao, Song; Brown, Brandon J.; Hu, Songhua; Link, Bruce G. (July 2023, Health & Place)

Full Text Available
U-ASK: a unified architecture for kNN spatial-keyword queries supporting negative keyword predicates

https://doi.org/10.1145/3557915.3560975

Liu, Yongyi; Magdy, Amr (November 2022, The International Conference on Advances in Geographic Information Systems)

Full Text Available
The Legacy of Redlining: A Spatial Dynamics Perspective

https://doi.org/10.1177/01600176221116566

Rey, Sergio Joseph; Knaap, Elijah (August 2022, International Regional Science Review)

This paper investigates the long-term impacts of the federal Home Owners’ Loan Corporation (HOLC) mortgage risk assessment maps on the spatial dynamics of recent income and racial distributions in California metropolitan areas over the 1990–2010 period. We combine historical HOLC boundaries with modern Census tract data and apply recently developed methods of spatial distribution dynamics to examine whether legacy impacts are reflected in recent urban dynamics. Cities with HOLC assessments are found to have higher levels of isolation segregation than the non-HOLC group, but no difference in unevenness segregation between the two groups of cities is found. We find no difference in income or racial and ethnic distributional dynamics between the two groups of cities over the period. At the intra-urban scale, we find that the intersectionality of residing in a C or D graded tract that is also a low-income tract falls predominantly upon the minority populations in these eight HOLC cities. Our findings indicate that neighborhoods with poor housing markets and high minority concentrations rarely experience a dramatic change in either their racial and ethnic or socioeconomic compositions, and that negative externalities (e.g., lower home prices and greater segregation levels) emanate from these neighborhoods, with inertia spilling over into nearby zones.
Reducing racial segregation of public school districts

https://doi.org/10.1016/j.seps.2022.101415

Wei, Ran; Feng, Xin; Rey, Sergio; Knaap, Elijah (August 2022, Socio-Economic Planning Sciences)

Full Text Available
