Title: Beyond Data Points: Regionalizing Crowdsourced Latency Measurements
Despite significant investments in access network infrastructure, universal access to high-quality Internet connectivity remains a challenge. Policymakers often rely on large-scale, crowdsourced measurement datasets to assess the distribution of access network performance across geographic areas. These decisions typically rest on the assumption that Internet performance is uniformly distributed within predefined social boundaries, such as zip codes, census tracts, or neighborhood units. However, this assumption may not be valid for two reasons: (1) crowdsourced measurements often exhibit non-uniform sampling densities within geographic areas; and (2) predefined social boundaries may not align with the actual boundaries of Internet infrastructure. In this paper, we present a spatial analysis of crowdsourced datasets for constructing stable boundaries for sampling Internet performance. We hypothesize that more stable sampling boundaries will better reflect the true nature of Internet performance disparities, rather than misleading patterns that arise from variations in data sampling. We apply and evaluate a series of statistical techniques to: (1) aggregate Internet performance over geographic regions; (2) overlay interpolated maps with various sampling unit choices; and (3) spatially cluster boundary units to identify contiguous areas with similar performance characteristics. We assess the effectiveness of these techniques by comparing the similarity of the resulting boundaries across monthly samples drawn from the dataset. Our evaluation shows that the combination of techniques achieves higher boundary similarity than directly calculating central measures of network metrics over census tracts or neighborhood boundaries. These findings underscore the important role of spatial modeling in accurately assessing and optimizing the distribution of Internet performance, which can better inform policy, network operations, and long-term planning decisions.
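The abstract does not say which similarity measure is used to compare boundaries across monthly samples. As a minimal sketch of the general idea, the Python snippet below scores the agreement between two regionalizations of the same spatial units with the Rand index. The choice of the Rand index, the function name `rand_index`, and the example labels are illustrative assumptions, not the paper's method or data.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of unit pairs on which two regionalizations agree.

    Each argument assigns a region label to the same ordered list of
    spatial units (e.g. census tracts). A pair of units counts as an
    agreement when both labelings put the pair in the same region, or
    both put it in different regions. Label values themselves are
    arbitrary; only the grouping matters.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same units")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0  # zero or one unit: trivially identical
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


# Hypothetical region labels for six tracts in two monthly samples.
january = [0, 0, 1, 1, 2, 2]
february = [0, 0, 1, 2, 2, 2]
print(rand_index(january, february))
```

A score near 1 would indicate that the two months yield nearly the same grouping of units, which is the kind of stability the abstract is concerned with. A chance-corrected variant (such as the adjusted Rand index) may be preferable when the number of regions differs a lot between samples.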
Award ID(s):
2324515
PAR ID:
10613794
Author(s) / Creator(s):
; ; ; ;
Publisher / Repository:
ACM Digital Library
Date Published:
Journal Name:
Proceedings of the ACM on Measurement and Analysis of Computing Systems
Volume:
8
Issue:
3
ISSN:
2476-1249
Page Range / eLocation ID:
1 to 24
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Susceptibility to infectious diseases such as COVID-19 depends on how those diseases spread. Many studies have examined the decrease in COVID-19 spread due to reduction in travel. However, less is known about how much functional geographic regions, which capture natural movements and social interactions, limit the spread of COVID-19. To determine boundaries between functional regions, we apply community-detection algorithms to large networks of mobility and social-media connections to construct geographic regions that reflect natural human movement and relationships at the county level in the coterminous United States. We measure COVID-19 case counts, case rates, and case-rate variations across adjacent counties and examine how often COVID-19 crosses the boundaries of these functional regions. We find that regions that we construct using GPS-trace networks and especially commute networks have the lowest COVID-19 case rates along the boundaries, so these regions may reflect natural partitions in COVID-19 transmission. Conversely, regions that we construct from geolocated Facebook friendships and Twitter connections yield less effective partitions. Our analysis reveals that regions that are derived from movement flows are more appropriate geographic units than states for making policy decisions about opening areas for activity, assessing vulnerability of populations, and allocating resources. Our insights are also relevant for policy decisions and public messaging in future emergency situations. 
  2. This dataset contains tabular data at three scales (city, tract, and synoptic site) and related vector shapefiles (for watersheds or buffers around synoptic sites) for areas included in the Carbon in Urban River Biogeochemistry Project (CURB) to assess how social, built, and biophysical factors shape aquatic functions. The city scale included 486 urban areas in the continental United States with greater than 50,000 residents. Tabular data are provided for each urban area (CURB_CensusUrbanArea.csv) and all U.S. Census tracts within seven urban areas (Atlanta, GA, Boston, MA, Miami, FL, Phoenix, AZ, Portland, OR, Salt Lake City, UT, and San Francisco, CA; CURB_CensusTract.csv) to characterize a range of social, built, and biophysical factors. In six focal cities (Baltimore, MD, Boston, MA, Atlanta, GA, Miami, FL, Salt Lake City, UT, and Portland, OR), up to 100 sites were selected for synoptic water quality sampling. For each synoptic site, tabular data (CURB_SynopticSite.csv) are provided to characterize a range of social, built, and biophysical factors within the watershed (Atlanta, Baltimore, Boston, Portland, Salt Lake City) or within a buffer of the site (Miami). Vector shapefiles are provided for the watershed boundaries (CURB_Synoptic_Watersheds.zip) for all synoptic sites in each city except Miami, FL, where 400-m buffers (CURB_Miami_Synoptic_Buffers.zip) around the synoptic site were used.
  3. Abstract Continued climate change is increasing the frequency, severity, and duration of populations’ high temperature exposures. Indoor cooling is a key adaptation, especially in urban areas, where the urban heat island (UHI) effect intensifies heat extremes, making residential air conditioning (AC) availability critical to protecting human health. In the United States, the differences in residential AC prevalence from one metropolitan area to another are well understood, but its intra-urban variation is poorly characterized, obscuring neighborhood-scale variability in populations’ heat vulnerability and adaptive capacity. We address this gap by constructing empirically derived probabilities of residential AC for 45,995 census tracts across 115 metropolitan areas. Within cities, AC is unequally distributed, with census tracts in the urban “core” exhibiting systematically lower prevalence than their suburban counterparts. Moreover, this disparity correlates strongly with multiple indicators of social vulnerability and summer daytime surface UHI intensity, highlighting the challenges that vulnerable urban populations face in adapting to climate-change-driven heat stress amplification.
  4. Crowdsourced data are finding practical use in enhancing situational awareness during disasters. While recent studies have shown promising results regarding the potential of crowdsourced data (such as user-generated flood reports) for flash flood mapping and situational awareness, little attention has been paid to data imbalance issues that could introduce biases in data and assessment. To address this gap, in this study, we examine biases present in crowdsourced reports to identify data imbalance with a goal of improving disaster situational awareness. Three biases are examined: sample bias, spatial bias, and demographic bias. To examine these biases, we analyzed reported flooding from 3-1-1 reports (a citizen hotline that allows the community to report problems such as flooding) and Waze reports (a GPS navigation app that allows drivers to report flooded roads) with respect to FEMA damage data collected in the aftermath of Tropical Storm Imelda in Harris County, Texas, in 2019 and Hurricane Ida in New York City in 2021. First, sample bias is assessed by expanding the flood-related categories in 3-1-1 reports. Integrating other flooding-related topics into the Global Moran's I and Local Indicator of Spatial Association (LISA) analyses revealed more communities that were impacted by floods. To examine spatial bias, we perform the LISA and BI-LISA tests on the data sets (FEMA damage, 3-1-1 reports, and Waze reports) at the census tract level and census block group level. By looking at two geographical aggregations, we found that the larger spatial aggregations, census tracts, show less data imbalance in the results. Through a regression analysis, we found that 3-1-1 reports and Waze reports have data imbalance limitations in areas where minority populations and single parent households reside. The findings of this study advance understanding of data imbalance and biases in crowdsourced datasets that are increasingly used for disaster situational awareness. Through addressing data imbalance issues, researchers and practitioners can proactively mitigate biases in crowdsourced data and prevent biased and inequitable decisions and actions.
  5. Abstract We report on the successful completion of a project to upgrade the positional accuracy of every response to the 1990, 2000, and 2010 U.S. decennial censuses. The resulting data set, called Optimized Spatial Census Information Linked Across Time (OSCILAT), resides within the restricted-access data warehouse of the Federal Statistical Research Data Center (FSRDC) system where it is available for use with approval from the U.S. Census Bureau. OSCILAT greatly improves the accuracy and completeness of spatial information for older censuses conducted prior to major quality improvements undertaken by the Bureau. Our work enables more precise spatial and longitudinal analysis of census data and supports exact tabulations of census responses for arbitrary spatial units, including tabulating responses from 1990, 2000, and 2010 within 2020 block boundaries for precise measures of change over time for small geographic areas. 