The process of regionalization involves clustering a set of spatial areas into spatially contiguous regions. Given the NP-hard nature of regionalization problems, all existing algorithms yield approximate solutions. To ascertain the quality of these approximations, it is crucial for domain experts to obtain statistically significant evidence of how well a solution optimizes the objective function, compared with a random reference distribution derived from all potential sample solutions. In this paper, we propose a novel spatial regionalization problem, denoted as SISR (Statistical Inference for Spatial Regionalization), which generates random sample solutions with a predetermined region cardinality. The driving motivation behind SISR is to conduct statistical inference on any given regionalization scheme. To address SISR, we present a parallel technique named PRRP (P-Regionalization through Recursive Partitioning). PRRP operates in three phases: the region-growing phase constructs initial regions with a predetermined region cardinality, while the region-merging and region-splitting phases ensure the spatial contiguity of unassigned areas, allowing for the growth of subsequent regions with predetermined cardinalities. An extensive evaluation on various real datasets shows the effectiveness of PRRP.
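The region-growing idea in the abstract above can be illustrated with a minimal sketch. This is not the PRRP implementation: the function names (`grow_region`, `grid_adjacency`, `is_contiguous`), the grid-shaped test data, and the random neighbor-selection rule are all assumptions made for illustration. The sketch only shows how a single contiguous region of a fixed cardinality might be grown from a random seed area, and it omits the merging and splitting phases entirely.

```python
import random
from collections import deque


def grid_adjacency(rows, cols):
    """Hypothetical test data: rook adjacency for a rows x cols grid of areas."""
    adj = {}
    for r in range(rows):
        for c in range(cols):
            adj[(r, c)] = [
                (r + dr, c + dc)
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if 0 <= r + dr < rows and 0 <= c + dc < cols
            ]
    return adj


def grow_region(adjacency, unassigned, cardinality, rng):
    """Grow one contiguous region with exactly `cardinality` areas.

    Returns the region as a set, or None if growth gets stuck because
    no unassigned neighbor is left (a real method would need to recover
    from this case; here it is simply reported).
    """
    seed = rng.choice(sorted(unassigned))
    region = {seed}
    frontier = {n for n in adjacency[seed] if n in unassigned}
    while len(region) < cardinality:
        if not frontier:
            return None
        area = rng.choice(sorted(frontier))
        region.add(area)
        frontier.discard(area)
        # Only unassigned areas outside the region may join the frontier.
        frontier.update(
            n for n in adjacency[area] if n in unassigned and n not in region
        )
    return region


def is_contiguous(adjacency, region):
    """Check by breadth-first search that `region` forms one connected piece."""
    start = next(iter(region))
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for n in adjacency[current]:
            if n in region and n not in seen:
                seen.add(n)
                queue.append(n)
    return seen == region


if __name__ == "__main__":
    adj = grid_adjacency(4, 4)
    region = grow_region(adj, set(adj), 5, random.Random(0))
    print(sorted(region), is_contiguous(adj, region))
```

Because every new area is drawn from the neighbors of areas already in the region, contiguity holds by construction; the separate `is_contiguous` check is included only to make that property easy to verify.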
This content will become publicly available on August 15, 2026
Formalizing Dependence of Web Infrastructure
Over the past decade, Internet centralization and its implications for privacy, resilience, and innovation have become a topic of active debate. While the networking community informally agrees on the definition of centralization, we lack a formal metric for quantifying it, which has limited in-depth analysis. In this work, we introduce a rigorous statistical metric for Internet centralization. In doing so, we also uncover how regionalization—geopolitical dependence on the Internet—fundamentally affects centralization. We argue that centralization and regionalization are intertwined forms of dependence that both affect the lived experiences of users and should be jointly studied. We develop a suite of statistical tools, which we use to better understand dependence across three layers of web infrastructure—hosting providers, DNS infrastructure, certificate authorities—in 150 countries. We hope that this statistical toolkit can serve as the foundation for future analysis of Internet behavior.
- Award ID(s):
- 2319080
- PAR ID:
- 10656206
- Publisher / Repository:
- Proceedings of the ACM SIGCOMM 2025 Conference
- Date Published:
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
In this paper we consider the risks to Internet infrastructure in the US due to sea level rise. Our study is based on sea level incursion projections from the National Oceanic and Atmospheric Administration (NOAA) [12] and Internet infrastructure deployment data from Internet Atlas [24]. We align the data formats and assess risks in terms of the amount and type of infrastructure that will be under water in different time intervals over the next 100 years. We find that 4,067 miles of fiber conduit will be under water and 1,101 nodes (e.g., points of presence and colocation centers) will be surrounded by water in the next 15 years. We further quantify the risks of sea level rise by defining a metric that considers the combination of geographic scope and Internet infrastructure density. We use this metric to examine different regions and find that the New York, Miami, and Seattle metropolitan areas are at highest risk. We also quantify the risks to individual service provider infrastructures and find that CenturyLink, Inteliquent, and AT&T are at highest risk. While it is difficult to project the impact of countermeasures such as sea walls, our results suggest the urgency of developing mitigation strategies and alternative infrastructure deployments.
-
Spatial data regularly suffer from error and uncertainty, ranging from poorly georeferenced coordinate pairs to sampling error associated with American Community Survey data. Geographic information systems can amplify and propagate error and uncertainty through the abstraction and representation of spatial data, as can the manipulation, processing, and analysis of spatial data using exploratory and confirmatory statistical techniques. The purpose of this article is to explore and address uncertainty in regionalization, a fundamental spatial analytical method that aggregates spatial units (e.g., tracts) into a set of contiguous regions for strategic purposes, including school districting, habitat areas, and the like. Specifically, we develop a new regionalization method, the uncertain‐max‐p‐regions problem, that explicitly incorporates attribute uncertainty and allows its impacts to be evaluated with a degree of statistical certainty. We also detail an efficient solution approach for the problem. The results suggest that the developed problem can outperform existing regionalization approaches and that the addition of a measure of statistical confidence can help to facilitate more clarity in planning and policy decisions.
-
Legged animals still outperform many terrestrial robots due to the complex interplay of various component subsystems. Centralization is a potential integrated design axis to help improve the performance of legged robots in variable terrain environments. Centralization arises from the coupling of multiple limbs and joints through mechanics or feedback control. Strong couplings contribute to a whole-body coordinated response (centralized) and weak couplings result in localized responses (decentralized). Rarely are both mechanical and neural couplings considered together in designing centralization. In this study, we use an empirical information theory-based approach to evaluate the emergent centralization of a hexapod robot. We independently vary the mechanical and neural coupling through adjustable joint stiffness and variable coupling of leg controllers, respectively. We found an increase in centralization as neural coupling increased. Changes in mechanical coupling did not significantly affect centralization during walking, but did change the total information processing of the neuromechanical control architecture. Information-based centralization increased with robotic performance in terms of cost of transport and speed, implying that this may be a useful metric in robotic design.
-
Despite significant investments in access network infrastructure, universal access to high-quality Internet connectivity remains a challenge. Policymakers often rely on large-scale, crowdsourced measurement datasets to assess the distribution of access network performance across geographic areas. These decisions typically rest on the assumption that Internet performance is uniformly distributed within predefined social boundaries, such as zip codes, census tracts, or neighborhood units. However, this assumption may not be valid for two reasons: (1) crowdsourced measurements often exhibit non-uniform sampling densities within geographic areas; and (2) predefined social boundaries may not align with the actual boundaries of Internet infrastructure. In this paper, we present a spatial analysis of crowdsourced datasets for constructing stable boundaries for sampling Internet performance. We hypothesize that greater stability in sampling boundaries will better reflect the true nature of Internet performance disparities than the misleading patterns observed as a result of data sampling variations. We apply and evaluate a series of statistical techniques to: (1) aggregate Internet performance over geographic regions; (2) overlay interpolated maps with various sampling unit choices; and (3) spatially cluster boundary units to identify contiguous areas with similar performance characteristics. We assess the effectiveness of the techniques we apply by comparing the similarity of the resulting boundaries for monthly samples drawn from the dataset. Our evaluation shows that the combination of techniques we apply achieves higher similarity compared to directly calculating central measures of network metrics over census tracts or neighborhood boundaries. These findings underscore the important role of spatial modeling in accurately assessing and optimizing the distribution of Internet performance, which can better inform policy, network operations, and long-term planning decisions.
