Recent advances in quantitative tools for examining urban morphology enable the development of morphometrics that can characterize the size, shape, and placement of buildings; the relationships between them; and their association with broader patterns of development. Although these methods have the potential to provide substantial insight into the ways in which neighborhood morphology shapes the socioeconomic and demographic characteristics of neighborhoods and communities, this question is largely unexplored. Using building footprints in five of the ten largest U.S. metropolitan areas (Atlanta, Boston, Chicago, Houston, and Los Angeles) and the open-source R package foot, we examine how neighborhood morphology differs across U.S. metropolitan areas and across the urban-exurban landscape. Principal components analysis, unsupervised classification (K-means), and ordinary least squares regression are used to develop a morphological typology of neighborhoods and to examine its association with the spatial, socioeconomic, and demographic characteristics of census tracts. Our findings illustrate substantial variation in the morphology of neighborhoods, both across the five metropolitan areas and between central cities, suburbs, and the urban fringe within each metropolitan area. We identify five types of neighborhoods, indicative of different stages of development and distributed unevenly across the urban landscape: these include low-density neighborhoods on the urban fringe; mixed-use and high-density residential areas in central cities; and uniform residential neighborhoods in suburban cities. Results from regression analysis illustrate that the prevalence of each of these forms is closely associated with variation in socioeconomic and demographic characteristics such as population density, the prevalence of multifamily housing, income, race/ethnicity, homeownership, and commuting by car. We conclude by discussing the implications of our findings and suggesting avenues for future research on neighborhood morphology, including ways that it might provide insight into issues such as zoning and land use, housing policy, and residential segregation.

NSF-PAR ID: 10374636
Publisher / Repository: Springer Science + Business Media
Journal Name: The VLDB Journal
ISSN: 1066-8888
Format(s): Medium: X
Sponsoring Org: National Science Foundation
Editor(s): Xu, Gang
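The three-step analysis described above (PCA on footprint morphometrics, K-means clustering into a typology, then regression against tract characteristics) can be sketched as follows. This is a minimal illustration using synthetic data and scikit-learn; the column meanings, sizes, and parameter choices are hypothetical placeholders, not the authors' actual pipeline.

```python
# Sketch of the pipeline: PCA on building-footprint morphometrics,
# K-means into a five-type neighborhood typology, then OLS linking
# components to a socioeconomic outcome. All data here is synthetic.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
# One row per census tract; columns are footprint-derived morphometrics
# (e.g., mean building area, compactness, nearest-neighbor distance).
X = rng.normal(size=(500, 6))

# 1. Reduce correlated morphometrics to a few principal components.
Z = PCA(n_components=3).fit_transform(StandardScaler().fit_transform(X))

# 2. Unsupervised classification into five neighborhood types.
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(Z)

# 3. OLS: regress a tract-level outcome (e.g., population density)
#    on the components (cluster dummies would work similarly).
y = Z @ np.array([1.5, -0.7, 0.2]) + rng.normal(scale=0.5, size=500)
ols = LinearRegression().fit(Z, y)
print(labels.shape, ols.coef_.round(2))
```

The same skeleton applies whether the outcome is modeled on the components directly or on the prevalence of each neighborhood type within a tract.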
Motivated by an attempt to understand the formation and development of (human) language, we introduce a "distributed compression" problem. In our problem, a sequence of pairs of players from a set of K players are chosen and tasked to communicate messages drawn from an unknown distribution Q. Arguably, languages are created and evolve to compress frequently occurring messages, and we focus on this aspect. The only knowledge that players have about the distribution Q comes from previously drawn samples, but these samples differ from player to player. The common knowledge between the players is restricted to a common prior distribution P and a constant number of bits of information (such as a learning algorithm). Letting T_ε denote the number of iterations it would take for a typical player to obtain an ε-approximation to Q in total variation distance, we ask whether T_ε iterations suffice to compress the messages down roughly to their entropy, and give a partial positive answer. We show that a natural uniform algorithm can compress the communication down to an average cost per message of O(H(Q) + log(D(P ‖ Q) + O(1))) in Õ(T_ε) iterations while allowing for O(ε) error, where D(· ‖ ·) denotes the KL-divergence between distributions. For large divergences this compares favorably with the static algorithm that ignores all samples and compresses down to H(Q) + D(P ‖ Q) bits, while not requiring the T_ε · K iterations it would take players to develop optimal but separate compressions for each pair of players. Along the way we introduce a "data-structural" view of the task of communicating with a natural language, and show that our natural algorithm can also be implemented by an efficient data structure, whose storage is comparable to the storage requirements of Q and whose query complexity is comparable to the lengths of the messages to be compressed.
Our results give a plausible mathematical analogy to the mechanisms by which human languages are created and evolve, and in particular highlight the possibility of coordinating on a joint task (agreeing on a language) while engaging in distributed learning.
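The cost comparison in the abstract can be made concrete with a small numeric example: coding messages from Q with a code built for the prior P costs roughly the entropy of Q plus a divergence penalty per message, versus just H(Q) once Q is learned. A caveat: the abstract writes the penalty as D(P ‖ Q), while the standard cross-entropy identity uses D(Q ‖ P); the toy computation below uses the standard form, so treat it as an illustration of the gap rather than the paper's exact bound.

```python
# Toy comparison of per-message coding cost with and without learning Q.
# Standard identity: cross-entropy H(Q, P) = H(Q) + D(Q || P), i.e. the
# "static" code built for prior P pays a KL-divergence penalty per message.
import math

def entropy(q):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in q if p > 0)

def kl(q, p):
    """KL-divergence D(q || p) in bits."""
    return sum(qi * math.log2(qi / pi) for qi, pi in zip(q, p) if qi > 0)

Q = [0.7, 0.2, 0.1]     # true (unknown) message distribution -- toy values
P = [1/3, 1/3, 1/3]     # common prior shared by all players

adaptive_cost = entropy(Q)            # ideal cost once Q is learned
static_cost = entropy(Q) + kl(Q, P)   # cost of coding Q with a code for P
print(round(adaptive_cost, 3), round(static_cost, 3))
```

With a uniform prior over three messages, the static cost works out to exactly log2(3) bits per message, while the adaptive cost is the (smaller) entropy of the skewed distribution Q.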

The ranked (or top-k) document retrieval problem is defined as follows: preprocess a collection {T_1, T_2, …, T_d} of d strings (called documents) of total length n into a data structure, such that for any given query (P, k), where P is a string (called pattern) of length p ≥ 1 and k ∈ [1, d] is an integer, the identifiers of those k documents that are most relevant to P can be reported, ideally in sorted order of their relevance. The seminal work by Hon et al. [FOCS 2009 and Journal of the ACM 2014] presented an O(n)-space (in words) data structure with O(p + k log k) query time. The query time was later improved to O(p + k) [SODA 2012] and further to O(p / log_σ n + k) [SIAM Journal on Computing 2017] by Navarro and Nekrich, where σ is the alphabet size. We revisit this problem in the external memory model and present three data structures. The first one takes O(n) space and answers queries in O(p/B + log_B n + k/B + log*(n/B)) I/Os, where B is the block size. The second one takes O(n log*(n/B)) space and answers queries in optimal O(p/B + log_B n + k/B) I/Os. In both cases, the answers are reported in unsorted order of relevance. To handle sorted top-k document retrieval, we present an O(n log(d/B))-space data structure with optimal query cost.
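As a reference point for the problem definition above, here is a brute-force top-k retrieval over a document collection, taking occurrence count as the relevance measure (one common choice; the cited data structures support others). Its O(n · p) query cost is exactly what the indexed solutions avoid; the function names and data are illustrative only.

```python
# Brute-force top-k document retrieval: score every document against the
# pattern, then report the k best. The indexed structures in the abstract
# answer the same query in time (nearly) independent of the collection size.
def count_occurrences(text, pattern):
    """Count (possibly overlapping) occurrences of pattern in text."""
    return sum(1 for i in range(len(text) - len(pattern) + 1)
               if text[i:i + len(pattern)] == pattern)

def topk_docs(documents, pattern, k):
    """Return ids of the k documents most relevant to pattern, most relevant first."""
    scores = [(count_occurrences(t, pattern), i) for i, t in enumerate(documents)]
    scores.sort(key=lambda s: (-s[0], s[1]))  # by score desc, then id
    return [i for score, i in scores[:k] if score > 0]

docs = ["abracadabra", "banana", "cabana"]
print(topk_docs(docs, "ana", 2))  # -> [1, 2]: "banana" has 2 overlapping hits, "cabana" has 1
```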
Consider an algorithm performing a computation on a huge random object (for example a random graph or a "long" random walk). Is it necessary to generate the entire object prior to the computation, or is it possible to provide query access to the object and sample it incrementally "on the fly" (as requested by the algorithm)? Such an implementation should emulate the random object by answering queries in a manner consistent with an instance of the random object sampled from the true distribution (or close to it). This paradigm is useful when the algorithm is sublinear, and thus sampling the entire object up front would ruin its efficiency. Our first set of results focuses on undirected graphs with independent edge probabilities, i.e., each edge is chosen as an independent Bernoulli random variable. We provide a general implementation for this model under certain assumptions. Then, we use this to obtain the first efficient local implementations for the Erdős–Rényi G(n, p) model for all values of p, and for the Stochastic Block model. As in previous local-access implementations for random graphs, we support Vertex-Pair and Next-Neighbor queries. In addition, we introduce a new Random-Neighbor query. Next, we give the first local-access implementation for All-Neighbors queries in the (sparse and directed) Kleinberg small-world model. Our implementations require no preprocessing time, and answer each query using O(poly(log n)) time, random bits, and additional space. Next, we show how to implement random Catalan objects, specifically focusing on Dyck paths (balanced random walks on the integer line that are always non-negative). Here, we support Height queries to find the location of the walk, and First-Return queries to find the time when the walk returns to a specified location. This in turn can be used to implement Next-Neighbor queries on random rooted ordered trees, and Matching-Bracket queries on random well-bracketed expressions (the Dyck language).
Finally, we introduce two features to define a new model that: (1) allows multiple independent (and even simultaneous) instantiations of the same implementation to be consistent with each other without the need for communication, and (2) allows us to generate a richer class of random objects that do not have a succinct description. Specifically, we study uniformly random valid q-colorings of an input graph G with maximum degree Δ. This is in contrast to prior work in the area, where the relevant random objects are defined as a distribution with O(1) parameters (for example, n and p in the G(n, p) model). The distribution over valid colorings is instead specified via a "huge" input (the underlying graph G) that is far too large to be read by a sublinear-time algorithm. Instead, our implementation accesses G through local neighborhood probes, and is able to answer queries to the color of any given vertex in sublinear time for q ≥ 9Δ, in a manner that is consistent with a specific random valid coloring of G. Furthermore, the implementation is memoryless, and can maintain consistency with non-communicating copies of itself.
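The basic local-access idea for G(n, p) can be sketched in a few lines: each edge is a Bernoulli(p) coin flip that is sampled lazily the first time it is queried and memoized afterwards, so repeated Vertex-Pair queries stay consistent with one underlying random graph that is never fully materialized. This naive sketch is not the paper's algorithm (its space grows with the number of queries, while the paper bounds per-query cost by poly(log n) and supports richer queries); all names are illustrative.

```python
# Minimal lazy/memoized Vertex-Pair implementation for G(n, p).
# Edges are flipped on demand; answers are cached so the emulated
# graph is consistent across queries without generating O(n^2) edges.
import random

class LocalGnp:
    def __init__(self, n, p, seed=None):
        self.n, self.p = n, p
        self.rng = random.Random(seed)
        self.edges = {}  # memoized coin flips, keyed by sorted vertex pair

    def vertex_pair(self, u, v):
        """Return True iff edge {u, v} is present; consistent across calls."""
        key = (min(u, v), max(u, v))
        if key not in self.edges:
            self.edges[key] = self.rng.random() < self.p
        return self.edges[key]

g = LocalGnp(n=10**9, p=0.5, seed=42)  # far too large to generate up front
a = g.vertex_pair(3, 7)
print(a, g.vertex_pair(7, 3))  # same answer both ways: the flip is memoized
```

Supporting Next-Neighbor or All-Neighbors queries efficiently (especially for small p) is where the real technical work in the abstract lies, since a vertex's neighborhood cannot be found by flipping n coins one at a time.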

Given a collection of vectors, the approximate K-nearest-neighbor graph (KGraph for short) connects every vector to its approximate K nearest neighbors (KNN for short). KGraph plays an important role in high-dimensional data visualization, semantic search, manifold learning, and machine learning. The vectors are typically vector representations of real-world objects (e.g., images and documents), which often come with a few structured attributes, such as timestamps and locations. In this paper, we study the all-range approximate K-nearest-neighbor graph (ARKGraph) problem. Specifically, given a collection of vectors, each associated with a numerical search key (e.g., a timestamp), we aim to build an index that takes a search key range as the query and returns the KGraph of the vectors whose search keys are within the query range. ARKGraph can facilitate interactive high-dimensional data visualization, data mining, and similar applications. A key challenge of this problem is the huge index size: given n vectors, a brute-force index stores a KGraph for every search key range, which results in O(Kn^3) index size, as there are O(n^2) search key ranges and each KGraph takes O(Kn) space. We observe that the KNN of a vector in nearby ranges are often the same and can be grouped together to save space. Based on this observation, we propose a series of novel techniques that reduce the index size significantly, to just O(Kn log n) in the average case. Furthermore, we develop an efficient indexing algorithm that constructs the optimized ARKGraph index directly, without exhaustively calculating the distance between every pair of vectors. To process a query, for each vector in the query range, we need only O(log log n + K log K) time to restore its KNN in the query range from the optimized ARKGraph index. We conducted extensive experiments on real-world datasets.
Experimental results show that our optimized ARKGraph index achieves a small index size, low query latency, and good scalability. Specifically, our approach was 1000x faster than the baseline method that builds a KGraph for all the vectors in the query range on the fly.
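The baseline the abstract compares against can be written down directly: filter the vectors by search key, then compute each survivor's K nearest neighbors within the range from scratch. This quadratic-per-query approach is what the ARKGraph index is designed to beat; the function, data, and key values below are illustrative only.

```python
# Brute-force range-KGraph query: build the KGraph of the vectors whose
# search keys fall in [lo, hi] on the fly (the baseline in the abstract).
import math

def range_kgraph(vectors, keys, lo, hi, K):
    """Return {id: list of K nearest neighbor ids} over vectors with key in [lo, hi]."""
    in_range = [i for i, k in enumerate(keys) if lo <= k <= hi]
    def dist(a, b):
        return math.dist(vectors[a], vectors[b])
    graph = {}
    for i in in_range:
        others = [j for j in in_range if j != i]
        others.sort(key=lambda j: dist(i, j))   # O(m log m) per vector
        graph[i] = others[:K]
    return graph

vectors = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
keys = [10, 20, 30, 40]  # e.g., timestamps; id 3 falls outside the range below
print(range_kgraph(vectors, keys, 10, 30, K=1))  # -> {0: [1], 1: [0], 2: [0]}
```

The key-range observation in the abstract (a vector's KNN list changes only at a few range boundaries) is what lets the index replace this per-query recomputation with an O(log log n + K log K) lookup per vector.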