skip to main content

Title: Sampling biased monotonic surfaces using exponential metrics
Abstract Monotonic surfaces spanning finite regions of ℤ d arise in many contexts, including DNA-based self-assembly, card-shuffling and lozenge tilings. One method that has been used to uniformly generate these surfaces is a Markov chain that iteratively adds or removes a single cube below the surface during a step. We consider a biased version of the chain, where we are more likely to add a cube than to remove it, thereby favouring surfaces that are ‘higher’ or have more cubes below it. We prove that the chain is rapidly mixing for any uniform bias in ℤ 2 and for bias λ > d in ℤ d when d  > 2. In ℤ 2 we match the optimal mixing time achieved by Benjamini, Berger, Hoffman and Mossel in the context of biased card shuffling [2], but using much simpler arguments. The proofs use a geometric distance function and a variant of path coupling in order to handle distances that can be exponentially large. We also provide the first results in the case of fluctuating bias , where the bias can vary depending on the location of the tile. We show that the chain continues to be rapidly mixing if the biases are more » close to uniform, but that the chain can converge exponentially slowly in the general setting. « less
; ;
Award ID(s):
Publication Date:
Journal Name:
Combinatorics, Probability and Computing
Page Range or eLocation-ID:
672 to 697
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract We study and classify proper q -colourings of the ℤ d lattice, identifying three regimes where different combinatorial behaviour holds. (1) When $q\le d+1$ , there exist frozen colourings, that is, proper q -colourings of ℤ d which cannot be modified on any finite subset. (2) We prove a strong list-colouring property which implies that, when $q\ge d+2$ , any proper q -colouring of the boundary of a box of side length $n \ge d+2$ can be extended to a proper q -colouring of the entire box. (3) When $q\geq 2d+1$ , the latter holds for any $n \ge 1$ . Consequently, we classify the space of proper q -colourings of the ℤ d lattice by their mixing properties.
  2. Sensitivity properties describe how changes to the input of a program affect the output, typically by upper bounding the distance between the outputs of two runs by a monotone function of the distance between the corresponding inputs. When programs are probabilistic, the distance between outputs is a distance between distributions. The Kantorovich lifting provides a general way of defining a distance between distributions by lifting the distance of the underlying sample space; by choosing an appropriate distance on the base space, one can recover other usual probabilistic distances, such as the Total Variation distance. We develop a relational pre-expectation calculus to upper bound the Kantorovich distance between two executions of a probabilistic program. We illustrate our methods by proving algorithmic stability of a machine learning algorithm, convergence of a reinforcement learning algorithm, and fast mixing for card shuffling algorithms. We also consider some extensions: using our calculus to show convergence of Markov chains to the uniform distribution over states and an asynchronous extension to reason about pairs of program executions with different control flow.
  3. Abstract
    Site description. This data package consists of data obtained from sampling surface soil (the 0-7.6 cm depth profile) in black mangrove (Avicennia germinans) dominated forest and black needlerush (Juncus roemerianus) saltmarsh along the Gulf of Mexico coastline in peninsular west-central Florida, USA. This location has a subtropical climate with mean daily temperatures ranging from 15.4 °C in January to 27.8 °C in August, and annual precipitation of 1336 mm. Precipitation falls as rain primarily between June and September. Tides are semi-diurnal, with 0.57 m median amplitudes during the year preceding sampling (U.S. NOAA National Ocean Service, Clearwater Beach, Florida, station 8726724). Sea-level rise is 4.0 ± 0.6 mm per year (1973-2020 trend, mean ± 95 % confidence interval, NOAA NOS Clearwater Beach station). The A. germinans mangrove zone is either adjacent to water or fringed on the seaward side by a narrow band of red mangrove (Rhizophora mangle). A near-monoculture of J. roemerianus is often adjacent to and immediately landward of the A. germinans zone. The transition from the mangrove to the J. roemerianus zone is variable in our study area. An abrupt edge between closed-canopy mangrove and J. roemerianus monoculture may extend for up to several hundred metersMore>>
  4. The seminal result of Kahn, Kalai and Linial shows that a coalition of O(n/(log n)) players can bias the outcome of any Boolean function {0,1}^n -> {0,1} with respect to the uniform measure. We extend their result to arbitrary product measures on {0,1}^n, by combining their argument with a completely different argument that handles very biased input bits. We view this result as a step towards proving a conjecture of Friedgut, which states that Boolean functions on the continuous cube [0,1]^n (or, equivalently, on {1,...,n}^n) can be biased using coalitions of o(n) players. This is the first step taken in this direction since Friedgut proposed the conjecture in 2004. Russell, Saks and Zuckerman extended the result of Kahn, Kalai and Linial to multi-round protocols, showing that when the number of rounds is o(log^* n), a coalition of o(n) players can bias the outcome with respect to the uniform measure. We extend this result as well to arbitrary product measures on {0,1}^n. The argument of Russell et al. relies on the fact that a coalition of o(n) players can boost the expectation of any Boolean function from epsilon to 1-epsilon with respect to the uniform measure. This fails for general productmore »distributions, as the example of the AND function with respect to mu_{1-1/n} shows. Instead, we use a novel boosting argument alongside a generalization of our first result to arbitrary finite ranges.« less
  5. The growth of biodiversity data sets generated by citizen scientists continues to accelerate. The availability of such data has greatly expanded the scale of questions researchers can address. Yet, error, bias, and noise continue to be serious concerns for analysts, particularly when data being contributed to these giant online data sets are difficult to verify. Counts of birds contributed to eBird, the world’s largest biodiversity online database, present a potentially useful resource for tracking trends over time and space in species’ abundances. We quantified counting accuracy in a sample of 1,406 eBird checklists by comparing numbers contributed by birders (N = 246) who visited a popular birding location in Oregon, USA, with numbers generated by a professional ornithologist engaged in a long-term study creating benchmark (reference) measurements of daily bird counts. We focused on waterbirds, which are easily visible at this site. We evaluated potential predictors of count differences, including characteristics of contributed checklists, of each species, and of time of day and year. Count differences were biased toward undercounts, with more than 75% of counts being below the daily benchmark value. Median count discrepancies were −29.1% (range: 0 to −42.8%; N = 20 species). Model sets revealed an importantmore »influence of each species’ reference count, which varied seasonally as waterbird numbers fluctuated, and of percent of species known to be present each day that were included on each checklist. That is, checklists indicating a more thorough survey of the species richness at the site also had, on average, smaller count differences. However, even on checklists with the most thorough species lists, counts were biased low and exceptionally variable in their accuracy. To improve utility of such bird count data, we suggest three strategies to pursue in the future. (1) Assess additional options for analytically determining how to select checklists that include less biased count data, as well as exploring options for correcting bias during the analysis stage. (2) Add options for users to provide additional information that helps analysts choose checklists, such as an option for users to tag checklists where they focused on obtaining accurate counts. (3) Explore opportunities to effectively calibrate citizen-science bird count data by establishing a formalized network of marquis sites where dedicated observers regularly contribute carefully collected benchmark data.« less