skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: The German Tank Problem with Multiple Factories
During the Second World War, estimates of the number of tanks deployed by Germany were critically needed. The Allies adopted a successful statistical approach to estimate this information: assuming that the tanks are sequentially numbered starting from 1, if we observe k tanks from an unknown total of N, then the best linear unbiased estimator for N is M(1+1/k)-1 where M is the maximum observed serial number. However, in many situations, the original German Tank Problem is insufficient, since typically there are l > 1 factories, and tanks produced by different factories may have serial numbers in disjoint ranges that are often far separated.Clark, Gonye and Miller presented an unbiased estimator for N when the minimum serial number is unknown. Provided one identifies which samples correspond to which factory, one can then estimate each factory's range and summing the sizes of these ranges yields an estimate for the rival's total productivity. We construct an efficient procedure to estimate the total productivity and prove that it is effective when log l/log k is sufficiently small. In the final section, we show that given information about the gaps, we can make an estimator that performs orders of magnitude better when we have a small number of samples.  more » « less
Award ID(s):
2241623
PAR ID:
10580624
Author(s) / Creator(s):
; ;
Publisher / Repository:
PUMP
Date Published:
Journal Name:
The PUMP Journal of Undergraduate Research
Volume:
7
ISSN:
2765-8724
Page Range / eLocation ID:
231 to 257
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We study subtrajectory clustering under the Fréchet distance. Given one or more trajectories, the task is to split the trajectories into several parts, such that the parts have a good clustering structure. We approach this problem via a new set cover formulation, which we think provides a natural formalization of the problem as it is studied in many applications. Given a polygonal curve P with n vertices in fixed dimension, integers k, ℓ ≥ 1, and a real value Δ > 0, the goal is to find k center curves of complexity at most ℓ such that every point on P is covered by a subtrajectory that has small Fréchet distance to one of the k center curves (≤ Δ). In many application scenarios, one is interested in finding clusters of small complexity, which is controlled by the parameter ℓ. Our main result is a bicriterial approximation algorithm: if there exists a solution for given parameters k, ℓ, and Δ, then our algorithm finds a set of k' center curves of complexity at most ℓ with covering radius Δ' with k' in O(kℓ2 log (kℓ)), and Δ' ≤ 19Δ. Moreover, within these approximation bounds, we can minimize k while keeping the other parameters fixed. If ℓ is a constant independent of n, then, the approximation factor for the number of clusters k is O(log k) and the approximation factor for the radius Δ is constant. In this case, the algorithm has expected running time in Õ(km2 + mn) and uses space in O(n + m), where m=⌈L/Δ⌉ and L is the total arclength of the curve P. 
    more » « less
  2. In password security a defender would like to identify and warn users with weak passwords. Similarly, the defender may also want to predict what fraction of passwords would be cracked within B guesses as the attacker’s guessing budget B varies from small (online attacker) to large (offline attacker). Towards each of these goals the defender would like to quickly estimate the guessing number for each user password pwd assuming that the attacker uses a password cracking model M i.e., how many password guesses will the attacker check before s/he cracks each user password pwd. Since naïve brute-force enumeration can be prohibitively expensive when the guessing number is very large, Dell’Amico and Filippone [1] developed an efficient Monte Carlo algorithm to estimate the guessing number of a given password pwd. While Dell’Amico and Filippone proved that their estimator is unbiased there is no guarantee that the Monte Carlo estimates are accurate nor does the method provide confidence ranges on the estimated guessing number or even indicate if/when there is a higher degree of uncertainty.Our contributions are as follows: First, we identify theoretical examples where, with high probability, Monte Carlo Strength estimation produces highly inaccurate estimates of individual guessing numbers as well as the entire guessing curve. Second, we introduce Confident Monte Carlo Strength Estimation as an extension of Dell’Amico and Filippone [1]. Given a password our estimator generates an upper and lower bound with the guarantee that, except with probability δ, the true guessing number lies within the given confidence range. Our techniques can also be used to characterize the attacker’s guessing curve. In particular, given a probabilistic password cracking model M we can generate high confidence upper and lower bounds on the fraction of passwords that the attacker will crack as the guessing budget B varies. 
    more » « less
  3. In adaptive importance sampling and other contexts, we haveK> 1 unbiased and uncorrelated estimates μ^kof a common quantity μ. The optimal unbiased linear combination weights them inversely to their variances, but those weights are unknown and hard to estimate. A simple deterministic square root rule based on a working model that Var(μ^k) ∝k−1/2gives an unbiased estimate of μ that is nearly optimal under a wide range of alternative variance patterns. We show that if Var(μ^k)∝k−yfor an unknown rate parametery∈[0,1], then the square root rule yields the optimal variance rate with a constant that is too large by at most 9/8 for any 0 ⩽y⩽ 1 and any numberKof estimates. Numerical work shows that rule is similarly robust to some other patterns with mildly decreasing variance askincreases. 
    more » « less
  4. Gørtz, Inge Li; Farach-Colton, Martin; Puglisi, Simon J; Herman, Grzegorz (Ed.)
    We give the first almost-linear time algorithm for computing the maximal k-edge-connected subgraphs of an undirected unweighted graph for any constant k. More specifically, given an n-vertex m-edge graph G = (V,E) and a number k = log^o(1) n, we can deterministically compute in O(m+n^{1+o(1)}) time the unique vertex partition {V_1,… ,V_z} such that, for every i, V_i induces a k-edge-connected subgraph while every superset V'_i ⊃ V_{i} does not. Previous algorithms with linear time work only when k ≤ 2 [Tarjan SICOMP'72], otherwise they all require Ω(m+n√n) time even when k = 3 [Chechik et al. SODA'17; Forster et al. SODA'20]. Our algorithm also extends to the decremental graph setting; we can deterministically maintain the maximal k-edge-connected subgraphs of a graph undergoing edge deletions in m^{1+o(1)} total update time. Our key idea is a reduction to the dynamic algorithm supporting pairwise k-edge-connectivity queries [Jin and Sun FOCS'20]. 
    more » « less
  5. Given a metric space ℳ = (X,δ), a weighted graph G over X is a metric t-spanner of ℳ if for every u,v ∈ X, δ(u,v) ≤ δ_G(u,v) ≤ t⋅ δ(u,v), where δ_G is the shortest path metric in G. In this paper, we construct spanners for finite sets in metric spaces in the online setting. Here, we are given a sequence of points (s₁, …, s_n), where the points are presented one at a time (i.e., after i steps, we have seen S_i = {s₁, … , s_i}). The algorithm is allowed to add edges to the spanner when a new point arrives, however, it is not allowed to remove any edge from the spanner. The goal is to maintain a t-spanner G_i for S_i for all i, while minimizing the number of edges, and their total weight. Under the L₂-norm in ℝ^d for arbitrary constant d ∈ ℕ, we present an online (1+ε)-spanner algorithm with competitive ratio O_d(ε^{-d} log n), improving the previous bound of O_d(ε^{-(d+1)}log n). Moreover, the spanner maintained by the algorithm has O_d(ε^{1-d}log ε^{-1})⋅ n edges, almost matching the (offline) optimal bound of O_d(ε^{1-d})⋅ n. In the plane, a tighter analysis of the same algorithm provides an almost quadratic improvement of the competitive ratio to O(ε^{-3/2}logε^{-1}log n), by comparing the online spanner with an instance-optimal spanner directly, bypassing the comparison to an MST (i.e., lightness). As a counterpart, we design a sequence of points that yields a Ω_d(ε^{-d}) lower bound for the competitive ratio for online (1+ε)-spanner algorithms in ℝ^d under the L₁-norm. Then we turn our attention to online spanners in general metrics. Note that, it is not possible to obtain a spanner with stretch less than 3 with a subquadratic number of edges, even in the offline setting, for general metrics. We analyze an online version of the celebrated greedy spanner algorithm, dubbed ordered greedy. With stretch factor t = (2k-1)(1+ε) for k ≥ 2 and ε ∈ (0,1), we show that it maintains a spanner with O(ε^{-1}logε^{-1})⋅ n^{1+1/k} edges and O(ε^{-1}n^{1/k}log² n) lightness for a sequence of n points in a metric space. We show that these bounds cannot be significantly improved, by introducing an instance that achieves an Ω(1/k⋅ n^{1/k}) competitive ratio on both sparsity and lightness. Furthermore, we establish the trade-off among stretch, number of edges and lightness for points in ultrametrics, showing that one can maintain a (2+ε)-spanner for ultrametrics with O(ε^{-1}logε^{-1})⋅ n edges and O(ε^{-2}) lightness. 
    more » « less