We develop conservative tests for the mean of a bounded population under stratified sampling and apply them to risk-limiting post-election audits. The tests are "anytime valid" under sequential sampling, allowing optional stopping in each stratum. Our core method expresses a global hypothesis about the population mean as a union of intersection hypotheses describing within-stratum means. It tests each intersection hypothesis using independent test supermartingales (TSMs) combined across strata by multiplication. A P-value for each intersection hypothesis is the reciprocal of that test statistic, and the largest P-value in the union is a P-value for the global hypothesis. This approach has two primary moving parts: the rule selecting which stratum to draw from next given the sample so far, and the form of the TSM within each stratum. These rules may vary over intersection hypotheses. We construct the test with the smallest expected stopping time, and present a few strategies for approximating that optimum. Approximately optimal methods are challenging to compute when there are more than two strata, while some simple rules that scale well can be inconsistent -- the resulting test will never reject for some alternatives, no matter how large the sample. We present a set of rules that leads to a computationally tractable test for arbitrarily many strata. In instances that arise in auditing and other applications, its expected sample size is nearly optimal and substantially smaller than that of previous methods.
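As a simplified illustration of this machinery, the sketch below multiplies fixed-bet test supermartingales across two strata, takes the capped reciprocal of the product as the intersection P-value, and maximizes over a grid of within-stratum nulls on the boundary of the global null. The fixed bet of 0.5, the two-stratum setup, and the grid search are simplifying assumptions, not the paper's optimized betting and stratum-selection rules.

```python
import numpy as np

def intersection_pvalue(strata_samples, strata_nulls, lam=0.5):
    """P-value for one intersection hypothesis: per-stratum test
    supermartingales (TSMs) combined by multiplication; the P-value
    is the capped reciprocal of the product. Samples assumed in [0, 1]."""
    log_mart = 0.0
    for x, mu0 in zip(strata_samples, strata_nulls):
        x = np.asarray(x, dtype=float)
        # E[1 + lam*(X - mu0)] <= 1 when E[X] <= mu0, so the running
        # product is a TSM under the stratum null; lam = 0.5 keeps
        # every term positive for samples in [0, 1]
        log_mart += np.log1p(lam * (x - mu0)).sum()
    return min(1.0, float(np.exp(-log_mart)))

def global_pvalue(samples1, samples2, w1, w2, mu_global, grid=200):
    """Global P-value: the largest intersection P-value over
    within-stratum nulls with w1*mu1 + w2*mu2 = mu_global."""
    best = 0.0
    for mu1 in np.linspace(0.0, 1.0, grid):
        mu2 = (mu_global - w1 * mu1) / w2
        if 0.0 <= mu2 <= 1.0:
            best = max(best, intersection_pvalue([samples1, samples2],
                                                 [mu1, mu2]))
    return best
```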
Dice, but Don’t Slice: Optimizing the Efficiency of ONEAudit
Abstract: ONEAudit provides more efficient risk-limiting audits than other extant methods when the voting system cannot report a cast-vote record linked to each cast card. It obviates the need for re-scanning; it is simpler and more efficient than ‘hybrid’ audits; and it is far more efficient than batch-level comparison audits. There may be room to improve the efficiency of ONEAudit further by tuning the statistical tests it uses and by using stratified sampling. We show that tuning the tests by optimizing for the reported batch-level tallies or integrating over a distribution reduces expected workloads by 70–85% compared to the current ONEAudit implementation across a range of simulated elections. The improved tests reduce the expected workload to audit the 2024 Mayoral race in San Francisco, California, by half—from about 200 cards to about 100 cards. In contrast, stratified sampling does not help: it increases workloads by about 25% on average.
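The abstract does not spell out the tuned tests. One plausible reading of "optimizing for the reported batch-level tallies" for a betting-style test is to choose the bet that maximizes expected log-growth when the reported tallies are accurate (a Kelly-type criterion). The sketch below illustrates that reading only; the function name and the scipy-based optimization are assumptions, not ONEAudit's implementation.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def tuned_bet(reported_assorter_values, mu0):
    """Pick a fixed bet lam for the betting martingale
    prod_i (1 + lam*(X_i - mu0)) by maximizing expected log-growth
    over the empirical distribution of the reported (batch-level)
    assorter values. Illustrative only. The upper bound 0.99/mu0
    keeps every martingale term positive when X >= 0."""
    x = np.asarray(reported_assorter_values, dtype=float)
    neg_growth = lambda lam: -np.mean(np.log1p(lam * (x - mu0)))
    res = minimize_scalar(neg_growth, bounds=(0.0, 0.99 / mu0),
                          method="bounded")
    return res.x
```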
- PAR ID: 10646804
- Publisher / Repository: Springer Nature Switzerland
- Date Published:
- Page Range / eLocation ID: 175 to 190
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- U.S. elections rely heavily on computers such as voter registration databases, electronic pollbooks, voting machines, scanners, tabulators, and results reporting websites. These introduce digital threats to election outcomes. Risk-limiting audits (RLAs) mitigate threats to some of these systems by manually inspecting random samples of ballot cards. RLAs have a large chance of correcting wrong outcomes (by conducting a full manual tabulation of a trustworthy record of the votes), but can save labor when reported outcomes are correct. This efficiency is eroded when sampling cannot be targeted to ballot cards that contain the contest(s) under audit. If the sample is drawn from all cast cards, then RLA sample sizes scale like the reciprocal of the fraction of ballot cards that contain the contest(s) under audit. That fraction shrinks as the number of cards per ballot grows (i.e., when elections contain more contests) and as the fraction of ballots that contain the contest decreases (i.e., when a smaller percentage of voters are eligible to vote in the contest). States that conduct RLAs of contests on multi-card ballots or RLAs of small contests can dramatically reduce sample sizes by using information about which ballot cards contain which contests—by keeping track of card-style data (CSD). For instance, CSD reduce the expected number of draws needed to audit a single countywide contest on a 4-card ballot by 75%. Similarly, CSD reduce the expected number of draws by 95% or more for an audit of two contests with the same margin on a 4-card ballot if one contest is on every ballot and the other is on 10% of ballots. In realistic examples, the savings can be several orders of magnitude.
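The reciprocal scaling and the 75% figure can be checked with toy arithmetic (the base sample size of 100 is an arbitrary round number, not taken from the paper):

```python
# If n0 draws of cards that actually contain the contest would satisfy
# the audit, and a fraction f of all cast cards contain it, then
# sampling from all cast cards needs roughly n0 / f draws.
def expected_draws(n0: int, f: float) -> float:
    return n0 / f

# Single countywide contest on a 4-card ballot:
print(expected_draws(100, 0.25))  # 400.0 draws sampling all cards
print(expected_draws(100, 1.00))  # 100.0 draws targeting relevant cards
# -> a 75% reduction, matching the example in the abstract
```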
- Risk-limiting audits (RLAs) are rigorous statistical procedures meant to detect invalid election results. RLAs examine paper ballots cast during the election to statistically assess the possibility of a disagreement between the winner determined by the ballots and the winner reported by tabulation. The design of an RLA must balance risk against efficiency: "risk" refers to a bound on the chance that the audit fails to detect such a disagreement when one occurs; "efficiency" refers to the total effort to conduct the audit. The most efficient approaches—when measured in terms of the number of ballots that must be inspected—proceed by "ballot comparison." However, ballot comparison requires an (untrusted) declaration of the contents of each cast ballot, rather than a simple tabulation of vote totals. This "cast-vote record table" (CVR) is then spot-checked against ballots for consistency. In many practical settings, the cost of generating a suitable CVR dominates the cost of conducting the audit, which has prevented widespread adoption of these sample-efficient techniques. We introduce a new RLA procedure: an "adaptive ballot comparison" audit. In this audit, a global CVR is never produced; instead, a three-stage procedure is iterated: 1) a batch is selected, 2) a CVR is produced for that batch, and 3) a ballot within the batch is sampled, inspected by auditors, and compared with the CVR. We prove that such an audit can achieve risk commensurate with standard comparison audits while generating a fraction of the CVR. We present three main contributions: (1) a formal adversarial model for RLAs; (2) definition and analysis of an adaptive audit procedure with rigorous risk limits and an associated correctness analysis accounting for the incidental errors arising in typical audits; and (3) an analysis of efficiency.
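A minimal sketch of the iterated three-stage loop follows. The names (make_cvr, inspect, update_risk) are illustrative placeholders for the paper's components, and the stopping rule is schematic; a real audit would also allow escalation to a full hand count.

```python
import random

def adaptive_comparison_audit(batches, make_cvr, inspect, update_risk,
                              risk_limit=0.05):
    """Schematic adaptive ballot-comparison loop: CVRs are produced
    lazily, one batch at a time, so a global CVR is never needed.
    batches maps batch id -> list of ballot ids."""
    cvrs = {}                                   # batch id -> CVR, on demand
    risk = 1.0
    while risk > risk_limit:
        batch = random.choice(list(batches))    # 1) select a batch
        if batch not in cvrs:
            cvrs[batch] = make_cvr(batch)       # 2) CVR for that batch only
        ballot = random.choice(batches[batch])  # 3) sample, inspect, compare
        risk = update_risk(risk, cvrs[batch].get(ballot), inspect(ballot))
    return cvrs  # only these batches were ever transcribed
```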
- One approach to risk-limiting audits (RLAs) compares randomly selected cast-vote records (CVRs) to votes read by human auditors from the corresponding ballot cards. Historically, such methods reduce audit sample sizes by considering how each sampled CVR differs from the corresponding true vote, not merely whether they differ. Here we investigate the latter approach: auditing by testing whether the total number of mismatches in the full set of CVRs exceeds the minimum number of CVR errors required for the reported outcome to be wrong (the "CVR margin"). This strategy makes it possible to audit more social choice functions and simplifies RLAs conceptually, which makes them easier to explain than some other RLA approaches. The cost is larger sample sizes. "Mismatch-based RLAs" only require a lower bound on the CVR margin, which for some social choice functions is easier to calculate than the effect of particular errors. When the population rate of mismatches is low and the lower bound on the CVR margin is close to the true CVR margin, the increase in sample size is small. However, the increase may be very large when errors include errors that, if corrected, would widen the CVR margin rather than narrow it; errors that affect the margin between candidates other than the reported winner with the fewest votes and the reported loser with the most votes; or errors that affect different margins.
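As an illustrative instantiation of such a test (not the paper's exact statistic), one can bet against the null that the mismatch rate is at least the CVR-margin lower bound divided by the number of cards, using a nonnegative supermartingale on the 0/1 mismatch indicators:

```python
import numpy as np

def mismatch_pvalues(mismatch_indicators, cvr_margin_lb, n_cards, lam=0.9):
    """The outcome can be wrong only if the mismatch rate is at least
    p0 = cvr_margin_lb / n_cards, so test the null 'rate >= p0'.
    Under that null, E[1 + lam*(p0 - X)] <= 1 for 0/1 indicators X,
    and lam < 1 keeps every term positive, so the running product is
    a test supermartingale. Returns anytime-valid P-values."""
    p0 = cvr_margin_lb / n_cards
    x = np.asarray(mismatch_indicators, dtype=float)
    mart = np.cumprod(1 + lam * (p0 - x))
    return np.minimum(1.0, 1.0 / mart)
```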
- Storage systems usually have many parameters that affect their behavior. Tuning those parameters can provide significant gains in performance. Alas, both manual and automatic tuning methods struggle due to the large number of parameters and the exponential number of possible configurations. Since previous research has shown that some parameters have greater performance impact than others, focusing on a smaller number of more important parameters can speed up auto-tuning systems because they would have a smaller state space to explore. In this paper, we propose Carver, which uses (1) a variance-based metric to quantify storage parameters' importance, (2) Latin Hypercube Sampling to sample huge parameter spaces, and (3) a greedy but efficient parameter-selection algorithm that can identify important parameters. We evaluated Carver on datasets consisting of more than 500,000 experiments on 7 file systems, under 4 representative workloads. Carver successfully identified important parameters for all file systems and showed that importance varies with different workloads. We demonstrated that Carver was able to identify a near-optimal set of important parameters in our datasets. We showed Carver's efficiency by testing it with a small fraction of our dataset; it was able to identify the same set of important parameters with as little as 0.4% of the whole dataset.
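A toy version of a variance-based importance score in the spirit of Carver's first component (the paper's exact metric may differ): the share of performance variance explained by grouping experiments on one parameter's values. Parameters scoring highest would be the candidates the greedy selection stage keeps.

```python
import numpy as np

def variance_importance(configs, perf, param_index):
    """Fraction of performance variance explained by one parameter:
    size-weighted variance of per-value group means over total variance.
    configs is a list of tuples; perf the measured performance values."""
    perf = np.asarray(perf, dtype=float)
    groups = {}
    for cfg, y in zip(configs, perf):
        groups.setdefault(cfg[param_index], []).append(y)
    total = perf.var()
    if total == 0.0:
        return 0.0
    grand = perf.mean()
    between = sum(len(g) * (np.mean(g) - grand) ** 2
                  for g in groups.values()) / len(perf)
    return between / total
```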