Predicting the minimum operating voltage Vmin of chips is an important technique for improving the manufacturing test flow and for ensuring the long-term reliability and safety of in-field systems. Current Vmin prediction methods often provide only point estimates, necessitating additional techniques for constructing prediction confidence intervals to cover the uncertainties caused by different sources of variation. Some existing techniques offer region predictions, but they rely on certain distributional assumptions and/or provide no coverage guarantees. In response to these limitations, we propose a novel distribution-free Vmin interval estimation methodology with a theoretical coverage guarantee. Our approach leverages conformalized quantile regression and on-chip monitors to generate reliable prediction intervals. We demonstrate the effectiveness of the proposed method on an industrial 5nm automotive chip dataset. Moreover, we show that the use of on-chip monitors significantly reduces the interval length for Vmin prediction.
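To make the core recipe concrete, here is a minimal sketch of conformalized quantile regression (CQR) for distribution-free prediction intervals. The gradient-boosted quantile regressors, the calibration split, and all variable names are illustrative assumptions, not the paper's actual models or data.

```python
# A minimal CQR sketch; features would stand in for on-chip monitor
# readings and y for measured Vmin (hypothetical, not the paper's setup).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def cqr_intervals(X_train, y_train, X_cal, y_cal, X_test, alpha=0.1):
    # Fit lower- and upper-quantile regressors on the training split.
    lo = GradientBoostingRegressor(loss="quantile", alpha=alpha / 2).fit(X_train, y_train)
    hi = GradientBoostingRegressor(loss="quantile", alpha=1 - alpha / 2).fit(X_train, y_train)
    # Conformity scores: how far each calibration point falls outside
    # the initial quantile band (negative if inside).
    scores = np.maximum(lo.predict(X_cal) - y_cal, y_cal - hi.predict(X_cal))
    # Finite-sample-corrected empirical quantile of the scores is what
    # yields the distribution-free (1 - alpha) coverage guarantee.
    n = len(y_cal)
    q = np.quantile(scores, min(1.0, np.ceil((n + 1) * (1 - alpha)) / n))
    return lo.predict(X_test) - q, hi.predict(X_test) + q
```

The calibration-quantile correction is the step that delivers coverage without any distributional assumptions, regardless of which underlying quantile regressor is used.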
Rapid Approximate Aggregation with Distribution-Sensitive Interval Guarantees
Aggregating data is fundamental to data analytics, data exploration, and OLAP. Approximate query processing (AQP) techniques are often used to accelerate the computation of aggregates using samples, for which confidence intervals (CIs) are widely used to quantify the associated error. CIs used in practice fall into two categories: techniques that are tight but not correct, i.e., they yield tight intervals but only offer asymptotic guarantees, making them unreliable; or techniques that are correct but not tight, i.e., they offer rigorous guarantees but are overly conservative, leading to confidence intervals that are too loose to be useful. In this paper, we develop a CI technique that is both correct and tighter than traditional approaches. Starting from conservative CIs, we identify two issues they often face: pessimistic mass allocation (PMA) and phantom outlier sensitivity (PHOS). By developing a novel range-trimming technique for eliminating PHOS and pairing it with known CI techniques without PMA, we develop a technique for computing CIs with strong guarantees that requires fewer samples for the same width. We implement our techniques underneath a sampling-optimized in-memory column store and show how they accelerate queries involving aggregates on real datasets, with typical speedups on the order of 10× over both traditional AQP-with-guarantees and exact methods, all while obeying accuracy constraints.
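The correct-but-loose versus tight-but-asymptotic trade-off described above can be seen in a toy comparison. The sketch below contrasts a CLT-based CI with a Hoeffding CI on bounded data; it is not the paper's range-trimming technique, only the baseline tension it sets out to resolve.

```python
# Toy comparison of a tight-but-asymptotic CI vs. a correct-but-loose CI.
import numpy as np

def clt_ci(sample):
    # Normal-approximation 95% CI: narrow, but validity is only asymptotic.
    half = 1.96 * sample.std(ddof=1) / np.sqrt(len(sample))
    return sample.mean() - half, sample.mean() + half

def hoeffding_ci(sample, a, b, alpha=0.05):
    # Hoeffding 95% CI: valid for ANY distribution supported on [a, b],
    # but pays for that rigor with a much wider interval.
    half = (b - a) * np.sqrt(np.log(2 / alpha) / (2 * len(sample)))
    return sample.mean() - half, sample.mean() + half

rng = np.random.default_rng(0)
sample = rng.uniform(0.0, 100.0, size=1_000)
print("CLT:      ", clt_ci(sample))               # tight but only asymptotic
print("Hoeffding:", hoeffding_ci(sample, 0, 100)) # rigorous but loose
```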
- PAR ID: 10275604
- Journal Name: IEEE International Conference on Data Engineering workshop
- ISSN: 1943-2895
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
High-level data types are often associated with semantic invariants that must be preserved by any correct implementation. While having implementations enforce strong guarantees such as linearizability or serializability can often prevent invariant violations in concurrent settings, such mechanisms are impractical in geo-distributed replicated environments, the platform of choice for many scalable Web services. To achieve the high availability essential to this domain, these environments admit various forms of weak consistency that do not guarantee all replicas have a consistent view of an application's state. Consequently, they often admit difficult-to-understand anomalous behaviors that violate a data type's invariants but are extremely challenging, even for experts, to understand and debug. In this paper, we propose a novel programming framework for replicated data types (RDTs) equipped with an automatic (bounded) verification technique that discovers and fixes weak-consistency anomalies. Our approach, implemented in a tool called Q9, systematically explores the state space of an application executing on top of an eventually consistent data store, under an unrestricted consistency model but with a finite concurrency bound. Q9 uncovers anomalies (i.e., invariant violations) that manifest as finite counterexamples, and automatically generates repairs for such anomalies by selectively strengthening consistency guarantees for specific operations. Using Q9, we have uncovered a range of subtle anomalies in implementations of well-known benchmarks and have been able to apply the repairs it mandates to eliminate them effectively. Notably, these benchmarks were written following best practices for managing distributed replicated state (e.g., they are composed of provably convergent RDTs (CRDTs), avoid mutable state, etc.). While the safety guarantees offered by our technique are constrained by the concurrency bound, we show that in practice the bounded safety guarantees we prove typically generalize to the unbounded case.
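As a rough illustration of the bounded anomaly search described above, this toy sketch enumerates schedules of withdrawals against two weakly consistent replicas of a bank account (invariant: balance ≥ 0) and flags violations. Q9 itself analyzes real RDT implementations; this only conveys the idea, and all names and amounts are hypothetical.

```python
# Toy bounded search for weak-consistency anomalies on a replicated
# bank account; the invariant is that the merged balance stays >= 0.
from itertools import permutations

def run(schedule, start=100):
    # Each withdrawal validates against its replica's local view, which
    # does not yet reflect concurrent withdrawals at the other replica.
    views = {"r1": start, "r2": start}
    applied = []
    for replica, amount in schedule:
        if views[replica] >= amount:   # local (stale) invariant check passes
            views[replica] -= amount
            applied.append(amount)
    return start - sum(applied)        # true balance once effects merge

ops = [("r1", 80), ("r2", 80)]         # two concurrent withdrawals
for schedule in permutations(ops):     # finite bound: all interleavings
    final = run(schedule)
    if final < 0:                      # invariant "balance >= 0" violated
        print("anomaly under schedule", schedule, "-> balance", final)
```

A repair in the spirit of the abstract would strengthen consistency for the withdraw operation alone (e.g., serialize withdrawals), leaving cheaper operations weakly consistent.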
-
In critical infrastructure (CI) sectors such as emergency management or healthcare, researchers can analyze and detect useful patterns in data, helping emergency management personnel allocate limited resources efficaciously or detect epidemiological spread patterns. However, all of this data contains personally identifiable information (PII) that must be safeguarded for legal and ethical reasons. Traditional safeguarding techniques, such as anonymization, have been shown to be ineffective. Differential privacy is a technique that protects individual privacy while allowing the analysis of datasets for societal benefit. This paper motivates the use of differential privacy to answer a wide range of queries about CI data containing PII with better privacy guarantees than is possible with traditional techniques. Moreover, it introduces a new technique based on Multiple-attribute Workload Partitioning, which does not depend on the nature of the underlying dataset and provides better privacy protection than current differential privacy approaches.
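For readers unfamiliar with the mechanics, here is a minimal sketch of the Laplace mechanism, the standard building block for differentially private counting queries. The records, predicate, and ε value are hypothetical, and this is not the paper's workload-partitioning scheme.

```python
# Minimal Laplace-mechanism sketch for an epsilon-DP counting query.
import numpy as np

def laplace_count(rows, predicate, epsilon, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    true_count = sum(1 for row in rows if predicate(row))
    # A counting query has sensitivity 1 (adding or removing one person
    # changes the result by at most 1), so Laplace noise with scale
    # 1/epsilon suffices for epsilon-differential privacy.
    return true_count + rng.laplace(scale=1.0 / epsilon)

# e.g., a private count of incident reports from one district
records = [{"district": d} for d in ["A", "B", "A", "C", "A"]]
print(laplace_count(records, lambda r: r["district"] == "A", epsilon=0.5))
```

Smaller ε means stronger privacy but noisier answers; workload-aware methods like the one the paper proposes aim to spend a fixed privacy budget more efficiently across many such queries.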
-
Conditional independence (CI) tests play a central role in statistical inference, machine learning, and causal discovery. Most existing CI tests assume that the samples are independently and identically distributed (i.i.d.). However, this assumption often does not hold in the case of relational data. We define Relational Conditional Independence (RCI), a generalization of CI to the relational setting. We show how, under a set of structural assumptions, we can test for RCI by reducing the task of testing for RCI on non-i.i.d. data to the problem of testing for CI on several data sets, each of which consists of i.i.d. samples. We develop the Kernel Relational CI test (KRCIT), a nonparametric test, as a practical approach to testing for RCI by relaxing the structural assumptions used in our analysis of RCI. We describe results of experiments with synthetic relational data that show the benefits of KRCIT relative to traditional CI tests that don't account for the non-i.i.d. nature of relational data.
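A sketch of a standard CI test on i.i.d. data helps ground the reduction described above: under the structural assumptions, a base test like this Fisher-z partial-correlation test could be run on each i.i.d. data set. KRCIT itself is kernel-based and nonparametric; the Gaussian assumption here is purely illustrative.

```python
# Fisher-z partial-correlation CI test for i.i.d. data (assumes joint
# Gaussianity) -- a stand-in base test, not KRCIT itself.
import numpy as np
from scipy import stats

def fisher_z_ci_test(x, y, z):
    # Partial correlation of x and y given a single conditioning
    # variable z, read off the inverse covariance (precision) matrix.
    prec = np.linalg.inv(np.cov(np.column_stack([x, y, z]), rowvar=False))
    r = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])
    # Fisher z-transform; under H0 the statistic is approximately N(0, 1).
    stat = np.sqrt(len(x) - 4) * np.arctanh(r)
    return 2 * stats.norm.sf(abs(stat))   # two-sided p-value

rng = np.random.default_rng(0)
z = rng.normal(size=500)
x = z + rng.normal(size=500)   # x and y depend on z but not on each other
y = z + rng.normal(size=500)
print(fisher_z_ci_test(x, y, z))  # large p-value: X ⊥ Y | Z not rejected
```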
-
Confidence intervals (CIs) depict the statistical uncertainty surrounding evolutionary divergence time estimates. They capture variance contributed by the finite number of sequences and sites used in the alignment, deviations of evolutionary rates from a strict molecular clock in a phylogeny, and uncertainty associated with clock calibrations. Reliable tests of biological hypotheses demand reliable CIs. However, current non-Bayesian methods may produce unreliable CIs because they do not properly incorporate rate variation among lineages and interactions among clock calibrations. Here, we present a new analytical method to calculate CIs of divergence times estimated with the RelTime method, along with an approach to utilize multiple calibration uncertainty densities in dating analyses. Empirical data analyses showed that the new methods produce CIs that overlap with Bayesian highest posterior density intervals. In the analysis of computer-simulated data, we found that RelTime CIs show excellent average coverage probabilities, i.e., the actual time is contained within the CIs with 94% probability. These developments will encourage broader use of computationally efficient RelTime approaches in molecular dating analyses and biological hypothesis testing.
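To illustrate what an "average coverage probability" of 94% means operationally, here is a minimal Monte Carlo sketch that estimates the empirical coverage of a textbook normal-mean CI; the procedure and parameters are stand-ins, not the RelTime method.

```python
# Monte Carlo estimate of CI coverage: simulate many data sets, build a
# CI each time, and count how often the true parameter is covered.
import numpy as np

rng = np.random.default_rng(1)
true_mean, n, trials = 10.0, 50, 10_000
hits = 0
for _ in range(trials):
    sample = rng.normal(true_mean, 2.0, size=n)
    half = 1.96 * sample.std(ddof=1) / np.sqrt(n)   # nominal 95% CI
    hits += abs(sample.mean() - true_mean) <= half
print(f"empirical coverage: {hits / trials:.3f}")   # close to 0.95
```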