skip to main content

Title: Certified Computation from Unreliable Datasets
A wide range of learning tasks require human input in labeling massive data. The collected data though are usually low quality and contain inaccuracies and errors. As a result, modern science and business face the problem of learning from unreliable data sets. In this work, we provide a generic approach that is based on \textit{verification} of only few records of the data set to guarantee high quality learning outcomes for various optimization objectives. Our method, identifies small sets of critical records and verifies their validity. We show that many problems only need poly(1/ε) verifications, to ensure that the output of the computation is at most a factor of (1±ε) away from the truth. For any given instance, we provide an \textit{instance optimal} solution that verifies the minimum possible number of records to approximately certify correctness. Then using this instance optimal formulation of the problem we prove our main result: “every function that satisfies some Lipschitz continuity condition can be certified with a small number of verifications”. We show that the required Lipschitz continuity condition is satisfied even by some NP-complete problems, which illustrates the generality and importance of this theorem. In case this certification step fails, an invalid record will more » be identified. Removing these records and repeating until success, guarantees that the result will be accurate and will depend only on the verified records. Surprisingly, as we show, for several computation tasks more efficient methods are possible. These methods always guarantee that the produced result is not affected by the invalid records, since any invalid record that affects the output will be detected and verified. « less
Authors:
; ;
Award ID(s):
1733808 1741137 1650733
Publication Date:
NSF-PAR ID:
10075552
Journal Name:
31st Annual Conference on Learning Theory (COLT)
Page Range or eLocation-ID:
3271-3294
Sponsoring Org:
National Science Foundation
More Like this
  1. In this paper, we consider the optimal control of semilinear fractional PDEs with both spectral and integral fractional diffusion operators of order 2 s with s ∈ (0, 1). We first prove the boundedness of solutions to both semilinear fractional PDEs under minimal regularity assumptions on domain and data. We next introduce an optimal growth condition on the nonlinearity to show the Lipschitz continuity of the solution map for the semilinear elliptic equations with respect to the data. We further apply our ideas to show existence of solutions to optimal control problems with semilinear fractional equations as constraints. Under the standard assumptions on the nonlinearity (twice continuously differentiable) we derive the first and second order optimality conditions.
  2. Cutting-edge machine learning techniques often require millions of labeled data objects to train a robust model. Because relying on humans to supply such a huge number of labels is rarely practical, automated methods for label generation are needed. Unfortunately, critical challenges in auto-labeling remain unsolved, including the following research questions: (1) which objects to ask humans to label, (2) how to automatically propagate labels to other objects, and (3) when to stop labeling. These three questions are not only each challenging in their own right, but they also correspond to tightly interdependent problems. Yet existing techniques provide at best isolated solutions to a subset of these challenges. In this work, we propose the first approach, called LANCET, that successfully addresses all three challenges in an integrated framework. LANCET is based on a theoretical foundation characterizing the properties that the labeled dataset must satisfy to train an effective prediction model, namely the Covariate-shift and the Continuity conditions. First, guided by the Covariate-shift condition, LANCET maps raw input data into a semantic feature space, where an unlabeled object is expected to share the same label with its near-by labeled neighbor. Next, guided by the Continuity condition, LANCET selects objects for labeling, aimingmore »to ensure that unlabeled objects always have some sufficiently close labeled neighbors. These two strategies jointly maximize the accuracy of the automatically produced labels and the prediction accuracy of the machine learning models trained on these labels. Lastly, LANCET uses a distribution matching network to verify whether both the Covariate-shift and Continuity conditions hold, in which case it would be safe to terminate the labeling process. Our experiments on diverse public data sets demonstrate that LANCET consistently outperforms the state-of-the-art methods from Snuba to GOGGLES and other baselines by a large margin - up to 30 percentage points increase in accuracy.« less
  3. We consider the communication complexity of a number of distributed optimization problems. We start with the problem of solving a linear system. Suppose there is a coordinator together with s servers P1, …, Ps, the i-th of which holds a subset A(i) x = b(i) of ni constraints of a linear system in d variables, and the coordinator would like to output an x ϵ ℝd for which A(i) x = b(i) for i = 1, …, s. We assume each coefficient of each constraint is specified using L bits. We first resolve the randomized and deterministic communication complexity in the point-to-point model of communication, showing it is (d2 L + sd) and (sd2L), respectively. We obtain similar results for the blackboard communication model. As a result of independent interest, we show the probability a random matrix with integer entries in {–2L, …, 2L} is invertible is 1–2−Θ(dL), whereas previously only 1 – 2−Θ(d) was known. When there is no solution to the linear system, a natural alternative is to find the solution minimizing the ℓp loss, which is the ℓp regression problem. While this problem has been studied, we give improved upper or lower bounds for every value ofmore »p ≥ 1. One takeaway message is that sampling and sketching techniques, which are commonly used in earlier work on distributed optimization, are neither optimal in the dependence on d nor on the dependence on the approximation ε, thus motivating new techniques from optimization to solve these problems. Towards this end, we consider the communication complexity of optimization tasks which generalize linear systems, such as linear, semi-definite, and convex programming. For linear programming, we first resolve the communication complexity when d is constant, showing it is (sL) in the point-to-point model. For general d and in the point-to-point model, we show an Õ(sd3L) upper bound and an (d2 L + sd) lower bound. In fact, we show if one perturbs the coefficients randomly by numbers as small as 2−Θ(L), then the upper bound is Õ(sd2L) + poly(dL), and so this bound holds for almost all linear programs. Our study motivates understanding the bit complexity of linear programming, which is related to the running time in the unit cost RAM model with words of O(log(nd)) bits, and we give the fastest known algorithms for linear programming in this model. Read More: https://epubs.siam.org/doi/10.1137/1.9781611975994.106« less
  4. We consider a large-scale service system where incoming tasks have to be instantaneously dispatched to one out of many parallel server pools. The user-perceived performance degrades with the number of concurrent tasks and the dispatcher aims at maximizing the overall quality of service by balancing the load through a simple threshold policy. We demonstrate that such a policy is optimal on the fluid and diffusion scales, while only involving a small communication overhead, which is crucial for large-scale deployments. In order to set the threshold optimally, it is important, however, to learn the load of the system, which may be unknown. For that purpose, we design a control rule for tuning the threshold in an online manner. We derive conditions that guarantee that this adaptive threshold settles at the optimal value, along with estimates for the time until this happens. In addition, we provide numerical experiments that support the theoretical results and further indicate that our policy copes effectively with time-varying demand patterns. Summary of Contribution: Data centers and cloud computing platforms are the digital factories of the world, and managing resources and workloads in these systems involves operations research challenges of an unprecedented scale. Due to the massive size,more »complex dynamics, and wide range of time scales, the design and implementation of optimal resource-allocation strategies is prohibitively demanding from a computation and communication perspective. These resource-allocation strategies are essential for certain interactive applications, for which the available computing resources need to be distributed optimally among users in order to provide the best overall experienced performance. This is the subject of the present article, which considers the problem of distributing tasks among the various server pools of a large-scale service system, with the objective of optimizing the overall quality of service provided to users. A solution to this load-balancing problem cannot rely on maintaining complete state information at the gateway of the system, since this is computationally unfeasible, due to the magnitude and complexity of modern data centers and cloud computing platforms. Therefore, we examine a computationally light load-balancing algorithm that is yet asymptotically optimal in a regime where the size of the system approaches infinity. The analysis is based on a Markovian stochastic model, which is studied through fluid and diffusion limits in the aforementioned large-scale regime. The article analyzes the load-balancing algorithm theoretically and provides numerical experiments that support and extend the theoretical results.« less
  5. Reduced bases have been introduced for the approximation of parametrized PDEs in applications where many online queries are required. Their numerical efficiency for such problems has been theoretically confirmed in Binev et al. ( SIAM J. Math. Anal. 43 (2011) 1457–1472) and DeVore et al. ( Constructive Approximation 37 (2013) 455–466), where it is shown that the reduced basis space V n of dimension n , constructed by a certain greedy strategy, has approximation error similar to that of the optimal space associated to the Kolmogorov n -width of the solution manifold. The greedy construction of the reduced basis space is performed in an offline stage which requires at each step a maximization of the current error over the parameter space. For the purpose of numerical computation, this maximization is performed over a finite training set obtained through a discretization of the parameter domain. To guarantee a final approximation error ε for the space generated by the greedy algorithm requires in principle that the snapshots associated to this training set constitute an approximation net for the solution manifold with accuracy of order ε . Hence, the size of the training set is the ε covering number for M and thismore »covering number typically behaves like exp( Cε −1/s ) for some C  > 0 when the solution manifold has n -width decay O ( n −s ). Thus, the shear size of the training set prohibits implementation of the algorithm when ε is small. The main result of this paper shows that, if one is willing to accept results which hold with high probability, rather than with certainty, then for a large class of relevant problems one may replace the fine discretization by a random training set of size polynomial in ε −1 . Our proof of this fact is established by using inverse inequalities for polynomials in high dimensions.« less