How Much Data Is Sufficient to Learn High-Performing Algorithms?
Algorithms often have tunable parameters that impact performance metrics such as runtime and solution quality. For many algorithms used in practice, no parameter settings admit meaningful worst-case bounds, so the parameters are made available for the user to tune. Alternatively, parameters may be tuned implicitly within the proof of a worst-case approximation ratio or runtime bound. Worst-case instances, however, may be rare or nonexistent in practice. A growing body of research has demonstrated that a data-driven approach to parameter tuning can lead to significant improvements in performance. This approach uses a training set of problem instances sampled from an unknown, application-specific distribution and returns a parameter setting with strong average performance on the training set. We provide techniques for deriving generalization guarantees that bound the difference between the algorithm’s average performance over the training set and its expected performance on the unknown distribution. Our results apply no matter how the parameters are tuned, be it via an automated or manual approach. The challenge is that for many types of algorithms, performance is a volatile function of the parameters: slightly perturbing the parameters can cause a large change in behavior. Prior research [e.g., 12, 16, 20, 62] has proved generalization bounds by employing case-by-case analyses of greedy algorithms, clustering algorithms, integer programming algorithms, and selling mechanisms. We streamline these analyses with a general theorem that applies whenever an algorithm’s performance is a piecewise-constant, piecewise-linear, or, more generally, piecewise-structured function of its parameters. Our results, which are tight up to logarithmic factors in the worst case, also imply novel bounds for configuring dynamic programming algorithms from computational biology.
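As a concrete illustration of this setup (our sketch, not the paper's: a greedy knapsack family with a tunable scoring exponent rho is a standard running example in the data-driven algorithm design literature, and the instance distribution below is invented), the following tunes the parameter by empirical averaging over a training set. For a fixed instance, the packed value is a piecewise-constant function of rho, since the algorithm's behavior changes only where the ranking of two items' scores flips; this is exactly the piecewise structure the general theorem exploits.

```python
import random

def greedy_knapsack_value(values, sizes, capacity, rho):
    """Pack greedily by the score values[i] / sizes[i]**rho.

    For a fixed instance, the returned value is a piecewise-constant
    function of rho: it changes only at the finitely many rho where two
    items' scores swap order.
    """
    order = sorted(range(len(values)),
                   key=lambda i: values[i] / sizes[i] ** rho,
                   reverse=True)
    total, used = 0.0, 0.0
    for i in order:
        if used + sizes[i] <= capacity:
            used += sizes[i]
            total += values[i]
    return total

def tune_rho(instances, rho_grid):
    """Return the grid point with the best average performance on the
    training set (the quantity the generalization guarantees compare
    against expected performance on the unknown distribution)."""
    def avg(rho):
        return sum(greedy_knapsack_value(v, s, c, rho)
                   for v, s, c in instances) / len(instances)
    return max(rho_grid, key=avg)

# Hypothetical application-specific distribution over instances.
random.seed(0)
def sample_instance(n=50, capacity=10.0):
    return ([random.uniform(0.1, 1.0) for _ in range(n)],
            [random.uniform(0.1, 1.0) for _ in range(n)],
            capacity)

train = [sample_instance() for _ in range(200)]
rho_hat = tune_rho(train, [k / 10 for k in range(31)])  # grid on [0, 3]
print("tuned rho:", rho_hat)
```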
- Award ID(s): 2232121
- PAR ID: 10572295
- Publisher / Repository: ACM
- Date Published:
- Journal Name: Journal of the ACM
- Volume: 71
- Issue: 5
- ISSN: 0004-5411
- Page Range / eLocation ID: 1 to 58
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- We prove novel algorithmic guarantees for several online problems in the smoothed analysis model. In this model, at each time step an adversary chooses an input distribution with density function bounded above pointwise by \(\tfrac{1}{\sigma}\) times that of the uniform distribution; nature then samples an input from this distribution. Here, σ is a parameter that interpolates between the extremes of worst-case and average-case analysis. Crucially, our results hold for adaptive adversaries that can base their choice of input distribution on the decisions of the algorithm and the realizations of the inputs in the previous time steps. An adaptive adversary can nontrivially correlate inputs at different time steps with each other and with the algorithm’s current state; this appears to rule out the standard proof approaches in smoothed analysis. This paper presents a general technique for proving smoothed algorithmic guarantees against adaptive adversaries, in effect reducing the setting of an adaptive adversary to the much simpler case of an oblivious adversary (i.e., an adversary that commits in advance to the entire sequence of input distributions). We apply this technique to prove strong smoothed guarantees for three different problems. (A toy simulation of the σ-smooth input model appears after this list.)
  (1) Online learning: We consider the online prediction problem, where instances are generated from an adaptive sequence of σ-smooth distributions and the hypothesis class has VC dimension d. We bound the regret by \(\tilde{O}(\sqrt{Td\ln(1/\sigma)} + d\ln(T/\sigma))\) and provide a near-matching lower bound. Our result shows that under smoothed analysis, learnability against adaptive adversaries is characterized by the finiteness of the VC dimension. This is in contrast to worst-case analysis, where online learnability is characterized by the Littlestone dimension (which is infinite even in the extremely restricted case of one-dimensional threshold functions). Our results fully answer an open question of Rakhlin et al. [64].
  (2) Online discrepancy minimization: We consider the setting of the online Komlós problem, where the input is generated from an adaptive sequence of σ-smooth and isotropic distributions on the \(\ell_2\) unit ball. We bound the \(\ell_\infty\) norm of the discrepancy vector by \(\tilde{O}(\ln^2(\frac{nT}{\sigma}))\). This is in contrast to worst-case analysis, where the tight discrepancy bound is \(\Theta(\sqrt{T/n})\). We show such \(\mathrm{polylog}(nT/\sigma)\) discrepancy guarantees are not achievable for non-isotropic σ-smooth distributions.
  (3) Dispersion in online optimization: We consider online optimization with piecewise Lipschitz functions, where functions with ℓ discontinuities are chosen by a smoothed adaptive adversary, and show that the resulting sequence is \(({\sigma}/{\sqrt{T\ell}}, \tilde{O}(\sqrt{T\ell}))\)-dispersed. That is, every ball of radius \({\sigma}/{\sqrt{T\ell}}\) is split by \(\tilde{O}(\sqrt{T\ell})\) of the partitions made by these functions. This result matches the dispersion parameters of Balcan et al. [13] for oblivious smooth adversaries, up to logarithmic factors. On the other hand, worst-case sequences are trivially (0, T)-dispersed.
- Since the mid-1980s it has been known that Byzantine Agreement can be solved with probability 1 asynchronously, even against an omniscient, computationally unbounded adversary that can adaptively corrupt up to \(f < n/3\) parties. Moreover, the problem is insoluble with \(f \ge n/3\) corruptions. However, Bracha’s [13] 1984 protocol (see also Ben-Or [8]) achieved \(f < n/3\) resilience at the cost of exponential expected latency \(2^{\Theta(n)}\), a bound that has never been improved in this model with \(f = \lfloor (n-1)/3 \rfloor\) corruptions. In this article, we prove that Byzantine Agreement in the asynchronous, full-information model can be solved with probability 1 against an adaptive adversary that can corrupt \(f < n/3\) parties, while incurring only polynomial latency with high probability. Our protocol follows an earlier polynomial-latency protocol of King and Saia [33, 34], which had suboptimal resilience, namely \(f \approx n/10^9\) [33, 34]. Resilience \(f = (n-1)/3\) is uniquely difficult, as this is the point at which the influences of the Byzantine and honest players are of roughly equal strength. The core technical problem we solve is to design a collective coin-flipping protocol that eventually lets us flip a coin with an unambiguous outcome. In the beginning, the influence of the Byzantine players is too powerful to overcome, and they can essentially fix the coin’s behavior at will. We guarantee that after just a polynomial number of executions of the coin-flipping protocol, either (a) the Byzantine players fail to fix the behavior of the coin (thereby ending the game) or (b) we can “blacklist” players such that the blacklisting rate for Byzantine players is at least as large as the blacklisting rate for good players. The blacklisting criterion is based on a simple statistical test of fraud detection. (A toy version of such a test appears after this list.)
- A constraint satisfaction problem (CSP), \(\textsf{Max-CSP}(\mathcal{F})\), is specified by a finite set of constraints \(\mathcal{F} \subseteq \lbrace [q]^k \rightarrow \lbrace 0,1\rbrace \rbrace\) for positive integers q and k. An instance of the problem on n variables is given by m applications of constraints from \(\mathcal{F}\) to subsequences of the n variables, and the goal is to find an assignment to the variables that satisfies the maximum number of constraints. In the (γ, β)-approximation version of the problem for parameters 0 ≤ β ≤ γ ≤ 1, the goal is to distinguish instances where at least a γ fraction of the constraints can be satisfied from instances where at most a β fraction of the constraints can be satisfied. In this work, we consider the approximability of this problem in the context of sketching algorithms and give a dichotomy result. Specifically, for every family \(\mathcal{F}\) and every β < γ, we show that either a linear sketching algorithm solves the problem in polylogarithmic space or the problem is not solvable by any sketching algorithm in \(o(\sqrt{n})\) space. In particular, we give non-trivial approximation algorithms using polylogarithmic space for infinitely many constraint satisfaction problems. We also extend previously known lower bounds for general streaming algorithms to a wide variety of problems, in particular the case of q = k = 2, where we get a dichotomy, and the case when the satisfying assignments of the constraints of \(\mathcal{F}\) support a distribution on \([q]^k\) with uniform marginals. Prior to this work, other than sporadic examples, the only systematic classes of CSPs that were analyzed considered the setting of Boolean variables (q = 2), binary constraints (k = 2), and singleton families (\(|\mathcal{F}| = 1\)), and only considered the setting where constraints are placed on literals rather than variables. Our positive results show wide applicability of the bias-based algorithms used previously by [47] and [41], which we extend to include richer norm-estimation algorithms, by giving a systematic way to discover biases. (A toy bias-based heuristic for Max-DICUT appears after this list.) Our negative results combine the Fourier-analytic methods of [56], which we extend to a wider class of CSPs, with a rich collection of reductions among communication complexity problems that lie at the heart of the negative results. In particular, previous works used Fourier analysis over the Boolean cube to initiate their results, and the results seemed particularly tailored to functions on Boolean literals (i.e., with negations). Our techniques surprisingly allow us to get to general q-ary CSPs without negations by appealing to the same Fourier-analytic starting point over Boolean hypercubes.
- We consider the linear third-order (in time) PDE known as the SMGTJ equation, defined on a bounded domain, under the action of either Dirichlet or Neumann boundary control \(g\). Optimal interior and boundary regularity results were given in [1], after [41], when \(g \in L^2(0,T; L^2(\Gamma)) \equiv L^2(\Sigma)\), which, moreover, in the canonical case \(\gamma = 0\), were expressed by the well-known explicit representation formulae of the wave equation in terms of cosine/sine operators [19], [17], [24, Vol. II]. The interior or boundary regularity theory is however the same, whether \(\gamma = 0\) or \(0 \neq \gamma \in L^{\infty}(\Omega)\), since \(\gamma \neq 0\) is responsible only for lower-order terms. Here we exploit such cosine-operator-based explicit representation formulae to provide optimal interior and boundary regularity results with \(g\) "smoother" than \(L^2(\Sigma)\), qualitatively by one unit, two units, etc., in the Dirichlet boundary case. To this end, we invoke the corresponding results for wave equations, as in [17]. Similarly for the Neumann boundary case, by invoking the corresponding results for the wave equation as in [22], [23], [37] for control smoother than \(L^2(0,T; L^2(\Gamma))\), and [44] for control less regular in space than \(L^2(\Gamma)\). In addition, we provide optimal interior and boundary regularity results when the SMGTJ equation is subject to interior point control, by invoking the corresponding wave equation results [42], [24, Section 9.8.2]. (The displayed form of the equation, as standard in this literature, is recorded after this list.)
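The σ-smooth input model in the first abstract above can be made concrete with a minimal simulation (our illustration under invented parameters, not the paper's construction). The adversary adaptively picks, as a function of the full history, a distribution on [0, 1] with density at most 1/σ; uniform on an interval of length σ is the most concentrated distribution the model permits.

```python
import random

SIGMA = 0.05  # smoothness parameter: chosen densities are at most 1/SIGMA

def adaptive_smooth_choice(history):
    """Hypothetical adaptive adversary: pick a sigma-smooth distribution
    on [0, 1] as a function of everything observed so far. Uniform on an
    interval of length SIGMA has density exactly 1/SIGMA, the maximum the
    model allows."""
    center = sum(history) / len(history) if history else 0.5
    lo = min(max(center - SIGMA / 2, 0.0), 1.0 - SIGMA)
    return lo, lo + SIGMA

def run(T=1000):
    history = []
    for _ in range(T):
        lo, hi = adaptive_smooth_choice(history)  # adversary moves first,
        history.append(random.uniform(lo, hi))    # then nature samples
    return history

samples = run()
print(min(samples), max(samples))
```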
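The statistical fraud-detection idea in the second abstract above can be caricatured as follows (a toy sketch under our own assumptions; the actual protocol's coin-flipping contributions, test statistic, and thresholds are more involved). An honest player's ±1 contributions are fair coin flips, so its sum over k rounds concentrates within roughly \(\sqrt{k}\); players whose cumulative sum exceeds a Chernoff-style threshold are blacklisted.

```python
import math
import random

def fraud_blacklist(sums, num_rounds, n, c=3.0):
    """Toy test: blacklist any player whose cumulative +/-1 contribution
    deviates by more than c * sqrt(k * ln n), which a fair coin violates
    only with small probability."""
    threshold = c * math.sqrt(num_rounds * math.log(max(n, 2)))
    return {p for p, s in enumerate(sums) if abs(s) > threshold}

# Simulation: honest players flip fair coins; Byzantine players push the
# shared coin toward +1.
random.seed(1)
n, f, rounds = 30, 9, 4000
sums = [0] * n
for _ in range(rounds):
    for p in range(n):
        bias = 0.9 if p < f else 0.5  # players 0..f-1 are Byzantine
        sums[p] += 1 if random.random() < bias else -1

print(fraud_blacklist(sums, rounds, n))  # typically flags exactly {0,...,8}
```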
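The "bias" notion driving the positive results in the third abstract above can be seen in a toy single-pass heuristic for Max-DICUT (our illustration; the paper's algorithms estimate biases with sketches in polylogarithmic space rather than exact tallies, and use them to approximate the optimal value rather than to output an assignment).

```python
from collections import defaultdict

def bias_based_dicut(edges):
    """Toy bias heuristic for Max-DICUT, where constraint (u, v) means the
    directed edge u -> v and is satisfied when u = 1 and v = 0. A
    variable's bias is its out-degree minus in-degree; each variable is
    assigned the value its bias prefers."""
    bias = defaultdict(int)
    for u, v in edges:   # one pass over the constraint stream
        bias[u] += 1     # many out-edges: u prefers the value 1
        bias[v] -= 1     # many in-edges: v prefers the value 0
    assignment = {x: 1 if b > 0 else 0 for x, b in bias.items()}
    satisfied = sum(assignment[u] == 1 and assignment[v] == 0
                    for u, v in edges)
    return assignment, satisfied

edges = [(0, 1), (0, 2), (3, 0), (2, 1), (3, 1)]
print(bias_based_dicut(edges))  # satisfies 3 of the 5 constraints
```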
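The fourth abstract above never displays the SMGTJ equation itself. For the reader's convenience, the following records the form standard in this literature (our addition, not quoted from the paper; γ is the usual constant whose vanishing marks the canonical case).

```latex
% Linear SMGTJ (Moore-Gibson-Thompson-Jordan type) equation on
% (0,T) x Omega, as commonly written in this literature (assumed form):
\[
  u_{ttt} + \alpha\, u_{tt} - c^{2}\, \Delta u - b\, \Delta u_{t} = 0
  \quad \text{in } (0,T) \times \Omega,
  \qquad \gamma := \alpha - \frac{c^{2}}{b},
\]
% with boundary control g acting either in the Dirichlet sense
% (u = g on Sigma) or in the Neumann sense (du/dnu = g on Sigma);
% gamma = 0 is the canonical (conservative) case referenced above.
```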