The cumulative pebbling complexity of a directed acyclic graph G is defined as cc(G) = min_P ∑_i P_i, where the minimum is taken over all legal (parallel) black pebblings of G and P_i denotes the number of pebbles on the graph during round i. Intuitively, cc(G) captures the amortized SpaceTime complexity of pebbling m copies of G in parallel. The cumulative pebbling complexity of a graph G is of particular interest in the field of cryptography as cc(G) is tightly related to the amortized AreaTime complexity of the DataIndependent MemoryHard Function (iMHF) f_{G,H} [Joël Alwen and Vladimir Serbinenko, 2015] defined using a constant indegree directed acyclic graph (DAG) G and a random oracle H(⋅). A secure iMHF should have amortized SpaceTime complexity as high as possible, e.g., to deter bruteforce password attacker who wants to find x such that f_{G,H}(x) = h. Thus, to analyze the (in)security of a candidate iMHF f_{G,H}, it is crucial to estimate the value cc(G) but currently, upper and lower bounds for leading iMHF candidates differ by several orders of magnitude. Blocki and Zhou recently showed that it is NPHard to compute cc(G), but their techniques do not even rule out an efficient (1+ε)approximation algorithmmore »
LESS: A Matrix Split and Balance Algorithm for Parallel Circuit (Optical) or Hybrid Data Center Switching and More
The research problem of how to use a highspeed circuit switch, typically an optical switch, to most effectively boost the switching capacity of a datacenter network, has been extensively studied. In this work, we focus on a different but related research problem that arises when multiple (say $s$) parallel circuit switches are used: How to best split a switching workload $D$ into subworkloads $D_1, D_2, ..., D_s$, and give them to the $s$ switches as their respective workloads, so that the overall makespan of the parallel switching system is minimized? Computing such an optimal split is unfortunately NPhard, since the circuit/optical switch incurs a nontrivial reconfiguration delay when the switch configuration has to change.
In this work, we formulate a weaker form of this problem: How to minimize the total number of nonzero entries in $D_1, D_2, ..., D_s$ (so that the overall reconfiguration cost can be kept low), under the constraint that every row or column sum of $D$ (which corresponds to the workload imposed on a sending or receiving rack respectively) is evenly split? Although this weaker problem is still NPhard, we are able to design LESS, an approximation algorithm that has a low approximation ratio of only $1+\epsilon$ more »
 Publication Date:
 NSFPAR ID:
 10167954
 Journal Name:
 IEEE/ACM 12th International Conference on Utility and Cloud Computing (UCC'19), December 25, 2019, Auckland, New Zealand
 Page Range or eLocationID:
 187 to 197
 Sponsoring Org:
 National Science Foundation
More Like this


The Minimum Circuit Size Problem (MCSP) has been the focus of intense study recently; MCSP is hard for SZK under rather powerful reductions, and is provably not hard under "local" reductions computable in TIME(n^0.49) . The question of whether MCSP is NPhard (or indeed, hard even for small subclasses of P) under some of the more familiar notions of reducibility (such as manyone or Turing reductions computable in polynomial time or in AC^0) is closely related to many of the longstanding open questions in complexity theory. All prior hardness results for MCSP hold also for computing somewhat weak approximations to the circuit complexity of a function. Some of these results were proved by exploiting a connection to a notion of timebounded Kolmogorov complexity (KT) and the corresponding decision problem (MKTP). More recently, a new approach for proving improved hardness results for MKTP was developed, but this approach establishes only hardness of extremely good approximations of the form 1+o(1), and these improved hardness results are not yet known to hold for MCSP. In particular, it is known that MKTP is hard for the complexity class DET under nonuniform AC^0 mreductions, implying MKTP is not in AC^0[p] for any prime p. Itmore »

Abstract We study the lowrank phase retrieval problem, where our goal is to recover a $d_1\times d_2$ lowrank matrix from a series of phaseless linear measurements. This is a fourthorder inverse problem, as we are trying to recover factors of a matrix that have been observed, indirectly, through some quadratic measurements. We propose a solution to this problem using the recently introduced technique of anchored regression. This approach uses two different types of convex relaxations: we replace the quadratic equality constraints for the phaseless measurements by a search over a polytope and enforce the rank constraint through nuclear norm regularization. The result is a convex program in the space of $d_1 \times d_2$ matrices. We analyze two specific scenarios. In the first, the target matrix is rank$1$, and the observations are structured to correspond to a phaseless blind deconvolution. In the second, the target matrix has general rank, and we observe the magnitudes of the inner products against a series of independent Gaussian random matrices. In each of these problems, we show that anchored regression returns an accurate estimate from a nearoptimal number of measurements given that we have access to an anchor matrix of sufficient quality. We also showmore »

Abstract We consider a natural generalization of classical scheduling problems to a setting in which using a time unit for processing a job causes some timedependent cost, the timeofuse tariff, which must be paid in addition to the standard scheduling cost. We focus on preemptive singlemachine scheduling and two classical scheduling cost functions, the sum of (weighted) completion times and the maximum completion time, that is, the makespan. While these problems are easy to solve in the classical scheduling setting, they are considerably more complex when timeofuse tariffs must be considered. We contribute optimal polynomialtime algorithms and best possible approximation algorithms. For the problem of minimizing the total (weighted) completion time on a single machine, we present a polynomialtime algorithm that computes for any given sequence of jobs an optimal schedule, i.e., the optimal set of time slots to be used for preemptively scheduling jobs according to the given sequence. This result is based on dynamic programming using a subtle analysis of the structure of optimal solutions and a potential function argument. With this algorithm, we solve the unweighted problem optimally in polynomial time. For the more general problem, in which jobs may have individual weights, we develop a polynomialtimemore »

We present the first alloptical network, Baldur, to enable powerefficient and highspeed communications in future exascale computing systems. The essence of Baldur is its ability to perform packet routing onthefly in the optical domain using an emerging technology called the transistor laser (TL), which presents interesting opportunities and challenges at the system level. Optical packet switching readily eliminates many inefficiencies associated with the crossings between optical and electrical domains. However, TL gates consume high power at the current technology node, which makes TLbased buffering and optical clock recovery impractical. Consequently, we must adopt novel (bufferless and clockless) architecture and design approaches that are substantially different from those used in current networks. At the architecture level, we support a bufferless design by turning to techniques that have fallen out of favor for current networks. Baldur uses a lowradix, multistage network with a simple routing algorithm that drops packets to handle congestion, and we further incorporate path multiplicity and randomness to minimize packet drops. This design also minimizes the number of TL gates needed in each switch. At the logic design level, a nonconventional, lengthbased data encoding scheme is used to eliminate the need for clock recovery. We thoroughly validate and evaluatemore »