skip to main content

Title: Logarithmic heavy traffic error bounds in generalized switch and load balancing systems
Abstract Motivated by applications to wireless networks, cloud computing, data centers, etc., stochastic processing networks have been studied in the literature under various asymptotic regimes. In the heavy traffic regime, the steady-state mean queue length is proved to be $\Theta({1}/{\epsilon})$ , where $\epsilon$ is the heavy traffic parameter (which goes to zero in the limit). The focus of this paper is on obtaining queue length bounds on pre-limit systems, thus establishing the rate of convergence to heavy traffic. For the generalized switch, operating under the MaxWeight algorithm, we show that the mean queue length is within $\textrm{O}({\log}({1}/{\epsilon}))$ of its heavy traffic limit. This result holds regardless of the complete resource pooling (CRP) condition being satisfied. Furthermore, when the CRP condition is satisfied, we show that the mean queue length under the MaxWeight algorithm is within $\textrm{O}({\log}({1}/{\epsilon}))$ of the optimal scheduling policy. Finally, we obtain similar results for the rate of convergence to heavy traffic of the total queue length in load balancing systems operating under the ‘join the shortest queue’ routeing algorithm.  more » « less
Award ID(s):
1850439 2144316
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Journal of Applied Probability
Page Range / eLocation ID:
652 to 669
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The matrix completion problem seeks to recover a $d\times d$ ground truth matrix of low rank $r\ll d$ from observations of its individual elements. Real-world matrix completion is often a huge-scale optimization problem, with $d$ so large that even the simplest full-dimension vector operations with $O(d)$ time complexity become prohibitively expensive. Stochastic gradient descent (SGD) is one of the few algorithms capable of solving matrix completion on a huge scale, and can also naturally handle streaming data over an evolving ground truth. Unfortunately, SGD experiences a dramatic slow-down when the underlying ground truth is ill-conditioned; it requires at least $O(\kappa\log(1/\epsilon))$ iterations to get $\epsilon$-close to ground truth matrix with condition number $\kappa$. In this paper, we propose a preconditioned version of SGD that preserves all the favorable practical qualities of SGD for huge-scale online optimization while also making it agnostic to $\kappa$. For a symmetric ground truth and the Root Mean Square Error (RMSE) loss, we prove that the preconditioned SGD converges to $\epsilon$-accuracy in $O(\log(1/\epsilon))$ iterations, with a rapid linear convergence rate as if the ground truth were perfectly conditioned with $\kappa=1$. In our numerical experiments, we observe a similar acceleration for ill-conditioned matrix completion under the 1-bit cross-entropy loss, as well as pairwise losses such as the Bayesian Personalized Ranking (BPR) loss. 
    more » « less
  2. Abstract We present a new elementary algorithm that takes $$ \textrm{time} \ \ O_\epsilon \left( x^{\frac{3}{5}} (\log x)^{\frac{8}{5}+\epsilon } \right) \ \ \textrm{and} \ \textrm{space} \ \ O\left( x^{\frac{3}{10}} (\log x)^{\frac{13}{10}} \right) $$ time O ϵ x 3 5 ( log x ) 8 5 + ϵ and space O x 3 10 ( log x ) 13 10 (measured bitwise) for computing $$M(x) = \sum _{n \le x} \mu (n),$$ M ( x ) = ∑ n ≤ x μ ( n ) , where $$\mu (n)$$ μ ( n ) is the Möbius function. This is the first improvement in the exponent of x for an elementary algorithm since 1985. We also show that it is possible to reduce space consumption to $$O(x^{1/5} (\log x)^{5/3})$$ O ( x 1 / 5 ( log x ) 5 / 3 ) by the use of (Helfgott in: Math Comput 89:333–350, 2020), at the cost of letting time rise to the order of $$x^{3/5} (\log x)^2 \log \log x$$ x 3 / 5 ( log x ) 2 log log x . 
    more » « less
  3. In this paper, we show that under over-parametrization several standard stochastic optimization algorithms escape saddle-points and converge to local-minimizers much faster. One of the fundamental aspects of over-parametrized models is that they are capable of interpolating the training data. We show that, under interpolation-like assumptions satisfied by the stochastic gradients in an overparametrization setting, the first-order oracle complexity of Perturbed Stochastic Gradient Descent (PSGD) algorithm to reach an \epsilon-local-minimizer, matches the corresponding deterministic rate of ˜O(1/\epsilon^2). We next analyze Stochastic Cubic-Regularized Newton (SCRN) algorithm under interpolation-like conditions, and show that the oracle complexity to reach an \epsilon-local-minimizer under interpolation-like conditions, is ˜O(1/\epsilon^2.5). While this obtained complexity is better than the corresponding complexity of either PSGD, or SCRN without interpolation-like assumptions, it does not match the rate of ˜O(1/\epsilon^1.5) corresponding to deterministic Cubic-Regularized Newton method. It seems further Hessian-based interpolation-like assumptions are necessary to bridge this gap. We also discuss the corresponding improved complexities in the zeroth-order settings. 
    more » « less
  4. We consider the problem of performing linear regression over a stream of d-dimensional examples, and show that any algorithm that uses a subquadratic amount of memory exhibits a slower rate of convergence than can be achieved without memory constraints. Specifically, consider a sequence of labeled examples (a_1,b_1), (a_2,b_2)..., with a_i drawn independently from a d-dimensional isotropic Gaussian, and where b_i = + \eta_i, for a fixed x in R^d with ||x||= 1 and with independent noise \eta_i drawn uniformly from the interval [-2^{-d/5},2^{-d/5}]. We show that any algorithm with at most d^2/4 bits of memory requires at least \Omega(d \log \log \frac{1}{\epsilon}) samples to approximate x to \ell_2 error \epsilon with probability of success at least 2/3, for \epsilon sufficiently small as a function of d. In contrast, for such \epsilon, x can be recovered to error \epsilon with probability 1-o(1) with memory O\left(d^2 \log(1/\epsilon)\right) using d examples. This represents the first nontrivial lower bounds for regression with super-linear memory, and may open the door for strong memory/sample tradeoffs for continuous optimization. 
    more » « less
  5. We characterize heavy-traffic process and steady-state limits for systems staffed according to the square-root safety rule, when the service requirements of the customers are perfectly correlated with their individual patience for waiting in queue. Under the usual many-server diffusion scaling, we show that the system is asymptotically equivalent to a system with no abandonment. In particular, the limit is the Halfin-Whitt diffusion for the [Formula: see text] queue when the traffic intensity approaches its critical value 1 from below, and is otherwise a transient diffusion, despite the fact that the prelimit is positive recurrent. To obtain a refined measure of the congestion due to the correlation, we characterize a lower-order fluid (LOF) limit for the case in which the diffusion limit is transient, demonstrating that the queue in this case scales like [Formula: see text]. Under both the diffusion and LOF scalings, we show that the stationary distributions converge weakly to the time-limiting behavior of the corresponding process limit. Funding: This work was supported by the National Natural Science Foundation of China [Grant 72188101] and the Division of Civil, Mechanical and Manufacturing Innovation [Grants 1763100 and 2006350]. 
    more » « less