

Title: A Unified Approach to Discrepancy Minimization
We study a unified approach and algorithm for constructive discrepancy minimization based on a stochastic process. By varying the parameters of the process, one can recover various state-of-the-art results. We demonstrate the flexibility of the method by deriving a discrepancy bound for smoothed instances, which interpolates between known bounds for worst-case and random instances.
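As a point of reference for the stochastic-process viewpoint, here is a minimal sketch of one well-known process of this kind: a biased (self-balancing) random walk over sign assignments, in the spirit of Alweiss, Liu, and Sawhney. The function name, the bias scale c, and the random test instance are illustrative assumptions, not the algorithm or parameter choices of this paper.

```python
import numpy as np

def self_balancing_signs(vectors, c=30.0, seed=0):
    """Assign signs in {-1, +1} to the given vectors so that the signed sum
    stays small, by biasing each sign against the current running sum.
    Illustrative sketch only; not the algorithm analyzed in the paper above."""
    rng = np.random.default_rng(seed)
    n, d = vectors.shape
    w = np.zeros(d)                        # running signed sum
    signs = np.empty(n, dtype=int)
    scale = c * np.log(max(n, 2))          # bias scale -- an assumed parameter
    for i, v in enumerate(vectors):
        # The sign -1 becomes more likely when v points in the direction of w.
        p_plus = 0.5 * (1.0 - np.clip(np.dot(w, v) / scale, -1.0, 1.0))
        signs[i] = 1 if rng.random() < p_plus else -1
        w += signs[i] * v
    return signs, w

# Example: balance 200 random +/-1 vectors in 50 dimensions.
vecs = np.sign(np.random.default_rng(1).standard_normal((200, 50)))
signs, w = self_balancing_signs(vecs)
print("l_inf norm of the signed sum:", np.abs(w).max())
```

The bias rule is the kind of parameter such a process exposes; varying choices of this sort is, roughly, how a unified treatment can recover different bounds.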
Award ID(s):
2106444
NSF-PAR ID:
10414227
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
RANDOM-APPROX
Volume:
245
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Bae, K.-H. ; Feng, B. ; Kim, S. ; Lazarova-Molnar, S. ; Zheng, Z. ; Roeder, T. ; Thiesing, R. (Ed.)
    In the subset-selection approach to ranking and selection, a decision-maker seeks a subset of simulated systems that contains the best with high probability. We present a new, generalized framework for constructing these subsets and demonstrate that some existing subset-selection procedures are situated within this framework. The subsets are built by calculating, for each system, a minimum standardized discrepancy between the observed performances and the space of problem instances for which that system is the best. A system’s minimum standardized discrepancy is then compared to a cutoff to determine whether the system is included in the subset. We examine the problem of finding the tightest statistically valid cutoff for each system and draw connections between our approach and other subset-selection methodologies. Simulation experiments demonstrate how the screening power and subset size are affected by the choice of standardized discrepancy. 
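    (A hedged Python sketch of this discrepancy-versus-cutoff recipe appears after this list.)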
  2.
    Finding a locally optimal solution for MAX-CUT or MAX-k-CUT is a well-known PLS-complete problem. An instinctive approach to finding such a locally optimal solution is the FLIP method. Even though FLIP requires exponential time on worst-case instances, it tends to terminate quickly in practical instances. To explain this discrepancy, the run-time of FLIP has been studied in the smoothed complexity framework. Etscheid and Röglin (ACM Transactions on Algorithms, 2017) showed that the smoothed complexity of FLIP for MAX-CUT in arbitrary graphs is quasi-polynomial. Angel, Bubeck, Peres, and Wei (STOC, 2017) showed that the smoothed complexity of FLIP for MAX-CUT in complete graphs is O(Φ^5 n^{15.1}), where Φ is an upper bound on the random edge-weight density and n is the number of vertices in the input graph. While Angel, Bubeck, Peres, and Wei's result showed the first polynomial smoothed complexity, they also conjectured that their run-time bound is far from optimal. In this work, we make substantial progress toward improving the run-time bound. We prove that the smoothed complexity of FLIP for MAX-CUT in complete graphs is O(Φ n^{7.83}). Our results are based on a carefully chosen matrix whose rank captures the run-time of the method, along with improved rank bounds for this matrix and an improved union bound based on it. In addition, our techniques provide a general framework for analyzing FLIP in the smoothed setting. We illustrate this general framework by showing that the smoothed complexity of FLIP for MAX-3-CUT in complete graphs is polynomial and for MAX-k-CUT in arbitrary graphs is quasi-polynomial. We believe that our techniques should also be of interest toward showing smoothed polynomial complexity of FLIP for MAX-k-CUT in complete graphs for larger constants k.
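    (A hedged Python sketch of the FLIP local search appears after this list.)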
  3. Online card transaction fraud is one of the major threats to the bottom line of E-commerce merchants. In this paper, we propose a novel method for online merchants to utilize disposable (“one-time use”) domain names to detect client IP spoofing by collecting the client's DNS information during an E-commerce transaction, which in turn can help with transaction fraud detection. By inserting a dynamically generated unique hostname on the E-commerce transaction webpage, a client will issue an identifiable DNS query to the customized authoritative DNS server maintained by the online merchant. In this way, the online merchant is able to collect the DNS configuration of the client and match it with the client's corresponding transaction in order to verify the consistency of the client's IP address. Any discrepancy can reveal proxy usage, which fraudsters commonly use to spoof their true origins. We have deployed our preliminary prototype system on a real online merchant and successfully collected clients' DNS queries correlated with their web transactions; we then show some real instances of successful fraud detection using this method. We also address some concerns regarding the use of disposable domains.
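    (A hedged Python sketch of the disposable-hostname mechanism appears after this list.)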
  4. This paper introduces kdiff, a novel kernel-based measure for estimating distances between instances of time series, random fields and other forms of structured data. This measure is based on the idea of matching distributions that only overlap over a portion of their region of support. Our proposed measure is inspired by MPdist, which has been previously proposed for such datasets and is constructed using Euclidean metrics, whereas kdiff is constructed using non-linear kernel distances. Also, kdiff accounts for both self and cross similarities across the instances and is defined using a lower quantile of the distance distribution. Comparing the cross similarity to self similarity allows for measures of similarity that are more robust to noise and partial occlusions of the relevant signals. Our proposed measure kdiff is a more general form of the well-known kernel-based Maximum Mean Discrepancy distance estimated over the embeddings. Some theoretical results are provided for separability conditions using kdiff as a distance measure for clustering and classification problems where the embedding distributions can be modeled as two-component mixtures. Applications are demonstrated for clustering of synthetic and real-life time series and image data, and the performance of kdiff is compared to competing distance measures for clustering.
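    (A hedged Python sketch of a kdiff-style computation appears after this list.)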
  5. Pedestrian dynamics is an approach for modeling the fine-scaled movement of people. It is finding increasing application in the analysis of infection risk for directly transmitted diseases during air travel. A parameter sweep is often needed to evaluate infection risk for a variety of possible scenarios to account for inherent variability in human behavior. A low discrepancy parameter sweep was recently introduced to reduce the computational effort by one to three orders of magnitude. However, it has the following limitations: (i) a low overhead parallelization leads to significant load imbalance, and (ii) the convergence rate worsens with dimension. This paper examines whether pseudorandom and hybrid sequences can overcome these defects and whether the convergence criteria can be changed to yield accurate solutions faster. We simulate the deplaning process of an airplane using different parameter sweep strategies and evaluate their relative computational efficiencies. Our results show that hybrid and pseudorandom parameter sweeps are advantageous for moderate accuracy, while a low discrepancy sweep is preferable for high accuracy. Our results also show that the convergence criteria could be relaxed substantially to yield accurate solutions around a factor of 20 faster. They promise to help a variety of applications that employ large parameter sweeps for modeling infection risk. 
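    (A hedged Python sketch of the parameter-sweep comparison appears after this list.)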
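The hedged sketches referenced in the items above follow, one per item. For item 1, the toy code below instantiates the discrepancy-versus-cutoff recipe with the simplest possible statistic (a standardized gap to the best sample mean) and a fixed cutoff; both are placeholders rather than the minimum standardized discrepancies and statistically valid cutoffs developed in that paper.

```python
import numpy as np

def subset_select(samples, cutoff=2.0):
    """Toy subset selection: keep every system whose standardized gap to the
    best observed sample mean is at most `cutoff`.  The gap plays the role of
    a 'minimum standardized discrepancy'; the fixed cutoff is a placeholder,
    not a statistically valid one.  `samples` has one row per system."""
    means = samples.mean(axis=1)
    ses = samples.std(axis=1, ddof=1) / np.sqrt(samples.shape[1])
    best = means.argmax()                      # assume larger is better
    gap = (means[best] - means) / np.sqrt(ses**2 + ses[best]**2)
    return np.flatnonzero(gap <= cutoff)       # indices kept in the subset

# Example: 5 simulated systems, 30 replications each.
rng = np.random.default_rng(0)
data = rng.normal(loc=[0.0, 0.1, 0.2, 0.2, 1.0], scale=1.0, size=(30, 5)).T
print("selected systems:", subset_select(data))
```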
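For item 2, this is a plain implementation of the FLIP local search itself (repeatedly switch the side of any vertex whose move strictly increases the cut weight); it shows the heuristic whose smoothed run-time that paper bounds, not the analysis, and the random initial partition is an arbitrary choice.

```python
import numpy as np

def flip_max_cut(W, seed=0):
    """FLIP local search for MAX-CUT on a symmetric weight matrix W with zero
    diagonal: move any vertex whose side switch strictly increases the cut
    weight, and stop at a local optimum."""
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    side = rng.integers(0, 2, size=n)        # random initial 0/1 partition
    improved = True
    while improved:
        improved = False
        for v in range(n):
            same = side == side[v]
            # Flipping v turns its same-side edges into cut edges and vice versa.
            gain = W[v, same].sum() - W[v, ~same].sum()
            if gain > 0:
                side[v] ^= 1
                improved = True
    cut_weight = W[np.ix_(side == 0, side == 1)].sum()
    return side, cut_weight

# Example: complete graph on 60 vertices with Gaussian edge weights.
n = 60
rng = np.random.default_rng(1)
U = np.triu(rng.standard_normal((n, n)), 1)
side, cut = flip_max_cut(U + U.T)
print("local-optimum cut weight:", cut)
```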
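For item 3, the sketch shows the two merchant-side pieces that abstract describes: minting a disposable hostname bound to a transaction (so the client's resolver issues an identifiable query to the merchant's authoritative DNS server) and checking the resolver's address against the IP seen on the web transaction. The zone name, the /16 prefix comparison, and the token format are illustrative assumptions, not the deployed system.

```python
import secrets
import ipaddress

# Hypothetical zone served by the merchant's own authoritative DNS server.
ZONE = "probe.example-merchant.com"

def disposable_hostname(transaction_id: str) -> str:
    """Mint a one-time hostname tied to a transaction id (assumed to be a
    short alphanumeric token).  Embedding it in the checkout page makes the
    client's resolver send an identifiable DNS query to the merchant."""
    return f"{secrets.token_hex(8)}.{transaction_id}.{ZONE}"

def ip_consistent(resolver_ip: str, transaction_ip: str) -> bool:
    """Crude consistency check: compare the first two bytes (/16 for IPv4) of
    the resolver address observed via DNS and the client address observed on
    the web transaction.  A mismatch suggests a proxy; the matching logic in
    the paper is richer than this placeholder."""
    a = ipaddress.ip_address(resolver_ip).packed
    b = ipaddress.ip_address(transaction_ip).packed
    return a[:2] == b[:2]

# Example: a disposable name for one transaction, and one consistency check.
print(disposable_hostname("txn42"))
print(ip_consistent("198.51.100.7", "198.51.77.9"))
```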
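For item 4, the code below is only one plausible assembly of the ingredients the abstract names (kernel-induced distances, self versus cross similarity, and a lower quantile of the distance distribution); the exact definition of kdiff, the Gaussian kernel, and the quantile level q are assumptions here.

```python
import numpy as np

def kernel_dist(u, v, gamma=1.0):
    """Distance induced by a Gaussian kernel: sqrt(k(u,u) + k(v,v) - 2 k(u,v))."""
    k = lambda a, b: np.exp(-gamma * np.sum((a - b) ** 2))
    return np.sqrt(max(k(u, u) + k(v, v) - 2.0 * k(u, v), 0.0))

def kdiff_like(X, Y, q=0.1, gamma=1.0):
    """kdiff-flavoured score: compare a lower quantile of cross-sample kernel
    distances against the same quantile of within-sample ('self') distances."""
    cross = [kernel_dist(x, y, gamma) for x in X for y in Y]
    self_x = [kernel_dist(a, b, gamma) for i, a in enumerate(X) for b in X[i + 1:]]
    return np.quantile(cross, q) - np.quantile(self_x, q)

# Example: two Gaussian samples whose means differ.
rng = np.random.default_rng(2)
X = rng.normal(0.0, 1.0, size=(40, 3))
Y = rng.normal(0.5, 1.0, size=(40, 3))
print("kdiff-like score:", kdiff_like(X, Y))
```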
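For item 5, the sketch contrasts a low-discrepancy (Halton) parameter sweep with a pseudorandom sweep on a user-supplied simulate function; the Halton sequence stands in for whichever low-discrepancy and hybrid sequences that paper evaluates, and the toy objective is not the pedestrian-dynamics model.

```python
import numpy as np

def halton(n, dims):
    """First n points of a Halton low-discrepancy sequence in [0, 1)^dims
    (supports up to 10 dimensions with the prime bases listed below)."""
    primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29][:dims]
    def radical_inverse(i, base):
        f, r = 1.0, 0.0
        while i > 0:
            f /= base
            r += f * (i % base)
            i //= base
        return r
    return np.array([[radical_inverse(i, b) for b in primes] for i in range(1, n + 1)])

def sweep(simulate, n, dims, strategy="halton", seed=0):
    """Evaluate `simulate` at n parameter points drawn from either a Halton
    sequence or a pseudorandom generator, and return the running mean so the
    convergence of the two strategies can be compared."""
    if strategy == "halton":
        points = halton(n, dims)
    else:
        points = np.random.default_rng(seed).random((n, dims))
    values = np.array([simulate(p) for p in points])
    return np.cumsum(values) / np.arange(1, n + 1)

# Toy objective standing in for the deplaning simulation.
objective = lambda p: float(np.sum(p ** 2))
print("halton estimate:", sweep(objective, 256, 3)[-1])
print("pseudorandom estimate:", sweep(objective, 256, 3, strategy="random")[-1])
```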