skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: A Unified Approach to Discrepancy Minimization
We study a unified approach and algorithm for constructive discrepancy minimization based on a stochastic process. By varying the parameters of the process, one can recover various state-of-the-art results. We demonstrate the flexibility of the method by deriving a discrepancy bound for smoothed instances, which interpolates between known bounds for worst-case and random instances.  more » « less
Award ID(s):
2106444
PAR ID:
10414227
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
RANDOM-APPROX
Volume:
245
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Bae, K.-H.; Feng, B.; Kim, S.; Lazarova-Molnar, S.; Zheng, Z.; Roeder, T.; Thiesing, R. (Ed.)
    In the subset-selection approach to ranking and selection, a decision-maker seeks a subset of simulated systems that contains the best with high probability. We present a new, generalized framework for constructing these subsets and demonstrate that some existing subset-selection procedures are situated within this framework. The subsets are built by calculating, for each system, a minimum standardized discrepancy between the observed performances and the space of problem instances for which that system is the best. A system’s minimum standardized discrepancy is then compared to a cutoff to determine whether the system is included in the subset. We examine the problem of finding the tightest statistically valid cutoff for each system and draw connections between our approach and other subset-selection methodologies. Simulation experiments demonstrate how the screening power and subset size are affected by the choice of standardized discrepancy. 
    more » « less
  2. The cybersecurity workforce lacks diversity; the field is predominately men and White or Asian, with only 10% identifying as women, Latine, or Black. Previous studies identified access to supportive communities as a possible disparity between marginalized and non-marginalized cybersecurity professional populations and highlighted this support as a key to career success. We focus on these community experiences by conducting a survey of 342 cybersecurity professionals to identify differences in perceptions and experiences of belonging across demographic groups. Our results show a discrepancy between experiences for different gender identities, with women being more likely than men to report instances of harassment and encountering unsupportive environments because of their gender. Psychological safety was low across all demographic groups, meaning participants did not feel comfortable engaging with or speaking up in the community. Based on these result we provide recommendations to community leaders. 
    more » « less
  3. This paper introduces kdiff, a novel kernel-based measure for estimating distances between instances of time series, random fields and other forms of structured data. This measure is based on the idea of matching distributions that only overlap over a portion of their region of support. Our proposed measure is inspired by MPdist which has been previously proposed for such datasets and is constructed using Euclidean metrics, whereas kdiff is constructed using non-linear kernel distances. Also, kdiff accounts for both self and cross similarities across the instances and is defined using a lower quantile of the distance distribution. Comparing the cross similarity to self similarity allows for measures of similarity that are more robust to noise and partial occlusions of the relevant signals. Our proposed measure kdiff is a more general form of the well known kernel-based Maximum Mean Discrepancy distance estimated over the embeddings. Some theoretical results are provided for separability conditions using kdiff as a distance measure for clustering and classification problems where the embedding distributions can be modeled as two component mixtures. Applications are demonstrated for clustering of synthetic and real-life time series and image data, and the performance of kdiff is compared to competing distance measures for clustering. 
    more » « less
  4. Online card transaction fraud is one of the major threats to the bottom line of E-commerce merchants. In this paper, we propose a novel method for online merchants to utilize disposable (“one-time use”) domain names to detect client IP spoofing by collecting client's DNS information during an E-commerce transaction, which in turn can help with transaction fraud detection. By inserting a dynamically generated unique hostname on the E-commerce transaction webpage, a client will issue an identifiable DNS query to the customized authoritative DNS server maintained by the online Merchant. In this way, the online Merchant is able to collect DNS configuration of the client and match it with the client's corresponding transaction in order to verify the consistency of the client's IP address. Any discrepancy can reveal proxy usage, which fraudsters commonly use to spoof their true origins. We have deployed our preliminary prototype system on a real online merchant and successfully collected clients DNS queries correlated with their web transactions; then we show some real instances of successful fraud detection using this method. We also address some concerns regarding the use of disposable domains. 
    more » « less
  5. null (Ed.)
    Nearly all principal cloud providers now provide burstable instances in their offerings. The main attraction of this type of instance is that it can boost its performance for a limited time to cope with workload variations. Although burstable instances are widely adopted, it is not clear how to efficiently manage them to avoid waste of resources. In this paper, we use predictive data analytics to optimize the management of burstable instances. We design CEDULE+, a data-driven framework that enables efficient resource management for burstable cloud instances by analyzing the system workload and latency data. CEDULE+ selects the most profitable instance type to process incoming requests and controls CPU, I/O, and network usage to minimize the resource waste without violating Service Level Objectives (SLOs). CEDULE+ uses lightweight profiling and quantile regression to build a data-driven prediction model that estimates system performance for all combinations of instance type, resource type, and system workload. CEDULE+ is evaluated on Amazon EC2, and its efficiency and high accuracy are assessed through real-case scenarios. CEDULE+ predicts application latency with errors less than 10%, extends the maximum performance period of a burstable instance up to 2.4 times, and decreases deployment costs by more than 50%. 
    more » « less