skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Optimally selecting the top k values from X + Y with layer-ordered heaps
Selection and sorting the Cartesian sum, X + Y, are classic and important problems. Here, a new algorithm is presented, which generates the top k values of the form Xi+Yj. The algorithm relies on layer-ordered heaps, partial orderings of exponentially sized layers. The algorithm relies only on median-of-medians and is simple to implement. Furthermore, it uses data structures contiguous in memory, cache efficient, and fast in practice. The presented algorithm is demonstrated to be theoretically optimal.  more » « less
Award ID(s):
1845465
PAR ID:
10232270
Author(s) / Creator(s):
Date Published:
Journal Name:
PeerJ
ISSN:
2167-8359
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This paper presents a new algorithm for X-ray Computerized Tomography (CT) based on Bukhgeim’s theory of analytic maps. The reconstruction relies on a Cauchy-type integral formula, where the integration over the boundary replaces the integration in the back- projection operator used in existing algorithms. From the numerical computation stand point, the proposed method recovers the attenuation coefficient at arbitrarily points by utilizing the boundary integration without internal global meshes. This means that it achieves high-parallel efficiency, and it reduces computational resources. Some numerical examples are presented to show feasibility of the proposed algorithm. 
    more » « less
  2. The mixed Lp-norm, 0 ≤ p ≤ 2, stabilization algorithm is flexible for constructing a suite of subsurface models with either distinct, or a combination of, smooth, sparse, or blocky structures. This general purpose algorithm can be used for the inversion of data from regions with different subsurface characteristics. Model interpretation is improved by simulta- neous inversion of multiple data sets using a joint inversion approach. An effective and general algorithm is presented for the mixed Lp-norm joint inversion of gravity and magnetic data sets. The imposition of the structural cross-gradient enforces similarity between the reconstructed models. For efficiency the implementation relies on three crucial realistic details; (i) the data are assumed to be on a uniform grid providing sensitivity matrices that decompose in block Toeplitz Toeplitz block form for each depth layer of the model domain and yield efficiency in storage and computation via 2D fast Fourier transforms; (ii) matrix-free implementation for calculating derivatives of parameters reduces memory and computational overhead; and (iii) an alternating updating algorithm is employed. Balancing of the data misfit terms is imposed to assure that the gravity and magnetic data sets are fit with respect to their individual noise levels without overfitting of either model. Strategies to find all weighting parameters within the objective function are described. The algorithm is validated on two synthetic but complicated models. It is applied to invert gravity and magnetic data acquired over two kimberlite pipes in Botswana, producing models that are in good agreement with borehole information available in the survey area. 
    more » « less
  3. Computer Science students in algorithm courses often drop out and feel that what they are learning is disconnected from real life programming. Instructors, on the other hand, feel that algorithmic content is foundational for the long term development of students. The disconnect seems to stem from students not perceiving the importance of algorithmic paradigms, and how they impact performance in applications. We present the point of view that by solving real-world problems where algorithmic paradigms and complexity matter, students will become more engaged with the course and appreciate its importance. Our approach relies on a lean educational framework that provides simplified access to real life datasets and benchmarking features. The assignments we present are all scaffolded, and easily integrated into most algorithms courses. Feedback from using some of the assignments in various courses is presented to argue for the validity of the approach. 
    more » « less
  4. How can we collect the most useful labels to learn a model selection policy, when presented with arbitrary heterogeneous data streams? In this paper, we formulate this task as a contextual active model selection problem, where at each round the learner receives an unlabeled data point along with a context. The goal is to output the best model for any given context without obtaining an excessive amount of labels. In particular, we focus on the task of selecting pre-trained classifiers, and propose a contextual active model selection algorithm (CAMS), which relies on a novel uncertainty sampling query criterion defined on a given policy class for adaptive model selection. In comparison to prior art, our algorithm does not assume a globally optimal model. We provide rigorous theoretical analysis for the regret and query complexity under both adversarial and stochastic settings. Our experiments on several benchmark classification datasets demonstrate the algorithm’s effectiveness in terms of both regret and query complexity. Notably, to achieve the same accuracy, CAMS incurs less than 10% of the label cost when compared to the best online model selection baselines on CIFAR10. 
    more » « less
  5. Multi-instance learning (MIL) handles data that is organized into sets of instances known as bags. Traditionally, MIL is used in the supervised-learning setting for classifying bags which contain any number of instances. However, many traditional MIL algorithms do not scale efficiently to large datasets. In this paper, we present a novel primal–dual multi-instance support vector machine that can operate efficiently on large-scale data. Our method relies on an algorithm derived using a multi-block variation of the alternating direction method of multipliers. The approach presented in this work is able to scale to large-scale data since it avoids iteratively solving quadratic programming problems which are broadly used to optimize MIL algorithms based on SVMs. In addition, we improve our derivation to include an additional optimization designed to avoid solving a least-squares problem in our algorithm, which increases the utility of our approach to handle a large number of features as well as bags. Finally, we derive a kernel extension of our approach to learn nonlinear decision boundaries for enhanced classification capabilities. We apply our approach to both synthetic and real-world multi-instance datasets to illustrate the scalability, promising predictive performance, and interpretability of our proposed method. 
    more » « less