


Award ID contains: 1760102


  1. Abstract

    Decision trees are a widely used method for classification, both alone and as the building blocks of multiple different ensemble learning methods. The Max Cut decision tree introduced here involves novel modifications to a standard, baseline variant of a classification decision tree, CART Gini. One modification involves an alternative splitting metric, Max Cut, based on maximizing the distance between all pairs of observations that belong to separate classes and separate sides of the threshold value. The other modification, Node Means PCA, selects the decision feature from a linear combination of the input features constructed using an adjustment to principal component analysis (PCA) locally at each node. Our experiments show that this node-based, localized PCA with the Max Cut splitting metric can dramatically improve classification accuracy while also significantly decreasing computational time compared to the CART Gini decision tree. These improvements are most significant for higher-dimensional datasets. For the example dataset CIFAR-100, the modifications enabled a 49% improvement in accuracy, relative to CART Gini, while providing a 6.8× speed-up compared to the Scikit-Learn implementation of CART Gini. These introduced modifications are expected to dramatically advance the capabilities of decision trees for difficult classification tasks.

     
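    To make the two modifications concrete, here is a minimal sketch in Python/NumPy of one plausible reading of the abstract: the per-class means at the current node supply a PCA direction, the data are projected onto it, and candidate thresholds are scored by summing the separations of opposite-class, opposite-side pairs. The function names and the naive pairwise scoring loop are illustrative assumptions, not the authors' implementation.

    ```python
    import numpy as np

    def node_means_pca_direction(X, y):
        # One reading of "Node Means PCA": take the leading principal direction
        # of the per-class mean vectors computed locally at this node.
        means = np.vstack([X[y == c].mean(axis=0) for c in np.unique(y)])
        centered = means - means.mean(axis=0)
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        return vt[0]

    def max_cut_score(z, y, threshold):
        # Max Cut-style score: total separation of projected values over all
        # pairs that lie on opposite sides of the threshold with different labels.
        left, right = z <= threshold, z > threshold
        score = 0.0
        for zi, yi in zip(z[left], y[left]):
            others = z[right][y[right] != yi]
            score += np.abs(others - zi).sum()
        return score

    def best_max_cut_split(X, y):
        # Project onto the node-local direction, then scan candidate thresholds
        # (midpoints of consecutive projected values) for the highest score.
        w = node_means_pca_direction(X, y)
        z = X @ w
        uniq = np.unique(z)
        candidates = (uniq[:-1] + uniq[1:]) / 2.0
        scores = [max_cut_score(z, y, t) for t in candidates]
        return w, candidates[int(np.argmax(scores))]
    ```

    Note that this pairwise loop is quadratic per candidate threshold; the 6.8× speed-up reported over Scikit-Learn's CART Gini implies the actual method evaluates the criterion far more efficiently.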
  2. Abstract

    We present here classes of integer programming problems that are solvable efficiently with combinatorial flow algorithms. The problems are characterized by constraints that have either at most two variables per inequality that appear with opposite sign coefficients, or have in addition a third variable that appears only in one constraint. Such integer programs, referred to here as monotone IP2 or IP3, are shown to be solvable in polynomial time for polynomially bounded variables. This article demonstrates the vast applicability of IP2 and IP3 as models for integer programs in multiple scenarios. Since the problems are easily recognized, the knowledge of their structure enables one to determine easily that they are efficiently solvable. The variety of applications, which previously were not known to be solvable as efficiently, underscores the importance of recognizing this structure and, if appropriate, formulating problems as monotone IP2 or IP3. Additionally, if there is flexibility in the modeling of an integer programming problem, the formulation choice as monotone IP2 or IP3 leads to efficient algorithms, whereas slightly different modeling choices would lead to NP-hard problems.

     
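    As an illustration of the constraint structure (not code from the article), the small recognizer below checks whether a system of inequalities fits the monotone IP2 pattern of at most two variables per inequality appearing with opposite-sign coefficients; the dictionary-based constraint representation is an assumption made for the sketch.

    ```python
    def is_monotone_ip2(constraints):
        # Each inequality may involve at most two variables; when two appear,
        # their coefficients must have opposite signs (the monotone IP2 pattern).
        for coeffs, _rhs in constraints:          # coeffs: {variable: coefficient}
            nonzero = [a for a in coeffs.values() if a != 0]
            if len(nonzero) > 2:
                return False
            if len(nonzero) == 2 and nonzero[0] * nonzero[1] > 0:
                return False                      # same-sign pair: not monotone
        return True

    # x1 - 2*x2 >= 3 and 4*x2 - x3 >= 1 both fit the pattern:
    print(is_monotone_ip2([({1: 1, 2: -2}, 3), ({2: 4, 3: -1}, 1)]))  # True
    # x1 + x2 >= 1 does not (two variables with the same sign):
    print(is_monotone_ip2([({1: 1, 2: 1}, 1)]))                       # False
    ```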
  3. Here, we study several variants of matching problems that arise in covariate balancing. Covariate balancing problems can be viewed as variants of matching, or b-matching, with global side constraints. We present a comprehensive complexity study of these covariate balancing problems, providing for each variant either a polynomial-time algorithm or a proof of NP-hardness. The polynomial-time algorithms described are mostly combinatorial and rely on network flow techniques. In addition, we present several fixed-parameter tractable results for problems where the number of covariates and the number of levels of each covariate are taken as parameters.
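    The following toy sketch conveys the network-flow flavor of such algorithms, not the article's actual formulations: treated units are matched to control units that agree exactly on a single covariate level via a min-cost flow built with the networkx library. The instance, node names, and exact-matching rule are all illustrative assumptions.

    ```python
    import networkx as nx

    # Toy instance: each treated unit must be matched to a distinct control unit
    # with the same level of a single covariate, minimizing a secondary distance.
    treated = {"t1": ("A", 0.2), "t2": ("B", 0.7)}            # (level, score)
    controls = {"c1": ("A", 0.1), "c2": ("B", 0.9), "c3": ("A", 0.5)}

    G = nx.DiGraph()
    for t, (lvl_t, s_t) in treated.items():
        G.add_edge("source", t, capacity=1, weight=0)
        for c, (lvl_c, s_c) in controls.items():
            if lvl_t == lvl_c:                                # exact covariate match
                G.add_edge(t, c, capacity=1, weight=int(100 * abs(s_t - s_c)))
    for c in controls:
        G.add_edge(c, "sink", capacity=1, weight=0)

    # Push one unit of flow per treated unit from source to sink at minimum cost.
    G.nodes["source"]["demand"] = -len(treated)
    G.nodes["sink"]["demand"] = len(treated)
    flow = nx.min_cost_flow(G)

    pairs = [(t, c) for t in treated for c, f in flow[t].items() if f == 1]
    print(pairs)   # [('t1', 'c1'), ('t2', 'c2')]
    ```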
  4. Abstract
    We study a 1-dimensional discrete signal denoising problem that consists of minimizing a sum of separable convex fidelity terms and convex regularization terms, where the latter penalize the differences of adjacent signal values. This problem generalizes the total variation regularization problem. We provide here a unified approach to solve the problem for general convex fidelity and regularization functions that is based on the Karush–Kuhn–Tucker optimality conditions. This approach is shown here to lead to a fast algorithm for the problem with general convex fidelity and regularization functions, and a faster algorithm if, in addition, the fidelity functions are differentiable and the regularization functions are strictly convex. Both algorithms achieve the best theoretical worst-case complexity over existing algorithms for the classes of objective functions studied here. In practice, our C++ implementation of the method is also considerably faster than popular C++ nonlinear optimization solvers for this problem.
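    As a concrete instance of this problem class (quadratic fidelity terms and absolute-value, i.e., total variation, regularization on adjacent differences), the sketch below sets up the objective and solves a tiny example with a generic SciPy solver; it is meant only to make the objective explicit and is not the KKT-based algorithms described in the abstract.

    ```python
    import numpy as np
    from scipy.optimize import minimize

    y = np.array([0.0, 0.1, 0.9, 1.1, 1.0, 0.2])   # noisy 1-D signal
    lam = 0.5                                      # regularization weight

    def objective(x):
        fidelity = 0.5 * np.sum((x - y) ** 2)              # separable convex fit terms
        regularization = lam * np.sum(np.abs(np.diff(x)))  # penalties on adjacent differences
        return fidelity + regularization

    # A generic derivative-free solver stands in for the paper's fast algorithm.
    result = minimize(objective, x0=y.copy(), method="Nelder-Mead")
    print(np.round(result.x, 3))
    ```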
  5.