

Search for: All records

Award ID contains: 2112533


  1. Abstract

    Decision trees are a widely used method for classification, both alone and as the building blocks of many ensemble learning methods. The Max Cut decision tree introduced here makes two novel modifications to a standard baseline classification decision tree, CART Gini. The first is an alternative splitting metric, Max Cut, based on maximizing the distance between all pairs of observations that belong to separate classes and fall on separate sides of the threshold value. The second, Node Means PCA, selects the decision feature from a linear combination of the input features constructed using an adjustment to principal component analysis (PCA) applied locally at each node. Our experiments show that this node-based, localized PCA with the Max Cut splitting metric can dramatically improve classification accuracy while also significantly decreasing computational time compared to the CART Gini decision tree. These improvements are most significant for higher-dimensional datasets. For the example dataset CIFAR-100, the modifications enabled a 49% improvement in accuracy relative to CART Gini, while providing a 6.8× speedup compared to the Scikit-Learn implementation of CART Gini. These modifications are expected to dramatically advance the capabilities of decision trees for difficult classification tasks.
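
The Max Cut splitting metric described in the abstract can be illustrated with a minimal sketch on a single feature. Several things here are assumptions rather than details from the paper: the pairwise "distance" is taken to be the absolute difference along the feature, the score is taken to be the sum over all qualifying pairs, candidate thresholds are midpoints between consecutive distinct values, and the function names `max_cut_score` and `best_max_cut_threshold` are hypothetical. The naive pairwise computation below is quadratic in the number of observations and is meant only to show the idea, not to reproduce the authors' implementation or its speed.

```python
import numpy as np


def max_cut_score(x, y, threshold):
    """Sum of |x_i - x_j| over pairs that are in different classes
    AND on opposite sides of the threshold (assumed form of the metric)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    left = x <= threshold
    right = ~left
    # Pairwise absolute differences between left-side and right-side points.
    dist = np.abs(x[left][:, None] - x[right][None, :])
    # Mask selecting only the pairs whose class labels differ.
    diff_class = y[left][:, None] != y[right][None, :]
    return float(np.sum(dist * diff_class))


def best_max_cut_threshold(x, y):
    """Scan midpoints between consecutive distinct values of x and return
    (threshold, score) for the highest-scoring candidate split."""
    x = np.asarray(x, dtype=float)
    values = np.unique(x)  # sorted distinct feature values
    if values.size < 2:
        return None, 0.0   # no split is possible on a constant feature
    candidates = (values[:-1] + values[1:]) / 2.0
    scores = [max_cut_score(x, y, t) for t in candidates]
    best = int(np.argmax(scores))
    return float(candidates[best]), float(scores[best])


if __name__ == "__main__":
    feature = [0.0, 1.0, 10.0, 11.0]
    labels = [0, 0, 1, 1]
    print(best_max_cut_threshold(feature, labels))
```

Under these assumptions, a split that separates two well-separated classes scores higher than one that isolates a single point, because more cross-class pairs, and more distant ones, straddle the threshold.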

     