skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Explainable models via compression of tree ensembles
Ensemble models (bagging and gradient-boosting) of relational decision trees have proved to be some of the most effective learning methods in the area of probabilistic logic models (PLMs). While effective, they lose one of the most important benefits of PLMs—interpretability. In this paper we consider the problem of compressing a large set of learned trees into a single explainable model. To this effect, we propose CoTE—Compression of Tree Ensembles—that produces a single small decision list as a compressed representation. CoTE first converts the trees to decision lists and then performs the combination and compression with the aid of the original training set. An experimental evaluation demonstrates the effectiveness of CoTE in several benchmark relational data sets.  more » « less
Award ID(s):
1941892
PAR ID:
10504944
Author(s) / Creator(s):
; ; ; ;
Publisher / Repository:
Springer
Date Published:
Journal Name:
Machine Learning
Volume:
113
Issue:
3
ISSN:
0885-6125
Page Range / eLocation ID:
1303 to 1328
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Chess endgame tables encode unapproximated game- theoretic values of endgame positions. The speed at which information is retrieved from these tables and their representation size are major limiting factors in their effective use. We explore and make novel extensions to three alternatives (decision trees, decision diagrams, and logic minimization) to the currently preferred implementation (Syzygy) for representing such tables. Syzygy is most compact, but also slowest at handling queries. Two-level logic minimization works well when the full compression algorithm can be run. Decision DAGs and multiterminal binary decision diagrams are comparable and offer the best querying times, with decision diagrams providing better compression. 
    more » « less
  2. Chess endgame tables encode unapproximated game-theoretic values of endgame positions. The speed at which information is retrieved from these tables and their representation size are major limiting factors in their effective use. We explore and make novel extensions to three alternatives (decision trees, decision diagrams, and logic minimization) to the currently preferred implementation (Syzygy) for representing such tables. Syzygy is most compact, but also slowest at handling queries. Two-level logic minimization works well, though performing the compression takes significant time. Decision DAGs and multiterminal binary decision diagrams are both comparable and offer the best querying times, with decision diagrams providing better compression. 
    more » « less
  3. Abstract B-trees are widely recognized as one of the most important index structures in database systems, providing efficient query processing capabilities. Over the past few decades, many techniques have been developed to enhance the efficiency of B-trees from various perspectives. Among them,B-tree compressionis an important technique introduced as early as the 1970s to improve both space efficiency and query performance. Since then, several B-tree compression techniques have been developed. However, to our surprise, we have found that these B-tree compression techniques werenevercompared against each other in prior works. Consequently, many important questions remain unanswered, such as whether B-tree compression is truly effective or not. If it is effective, under what scenarios and which B-tree compression methods should be employed? In this paper, we conduct an experimental evaluation of seven widely used B-tree compression techniques using both synthetic and real datasets. Based on our evaluation, we present lessons and insights regarding the use of B-tree compression that can be leveraged to guide system design decisions in modern databases. 
    more » « less
  4. In this paper we propose a scalable framework for large-scale farm scene modeling that utilizes remote sensing data, specifically satellite images. Our approach begins by accurately extracting and categorizing the distributions of various scene elements from satellite images into four distinct layers: fields, trees, roads, and grasslands. For each layer, we introduce a set of controllable Parametric Layout Models (PLMs). These models are capable of learning layout parameters from satellite images, enabling them to generate complex, large-scale farm scenes that closely reproduce reality across multiple scales. Additionally, our framework provides intuitive control for users to adjust layout parameters to simulate different stages of crop growth and planting patterns. This adaptability makes our model an excellent tool for graphics and virtual reality applications. Experimental results demonstrate that our approach can rapidly generate a variety of realistic and highly detailed farm scenes with minimal inputs. 
    more » « less
  5. We study the problem of selecting most informative subset of a large observation set to enable accurate estimation of unknown parameters. This problem arises in a variety of settings in machine learning and signal processing including feature selection, phase retrieval, and target localization. Since for quadratic measurement models the moment matrix of the optimal estimator is generally unknown, majority of prior work resorts to approximation techniques such as linearization of the observation model to optimize the alphabetical optimality criteria of an approximate moment matrix. Conversely, by exploiting a connection to the classical Van Trees’ inequality, we derive new alphabetical optimality criteria without distorting the relational structure of the observation model. We further show that under certain conditions on parameters of the problem these optimality criteria are monotone and (weak) submodular set functions. These results enable us to develop an efficient greedy observation selection algorithm uniquely tailored for quadratic models, and provide theoretical bounds on its achievable utility. 
    more » « less