
Title: Distributed Task-Based Training of Tree Models
Decision trees and tree ensembles are popular supervised learning models for tabular data. Two recent research trends on tree models stand out: (1) bigger and deeper models with many trees, and (2) scalable distributed training frameworks. However, existing implementations on distributed systems are IO-bound, leaving CPU cores underutilized. They also find the best node-splitting conditions only approximately, because of their row-based data partitioning scheme. In this paper, we target the exact training of tree models by effectively utilizing the available CPU cores. The resulting system, called TreeServer, adopts a column-based data partitioning scheme to minimize communication, and a node-centric task-based engine to fully exploit CPU parallelism. Experiments show that TreeServer is up to 10x faster than the corresponding models in Spark MLlib. We also showcase TreeServer's high training throughput by using it to build big "deep forest" models.
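The abstract's two key ingredients, column-based data partitioning and exact (rather than approximate) split finding, can be illustrated with a small sketch. The code below is not TreeServer itself: the split score, the task fan-out, and all names are simplified assumptions, showing only how an exact per-column split scan can be dispatched as independent tasks across CPU cores.

```python
# Hedged sketch (not the TreeServer implementation): exact best-split search
# when the training matrix is partitioned by columns, so each task scans whole
# feature columns and only tiny (feature, gain, threshold) summaries are
# exchanged instead of raw rows.
from concurrent.futures import ProcessPoolExecutor

import numpy as np


def best_split_for_column(args):
    """Exact best split for one feature column via a full sorted scan."""
    feature_id, column, labels = args
    order = np.argsort(column)
    x, y = column[order], labels[order]
    n = len(y)
    total = y.sum()
    left_sum = 0.0
    best_gain, best_threshold = -np.inf, None
    for i in range(1, n):
        left_sum += y[i - 1]
        if x[i] == x[i - 1]:
            continue  # cannot split between identical feature values
        right_sum = total - left_sum
        # Variance-reduction style score (larger is better).
        gain = left_sum**2 / i + right_sum**2 / (n - i) - total**2 / n
        if gain > best_gain:
            best_gain, best_threshold = gain, (x[i - 1] + x[i]) / 2.0
    return feature_id, best_gain, best_threshold


def exact_best_split(X, y, max_workers=4):
    """Fan out one task per feature column; gather tiny per-column summaries."""
    tasks = [(j, X[:, j], y) for j in range(X.shape[1])]
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(best_split_for_column, tasks))
    return max(results, key=lambda r: r[1])  # (feature_id, gain, threshold)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(2000, 8))
    y = (X[:, 3] > 0.2).astype(float)
    print(exact_best_split(X, y))
```

Because each task owns whole columns, only the per-column (feature, gain, threshold) triples cross worker boundaries, which is the communication saving the abstract attributes to column-based partitioning.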
Authors:
Award ID(s):
1755464
Publication Date:
NSF-PAR ID:
10331910
Journal Name:
Proceedings of the 38th IEEE International Conference on Data Engineering (ICDE)
Sponsoring Org:
National Science Foundation
More Like this
  1. Due to developments in topographic techniques, clear satellite imagery, and various means of collecting information, geospatial datasets are growing in volume, complexity, and heterogeneity. Efficient execution of spatial computations and analytics on large spatial datasets requires parallel processing. To exploit fine-grained parallel processing in large-scale compute clusters, load-balanced partitioning is necessary for skewed datasets. In this work, we focus on the spatial join operation, where the inputs are two layers of geospatial data. Our partitioning method for spatial join uses an Adaptive Partitioning (ADP) technique based on Quadtree partitioning. Unlike existing partitioning techniques, ADP partitions the spatial join workload instead of partitioning the individual datasets separately, providing better load balancing. In our experimental evaluation, ADP partitions spatial data in a more balanced way than Quadtree partitioning and uniform grid partitioning. ADP uses an output-sensitive duplication avoidance technique that minimizes duplication of geometries that are not part of the spatial join output. In a distributed-memory environment, this technique can reduce data communication and storage requirements compared to traditional methods. To improve the performance of ADP, an MPI+Threads-based parallelization, ParADP, is presented. With ParADP, a pair of real-world datasets, one with 717 million polylines and another with 10 million polygons, is partitioned into 65,536 grid cells within 7 seconds. ParADP shows both good weak scaling and good strong scaling up to 4,032 CPU cores. (A toy sketch of this workload-based splitting appears after this list.)
  2. Deep neural network (DNN) accelerators, as an example of domain-specific architecture, have demonstrated great success in DNN inference. However, architecture acceleration for the equally important DNN training has not yet been fully studied. With data forward, error backward, and gradient calculation, DNN training is a more complicated process with higher computation and communication intensity. Because recent research demonstrates a diminishing specialization return, namely the “accelerator wall”, we believe that a promising approach is to explore coarse-grained parallelism among multiple performance-bounded accelerators to support DNN training. Distributing computations across multiple heterogeneous accelerators to achieve high throughput and balanced execution, however, remains challenging. We present ACCPAR, a principled and systematic method for determining the tensor partition among heterogeneous accelerator arrays. Compared to prior empirical or unsystematic methods, ACCPAR considers the complete tensor partition space and can reveal previously unknown parallelism configurations. ACCPAR optimizes performance based on a cost model that takes into account both the computation and communication costs of a heterogeneous execution environment. Hence, our method avoids the drawbacks of existing approaches that use communication as a proxy for performance. The enhanced flexibility of tensor partitioning in ACCPAR allows computation to be distributed among accelerators of different performance in flexible ratios. The proposed search algorithm is also applicable to the emerging multi-path patterns in modern DNNs such as ResNet. We simulate ACCPAR on a heterogeneous accelerator array composed of both TPU-v2 and TPU-v3 accelerators for the training of large-scale DNN models such as AlexNet, the VGG series, and the ResNet series. The average performance improvements of the state-of-the-art “one weird trick” (OWT), HYPAR, and ACCPAR, normalized to a baseline data-parallelism scheme in which each accelerator replicates the model and processes different input data in parallel, are 2.98×, 3.78×, and 6.30×, respectively. (A toy cost-model sketch appears after this list.)
  3. Deep learning approaches have been adopted in forestry research, including tree classification and inventory prediction. In this study, we proposed an application of a deep learning approach, the Temporal Convolution Network, to sequences of radial resistograph profiles to identify non-thrive trees and to predict wood density. Non-destructive resistance drilling measurements on the South and West orientations of 274 trees in a 41-year-old Douglas-fir stand in Marion County, Oregon, USA were used as input series. Non-thrive trees were defined based on their changes in social status since establishment. Wood density was derived by X-ray densitometry from cores obtained by increment borers. Data were split for cross-validation. Optimal models were fine-tuned with training and validation datasets, then run on test datasets for model evaluation metrics. Results confirmed that applying the Temporal Convolution Network to resistograph profiles enables non-thrive tree identification, with an area under the Receiver Operating Characteristic curve of 0.823. The Temporal Convolution Network for wood density prediction showed a slight improvement in accuracy (RMSE = 18.22) compared to traditional linear (RMSE = 20.15) and non-linear (RMSE = 20.33) regression methods. We suggest that machine learning algorithms can be a promising methodology for the analysis of sequential data from non-destructive devices. (A minimal dilated-convolution sketch appears after this list.)
  4. Tree cover is generally associated with cooler air temperatures in urban environments, but the roles of canopy configuration, spatial context, and time of day are not well understood. The ability to examine spatiotemporal relationships between trees and urban climate has been hindered by a lack of appropriate air temperature data and, perhaps, by overreliance on a single ‘tree canopy’ class, obscuring the mechanisms by which canopy cools. Here, we use >70,000 air temperature measurements collected by car throughout Washington, DC, USA in predawn (pd), afternoon (aft), and evening (eve) campaigns on a hot summer day. We subdivided tree canopy into ‘soft’ (over unpaved surfaces) and ‘hard’ (over paved surfaces) canopy classes and further partitioned soft canopy into distributed canopy (narrow edges) and clumped patches (edges with interior cores). At each level of subdivision, we predicted air temperature anomalies using generalized additive models for each time of day. We found that the all-inclusive ‘tree canopy’ class cooled linearly at every time (pd = 0.5 °C ± 0.3 °C, aft = 1.8 °C ± 0.6 °C, eve = 1.7 °C ± 0.4 °C), but this could be explained in the afternoon by the aggregate effects of predominant hard and soft canopy cooling at low and high canopy cover, respectively. Soft canopy cooled nonlinearly in the afternoon, with minimal effect until ∼40% cover, but strongly (and linearly) across all cover fractions in the evening (pd = 0.7 °C ± 1.1 °C, aft = 2.0 °C ± 0.7 °C, eve = 2.9 °C ± 0.6 °C). Patches cooled at all times of day despite uneven allocation throughout the city, whereas more distributed canopy cooled in predawn and evening due to increased shading. This latter finding is important for urban heat island mitigation planning, since it is easier to find planting spaces for distributed trees than for forest patches. (A small smooth-fit sketch appears after this list.)
  5. Mining frequent subtree patterns in a tree database (or forest) is useful in domains such as bioinformatics and mining semi-structured data. We consider the problem of mining embedded subtrees in a database of rooted, labeled, and ordered trees. We compare two existing serial mining algorithms, PrefixTreeSpan and TreeMiner, and adapt them for parallel execution using PrefixFPM, our general-purpose framework for frequent pattern mining designed to effectively utilize the CPU cores in a multicore machine. Our experiments show that TreeMiner is faster than its successor PrefixTreeSpan when a limited number of CPU cores are used, as its total mining workload is smaller; however, PrefixTreeSpan has a much higher speedup ratio and can beat TreeMiner when given enough CPU cores. (A toy sketch of per-prefix task parallelism appears below.)
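The adaptive, workload-driven splitting described in item 1 can be illustrated with a toy quadtree partitioner. This is not the ADP/ParADP implementation: it handles points rather than polylines and polygons, uses a hypothetical per-cell `budget`, and skips duplication avoidance entirely; it only shows how splitting on the combined size of both join layers balances the join workload rather than either dataset alone.

```python
# Hedged sketch of quadtree-style adaptive partitioning of a spatial join
# workload: a cell is quartered whenever the combined object count from both
# layers exceeds a budget, so dense regions get finer cells.
from dataclasses import dataclass, field


@dataclass
class Cell:
    """One grid cell holding the objects of both join layers that fall in it."""
    x0: float
    y0: float
    x1: float
    y1: float
    layer_a: list = field(default_factory=list)
    layer_b: list = field(default_factory=list)

    def contains(self, p):
        return self.x0 <= p[0] < self.x1 and self.y0 <= p[1] < self.y1


def adaptive_partition(cell, budget=64, depth=0, max_depth=12):
    """Quarter a cell until its *combined* join workload fits the budget."""
    if len(cell.layer_a) + len(cell.layer_b) <= budget or depth >= max_depth:
        return [cell]
    mx, my = (cell.x0 + cell.x1) / 2, (cell.y0 + cell.y1) / 2
    quads = [Cell(cell.x0, cell.y0, mx, my), Cell(mx, cell.y0, cell.x1, my),
             Cell(cell.x0, my, mx, cell.y1), Cell(mx, my, cell.x1, cell.y1)]
    for name in ("layer_a", "layer_b"):
        for p in getattr(cell, name):
            # Points on the outer boundary fall back to the last quadrant.
            target = next((q for q in quads if q.contains(p)), quads[-1])
            getattr(target, name).append(p)
    cells = []
    for q in quads:
        cells.extend(adaptive_partition(q, budget, depth + 1, max_depth))
    return cells


if __name__ == "__main__":
    import random

    random.seed(1)
    # A skewed cluster in one corner plus a uniform background layer.
    a = [(random.random() * 0.2, random.random() * 0.2) for _ in range(5000)]
    b = [(random.random(), random.random()) for _ in range(5000)]
    cells = adaptive_partition(Cell(0.0, 0.0, 1.0, 1.0, a, b))
    print(len(cells), "cells; largest workload =",
          max(len(c.layer_a) + len(c.layer_b) for c in cells))
```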
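Item 2's central argument is that a partition should be chosen by a cost model covering both computation and communication, not communication alone. The sketch below is an illustrative stand-in, not ACCPAR's model: the roofline-style time estimate, the two-device search, and all numbers are assumptions.

```python
# Hedged sketch: pick a work-split ratio across two heterogeneous accelerators
# by minimizing an estimated step time that includes compute and communication.
def device_time(flops, peak_flops, bytes_exchanged, bandwidth):
    """Roofline-style estimate: compute time plus synchronization time."""
    return flops / peak_flops + bytes_exchanged / bandwidth


def best_batch_split(total_flops, sync_bytes, devices, steps=100):
    """Search split ratios of one step's work between two devices."""
    (peak1, bw1), (peak2, bw2) = devices  # (peak FLOP/s, link bandwidth B/s)
    best_time, best_ratio = float("inf"), None
    for i in range(1, steps):
        r = i / steps  # fraction of the work assigned to device 1
        t1 = device_time(r * total_flops, peak1, sync_bytes, bw1)
        t2 = device_time((1 - r) * total_flops, peak2, sync_bytes, bw2)
        step = max(t1, t2)  # devices run in parallel, then synchronize
        if step < best_time:
            best_time, best_ratio = step, r
    return best_ratio, best_time


if __name__ == "__main__":
    # A slower and a faster accelerator sharing one training step (made-up numbers).
    devices = [(180e12, 50e9), (420e12, 50e9)]
    ratio, t = best_batch_split(total_flops=2e15, sync_bytes=4e9, devices=devices)
    print(f"give device 1 a {ratio:.0%} share; estimated step time {t * 1e3:.1f} ms")
```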
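Item 3 applies a Temporal Convolution Network to resistograph series. The PyTorch sketch below shows the core ingredient, causal dilated 1-D convolutions feeding a scalar head, but the layer sizes, padding scheme, and head are assumptions rather than the authors' architecture.

```python
# Hedged sketch of a tiny temporal convolution network for sequence regression.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TemporalBlock(nn.Module):
    """One causal, dilated 1-D convolution layer."""

    def __init__(self, channels_in, channels_out, kernel_size=3, dilation=1):
        super().__init__()
        self.left_pad = (kernel_size - 1) * dilation  # causal: pad the past only
        self.conv = nn.Conv1d(channels_in, channels_out, kernel_size,
                              dilation=dilation)

    def forward(self, x):                      # x: (batch, channels, length)
        x = F.pad(x, (self.left_pad, 0))       # no right padding -> no future leak
        return torch.relu(self.conv(x))


class TinyTCN(nn.Module):
    """Stack of dilated blocks plus a scalar head (e.g. predicted wood density)."""

    def __init__(self, channels_in=1, hidden=32, levels=4):
        super().__init__()
        blocks = [TemporalBlock(channels_in if i == 0 else hidden, hidden,
                                dilation=2 ** i) for i in range(levels)]
        self.tcn = nn.Sequential(*blocks)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):
        features = self.tcn(x)                 # (batch, hidden, length)
        return self.head(features[:, :, -1])   # summary at the last time step


if __name__ == "__main__":
    profiles = torch.randn(8, 1, 512)          # 8 fake resistograph traces
    print(TinyTCN()(profiles).shape)           # torch.Size([8, 1])
```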
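Item 4 fits generalized additive models of air-temperature anomaly against canopy cover for each time of day. The sketch below substitutes a spline-basis linear model from scikit-learn as a GAM-flavoured stand-in on synthetic data; it is not the authors' model or data, only an illustration of fitting a smooth cover-versus-anomaly curve.

```python
# Hedged, GAM-flavoured sketch: smooth (spline) effect of canopy-cover fraction
# on an air-temperature anomaly, fit on synthetic "afternoon" data.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer

rng = np.random.default_rng(0)
cover = rng.uniform(0, 1, size=500)                 # canopy-cover fraction
# Synthetic pattern: little effect until ~40% cover, then roughly linear cooling.
anomaly = -2.0 * np.clip(cover - 0.4, 0, None) + rng.normal(0, 0.3, 500)

smooth = make_pipeline(SplineTransformer(n_knots=6, degree=3), Ridge(alpha=1.0))
smooth.fit(cover.reshape(-1, 1), anomaly)

grid = np.linspace(0, 1, 11).reshape(-1, 1)
for c, t in zip(grid.ravel(), smooth.predict(grid)):
    print(f"cover={c:.1f}  predicted anomaly={t:+.2f} °C")
```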
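Item 5 parallelizes subtree mining with PrefixFPM, which divides the pattern search space by prefix so that each prefix can be mined as an independent CPU task. The toy below borrows only that task-decomposition idea: to stay short it mines frequent parent-to-child label pairs rather than general embedded subtrees, and the forest, helpers, and support threshold are made-up assumptions.

```python
# Hedged sketch of per-prefix task parallelism for pattern mining on a forest.
from collections import Counter
from concurrent.futures import ProcessPoolExecutor

# A tiny forest: each tree is (label, [children]).
FOREST = [
    ("A", [("B", []), ("C", [("B", [])])]),
    ("A", [("C", [("B", [])])]),
    ("B", [("A", []), ("C", [])]),
]
MIN_SUPPORT = 2


def edges(node):
    """Yield (parent_label, child_label) pairs of one tree."""
    label, children = node
    for child in children:
        yield (label, child[0])
        yield from edges(child)


def mine_prefix(root_label):
    """One task: frequent edge patterns whose parent label equals the prefix."""
    support = Counter()
    for tree in FOREST:
        seen = {e for e in edges(tree) if e[0] == root_label}
        support.update(seen)  # count each pattern at most once per tree
    return [(e, c) for e, c in support.items() if c >= MIN_SUPPORT]


if __name__ == "__main__":
    # Frequent single labels become the prefixes that define the parallel tasks.
    label_support = Counter()
    for tree in FOREST:
        labels, stack = set(), [tree]
        while stack:
            lab, kids = stack.pop()
            labels.add(lab)
            stack.extend(kids)
        label_support.update(labels)
    prefixes = [l for l, c in label_support.items() if c >= MIN_SUPPORT]

    with ProcessPoolExecutor() as pool:
        for patterns in pool.map(mine_prefix, prefixes):
            for (parent, child), count in patterns:
                print(f"{parent} -> {child}  support={count}")
```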