Search for: All records

Creators/Authors contains: "Chowdhury, Md Mashiur"

Distributed Task-Based Training of Tree Models

Yan, Da ; Chowdhury, Md Mashiur ; Guo, Guimu ; Khalil, Jalal ; Jiang, Zhe ; Prasad, Sushil ( January 2022 , Proceedings of the 38th IEEE International Conference on Data Engineering (ICDE))

Decision trees and tree ensembles are popular supervised learning models for tabular data. Two recent research trends on tree models stand out: (1) bigger and deeper models with many trees, and (2) scalable distributed training frameworks. However, existing implementations on distributed systems are IO-bound, leaving CPU cores underutilized. They also find the best node-splitting conditions only approximately, because of their row-based data partitioning scheme. In this paper, we target the exact training of tree models by effectively utilizing the available CPU cores. The resulting system, called TreeServer, adopts a column-based data partitioning scheme to minimize communication and a node-centric task-based engine to fully exploit CPU parallelism. Experiments show that TreeServer is up to 10x faster than models in Spark MLlib. We also showcase TreeServer's high training throughput by using it to build big "deep forest" models.
Full Text Available
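
As a rough illustration of the exact, column-wise split search mentioned in the TreeServer abstract above, the sketch below scans every value boundary of one feature column and then picks the best column. It is not TreeServer's implementation: the Gini criterion, the midpoint thresholds, and the names best_exact_split and best_split_over_columns are assumptions made for this example. It shows that a whole column can be scored without reading any other column, which is one plausible reason a column layout suits exact splits.

```python
from collections import Counter


def gini(counts, total):
    """Gini impurity of a label histogram holding `total` samples."""
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def best_exact_split(column, labels):
    """Exact split search on one feature column (illustrative, not TreeServer's code).

    Returns (threshold, weighted_gini) for the best `value <= threshold` split,
    or (None, parent_gini) when no split improves on the unsplit node.
    """
    order = sorted(range(len(column)), key=lambda i: column[i])
    n = len(order)
    left, right = Counter(), Counter(labels)
    best_threshold, best_score = None, gini(right, n)
    for pos in range(n - 1):
        i = order[pos]
        left[labels[i]] += 1   # move one sample from the right side to the left
        right[labels[i]] -= 1
        nxt = column[order[pos + 1]]
        if column[i] == nxt:
            continue  # equal values cannot be separated by a threshold
        n_left = pos + 1
        score = (n_left * gini(left, n_left)
                 + (n - n_left) * gini(right, n - n_left)) / n
        if score < best_score:
            best_threshold, best_score = (column[i] + nxt) / 2, score
    return best_threshold, best_score


def best_split_over_columns(columns, labels):
    """Score each column independently and keep the lowest-impurity split."""
    results = {name: best_exact_split(col, labels) for name, col in columns.items()}
    name = min(results, key=lambda k: results[k][1])
    return name, results[name][0], results[name][1]
```

For example, best_split_over_columns({"a": [1.0, 1.0, 1.0, 1.0], "b": [2.0, 1.0, 4.0, 3.0]}, [0, 0, 1, 1]) selects column "b" with threshold 2.5, since that split separates the two classes perfectly.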
G-thinker: A Distributed Framework for Mining Subgraphs in a Big Graph

Yan, Da ; Guo, Guimu ; Chowdhury, Md Mashiur ; Özsu, Tamer ; Ku, Wei-Shinn ; Lui, John C.S. ( January 2020 , Proceedings of the 36th IEEE International Conference on Data Engineering (ICDE))

Mining from a big graph those subgraphs that satisfy certain conditions is useful in many applications, such as community detection and subgraph matching. These problems have a high time complexity, but existing systems that scale them are all IO-bound in execution. We propose the first truly CPU-bound distributed framework, called G-thinker, which adopts a user-friendly subgraph-centric vertex-pulling API for writing distributed subgraph mining algorithms. To utilize all CPU cores of a cluster, G-thinker features (1) a highly concurrent vertex cache for parallel task access and (2) a lightweight task scheduling approach that ensures high task throughput. These designs effectively overlap communication with computation to minimize CPU idle time. Extensive experiments demonstrate that G-thinker achieves orders-of-magnitude speedup even compared with the fastest existing subgraph-centric system, and that it scales well to much larger and denser real network data. G-thinker is open-sourced at http://bit.ly/gthinker with detailed documentation.
Full Text Available
T-thinker: a task-centric distributed framework for compute-intensive divide-and-conquer algorithms

https://doi.org/10.1145/3293883.3295709

Yan, Da ; Guo, Guimu ; Chowdhury, Md Mashiur ; Özsu, M. Tamer ; Lui, John C. ; Tan, Weida ( February 2019 , Proceedings of the 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming)

Many computationally expensive problems are solved by a divide-and-conquer algorithm: a problem over a big dataset can be recursively divided into independent tasks over smaller subsets of the dataset. We present a distributed general-purpose framework, called T-thinker, which effectively utilizes the CPU cores in a cluster by properly decomposing an expensive problem into smaller independent tasks for parallel computation. T-thinker effectively overlaps CPU processing with network communication, and its superior performance is verified on a re-engineered graph mining system, G-thinker, available at http://cs.uab.edu/yanda/gthinker/
Full Text Available
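
The divide-and-conquer pattern described in the T-thinker abstract above can be sketched on a single machine as follows. This is only a toy under stated assumptions, not T-thinker's API: the function names, the sum-of-squares workload, and the max_size cutoff are all invented for illustration, and a local thread pool stands in for the cluster. In CPython, threads do not run pure-Python code in parallel, so a process pool or a real distributed runtime would be needed for actual CPU parallelism.

```python
from concurrent.futures import ThreadPoolExecutor


def decompose(lo, hi, max_size):
    """Recursively split the index range [lo, hi) into independent tasks."""
    if hi - lo <= max_size:
        return [(lo, hi)]  # small enough: becomes one leaf task
    mid = (lo + hi) // 2
    return decompose(lo, mid, max_size) + decompose(mid, hi, max_size)


def run_task(data, task):
    """Leaf computation on one subset; touches no other task's data."""
    lo, hi = task
    return sum(x * x for x in data[lo:hi])


def divide_and_conquer(data, max_size=4, workers=4):
    """Decompose, run the leaf tasks concurrently, then combine the partial results."""
    tasks = decompose(0, len(data), max_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda t: run_task(data, t), tasks))
    return sum(partials)
```

Because the leaf tasks share no state, they can be scheduled in any order on any worker; only the final combine step needs all partial results.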