Search: All records
Creators/Authors contains: "Prasad, Sushil"


  1. Decision trees and tree ensembles are popular supervised learning models on tabular data. Two recent research trends on tree models stand out: (1) bigger and deeper models with many trees, and (2) scalable distributed training frameworks. However, existing implementations on distributed systems are I/O-bound, leaving CPU cores underutilized, and they find the best node-splitting conditions only approximately because of their row-based data partitioning scheme. In this paper, we target the exact training of tree models by effectively utilizing the available CPU cores. The resulting system, called TreeServer, adopts a column-based data partitioning scheme to minimize communication and a node-centric, task-based engine to fully exploit CPU parallelism. Experiments show that TreeServer is up to 10x faster than the tree models in Spark MLlib. We also showcase TreeServer's high training throughput by using it to build big "deep forest" models. A minimal sketch of the column-partitioned split search appears after this list.
  2. Geospatial datasets are growing in size, complexity, and heterogeneity. High-performance systems are needed to analyze such data and produce actionable insights efficiently. For polygonal (a.k.a. vector) datasets, operations such as I/O, data partitioning, communication, and load balancing become challenging in a cluster environment. In this work, we present MPI-Vector-IO, a parallel I/O library designed with MPI-IO specifically for partitioning and reading irregular vector data formats such as Well Known Text. It makes MPI aware of spatial data and spatial primitives, and it provides support for spatial data types embedded within collective computation and communication using the MPI message-passing library. These abstractions, along with parallel I/O support, are useful for developing parallel Geographic Information System (GIS) applications on HPC platforms. Performance evaluation is done on the Lustre and GPFS filesystems. MPI-Vector-IO scales well with the number of MPI processes and file size, achieving bandwidth up to 22 GB/s for common spatial data access patterns. We observed that independent file read functions performed better than collective functions in MPI-IO for contiguous access patterns on Lustre. In general, I/O over real-world datasets is improved by one to two orders of magnitude using up to 1,152 CPU cores. A spatial join query is used as an exemplar to demonstrate an end-to-end application built on MPI-Vector-IO. A sketch of the independent per-rank read pattern also appears after this list.
  3. This special issue is devoted to progress in one of the most important challenges facing computing education. The work published here is relevant to those who teach computing-related topics at all levels, with the greatest implications for undergraduate education. Parallel and distributed computing (PDC) has become ubiquitous to the extent that even casual users feel its impact. This necessitates that every programmer understands how parallelism and a distributed environment affect problem solving, so teaching only traditional, sequential programming is no longer adequate. For this reason, it is essential to impart a range of PDC and high performance computing (HPC) knowledge and skills at various levels within the educational fabric woven by Computer Science (CS), Computer Engineering (CE), and related computational science and engineering curricula. This special issue sought high-quality contributions in the fields of PDC and HPC education. Submissions were invited on the topics of the EduPar 2016, Euro-EduPar 2016, and EduHPC 2016 workshops, but submission was open to all. The special issue includes 12 papers spanning pedagogical techniques, tools, and experiences.
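
The column-based partitioning in item 1 can be sketched as follows. This is a minimal, single-process Python simulation, not TreeServer's code or API: the function names and the variance-reduction split criterion are assumptions. It shows why a worker that owns whole feature columns can enumerate exact split thresholds locally, so only one candidate split per worker has to be reduced to a global best.

    import numpy as np

    def best_split_for_columns(cols, y, col_ids):
        # A "worker" owns whole columns, so it can sort every value and score
        # each exact threshold (variance reduction here) without communication.
        total = y.var() * len(y)
        best = (-np.inf, None, None)            # (gain, column id, threshold)
        for j, col in zip(col_ids, cols):
            order = np.argsort(col)
            xs, ys = col[order], y[order]
            for i in range(1, len(xs)):
                if xs[i] == xs[i - 1]:
                    continue                    # duplicate value, not a valid threshold
                left, right = ys[:i], ys[i:]
                gain = total - (left.var() * len(left) + right.var() * len(right))
                if gain > best[0]:
                    best = (gain, j, 0.5 * (xs[i - 1] + xs[i]))
        return best

    def global_best_split(X, y, n_workers=4):
        # Column-based partitioning: each simulated worker gets a block of columns;
        # the reduction over per-worker candidates is the only "communication".
        blocks = np.array_split(np.arange(X.shape[1]), n_workers)
        candidates = [best_split_for_columns([X[:, j] for j in b], y, b) for b in blocks]
        return max(candidates, key=lambda c: c[0])

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        X = rng.normal(size=(200, 8))
        y = (X[:, 3] > 0.5).astype(float) + 0.1 * rng.normal(size=200)
        gain, col, thr = global_best_split(X, y)
        print(f"best split: column {col} at {thr:.3f} (gain {gain:.2f})")

In a row-partitioned layout, by contrast, each worker sees only a subset of rows for every column, so split thresholds are typically chosen from histograms or quantile summaries, which is the approximation the abstract contrasts with exact training.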
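
The independent, contiguous read pattern discussed in item 2 can be sketched with mpi4py. This is not the MPI-Vector-IO API; the input file name, the chunk splitting, and the newline fix-up below are assumptions, and the overlap is assumed to be longer than any single WKT record. The sketch only illustrates how each rank issues an independent MPI-IO read of its own byte range and keeps the complete records that start inside that range.

    from mpi4py import MPI

    def read_wkt_records(path, comm=MPI.COMM_WORLD, overlap=1 << 16):
        rank, nprocs = comm.Get_rank(), comm.Get_size()
        fh = MPI.File.Open(comm, path, MPI.MODE_RDONLY)
        fsize = fh.Get_size()
        chunk = fsize // nprocs

        # Contiguous byte range per rank; the overlap lets a rank finish a record
        # that crosses its right boundary (assumed longer than any WKT line).
        start = rank * chunk
        stop = fsize if rank == nprocs - 1 else min(fsize, (rank + 1) * chunk + overlap)

        buf = bytearray(stop - start)
        fh.Read_at(start, buf)                  # independent (non-collective) read
        fh.Close()

        text = buf.decode("ascii", errors="ignore")
        pos = 0 if rank == 0 else text.find("\n") + 1   # skip record owned by previous rank
        records = []
        while pos < len(text):
            if rank != nprocs - 1 and pos >= chunk:
                break                           # records starting in the overlap belong to the next rank
            nl = text.find("\n", pos)
            if nl == -1:
                nl = len(text)                  # final record may lack a trailing newline
            records.append(text[pos:nl])
            pos = nl + 1
        return records

    if __name__ == "__main__":
        recs = read_wkt_records("polygons.wkt")         # hypothetical WKT input file
        print(f"rank {MPI.COMM_WORLD.Get_rank()}: {len(recs)} records")

Run under mpirun (e.g. mpirun -n 8 python this_script.py). Switching Read_at to the collective Read_at_all would reproduce the collective access pattern that the abstract reports as slower than independent reads for contiguous access on Lustre.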