Performance Analysis of Divide-and-Conquer strategies for Large scale Simulations in R

Subramanian, Ranjini; Zhang, Hui

doi:10.1109/BigData.2018.8622068

As the volume of data and the technical complexity of large-scale analysis increase, many domain experts want powerful computational resources and a familiar analysis interface so that they can participate fully in the analysis workflow by focusing on individual datasets and leaving the large-scale computation to the system. Towards this goal, we investigate and benchmark a family of Divide-and-Conquer strategies that help domain experts perform large-scale simulations by scaling up their analysis code written in R, the most popular data science and interactive analysis language. We implement Divide-and-Conquer strategies that use R as the analysis (and computing) language, allowing advanced users to provide custom R scripts and variables that are fully embedded into the large-scale analysis workflow. The workflow divides large-scale simulation tasks and conquers them with Slurm array jobs and R: simulations and final aggregations are scheduled as array jobs that run in parallel to accelerate the knowledge discovery process. The objective is to provide a new analytics workflow for similar large-scale analysis loops, in which expert users only need to focus on the Divide-and-Conquer tasks using their domain knowledge.
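The divide-and-conquer pattern described in the abstract can be illustrated with a minimal sketch. The paper's implementation is in R; the Python below is only a language-neutral illustration and is not the authors' code. The `simulate` function, the replicate counts, and the zero-based task numbering are assumptions made for the example. `SLURM_ARRAY_TASK_ID` and `SLURM_ARRAY_TASK_COUNT` are the environment variables Slurm sets for array jobs; the fallback values apply only when the script runs outside Slurm.

```python
import os
import random
import statistics


def chunk_bounds(n_items, n_tasks, task_id):
    """Return the half-open [start, stop) range of items owned by task_id.

    Items are split as evenly as possible; the first (n_items % n_tasks)
    tasks each receive one extra item. Assumes zero-based task IDs.
    """
    base, extra = divmod(n_items, n_tasks)
    start = task_id * base + min(task_id, extra)
    stop = start + base + (1 if task_id < extra else 0)
    return start, stop


def simulate(replicate, n_draws=1000):
    """Hypothetical simulation: mean of n_draws standard normal draws."""
    rng = random.Random(replicate)  # seed by replicate index for reproducibility
    return statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(n_draws))


def run_task(task_id, n_tasks, n_replicates):
    """Divide step: run only the replicates assigned to this array task."""
    start, stop = chunk_bounds(n_replicates, n_tasks, task_id)
    return [simulate(r) for r in range(start, stop)]


def aggregate(partials):
    """Conquer step: combine the per-task result lists into one summary."""
    flat = [x for part in partials for x in part]
    return {"n": len(flat), "mean": statistics.fmean(flat)}


if __name__ == "__main__":
    n_replicates = 10  # assumed workload size for the example
    n_tasks = int(os.environ.get("SLURM_ARRAY_TASK_COUNT", "4"))
    task_id = os.environ.get("SLURM_ARRAY_TASK_ID")
    if task_id is not None:
        # Inside a Slurm array job: each task handles only its own chunk.
        print(run_task(int(task_id), n_tasks, n_replicates))
    else:
        # Outside Slurm: emulate every array task in turn, then aggregate.
        parts = [run_task(t, n_tasks, n_replicates) for t in range(n_tasks)]
        print(aggregate(parts))
```

In a real array job each task would write its partial results to a file, and a separate aggregation job would read them back once all tasks have finished; the in-memory list here stands in for that step.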

Award ID(s): 1726532

PAR ID: 10107982

Author(s) / Creator(s): Subramanian, Ranjini; Zhang, Hui

Date Published: 2018-12-01

Journal Name: 2018 IEEE International Conference on Big Data (Big Data)

Page Range / eLocation ID: 4261 to 4267

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper:
https://doi.org/10.1109/BigData.2018.8622068
