NSF PAR Search | NSF Public Access Repository

Machine Learning-assisted Computational Steering of Large-scale Scientific Simulations

https://doi.org/10.1109/ISPA-BDCloud-SocialCom-SustainCom52081.2021.00138

Liu, W.; Ye, Q.; Wu, C.Q.; Liu, Y.; Zhou, X.; Shan, Y. (October 2021, Proceedings of the 19th IEEE International Symposium on Parallel and Distributed Processing with Applications)
Next-generation scientific applications in various fields are experiencing a rapid transition from traditional experiment-based methodologies to large-scale computation-intensive simulations featuring complex numerical modeling with a large number of tunable parameters. Such model-based simulations generate colossal amounts of data, which are then processed and analyzed against experimental or observational data for parameter calibration and model validation. The sheer volume and complexity of such data, the large model-parameter space, and the intensive computation make it practically infeasible for domain experts to manually configure and tune hyperparameters for accurate modeling in complex and distributed computing environments. This calls for an online computational steering service to enable real-time multi-user interaction and automatic parameter tuning. Towards this goal, we design and develop a generic steering framework based on Bayesian Optimization (BO) and conduct theoretical performance analysis of the steering service. We present a case study with the Weather Research and Forecasting (WRF) model, which uses regret analysis to illustrate the performance superiority of BO-based tuning over other heuristic methods and over manual settings by domain experts.
Full Text Available
NoStop: A Novel Configuration Optimization Scheme for Spark Streaming

https://doi.org/10.1145/3472456.3472515

Ye, Qianwen; Liu, Wuji; Wu, Chase Q. (August 2021, Proceedings of the 50th International Conference on Parallel Processing)
An increasing number of big data applications in various domains generate datasets continuously, which must be processed for various purposes in a timely manner. As one of the most popular streaming data processing systems, Spark Streaming applies a batch-based mechanism, which receives real-time input data streams and divides the data into multiple batches before passing them to the Spark processing engine. As such, inappropriate system configurations, including batch interval and executor count, may lead to unstable states, undermining the capability and efficiency of real-time computing. Determining suitable configurations is therefore crucial to the performance of such systems. Many machine learning- and search-based algorithms have been proposed to provide configuration recommendations for streaming applications where input data streams are fed at a constant speed, which, however, is extremely rare in practice. Most real-life streaming applications process data streams arriving at a time-varying rate and hence require real-time system monitoring and continuous configuration adjustment, which remains largely unexplored. We propose a novel streaming optimization scheme based on Simultaneous Perturbation Stochastic Approximation (SPSA), referred to as NoStop, which dynamically tunes system configurations to optimize real-time system performance with negligible overhead and proven convergence. The performance superiority of NoStop is illustrated by real-life experiments in comparison with Bayesian Optimization and Spark Back Pressure solutions. Extensive experimental results show that NoStop is able to keep track of the changing pattern of input data in real time and provide optimal configuration settings to achieve the best system performance. This optimization scheme could also be applied to other streaming data processing engines with tunable parameters.
Full Text Available
Exploratory analysis and performance prediction of big data transfer in High-performance Networks

https://doi.org/10.1016/j.engappai.2021.104285

Yun, Daqing; Liu, Wuji; Wu, Chase Q.; Rao, Nageswara S.V.; Kettimuthu, Rajkumar (June 2021, Engineering Applications of Artificial Intelligence)
Full Text Available
Throughput optimization for Storm-based processing of stream data on clouds

https://doi.org/10.1016/j.future.2020.06.009

Cao, Huiyan; Wu, Chase Q.; Bao, Liang; Hou, Aiqin; Shen, Wei (November 2020, Future Generation Computer Systems)
Full Text Available
Profiling-based Big Data Workflow Optimization in a Cross-Layer Coupled Design Framework

https://doi.org/10.1007/978-3-030-60248-2_14

Ye, Qianwen; Wu, Chase; Liu, Wuji; Hou, Aiqin; Shen, Wei (September 2020, Lecture Notes in Computer Science)
Qiu, Meikang (Ed.)
Full Text Available
Performance Modeling and Prediction of Big Data Workflows: An Exploratory Analysis

https://doi.org/10.1109/ICCCN49398.2020.9209715

Liu, Wuji; Wu, Chase Q.; Ye, Qianwen; Hou, Aiqin; Shen, Wei (August 2020, 2020 29th International Conference on Computer Communications and Networks (ICCCN))
Full Text Available
On Distributed Information Composition in Big Data Systems

https://doi.org/10.1109/eScience.2019.00025

AlQuwaiee, Haifa; He, Songlin; Wu, Chase; Tang, Qiang; Shen, Xuewen (September 2019, 2019 15th International Conference on eScience (eScience))
Full Text Available
