Title: HPTMT Parallel Operators for High Performance Data Science and Data Engineering
Data-intensive applications are becoming commonplace in all science disciplines. They comprise a rich set of sub-domains such as data engineering, deep learning, and machine learning. These applications are built around efficient data abstractions and operators that suit the needs of different domains. Often, the lack of a clear definition of data structures and operators in the field has led to implementations that do not work well together. The HPTMT architecture that we proposed recently identifies a set of data structures, operators, and an execution model for creating rich data applications that link all aspects of data engineering and data science together efficiently. This paper elaborates and illustrates this architecture using an end-to-end application with deep learning and data engineering parts working together. Our analysis shows that the proposed system architecture is better suited to high-performance computing environments than current big data processing systems. Furthermore, our proposed system emphasizes the importance of efficient, compact data structures, such as the Apache Arrow tabular data representation, defined for high performance. Thus, the system integration we propose scales a sequential computation to a distributed computation while retaining optimum performance, along with a highly usable application programming interface.
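The abstract's emphasis on compact, columnar data structures such as Apache Arrow can be illustrated with a toy sketch. The `ColumnarTable` class below is a hypothetical stand-in, not the actual Arrow or HPTMT API: it stores each column as one contiguous typed buffer, which is what makes Arrow-style operators cache-friendly and straightforward to parallelize.

```python
from array import array

class ColumnarTable:
    """Toy columnar (Arrow-style) table: one contiguous typed buffer per column,
    unlike a row-oriented list of dicts where a column's values are scattered."""

    def __init__(self, columns):
        # columns: dict of name -> (array typecode, iterable of values)
        self.columns = {name: array(tc, vals) for name, (tc, vals) in columns.items()}

    def sum(self, name):
        # An operator scans one contiguous buffer: cache-friendly, vectorizable
        return sum(self.columns[name])

    def filter(self, name, pred):
        # Selection produces a new table, mirroring how Arrow compute kernels
        # return new immutable arrays rather than mutating in place
        mask = [pred(v) for v in self.columns[name]]
        return ColumnarTable({
            n: (col.typecode, [v for v, keep in zip(col, mask) if keep])
            for n, col in self.columns.items()
        })

t = ColumnarTable({'x': ('d', [1.0, 2.0, 3.0]), 'y': ('i', [10, 20, 30])})
total = t.sum('x')                          # scans the 'x' buffer only
subset = t.filter('y', lambda v: v > 10)    # keeps rows where y > 10
```

A distributed operator (as in HPTMT) would apply the same local kernel to a partition of such buffers on each worker and merge the partial results.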
C. Deng, Y. Gong (Asilomar Conference on Signals, Systems, and Computers)
Gaussian process (GP) is a popular machine learning technique that is widely used in many application domains, especially in robotics. However, GP inference is computation-intensive and time-consuming, which poses severe challenges for large-scale deployment in real-time applications. In this paper, we propose two efficient hardware architectures for a GP accelerator. One architecture targets general GP inference, and the other is specifically optimized for the scenario where data points are observed gradually. Evaluation results show that the proposed hardware accelerator provides significant performance improvement over a general-purpose computing platform.
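To make the inference cost concrete: the dominant expense in exact GP inference is solving a linear system in the n-by-n kernel matrix, an O(n³) step, which is the kind of workload such accelerators target. A minimal, dependency-free sketch of the posterior mean follows; the RBF kernel, the naive solver, and all names are illustrative assumptions, not details from the paper.

```python
import math

def rbf(a, b, ls=1.0):
    # Squared-exponential (RBF) kernel on scalars
    return math.exp(-0.5 * ((a - b) / ls) ** 2)

def solve(A, b):
    # Naive Gaussian elimination with partial pivoting: the O(n^3) step
    # that dominates exact GP inference
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for c in range(i, n + 1):
                M[r][c] -= f * M[i][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x

def gp_posterior_mean(X, y, x_star, noise=1e-6):
    # Posterior mean: mean(x*) = k*^T (K + noise*I)^(-1) y
    K = [[rbf(xi, xj) + (noise if i == j else 0.0)
          for j, xj in enumerate(X)] for i, xi in enumerate(X)]
    alpha = solve(K, y)                      # the expensive part
    k_star = [rbf(x_star, xi) for xi in X]
    return sum(ks * a for ks, a in zip(k_star, alpha))
```

With near-noiseless observations the posterior mean interpolates the training points, e.g. `gp_posterior_mean([0.0, 1.0, 2.0], [0.0, 1.0, 0.0], 1.0)` is approximately 1.0.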
Dargazany, Aras R.; Stegagno, Paolo; Mankodiya, Kunal (Mobile Information Systems)
This work introduces Wearable deep learning (WearableDL), a unifying conceptual architecture inspired by the human nervous system, offering the convergence of deep learning (DL), Internet-of-things (IoT), and wearable technologies (WT) as follows: (1) the brain, the core of the central nervous system, represents deep learning for cloud computing and big data processing. (2) The spinal cord (a part of the CNS connected to the brain) represents the Internet-of-things for fog computing and big data flow/transfer. (3) Peripheral sensory and motor nerves (components of the peripheral nervous system (PNS)) represent wearable technologies as edge devices for big data collection. In recent times, wearable IoT devices have enabled the streaming of big data from smart wearables (e.g., smartphones, smartwatches, smart clothing, and personalized gadgets) to cloud servers. Now, the ultimate challenges are (1) how to analyze the collected wearable big data without any background information and without any labels representing the underlying activity; and (2) how to recognize the spatial/temporal patterns in this unstructured big data to help end-users in the decision-making process, e.g., medical diagnosis, rehabilitation efficiency, and/or sports performance. Deep learning (DL) has recently gained popularity due to its ability to (1) scale to the big data size (scalability); (2) learn feature engineering by itself (no manual feature extraction or hand-crafted features) in an end-to-end fashion; and (3) offer accuracy or precision in learning raw unlabeled/labeled (unsupervised/supervised) data. In order to understand the current state of the art, we systematically reviewed over 100 similar and recently published scientific works on the development of DL approaches for wearable and person-centered technologies. The review supports and strengthens the proposed bioinspired architecture of WearableDL. Finally, this article develops an outlook and provides insightful suggestions for WearableDL and its application in the field of big data analytics.
It has been recognized that jobs across different domains are becoming more data driven, and many aspects of the economy, society, and daily life depend more and more on data. Undergraduate education offers a critical link in providing more data science and engineering (DSE) exposure to students and expanding the supply of DSE talent. The National Academies have identified that effective DSE education requires both appropriate classwork and hands-on experience with real data and real applications. Significant progress has been made in classwork, while progress in hands-on research experience has been lacking. To fill this gap, we have proposed to create data-enabled engineering project (DEEP) modules based on real data and applications, currently funded by the National Science Foundation (NSF) under the Improving Undergraduate STEM Education (IUSE) program. To achieve the project goal, we have developed two internet-of-things (IoT) enabled laboratory engineering testbeds (LETs) and generated real data under various application scenarios. In addition, we have designed and developed several sample DEEP modules in interactive Jupyter Notebook using the generated data. These sample DEEP modules will also be ported to other interactive DSE learning environments, including Matlab Live Script and R Markdown, for wide and easy adoption. Finally, we have conducted metacognitive awareness gain (MAG) assessments to establish a baseline for assessing the effectiveness of DEEP modules in enhancing students' reflection and metacognition. The DEEP modules currently being developed target students in Chemical Engineering, Electrical Engineering, Computer Science, and the MS program in Data Science at xxx University. The modules will be deployed in the Spring of 2021, and we expect an immediate impact on the targeted classes and students. We also anticipate that the DEEP modules can be adopted without modification in other Engineering disciplines such as Mechanical, Industrial, and Aerospace Engineering. They can also be easily extended to disciplines in other colleges, such as Liberal Arts, by incorporating real data and applications from the respective disciplines. In this work, we share our ideas, the rationale behind the proposed approach, the planned tasks for the project, a demonstration of the modules developed, and potential dissemination venues.
Poduval, P.; Issa, M.; Imani, F.; Najafi, M. H.; Imani, M. (Proceedings of the 16th IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH 2021))
Brain-inspired HyperDimensional Computing (HDC) is an alternative computation model based on the observation that the human brain operates on high-dimensional representations of data. Existing HDC solutions rely on expensive pre-processing algorithms for feature extraction. In this paper, we propose StocHD, a novel end-to-end hyperdimensional system that supports accurate, efficient, and robust learning over raw data. StocHD expands HDC functionality to the computing area by mathematically defining stochastic arithmetic over HDC hypervectors. StocHD enables an entire learning application (including the feature extractor) to be processed using the HDC data representation, enabling uniform, efficient, robust, and highly parallel computation. We also propose a novel, fully digital and scalable Processing In-Memory (PIM) architecture that exploits the memory-centric nature of HDC to support extensively parallel computation.
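The hypervector arithmetic this abstract refers to can be sketched in a few lines. The following is a generic HDC illustration (random bipolar hypervectors, majority-vote bundling, cosine similarity), not StocHD's stochastic arithmetic itself; dimensionality and names are assumptions for the example.

```python
import random

DIM = 10000  # typical hypervector dimensionality; the "high-dimensional" part

def rand_hv(rng):
    # Random bipolar hypervector; any two random HVs are nearly orthogonal
    return [rng.choice((-1, 1)) for _ in range(DIM)]

def bundle(hvs):
    # Elementwise majority vote: superposes several hypervectors into one
    return [1 if sum(col) >= 0 else -1 for col in zip(*hvs)]

def similarity(a, b):
    # Normalized dot product (cosine similarity for bipolar vectors)
    return sum(x * y for x, y in zip(a, b)) / len(a)

rng = random.Random(0)             # seeded for reproducibility
a, b, c = rand_hv(rng), rand_hv(rng), rand_hv(rng)
proto = bundle([a, b])             # class prototype built from two "samples"
```

The bundled prototype stays similar to its constituents (similarity to `a` is about 0.5) while an unrelated hypervector `c` scores near zero; classification in HDC is a nearest-prototype search over such similarities, and every operation is elementwise, which is what makes the model amenable to PIM-style parallel hardware.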
For its robust predictive power (compared to pure physics-based models) and sample-efficient training (compared to pure deep learning models), physics-informed deep learning (PIDL), a paradigm hybridizing physics-based models and deep neural networks (DNNs), has been booming in science and engineering fields. One key challenge of applying PIDL to various domains and problems lies in the design of a computational graph that integrates physics and DNNs: in other words, how the physics is encoded into DNNs and how the physics and data components are represented. In this paper, we offer an overview of a variety of architecture designs of PIDL computational graphs and how these structures are customized to traffic state estimation (TSE), a central problem in transportation engineering. As observation data, problem type, and goal vary, we demonstrate potential architectures of PIDL computational graphs and compare these variants using the same real-world dataset.
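The core idea of a PIDL computational graph, a shared loss combining a data-fit term with a physics-residual term, can be sketched without any DL framework. The linear "network" and the toy physics constraint du/dx = 1 below are purely illustrative assumptions, not details from the survey.

```python
def total_loss(a, b, data, lam=1.0):
    # PIDL-style objective: data loss (fit sparse observations) plus a
    # physics loss (penalize violation of the governing equation du/dx = 1)
    data_loss = sum((a * x + b - u) ** 2 for x, u in data) / len(data)
    physics_residual = (a - 1.0) ** 2   # du/dx - 1 for the linear model u = a*x + b
    return data_loss + lam * physics_residual

def train(data, lr=0.05, steps=2000, lam=1.0):
    # Gradient descent with finite-difference gradients, keeping the sketch
    # dependency-free (a real PIDL model would use autodiff on a DNN)
    a = b = 0.0
    eps = 1e-6
    for _ in range(steps):
        ga = (total_loss(a + eps, b, data, lam)
              - total_loss(a - eps, b, data, lam)) / (2 * eps)
        gb = (total_loss(a, b + eps, data, lam)
              - total_loss(a, b - eps, data, lam)) / (2 * eps)
        a -= lr * ga
        b -= lr * gb
    return a, b
```

Trained on two noisy samples of u = x, e.g. `train([(0.0, 0.1), (2.0, 2.0)])`, the physics term pulls the slope toward 1 even where data is sparse; the weight `lam` is the knob the surveyed architectures tune (or learn) when balancing physics against data.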
Abeykoon, Vibhatha, Kamburugamuve, Supun, Widanage, Chathura, Perera, Niranda, Uyar, Ahmet, Kanewala, Thejaka Amila, von Laszewski, Gregor, and Fox, Geoffrey. HPTMT Parallel Operators for High Performance Data Science and Data Engineering. Retrieved from https://par.nsf.gov/biblio/10381405. Frontiers in Big Data 4. Web. doi:10.3389/fdata.2021.756041.