NSF PAR Search | NSF Public Access Repository

Deep Kernel Density Estimation for Photon Mapping

https://doi.org/10.1111/cgf.14052

Zhu, Shilin; Xu, Zexiang; Jensen, Henrik Wann; Su, Hao; Ramamoorthi, Ravi (July 2020, Computer Graphics Forum)

Abstract: Recently, deep learning‐based denoising approaches have led to dramatic improvements in low sample‐count Monte Carlo rendering. These approaches are aimed at path tracing, which is not ideal for simulating challenging light transport effects like caustics, where photon mapping is the method of choice. However, photon mapping requires very large numbers of traced photons to achieve high‐quality reconstructions. In this paper, we develop the first deep learning‐based method for particle‐based rendering, and specifically focus on photon density estimation, the core of all particle‐based methods. We train a novel deep neural network to predict a kernel function to aggregate photon contributions at shading points. Our network encodes individual photons into per‐photon features, aggregates them in the neighborhood of a shading point to construct a photon local context vector, and infers a kernel function from the per‐photon and photon local context features. This network is easy to incorporate in many previous photon mapping methods (by simply swapping the kernel density estimator) and can produce high‐quality reconstructions of complex global illumination effects like caustics with an order of magnitude fewer photons compared to previous photon mapping methods. Our approach largely reduces the required number of photons, significantly advancing the computational efficiency in photon mapping.
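To make the phrase "swapping the kernel density estimator" concrete, the following is a minimal sketch of classical photon density estimation with a pluggable kernel. It is not the paper's network: the learned kernel is not reproduced here, and the names `density_estimate`, `uniform_kernel`, and `epanechnikov_kernel` are hypothetical. It also assumes a locally flat surface, scalar photon flux, and no BRDF term.

```python
import math
from typing import Callable, Sequence, Tuple

# Assumption: a photon is (x, y, flux) on a locally flat surface patch.
Photon = Tuple[float, float, float]


def uniform_kernel(t: float) -> float:
    """Constant kernel on the unit disk; integrates to 1 over the disk."""
    return 1.0 / math.pi if t <= 1.0 else 0.0


def epanechnikov_kernel(t: float) -> float:
    """2D Epanechnikov kernel; also integrates to 1 over the unit disk."""
    return (2.0 / math.pi) * (1.0 - t * t) if t <= 1.0 else 0.0


def density_estimate(
    point: Tuple[float, float],
    photons: Sequence[Photon],
    radius: float,
    kernel: Callable[[float], float],
) -> float:
    """Sum kernel-weighted photon flux around `point`, scaled by 1 / radius^2."""
    px, py = point
    total = 0.0
    for x, y, flux in photons:
        # Normalized distance from the shading point to the photon.
        t = math.hypot(x - px, y - py) / radius
        total += kernel(t) * flux
    return total / (radius * radius)


photons = [(0.0, 0.0, 1.0), (0.5, 0.0, 1.0), (2.0, 0.0, 1.0)]
flat = density_estimate((0.0, 0.0), photons, 1.0, uniform_kernel)
smooth = density_estimate((0.0, 0.0), photons, 1.0, epanechnikov_kernel)
```

In this sketch the `kernel` argument is the only part that changes between estimators, which is the slot a learned, per-photon kernel would presumably occupy.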
Developing a Best Practices Training Program in Cyberinfrastructure-Enabled Machine Learning Research

https://doi.org/10.1145/3569951.3597543

Thomas, Mary P.; Goetz, Andreas W.; Kandes, Martin C.; Nguyen, Mai; Rodriguez, Paul; Rose, Peter W.; Sinkovits, Robert S. (July 2023, ACM)
A near-storage framework for boosted data preprocessing of mass spectrum clustering

https://doi.org/10.1145/3489517.3530449

Xu, Weihong; Kang, Jaeyoung; Rosing, Tajana (July 2022, Proceedings of the 59th ACM/IEEE Design Automation Conference)

Full Text Available
GENERIC: highly efficient learning engine on edge using hyperdimensional computing

https://doi.org/10.1145/3489517.3530669

Khaleghi, Behnam; Kang, Jaeyoung; Xu, Hanyang; Morris, Justin; Rosing, Tajana (July 2022, Proceedings of the 59th ACM/IEEE Design Automation Conference)

Full Text Available
PatterNet: explore and exploit filter patterns for efficient deep neural networks

https://doi.org/10.1145/3489517.3530422

Khaleghi, Behnam; Mallappa, Uday; Yaldiz, Duygu; Yang, Haichao; Shah, Monil; Kang, Jaeyoung; Rosing, Tajana (July 2022, Proceedings of the 59th ACM/IEEE Design Automation Conference)

Full Text Available
FHDnn: communication efficient and robust federated learning for AIoT networks

https://doi.org/10.1145/3489517.3530394

Chandrasekaran, Rishikanth; Ergun, Kazim; Lee, Jihyun; Nanjunda, Dhanush; Kang, Jaeyoung; Rosing, Tajana (July 2022, Proceedings of the 59th ACM/IEEE Design Automation Conference)

Full Text Available
Auto-scaling HTCondor pools using Kubernetes compute resources

https://doi.org/10.1145/3491418.3535123

Sfiligoi, Igor; DeFanti, Thomas; Würthwein, Frank (July 2022, 2022 Practice & Experience in Advanced Research Computing (PEARC22))

Full Text Available
NASCENT2: Generic Near-Storage Sort Accelerator for Data Analytics on SmartSSD

https://doi.org/10.1145/3472769

Salamat, Sahand; Zhang, Hui; Ki, Yang Seok; Rosing, Tajana (June 2022, ACM Transactions on Reconfigurable Technology and Systems)

As the size of data generated every day grows dramatically, the computational bottleneck of computer systems has shifted toward storage devices. The interface between the storage and the computational platforms has become the main limitation due to its limited bandwidth, which does not scale when the number of storage devices increases. Interconnect networks do not provide simultaneous access to all storage devices and thus limit the performance of the system when executing independent operations on different storage devices. Offloading the computations to the storage devices eliminates the burden of data transfer from the interconnects. Near-storage computing offloads a portion of computations to the storage devices to accelerate big data applications. In this article, we propose a generic near-storage sort accelerator for data analytics, NASCENT2, which utilizes Samsung SmartSSD, an NVMe flash drive with an on-board FPGA chip that processes data in situ. NASCENT2 consists of dictionary decoder, sort, and shuffle FPGA-based accelerators to support sorting database tables based on a key column with any arbitrary data type. It exploits data partitioning applied by data processing management systems, such as SparkSQL, to break down the sort operations on colossal tables into multiple sort operations on smaller tables. NASCENT2 generic sort provides 2× speedup and 15.2× energy efficiency improvement as compared to the CPU baseline. It moreover considers the specifications of the SmartSSD (e.g., the FPGA resources, interconnect network, and solid-state drive bandwidth) to increase the scalability of computer systems as the number of storage devices increases. With 12 SmartSSDs, NASCENT2 is 9.9× (137.2×) faster and 7.3× (119.2×) more energy efficient in sorting the largest tables of TPCC and TPCH benchmarks than the FPGA (CPU) baseline.
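The idea of breaking one large sort into several smaller, independent sorts can be sketched in software as follows. This is only an analogy under stated assumptions, not the NASCENT2 FPGA design: it assumes range partitioning on an integer key column, and `range_partition`, `partitioned_sort`, and the `boundaries` parameter are hypothetical names.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

# Assumption: a table row is (key, payload) and the sort key is an integer.
Row = Tuple[int, str]


def range_partition(rows: Sequence[Row], boundaries: Sequence[int]) -> List[List[Row]]:
    """Split rows into len(boundaries) + 1 key ranges.

    `boundaries` must be sorted ascending. Partition i holds keys k with
    boundaries[i - 1] <= k < boundaries[i].
    """
    parts: List[List[Row]] = [[] for _ in range(len(boundaries) + 1)]
    for row in rows:
        parts[bisect_right(boundaries, row[0])].append(row)
    return parts


def partitioned_sort(rows: Sequence[Row], boundaries: Sequence[int]) -> List[Row]:
    """Sort each partition on its own, then concatenate in partition order."""
    result: List[Row] = []
    for part in range_partition(rows, boundaries):
        # Each partition could be sorted independently, e.g. on a separate device.
        result.extend(sorted(part, key=lambda r: r[0]))
    return result


rows = [(25, "a"), (3, "b"), (10, "c"), (17, "d"), (3, "e")]
ordered = partitioned_sort(rows, boundaries=[10, 20])
```

Because every key in one range partition is smaller than every key in the next, concatenating the sorted partitions yields a fully sorted table without a final merge step.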
Full Text Available
Store-n-Learn: Classification and Clustering with Hyperdimensional Computing across Flash Hierarchy

https://doi.org/10.1145/3503541

Gupta, Saransh; Khaleghi, Behnam; Salamat, Sahand; Morris, Justin; Ramkumar, Ranganathan; Yu, Jeffrey; Tiwari, Aniket; Kang, Jaeyoung; Imani, Mohsen; Aksanli, Baris; et al (May 2022, ACM Transactions on Embedded Computing Systems)

Processing large amounts of data, especially in learning algorithms, poses a challenge for current embedded computing systems. Hyperdimensional (HD) computing (HDC) is a brain-inspired computing paradigm that works with high-dimensional vectors called hypervectors. HDC replaces several complex learning computations with bitwise and simpler arithmetic operations at the expense of an increased amount of data due to mapping the data into high-dimensional space. These hypervectors, more often than not, cannot be stored in memory, resulting in long data transfers from storage. In this article, we propose Store-n-Learn, an in-storage computing solution that performs HDC classification and clustering by implementing encoding, training, retraining, and inference across the flash hierarchy. To hide the latency of training and enable efficient computation, we introduce the concept of batching in HDC. We also present on-chip acceleration for HDC encoding in flash planes. This enables us to exploit the high parallelism provided by the flash hierarchy and encode multiple data points in parallel in both batched and non-batched fashion. Store-n-Learn also implements a single top-level FPGA accelerator with novel implementations for HDC classification training, retraining, inference, and clustering on the encoded data. Our evaluation over 10 popular datasets shows that Store-n-Learn is on average 222× (543×) faster than CPU and 10.6× (7.3×) faster than the state-of-the-art in-storage computing solution, INSIDER, for HDC classification (clustering).
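For readers unfamiliar with HDC, the encode, train, and infer steps named in the abstract can be illustrated with a small sketch. It uses one common textbook scheme (random bipolar ID and level hypervectors, bundling by summation, dot-product similarity) and is not the Store-n-Learn implementation. The class `HDClassifier`, the dimensionality `D = 1024`, and the pre-quantized integer features are all assumptions made for illustration.

```python
import random
from typing import Dict, List, Sequence

D = 1024  # hypervector dimensionality (illustrative assumption)


def random_hv(rng: random.Random) -> List[int]:
    """Draw a random bipolar hypervector with entries in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(D)]


class HDClassifier:
    def __init__(self, n_features: int, n_levels: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # One ID hypervector per feature position, one per quantized value.
        self.ids = [random_hv(rng) for _ in range(n_features)]
        self.levels = [random_hv(rng) for _ in range(n_levels)]
        self.prototypes: Dict[str, List[int]] = {}

    def encode(self, sample: Sequence[int]) -> List[int]:
        """Bind each feature ID with its level, bundle by summing, then take the sign."""
        acc = [0] * D
        for i, level in enumerate(sample):
            id_hv, lvl_hv = self.ids[i], self.levels[level]
            for d in range(D):
                acc[d] += id_hv[d] * lvl_hv[d]
        return [1 if v >= 0 else -1 for v in acc]

    def train(self, sample: Sequence[int], label: str) -> None:
        """Add the encoded sample to the prototype of its class."""
        hv = self.encode(sample)
        proto = self.prototypes.setdefault(label, [0] * D)
        for d in range(D):
            proto[d] += hv[d]

    def predict(self, sample: Sequence[int]) -> str:
        """Return the class whose prototype has the largest dot product."""
        hv = self.encode(sample)
        return max(
            self.prototypes,
            key=lambda c: sum(p * h for p, h in zip(self.prototypes[c], hv)),
        )


clf = HDClassifier(n_features=3, n_levels=4)
clf.train([0, 1, 2], "a")
clf.train([3, 2, 0], "b")
```

Every operation here is an elementwise multiply, add, or compare over long vectors, which is the property that makes the workload a plausible fit for parallel hardware close to storage.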
Full Text Available
Online Performance and Power Prediction for Edge TPU via Comprehensive Characterization

https://doi.org/10.23919/DATE54114.2022.9774764

Ni, Yang; Kim, Yeseong; Rosing, Tajana; Imani, Mohsen (March 2022, 2022 Design, Automation & Test in Europe Conference & Exhibition (DATE))

Full Text Available
