NSF PAR Search | NSF Public Access Repository

Recent Research and Operational Tools for Improved Understanding and Diagnosis of Tropical Cyclone Inner Core Structure

https://doi.org/10.2151/jmsj.2025-008

ITO, Kosuke; MIYAMOTO, Yoshiaki; WU, Chun-Chieh; DIDLAKE, Anthony; HLYWIAK, James; HUANG, Yi-Hsuan; LAI, Tsz-Kin; PATTIE, Lauren; QIN, Nannan; SHIMADA, Udai; et al. (January 2025, Journal of the Meteorological Society of Japan. Ser. II)

Full Text Available
SingleStore-V: An Integrated Vector Database System in SingleStore

https://doi.org/10.14778/3685800.3685805

Chen, Cheng; Jin, Chenzhe; Zhang, Yunan; Podolsky, Sasha; Wu, Chun; Wang, Szu-Po; Hanson, Eric; Sun, Zhou; Walzer, Robert; Wang, Jianguo (August 2024, Proceedings of the VLDB Endowment)

Vector databases have recently gained significant attention due to the emergence of large language models that produce vector embeddings for text. Existing vector databases can be broadly categorized into two types: specialized and generalized. Specialized vector databases are explicitly designed and optimized for managing vector data, while generalized ones support vector data management within a general purpose database. While specialized vector databases are interesting, there is a substantial customer base interested in generalized vector databases for various reasons, e.g., a reluctance to move data out of relational databases to reduce data silos and costs, the desire to use SQL, and the need for more sophisticated query processing of vector and non-vector data. However, generalized vector databases face two main challenges: performance and interoperability of vector search with SQL, such as combining vector search with filters, joins, or even fulltext search. In this paper, we present SingleStore-V, a full-fledged generalized vector database integrated into SingleStore, a modern distributed relational database optimized for both OLAP and OLTP workloads. SingleStore-V achieves high performance and interoperability via a suite of optimizations. Experiments on standard vector benchmarks show that SingleStore-V performs comparably to Milvus, a highly-optimized specialized vector database, and significantly outperforms pgvector, a popular generalized vector database in PostgreSQL. We believe this paper will shed light on integrating vector search into relational databases in general, as many design concepts and optimizations apply to other databases.
Full Text Available
Anomalous Electrical Transport in the Kagome Magnet $\mathrm{YbFe_6Ge_6}$

https://doi.org/10.1103/PhysRevLett.134.186501

Yao, Weiliang; Liu, Supeng; Kikuchi, Hodaka; Ishikawa, Hajime; Fjellvåg, Øystein S.; Tam, David W.; Ye, Feng; Abernathy, Douglas L.; Wood, George D. A.; Adroja, Devashibhai; et al. (May 2025, Physical Review Letters)
On the Three P's of Parallel Programming for Heterogeneous Computing: Performance, Productivity, and Portability

https://doi.org/10.1109/HPEC58863.2023.10363620

Gondhalekar, Atharva; Feng, Wu-chun (September 2023, IEEE)

As FPGAs and GPUs continue to make inroads into high-performance computing (HPC), the need for languages and frameworks that offer performance, productivity, and portability across heterogeneous platforms, such as FPGAs and GPUs, continues to grow. OpenCL and SYCL have emerged as frameworks that offer cross-platform functional portability between FPGAs and GPUs. While functional portability across a diverse set of platforms is an important feature of portable frameworks, achieving performance portability often requires vendor and platform-specific optimizations. Achieving performance portability, therefore, comes at the expense of productivity. This paper presents a quantification of the tradeoffs between performance, portability, and productivity of OpenCL and SYCL. It extends and complements our prior work on quantifying performance-productivity tradeoffs between Verilog and OpenCL for the FPGA. In addition to evaluating the performance-productivity tradeoffs between OpenCL and SYCL, this work quantifies the performance portability (PP) of OpenCL and SYCL as well as their code convergence (CC), i.e., a measure of productivity across different platforms (e.g., FPGA and GPU). Using two applications as case studies (i.e., edge detection using the Sobel filter, and graph link prediction using the Jaccard similarity index), we characterize the tradeoffs between performance, portability, and productivity. Our results show that OpenCL and SYCL offer complementary tradeoffs. While OpenCL delivers better performance portability than SYCL, SYCL offers better code convergence and a 1.6× improvement in source lines of code over OpenCL.
Full Text Available
$S^3$: Increasing GPU Utilization during Generative Inference for Higher Throughput

Jin, Yunho; Wu, Chun-Feng; Brooks, David; Wei, Gu-Yeon (December 2023, Advances in neural information processing systems)

Full Text Available
Exact Distributed Stochastic Block Partitioning

https://doi.org/10.1109/CLUSTER52292.2023.00010

Wanye, Frank; Gleyzer, Vitaliy; Kao, Edward; Feng, Wu-Chun (October 2023, IEEE)

Stochastic block partitioning (SBP) is a community detection algorithm that is highly accurate even on graphs with a complex community structure, but its inherently serial nature hinders its widespread adoption by the wider scientific community. To make it practical to analyze large real-world graphs with SBP, there is a growing need to parallelize and distribute the algorithm. The current state-of-the-art distributed SBP algorithm is a divide-and-conquer approach that limits communication between compute nodes until the end of inference. This leads to the breaking of computational dependencies, which causes convergence issues as the number of compute nodes increases and when the graph is sufficiently sparse. To address this shortcoming, we introduce EDiSt — an exact distributed stochastic block partitioning algorithm. Under EDiSt, compute nodes periodically share community assignments during inference. Due to this additional communication, EDiSt improves upon the divide-and-conquer algorithm by allowing it to scale out to a larger number of compute nodes without suffering from convergence issues, even on sparse graphs. We show that EDiSt provides speedups of up to 26.9x over the divide-and-conquer approach and speedups up to 44.0x over shared memory parallel SBP when scaled out to 64 compute nodes.
Full Text Available
An Integrated Approach for Accelerating Stochastic Block Partitioning

https://doi.org/10.1109/HPEC58863.2023.10363599

Wanye, Frank; Gleyzer, Vitaliy; Kao, Edward; Feng, Wu-chun (September 2023, IEEE)

Community detection, or graph partitioning, is a fundamental problem in graph analytics with applications in a wide range of domains including bioinformatics, social media analysis, and anomaly detection. Stochastic block partitioning (SBP) is a community detection algorithm based on sequential Bayesian inference. SBP is highly accurate even on graphs with a complex community structure. However, it does not scale well to large real-world graphs that can contain upwards of a million vertices due to its sequential nature. Approximate methods that break computational dependencies improve the scalability of SBP via parallelization and data reduction. However, these relaxations can lead to low accuracy on graphs with complex community structure. In this paper, we introduce additional synchronization steps through vertex-level data batching to improve the accuracy of such methods. We then leverage batching to develop a high-performance parallel approach that improves the scalability of SBP while maintaining accuracy. Our approach is the first to integrate data reduction, shared-memory parallelization, and distributed computation, thus efficiently utilizing distributed computing resources to accelerate SBP. On a one-million vertex graph processed on 64 compute nodes with 128 cores each, our approach delivers a speedup of 322x over the sequential baseline and 6.8x over the distributed-only implementation. To the best of our knowledge, this Graph Challenge submission is the highest-performing SBP implementation to date and the first to process the one-million vertex graph using SBP.
Full Text Available
To probe the activation mechanism of the Delta opioid receptor by an agonist ADL5859 started from inactive conformation using molecular dynamic simulations

https://doi.org/10.1080/07391102.2022.2107074

Dean, Emily; Kumar, Vikash; McConnell, Ashleigh; Pagnoncelli, Iohana B.; Wu, Chun (September 2023, Journal of Biomolecular Structure and Dynamics)

Full Text Available
Computational analysis of drug resistance of taxanes bound to human β-tubulin mutant (D26E)

https://doi.org/10.1016/j.jmgm.2023.108503

Uba, Abdullahi Ibrahim; Bui-Linh, Candice; Thornton, Julianne M.; Olivieri, Michael; Wu, Chun (September 2023, Journal of Molecular Graphics and Modelling)

Full Text Available
Computational insights into the binding of pimodivir to the mutated PB2 subunit of the influenza A virus

https://doi.org/10.1080/08927022.2023.2210690

Arba, Muhammad; Ningsih, Aprilia Surya; Bande, La Ode; Wahyudi, Setyanto Tri; Bui-Linh, Candice; Wu, Chun; Karton, Amir (July 2023, Molecular Simulation)

Full Text Available
