NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

A Distributed Matrix-Block-Vector Multiplication in Presence of System Performance Variability

https://doi.org/10.1145/3774934.3786453

Ma, Yuchen; Stathopoulos, Andreas; Ren, Bin (January 2026, Proceedings of the 31st ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming}{January 31 -- February 4, 2026)

Distributed matrix-block-vector multiplication (Matvec) algorithm is a critical component of many applications, but can be computationally challenging for dense matrices of dimension \(O(10^6\text{--}10^7)\) and blocks of \(O(10\text{--}100)\) vectors. We present performance analysis, implementation, and optimization of our \pname{} library for Matvec under the effect of system variability. Our modeling shows that 1D pipelining Matvec is as efficient as 2D algorithms at small to medium clusters, which are sufficient for these problem sizes. We develop a performance tracing framework and a simulator that reveal pipeline bubbles caused by modest \textasciitilde{}5\% system variability. To tolerate such variability, our \pname{} library, which combines on-the-fly kernel matrix generation and Matvec, integrates four optimizations: inter-process data preloading, unconventional static thread scheduling, cache-aware tiling, and multi-version unrolling. In our benchmarks on \(O(10^5)\) Matvec problems, \pname{} achieves up to 1.85× speedup over COSMA and 17× over ScaLAPACK. For \(O(10^6)\) problems, where COSMA and ScaLAPACK exceed memory capacity, \pname{} maintains linear strong scaling and achieves peak performance of 75\%~FMA~Flop/s. Its static scheduling policy has a 2.27× speedup compared to the conventional work-stealing dynamic scheduler, and is predicted to withstand up to 108\% performance variability under exponential distributed variability simulation.
more » « less
Free, publicly-accessible full text available January 31, 2027
Towards Recognizing Food Types for Unseen Subjects

https://doi.org/10.1145/3696424

Guan, Jiexiong; Wang, Junjie; Niu, Wei; Peng, Zhen; Wang, Shuangquan; Liu, Zhenming; Zhou, Gang; Ren, Bin (September 2024, ACM Transactions on Computing for Healthcare)

Recognizing food types through sensor signals for unseen users remains remarkably challenging, despite extensive recent studies. The efficacy of prior machine learning techniques is dwarfed by giant variations of data collected from multiple participants, partly because users have varied chewing habits and wear sensor devices in various manners. This work treats the problem as an instance of the domain adaptation problem, where each user represents a domain. We develop the first multi-source domain adaptation (MSDA) method for food-typing recognition, which consists of three major components: stratified normalization, a multi-source domain adaptor, and adaptive ensemble learning. New techniques are developed for each component. Using a real-world dataset comprised of 15 participants, we demonstrate that our method achieves\(1.33\times\)to\(2.13\times\)improvement in accuracy compared with nine state-of-the-art MSDA baselines. Additionally, we perform an in-depth ablation study to examine the behavior of each component and confirm their efficacy.
more » « less
Full Text Available
On Item-Sampling Evaluation for Recommender System

https://doi.org/10.1145/3629171

Li, Dong; Jin, Ruoming; Liu, Zhenming; Ren, Bin; Gao, Jing; Liu, Zhi (March 2024, ACM Transactions on Recommender Systems)

Personalized recommender systems play a crucial role in modern society, especially in e-commerce, news, and ads areas. Correctly evaluating and comparing candidate recommendation models is as essential as constructing ones. The common offline evaluation strategy is holding out some user-interacted items from training data and evaluating the performance of recommendation models based on how many items they can retrieve. Specifically, for any hold-out item or so-called target item for a user, the recommendation models try to predict the probability that the user would interact with the item and rank it among overall items, which is calledglobal evaluation. Intuitively, a good recommendation model would assign high probabilities to such hold-out/target items. Based on the specific ranks, some metrics likeRecall@KandNDCG@Kcan be calculated to further quantify the quality of the recommender model. Instead of ranking the target items among all items, Koren first proposed to rank them among a smallsampled set of items, then quantified the performance of the models, which is calledsampling evaluation. Ever since then, there has been a large amount of work adopting sampling evaluation due to its efficiency and frugality. In recent work, Rendle and Krichene argued that the sampling evaluation is “inconsistent” with respect to a global evaluation in terms of offline top-Kmetrics. In this work, we first investigate the “inconsistent” phenomenon by taking a glance at the connections between sampling evaluation and global evaluation. We reveal the approximately linear relationship between sampling with respect to its global counterpart in terms of the top-KRecall metric. Second, we propose a new statistical perspective of the sampling evaluation—to estimate the global rank distribution of the entire population. After the estimated rank distribution is obtained, the approximation of the global metric can be further derived. Third, we extend the work of Krichene and Rendle, directly optimizing the error with ground truth, providing not only a comprehensive empirical study but also a rigorous theoretical understanding of the proposed metric estimators. To address the “blind spot” issue, where accurately estimating metrics for small top-Kvalues in sampling evaluation is challenging, we propose a novel adaptive sampling method that generalizes the expectation-maximization algorithm to this setting. Last but not least, we also study the user sampling evaluation effect. This series of works outlines a clear roadmap for sampling evaluation and establishes a foundational theoretical framework. Extensive empirical studies validate the reliability of the sampling methods presented.
more » « less
Full Text Available
Distributional Cloning for Stabilized Imitation Learning via ADMM

https://doi.org/10.1109/ICDM58522.2023.00091

Zhang, Xin; Li, Yanhua; Zhang, Ziming; Brinton, Christopher G.; Liu, Zhenming; Zhang, Zhi-Li (December 2023, IEEE)

Full Text Available
Parallel Software for Million-scale Exact Kernel Regression

https://doi.org/10.1145/3577193.3593737

Chen, Yu; Skon, Lucca; Mccombs, James; Liu, Zhenming; Stathopoulos, Andreas (June 2023, ACM)
Symphony in the Latent Space: Provably Integrating High-dimensional Techniques with Non-linear Machine Learning Models

Qiong Wu; Jian Li; Zhenming Liu; Yanhua Li; Mihai Cucuringu (April 2023, Proceedings of the AAAI Conference on Artificial Intelligence)

This paper revisits building machine learning algorithms that involve interactions between entities, such as those between financial assets in an actively managed portfolio, or interac- tions between users in a social network. Our goal is to forecast the future evolution of ensembles of multivariate time series in such applications (e.g., the future return of a financial asset or the future popularity of a Twitter account). Designing ML algorithms for such systems requires addressing the challenges of high-dimensional interactions and non-linearity. Existing approaches usually adopt an ad-hoc approach to integrating high-dimensional techniques into non-linear models and re- cent studies have shown these approaches have questionable efficacy in time-evolving interacting systems. To this end, we propose a novel framework, which we dub as the additive influence model. Under our modeling assump- tion, we show that it is possible to decouple the learning of high-dimensional interactions from the learning of non-linear feature interactions. To learn the high-dimensional interac- tions, we leverage kernel-based techniques, with provable guarantees, to embed the entities in a low-dimensional latent space. To learn the non-linear feature-response interactions, we generalize prominent machine learning techniques, includ- ing designing a new statistically sound non-parametric method and an ensemble learning algorithm optimized for vector re- gressions. Extensive experiments on two common applica- tions demonstrate that our new algorithms deliver significantly stronger forecasting power compared to standard and recently proposed methods.
more » « less
Full Text Available
iQAN: Fast and Accurate Vector Search with Efficient Intra-Query Parallelism on Multi-Core Architectures

https://doi.org/10.1145/3572848.3577527

Peng, Zhen; Zhang, Minjia; Li, Kai; Jin, Ruoming; Ren, Bin (February 2023, PPoPP '23: Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming)

Vector search has drawn a rapid increase of interest in the research community due to its application in novel AI applications. Maximizing its performance is essential for many tasks but remains preliminary understood. In this work, we investigate the root causes of the scalability bottleneck of using intra-query parallelism to speedup the state-of-the-art graph-based vector search systems on multi-core architectures. Our in-depth analysis reveals several scalability challenges from both system and algorithm perspectives. Based on the insights, we propose iQAN, a parallel search algorithm with a set of optimizations that boost convergence, avoid redundant computations, and mitigate synchronization overhead. Our evaluation results on a wide range of real-world datasets show that iQAN achieves up to 37.7× and 76.6× lower latency than state-of-the-art sequential baselines on datasets ranging from a million to a hundred million datasets. We also show that iQAN achieves outstanding scalability as the graph size or the accuracy target increases, allowing it to outperform the state-of-the-art baseline on two billion-scale datasets by up to 16.0× with up to 64 cores.
more » « less
Symphony in the Latent Space: Provably Integrating High-dimensional Techniques with Non-linear Machine Learning Models

https://doi.org/10.1609/aaai.v37i9.26233

Wu, Qiong; Li, Jian; Liu, Zhenming; Li, Yanhua; Cucuringu, Mihai (February 2023, AAAI)

This paper revisits building machine learning algorithms that involve interactions between entities, such as those between financial assets in an actively managed portfolio, or interactions between users in a social network. Our goal is to forecast the future evolution of ensembles of multivariate time series in such applications (e.g., the future return of a financial asset or the future popularity of a Twitter account). Designing ML algorithms for such systems requires addressing the challenges of high-dimensional interactions and non-linearity. Existing approaches usually adopt an ad-hoc approach to integrating high-dimensional techniques into non-linear models and re- cent studies have shown these approaches have questionable efficacy in time-evolving interacting systems. To this end, we propose a novel framework, which we dub as the additive influence model. Under our modeling assump- tion, we show that it is possible to decouple the learning of high-dimensional interactions from the learning of non-linear feature interactions. To learn the high-dimensional interac- tions, we leverage kernel-based techniques, with provable guarantees, to embed the entities in a low-dimensional latent space. To learn the non-linear feature-response interactions, we generalize prominent machine learning techniques, includ- ing designing a new statistically sound non-parametric method and an ensemble learning algorithm optimized for vector re- gressions. Extensive experiments on two common applica- tions demonstrate that our new algorithms deliver significantly stronger forecasting power compared to standard and recently proposed methods.
more » « less
Full Text Available
Learning Lightweight Neural Networks via Channel-Split Recurrent Convolution

https://doi.org/10.1109/WACV56688.2023.00385

Wu, Guojun; Zhang, Xin; Zhang, Ziming; Li, Yanhua; Zhou, Xun; Brinton, Christopher; Liu, Zhenming (January 2023, IEEE)

Lightweight neural networks refer to deep networks with small numbers of parameters, which can be deployed in resource-limited hardware such as embedded systems. To learn such lightweight networks effectively and efficiently, in this paper we propose a novel convolutional layer, namely Channel-Split Recurrent Convolution (CSR-Conv), where we split the output channels to generate data sequences with length T as the input to the recurrent layers with shared weights. As a consequence, we can construct lightweight convolutional networks by simply replacing (some) linear convolutional layers with CSR-Conv layers. We prove that under mild conditions the model size decreases with the rate of O( 1 ). Empirically we demonstrate the state-of-the-art T2 performance using VGG-16, ResNet-50, ResNet-56, ResNet- 110, DenseNet-40, MobileNet, and EfficientNet as backbone networks on CIFAR-10 and ImageNet. Codes can be found on https://github.com/tuaxon/CSR Conv.
more » « less
Full Text Available
Towards Reliable Item Sampling for Recommendation Evaluation

Li, Dong; Jin, Ruoming; Liu, Zhenming; Ren, Bin; Gao, Jing; Liu, Zhi (January 2023, The Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI-23))

Since Rendle and Krichene argued that commonly used sampling-based evaluation metrics are “inconsistent” with respect to the global metrics (even in expectation), there have been a few studies on the sampling-based recommender system evaluation. Existing methods try either mapping the sampling-based metrics to their global counterparts or more generally, learning the empirical rank distribution to estimate the top-K metrics. However, despite existing efforts, there is still a lack of rigorous theoretical understanding of the proposed metric estimators, and the basic item sampling also suffers from the “blind spot” issue, i.e., estimation accuracy to recover the top-K metrics when K is small can still be rather substantial. In this paper, we provide an in-depth investigation into these problems and make two innovative contributions. First, we propose a new item-sampling estimator that explicitly optimizes the error with respect to the ground truth, and theoretically highlights its subtle difference against prior work. Second, we propose a new adaptive sampling method that aims to deal with the “blind spot” problem and also demonstrate the expectation-maximization (EM) algorithm can be generalized for such a setting. Our experimental results confirm our statistical analysis and the superiority of the proposed works. This study helps lay the theoretical foundation for adopting item sampling metrics for recommendation evaluation and provides strong evidence for making item sampling a powerful and reliable tool for recommendation evaluation.
more » « less

« Prev Next »

Search for: All records