NSF PAR Search | NSF Public Access Repository

No Simple Answer to Data Complexity: An Examination of Instance-Level Complexity Metrics for Classification Tasks

Cook, Ryan A; Lalor, John P; Abbasi, Ahmed (April 2025, Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics)

Free, publicly-accessible full text available April 30, 2026
Alternating minimization algorithm for unlabeled sensing and linked linear regression

Abbasi, Ahmed; Aeron, Shuchin; Tasissa, Abiy (February 2025, Signal processing)

Unlabeled sensing is a linear inverse problem with permuted measurements. We propose an alternating minimization (AltMin) algorithm with a suitable initialization for two widely considered permutation models: partially shuffled/k-sparse permutations and r-local/block diagonal permutations. The initialization is key to the performance of the AltMin algorithm. For the exact unlabeled sensing problem, assuming either a Gaussian measurement matrix or a sub-Gaussian signal, we bound the initialization error in terms of the number of blocks and the number of shuffles. Experimental results show that our algorithm is fast, applicable to both permutation models, and robust to the choice of measurement matrix. We also test our algorithm on several real datasets for the ‘linked linear regression’ problem and show superior performance compared to baseline methods.
Free, publicly-accessible full text available February 4, 2026
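As a rough illustration of the alternating minimization idea in the unlabeled sensing abstract above, the Python sketch below alternates a least-squares update of the signal with a permutation update. It is not the authors' algorithm: the identity initialization, the sorting-based permutation step (which is optimal only for a single measurement vector and an unconstrained permutation), and the function names `best_permutation` and `altmin_unlabeled_sensing` are all assumptions made for illustration.

```python
import numpy as np


def best_permutation(y, z):
    """Return perm minimizing ||y - z[perm]||_2 over all permutations.

    For vectors, the optimum pairs the i-th smallest entry of y with the
    i-th smallest entry of z (rearrangement inequality).
    """
    perm = np.empty(len(y), dtype=int)
    perm[np.argsort(y)] = np.argsort(z)
    return perm


def altmin_unlabeled_sensing(A, y, n_iters=20):
    """Alternate between a signal update and a permutation update.

    Model assumed here: y = (A @ x)[perm] for an unknown x and perm.
    """
    n = len(y)
    perm = np.arange(n)  # assumption: start from the identity permutation
    history = []
    for _ in range(n_iters):
        # Signal step: least squares with the permutation held fixed.
        x, *_ = np.linalg.lstsq(A[perm], y, rcond=None)
        # Permutation step: best matching with the signal held fixed.
        perm = best_permutation(y, A @ x)
        history.append(np.linalg.norm(y - (A @ x)[perm]))
    return x, perm, history
```

Because each step exactly minimizes the residual over one variable while the other is fixed, the recorded residual norm never increases. Whether the iterates reach the true signal depends on the starting point, which is consistent with the abstract's emphasis on initialization.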
Efficient Federated Low Rank Matrix Completion

https://doi.org/10.1109/TIT.2025.3563450

Abbasi, Ahmed Ali; Vaswani, Namrata (January 2025, IEEE Transactions on Information Theory)

In this work, we develop and analyze a novel Gradient Descent (GD) based solution, called Alternating GD and Minimization (AltGDmin), for efficiently solving the low rank matrix completion (LRMC) problem in a federated setting. Here “efficient” refers to communication, computation, and sample efficiency. LRMC involves recovering an n × q rank-r matrix X⋆ from a subset of its entries when r ≪ min(n, q). Our theoretical bounds on the sample complexity and iteration complexity of AltGDmin imply that it is the most communication-efficient solution while also being one of the most computation- and sample-efficient ones. We also extend our guarantee to the noisy LRMC setting. In addition, we show how our lemmas can be used to provide an improved sample complexity guarantee for the Alternating Minimization (AltMin) algorithm for LRMC. AltMin is one of the fastest centralized solutions for LRMC, and AltGDmin has comparable time cost even in the centralized setting.
Full Text Available
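To make the alternating structure named in the LRMC abstract above concrete, here is a minimal centralized Python sketch: the matrix is factored as X ≈ U B, B is obtained by column-wise least squares on the observed entries (the minimization step), and U takes one gradient step followed by QR re-orthonormalization (the GD step). The spectral initialization, the step-size rule, the parameter `step_scale`, and the function names are assumptions for illustration, not the paper's exact choices, and no federation or communication is simulated.

```python
import numpy as np


def solve_B(U, Y, mask):
    """Column-wise least squares for B using only the observed rows."""
    r, q = U.shape[1], Y.shape[1]
    B = np.zeros((r, q))
    for k in range(q):
        rows = mask[:, k]  # boolean selector of observed entries in column k
        B[:, k], *_ = np.linalg.lstsq(U[rows], Y[rows, k], rcond=None)
    return B


def altgdmin_lrmc(Y, mask, r, n_iters=50, step_scale=0.5):
    """Sketch of alternating GD (on U) and minimization (on B).

    Y: n x q data matrix; mask: boolean n x q array, True where observed.
    Assumes every column has at least one observed entry.
    """
    p = mask.mean()  # empirical sampling rate
    Y_obs = np.where(mask, Y, 0.0)
    # Assumed initialization: top-r left singular vectors of Y_obs / p.
    U_full, s, _ = np.linalg.svd(Y_obs / p, full_matrices=False)
    U = U_full[:, :r]
    eta = step_scale / (p * s[0] ** 2)  # assumed step-size rule
    for _ in range(n_iters):
        B = solve_B(U, Y, mask)
        # Gradient of 0.5 * ||mask * (U B - Y)||_F^2 with respect to U.
        grad = np.where(mask, U @ B - Y, 0.0) @ B.T
        # Gradient step, then QR to restore orthonormal columns.
        U, _ = np.linalg.qr(U - eta * grad)
    return U, solve_B(U, Y, mask)
```

In a federated variant, each node would hold a subset of columns, solve for its own columns of B locally, and send only its partial n × r gradient to the center, which is one plausible reading of why the abstract stresses communication efficiency.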
Hierarchical Deep Document Model

https://doi.org/10.1109/TKDE.2024.3487523

Yang, Yi; Lalor, John P; Abbasi, Ahmed; Zeng, Daniel Dajun (January 2025, IEEE Transactions on Knowledge and Data Engineering)

Full Text Available
Timely, Granular, and Actionable: Designing a Social Listening Platform for Public Health 3.0

Kitchens, Brent; Claggett, Jennifer; Abbasi, Ahmed (February 2024, Management information systems quarterly)

Every day patients access and generate online health content through a variety of online channels, creating an ever-expanding sea of data in the form of digital communications. At the same time, proponents of public health have recently called for timely, granular, and actionable data to address a range of public health issues, stressing the need for social listening platforms that can identify and compile this valuable data. Yet previous attempts at social listening in healthcare have yielded mixed results, largely because they have failed to incorporate sufficient context to understand the communications they seek to analyze. Guided by Activity Theory, we design HealthSense, a platform for efficiently sensing and gathering data across the web for real-time analysis to support public health outcomes. HealthSense couples theory-guided content analysis and graph propagation with graph neural networks (GNNs) to assess the relevance and credibility of information, as well as intelligently navigate the complex online channel landscape, leading to significant improvements over existing social listening tools. We demonstrate the value of our artifact in gathering information to support two important exemplar public health tasks: 1) performing post-market drug surveillance for adverse reactions and 2) addressing the opioid crisis by monitoring for potent synthetic opioids released into communities. Our results across data, user, and event experiments show that effective design artifacts can enable better outcomes across both automated and human decision-making contexts, making social listening for public health possible, practical, and valuable. Through our design process, we extend Activity Theory to address the complexities of modern online communication platforms, where information resides not only within the collection of individual communication activities, but also in the complex network of interactions between them.
Full Text Available
Enabling On-Device Large Language Model Personalization with Self-Supervised Data Selection and Synthesis

Qin, Ruiyang; Xia, Jun; Jia, Zhenge; Jiang, Meng; Abbasi, Ahmed; Zhou, Peipei; Hu, Jingtong; Shi, Yiyu (June 2024, ACM/IEEE)

After a large language model (LLM) is deployed on edge devices, it is desirable for these devices to learn from user-generated conversation data to generate user-specific and personalized responses in real time. However, user-generated data usually contains sensitive and private information, and uploading such data to the cloud for annotation is discouraged, if not prohibited. While it is possible to obtain annotation locally by directly asking users to provide preferred responses, such annotations must be sparse so as not to degrade the user experience. In addition, the storage of edge devices is usually too limited to enable large-scale fine-tuning with full user-generated data. It remains an open question how to enable on-device LLM personalization, considering sparse annotation and limited on-device storage. In this paper, we propose a novel framework to select and store the most representative data online in a self-supervised way. Such data has a small memory footprint and allows infrequent requests of user annotations for further fine-tuning. To enhance fine-tuning quality, multiple semantically similar pairs of question texts and expected responses are generated using the LLM. Our experiments show that the proposed framework achieves the best user-specific content-generating capability (accuracy) and fine-tuning speed (performance) compared with vanilla baselines. To the best of our knowledge, this is the very first on-device LLM personalization framework.
Full Text Available
Enabling On-Device Large Language Model Personalization with Self-Supervised Data Selection and Synthesis

https://doi.org/10.1145/3649329.3655665

Qin, Ruiyang; Xia, Jun; Jia, Zhenge; Jiang, Meng; Abbasi, Ahmed; Zhou, Peipei; Hu, Jingtong; Shi, Yiyu (June 2024, ACM)

Full Text Available
Fast Federated Low Rank Matrix Completion

https://doi.org/10.1109/Allerton58177.2023.10313472

Abbasi, Ahmed Ali; Moothedath, Shana; Vaswani, Namrata (September 2023, IEEE)

Full Text Available
Examining User Heterogeneity in Digital Experiments

https://doi.org/10.1145/3578931

Somanchi, Sriram; Abbasi, Ahmed; Kelley, Ken; Dobolyi, David; Yuan, Ted Tao (January 2023, ACM Transactions on Information Systems)

Digital experiments are routinely used to test the value of a treatment relative to a status quo control setting, for instance a new search relevance algorithm for a website or a new results layout for a mobile app. As digital experiments have become increasingly pervasive in organizations and a wide variety of research areas, their growth has prompted a new set of challenges for experimentation platforms. One challenge is that experiments often focus on the average treatment effect (ATE) without explicitly considering differences across major sub-groups, that is, heterogeneous treatment effects (HTEs). This is especially problematic because ATEs have decreased in many organizations as the more obvious benefits have already been realized. However, questions abound regarding the pervasiveness of user HTEs and how best to detect them. We propose a framework for detecting and analyzing user HTEs in digital experiments. Our framework combines an array of user characteristics with double machine learning. Analysis of 27 real-world experiments spanning 1.76 billion sessions and simulated data demonstrates the effectiveness of our detection method relative to existing techniques. We also find that transaction, demographic, engagement, satisfaction, and lifecycle characteristics exhibit statistically significant HTEs in 10% to 20% of our real-world experiments, underscoring the importance of considering user heterogeneity when analyzing experiment results; without it, personalized features and experiences cannot be delivered, which reduces effectiveness. In terms of the number of experiments and user sessions, we are not aware of any study that has examined user HTEs at this scale. Our findings have important implications for information retrieval, user modeling, platforms, and digital experience contexts, in which online experiments are often used to evaluate the effectiveness of design artifacts.
Full Text Available
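The digital experiments abstract above combines user characteristics with double machine learning. The Python sketch below shows one simple instance of that general technique, a cross-fitted partialling-out estimator in which the treatment effect is allowed to vary linearly with a single user characteristic `g`. It is not the authors' framework: ordinary least squares stands in for the machine-learning nuisance models, and the function name `dml_hte`, the effect model theta0 + theta1 * g, and the two-fold split are assumptions for illustration.

```python
import numpy as np


def dml_hte(y, t, X, g, n_folds=2, seed=0):
    """Cross-fitted partialling-out estimate of a heterogeneous effect.

    Assumed model: y = (theta0 + theta1 * g) * t + f(X) + noise, where g is
    a user characteristic that should also be included as a column of X.
    Returns (theta0, theta1).
    """
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = rng.permutation(n) % n_folds  # random, balanced fold labels
    Z = np.column_stack([np.ones(n), X])  # nuisance features with intercept
    y_res = np.empty(n)
    t_res = np.empty(n)
    for f in range(n_folds):
        train, test = folds != f, folds == f
        # Fit nuisance models on the other folds (OLS as a stand-in learner).
        beta_y, *_ = np.linalg.lstsq(Z[train], y[train], rcond=None)
        beta_t, *_ = np.linalg.lstsq(Z[train], t[train], rcond=None)
        # Residualize outcome and treatment on the held-out fold.
        y_res[test] = y[test] - Z[test] @ beta_y
        t_res[test] = t[test] - Z[test] @ beta_t
    # Final stage: regress outcome residuals on treatment residuals and
    # their interaction with the user characteristic.
    D = np.column_stack([t_res, t_res * g])
    theta, *_ = np.linalg.lstsq(D, y_res, rcond=None)
    return theta[0], theta[1]
```

Under this assumed model, `theta0` is the effect for users with g = 0 and a clearly nonzero `theta1` indicates heterogeneity along `g`; an ATE-only analysis would report a single blended number and hide that difference.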
Getting Personal: A Deep Learning Artifact for Text-Based Measurement of Personality

https://doi.org/10.1287/isre.2022.1111

Yang, Kai; Lau, Raymond Y; Abbasi, Ahmed (July 2022, Information systems research)

Full Text Available
