NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Sharing personal ECG time-series data privately

https://doi.org/10.1093/jamia/ocac047

Bonomi, Luca; Wu, Zeyun; Fan, Liyue (April 2022, Journal of the American Medical Informatics Association)

Abstract ObjectiveEmerging technologies (eg, wearable devices) have made it possible to collect data directly from individuals (eg, time-series), providing new insights on the health and well-being of individual patients. Broadening the access to these data would facilitate the integration with existing data sources (eg, clinical and genomic data) and advance medical research. Compared to traditional health data, these data are collected directly from individuals, are highly unique and provide fine-grained information, posing new privacy challenges. In this work, we study the applicability of a novel privacy model to enable individual-level time-series data sharing while maintaining the usability for data analytics. Methods and materialsWe propose a privacy-protecting method for sharing individual-level electrocardiography (ECG) time-series data, which leverages dimensional reduction technique and random sampling to achieve provable privacy protection. We show that our solution provides strong privacy protection against an informed adversarial model while enabling useful aggregate-level analysis. ResultsWe conduct our evaluations on 2 real-world ECG datasets. Our empirical results show that the privacy risk is significantly reduced after sanitization while the data usability is retained for a variety of clinical tasks (eg, predictive modeling and clustering). DiscussionOur study investigates the privacy risk in sharing individual-level ECG time-series data. We demonstrate that individual-level data can be highly unique, requiring new privacy solutions to protect data contributors. ConclusionThe results suggest our proposed privacy-protection method provides strong privacy protections while preserving the usefulness of the data.
more » « less
Passive detection of COVID-19 with wearable sensors and explainable machine learning algorithms

https://doi.org/10.1038/s41746-021-00533-1

Gadaleta, Matteo; Radin, Jennifer M.; Baca-Motes, Katie; Ramos, Edward; Kheterpal, Vik; Topol, Eric J.; Steinhubl, Steven R.; Quer, Giorgio (December 2021, npj Digital Medicine)

Abstract Individual smartwatch or fitness band sensor data in the setting of COVID-19 has shown promise to identify symptomatic and pre-symptomatic infection or the need for hospitalization, correlations between peripheral temperature and self-reported fever, and an association between changes in heart-rate-variability and infection. In our study, a total of 38,911 individuals (61% female, 15% over 65) have been enrolled between March 25, 2020 and April 3, 2021, with 1118 reported testing positive and 7032 negative for COVID-19 by nasopharyngeal PCR swab test. We propose an explainable gradient boosting prediction model based on decision trees for the detection of COVID-19 infection that can adapt to the absence of self-reported symptoms and to the available sensor data, and that can explain the importance of each feature and the post-test-behavior for the individuals. We tested it in a cohort of symptomatic individuals who exhibited an AUC of 0.83 [0.81–0.85], or AUC = 0.78 [0.75–0.80] when considering only data before the test date, outperforming state-of-the-art algorithm in these conditions. The analysis of all individuals (including asymptomatic and pre-symptomatic) when self-reported symptoms were excluded provided an AUC of 0.78 [0.76–0.79], or AUC of 0.70 [0.69–0.72] when considering only data before the test date. Extending the use of predictive algorithms for detection of COVID-19 infection based only on passively monitored data from any device, we showed that it is possible to scale up this platform and apply the algorithm in other settings where self-reported symptoms can not be collected.
more » « less
PrimeNet: Pre-training for Irregular Multivariate Time Series

Chowdhury, Ranak Roy; Li, Jiacheng; Zhang, Xiyuan; Hong, Dezhi; Gupta, Rajesh; Shang, Jingbo (January 2023, Proceedings of the AAAI Conference on Artificial Intelligence)

Real-world applications often involve irregular time series, for which the time intervals between successive observations are non-uniform. Irregularity across multiple features in a multi-variate time series further results in a different subset of features at any given time (i.e., asynchronicity). Existing pre-training schemes for time-series, however, often assume regularity of time series and make no special treatment of irregularity. We argue that such irregularity offers insight about domain property of the data—for example, frequency of hospital visits may signal patient health condition—that can guide representation learning. In this work, we propose PrimeNet to learn a self-supervised representation for irregular multivariate time-series. Specifically, we design a timesensitive contrastive learning and data reconstruction task to pre-train a model. Irregular time-series exhibits considerable variations in sampling density over time. Hence, our triplet generation strategy follows the density of the original data points, preserving its native irregularity. Moreover, the sampling density variation over time makes data reconstruction difficult for different regions. Therefore, we design a data masking technique that always masks a constant time duration to accommodate reconstruction for regions of different sampling density. We learn with these tasks using unlabeled data to build a pre-trained model and fine-tune on a downstream task with limited labeled data, in contrast with existing fully supervised approach for irregular time-series, requiring large amounts of labeled data. Experiment results show that PrimeNet significantly outperforms state-of-the-art methods on naturally irregular and asynchronous data from Healthcare and IoT applications for several downstream tasks, including classification, interpolation, and regression.
more » « less
Full Text Available
SQEE: A Machine Perception Approach to Sensing Quality Evaluation at the Edge by Uncertainty Quantification

https://doi.org/10.1145/3560905.3568534

Li, Shuheng; Shang, Jingbo; Gupta, Rajesh K.; Hong, Dezhi (November 2022, Proceedings of the 20th ACM Conference on Embedded Networked Sensor Systems)

Cyber-physical systems are starting to adopt neural network (NN) models for a variety of smart sensing applications. While several efforts seek better NN architectures for system performance improvement, few attempts have been made to study the deployment of these systems in the field. Proper deployment of these systems is critical to achieving ideal performance, but the current practice is largely empirical via trials and errors, lacking a measure of quality. Sensing quality should reflect the impact on the performance of NN models that drive machine perception tasks. However, traditional approaches either evaluate statistical difference that exists objectively, or model the quality subjectively via human perception. In this work, we propose an efficient sensing quality measure requiring limited data samples using smart voice sensing system as an example. We adopt recent techniques in uncertainty evaluation for NN to estimate audio sensing quality. Intuitively, a deployment at better sensing location should lead to less uncertainty in NN predictions. We design SQEE, Sensing Quality Evaluation at the Edge for NN models, which constructs a model ensemble through Monte-Carlo dropout and estimates posterior total uncertainty via average conditional entropy. We collected data from three indoor environments, with a total of 148 transmitting-receiving (t-r) locations experimented and more than 7,000 examples tested. SQEE achieves the best performance in terms of the top-1 ranking accuracy---whether the measure finds the best spot for deployment, in comparison with other uncertainty strategies. We implemented SQEE on a ReSpeaker to study SQEE's real-world efficacy. Experimental result shows that SQEE can effectively evaluate the data collected from each t-r location pair within 30 seconds and achieve an average top-3 ranking accuracy of over 94%. We further discuss generalization of our framework to other sensing schemes.
more » « less
Full Text Available
TARNet: Task-Aware Reconstruction for Time-Series Transformer

https://doi.org/10.1145/3534678.3539329

Chowdhury, Ranak Roy; Zhang, Xiyuan; Shang, Jingbo; Gupta, Rajesh K.; Hong, Dezhi (August 2022, Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining)

Full Text Available
Unsupervised Deep Keyphrase Generation

https://doi.org/10.1609/aaai.v36i10.21381

Shen, Xianjie; Wang, Yinghan; Meng, Rui; Shang, Jingbo (June 2022, Proceedings of the AAAI Conference on Artificial Intelligence)

Keyphrase generation aims to summarize long documents with a collection of salient phrases. Deep neural models have demonstrated remarkable success in this task, with the capability of predicting keyphrases that are even absent from a document. However, such abstractiveness is acquired at the expense of a substantial amount of annotated data. In this paper, we present a novel method for keyphrase generation, AutoKeyGen, without the supervision of any annotated doc-keyphrase pairs. Motivated by the observation that an absent keyphrase in a document may appear in other places, in whole or in part, we construct a phrase bank by pooling all phrases extracted from a corpus. With this phrase bank, we assign phrase candidates to new documents by a simple partial matching algorithm, and then we rank these candidates by their relevance to the document from both lexical and semantic perspectives. Moreover, we bootstrap a deep generative model using these top-ranked pseudo keyphrases to produce more absent candidates. Extensive experiments demonstrate that AutoKeyGen outperforms all unsupervised baselines and can even beat a strong supervised method in certain cases.
more » « less
Full Text Available
ESC-GAN: Extending Spatial Coverage of Physical Sensors

https://doi.org/10.1145/3488560.3498461

Zhang, Xiyuan; Chowdhury, Ranak Roy; Shang, Jingbo; Gupta, Rajesh; Hong, Dezhi (January 2022, Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining)

Full Text Available
Towards Collaborative Neural-Symbolic Graph Semantic Parsing via Uncertainty

https://doi.org/10.18653/v1/2022.findings-acl.328

Lin, Zi; Liu, Jeremiah Zhe; Shang, Jingbo (January 2022, Findings of the Association for Computational Linguistics: ACL 2022)

Full Text Available
Leveraging QA Datasets to Improve Generative Data Augmentation

Mekala, Dheeraj; Vu, Tu; Schick, Timo; Shang, Jingbo (January 2022, Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing)

The ability of generative language models (GLMs) to generate text has improved considerably in the last few years, enabling their use for generative data augmentation. In this work, we propose CONDA, an approach to further improve GLM’s ability to generate synthetic data by reformulating data generation as context generation for a given question-answer (QA) pair and leveraging QA datasets for training context generators. Then, we cast downstream tasks into the same question answering format and adapt the fine-tuned context generators to the target task domain. Finally, we use the fine-tuned GLM to generate relevant contexts, which are in turn used as synthetic training data for their corresponding tasks. We perform extensive experiments on multiple classification datasets and demonstrate substantial improvements in performance for both few- and zero-shot settings. Our analysis reveals that QA datasets that require high-level reasoning abilities (e.g., abstractive and common-sense QA datasets) tend to give the best boost in performance in both few-shot and zero-shot settings.
more » « less
Full Text Available
Towards Few-shot Entity Recognition in Document Images: A Label-aware Sequence-to-Sequence Framework

https://doi.org/10.18653/v1/2022.findings-acl.329

Wang, Zilong; Shang, Jingbo (January 2022, Findings of the Association for Computational Linguistics: ACL 2022)

Full Text Available

« Prev Next »

Search for: All records