The ability to assess sleep at home, capture sleep stages, and detect the occurrence of apnea without on-body sensors, simply by analyzing the radio waves bouncing off people's bodies while they sleep, is quite powerful. Such a capability would allow longitudinal data collection in patients' homes, informing our understanding of sleep and its interaction with various diseases and their therapeutic responses, both in clinical trials and in routine care. In this article, we develop an advanced machine learning algorithm for passively monitoring sleep and nocturnal breathing from radio waves reflected off people while asleep. Validation against the gold standard, polysomnography (n=849), demonstrates that the model captures the sleep hypnogram (81% accuracy for 30-second epochs classified as Wake, Light Sleep, Deep Sleep, or REM), detects sleep apnea (AUROC = 0.88), and measures the patient's Apnea-Hypopnea Index (ICC = 0.95; 95% CI = [0.93, 0.97]). Notably, the model performs equitably across race, sex, and age. Moreover, the model uncovers informative interactions between sleep stages and a range of diseases, including neurological, psychiatric, cardiovascular, and immunological disorders. These findings not only hold promise for clinical practice and interventional trials but also underscore the significance of sleep as a fundamental component in understanding and managing various diseases.
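The two headline quantities in this abstract, epoch-level staging accuracy and the Apnea-Hypopnea Index, can be made concrete with a small sketch. It assumes a hypnogram stored as a list of 30-second epoch labels drawn from the four classes named above, and it uses the conventional AHI definition (apneas plus hypopneas per hour of sleep). The function names and the event count passed in are illustrative assumptions, not the model's actual pipeline.

```python
import numpy as np

# Assumed label set and epoch length, following the four-class, 30-second scheme above.
STAGES = ("Wake", "Light Sleep", "Deep Sleep", "REM")
EPOCH_SECONDS = 30


def epoch_accuracy(predicted, reference):
    """Fraction of 30-second epochs whose predicted stage matches the reference (e.g. PSG) stage."""
    predicted = np.asarray(predicted)
    reference = np.asarray(reference)
    if predicted.shape != reference.shape:
        raise ValueError("hypnograms must have the same number of epochs")
    return float(np.mean(predicted == reference))


def apnea_hypopnea_index(hypnogram, n_events):
    """Conventional AHI: apnea + hypopnea events per hour of sleep (non-Wake epochs)."""
    unknown = set(hypnogram) - set(STAGES)
    if unknown:
        raise ValueError(f"unknown stage labels: {unknown}")
    sleep_epochs = sum(1 for stage in hypnogram if stage != "Wake")
    if sleep_epochs == 0:
        raise ValueError("no sleep epochs in hypnogram")
    sleep_hours = sleep_epochs * EPOCH_SECONDS / 3600
    return n_events / sleep_hours


# Usage with toy, made-up data (not results from the study):
reference = ["Wake", "Light Sleep", "Light Sleep", "Deep Sleep", "REM"]
predicted = ["Wake", "Light Sleep", "Deep Sleep", "Deep Sleep", "REM"]
print(epoch_accuracy(predicted, reference))  # 0.8
```

Total sleep time is used as the AHI denominator here; home studies sometimes use total recording time instead, which yields a lower index for the same event count.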
Repeatability of Commonly Used Speech and Language Features for Clinical Applications
Introduction: Changes in speech have the potential to provide important information on the diagnosis and progression of various neurological diseases. Many researchers have relied on open-source speech features to develop algorithms for measuring speech changes in clinical populations because they are convenient and easy to use. However, the repeatability of open-source features in the context of neurological diseases has not been studied. Methods: We used a longitudinal sample of healthy controls, individuals with amyotrophic lateral sclerosis, and individuals with suspected frontotemporal dementia, and we evaluated the repeatability of acoustic and language features separately on these 3 data sets. Results: Repeatability was evaluated using the intraclass correlation coefficient (ICC) and the within-subject coefficient of variation (WSCV). Across 3 sets of tasks, median ICCs ranged from 0.02 to 0.55, and median WSCVs ranged from 29% to 79%. Conclusion: Our results demonstrate that the repeatability of speech features extracted using open-source toolkits is low. Researchers should exercise caution when developing digital health models with open-source speech features. We provide a detailed summary of feature-by-feature repeatability results (ICC, WSCV, standard error of measurement, limits of agreement for WSCV, and minimal detectable change) in the online supplementary material so that researchers can incorporate repeatability information into the models they develop.
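The repeatability metrics named in the Results can be sketched for a single feature measured in several sessions per participant. This sketch assumes a one-way random-effects ICC (ICC(1,1)) and defines WSCV as the within-subject standard deviation divided by the grand mean. The abstract does not say which ICC form or WSCV estimator the authors used, so both choices, and the 95% minimal detectable change formula included here, are assumptions.

```python
import numpy as np


def repeatability(x):
    """Repeatability of one feature from an (n_subjects, k_sessions) array.

    Returns (ICC(1,1), WSCV, MDC95) under the assumptions stated above.
    """
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    if n < 2 or k < 2:
        raise ValueError("need at least 2 subjects and 2 sessions")
    grand_mean = x.mean()
    subject_means = x.mean(axis=1)
    # One-way ANOVA mean squares: between subjects (MSB) and within subjects (MSW).
    msb = k * np.sum((subject_means - grand_mean) ** 2) / (n - 1)
    msw = np.sum((x - subject_means[:, None]) ** 2) / (n * (k - 1))
    icc = (msb - msw) / (msb + (k - 1) * msw)
    # Within-subject SD (sqrt of MSW), used here as the standard error of measurement.
    within_sd = np.sqrt(msw)
    wscv = within_sd / grand_mean
    # 95% minimal detectable change for a difference between two sessions.
    mdc95 = 1.96 * np.sqrt(2) * within_sd
    return float(icc), float(wscv), float(mdc95)


# Toy data (made up): 3 participants x 2 sessions of one acoustic feature.
feature = [[120.0, 124.0], [98.0, 95.0], [143.0, 150.0]]
icc, wscv, mdc95 = repeatability(feature)
print(f"ICC={icc:.2f}  WSCV={wscv:.1%}  MDC95={mdc95:.1f}")
```

A low ICC together with a high WSCV means that session-to-session noise is large relative to the differences between participants, which is the pattern the abstract reports for open-source speech features.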
- Award ID(s): 1853247
- PAR ID: 10205928
- Journal Name: Digital Biomarkers
- ISSN: 2504-110X
- Page Range / eLocation ID: 109 to 122
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Background: Large (>1 Mb) polymorphic inversions have substantial impacts on population structure and the maintenance of genotypes. These large inversions can be detected from single nucleotide polymorphism (SNP) data using unsupervised learning techniques such as PCA. Constructing and analyzing a feature matrix from millions of SNPs requires a large amount of memory, which limits the sizes of data sets that can be analyzed. Methods: We propose using feature hashing to construct a feature matrix from a VCF file of SNPs with reduced memory usage. The matrix is constructed in a streaming fashion, so the entire VCF file is never loaded into memory at once. Results: When evaluated on Anopheles mosquito and Drosophila fly data sets, our approach reduced memory usage by 97% with minimal reductions in accuracy for inversion detection and localization tasks. Conclusion: With these changes, inversions in larger data sets can be analyzed easily and efficiently on common laptop and desktop computers. Our method is publicly available through our open-source inversion analysis software, Asaph.
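The streaming feature-hashing idea can be sketched as follows. The matrix has one row per sample and a fixed number of hashed columns, so its size does not grow with the number of SNPs, and the VCF is read one record at a time. The hashed key (chromosome, position, and genotype string), the CRC32 hash, and the bucket count are illustrative assumptions. This is not Asaph's actual code or API.

```python
import zlib

import numpy as np


def hashed_feature_matrix(vcf_lines, n_buckets=1024):
    """Stream VCF text lines into a (n_samples, n_buckets) count matrix.

    Each (chrom, pos, genotype) feature is hashed to a column, so memory
    depends on n_samples * n_buckets rather than on the number of SNPs.
    """
    samples, matrix = None, None
    for line in vcf_lines:
        if line.startswith("##"):          # meta-information lines
            continue
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):      # header line: sample names start at column 10
            samples = fields[9:]
            matrix = np.zeros((len(samples), n_buckets), dtype=np.float32)
            continue
        if matrix is None:
            raise ValueError("data line encountered before #CHROM header")
        chrom, pos = fields[0], fields[1]
        for i, sample_field in enumerate(fields[9:]):
            genotype = sample_field.split(":")[0]   # GT is the first FORMAT field when present
            key = f"{chrom}:{pos}:{genotype}".encode()
            # Deterministic hash (Python's built-in hash() is salted per process).
            matrix[i, zlib.crc32(key) % n_buckets] += 1.0
    return samples, matrix
```

Passing an open file handle (for example `open("variants.vcf")`, a hypothetical path) streams the file line by line. Hash collisions let distinct features share a column, which is the accuracy-for-memory trade-off; the resulting matrix can then be fed to PCA.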
A critical issue in current speech-based sequence-to-one learning tasks, such as speech emotion recognition (SER), is the dynamic temporal modeling of speech sentences with different durations. The goal is to extract an informative representation vector of the sentence from acoustic feature sequences of varied length. Traditional methods rely on static descriptions such as statistical functions or a universal background model (UBM), which cannot characterize dynamic temporal changes. Recent advances in deep learning architectures provide promising results by directly extracting sentence-level representations from frame-level features. However, conventional cropping and padding techniques for handling variable-length sequences are not optimal, since they truncate or artificially add sentence-level information. Therefore, we propose a novel dynamic chunking approach, which maps the original sequences of different lengths into a fixed number of chunks that have the same duration by adjusting their overlap. This simple chunking procedure creates a flexible framework that can incorporate different feature extractions and sentence-level temporal aggregation approaches to cope, in a principled way, with different sequence-to-one tasks. Our experimental results on three databases demonstrate that the proposed framework provides: 1) improved recognition accuracy, 2) robustness of predictions across different temporal lengths, and 3) high computational efficiency.
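The core of the dynamic chunking step, mapping a variable-length frame sequence onto a fixed number of equal-length chunks by varying the overlap, can be sketched as below. The chunk count, chunk length, and evenly spaced (rounded) start positions are illustrative assumptions. The sketch also assumes that the sequence is at least one chunk long and that the chunk count is large enough for consecutive chunks to overlap; otherwise some frames fall between chunks.

```python
import numpy as np


def dynamic_chunks(frames, n_chunks, chunk_len):
    """Split a (T, D) frame sequence into (n_chunks, chunk_len, D) equal-length chunks.

    The first chunk starts at frame 0 and the last ends at frame T; starts in
    between are evenly spaced, so the overlap shrinks as T grows.
    """
    frames = np.asarray(frames)
    T = frames.shape[0]
    if T < chunk_len:
        raise ValueError("sequence is shorter than a single chunk")
    starts = np.rint(np.linspace(0, T - chunk_len, n_chunks)).astype(int)
    return np.stack([frames[s:s + chunk_len] for s in starts])


def overlap(T, n_chunks, chunk_len):
    """Average overlap (in frames) between consecutive chunks; negative means gaps."""
    if n_chunks == 1:
        return 0.0
    return chunk_len - (T - chunk_len) / (n_chunks - 1)


# Two toy utterances of different lengths (random 40-dim features) map to the same shape.
rng = np.random.default_rng(0)
short = rng.standard_normal((120, 40))
long = rng.standard_normal((300, 40))
print(dynamic_chunks(short, 11, 100).shape)  # (11, 100, 40)
print(dynamic_chunks(long, 11, 100).shape)   # (11, 100, 40)
```

Because every utterance yields the same tensor shape, a chunk-level encoder and a sentence-level aggregator can be applied without cropping or padding.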
The gut microbiome plays a fundamental role in human health and disease. Individual variations in the microbiome and their functional implications are key considerations for enhancing precision health and medicine. Metaproteomics has recently revealed protein expression that might be associated with human health and disease. Existing studies focused on either the human proteins or the bacterial proteins that can be identified from (meta)proteomics data sets, but not both. In this study, we examined the feasibility of identifying both human and bacterial proteins that are differentially expressed between healthy and diseased individuals from metaproteomics data sets. We further evaluated different strategies for using identified peptides and proteins to build predictive models. By leveraging existing metaproteomics data sets and a tool that we have developed for metaproteomics data analysis (MetaProD), we were able to derive both human and bacterial differentially expressed proteins that could serve as potential biomarkers for all the diseases we studied. We also built predictive models of human diseases using identified peptides and proteins as features. Our results showed that peptide-based identifications often produce more accurate models than protein-based ones and that feature selection can offer improvements. In some cases, prediction accuracy could be further improved by including bacterial identifications, but missing data in bacterial identifications remains problematic.
High-quality source code comments are valuable for software development and maintenance; however, code often contains low-quality comments or lacks them altogether. We call such comments suboptimal comments. Suboptimal comments create challenges in code comprehension and maintenance. Despite substantial research on low-quality source code comments, empirical knowledge about the commenting practices that produce suboptimal comments, and the reasons that lead to them, is lacking. We help bridge this knowledge gap by investigating (1) independent comment changes (ICCs), i.e., comment changes committed independently of code changes, which likely address suboptimal comments; (2) commenting guidelines; and (3) comment-checking tools and comment-generating tools, which are often employed to support commenting practice, especially to prevent suboptimal comments. We collect 24M+ comment changes from 4,392 open-source GitHub Java repositories and find that ICCs are widespread. The ICC ratio (the proportion of ICCs among all comment changes) is ~15.5%, and 98.7% of the repositories have ICCs. Our thematic analysis of 3,533 randomly sampled ICCs provides a three-dimensional taxonomy of what is changed (four comment categories and 13 subcategories), how it is changed (six commenting activity categories), and what factors are associated with the change (three factors). We investigate 600 repositories to understand the prevalence, content, impact, and violations of commenting guidelines. We find that only 15.5% of the 600 sampled repositories have any commenting guidelines. We provide the first taxonomy of elements in commenting guidelines: where and what to comment are particularly important. The repositories without such guidelines have a statistically significantly higher ICC ratio, indicating the negative impact of the lack of commenting guidelines. However, commenting guidelines are not strictly followed: 85.5% of checked repositories have violations.
We also systematically study how developers use two kinds of tools, comment-checking tools and comment-generating tools, in the 4,392 repositories. We find that the use of the Javadoc tool is negatively correlated with the ICC ratio, while the use of Checkstyle has no statistically significant correlation; the use of comment-generating tools leads to a higher ICC ratio. To conclude, we reveal issues and challenges in current commenting practice, which help explain how suboptimal comments are introduced. We propose potential research directions on comment location prediction, comment generation, and comment quality assessment; suggest how developers can formulate commenting guidelines and enforce rules with tools; and recommend how to enhance current comment-checking and comment-generating tools.
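The ICC ratio is a simple proportion, sketched below. The sketch assumes each mined comment change has already been labeled as independent or not. A plausible rule is that its commit touched no non-comment code, but that rule is a hypothetical labeling choice, not the authors' exact procedure. The sketch computes the pooled ratio, the per-repository ratio, and the share of repositories with at least one ICC.

```python
from collections import defaultdict


def icc_ratios(changes):
    """Compute ICC ratios from (repo, is_independent) pairs, one pair per comment change.

    is_independent is assumed to be precomputed (hypothetical labeling rule:
    the commit changed comments but no non-comment code).
    """
    totals = defaultdict(int)
    independent = defaultdict(int)
    for repo, is_independent in changes:
        totals[repo] += 1
        independent[repo] += int(is_independent)
    if not totals:
        raise ValueError("no comment changes given")
    overall = sum(independent.values()) / sum(totals.values())
    per_repo = {repo: independent[repo] / totals[repo] for repo in totals}
    share_with_icc = sum(1 for repo in totals if independent[repo] > 0) / len(totals)
    return overall, per_repo, share_with_icc


# Toy input (made up): 3 changes in repo "a" (1 independent), 1 change in repo "b".
changes = [("a", True), ("a", False), ("a", False), ("b", False)]
overall, per_repo, share = icc_ratios(changes)
print(overall, share)  # 0.25 0.5
```

The pooled ratio weights large repositories more heavily than the mean of per-repository ratios would. The abstract does not state which aggregation produced the ~15.5% figure, so the sketch returns both the pooled value and the per-repository values.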

