Multiple Instance Learning (MIL) provides a promising solution to many real-world problems, where labels are only available at the bag level but missing for instances due to a high labeling cost. As a powerful Bayesian non-parametric model, Gaussian Processes (GP) have been extended from classical supervised learning to MIL settings, aiming to identify the most likely positive (or least negative) instance from a positive (or negative) bag using only the bag-level labels. However, solely focusing on a single instance in a bag makes the model less robust to outliers or multimodal scenarios, where a single bag contains a diverse set of positive instances. We propose a general GP mixture framework that simultaneously considers multiple instances through a latent mixture model. By adding a top-k constraint, the framework is equivalent to choosing the top-k most positive instances, making it more robust to outliers and multimodal scenarios. We further introduce a Distributionally Robust Optimization (DRO) constraint that removes the limitation of specifying a fixed k value. To ensure the prediction power over high-dimensional data (e.g., videos and images) that are common in MIL, we augment the GP kernel with fixed basis functions by using a deep neural network to learn adaptive basis functions so that the covariance structure of high-dimensional data can be accurately captured. Experiments are conducted on highly challenging real-world video anomaly detection tasks to demonstrate the effectiveness of the proposed model.
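To make the top-k idea concrete, here is a minimal sketch of scoring a bag from its k highest instance scores instead of its single highest one. It is not the GP mixture model described above: the function name `bag_score`, the choice of averaging, and the example scores are all assumptions made for illustration.

```python
def bag_score(instance_scores, k=3):
    """Score a bag as the mean of its k highest instance scores.

    Hypothetical helper: with k=1 this reduces to the usual
    "most positive instance" rule; larger k dampens a single outlier.
    """
    if not instance_scores:
        raise ValueError("a bag must contain at least one instance")
    # Clamp k so it never exceeds the number of instances in the bag.
    k = max(1, min(k, len(instance_scores)))
    top = sorted(instance_scores, reverse=True)[:k]
    return sum(top) / len(top)


if __name__ == "__main__":
    # Made-up instance scores: one outlier in an otherwise low-scoring bag.
    scores = [0.10, 0.15, 0.95, 0.10]
    print("max-instance score:", bag_score(scores, k=1))
    print("top-3 mean score:  ", bag_score(scores, k=3))
```

With k=1 the single outlier decides the bag score, whereas averaging the top three pulls the score down toward the rest of the bag, which is the robustness argument the abstract makes.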
A Machine Learning Framework for Detecting COVID-19 Infection Using Surface-Enhanced Raman Scattering
In this study, we explored machine learning approaches for predictive diagnosis using surface-enhanced Raman scattering (SERS), applied to the detection of COVID-19 infection in biological samples. To do this, we utilized SERS data collected from 20 patients at the University of Maryland Baltimore School of Medicine. As a preprocessing step, the positive-negative labels were obtained using Polymerase Chain Reaction (PCR) testing. First, we compared the performance of linear and nonlinear dimensionality reduction techniques for projecting the high-dimensional Raman spectra to a low-dimensional space where a smaller number of variables defines each sample. The appropriate number of reduced features was chosen by comparing the mean accuracy from 10-fold cross-validation. Finally, we employed Gaussian process (GP) classification, a probabilistic machine learning approach, to predict whether a sample is negative or positive as a function of the low-dimensional space variables. Rather than providing rigid class labels, the GP classifier provides a probability (ranging from zero to one) that a given sample is positive or negative. In practice, the proposed framework can be used to provide high-throughput rapid testing, and a follow-up PCR can be used for confirmation in cases where the model’s uncertainty is unacceptably high.
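The last step, routing uncertain predictions to a confirmatory PCR, can be sketched as a simple rule on the classifier's output probability. This is only an illustration: the function name `triage` and the cut-offs `low` and `high` are hypothetical, and the abstract does not state what level of uncertainty counts as unacceptable.

```python
def triage(probabilities, low=0.2, high=0.8):
    """Turn predicted P(positive) values into calls or PCR referrals.

    probabilities: one predicted probability per sample, each in [0, 1].
    low, high: hypothetical cut-offs, not values reported by the study.
    """
    decisions = []
    for p in probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
        if p >= high:
            decisions.append("positive")
        elif p <= low:
            decisions.append("negative")
        else:
            # The model is not confident either way; confirm with PCR.
            decisions.append("follow-up PCR")
    return decisions


if __name__ == "__main__":
    # Made-up probabilities for illustration only.
    print(triage([0.03, 0.55, 0.91]))
```

Widening the band between `low` and `high` sends more samples to PCR and fewer to an automatic call, so in practice the cut-offs would have to be set against the acceptable error rate.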
- Award ID(s): 2045640
- PAR ID: 10411563
- Date Published:
- Journal Name: Biosensors
- Volume: 12
- Issue: 8
- ISSN: 2079-6374
- Page Range / eLocation ID: 589
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
The COVID-19 pandemic demonstrated the critical need for accurate and rapid testing for virus detection. This need has generated a large number of new testing methods aimed at replacing RT-PCR, which is the gold standard for testing. Most of the testing techniques are based on biochemical methods and require chemicals that are often expensive and whose supply might become scarce in a large crisis. In the present paper we suggest the use of physics-based methods that leverage novel nanomaterials. We demonstrate that Surface Enhanced Raman Spectroscopy (SERS) of virion particles yields a very distinct spectroscopic signature of the SARS-CoV-2 virus. We demonstrate that the spectra are mainly composed of signals from the spike (S) and nucleocapsid (N) proteins. It is believed that a clinical test using SERS can be developed. The test will be fast, inexpensive, and reliable. It is also clear that SERS can be used to analyze structural changes in the S and N proteins. This will be an example of applying nanotechnology and the properties of nanoparticles to health and social matters.
-
Background: Accurate diagnostic strategies to identify SARS-CoV-2 positive individuals rapidly for management of patient care and protection of health care personnel are urgently needed. The predominant diagnostic test is viral RNA detection by RT-PCR from nasopharyngeal swab specimens; however, the results are not promptly obtainable in all patient care locations. Routine laboratory testing, in contrast, is readily available with a turn-around time (TAT) usually within 1-2 hours. Method: We developed a machine learning model incorporating patient demographic features (age, sex, race) with 27 routine laboratory tests to predict an individual’s SARS-CoV-2 infection status. Laboratory testing results obtained within 2 days before the release of the SARS-CoV-2 RT-PCR result were used to train a gradient boosting decision tree (GBDT) model from 3,356 SARS-CoV-2 RT-PCR tested patients (1,402 positive and 1,954 negative) evaluated at a metropolitan hospital. Results: The model achieved an area under the receiver operating characteristic curve (AUC) of 0.854 (95% CI: 0.829-0.878). Application of this model to an independent patient dataset from a separate hospital resulted in a comparable AUC (0.838), validating the generalization of its use. Moreover, our model predicted initial SARS-CoV-2 RT-PCR positivity in 66% of individuals whose RT-PCR result changed from negative to positive within 2 days. Conclusion: This model employing routine laboratory test results offers opportunities for early and rapid identification of high-risk SARS-CoV-2 infected patients before their RT-PCR results are available. It may play an important role in assisting the identification of SARS-CoV-2 infected patients in areas where RT-PCR testing is not accessible due to financial or supply constraints.
-
In a pandemic era, rapid infectious disease diagnosis is essential. Surface-enhanced Raman spectroscopy (SERS) promises sensitive and specific diagnosis including rapid point-of-care detection and drug susceptibility testing. SERS utilizes inelastic light scattering arising from the interaction of incident photons with molecular vibrations, enhanced by orders of magnitude with resonant metallic or dielectric nanostructures. While SERS provides a spectral fingerprint of the sample, clinical translation has lagged due to challenges in consistency of spectral enhancement, complexity in spectral interpretation, insufficient specificity and sensitivity, and inefficient workflow from patient sample collection to spectral acquisition. Here, we highlight the recent, complementary advances that address these shortcomings, including (1) design of label-free SERS substrates and data processing algorithms that improve spectral signal and interpretability, essential for broad pathogen screening assays; (2) development of new capture and affinity agents, such as aptamers and polymers, critical for determining the presence or absence of particular pathogens; and (3) microfluidic and bioprinting platforms for efficient clinical sample processing. We also describe the development of low-cost, point-of-care, optical SERS hardware. Our paper focuses on SERS for viral and bacterial detection, in hopes of accelerating infectious disease diagnosis, monitoring, and vaccine development. With advances in SERS substrates, machine learning, and microfluidics and bioprinting, the specificity, sensitivity, and speed of SERS can be readily translated from laboratory bench to patient bedside, accelerating point-of-care diagnosis, personalized medicine, and precision health.