Title: Accurate virus identification with interpretable Raman signatures by machine learning
Rapid identification of newly emerging or circulating viruses is an important first step toward managing the public health response to potential outbreaks. A portable virus capture device, coupled with label-free Raman spectroscopy, holds the promise of fast detection: the device rapidly obtains the Raman signature of a virus, and a machine learning (ML) approach then recognizes the virus from its Raman spectrum, which serves as a fingerprint. We present such an ML approach for analyzing Raman spectra of human and avian viruses. A convolutional neural network (CNN) classifier specifically designed for spectral data achieves high accuracy on a variety of virus type and subtype identification tasks. In particular, it achieves 99% accuracy for classifying influenza virus type A versus type B, 96% accuracy for classifying four subtypes of influenza A, 95% accuracy for differentiating enveloped and nonenveloped viruses, and 99% accuracy for differentiating avian coronavirus (infectious bronchitis virus [IBV]) from other avian viruses. Furthermore, interpreting the neural net responses of the trained CNN model with a full-gradient algorithm highlights the Raman spectral ranges that are most important to virus identification. By correlating ML-selected salient Raman ranges with the signature ranges of known biomolecules and chemical functional groups (for example, amide, amino acid, and carboxylic acid), we verify that our ML model effectively recognizes the Raman signatures of proteins, lipids, and other vital functional groups present in different viruses and uses a weighted combination of these signatures to identify viruses.
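The interpretation step described in the abstract (find the spectral ranges a trained classifier relies on, then compare them with known band ranges) can be illustrated with a much simpler stand-in. The sketch below is not the paper's CNN or its full-gradient algorithm: it assumes a logistic-regression classifier, plain input-gradient saliency, synthetic two-class spectra, and approximate textbook band ranges. All function names, the `KNOWN_BANDS` table, and the synthetic data are hypothetical and chosen only for illustration.

```python
import numpy as np

# Approximate textbook band ranges in cm^-1. These are illustrative
# assumptions, not values taken from the paper.
KNOWN_BANDS = {"amide I": (1600.0, 1700.0), "amide III": (1230.0, 1300.0)}


def make_spectra(n_per_class=50, n_points=200, seed=0):
    """Synthetic spectra: both classes share a peak near 1000 cm^-1;
    class 1 has an extra peak near 1650 cm^-1 (hypothetical data)."""
    rng = np.random.default_rng(seed)
    wn = np.linspace(400.0, 1800.0, n_points)  # wavenumber axis
    shared = np.exp(-0.5 * ((wn - 1000.0) / 40.0) ** 2)
    extra = np.exp(-0.5 * ((wn - 1650.0) / 20.0) ** 2)
    y = np.repeat([0.0, 1.0], n_per_class)
    noise = 0.05 * rng.standard_normal((2 * n_per_class, n_points))
    X = shared + 0.5 * y[:, None] * extra + noise
    return wn, X, y


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_logistic(X, y, lr=0.2, steps=300):
    """Batch gradient descent on the logistic loss (stand-in for the CNN)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        err = sigmoid(X @ w + b) - y
        w -= lr * (X.T @ err) / len(y)
        b -= lr * err.mean()
    return w, b


def input_gradient_saliency(X, w, b):
    """Mean absolute gradient of the predicted probability with respect
    to each input point; for this model d p / d x = p * (1 - p) * w."""
    p = sigmoid(X @ w + b)
    grads = (p * (1.0 - p))[:, None] * w[None, :]
    return np.abs(grads).mean(axis=0)


def salient_bands(wn, saliency, top_k=5):
    """Names of known bands that contain any of the top_k salient points."""
    top = wn[np.argsort(saliency)[-top_k:]]
    return sorted(
        name
        for name, (lo, hi) in KNOWN_BANDS.items()
        if np.any((top >= lo) & (top <= hi))
    )
```

Calling `make_spectra()`, `train_logistic(X, y)`, `input_gradient_saliency(X, w, b)`, and `salient_bands(wn, saliency)` in sequence gives the band names that overlap the most salient points. With a CNN, the gradient would come from automatic differentiation rather than a closed form, but the comparison against band ranges would look the same.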
Award ID(s):
2030857 1934977 2011839
PAR ID:
10356149
Journal Name:
Proceedings of the National Academy of Sciences
Volume:
119
Issue:
23
ISSN:
0027-8424
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Emerging and reemerging viruses are responsible for a number of recent epidemic outbreaks. A crucial step in predicting and controlling outbreaks is the timely and accurate characterization of emerging virus strains. We present a portable microfluidic platform containing carbon nanotube arrays with differential filtration porosity for the rapid enrichment and optical identification of viruses. Different emerging strains (or unknown viruses) can be enriched and identified in real time through a multivirus capture component in conjunction with surface-enhanced Raman spectroscopy. More importantly, after viral capture and detection on a chip, viruses remain viable and get purified in a microdevice that permits subsequent in-depth characterizations by various conventional methods. We validated this platform using different subtypes of avian influenza A viruses and human samples with respiratory infections. This technology successfully enriched rhinovirus, influenza virus, and parainfluenza viruses, and maintained the stoichiometric viral proportions when the samples contained more than one type of virus, thus emulating coinfection. Viral capture and detection took only a few minutes with a 70-fold enrichment enhancement; detection could be achieved with as little as 10² EID₅₀/mL (50% egg infective dose per milliliter), with a virus specificity of 90%. After enrichment using the device, we demonstrated by sequencing that the abundance of viral-specific reads significantly increased from 4.1 to 31.8% for parainfluenza and from 0.08 to 0.44% for influenza virus. This enrichment method coupled to Raman virus identification constitutes an innovative system that could be used to quickly track and monitor viral outbreaks in real time.
  2. Early virus identification is a key component of both patient treatment and epidemiological monitoring. In the case of influenza A virus infections, where the detection of subtypes associated with bird flu in humans could lead to a pandemic, rapid subtype-level identification is important. Surface-enhanced Raman spectroscopy coupled with machine learning can be used to rapidly detect and identify viruses in a label-free manner. As there is a range of available excitation wavelengths for performing Raman spectroscopy, we must choose the best one to permit discrimination between highly similar subtypes of a virus. We show that the spectra produced by influenza A subtypes H1N1 and H3N2 exhibit a higher degree of dissimilarity when using 785 nm excitation wavelength in comparison with 532 nm excitation wavelength. Furthermore, the cross-validated area under the curve (AUC) for identification was higher for the 785 nm excitation, reaching 0.95 as compared to 0.86 for 532 nm. Ultimately, this study suggests that exciting with a 785 nm wavelength is better able to differentiate two closely related influenza viruses and likely can extend to other closely related pathogens. 
  3. Background: Breast cancer poses a significant health risk to women worldwide, with approximately 30% being diagnosed annually in the United States. The identification of cancerous mammary tissues from non-cancerous ones during surgery is crucial for the complete removal of tumors. Results: Our study utilized machine learning techniques (Random Forest (RF), Support Vector Machine (SVM), and Convolutional Neural Network (CNN)) alongside Raman spectroscopy to streamline and hasten the differentiation of normal and late-stage cancerous mammary tissues in mice. The classification accuracy rates achieved by these models were 94.47% for RF, 96.76% for SVM, and 97.58% for CNN, respectively. To the best of our knowledge, this study was the first effort to compare the effectiveness of these three machine learning techniques in classifying breast cancer tissues based on their Raman spectra. Moreover, we identified specific spectral peaks that contribute to the molecular characteristics of the murine cancerous and non-cancerous tissues. Conclusions: Our integrated approach of machine learning and Raman spectroscopy presents a non-invasive, swift diagnostic tool for breast cancer, offering promising applications in intraoperative settings.
  4. The wild-to-domestic bird interface is an important nexus for emergence and transmission of highly pathogenic avian influenza (HPAI) viruses. Although the recent incursion of HPAI H5N1 Clade 2.3.4.4b into North America calls for emergency response and planning given the unprecedented scale, readily available data-driven models are lacking. Here, we provide high-resolution spatial and temporal transmission risk models for the contiguous United States. Considering virus host ecology, we included weekly species-level wild waterfowl (Anatidae) abundance and endemic low pathogenic avian influenza virus prevalence metrics in combination with number of poultry farms per commodity type and relative biosecurity risks at two spatial scales: 3 km and county-level. Spillover risk varied across the annual cycle of waterfowl migration, and some locations exhibited persistent risk throughout the year given higher poultry production. Validation using wild bird introduction events identified by phylogenetic analysis from 2022 to 2023 HPAI poultry outbreaks indicates strong model performance. The modular nature of our approach lends itself to building upon updated datasets under evolving conditions, testing hypothetical scenarios, or customizing results with proprietary data. This research demonstrates an adaptive approach for developing models to inform preparedness and response as novel outbreaks occur, viruses evolve, and additional data become available.
  5. From the famous 1918 H1N1 influenza to the present COVID-19 pandemic, the need for improved viral detection techniques is all too apparent. The aim of the present paper is to show that quick and reliable identification of individual virus particles in clinical sample materials is near at hand. First, our team has developed techniques for identification of virions based on modular atomic force microscopy (AFM). Furthermore, femtosecond adaptive spectroscopic techniques with enhanced resolution via coherent anti-Stokes Raman scattering (FASTER CARS), combined with tip-enhanced techniques, markedly improve the sensitivity [M. O. Scully, et al., Proc. Natl. Acad. Sci. U.S.A. 99, 10994–11001 (2002)].
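Item 2 above compares excitation wavelengths by cross-validated area under the curve (AUC). As a rough illustration of how such a number can be computed, the sketch below evaluates AUC as the fraction of (positive, negative) score pairs that are ranked correctly, with ties counted as half, and then averages it over folds. The function names and the per-fold input layout are assumptions made for this example; the abstract does not specify the authors' classifier or fold scheme.

```python
import numpy as np


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive sample scores higher
    than a random negative sample (ties count as half)."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    diff = pos[:, None] - neg[None, :]  # all positive/negative pairs
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)


def mean_fold_auc(folds):
    """Average AUC over held-out folds.

    `folds` is a list of (y_true, scores) pairs, one per fold, where the
    scores come from a model trained on the remaining folds (hypothetical
    layout chosen for this sketch)."""
    return float(np.mean([roc_auc(y, s) for y, s in folds]))


# Two positives and two negatives; three of the four pairs are ordered correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # prints 0.75
```

The pairwise form uses memory proportional to the number of positive/negative pairs, which is fine for small spectral datasets; a rank-based formulation would scale better for large ones.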