Rapid identification of newly emerging or circulating viruses is an important first step toward managing the public health response to potential outbreaks. A portable virus capture device, coupled with label-free Raman spectroscopy, holds the promise of fast detection by rapidly obtaining the Raman signature of a virus followed by a machine learning (ML) approach applied to recognize the virus based on its Raman spectrum, which is used as a fingerprint. We present such an ML approach for analyzing Raman spectra of human and avian viruses. A convolutional neural network (CNN) classifier specifically designed for spectral data achieves very high accuracy for a variety of virus type or subtype identification tasks. In particular, it achieves 99% accuracy for classifying influenza virus type A versus type B, 96% accuracy for classifying four subtypes of influenza A, 95% accuracy for differentiating enveloped and nonenveloped viruses, and 99% accuracy for differentiating avian coronavirus (infectious bronchitis virus [IBV]) from other avian viruses. Furthermore, interpretation of neural net responses in the trained CNN model using a full-gradient algorithm highlights Raman spectral ranges that are most important to virus identification. By correlating ML-selected salient Raman ranges with the signature ranges of known biomolecules and chemical functional groups—for example, amide, amino acid, and carboxylic acid—we verify that our ML model effectively recognizes the Raman signatures of proteins, lipids, and other vital functional groups present in different viruses and uses a weighted combination of these signatures to identify viruses.
Automating the amino acid identification in elliptical dichroism spectrometer with Machine Learning
Amino acid identification is crucial across various scientific disciplines, including biochemistry, pharmaceutical research, and medical diagnostics. However, traditional methods such as mass spectrometry require extensive sample preparation and are time-consuming, complex, and costly. This study therefore presents a Machine Learning (ML) approach for automatic amino acid identification that uses the unique absorption profiles from an Elliptical Dichroism (ED) spectrometer. To demonstrate the feasibility of this approach, advanced data preprocessing techniques and ML algorithms were investigated for learning the patterns in the absorption profiles that distinguish different amino acids. The results show that ML can potentially revolutionize the amino acid analysis and detection paradigm.
- PAR ID: 10598388
- Editor(s): Mitra, Saheli
- Publisher / Repository: PLOS
- Date Published:
- Journal Name: PLOS ONE
- Volume: 20
- Issue: 1
- ISSN: 1932-6203
- Page Range / eLocation ID: e0317130
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
The enantiomers of chiral amino acids play versatile roles in biological systems including humans. They are also very useful in the asymmetric synthesis of diverse chiral organic compounds. Therefore, identifying a specific amino acid and distinguishing it from its enantiomer are of great importance. Although significant progress has been made in the development of fluorescent probes for amino acids, most of them are not capable of conducting simultaneous chemoselective and enantioselective detection of a specific amino acid enantiomer. In this article, several fluorescent probes have been designed and synthesized for chemoselective as well as enantioselective recognition of certain amino acid enantiomers. (S)-1 shows greatly enhanced fluorescence in the presence of L-glutamic acid and L-aspartic acid, but produces no or little fluorescence response toward their opposite enantiomers and other amino acids. (R)-4 in combination with Zn2+ shows greatly enhanced fluorescence in the presence of L-serine. (S)-6 is designed for the selective recognition of histidine. Micelles made of an amphiphilic diblock copolymer are used to encapsulate the water-insoluble compound (S)-8 which shows chemoselective as well as enantioselective fluorescence enhancement with L-lysine in the presence of Zn2+ in aqueous solution. The same micelles are also used to encapsulate several (S)-1,1′-binaphthyl-based monoaldehydes (S)-10 for the chemoselective and enantioselective fluorescence recognition of L-tryptophan in the presence of Zn2+ in aqueous solution. These findings have demonstrated that highly selective fluorescence identification of a specific amino acid enantiomer can be achieved by incorporating certain functional groups at the designated locations of the 1,1′-binaphthyls. The binaphthyl core structure of these probes provides both a chirality source and highly tunable fluorescence properties. Matching the structure and chirality of these probes with those of the specific amino acid enantiomers can generate structurally rigid reaction products and give rise to greatly enhanced fluorescence. The strategies of this work can be further expanded to develop fluorescent probes for the specific identification of many amino acids of interest. This should facilitate the analysis of chiral amino acids in various applications. The outlook of this research and its comparison with other methods are also discussed.
-
The capture, utilization, and storage of CO2 are the primary options to minimize the adverse effects of global warming and related climate change resulting from increased anthropogenic CO2 emissions. In recent years, amino acids and amino acid-based ionic liquids (AAILs) have been proposed as promising alternatives to the traditional aqueous amine solvent-based CO2 capture technology due to the presence of the -NH2 group and a CO2 adsorption mechanism similar to that of amines, but with many additional advantages. Besides CO2 absorption in solvent form, amino acids/AAILs-functionalized porous sorbents demonstrate potential in CO2 adsorption technology, a promising alternative to solvent-based CO2 absorption technology, as they can avoid the huge energy penalty associated with aqueous solution regeneration by heating. Additionally, amino acids/AAILs, with their CO2 capture abilities, have demonstrated their potential in other promising CO2 sequestration technologies: direct air capture, CO2 mineralization using alkaline industrial waste, and conversion of CO2 into value-added products. This article reviews the mechanism, comparative performance, and prospects of amino acid-based state-of-the-art technologies for CO2 absorption and adsorption, direct air capture, bio-mineralization, and conversion of CO2 into value-added products, which is helpful for the further development of amino acid-based CO2 sequestration technologies.
-
Porcine reproductive and respiratory syndrome (PRRS) is an infectious disease of pigs caused by PRRS virus (PRRSV). A modified live-attenuated vaccine has been widely used to control the spread of PRRSV, and the classification of field strains is key to successful control and prevention. Restriction fragment length polymorphism targeting the open reading frame 5 (ORF5) gene is widely used to classify PRRSV strains but shows unstable accuracy. Phylogenetic analysis is a powerful tool for PRRSV classification with consistent accuracy, but it demands large computational power as the number of sequences increases. Our study aimed to apply four machine learning (ML) algorithms (random forest, k-nearest neighbor, support vector machine, and multilayer perceptron) to classify field PRRSV strains into four clades using amino acid scores based on the ORF5 gene sequence. Our study used amino acid sequences of the ORF5 gene in 1931 field PRRSV strains collected in the US from 2012 to 2020. Phylogenetic analysis was used to label field PRRSV strains as one of four clades: Lineage 5 or three clades in Lineage 1. We measured the accuracy and time consumption of classification using the four ML approaches for gene sequence datasets of different sizes. We found that all four ML algorithms classify a large number of field strains in a very short time (<2.5 s) with very high accuracy (area under the receiver operating characteristic curve >0.99). Furthermore, the random forest approach detects a total of 4 key amino acid positions for the classification of field PRRSV strains into four clades. Our findings provide insight for developing a rapid and accurate classification model using genetic information, which also enables us to handle large genome datasets in real time or semi-real time for data-driven decision-making and more timely surveillance.
-
Machine learning (ML) is revolutionizing protein structural analysis, including an important subproblem of predicting protein residue contact maps, i.e., which amino-acid residues are in close spatial proximity given the amino-acid sequence of a protein. Despite recent progress in ML-based protein contact prediction, predicting contacts with a wide range of distances (commonly classified into short-, medium- and long-range contacts) remains a challenge. Here, we propose a multiscale graph neural network (GNN) based approach taking a cue from multiscale physics simulations, in which a standard pipeline involving a recurrent neural network (RNN) is augmented with three GNNs to refine predictive capability for short-, medium- and long-range residue contacts, respectively. Test results on the ProteinNet dataset show improved accuracy for contacts of all ranges using the proposed multiscale RNN+GNN approach over the conventional approach, including the most challenging case of long-range contact prediction.
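To make the notion of a residue contact map concrete, the following is a minimal sketch and not the authors' pipeline. It assumes one 3D coordinate per residue, a contact cutoff of 8 Å, and short/medium/long bins defined by sequence separation (6 to 11, 12 to 23, and 24 or more residues). These are commonly used conventions, but none of them is stated in the abstract, and the function names `contact_map` and `contact_range` are hypothetical.

```python
import math

def contact_map(coords, threshold=8.0):
    """Return a symmetric boolean matrix; True where two residues lie
    closer than `threshold` (assumed cutoff, in the units of `coords`)."""
    n = len(coords)
    contacts = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) < threshold:
                contacts[i][j] = contacts[j][i] = True
    return contacts

def contact_range(i, j):
    """Bin a residue pair by sequence separation |i - j|.
    The bin edges are an assumed convention, not taken from the abstract."""
    sep = abs(i - j)
    if sep >= 24:
        return "long"
    if sep >= 12:
        return "medium"
    if sep >= 6:
        return "short"
    return None  # pairs closer in sequence are usually not scored

# Toy input: five residues on a straight line, 3.8 units apart.
coords = [(3.8 * i, 0.0, 0.0) for i in range(5)]
cm = contact_map(coords)
```

In this toy chain only residues at most two positions apart fall under the cutoff; a real protein folds back on itself, which is what produces the long-range contacts the abstract describes as hardest to predict.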

