This content will become publicly available on January 17, 2026

Title: Automating the amino acid identification in elliptical dichroism spectrometer with Machine Learning
Amino acid identification is crucial across various scientific disciplines, including biochemistry, pharmaceutical research, and medical diagnostics. However, traditional methods such as mass spectrometry require extensive sample preparation and are time-consuming, complex and costly. Therefore, this study presents a pioneering Machine Learning (ML) approach for automatic amino acid identification by utilizing the unique absorption profiles from an Elliptical Dichroism (ED) spectrometer. Advanced data preprocessing techniques and ML algorithms to learn patterns from the absorption profiles that distinguish different amino acids were investigated to prove the feasibility of this approach. The results show that ML can potentially revolutionize the amino acid analysis and detection paradigm.
Award ID(s):
2401151 2300064
PAR ID:
10598388
Author(s) / Creator(s):
; ; ; ; ;
Editor(s):
Mitra, Saheli
Publisher / Repository:
PLOS
Date Published:
Journal Name:
PLOS ONE
Volume:
20
Issue:
1
ISSN:
1932-6203
Page Range / eLocation ID:
e0317130
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Rapid identification of newly emerging or circulating viruses is an important first step toward managing the public health response to potential outbreaks. A portable virus capture device, coupled with label-free Raman spectroscopy, holds the promise of fast detection by rapidly obtaining the Raman signature of a virus followed by a machine learning (ML) approach applied to recognize the virus based on its Raman spectrum, which is used as a fingerprint. We present such an ML approach for analyzing Raman spectra of human and avian viruses. A convolutional neural network (CNN) classifier specifically designed for spectral data achieves very high accuracy for a variety of virus type or subtype identification tasks. In particular, it achieves 99% accuracy for classifying influenza virus type A versus type B, 96% accuracy for classifying four subtypes of influenza A, 95% accuracy for differentiating enveloped and nonenveloped viruses, and 99% accuracy for differentiating avian coronavirus (infectious bronchitis virus [IBV]) from other avian viruses. Furthermore, interpretation of neural net responses in the trained CNN model using a full-gradient algorithm highlights Raman spectral ranges that are most important to virus identification. By correlating ML-selected salient Raman ranges with the signature ranges of known biomolecules and chemical functional groups—for example, amide, amino acid, and carboxylic acid—we verify that our ML model effectively recognizes the Raman signatures of proteins, lipids, and other vital functional groups present in different viruses and uses a weighted combination of these signatures to identify viruses. 
  2. The enantiomers of chiral amino acids play versatile roles in biological systems including humans. They are also very useful in the asymmetric synthesis of diverse chiral organic compounds. Therefore, identifying a specific amino acid and distinguishing it from its enantiomer are of great importance. Although significant progress has been made in the development of fluorescent probes for amino acids, most of them are not capable of conducting simultaneous chemoselective and enantioselective detection of a specific amino acid enantiomer. In this article, several fluorescent probes have been designed and synthesized for chemoselective as well as enantioselective recognition of certain amino acid enantiomers. (S)-1 shows greatly enhanced fluorescence in the presence of L-glutamic acid and L-aspartic acid, but produces no or little fluorescence response toward their opposite enantiomers and other amino acids. (R)-4 in combination with Zn²⁺ shows greatly enhanced fluorescence in the presence of L-serine. (S)-6 is designed for the selective recognition of histidine. Micelles made of an amphiphilic diblock copolymer are used to encapsulate the water-insoluble compound (S)-8 which shows chemoselective as well as enantioselective fluorescence enhancement with L-lysine in the presence of Zn²⁺ in aqueous solution. The same micelles are also used to encapsulate several (S)-1,1′-binaphthyl-based monoaldehydes (S)-10 for the chemoselective and enantioselective fluorescence recognition of L-tryptophan in the presence of Zn²⁺ in aqueous solution. These findings have demonstrated that highly selective fluorescence identification of a specific amino acid enantiomer can be achieved by incorporating certain functional groups at the designated locations of the 1,1′-binaphthyls. The binaphthyl core structure of these probes provides both a chirality source and highly tunable fluorescence properties. Matching the structure and chirality of these probes with those of the specific amino acid enantiomers can generate structurally rigid reaction products and give rise to greatly enhanced fluorescence. The strategies of this work can be further expanded to develop fluorescent probes for the specific identification of many amino acids of interest. This should facilitate the analysis of chiral amino acids in various applications. The outlook of this research and its comparison with other methods are also discussed.
  3. Machine learning (ML) is revolutionizing protein structural analysis, including an important subproblem of predicting protein residue contact maps, i.e., which amino-acid residues are in close spatial proximity given the amino-acid sequence of a protein. Despite recent progress in ML-based protein contact prediction, predicting contacts with a wide range of distances (commonly classified into short-, medium- and long-range contacts) remains a challenge. Here, we propose a multiscale graph neural network (GNN) based approach taking a cue from multiscale physics simulations, in which a standard pipeline involving a recurrent neural network (RNN) is augmented with three GNNs to refine predictive capability for short-, medium- and long-range residue contacts, respectively. Test results on the ProteinNet dataset show improved accuracy for contacts of all ranges using the proposed multiscale RNN+GNN approach over the conventional approach, including the most challenging case of long-range contact prediction.
  4. Porcine reproductive and respiratory syndrome is an infectious disease of pigs caused by PRRS virus (PRRSV). A modified live-attenuated vaccine has been widely used to control the spread of PRRSV, and the classification of field strains is key to successful control and prevention. Restriction fragment length polymorphism targeting the open reading frame 5 (ORF5) gene is widely used to classify PRRSV strains but shows unstable accuracy. Phylogenetic analysis is a powerful tool for PRRSV classification with consistent accuracy, but it demands large computational power as the number of sequences increases. Our study aimed to apply four machine learning (ML) algorithms (random forest, k-nearest neighbor, support vector machine and multilayer perceptron) to classify field PRRSV strains into four clades using amino acid scores based on the ORF5 gene sequence. Our study used amino acid sequences of the ORF5 gene in 1931 field PRRSV strains collected in the US from 2012 to 2020. Phylogenetic analysis was used to label each field PRRSV strain as one of four clades: Lineage 5 or three clades in Lineage 1. We measured the accuracy and time consumption of classification for the four ML approaches at different gene sequence sizes. We found that all four ML algorithms classify a large number of field strains in a very short time (<2.5 s) with very high accuracy (>0.99 area under the receiver operating characteristic curve). Furthermore, the random forest approach detects a total of 4 key amino acid positions for the classification of field PRRSV strains into four clades. Our findings provide insight for developing a rapid and accurate classification model using genetic information, which also enables us to handle large genome datasets in real time or semi-real time for data-driven decision-making and more timely surveillance.
  5. The relation between amino acid (AA) sequence and biologically active conformation controls the process of polypeptide chains folding into three-dimensional (3D) protein structures. Recent advances in the resolution achieved in cryo-electron microscopy, coupled with improvements in computational methodologies, have accelerated the analysis of structures and properties of proteins. However, the detailed interaction between AAs has not been fully elucidated. Herein, we present a de novo method to evaluate inter-amino acid interactions based on the concept of accurately evaluating the amino acid bond pairs (AABP). The results obtained enabled the identification of a complex 3D long-range interconnected AA interaction network in proteins. The method is applied to the receptor binding domain (RBD) of the SARS-CoV-2 spike protein. We show that although nearest-neighbor AAs in the primary sequence have large AABP, other nonlocal AAs make a substantial contribution to AABP with significant participation of both covalent and hydrogen bonding. Detailed analysis of AABP in RBD reveals the pivotal role they play in sequence conservation, with profound implications for residue mutations and therapeutic drug design. This approach could be easily applied to many other proteins of biomedical interest in life sciences.