NSF PAR Search | NSF Public Access Repository

Acyltransferase families that act on thioesters: Sequences, structures, and mechanisms

https://doi.org/10.1002/prot.26599

de Carvalho, Caio C.; Murray, Ian P.; Nguyen, Hung; Nguyen, Tin; Cantu, David C. (September 2023, Proteins: Structure, Function, and Bioinformatics)

Abstract: Acyltransferases (ATs) are enzymes that catalyze the transfer of an acyl group to a receptor molecule. This review focuses on ATs that act on thioester‐containing substrates. Although many ATs can recognize a wide variety of substrates, sequence similarity analysis allowed us to classify the ATs into fifteen distinct families. Each AT family originates from enzymes experimentally characterized to have AT activity, is classified according to sequence similarity, and, for families with available crystal structures, is confirmed by tertiary structure similarity. All the sequences and structures of the AT families described here are present in the thioester‐active enzyme (ThYme) database. The AT sequences and structures classified into families and available in the ThYme database could improve the understanding of acyl transfer to thioester‐containing substrates, most commonly coenzyme A; these reactions occur in multiple metabolic pathways, mostly involving fatty acids.
Fast and precise single-cell data analysis using a hierarchical autoencoder

https://doi.org/10.1038/s41467-021-21312-2

Tran, Duc; Nguyen, Hung; Tran, Bang; La Vecchia, Carlo; Luu, Hung N.; Nguyen, Tin (February 2021, Nature Communications)

Abstract: A primary challenge in single-cell RNA sequencing (scRNA-seq) studies comes from the massive amount of data and the high noise level. To address this challenge, we introduce an analysis framework, named single-cell Decomposition using Hierarchical Autoencoder (scDHA), that reliably extracts representative information from each cell. The scDHA pipeline consists of two core modules. The first module is a non-negative kernel autoencoder that removes genes or components with insignificant contributions to the part-based representation of the data. The second module is a stacked Bayesian autoencoder that projects the data onto a compressed, low-dimensional space. To reduce the tendency of neural networks to overfit, we repeatedly perturb the compressed space to learn a more generalized representation of the data. In an extensive analysis, we demonstrate that scDHA outperforms state-of-the-art techniques in many research subfields of scRNA-seq analysis, including cell segregation through unsupervised learning, visualization of the transcriptome landscape, cell classification, and pseudo-time inference.
Identifying representative sequences of protein families using submodular optimization

https://doi.org/10.1038/s41598-025-85165-1

Nguyen, Ha; Nguyen, Hung; Nguyen, Phuong; Luu, Anh N.; Cantu, David C.; Nguyen, Tin (January 2025, Scientific Reports)
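This listing gives only the title, so the paper's exact objective is not known here. As a hedged illustration of what submodular representative selection commonly looks like, the sketch below greedily maximizes a facility-location objective over a pairwise similarity matrix (for example, sequence identity). The function name `greedy_representatives` and the choice of facility location are assumptions, not the authors' method.

```python
import numpy as np


def greedy_representatives(similarity, k):
    """Pick k representatives by greedily maximizing facility-location coverage.

    Hypothetical helper. similarity: n x n non-negative matrix (e.g. pairwise
    sequence identity). Objective F(A) = sum_i max_{j in A} similarity[i, j]
    is monotone submodular, so greedy selection is within (1 - 1/e) of optimal.
    """
    sim = np.asarray(similarity, dtype=float)
    n = sim.shape[0]
    chosen = []
    coverage = np.zeros(n)  # best similarity of each item to the chosen set so far
    for _ in range(min(k, n)):
        # F(A + {j}) for every candidate j, minus F(A): the marginal gain.
        gains = np.maximum(sim, coverage[:, None]).sum(axis=0) - coverage.sum()
        gains[chosen] = -np.inf  # never pick the same item twice
        best = int(np.argmax(gains))
        chosen.append(best)
        coverage = np.maximum(coverage, sim[:, best])
    return chosen
```

With two well-separated clusters of sequences, the first pick is the most central member of the larger cluster and the second comes from the other cluster, because covering it yields the largest marginal gain.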
Mutation Space of Spatially Conserved Amino Acid Sites in Proteins

https://doi.org/10.1021/acsomega.3c01473

Caswell, Benjamin T.; Summers, Thomas J.; Licup, Gerra L.; Cantu, David C. (June 2023, ACS Omega)
A novel method for single-cell data imputation using subspace regression

https://doi.org/10.1038/s41598-022-06500-4

Tran, Duc; Tran, Bang; Nguyen, Hung; Nguyen, Tin (December 2022, Scientific Reports)

Abstract: Recent advances in biochemistry and single-cell RNA sequencing (scRNA-seq) have allowed us to monitor biological systems at single-cell resolution. However, the low capture of mRNA within individual cells often leads to inaccurate quantification of genetic material. Consequently, a significant number of expression values are reported as missing; these are often referred to as dropouts. To overcome this challenge, we develop a novel imputation method, named single-cell Imputation via Subspace Regression (scISR), that can reliably recover the dropout values of scRNA-seq data. The scISR method first uses a hypothesis-testing technique to identify zero-valued entries that are most likely affected by dropout events and then estimates the dropout values using a subspace regression model. Our comprehensive evaluation, using 25 publicly available scRNA-seq datasets and various simulation scenarios against five state-of-the-art methods, demonstrates that scISR recovers scRNA-seq expression profiles better than the other imputation methods. scISR consistently improves the quality of cluster analysis regardless of dropout rates, normalization techniques, and quantification schemes. The source code of scISR can be found on GitHub at https://github.com/duct317/scISR .
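The second scISR step, estimating flagged zeros from a subspace, can be illustrated with a generic low-rank regression. In this minimal sketch the dropout mask is assumed to be given (the hypothesis test is not reproduced), and the subspace is the top-k right singular vectors of the observed matrix; the function name `impute_dropouts` and the SVD-based subspace are assumptions, not the authors' exact model.

```python
import numpy as np


def impute_dropouts(X, dropout_mask, k=2):
    """Fill entries flagged in dropout_mask using a rank-k subspace regression.

    Hypothetical helper. X: cells x genes expression matrix.
    dropout_mask: boolean array, True where a zero is assumed to be a dropout.
    """
    X = np.asarray(X, dtype=float)
    mask = np.asarray(dropout_mask, dtype=bool)
    # Basis for gene space: top-k right singular vectors of the observed matrix.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    basis = vt[:k]  # shape (k, genes)
    X_imp = X.copy()
    for i in range(X.shape[0]):
        missing = mask[i]
        if not missing.any():
            continue
        observed = ~missing
        # Regress this cell's observed values onto the basis restricted to observed genes.
        coef, *_ = np.linalg.lstsq(basis[:, observed].T, X[i, observed], rcond=None)
        # Predict the flagged entries from the fitted coefficients; clip negatives to 0.
        X_imp[i, missing] = np.clip(coef @ basis[:, missing], 0.0, None)
    return X_imp
```

Only flagged entries change; all other values are copied through, which mirrors the abstract's point that imputation targets zeros judged likely to be dropouts rather than every zero.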
DrGA: cancer driver gene analysis in a simpler manner

https://doi.org/10.1186/s12859-022-04606-0

Nguyen, Quang-Huy; Nguyen, Tin; Le, Duc-Hau (December 2022, BMC Bioinformatics)

Abstract: Background. Cancer remains one of the leading causes of death worldwide, and the accumulation of mutated genes is thought to be mainly responsible for the establishment and development of the disease. Identifying and analyzing driver genes is therefore vital. Our previous study noted the lack of consensus on a unified pipeline for these tasks and introduced a complete one. However, this pipeline gradually revealed its weaknesses: it was unfamiliar to non-technical users, time-consuming, and inconvenient. Results. This study presents an R package named DrGA, developed from our previous pipeline, to tackle these problems. It fully automates four widely used downstream analyses for predicted driver genes and offers additional improvements. We describe the usage of DrGA on driver genes of human breast cancer. We also show another potential application of DrGA: analyzing genomic biomarkers of a complex disease in another organism. Conclusions. DrGA helps users with limited IT backgrounds rapidly create consistent and reproducible results. DrGA and its applications, along with example data, are freely available at https://github.com/huynguyen250896/DrGA .
DWEN: A novel method for accurate estimation of cell type compositions from bulk data samples

https://doi.org/10.1109/KSE56063.2022.9953757

Tran, Duc; Nguyen, Ha; Nguyen, Hung; Nguyen, Tin (October 2022, 2022 14th International Conference on Knowledge and Systems Engineering (KSE))

Advances in single-cell RNA sequencing (scRNA-seq) technologies have allowed us to study the heterogeneity of cell populations. The cell compositions of tissues from different hosts may vary greatly, reflecting the condition of the hosts from which the samples are collected. However, the high sequencing cost and the lack of fresh tissues make single-cell approaches less appealing. In many cases, it is practically impossible to generate single-cell data for a large number of subjects, making it challenging to monitor changes in cell type compositions across diseases. Here we introduce a novel approach, named Deconvolution using Weighted Elastic Net (DWEN), that allows researchers to accurately estimate the cell type compositions of bulk data samples without generating single-cell data for those samples. It also allows the re-analysis of bulk data collected from rare conditions to extract deeper cell-type-level insights. The approach consists of two modules: the first constructs the cell type signature matrix from single-cell data, and the second estimates the cell type compositions of input bulk samples. In an extensive analysis using 20 datasets generated from scRNA-seq data of different human tissues, we demonstrate that DWEN outperforms current state-of-the-art methods in estimating the cell type compositions of bulk samples.
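The second DWEN module (estimating compositions from a signature matrix) can be pictured as a penalized regression of a bulk profile on the signature columns. This minimal sketch uses a plain, unweighted elastic net with a non-negativity constraint, solved by cyclic coordinate descent; DWEN's weighting scheme is not described in the abstract, so it is omitted, and the function name `deconvolve` and the parameters `lam`, `alpha`, `n_iter` are hypothetical.

```python
import numpy as np


def deconvolve(signature, bulk, lam=1e-4, alpha=0.5, n_iter=2000):
    """Estimate cell type proportions of one bulk sample (hypothetical helper).

    signature: genes x cell_types matrix (mean expression per cell type).
    bulk: length-genes vector of bulk expression.
    Minimizes 0.5*||bulk - S b||^2 + lam*alpha*sum(b) + 0.5*lam*(1-alpha)*||b||^2
    subject to b >= 0 by cyclic coordinate descent, then rescales b to sum to 1.
    """
    S = np.asarray(signature, dtype=float)
    y = np.asarray(bulk, dtype=float)
    b = np.zeros(S.shape[1])
    col_sq = (S ** 2).sum(axis=0)
    residual = y - S @ b
    for _ in range(n_iter):
        for j in range(S.shape[1]):
            # Correlation of column j with the residual, coefficient j added back in.
            rho = S[:, j] @ residual + col_sq[j] * b[j]
            # Non-negative soft-threshold, shrunk further by the ridge term.
            new_bj = max(0.0, rho - lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            residual += S[:, j] * (b[j] - new_bj)
            b[j] = new_bj
    total = b.sum()
    return b / total if total > 0 else b
```

The non-negativity constraint keeps proportions physically meaningful, and the L1 part can drive absent cell types to exactly zero, which is one common reason to prefer an elastic net over ordinary least squares for deconvolution.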
Thioesterase enzyme families: Functions, structures, and mechanisms

https://doi.org/10.1002/pro.4263

Caswell, Benjamin T.; Carvalho, Caio C.; Nguyen, Hung; Roy, Monikrishna; Nguyen, Tin; Cantu, David C. (March 2022, Protein Science)

Identification and Validation of a Novel Three Hub Long Noncoding RNAs With m6A Modification Signature in Low-Grade Gliomas

https://doi.org/10.3389/fmolb.2022.801931

Nguyen, Quang-Huy; Nguyen, Tin; Le, Duc-Hau (February 2022, Frontiers in Molecular Biosciences)

There is evidence that N6-methyladenosine (m6A)-modified long noncoding RNAs (m6A-lncRNAs) are involved in regulating tumorigenesis, invasion, and metastasis in various cancer types. In this study, we computationally identified a set of 13 hub m6A-lncRNAs using three state-of-the-art tools (WGCNA, iWGCNA, and oCEM) and interrogated their prognostic value in brain low-grade gliomas (LGG). Of the 13 hub m6A-lncRNAs, we further identified three as independent prognostic risk factors: HOXB-AS1, ELOA-AS1, and FLG-AS1. The m6ALncSig model was then built on these three hub m6A-lncRNAs. Patients with LGG were next divided into high- and low-risk groups based on the median m6ALncSig score. As predicted, the high-risk group was more strongly associated with mortality. The prognostic signature of m6ALncSig was validated using internal and external cohorts. In summary, our work introduces a high-confidence prognostic prediction signature and paves the way for using the m6A-lncRNAs in the signature as new treatment targets for LGG.
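The abstract describes the median split but not the scoring formula. A minimal sketch, assuming the m6ALncSig score is a weighted sum of the three lncRNA expression values (as is typical for Cox-model risk signatures); the function name `risk_groups` and any coefficients passed to it are hypothetical.

```python
import numpy as np


def risk_groups(expression, coefficients):
    """Split patients into high/low risk by the median of a linear risk score.

    Hypothetical helper. expression: patients x lncRNAs matrix (e.g. the three
    hub m6A-lncRNAs). coefficients: one weight per lncRNA (assumed to come
    from a Cox model). Returns (scores, labels); label is "high" if score > median.
    """
    scores = np.asarray(expression, dtype=float) @ np.asarray(coefficients, dtype=float)
    cutoff = np.median(scores)
    labels = np.where(scores > cutoff, "high", "low")
    return scores, labels
```

Splitting at the median yields two groups of (nearly) equal size, so a survival comparison between them is not dominated by an unbalanced cutoff.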
SMRT: Randomized Data Transformation for Cancer Subtyping and Big Data Analysis

https://doi.org/10.3389/fonc.2021.725133

Nguyen, Hung; Tran, Duc; Tran, Bang; Roy, Monikrishna; Cassell, Adam; Dascalu, Sergiu; Draghici, Sorin; Nguyen, Tin (October 2021, Frontiers in Oncology)

Cancer is an umbrella term that includes a range of disorders, from those that are fast-growing and lethal to indolent lesions with low or delayed potential for progression to death. The treatment options, as well as treatment success, are highly dependent on the correct subtyping of individual patients. With the advancement of high-throughput platforms, we have the opportunity to differentiate among cancer subtypes from a holistic perspective that takes into consideration phenomena at different molecular levels (mRNA, methylation, etc.). This demands powerful integrative methods to leverage large multi-omics datasets for better subtyping. Here we introduce Subtyping Multi-omics using a Randomized Transformation (SMRT), a new method for multi-omics integration and cancer subtyping. SMRT offers the following advantages over existing approaches: (i) the scalable analysis pipeline allows researchers to integrate multi-omics data and analyze hundreds of thousands of samples in minutes, (ii) the ability to integrate data types with different numbers of patients, (iii) the ability to analyze un-matched data of different types, and (iv) the ability to offer users a convenient data analysis pipeline through a web application. We also improve the efficiency of our ensemble-based, perturbation clustering to support analysis on machines with memory constraints. In an extensive analysis, we compare SMRT with eight state-of-the-art subtyping methods using 37 TCGA and two METABRIC datasets comprising a total of almost 12,000 patient samples from 28 different types of cancer. We also performed a number of simulation studies. We demonstrate that SMRT outperforms other methods in identifying subtypes with significantly different survival profiles. In addition, SMRT is extremely fast, being able to analyze hundreds of thousands of samples in minutes. The web application is available at http://SMRT.tinnguyen-lab.com .
The R package will be deposited to CRAN as part of our PINSPlus software suite.
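The abstract does not specify which randomized transformation SMRT applies. As an illustration only, the sketch below uses a Gaussian random projection, a standard Johnson-Lindenstrauss construction that reduces dimensionality while approximately preserving pairwise distances; treating it as SMRT's transformation is an assumption, and the function name `random_projection` is hypothetical.

```python
import numpy as np


def random_projection(X, target_dim, seed=0):
    """Project rows of X (samples x features) to target_dim dimensions.

    Hypothetical helper. Uses a dense Gaussian matrix scaled by
    1/sqrt(target_dim), so squared Euclidean distances are preserved in
    expectation (Johnson-Lindenstrauss).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    R = rng.normal(0.0, 1.0, size=(X.shape[1], target_dim)) / np.sqrt(target_dim)
    return X @ R
```

For multi-omics input, one could concatenate per-sample feature blocks (for example with `np.hstack`) before projecting and then cluster the reduced matrix; the projection cost is a single matrix product, which is one reason randomized transformations scale to very large sample counts.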