Title: Functional Effects of Four or Fewer Critical Genes Linked to Lung Cancers and New Subtypes Detected by A New Machine Learning Classifier
Finding genes that are biologically related to lung cancer, directly or indirectly, has drawn much attention, and many genes directly related to lung cancer have been reported. However, it has not been confirmed whether those published 'key' genes are truly critical to lung cancer formation; they may carry very limited useful information. As a result, finding essential genes remains a challenging problem in lung cancer research. Using a recently developed competing linear factor analysis method for differentially expressed gene detection, we advance the detection of critical lung cancer genes to a uniformly informative level. A common set of four genes and their functional effects are found to be differentially expressed in tumor and non-tumor samples with 100% sensitivity and 100% specificity in one study of lung adenocarcinoma (LUAD) and one study of squamous cell lung cancers (LUSC) (two North American cohorts with 20429 genes and 576 and 552 samples, respectively). Two additional analyses achieve 97.8% sensitivity and 100% specificity in one study of non-small cell lung carcinomas (NSCLC, a European cohort with 20356 genes and 156 samples), and 100% sensitivity and 95% specificity (1 out of 20 non-tumor samples misclassified) in one study of ALK-positive and EGFR/KRAS/ALK-negative lung adenocarcinomas (LUAD, a Japanese cohort with 20356 genes and 224 samples). The sets of four genes share some genes, but with different functional effects, between the two North American cohorts and the European cohort and between the North American cohorts and the Japanese cohort. These results show that the four-gene-based classifiers are accurate and robust across different types of lung cancer and cohorts of different races. The functional effects of the four genes reveal significantly different mechanisms (mysteries) between LUAD and LUSC. These sets of four genes and their functional effects are considered to be essential for lung cancer studies and practice.
These genes' functional effects naturally classify patients into different groups (more than seven subtypes). Subtype information is useful for personalized therapies. The new findings can motivate new lung cancer research in more focused and targeted directions to save lives, protect people, and reduce the enormous economic costs of lung cancer research and treatment.
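The accuracy figures in the abstract are sensitivity and specificity values, which follow directly from the counts of correctly and incorrectly classified samples. The sketch below is illustrative only and is not the authors' classifier: the names `ConfusionCounts`, `sensitivity`, and `specificity` are hypothetical, and the tumor count of 204 assumes that the remaining samples of the 224-sample Japanese cohort are all tumor samples, which the abstract does not state. Only the "1 out of 20 non-tumor samples" figure comes from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from comparing predicted labels with true tumor status."""
    tp: int  # tumor samples classified as tumor
    fn: int  # tumor samples classified as non-tumor
    tn: int  # non-tumor samples classified as non-tumor
    fp: int  # non-tumor samples classified as tumor


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of tumor samples that are correctly detected."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of non-tumor samples that are correctly cleared."""
    return c.tn / (c.tn + c.fp)


# Japanese cohort: 1 of 20 non-tumor samples misclassified (from the abstract).
# tp=204 is an ASSUMPTION: it treats the other 204 of 224 samples as tumor samples.
japanese = ConfusionCounts(tp=204, fn=0, tn=19, fp=1)

print(f"sensitivity = {sensitivity(japanese):.1%}")
print(f"specificity = {specificity(japanese):.1%}")
```

With these counts, one misclassified sample among 20 non-tumor samples gives 19/20, which matches the 95% specificity reported for that cohort.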
Award ID(s): 2012298
PAR ID: 10340964
Author(s) / Creator(s):
Date Published:
Journal Name: Journal of clinical trials
Volume: 14
Issue: 1
ISSN: 2167-0870
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. It has not been possible to confirm whether genes known from the breast cancer literature are vital to breast cancer formation, because convincing accuracy is lacking, although they may be biologically directly related to breast cancer based on present biological knowledge. It is hoped that vital genes can be identified with the highest possible accuracy, for example, 100% accuracy, and with convincing causal patterns beyond what has been known in breast cancer. One hope is that finding gene-gene interaction signatures and functional effects may solve the puzzle. This research uses a recently developed competing linear factor analysis method for differentially expressed gene detection to advance the study of breast cancer formation. Surprisingly, 3 genes are detected to be differentially expressed in TNBC and non-TNBC (Her2, Luminal A, Luminal B) samples with 100% sensitivity and 100% specificity in 1 study of triple-negative breast cancers (TNBC, with 54 675 genes and 265 samples). These 3 genes show a clear signature pattern of how TNBC patients can be grouped. For another TNBC study (with 54 673 genes and 66 samples), 4 genes give the same accuracy of 100% sensitivity and 100% specificity. Four genes are found to have the same accuracy of 100% sensitivity and 100% specificity in 1 breast cancer study (with 54 675 genes and 121 samples), and the same 4 genes give an accuracy of 100% sensitivity and 96.5% specificity in the fourth breast cancer study (with 60 483 genes and 1217 samples). These results show that the 4-gene-based classifiers are robust and accurate. The detected genes naturally classify patients into subtypes, for example, 7 subtypes. These findings demonstrate the clearest gene-gene interaction patterns and functional effects with the smallest numbers of genes and the highest accuracy compared with findings reported in the literature. The 4 genes are considered to be essential for breast cancer studies and practice. They can support focused, targeted research and precision medicine for each subtype of breast cancer. New breast cancer disease types may be detected using the classified subtypes, and hence new effective therapies can be developed.
  2. e20551 Background: Enzyme activity is at the center of all biological processes. When these activities are misregulated by changes in sequence, expression, or activity, pathologies emerge. Misregulation of protease enzymes such as Matrix Metalloproteinases and Cathepsins plays a key role in the pathophysiology of cancer. We describe here a novel class of graphene-based, cost-effective biosensors that can detect altered protease activation in a blood sample from early-stage lung cancer patients. Methods: The Gene Expression Omnibus (GEO) tool was used to identify proteases differentially expressed in lung cancer and matched normal tissue. Biosensors were assembled on a graphene backbone annotated with one of a panel of fluorescently tagged peptides. The graphene quenches fluorescence until the peptide is either cleaved by active proteases or altered by post-translational modification. 19 protease biosensors were evaluated on 431 commercially collected serum samples from non-lung cancer controls (69%) and pathologically confirmed lung cancer cases (31%) tested over two independent cohorts. Serum was incubated with each of the 19 biosensors and enzyme activity was measured indirectly as a continuous variable by a fluorescence plate reader. Analysis was performed using Emerge, a proprietary predictive and classification modeling system based on massively parallel evolving “Turing machine” algorithms. Each analysis stratified allocation into training and testing sets, and reserved an out-of-sample validation set for reporting. Results: 256 clinical samples were initially evaluated including 35% cancer cases evenly distributed across stages I (29%), II (26%), III (24%) and IV (21%). The case controls included common comorbidities in the at-risk population such as COPD, chronic bronchitis, and benign nodules (19%). Using the Emerge classification analysis, biosensor biomarkers alone (no clinical factors) demonstrated Sensitivity (Se.) = 92% (CI 82%-99%) and Specificity (Sp.) = 82% (CI 69%-91%) in the out-of-sample set. An independent cohort of 175 clinical cases (age 67±8, 52% male) focused on early detection (26% cancer, 70% Stage I, 30% Stage II/III) was similarly evaluated. Classification showed Se. = 100% (CI 79%-100%) and Sp. = 93% (CI 80%-99%) in the out-of-sample set. For the entire dataset of 175 samples, Se. = 100% (CI 92%-100%) and Sp. = 97% (CI 92%-99%) were observed. Conclusions: Lung cancer can be treated if it is diagnosed when still localized. Despite clear data showing that screening for lung cancer by Low Dose Computed Tomography (LDCT) is effective, screening compliance remains very low. Protease biosensors provide a cost-effective additional specialized tool with high sensitivity and specificity in detection of early-stage lung cancer. A large prospective trial of at-risk smokers with follow-up is being conducted to evaluate a commercial version of this assay.
  3. Abstract: Lung adenocarcinoma (LUAD) remains a leading cause of cancer-related mortality and is characterized by substantial genetic heterogeneity that challenges a comprehensive understanding of its progression. This study employs next-generation sequencing data analysis to advance our understanding of LUAD pathogenesis. By integrating epigenetic and transcriptomic data from LUAD patients, this approach assessed critical regulatory events, identified therapeutic targets, and offered insights into the molecular foundations of the cancer. We used DNA methylation data to identify differentially methylated CpG sites and explored the transcriptome profiles of their adjacent genes. An intersectional analysis of gene expression profiles uncovered 419 differentially expressed genes (DEGs) influenced by smoke-induced differential DNA methylation, among which hub genes, including mitochondrial ribosomal proteins (MRPs) and ribosomal proteins (RPs) such as MRPS15, MRPS5, MRPL33, RPL24, RPL7L1, MRPL15, TUFM, MRPL22, and RSL1D1, were identified using a network-based approach. These hub genes were overexpressed and enriched in RNA processing, ribosome biogenesis, and mitochondrial translation, which are critical in LUAD progression. Enhancer Linking Methylation/Expression Relationship (ELMER) analysis revealed transcription factor (TF) binding motifs, such as JUN, NKX23, FOSB, RUNX3, and FOSL1, which regulated these hub genes through methylation-dependent enhancer dynamics. Predominant hypomethylation of MRPs and RPs disrupted mitochondrial function and contributed to oxidative phosphorylation (OXPHOS) and metabolic reprogramming, favoring cancer cell survival. The survival analysis validated the clinical relevance of these hub genes: high-expression cohorts exhibited poor overall survival (OS) outcomes, which highlights their relevance in LUAD pathogenesis and their potential for developing novel targeted therapeutic strategies.
  4. Hershberg, Ruth (Ed.)
    Comparative genomic analyses have enormous potential for identifying key genes central to human health phenotypes, including those that promote cancers. In particular, the successful development of novel therapeutics using model species requires phylogenetic analyses to determine molecular homology. Accordingly, we investigate the evolutionary histories of anaplastic lymphoma kinase (ALK)—which can underlie tumorigenesis in neuroblastoma, non-small cell lung cancer, and anaplastic large-cell lymphoma—its close relative leukocyte tyrosine kinase (LTK) and their candidate ligands. Homology of ligands identified in model organisms to those functioning in humans remains unclear. Therefore, we searched for homologs of the human genes across metazoan genomes, finding that the candidate ligands Jeb and Hen-1 were restricted to non-vertebrate species. In contrast, the ligand AUG was only identified in vertebrates. We found two ALK-like and four AUG-like protein-coding genes in lamprey. Of these six genes, only one ALK-like and two AUG-like genes exhibited early embryonic expression that parallels model mammal systems. Two copies of AUG are present in nearly all jawed vertebrates. Our phylogenetic analysis strongly supports the presence of previously unrecognized functional convergences of ALK and LTK between actinopterygians and sarcopterygians—despite contemporaneous, highly conserved synteny of ALK and LTK. These findings provide critical guidance regarding the propriety of fish and mammal models with regard to model-organism-based investigation of these medically important genes. In sum, our results provide the phylogenetic context necessary for effective investigations of the functional roles and biology of these critically important receptors. 
  5. Abstract
    Background: DNA methylation is an epigenetic event involving the addition of a methyl group to a cytosine-guanine base pair (i.e., CpG site). It is associated with different cancers. Our research focuses on studying hemimethylation in non-small cell lung cancer, which refers to methylation occurring on only one of the two DNA strands. Many studies assume that methylation occurs on both DNA strands at a CpG site. However, recent publications show the existence of hemimethylation and its significant impact. Therefore, it is important to identify cancer hemimethylation patterns.
    Methods: In this paper, we use the Wilcoxon signed rank test to identify hemimethylated CpG sites based on publicly available non-small cell lung cancer methylation sequencing data. We then identify two types of hemimethylated CpG clusters, regular and polarity clusters, and genes with large numbers of hemimethylated sites. Highly hemimethylated genes are then studied for their biological interactions using available bioinformatics tools.
    Results: In this paper, we have conducted the first-ever investigation of hemimethylation in lung cancer. Our results show that hemimethylation does exist in lung cells either as singletons or clusters. Most clusters contain only two or three CpG sites. Polarity clusters are much shorter than regular clusters and appear less frequently. The majority of clusters found in tumor samples have no overlap with clusters found in normal samples, and vice versa. Several genes that are known to be associated with cancer are hemimethylated differently between the cancerous and normal samples. Furthermore, highly hemimethylated genes exhibit many different interactions with other genes that may be associated with cancer. Hemimethylation has diverse patterns and frequencies that are comparable between normal and tumorous cells. Therefore, hemimethylation may be related to both normal and tumor cell development.
    Conclusions: Our research has identified CpG clusters and genes that are hemimethylated in normal and lung tumor samples. Due to the potential impact of hemimethylation on gene expression and cell function, these clusters and genes may be important to advance our understanding of the development and progression of non-small cell lung cancer.
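The last abstract above names the Wilcoxon signed rank test as its tool for flagging hemimethylated CpG sites. As a rough illustration of how that test works on paired data, the sketch below computes the signed-rank statistic and an exact two-sided p-value by enumerating sign patterns. This is not the authors' pipeline. The function names are hypothetical, and the pairing used in the example (forward-strand versus reverse-strand methylation percentages at one CpG site across five samples) is an assumption, since the abstract does not say which quantities are paired. The values are made up for illustration.

```python
from itertools import product


def signed_ranks(x, y):
    """Signed ranks of the non-zero differences x[i] - y[i] (ties share the average rank)."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of equal absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return [r if d > 0 else -r for r, d in zip(ranks, diffs)]


def wilcoxon_signed_rank(x, y):
    """Return (statistic, exact two-sided p-value); enumeration is only practical for small n."""
    magnitudes = [abs(r) for r in signed_ranks(x, y)]
    w_plus = sum(abs(r) for r in signed_ranks(x, y) if r > 0)
    total = sum(magnitudes)
    statistic = min(w_plus, total - w_plus)
    # Under the null hypothesis every sign pattern of the ranks is equally likely.
    hits = 0
    for signs in product((0, 1), repeat=len(magnitudes)):
        positive_sum = sum(m for m, s in zip(magnitudes, signs) if s)
        if min(positive_sum, total - positive_sum) <= statistic:
            hits += 1
    return statistic, hits / 2 ** len(magnitudes)


# Hypothetical methylation percentages at one CpG site in five samples.
forward_strand = [90, 80, 70, 85, 60]
reverse_strand = [10, 25, 35, 5, 50]
stat, p_value = wilcoxon_signed_rank(forward_strand, reverse_strand)
print(stat, p_value)
```

The exact enumeration grows as 2^n, so it only suits a handful of pairs. For realistic sample sizes, a library routine such as `scipy.stats.wilcoxon` would be the usual choice.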