Abstract Biological dinitrogen (N2) fixation supplies nitrogen to the oceans, supporting primary productivity, and is carried out by some bacteria and archaea referred to as diazotrophs. Cyanobacteria are conventionally considered the major contributors to marine N2 fixation, but non-cyanobacterial diazotrophs (NCDs) have been shown to be distributed throughout ocean ecosystems. However, the biogeochemical significance of marine NCDs has not been demonstrated. This review synthesizes multiple datasets, drawing on cultivation-independent molecular techniques and data from extensive oceanic expeditions, to provide a comprehensive view of the diversity, biogeography, ecophysiology, and activity of marine NCDs. An NCD nifH gene catalog was compiled containing sequences from both PCR-based and PCR-free methods, identifying taxa for future studies. NCD abundances from a novel nifH-based database were colocalized with environmental data, unveiling distinct distributions and environmental drivers of individual taxa. Mechanisms that NCDs may use to fuel and regulate N2 fixation in response to oxygen and fixed-nitrogen availability are discussed, based on a metabolic analysis of recently available Tara Oceans expedition data. The integration of multiple datasets provides a new perspective that enhances understanding of the biology, ecology, and biogeography of marine NCDs and provides tools and directions for future research.
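The abstract above colocalizes nifH-based NCD abundances with environmental data to identify the environmental drivers of individual taxa. The statistics the review actually uses are not stated here; as an assumption, this sketch shows one common first pass: a Spearman rank correlation between one taxon's abundance and one environmental variable (for example, temperature) across sampling stations. All function names are hypothetical. Rank correlation suits abundance data because it is unchanged by monotone transforms such as log-scaling and is less sensitive to the extreme values typical of gene-copy counts.

```python
"""Sketch: Spearman rank correlation between a taxon's nifH-based abundance
and one environmental variable across stations. Hypothetical helper names;
not the review's analysis code."""

from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (vx * vy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

For example, `spearman(temperature_c, nifh_copies_per_litre)` on paired per-station lists (hypothetical variable names) returns a value between -1 and 1. In practice, many taxa and variables would be screened this way, so a multiple-testing correction would also be needed.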
                    This content will become publicly available on December 1, 2025
                            
                            DiseaseNet: a transfer learning approach to noncommunicable disease classification
                        
                    
    
            Abstract As noncommunicable diseases (NCDs) pose a significant global health burden, identifying effective diagnostic and predictive markers for these diseases is of paramount importance. Epigenetic modifications, such as DNA methylation, have emerged as potential indicators for NCDs. In other contexts, such modifications have been exploited by neural network models that capture complex relationships within the data. Applications of neural networks have led to significant breakthroughs in various biological and biomedical fields, but they have not yet been effectively applied to NCD modeling. This is, in part, due to limited datasets that are not amenable to building robust neural network models. In this work, we leveraged a neural network trained on one class of NCDs, cancer, as the basis for a transfer learning approach to non-cancer NCD modeling. Our results demonstrate promising performance of the model in predicting three NCDs (arthritis, asthma, and schizophrenia) from blood samples, with an overall accuracy (F-measure) of 94.5%. Furthermore, a concept-based explanation method called Testing with Concept Activation Vectors (TCAV) was used to investigate the importance of the sample sources and to understand how future training datasets for multiple NCD models may be improved. Our findings highlight the effectiveness of transfer learning in developing accurate diagnostic and predictive models for NCDs.
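TCAV, mentioned in the abstract above, quantifies how sensitive a class prediction is to a human-defined concept (here, plausibly a sample source). The sketch below computes the standard TCAV score: the fraction of inputs of a class whose directional derivative along a concept activation vector (CAV) is positive. It assumes the layer activations, and the gradients of the class logit with respect to those activations, have already been extracted from the trained network. The helper names are hypothetical, and as a labeled simplification the CAV is the difference between mean concept and mean random activations rather than the normal of a trained linear classifier. This is not DiseaseNet's code.

```python
"""Sketch of the TCAV score for one class, one layer, and one concept.
Assumes activations and gradients were already extracted from the network.
Hypothetical helper names; not DiseaseNet's code."""

from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def mean_vector(vectors: Sequence[Vector]) -> List[float]:
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def concept_activation_vector(concept_acts: Sequence[Vector],
                              random_acts: Sequence[Vector]) -> List[float]:
    """Direction in activation space pointing from random examples toward
    concept examples. Simplification: difference of means, whereas TCAV
    proper uses the normal of a trained linear classifier."""
    concept_mean = mean_vector(concept_acts)
    random_mean = mean_vector(random_acts)
    return [c - r for c, r in zip(concept_mean, random_mean)]


def tcav_score(class_gradients: Sequence[Vector], cav: Vector) -> float:
    """Fraction of class-k inputs whose directional derivative along the CAV
    is positive. Each gradient is d(logit_k)/d(layer activations) for one input."""
    if not class_gradients:
        raise ValueError("need at least one input of the class")
    positive = sum(1 for g in class_gradients if dot(g, cav) > 0)
    return positive / len(class_gradients)
```

A score near 1 means that moving activations toward the concept raises the class logit for most inputs of that class. The full TCAV procedure also repeats the computation with CAVs built from random concept sets and applies a statistical test before calling a concept important; that step is omitted here.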
        
    
- Award ID(s): 2051062
- PAR ID: 10518660
- Publisher / Repository: BMC
- Journal Name: BMC Bioinformatics
- Volume: 25
- Issue: 1
- ISSN: 1471-2105
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- 
            Introduction Non-communicable disease (NCD) risk is influenced by environmental factors that are highly variable worldwide, yet prior research has focused mainly on high-income countries where most people are exposed to relatively homogeneous and static environments. Understanding the scope and complexity of environmental influences on NCD risk around the globe requires more data from people living in diverse and changing environments. Our project will investigate the prevalence and environmental causes of NCDs among the indigenous peoples of Peninsular Malaysia, known collectively as the Orang Asli, who are currently undergoing varying degrees of lifestyle and sociocultural changes that are predicted to increase vulnerability to NCDs, particularly metabolic disorders and musculoskeletal degenerative diseases. Methods and analysis Biospecimen sampling and screening for a suite of NCDs (eg, cardiovascular disease, type II diabetes, osteoarthritis and osteoporosis), combined with detailed ethnographic work to assess key lifestyle and sociocultural variables (eg, diet, physical activity and wealth), will take place in Orang Asli communities spanning a gradient from remote, traditional villages to acculturated, market-integrated urban areas. Analyses will first test for relationships between environmental variables, NCD risk factors and NCD occurrence to investigate how environmental changes are affecting NCD susceptibility among the Orang Asli. Second, we will examine potential molecular and physiological mechanisms (eg, epigenetics and systemic inflammation) that mediate environmental effects on health. Third, we will identify intrinsic (eg, age and sex) and extrinsic (eg, early-life experiences) factors that predispose certain people to NCDs in the face of environmental change to better understand which Orang Asli are at greatest risk of NCDs. 
Ethics and dissemination Approval was obtained from multiple ethical review boards including the Malaysian Ministry of Health. This study follows established principles for ethical biomedical research among vulnerable indigenous communities, including fostering collaboration, building cultural competency, enhancing transparency, supporting capacity building and disseminating research findings.
- 
            Cancer research encompasses data across various scales, modalities, and resolutions, from screening and diagnostic imaging to digitized histopathology slides to various types of molecular data and clinical records. The integration of these diverse data types for personalized cancer care and predictive modeling holds the promise of enhancing the accuracy and reliability of cancer screening, diagnosis, and treatment. Traditional analytical methods, which often focus on isolated or unimodal information, fall short of capturing the complex and heterogeneous nature of cancer data. The advent of deep neural networks has spurred the development of sophisticated multimodal data fusion techniques capable of extracting and synthesizing information from disparate sources. Among these, Graph Neural Networks (GNNs) and Transformers have emerged as powerful tools for multimodal learning and have demonstrated significant success. This review presents the foundational principles of multimodal learning, including oncology data modalities, a taxonomy of multimodal learning, and fusion strategies. We delve into recent advancements in GNNs and Transformers for the fusion of multimodal data in oncology, spotlighting key studies and their pivotal findings. We discuss the unique challenges of multimodal learning, such as data heterogeneity and integration complexities, alongside the opportunities it presents for a more nuanced and comprehensive understanding of cancer. Finally, we present some of the latest comprehensive multimodal pan-cancer data sources. By surveying the landscape of multimodal data integration in oncology, we seek to underscore the transformative potential of multimodal GNNs and Transformers. Through technological advancements and the methodological innovations presented in this review, we aim to chart a course for future research in this promising field.
This review may be the first that highlights the current state of multimodal modeling applications in cancer using GNNs and Transformers, presents comprehensive multimodal oncology data sources, and lays the groundwork for the continued evolution of multimodal methods, encouraging further exploration and development in personalized cancer care.
- 
            We introduce an active, semisupervised algorithm that uses Bayesian experimental design to address the shortage of annotated images required to train and validate Artificial Intelligence (AI) models for lung cancer screening with computed tomography (CT) scans. Our approach combines active learning with semisupervised expectation maximization, emulating a human in the loop who supplies additional ground-truth labels to train, evaluate, and update the neural network models. Bayesian experimental design is used to identify which unlabeled samples need ground-truth labels to improve the model’s performance. We evaluate the proposed Active Semi-supervised Expectation Maximization for Computer-Aided Diagnosis (CAD) tasks (ASEM-CAD) on lung cancer classification using three public CT scan datasets: the National Lung Screening Trial (NLST), the Lung Image Database Consortium (LIDC), and the Kaggle Data Science Bowl 2017. ASEM-CAD accurately classifies suspicious lung nodules and lung cancer cases, with an area under the curve (AUC) of 0.94 (Kaggle), 0.95 (NLST), and 0.88 (LIDC), while using significantly fewer labeled images than a fully supervised model. This study addresses one of the significant challenges in early lung cancer screening using low-dose computed tomography (LDCT) scans and is a valuable contribution towards the development and validation of deep learning algorithms for lung cancer screening and other diagnostic radiology examinations.
- 
            Cancer diagnostics is central to cancer recovery and survival, yet many of the procedures needed to administer the correct treatment are expensive. Machine Learning (ML) approaches can support diagnostic prediction from circulating tumor cells in liquid biopsies or from primary tumors in solid biopsies. Once a deep learning model has predicted a tumor's metastatic potential, clinicians can administer a safe and appropriate treatment for the specific patient. This paper investigates the use of deep convolutional neural networks to predict a specific cancer cell line as a tool for label-free identification. Deep learning strategies for weight initialization and for performance metrics are described; this work uses transfer learning for initialization and accuracy as the performance metric. Prediction relies on brightfield microscopy, without chemical labels, advanced instruments, or time-consuming biological techniques, which gives the approach an advantage over current diagnostic methods. Three binary datasets of well-known cancer cell lines were collected, each comprising two cell lines that differ in metastatic potential. Two classification models, EfficientNetV2 and ResNet-50, were adopted, and each stage of the ML architecture is analyzed. The training results for each model and dataset are provided and systematically compared. Both models performed favorably on the test sets, with EfficientNetV2 reaching an accuracy of up to 99%. EfficientNetV2 outperformed ResNet-50, with an average percent increase in accuracy of 3.5% per dataset. The high accuracy of these predictions suggests that the system could be retrained on a large-scale clinical dataset.
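The ASEM-CAD abstract in the list above uses Bayesian experimental design to decide which unlabeled CT scans should be sent for ground-truth labels, but it does not name its acquisition function. As an assumption, this sketch uses BALD (Bayesian Active Learning by Disagreement), a standard Bayesian experimental design criterion: it scores each unlabeled scan by the mutual information between its predicted label and the model parameters, estimated from T stochastic forward passes (for example, with dropout left on at inference). The pool IDs and function names are hypothetical, and ASEM-CAD's semisupervised expectation-maximization step, which exploits the scans that stay unlabeled, is not shown.

```python
"""Sketch of a BALD acquisition step for pool-based active learning.
Assumes each unlabeled scan has T Monte Carlo (e.g. dropout) class-probability
vectors. Hypothetical names; not the ASEM-CAD implementation."""

import math
from typing import Dict, List, Sequence

ProbVector = Sequence[float]


def entropy(probs: ProbVector) -> float:
    """Shannon entropy in nats; zero-probability classes contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def bald_score(mc_samples: Sequence[ProbVector]) -> float:
    """Mutual information between the predicted label and model parameters:
    entropy of the mean prediction minus the mean entropy of each sample."""
    t = len(mc_samples)
    mean_probs = [sum(column) / t for column in zip(*mc_samples)]
    expected_entropy = sum(entropy(p) for p in mc_samples) / t
    return entropy(mean_probs) - expected_entropy


def select_for_labeling(pool: Dict[str, Sequence[ProbVector]], k: int) -> List[str]:
    """Return the k pool IDs with the highest BALD scores, i.e. the scans
    whose labels are expected to be most informative about the model."""
    scores = {scan_id: bald_score(samples) for scan_id, samples in pool.items()}
    return sorted(scores, key=lambda scan_id: scores[scan_id], reverse=True)[:k]
```

The design choice matters: a scan whose stochastic passes are all uncertain in the same way (say every pass gives 0.5/0.5) has high predictive entropy but a BALD score of zero, because more labels would not resolve that ambiguity. A scan whose passes confidently disagree (0.9/0.1 versus 0.1/0.9) scores highest, which is the kind of sample a human annotator's label helps most.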
 An official website of the United States government