


Award ID contains: 2019609


  1. Abstract

    A primary challenge in single-cell RNA sequencing (scRNA-seq) studies stems from the massive volume of data and its high noise level. To address this challenge, we introduce an analysis framework, named single-cell Decomposition using Hierarchical Autoencoder (scDHA), that reliably extracts representative information from each cell. The scDHA pipeline consists of two core modules. The first module is a non-negative kernel autoencoder that removes genes or components contributing insignificantly to the part-based representation of the data. The second module is a stacked Bayesian autoencoder that projects the data onto a compressed, low-dimensional space. To reduce the tendency of neural networks to overfit, we repeatedly perturb the compressed space to learn a more generalized representation of the data. In an extensive analysis, we demonstrate that scDHA outperforms state-of-the-art techniques in many sub-fields of scRNA-seq analysis, including cell segregation through unsupervised learning, visualization of the transcriptome landscape, cell classification, and pseudo-time inference.

     
  2. Abstract

    Meteorological (MET) data is a crucial input for environmental exposure models. While modeling exposure potential using geospatial technology is common practice, existing studies rarely evaluate how the choice of input MET data affects the uncertainty of the output. The objective of this study is to determine the effect of various MET data sources on potential exposure susceptibility predictions. Three sources of wind data are compared: the North American Regional Reanalysis (NARR) database, meteorological aerodrome reports (METARs) from regional airports, and data from local MET weather stations. These data sources are used as inputs to a machine learning (ML)-driven GIS Multi-Criteria Decision Analysis (GIS-MCDA) geospatial model that predicts potential exposure to abandoned uranium mine sites in the Navajo Nation. Results vary significantly depending on the wind data source. After the results from each source were validated against the National Uranium Resource Evaluation (NURE) database using geographically weighted regression (GWR), METARs data combined with local MET weather station data showed the highest accuracy, with an average R² of 0.74. We conclude that data based on local direct measurements (METARs and MET data) produce more accurate predictions than the other sources evaluated in the study. This study has the potential to inform future data collection methods, leading to more accurate predictions and better-informed policy decisions surrounding environmental exposure susceptibility and risk assessment.
    Free, publicly-accessible full text available July 1, 2024
  3. Abstract

    Background: Cancer remains one of the leading causes of death worldwide, and the accumulation of mutated genes is thought to be largely responsible for its establishment and development. Identifying and analyzing driver genes is therefore vital. Our previous study noted the lack of consensus on a unified pipeline for these tasks and introduced a complete one. However, that pipeline gradually revealed its weaknesses: it was unfamiliar to non-technical users, time-consuming, and inconvenient. Results: This study presents an R package named DrGA, built on our previous pipeline, to address these problems. It fully automates four widely used downstream analyses of predicted driver genes and offers additional improvements. We describe the use of DrGA on driver genes of human breast cancer. We also demonstrate another potential application of DrGA: analyzing genomic biomarkers of a complex disease in another organism. Conclusions: DrGA serves users with limited IT backgrounds and rapidly produces consistent and reproducible results. DrGA and its applications, along with example data, are freely available at https://github.com/huynguyen250896/DrGA.
  4. Abstract

    Recent advances in biochemistry and single-cell RNA sequencing (scRNA-seq) have allowed us to monitor biological systems at single-cell resolution. However, the low capture rate of mRNA within individual cells often leads to inaccurate quantification of genetic material. Consequently, a substantial number of expression values are reported as missing; these are often referred to as dropouts. To overcome this challenge, we develop a novel imputation method, named single-cell Imputation via Subspace Regression (scISR), that reliably recovers the dropout values of scRNA-seq data. The scISR method first uses a hypothesis-testing technique to identify zero-valued entries that are most likely affected by dropout events and then estimates the dropout values using a subspace regression model. Our comprehensive evaluation, using 25 publicly available scRNA-seq datasets and various simulation scenarios against five state-of-the-art methods, demonstrates that scISR outperforms other imputation methods in recovering scRNA-seq expression profiles. scISR consistently improves the quality of cluster analysis regardless of dropout rates, normalization techniques, and quantification schemes. The source code of scISR is available on GitHub at https://github.com/duct317/scISR.
  5. Rocky Mountain spotted fever (RMSF) is a significant health problem in Sonora, Mexico. The tick vector, Rhipicephalus sanguineus, feeds almost exclusively on domestic dogs, which in this region also serve as the reservoir for the tick-borne pathogen Rickettsia rickettsii. A process-based mathematical model of the life cycle of R. sanguineus was developed to predict which combinations of insecticidal dog collars and long-lasting insecticidal wall treatments would suppress indoor tick populations. Because of a high burden of RMSF in a rural community near the Sonora state capital of Hermosillo, a test area was treated with a combination of insecticidal dog collars and long-lasting insecticidal wall treatments from March 2018 to April 2019, which was followed by a reduction in RMSF cases and deaths. An estimated 80% of the dogs in the area had collars applied, and 15% of the houses were treated. Data on tick abundance on walls and dogs, collected during this intervention, were used to parameterize the model. Model results identify a variety of treatment combinations likely to be as successful as the one carried out in the test community.
  6. It has become evident that N6-methyladenosine (m6A)-modified long noncoding RNAs (m6A-lncRNAs) are involved in regulating tumorigenesis, invasion, and metastasis in various cancer types. In this study, we computationally identified a set of 13 hub m6A-lncRNAs using three state-of-the-art tools, WGCNA, iWGCNA, and oCEM, and examined their prognostic value in brain low-grade gliomas (LGG). Of these 13 hub m6A-lncRNAs, we further identified three, HOXB-AS1, ELOA-AS1, and FLG-AS1, as independent prognostic risk factors. The m6ALncSig model was then built on these three hub m6A-lncRNAs. Patients with LGG were divided into high- and low-risk groups based on the median m6ALncSig score. As predicted, the high-risk group was more strongly associated with mortality. The prognostic signature of m6ALncSig was validated using internal and external cohorts. In summary, our work introduces a high-confidence prognostic signature and paves the way for using the m6A-lncRNAs in the signature as new therapeutic targets for LGG.
  7. Lyme disease is the most important vector-borne disease in the United States and is increasing in incidence and geographic range. In the Pacific West, the western black-legged tick, Ixodes pacificus Cooley and Kohls, 1943, is an important vector of the spirochete Borrelia burgdorferi, the causative agent of Lyme disease. The life cycle of I. pacificus is expected to last more than a year, and all three stages (larva, nymph, and adult) overlap in spring. Its optimal habitat consists of forest cover, cooler temperatures, and annual precipitation in the range of 200–500 mm; the coastal areas of California, Oregon, and Washington are therefore well suited for these ticks. Immature stages commonly parasitize western fence lizards (Sceloporus occidentalis) and gray squirrels (Sciurus griseus), while adults often feed on deer mice (Peromyscus maniculatus) and black-tailed deer (Odocoileus h. columbianus). Ixodes pacificus carries several pathogens of human significance, such as Borrelia burgdorferi, Bartonella, and Rickettsiales. These pathogens are maintained in the environment by many hosts, including small mammals, birds, livestock, and domestic animals. Although a great deal of work has been carried out on Ixodes ticks and the pathogens they transmit, our understanding of I. pacificus ecology outside California still lags. Additionally, the dynamic vector–host–pathogen system means that new factors will continue to arise and shift epidemiological patterns within specific areas. Here, we review the ecology of I. pacificus and the pathogens this tick is known to carry in order to identify gaps in our knowledge.
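The scDHA abstract states that the compressed space is repeatedly perturbed to reduce overfitting, but it does not say how. A minimal sketch of one plausible scheme, assuming additive Gaussian noise on each latent vector, is shown below. The function name `perturb_latent` and the parameters `noise_sd` and `n_copies` are hypothetical and are not taken from the scDHA code.

```python
import random
from typing import List

Vector = List[float]

def perturb_latent(latent: List[Vector], noise_sd: float = 0.1,
                   n_copies: int = 5, seed: int = 0) -> List[Vector]:
    """Return n_copies noisy versions of every latent vector.

    Hypothetical scheme: i.i.d. Gaussian noise with standard deviation
    noise_sd is added to each coordinate. This illustrates the idea of
    perturbing a compressed space; it is not scDHA's actual procedure.
    """
    rng = random.Random(seed)  # seeded so repeated calls are reproducible
    perturbed: List[Vector] = []
    for _ in range(n_copies):
        for z in latent:
            perturbed.append([v + rng.gauss(0.0, noise_sd) for v in z])
    return perturbed

# Two cells embedded in a 2-D compressed space (made-up values).
codes = [[0.5, -1.2], [2.0, 0.3]]
noisy = perturb_latent(codes, noise_sd=0.05, n_copies=3)
print(len(noisy))  # 6: three perturbed copies of each of the two cells
```

Training the decoder or a downstream model on many slightly shifted copies, rather than on one clean encoding per cell, discourages the network from fitting exact latent coordinates.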
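The meteorological-data abstract validates each wind source against NURE data with geographically weighted regression (GWR) and reports an average R² of 0.74. As a rough sketch of what a local GWR fit involves, the code below fits a weighted least-squares line at each location with a fixed Gaussian kernel, w = exp(-0.5 (d/b)²), and averages the local R² values. It assumes a single predictor (for example, modeled exposure susceptibility against a NURE measurement), planar coordinates, and a bandwidth b chosen in advance. The function names are hypothetical; full GWR software also calibrates the bandwidth and supports multiple covariates.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

def gaussian_weights(target: Point, coords: Sequence[Point], bandwidth: float) -> List[float]:
    """Fixed Gaussian kernel: w_i = exp(-0.5 * (d_i / bandwidth) ** 2)."""
    tx, ty = target
    return [math.exp(-0.5 * (math.hypot(x - tx, y - ty) / bandwidth) ** 2)
            for x, y in coords]

def local_fit(target: Point, coords: Sequence[Point], xs: Sequence[float],
              ys: Sequence[float], bandwidth: float) -> Tuple[float, float, float]:
    """Weighted least squares y = a + b*x centred on target; returns (a, b, local R^2)."""
    w = gaussian_weights(target, coords, bandwidth)
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, xs)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, xs))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    ss_res = sum(wi * (yi - a - b * xi) ** 2 for wi, xi, yi in zip(w, xs, ys))
    ss_tot = sum(wi * (yi - y_bar) ** 2 for wi, yi in zip(w, ys))
    return a, b, 1.0 - ss_res / ss_tot

def mean_local_r2(coords: Sequence[Point], xs: Sequence[float],
                  ys: Sequence[float], bandwidth: float) -> float:
    """Average of the local R^2 values obtained by fitting at every observation site."""
    r2s = [local_fit(c, coords, xs, ys, bandwidth)[2] for c in coords]
    return sum(r2s) / len(r2s)

# Made-up data: ten sites on a line, observations exactly linear in the prediction.
sites = [(float(i), 0.0) for i in range(10)]
predicted = [0.1 * i for i in range(10)]
observed = [2.0 + 3.0 * p for p in predicted]
print(round(mean_local_r2(sites, predicted, observed, bandwidth=3.0), 6))  # 1.0
```

Because every site gets its own kernel-weighted fit, the local R² can vary across space; averaging them gives one summary number per data source, which is one way a figure such as an average R² could be produced.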
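The scISR abstract describes a first step that uses hypothesis testing to decide which zeros are likely dropouts, without naming the test. The code below sketches one such test under a simplified assumption that is not scISR's actual procedure. If a gene's counts were Poisson with mean λ (estimated here by the gene's overall mean), the number of zeros among n cells would follow Binomial(n, e^(−λ)), so a one-sided binomial test flags genes with significantly more zeros than expected. Because dropouts pull the overall mean down, this estimate of λ makes the test conservative. The names `flag_dropout_zeros` and `alpha` are hypothetical, and the subspace-regression imputation step is not shown.

```python
import math
from typing import Dict, List

def binom_sf(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(math.comb(n, i) * p ** i * (1.0 - p) ** (n - i) for i in range(k, n + 1))

def flag_dropout_zeros(counts_by_gene: Dict[str, List[int]],
                       alpha: float = 0.01) -> Dict[str, List[int]]:
    """Return, per flagged gene, the cell indices whose zeros are likely dropouts."""
    flagged: Dict[str, List[int]] = {}
    for gene, counts in counts_by_gene.items():
        n = len(counts)
        zero_cells = [i for i, c in enumerate(counts) if c == 0]
        lam = sum(counts) / n  # Poisson mean estimate (simplifying assumption)
        if not zero_cells or lam == 0:
            continue  # nothing to test: no zeros, or the gene is never detected
        p_zero = math.exp(-lam)  # P(count == 0) under Poisson(lam)
        p_value = binom_sf(len(zero_cells), n, p_zero)  # excess-zeros test
        if p_value < alpha:
            flagged[gene] = zero_cells
    return flagged

counts = {
    "geneA": [10] * 10 + [0] * 10,            # strongly expressed, yet many zeros
    "geneB": [0, 1, 0, 2, 1, 0, 1, 0, 0, 1],  # lowly expressed; zeros look genuine
}
print(flag_dropout_zeros(counts))  # {'geneA': [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]}
```

Only the flagged zeros would then be passed to an imputation step; zeros in genes such as geneB are left alone because they are consistent with genuinely low expression.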
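The RMSF abstract describes a process-based model of the R. sanguineus life cycle used to compare combinations of insecticidal dog collars and wall treatments. The code below is a deliberately simplified, hypothetical stage-structured sketch, not the authors' model. At each time step, every stage must pass both a dog hazard (killed with probability collar coverage × collar efficacy) and a wall hazard (killed with probability wall coverage × wall efficacy) before advancing. The fecundity, molting, and efficacy values are illustrative assumptions. Only the 80% collar and 15% house coverages come from the abstract, and treating house coverage as wall coverage is itself a simplification.

```python
from dataclasses import dataclass
from typing import List, Tuple

Stages = Tuple[float, float, float]  # (larvae, nymphs, adults)

@dataclass
class TickParams:
    fecundity: float = 200.0      # larvae per surviving adult per step (hypothetical)
    molt_success: float = 0.1     # fraction of surviving larvae/nymphs that molt (hypothetical)
    collar_coverage: float = 0.0  # fraction of dogs wearing collars
    collar_kill: float = 0.9      # kill probability on a collared dog (hypothetical)
    wall_coverage: float = 0.0    # fraction of houses with treated walls
    wall_kill: float = 0.9        # kill probability on a treated wall (hypothetical)

def simulate(p: TickParams, start: Stages, steps: int) -> List[Stages]:
    """Linear stage-structured projection; returns the population at every step."""
    survive_dog = 1.0 - p.collar_coverage * p.collar_kill
    survive_wall = 1.0 - p.wall_coverage * p.wall_kill
    s = survive_dog * survive_wall  # each stage passes both hazards once per step
    larvae, nymphs, adults = start
    history = [start]
    for _ in range(steps):
        # Simultaneous update: the right-hand side uses the previous step's values.
        larvae, nymphs, adults = (
            p.fecundity * adults * s,
            p.molt_success * larvae * s,
            p.molt_success * nymphs * s,
        )
        history.append((larvae, nymphs, adults))
    return history

untreated = simulate(TickParams(), (0.0, 0.0, 10.0), steps=12)
treated = simulate(TickParams(collar_coverage=0.8, wall_coverage=0.15),
                   (0.0, 0.0, 10.0), steps=12)
print(sum(untreated[-1]), sum(treated[-1]))
```

Because this toy model is linear, the growth factor per generation (three steps) is fecundity × molt_success² × s³, so the population is suppressed whenever that product falls below 1. A fuller model, parameterized with field data as the abstract describes, can map that threshold across many collar and wall coverage combinations.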
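The LGG abstract splits patients at the median m6ALncSig score but does not give the score's formula. A common construction for signatures of this kind, assumed here rather than taken from the paper, is a linear score Σ βᵢ·xᵢ, where xᵢ is the expression of lncRNA i and βᵢ is its Cox regression coefficient. The code below sketches that score and the median split. The coefficient and expression values are made up for illustration, and patients exactly at the median are assigned to the low-risk group by choice.

```python
from statistics import median
from typing import Dict, Tuple

def risk_scores(expression: Dict[str, Dict[str, float]],
                coefficients: Dict[str, float]) -> Dict[str, float]:
    """Linear risk score per patient: sum of coefficient * expression over signature genes."""
    return {patient: sum(beta * expr[gene] for gene, beta in coefficients.items())
            for patient, expr in expression.items()}

def median_split(scores: Dict[str, float]) -> Tuple[Dict[str, str], float]:
    """Label patients 'high' if strictly above the median score, otherwise 'low'."""
    cutoff = median(scores.values())
    groups = {patient: ("high" if s > cutoff else "low") for patient, s in scores.items()}
    return groups, cutoff

# Hypothetical coefficients for the three hub m6A-lncRNAs (not the published values).
coefs = {"HOXB-AS1": 0.5, "ELOA-AS1": 0.3, "FLG-AS1": -0.2}
expr = {
    "P1": {"HOXB-AS1": 1.0, "ELOA-AS1": 1.0, "FLG-AS1": 1.0},
    "P2": {"HOXB-AS1": 2.0, "ELOA-AS1": 0.0, "FLG-AS1": 0.0},
    "P3": {"HOXB-AS1": 0.0, "ELOA-AS1": 0.0, "FLG-AS1": 2.0},
    "P4": {"HOXB-AS1": 0.0, "ELOA-AS1": 1.0, "FLG-AS1": 0.0},
}
groups, cutoff = median_split(risk_scores(expr, coefs))
print(groups)  # {'P1': 'high', 'P2': 'high', 'P3': 'low', 'P4': 'low'}
```

A negative coefficient, as assumed for FLG-AS1 here, lowers the score, so higher expression of that lncRNA would push a patient toward the low-risk group.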