skip to main content

Search for: All records

Creators/Authors contains: "Luo, Feng"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Abstract

    The high sequencing error rate has impeded the application of long noisy reads for diploid genome assembly. Most existing assemblers failed to generate high-quality phased assemblies using long noisy reads. Here, we present PECAT, aPhasedErrorCorrection andAssemblyTool, for reconstructing diploid genomes from long noisy reads. We design a haplotype-aware error correction method that can retain heterozygote alleles while correcting sequencing errors. We combine a corrected read SNP caller and a raw read SNP caller to further improve the identification of inconsistent overlaps in the string graph. We use a grouping method to assign reads to different haplotype groups. PECAT efficiently assembles diploid genomes using Nanopore R9, PacBio CLR or Nanopore R10 reads only. PECAT generates more contiguous haplotype-specific contigs compared to other assemblers. Especially, PECAT achieves nearly haplotype-resolved assembly onB. taurus(Bison×Simmental) using Nanopore R9 reads and phase block NG50 with 59.4/58.0 Mb for HG002 using Nanopore R10 reads.

    more » « less
  2. Free, publicly-accessible full text available September 22, 2024
  3. Free, publicly-accessible full text available August 6, 2024
  4. Abstract

    Source and sink interactions play a critical but mechanistically poorly understood role in the regulation of senescence. To disentangle the genetic and molecular mechanisms underlying source–sink-regulated senescence (SSRS), we performed a phenotypic, transcriptomic, and systems genetics analysis of senescence induced by the lack of a strong sink in maize (Zea mays). Comparative analysis of genotypes with contrasting SSRS phenotypes revealed that feedback inhibition of photosynthesis, a surge in reactive oxygen species, and the resulting endoplasmic reticulum (ER) stress were the earliest outcomes of weakened sink demand. Multienvironmental evaluation of a biparental population and a diversity panel identified 12 quantitative trait loci and 24 candidate genes, respectively, underlying SSRS. Combining the natural diversity and coexpression networks analyses identified 7 high-confidence candidate genes involved in proteolysis, photosynthesis, stress response, and protein folding. The role of a cathepsin B like protease 4 (ccp4), a candidate gene supported by systems genetic analysis, was validated by analysis of natural alleles in maize and heterologous analyses in Arabidopsis (Arabidopsis thaliana). Analysis of natural alleles suggested that a 700-bp polymorphic promoter region harboring multiple ABA-responsive elements is responsible for differential transcriptional regulation of ccp4 by ABA and the resulting variation in SSRS phenotype. We propose a model for SSRS wherein feedback inhibition of photosynthesis, ABA signaling, and oxidative stress converge to induce ER stress manifested as programed cell death and senescence. These findings provide a deeper understanding of signals emerging from loss of sink strength and offer opportunities to modify these signals to alter senescence program and enhance crop productivity.

    more » « less
  5. Abstract

    Long single-molecular sequencing technologies, such as PacBio circular consensus sequencing (CCS) and nanopore sequencing, are advantageous in detecting DNA 5-methylcytosine in CpGs (5mCpGs), especially in repetitive genomic regions. However, existing methods for detecting 5mCpGs using PacBio CCS are less accurate and robust. Here, we present ccsmeth, a deep-learning method to detect DNA 5mCpGs using CCS reads. We sequence polymerase-chain-reaction treated and M.SssI-methyltransferase treated DNA of one human sample using PacBio CCS for training ccsmeth. Using long (≥10 Kb) CCS reads, ccsmeth achieves 0.90 accuracy and 0.97 Area Under the Curve on 5mCpG detection at single-molecule resolution. At the genome-wide site level, ccsmeth achieves >0.90 correlations with bisulfite sequencing and nanopore sequencing using only 10× reads. Furthermore, we develop a Nextflow pipeline, ccsmethphase, to detect haplotype-aware methylation using CCS reads, and then sequence a Chinese family trio to validate it. ccsmeth and ccsmethphase can be robust and accurate tools for detecting DNA 5-methylcytosines.

    more » « less
  6. Abstract Sweet orange originated from the introgressive hybridizations of pummelo and mandarin resulting in a highly heterozygous genome. How alleles from the two species cooperate in shaping sweet orange phenotypes under distinct circumstances is unknown. Here, we assembled a chromosome-level phased diploid Valencia sweet orange (DVS) genome with over 99.999% base accuracy and 99.2% gene annotation BUSCO completeness. DVS enables allele-level studies for sweet orange and other hybrids between pummelo and mandarin. We first configured an allele-aware transcriptomic profiling pipeline and applied it to 740 sweet orange transcriptomes. On average, 32.5% of genes have a significantly biased allelic expression in the transcriptomes. Different cultivars, transgenic lineages, tissues, development stages, and disease status all impacted allelic expressions and resulted in diversified allelic expression patterns in sweet orange, but particularly citrus Huanglongbing (HLB) shifted the allelic expression of hundreds of genes in leaves and calyx abscission zones. In addition, we detected allelic structural mutations in an HLB-tolerant mutant (T19) and a more sensitive mutant (T78) through long-read sequencing. The irradiation-induced structural mutations mostly involved double-strand breaks, while most spontaneous structural mutations were transposon insertions. In the mutants, most genes with significant allelic expression ratio alterations (≥1.5-fold) were directly affected by those structural mutations. In T19, alleles located at a translocated segment terminal were upregulated, including CsDnaJ, CsHSP17.4B, and CsCEBPZ. Their upregulation is inferred to keep phloem protein homeostasis under the stress from HLB and enable subsequent stress responses observed in T19. DVS will advance allelic level studies in citrus. 
    more » « less
  7. Birol, Inanc (Ed.)
    Abstract Motivation Oxford Nanopore sequencing has great potential and advantages in population-scale studies. Due to the cost of sequencing, the depth of whole-genome sequencing for per individual sample must be small. However, the existing single nucleotide polymorphism (SNP) callers are aimed at high-coverage Nanopore sequencing reads. Detecting the SNP variants on low-coverage Nanopore sequencing data is still a challenging problem. Results We developed a novel deep learning-based SNP calling method, NanoSNP, to identify the SNP sites (excluding short indels) based on low-coverage Nanopore sequencing reads. In this method, we design a multi-step, multi-scale and haplotype-aware SNP detection pipeline. First, the pileup model in NanoSNP utilizes the naive pileup feature to predict a subset of SNP sites with a Bi-long short-term memory (LSTM) network. These SNP sites are phased and used to divide the low-coverage Nanopore reads into different haplotypes. Finally, the long-range haplotype feature and short-range pileup feature are extracted from each haplotype. The haplotype model combines two features and predicts the genotype for the candidate site using a Bi-LSTM network. To evaluate the performance of NanoSNP, we compared NanoSNP with Clair, Clair3, Pepper-DeepVariant and NanoCaller on the low-coverage (∼16×) Nanopore sequencing reads. We also performed cross-genome testing on six human genomes HG002–HG007, respectively. Comprehensive experiments demonstrate that NanoSNP outperforms Clair, Pepper-DeepVariant and NanoCaller in identifying SNPs on low-coverage Nanopore sequencing data, including the difficult-to-map regions and major histocompatibility complex regions in the human genome. NanoSNP is comparable to Clair3 when the coverage exceeds 16×. Availability and implementation Supplementary information Supplementary data are available at Bioinformatics online. 
    more » « less
  8. Abstract Fractionally doped perovskites oxides (FDPOs) have demonstrated ubiquitous applications such as energy conversion, storage and harvesting, catalysis, sensor, superconductor, ferroelectric, piezoelectric, magnetic, and luminescence. Hence, an accurate, cost-effective, and easy-to-use methodology to discover new compositions is much needed. Here, we developed a function-confined machine learning methodology to discover new FDPOs with high prediction accuracy from limited experimental data. By focusing on a specific application, namely solar thermochemical hydrogen production, we collected 632 training data and defined 21 desirable features. Our gradient boosting classifier model achieved a high prediction accuracy of 95.4% and a high F1 score of 0.921. Furthermore, when verified on additional 36 experimental data from existing literature, the model showed a prediction accuracy of 94.4%. With the help of this machine learning approach, we identified and synthesized 11 new FDPO compositions, 7 of which are relevant for solar thermochemical hydrogen production. We believe this confined machine learning methodology can be used to discover, from limited data, FDPOs with other specific application purposes. 
    more » « less
  9. Abstract

    Although long-read single-cell RNA isoform sequencing (scISO-Seq) can reveal alternative RNA splicing in individual cells, it suffers from a low read throughput. Here, we introduce HIT-scISOseq, a method that removes most artifact cDNAs and concatenates multiple cDNAs for PacBio circular consensus sequencing (CCS) to achieve high-throughput and high-accuracy single-cell RNA isoform sequencing. HIT-scISOseq can yield >10 million high-accuracy long-reads in a single PacBio Sequel II SMRT Cell 8M. We also report the development of scISA-Tools that demultiplex HIT-scISOseq concatenated reads into single-cell cDNA reads with >99.99% accuracy and specificity. We apply HIT-scISOseq to characterize the transcriptomes of 3375 corneal limbus cells and reveal cell-type-specific isoform expression in them. HIT-scISOseq is a high-throughput, high-accuracy, technically accessible method and it can accelerate the burgeoning field of long-read single-cell transcriptomics.

    more » « less