Creators/Authors contains: "Zheng, Wei"

  1. Abstract

    The breakthrough in cryo-electron microscopy (cryo-EM) technology has led to an increasing number of density maps of biological macromolecules. However, constructing accurate protein complex atomic structures from cryo-EM maps remains a challenge. In this study, we extend our previously developed DEMO-EM to present DEMO-EM2, an automated method for constructing protein complex models from cryo-EM maps through an iterative assembly procedure intertwining chain- and domain-level matching and fitting for predicted chain models. The method was carefully evaluated on 27 cryo-electron tomography (cryo-ET) maps and 16 single-particle EM maps, where DEMO-EM2 models achieved an average TM-score of 0.92, outperforming those of state-of-the-art methods. The results demonstrate an efficient method that enables the rapid and reliable solution of challenging cryo-EM structure modeling problems.
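
The TM-score quoted above can be illustrated with a minimal sketch. The formula used here is the commonly cited TM-score definition, not something stated in the abstract, and the function names `tm_d0` and `tm_score` are hypothetical. The sketch assumes the per-residue distances between aligned residue pairs have already been computed after superposition; a real TM-score program also searches over superpositions to maximize the score, which is not done here.

```python
from typing import Sequence


def tm_d0(target_length: int) -> float:
    """Length-dependent distance scale d0 of the TM-score.

    Assumption: d0 is clamped to 0.5 for short targets, where the cubic-root
    expression is small or undefined; this mirrors common practice.
    """
    if target_length <= 15:
        return 0.5
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances: Sequence[float], target_length: int) -> float:
    """TM-score for a fixed superposition.

    distances: distances (in angstroms) between aligned residue pairs.
    target_length: number of residues in the target (reference) structure.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    if len(distances) > target_length:
        raise ValueError("cannot align more residues than the target contains")
    d0 = tm_d0(target_length)
    # Each aligned pair contributes a value in (0, 1]; unaligned residues add 0.
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


if __name__ == "__main__":
    # A perfect match of all 100 residues scores 1.0.
    print(tm_score([0.0] * 100, 100))
```

Because the sum is divided by the target length rather than the number of aligned pairs, a model that covers only half of the target perfectly scores 0.5 under this definition.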

  2. Abstract

    Leveraging iterative alignment search through genomic and metagenome sequence databases, we report the DeepMSA2 pipeline for uniform protein single- and multichain multiple-sequence alignment (MSA) construction. Large-scale benchmarks show that DeepMSA2 MSAs can remarkably increase the accuracy of protein tertiary and quaternary structure predictions compared with current state-of-the-art methods. An integrated pipeline with DeepMSA2 participated in the most recent CASP15 experiment and created complex structural models with considerably higher quality than the AlphaFold2-Multimer server (v.2.2.0). Detailed data analyses show that the major advantage of DeepMSA2 lies in its balanced alignment search and effective model selection, and in the power of integrating huge metagenomics databases. These results demonstrate a new avenue to improve deep learning protein structure prediction through advanced MSA construction and provide additional evidence that optimization of input information to deep learning-based structure prediction methods must be considered with as much care as the design of the predictor itself.

  3. Free, publicly-accessible full text available August 1, 2024
  4. In block designs, the responses of plots are potentially influenced by the treatments of neighbouring plots and the surrounding environment. Many researchers use two guarding plots next to the edge plots, for which we apply certain treatments to control these environmental effects. Thus, a design is presented as a collection of treatment sequences. For the estimation of total effects, existing results consider circular designs, whose constraints are unnecessary in common applications. In this paper, we construct optimal or highly efficient non‐circular designs under interference models. It is observed that the optimal non‐circular designs for the total effects outperform the optimal circular designs in many instances. In fact, a design containing a circular sequence cannot be optimal for .

  5. In many applications of block designs, the responses of plots are affected by treatments in neighbouring plots. A further complication is the border effect on the two edge plots caused by potential environmental impacts outside the blocks. To address the latter, many researchers place two guarding plots next to the edge plots and apply certain treatments to them to control these impacts. Designs under this set‐up have been studied extensively; however, the existing literature has focused on circular designs, in which the treatments applied to the guarding plots are the same as those of the edge plots on the opposite sides. This structural restriction is unnecessary in most applications. We consider non‐circular designs, where guarding plots are allowed to take any treatments by design. In this paper, optimal non‐circular designs are constructed for the estimation of direct effects. It is found that optimal non‐circular designs outperform optimal circular designs in many cases, especially for many commonly studied cases in the literature.
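
The distinction between circular and non-circular sequences can be made concrete with a small sketch. It assumes a treatment sequence is written as (left guard, plot 1, ..., plot k, right guard) and that "circular" means each guarding plot carries the treatment of the edge plot on the opposite side; the function names `is_circular` and `neighbours` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def is_circular(sequence: Sequence[int]) -> bool:
    """Check whether a guarded treatment sequence is circular.

    sequence = (left guard, plot 1, ..., plot k, right guard).
    Circular: left guard equals the last interior plot and
    right guard equals the first interior plot.
    """
    if len(sequence) < 3:
        raise ValueError("need two guarding plots and at least one interior plot")
    left_guard, right_guard = sequence[0], sequence[-1]
    first_plot, last_plot = sequence[1], sequence[-2]
    return left_guard == last_plot and right_guard == first_plot


def neighbours(sequence: Sequence[int]) -> List[Tuple[int, int, int]]:
    """Return (left neighbour, treatment, right neighbour) for each interior plot."""
    return [
        (sequence[i - 1], sequence[i], sequence[i + 1])
        for i in range(1, len(sequence) - 1)
    ]


if __name__ == "__main__":
    circular = (3, 1, 2, 3, 1)      # guards copy the opposite edge plots
    non_circular = (2, 1, 2, 3, 3)  # guards chosen freely
    print(is_circular(circular), is_circular(non_circular))
    print(neighbours(circular))
```

In the circular example every interior plot sees the neighbours it would have if the block were wrapped into a ring, which is the restriction that non-circular designs drop.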

  6. Free, publicly-accessible full text available July 1, 2024
  7. The high demand for mobile devices creates significant concerns about the quality of mobile applications (apps). Developers rely heavily on bug reports in issue tracking systems to reproduce failures (e.g., crashes). However, crash reproduction is often performed manually by developers, making the resolution of bugs inefficient, especially given that bug reports are often written in natural language. To improve the productivity of developers in resolving bug reports, in this paper, we introduce a novel approach, called ReCDroid+, that can automatically reproduce crashes from bug reports for Android apps. ReCDroid+ uses a combination of natural language processing (NLP), deep learning, and dynamic GUI exploration to synthesize event sequences with the goal of reproducing the reported crash. We have evaluated ReCDroid+ on 66 original bug reports from 37 Android apps. The results show that ReCDroid+ successfully reproduced 42 crashes (63.6% success rate) directly from the textual description of the manually reproduced bug reports. A user study involving 12 participants demonstrates that ReCDroid+ can improve the productivity of developers when resolving crash bug reports.
  8. The U‐statistic has been an important part of the arsenal of statistical tools. However, its computation can easily become expensive. As a remedy, the idea of incomplete U‐statistics has been adopted in practice, where only a small fraction of the combinations of units are evaluated. Recently, researchers proposed a new type of incomplete U‐statistic called ICUDO, which requires substantially less computing time than all existing methods. This paper aims to study the asymptotic distributions of ICUDO to facilitate the corresponding statistical inference. This is a non‐trivial task due to the restricted randomization in the sampling scheme of ICUDO. The bootstrap approach for the finite sample distribution of ICUDO is also discussed. Lastly, we observe some intrinsic connections between U‐statistics and computer experiments in the context of integration approximation. This allows us to generalize some existing theoretical results in the latter topic.
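
The idea of evaluating only a fraction of the combinations can be sketched as follows. This is not ICUDO: it is the simplest incomplete U-statistic, in which pairs are drawn uniformly at random with replacement, shown next to the complete statistic for comparison. The variance kernel and the function names `complete_u` and `incomplete_u` are illustrative assumptions.

```python
import itertools
import random
from typing import Callable, Sequence

Kernel = Callable[[float, float], float]


def variance_kernel(x: float, y: float) -> float:
    """Symmetric degree-2 kernel whose complete U-statistic is the sample variance."""
    return 0.5 * (x - y) ** 2


def complete_u(data: Sequence[float], kernel: Kernel) -> float:
    """Average the kernel over all n-choose-2 pairs (O(n^2) kernel evaluations)."""
    pairs = list(itertools.combinations(data, 2))
    if not pairs:
        raise ValueError("need at least two observations")
    return sum(kernel(a, b) for a, b in pairs) / len(pairs)


def incomplete_u(data: Sequence[float], kernel: Kernel, n_pairs: int, seed: int = 0) -> float:
    """Average the kernel over n_pairs randomly chosen pairs of distinct units.

    Assumption: plain uniform random sampling with replacement over pairs,
    not the restricted randomization used by ICUDO.
    """
    if len(data) < 2 or n_pairs <= 0:
        raise ValueError("need at least two observations and one pair")
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_pairs):
        i, j = rng.sample(range(len(data)), 2)  # two distinct indices
        total += kernel(data[i], data[j])
    return total / n_pairs


if __name__ == "__main__":
    sample = [float(v) for v in range(1, 201)]
    print(complete_u(sample, variance_kernel))       # uses 19900 pairs
    print(incomplete_u(sample, variance_kernel, 500))  # uses 500 pairs
```

The incomplete version trades some extra variance for a number of kernel evaluations that is fixed by `n_pairs` rather than growing quadratically with the sample size.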

  9. Abstract

    Most proteins in nature contain multiple folding units (or domains). The revolutionary success of AlphaFold2 in single-domain structure prediction showed the potential to extend deep-learning techniques to multi-domain structure modeling. This work presents a significantly improved method, DEMO2, which integrates analogous template structural alignments with deep-learning techniques for high-accuracy domain structure assembly. Starting from individual domain models, inter-domain spatial restraints are first predicted with deep residual convolutional networks; full-length structure models are then assembled using L-BFGS simulations under the guidance of a hybrid energy function combining deep-learning restraints and analogous multi-domain template alignments searched from the PDB. The output of DEMO2 contains deep-learning inter-domain restraints, top-ranked multi-domain structure templates, and up to five full-length structure models. DEMO2 was tested on a large-scale benchmark and the blind CASP14 experiment, where DEMO2 was shown to significantly outperform its predecessor and the state-of-the-art protein structure prediction methods. By integrating with new deep-learning techniques, DEMO2 should help fill the rapidly increasing gap between the improved ability of tertiary structure determination and the high demand for high-quality multi-domain protein structures. The DEMO2 server is available at

  10. Abstract

    Deep learning techniques have significantly advanced the field of protein structure prediction. LOMETS3 is a new generation meta-server approach to template-based protein structure prediction and function annotation, which integrates newly developed deep learning threading methods. For the first time, we have extended LOMETS3 to handle multi-domain proteins and to construct full-length models with gradient-based optimizations. Starting from a FASTA-formatted sequence, LOMETS3 performs four steps: domain boundary prediction, domain-level template identification, full-length template/model assembly and structure-based function prediction. The output of LOMETS3 contains (i) top-ranked templates from LOMETS3 and its component threading programs, (ii) up to 5 full-length structure models constructed by L-BFGS (limited-memory Broyden–Fletcher–Goldfarb–Shanno algorithm) optimization, (iii) the 10 closest Protein Data Bank (PDB) structures to the target, (iv) structure-based functional predictions, (v) domain partition and assembly results, and (vi) the domain-level threading results, including items (i)–(iii) for each identified domain. LOMETS3 was tested in large-scale benchmarks and the blind CASP14 (14th Critical Assessment of Structure Prediction) experiment, where the overall template recognition and function prediction accuracy significantly exceeds that of its predecessors and other state-of-the-art threading approaches, especially for hard targets without homologous templates in the PDB. Based on these improvements, LOMETS3 should help significantly advance the capability of the broader biomedical community for template-based protein structure and function modelling.
