Creators/Authors contains: "Zhang, Li"

  1. Entity linking, the task of linking potentially ambiguous mentions in texts to corresponding knowledge-base entities, is an important component for language understanding. We address two challenges in entity linking: how to leverage wider contexts surrounding a mention, and how to deal with limited training data. We propose a fully unsupervised model called SumMC that first generates a guided summary of the contexts conditioning on the mention, and then casts the task as a multiple-choice problem where the model chooses an entity from a list of candidates. In addition to evaluating our model on existing datasets that focus on named entities, we create a new dataset that links noun phrases from WikiHow to Wikidata. We show that our SumMC model achieves state-of-the-art unsupervised performance on our new dataset and on existing datasets.
    Free, publicly-accessible full text available December 7, 2023
  2. Abstract. Background: Events of gene fusion have been reported in several organisms. However, the general role of gene fusion as part of new gene origination remains unknown. Results: We conduct genome-wide interrogations of four Oryza genomes by designing and implementing novel pipelines to detect fusion genes. Based on the phylogeny of ten plant species, we detect 310 fusion genes across four Oryza species. The estimated rate of origination of fusion genes in the Oryza genus is as high as 63 fusion genes per species per million years, which is fixed at 16 fusion genes per species per million years and much higher than that in flies. By RNA sequencing analysis, we find more than 44% of the fusion genes are expressed and 90% of gene pairs show strong signals of purifying selection. Further analysis of CRISPR/Cas9 knockout lines indicates that newly formed fusion genes regulate phenotype traits including seed germination, shoot length and root length, suggesting the functional significance of these genes. Conclusions: We detect new fusion genes that may drive phenotype evolution in Oryza. This study provides novel insights into the genome evolution of Oryza.
    Free, publicly-accessible full text available December 1, 2023
  3. The coronavirus disease 2019 (COVID-19) pandemic began in January 2020 in Wuhan, China, with a new coronavirus designated SARS-CoV-2. The principal cause of death from COVID-19 disease quickly emerged as acute respiratory distress syndrome (ARDS). A key ARDS pathogenic mechanism is the “Cytokine Storm”, which is a dramatic increase in inflammatory cytokines in the blood. In the last two years of the pandemic, a new pathology has emerged in some COVID-19 survivors, in which a variety of long-term symptoms occur, a condition called post-acute sequelae of COVID-19 (PASC) or “Long COVID”. Therefore, there is an urgent need to better understand the mechanisms of the virus. The spike protein on the surface of the virus is composed of joined S1–S2 subunits. Upon S1 binding to the ACE2 receptor on human cells, the S1 subunit is cleaved and the S2 subunit mediates the entry of the virus. The S1 protein is then released into the blood, which might be one of the pivotal triggers for the initiation and/or perpetuation of the cytokine storm. In this study, we tested the hypothesis that the S1 spike protein is sufficient to activate inflammatory signaling and cytokine production, independent of the virus. Our data support a possible role for the S1 spike protein in the activation of inflammatory signaling and cytokine production in human lung and intestinal epithelial cells in culture. These data support a potential role for the SARS-CoV-2 S1 spike protein in COVID-19 pathogenesis and PASC.
    Free, publicly-accessible full text available October 1, 2023
  4. The process of fermenting tofu extends back thousands of years and is an indispensable part of Chinese culture. Despite a cultural resurgence in fermented foods and interest in microbiomes, there is little knowledge on the microbial diversity represented in fermented ‘hairy’ tofu, known locally in China as Mao tofu. High-throughput metagenomic sequencing of the ITS, LSU and 16S rDNA was used to determine Mao tofu’s fungal and bacterial community diversity across four wet markets in Yunnan, China. The results show that hairy tofu in this region consists of around 170 fungal and 365 bacterial taxa, and that microbial taxa differ between markets. Diversity also differed based on the specific niche of the tofu block, comparing the outside rind-like niche to that of the inside of the tofu block. Machine learning random forest models were able to accurately classify both the market and niche of sample origin. An over-abundance of yeast and Geotrichum was found, and Mucor (Mucoromycota) was abundant in the outside rind-like niche, which consists of the visible ‘hairy’ mycelium. The majority of the bacterial OTUs belonged to Proteobacteria, Firmicutes, and Bacteroidetes, with Acinetobacter, Lactobacillus, Sphingobacterium and Flavobacterium the most abundant genera. Putative fungal pathogens of plants (Cercospora, Diaporthe, Fusarium) and animals (Metarhizium, Entomomortierella, Pyxidiophora, Candida, Clavispora) were also detected, as were putative bacterial pathogens identified as Legionella. Non-fungal eukaryotic taxa detected by LSU amplicon sequencing included soybean (Glycine max), Protozoa, Metazoa (e.g., Nematoda and Platyhelminthes), Rhizaria and Chromista, indicating that additional biodiversity exists in the hairy tofu microbiome.
  5. Free, publicly-accessible full text available August 10, 2023
  6. Free, publicly-accessible full text available January 1, 2024
  7. Free, publicly-accessible full text available July 13, 2023
  8. With the technology trend of hardware and workload consolidation for embedded systems and the rapid development of edge computing, there has been increasing interest in supporting parallel real-time tasks to better utilize multi-core platforms while meeting stringent real-time constraints. For parallel real-time tasks, the federated scheduling paradigm, which assigns each parallel task a set of dedicated cores, achieves good theoretical bounds by ensuring exclusive use of processing resources to reduce interference. However, because cores share the last-level cache and memory bandwidth resources, in practice tasks may still interfere with each other despite executing on dedicated cores. Such resource interference due to concurrent accesses can be even more severe for embedded platforms or edge servers, where the computing power and cache/memory space are limited. To tackle this issue, in this work, we present a holistic resource allocation framework for parallel real-time tasks under federated scheduling. Under our proposed framework, in addition to dedicated cores, each parallel task is also assigned dedicated cache and memory bandwidth resources. Further, we propose a holistic resource allocation algorithm that balances the allocation between different resources to achieve good schedulability. Additionally, we provide a full implementation of our framework by extending the federated scheduling system with Intel’s Cache Allocation Technology and MemGuard. Finally, we demonstrate the practicality of our proposed framework via extensive numerical evaluations and empirical experiments using real benchmark programs.
  9. Abstract Key message

    We investigate the genetic basis of panicle architecture in switchgrass in two mapping populations across a latitudinal gradient, and find many stable, repeatable genetic effects and limited genetic interactions with the environment.

    Abstract

    Grass species exhibit large diversity in panicle architecture influenced by genes, the environment, and their interaction. The genetic study of panicle architecture in perennial grasses is limited. In this study, we evaluate the genetic basis of panicle architecture including panicle length, primary branching number, and secondary branching number in an outcrossed switchgrass QTL population grown across ten field sites in the central USA through multi-environment mixed QTL analysis. We also evaluate genetic effects in a diversity panel of switchgrass grown at three of the ten field sites using genome-wide association (GWAS) and multivariate adaptive shrinkage. Furthermore, we search for candidate genes underlying panicle traits in both of these independent mapping populations. Overall, 18 QTL were detected in the QTL mapping population for the three panicle traits, and 146 unlinked genomic regions in the diversity panel affected one or more panicle traits. Twelve of the QTL exhibited consistent effects (i.e., no QTL by environment interactions or no QTL × E), and most (four of six) of the effects with QTL × E exhibited site-specific effects. Most (59.3%) significant partially linked diversity panel SNPs had significant effects in all panicle traits and all field sites and showed pervasive pleiotropy and limited environment interactions. Panicle QTL co-localized with significant SNPs found using GWAS, providing additional power to distinguish between true and false associations in the diversity panel.
  10. Procedures are inherently hierarchical. To “make videos”, one may need to “purchase a camera”, which in turn may require one to “set a budget”. While such hierarchical knowledge is critical for reasoning about complex procedures, most existing work has treated procedures as shallow structures without modeling the parent-child relation. In this work, we attempt to construct an open-domain hierarchical knowledge-base (KB) of procedures based on wikiHow, a website containing more than 110k instructional articles, each documenting the steps to carry out a complex procedure. To this end, we develop a simple and efficient method that links steps (e.g., “purchase a camera”) in an article to other articles with similar goals (e.g., “how to choose a camera”), recursively constructing the KB. Our method significantly outperforms several strong baselines according to automatic evaluation, human judgment, and application to downstream tasks such as instructional video retrieval.
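
The recursive step-to-article linking described in the last abstract can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not say how similarity between a step and an article goal is computed, so the token-overlap score (`jaccard`), the `threshold` value, the `Node` class, and the two toy articles below are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical stop-word list; a real system would likely use a learned matcher.
STOP_WORDS = {"a", "an", "the", "to", "how"}


def tokens(text):
    """Lower-cased content words of a step or goal."""
    return {w for w in text.lower().split() if w not in STOP_WORDS}


def jaccard(a, b):
    """Token-overlap similarity in [0, 1] (an assumed stand-in for the real matcher)."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


@dataclass
class Node:
    text: str                          # the step (or the root goal)
    linked_goal: Optional[str] = None  # article goal this step was linked to, if any
    children: list = field(default_factory=list)


def build_hierarchy(goal, articles, threshold=0.3, seen=frozenset()):
    """Expand each step of `goal` by linking it to the most similar unseen article."""
    seen = seen | {goal}  # goals already on this path, to avoid cycles
    root = Node(text=goal)
    for step in articles.get(goal, []):
        child = Node(text=step)
        candidates = [g for g in articles if g not in seen]
        best = max(candidates, key=lambda g: jaccard(step, g), default=None)
        if best is not None and jaccard(step, best) >= threshold:
            child.linked_goal = best
            # Recurse: the linked article's steps become this step's children.
            child.children = build_hierarchy(best, articles, threshold, seen).children
        root.children.append(child)
    return root


# Toy data loosely following the abstract's example (made up for illustration).
articles = {
    "make videos": ["purchase a camera", "record footage"],
    "choose a camera": ["set a budget", "compare models"],
}
tree = build_hierarchy("make videos", articles)
```

Here `tree.children[0]` is the step “purchase a camera”, linked to the toy article “choose a camera”, whose steps become its children; steps with no sufficiently similar article stay as leaves.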