Search for: All records

Creators/Authors contains: "Li, Xian"


  1. Product catalogs, conceptually in the form of text-rich tables, are self-reported by individual retailers and thus inevitably contain noisy facts. Verifying the textual attributes in product catalogs is essential to improving their reliability. However, popular methods for processing free-text content, such as pre-trained language models, are not particularly effective on structured tabular data, since they are typically trained on free-form natural language text. In this paper, we present Tab-Cleaner, a model designed to handle error detection over text-rich tabular data following a pre-training / fine-tuning paradigm. We train Tab-Cleaner on a real-world Amazon Product Catalog table covering millions of products and show improvements over state-of-the-art methods of 16% in PR AUC on the attribute applicability classification task and 11% in PR AUC on the attribute value validation task. 
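     The PR AUC figures above summarize a precision-recall curve in a single number. As a rough illustration only, the sketch below computes average precision, one common estimator of PR AUC. The abstract does not say which estimator Tab-Cleaner's evaluation uses, and the function name and toy data here are hypothetical.

```python
def average_precision(labels, scores):
    """Average precision: mean of precision@k over the ranks k of the positives.

    labels: 0/1 ground truth (1 = the attribute value is an error, say)
    scores: model scores, higher = more likely positive
    Assumption: ties in scores are broken arbitrarily by the sort.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("average precision is undefined without positives")

    # Rank examples from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank  # precision at this recall step
    return precision_sum / total_pos


# Toy example: positives are ranked 1st and 3rd out of 4.
toy_ap = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
```

     A perfect ranking yields 1.0, while a ranking that places every negative above every positive scores far lower, which is why the metric suits imbalanced error-detection tasks.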
  2. Knowledge graph embeddings (KGE) have been extensively studied to embed large-scale relational data for many real-world applications. Existing methods have long ignored the fact that many KGs contain two fundamentally different views: high-level ontology-view concepts and fine-grained instance-view entities. They usually embed all nodes as vectors in one latent space. However, a single geometric representation fails to capture the structural differences between the two views and lacks probabilistic semantics for concepts’ granularity. We propose Concept2Box, a novel approach that jointly embeds the two views of a KG using dual geometric representations. We model concepts with box embeddings, which learn the hierarchical structure and complex relations among them, such as overlap and disjointness. Box volumes can be interpreted as concepts’ granularity. In contrast to concepts, we model entities as vectors. To bridge the gap between concept box embeddings and entity vector embeddings, we propose a novel vector-to-box distance metric and learn both embeddings jointly. Experiments on both the public DBpedia KG and a newly created industrial KG show the effectiveness of Concept2Box. 
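     To make the geometric ideas in this abstract concrete, here is a minimal sketch of axis-aligned boxes for concepts and vectors for entities. It is an illustration under stated assumptions, not Concept2Box itself: the abstract does not define the paper's vector-to-box distance, so `point_to_box_distance` below is a plain Euclidean stand-in, and all function names are hypothetical.

```python
import math


def box_volume(lower, upper):
    """Volume of an axis-aligned box; 0.0 if the box is empty in any dimension.
    A larger volume can be read as a coarser (more general) concept."""
    volume = 1.0
    for lo, hi in zip(lower, upper):
        volume *= max(hi - lo, 0.0)
    return volume


def box_intersection(lower_a, upper_a, lower_b, upper_b):
    """Corners of the intersection box (may be empty, i.e. lower > upper)."""
    lower = [max(a, b) for a, b in zip(lower_a, lower_b)]
    upper = [min(a, b) for a, b in zip(upper_a, upper_b)]
    return lower, upper


def point_to_box_distance(point, lower, upper):
    """Euclidean distance from an entity vector to the nearest point of a box.
    Assumption: a simple stand-in, not the metric proposed in the paper."""
    squared = 0.0
    for x, lo, hi in zip(point, lower, upper):
        gap = max(lo - x, 0.0, x - hi)  # 0 when x lies within [lo, hi]
        squared += gap * gap
    return math.sqrt(squared)


# Two overlapping concept boxes and one disjoint pair.
overlap = box_volume(*box_intersection([0, 0], [2, 2], [1, 1], [3, 3]))
disjoint = box_volume(*box_intersection([0, 0], [1, 1], [2, 2], [3, 3]))
```

     In this toy setup a positive intersection volume signals overlapping concepts, a zero volume signals disjoint ones, and an entity whose distance to a box is zero lies inside that concept.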
  3. Because of their central importance in chemistry and biology, water molecules have been the subject of decades of intense spectroscopic investigations. Rotational spectroscopy of water vapor has yielded detailed information about the structure and dynamics of isolated water molecules, as well as water dimers and clusters. Nonlinear rotational spectroscopy in the terahertz regime has been developed recently to investigate the rotational dynamics of linear and symmetric-top molecules whose rotational energy levels are regularly spaced. However, it has not been applied to water or other lower-symmetry molecules with irregularly spaced levels. We report the use of recently developed two-dimensional (2D) terahertz rotational spectroscopy to observe high-order rotational coherences and correlations between rotational transitions that were previously unobservable. The results include two-quantum (2Q) peaks at frequencies that are shifted slightly from the sums of distinct rotational transitions on two different molecules. These results directly reveal the presence of previously unseen metastable water complexes with lifetimes of 100 ps or longer. Several such peaks observed at distinct 2Q frequencies indicate that the complexes have multiple preferred bimolecular geometries. Our results demonstrate the sensitivity of rotational correlations measured in 2D terahertz spectroscopy to molecular interactions and complexation in the gas phase.

     
  4. null (Ed.)