- Award ID(s): 2134795
- PAR ID: 10479694
- Publisher / Repository: ACS Publications
- Date Published:
- Journal Name: Journal of Chemical Information and Modeling
- Volume: 63
- Issue: 21
- ISSN: 1549-9596
- Page Range / eLocation ID: 6555 to 6568
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Query understanding plays a key role in exploring users’ search intents. However, it is inherently challenging because it must capture semantic information from short, ambiguous queries and often requires massive amounts of task-specific labeled data. In recent years, pre-trained language models (PLMs) have advanced various natural language processing tasks because they can extract general semantic information from large-scale corpora. However, directly applying them to query understanding is sub-optimal because existing strategies rarely aim to improve search performance. Search logs, on the other hand, record user clicks between queries and URLs, which provide rich information about users’ search behavior beyond the content of the queries themselves. Therefore, in this paper, we aim to fill this gap by exploring search logs. In particular, we propose a novel graph-enhanced pre-training framework, GE-BERT, which leverages both query content and the query graph to capture the semantic information of queries as well as users’ search behavior. Extensive experiments on offline and online tasks demonstrate the effectiveness of the proposed framework.
-
Defining the similarity between chemical entities is an essential task in polymer informatics, enabling ranking, clustering, and classification. Despite its importance, the pairwise chemical similarity of polymers remains an open problem. Here, a similarity function for polymers with well-defined backbones is designed based on polymers’ stochastic graph representations generated from canonical BigSMILES, a structurally based line notation for describing macromolecules. The stochastic graph representations are separated into three parts: repeat units, end groups, and polymer topology. The earth mover’s distance is utilized to calculate the similarity of the repeat units and end groups, while the graph edit distance is used to calculate the similarity of the topology. These three values can be linearly or nonlinearly combined to yield an overall pairwise chemical similarity score for polymers that is largely consistent with the chemical intuition of expert users and is adjustable based on the relative importance of different chemical features for a given similarity problem. This method gives a reliable solution to quantitatively calculate the pairwise chemical similarity score for polymers and represents a vital step toward building search engines and quantitative design tools for polymer data.
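The combination step described in this abstract can be illustrated with a minimal sketch. It assumes the three component similarities (repeat units, end groups, topology) have already been computed and scaled to the range [0, 1]. The function names, the default weights, and the choice of a weighted geometric mean as the nonlinear option are hypothetical illustrations, not the formula used in the paper.

```python
import math

def _check_scores(scores):
    # Each component similarity is assumed to lie in [0, 1].
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError("similarity scores must lie in [0, 1]")

def combine_linear(s_repeat, s_end, s_topology, weights=(0.6, 0.2, 0.2)):
    """Weighted arithmetic mean of the three component similarities.

    The default weights are illustrative assumptions only.
    """
    scores = (s_repeat, s_end, s_topology)
    _check_scores(scores)
    total = sum(weights)
    return sum(w * s for w, s in zip(weights, scores)) / total

def combine_geometric(s_repeat, s_end, s_topology, weights=(0.6, 0.2, 0.2)):
    """Weighted geometric mean, one possible nonlinear combination.

    A zero in any component drives the overall score to zero.
    """
    scores = (s_repeat, s_end, s_topology)
    _check_scores(scores)
    if any(s == 0.0 for s in scores):
        return 0.0
    total = sum(weights)
    log_sum = sum(w * math.log(s) for w, s in zip(weights, scores))
    return math.exp(log_sum / total)

if __name__ == "__main__":
    # Hypothetical component scores for one polymer pair.
    print(combine_linear(0.8, 0.5, 1.0))
    print(combine_geometric(0.8, 0.5, 1.0))
```

Changing the weights is how such a score would be tuned to the relative importance of each chemical feature. The geometric variant penalizes a pair more heavily when any single component is dissimilar.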
-
Given a database of vectors, a cosine threshold query returns all vectors in the database having cosine similarity to a query vector above a given threshold θ. These queries arise naturally in many applications, such as document retrieval, image search, and mass spectrometry. The present paper considers the efficient evaluation of such queries, providing novel optimality guarantees and exhibiting good performance on real datasets. We take as a starting point Fagin's well-known Threshold Algorithm (TA), which can be used to answer cosine threshold queries as follows: an inverted index is first built from the database vectors during pre-processing; at query time, the algorithm traverses the index partially to gather a set of candidate vectors to be later verified for θ-similarity. However, directly applying TA in its raw form misses significant optimization opportunities. Indeed, we first show that one can take advantage of the fact that the vectors can be assumed to be normalized, to obtain an improved, tight stopping condition for index traversal and to efficiently compute it incrementally. Then we show that one can take advantage of data skewness to obtain better traversal strategies. In particular, we show a novel traversal strategy that exploits a common data skewness condition which holds in multiple domains including mass spectrometry, documents, and image databases. We show that under the skewness assumption, the new traversal strategy has a strong, near-optimal performance guarantee. The techniques developed in the paper are quite general since they can be applied to a large class of similarity functions beyond cosine.
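To make the query definition concrete, here is a minimal brute-force sketch of a cosine threshold query. It scans every database vector and is only a baseline for the problem statement; it is not the Threshold Algorithm or the index traversal strategy developed in the paper. The function names are hypothetical, and treating a similarity exactly equal to θ as a match is an assumption.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in vec]

def cosine_threshold_query(database, query, theta):
    """Return (index, similarity) for every database vector whose cosine
    similarity to the query is at least theta.

    Brute-force scan over all vectors; the boundary case (similarity == theta)
    is counted as a match, which is an assumption.
    """
    q = normalize(query)
    results = []
    for idx, vec in enumerate(database):
        v = normalize(vec)
        sim = sum(a * b for a, b in zip(v, q))
        if sim >= theta:
            results.append((idx, sim))
    return results

if __name__ == "__main__":
    # Tiny hypothetical database of 2-D vectors.
    db = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for idx, sim in cosine_threshold_query(db, [1.0, 0.0], 0.7):
        print(idx, round(sim, 4))
```

Normalizing the vectors up front is the same assumption the abstract relies on for its tighter stopping condition; here it only simplifies the similarity to a dot product.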
-
Topochemical polymerizations hold the promise of producing high-molecular-weight, stereoregular, single-crystalline polymers by aligning monomers before polymerization. However, monomer modifications often alter the crystal packing and result in non‐reactive polymorphs. Here, we report a systematic study on the side chain functionalization of the bis(indandione) derivative system that can be polymerized under visible light. Precisely engineered side chains help organize the monomer crystals in a one‐dimensional fashion to maintain the topochemical reactivity. By optimizing the side chain length and end group of the monomers, the elastic modulus of the resulting polymer single crystals can also be greatly enhanced. Lastly, using ultrasonication, insoluble polymer single crystals can be processed into free‐standing and robust polymer thin films. This work provides new insights into the molecular design of topochemical reactions and paves the way for future applications of this fascinating family of materials.