Title: Contrastive Bootstrapping for Label Refinement
Traditional text classification typically assigns texts to pre-defined coarse-grained classes, so the resulting models cannot handle real-world scenarios in which finer categories emerge periodically and are needed for accurate services. In this work, we investigate the setting where fine-grained classification is performed using only the annotation of coarse-grained categories and the coarse-to-fine mapping. We propose a lightweight contrastive clustering-based bootstrapping method to iteratively refine the labels of passages. During clustering, it pulls away negative passage-prototype pairs under the guidance of the mapping from both global and local perspectives. Experiments on NYT and 20News show that our method outperforms the state-of-the-art methods by a large margin.
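The role of the coarse-to-fine mapping in the abstract above can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the cosine similarity, the hinge-style penalty, the `margin` parameter, and the toy 2-D vectors are all assumptions made for illustration, and the global and local perspectives mentioned in the abstract are not modeled. The sketch only shows how a mapping can mark which passage-prototype pairs count as negative and how a passage's fine label can be refined among the fine classes its coarse label allows.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def split_prototypes(coarse_label, coarse_to_fine, prototypes):
    """Split fine classes into those allowed by the coarse label and the rest.

    Prototypes of fine classes outside the passage's coarse class form the
    negative passage-prototype pairs (an assumption of this sketch).
    """
    allowed = [f for f in coarse_to_fine[coarse_label] if f in prototypes]
    allowed_set = set(allowed)
    negatives = [f for f in prototypes if f not in allowed_set]
    return allowed, negatives

def refine_label(embedding, coarse_label, coarse_to_fine, prototypes):
    """Pick the most similar prototype among the allowed fine classes."""
    allowed, _ = split_prototypes(coarse_label, coarse_to_fine, prototypes)
    return max(allowed, key=lambda f: cosine(embedding, prototypes[f]))

def negative_pair_penalty(embedding, coarse_label, coarse_to_fine,
                          prototypes, margin=0.0):
    """Hypothetical hinge penalty: grows when a passage is similar to a
    negative prototype, so minimizing it pulls such pairs apart."""
    _, negatives = split_prototypes(coarse_label, coarse_to_fine, prototypes)
    return sum(max(0.0, cosine(embedding, prototypes[f]) - margin)
               for f in negatives)

# Toy data (hypothetical): two coarse classes, four fine classes.
coarse_to_fine = {"sports": ["soccer", "tennis"],
                  "science": ["physics", "biology"]}
prototypes = {"soccer": [1.0, 0.0], "tennis": [0.8, 0.6],
              "physics": [0.0, 1.0], "biology": [-1.0, 0.0]}

passage = [0.1, 0.9]  # closest overall to "physics", but annotated "sports"
fine = refine_label(passage, "sports", coarse_to_fine, prototypes)
penalty = negative_pair_penalty(passage, "sports", coarse_to_fine, prototypes)
print(fine, round(penalty, 3))
```

In the toy example the passage is nearest to the `physics` prototype overall, yet the mapping restricts the refined label to the fine classes under `sports`, and the similarity to `physics` shows up as a positive penalty instead.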
Award ID(s):
2105329
PAR ID:
10440669
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics
Page Range / eLocation ID:
976 to 985
Sponsoring Org:
National Science Foundation
More Like this
  1. We address the challenge of representativity and dynamical consistency when unbonded fine-grained particles are collected together into coarse-grained particles. We implement a hybrid procedure for identifying and tracking the underlying fine-grained particles (e.g., atoms or molecules) by exchanging them between the coarse-grained particles periodically at a characteristic time. The exchange involves a back-mapping of the coarse-grained particles into fine-grained particles, and a subsequent reassignment to coarse-grained particles conserving total mass and momentum. We find that an appropriate choice of the characteristic exchange time can lead to the correct effective diffusion rate of the fine-grained particles when simulated in hybrid coarse-grained dynamics. In the compressed (supercritical) fluid regime, without the exchange term, fine-grained particles remain associated to a given coarse-grained particle, leading to substantially lower diffusion rates than seen in all-atom molecular dynamics of the fine-grained particles. Thus, this work confirms the need for addressing the representativity of fine-grained particles within coarse-grained particles, and offers a simple exchange mechanism so as to retain dynamical consistency between the fine- and coarse-grained scales.
  2. Xue, Nianwen; Croft, William; Hajic, Jan; Huang, Chu-Ren; Oepen, Stephan; Palmer, Martha; Pustejovsky, James (Ed.)
    Developers of cross-lingual semantic annotation schemes face a number of issues not encountered in monolingual annotation. This paper discusses four such issues, related to the establishment of annotation labels, and the treatment of languages with more fine-grained, more coarse-grained, and cross-cutting categories. We propose that a lattice-like architecture of the annotation categories can adequately handle all four issues, and at the same time remain both intuitive for annotators and faithful to typological insights. This position is supported by a brief annotation experiment. 
  3. The selection of coarse-grained (CG) mapping operators is a critical step for CG molecular dynamics (MD) simulation. What constitutes an optimal choice remains an open question, and theory is needed. The current state-of-the-art approach is manual selection of mapping operators by experts. In this work, we demonstrate an automated approach by viewing this problem as supervised learning where we seek to reproduce the mapping operators produced by experts. We present a graph neural network-based CG mapping predictor called Deep Supervised Graph Partitioning Model (DSGPM) that treats mapping operators as a graph segmentation problem. DSGPM is trained on a novel dataset, Human-annotated Mappings (HAM), consisting of 1180 molecules with expert-annotated mapping operators. HAM can be used to facilitate further research in this area. Our model uses a novel metric learning objective to produce high-quality atomic features that are used in spectral clustering. The results show that the DSGPM outperforms state-of-the-art methods in the field of graph segmentation. Finally, we find that predicted CG mapping operators indeed result in good CG MD models when used in simulation.
  4. Intelligent thought is the product of efficient neural information processing, which is embedded in fine-grained, topographically organized population responses and supported by fine-grained patterns of connectivity among cortical fields. Previous work on the neural basis of intelligence, however, has focused on coarse-grained features of brain anatomy and function because cortical topographies are highly idiosyncratic at a finer scale, obscuring individual differences in fine-grained connectivity patterns. We used a computational algorithm, hyperalignment, to resolve these topographic idiosyncrasies and found that predictions of general intelligence based on fine-grained (vertex-by-vertex) connectivity patterns were markedly stronger than predictions based on coarse-grained (region-by-region) patterns. Intelligence was best predicted by fine-grained connectivity in the default and frontoparietal cortical systems, both of which are associated with self-generated thought. Previous work overlooked fine-grained architecture because existing methods could not resolve idiosyncratic topographies, preventing investigation where the keys to the neural basis of intelligence are more likely to be found.
  5. In many categorical response regression applications, the response categories admit a multiresolution structure. That is, subsets of the response categories may naturally be combined into coarser response categories. In such applications, practitioners are often interested in estimating the resolution at which a predictor affects the response category probabilities. In this paper, we propose a method for fitting the multinomial logistic regression model in high dimensions that addresses this problem in a unified and data-driven way. Our method allows practitioners to identify which predictors distinguish between coarse categories but not fine categories, which predictors distinguish between fine categories, and which predictors are irrelevant. For model fitting, we propose a scalable algorithm that can be applied when the coarse categories are defined by either overlapping or nonoverlapping sets of fine categories. Statistical properties of our method reveal that it can take advantage of this multiresolution structure in a way existing estimators cannot. We use our method to model cell-type probabilities as a function of a cell's gene expression profile (i.e., cell-type annotation). Our fitted model provides novel biological insights which may be useful for future automated and manual cell-type annotation methodology.