Title: Hierarchical Text Classification with Reinforced Label Assignment
While existing hierarchical text classification (HTC) methods attempt to capture label hierarchies for model training, they either make local decisions regarding each label or completely ignore the hierarchy information during inference. To resolve the mismatch between training and inference and to model label dependencies in a more principled way, we formulate HTC as a Markov decision process and propose to learn a Label Assignment Policy via deep reinforcement learning that determines where to place an object and when to stop the assignment process. The proposed method, HiLAP, explores the hierarchy at both training and inference time in a consistent manner and makes inter-dependent decisions. As a general framework, HiLAP can incorporate different neural encoders as base models for end-to-end training. Experiments on five public datasets and four base models show that HiLAP yields an average improvement of 33.4% in Macro-F1 over flat classifiers and outperforms state-of-the-art HTC methods by a large margin.
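The assignment process described in the abstract can be pictured as a roll-out over the label tree. The sketch below is a minimal, hypothetical illustration (not HiLAP's actual policy network, reward, or training procedure): a greedy policy scores the children of the current label plus a stop action and descends until it chooses to stop. The toy hierarchy, label embeddings, and scoring function are assumptions for illustration only.

```python
# Hypothetical sketch of MDP-style label assignment over a hierarchy: the policy
# repeatedly scores the children of the current label plus a STOP action, placing
# the object deeper into the taxonomy until it decides to stop.
import numpy as np

STOP = "<stop>"
hierarchy = {                        # parent -> children (toy taxonomy)
    "root":    ["science", "sports"],
    "science": ["physics", "biology"],
    "sports":  ["soccer", "tennis"],
}
rng = np.random.default_rng(0)
label_emb = {lab: rng.normal(size=16)            # stand-in label embeddings
             for labs in hierarchy.values() for lab in labs}
label_emb[STOP] = np.zeros(16)

def assign_labels(doc_emb, max_steps=5):
    """Greedy roll-out of the assignment policy at inference time."""
    current, assigned = "root", []
    for _ in range(max_steps):
        actions = hierarchy.get(current, []) + [STOP]
        scores = np.array([doc_emb @ label_emb[a] for a in actions])
        action = actions[int(scores.argmax())]
        if action == STOP:
            break
        assigned.append(action)
        current = action                 # descend into the chosen subtree
    return assigned

print(assign_labels(rng.normal(size=16)))
```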
Award ID(s):
1704532 1741317 1618481
PAR ID:
10160136
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP 2019)
Volume:
1
Issue:
1
Page Range / eLocation ID:
445 to 455
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.
    Self-training is a standard approach to semi-supervised learning where the learner's own predictions on unlabeled data are used as supervision during training. In this paper, we reinterpret this label assignment process as an optimal transportation problem between examples and classes, wherein the cost of assigning an example to a class is mediated by the current predictions of the classifier. This formulation facilitates a practical annealing strategy for label assignment and allows for the inclusion of prior knowledge on class proportions via flexible upper bound constraints. The solutions to these assignment problems can be efficiently approximated using Sinkhorn iteration, thus enabling their use in the inner loop of standard stochastic optimization algorithms. We demonstrate the effectiveness of our algorithm on the CIFAR-10, CIFAR-100, and SVHN datasets in comparison with FixMatch, a state-of-the-art self-training algorithm. Our code is publicly available from github. 
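    As a rough illustration of the transport formulation, the sketch below (a guess at the general shape, not the authors' code) builds an assignment cost from the classifier's current predictions and runs a few Sinkhorn iterations to balance pseudo-labels against a class-proportion prior; the fixed marginals here stand in for the paper's more flexible upper-bound constraints.

```python
# Hedged sketch of optimal-transport label assignment for self-training: the cost
# of assigning example i to class j comes from the classifier's current predictions,
# and Sinkhorn iteration balances assignments against a class-proportion prior.
import numpy as np

def sinkhorn_assign(pred_probs, class_prior, eps=0.05, n_iters=50):
    n, k = pred_probs.shape
    cost = -np.log(pred_probs + 1e-12)           # cheaper where the model is confident
    K = np.exp(-cost / eps)                      # Gibbs kernel
    a = np.full(n, 1.0 / n)                      # each example carries equal mass
    b = np.asarray(class_prior, dtype=float)     # desired class proportions (sums to 1)
    u, v = np.ones(n), np.ones(k)
    for _ in range(n_iters):                     # alternating marginal projections
        u = a / (K @ v)
        v = b / (K.T @ u)
    Q = np.diag(u) @ K @ np.diag(v)              # soft assignment matrix (n x k)
    return Q / Q.sum(axis=1, keepdims=True)      # row-normalize to use as pseudo-labels

rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(10), size=256)     # stand-in for classifier softmax outputs
pseudo = sinkhorn_assign(probs, class_prior=np.full(10, 0.1))
print(pseudo.shape, pseudo.sum(axis=0)[:3])      # column masses roughly follow n * prior
```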
  2. Flood mapping on Earth imagery is crucial for disaster management, but its efficacy is hampered by the lack of high-quality training labels. Given high-resolution Earth imagery with coarse and noisy training labels, a base deep neural network model, and a spatial knowledge base with label constraints, our problem is to infer the true high-resolution labels while training neural network parameters. Traditional methods are largely based on specific physical properties and thus fall short of capturing the rich domain constraints expressed by symbolic logic. Neural-symbolic models can capture rich domain knowledge, but existing methods do not address the unique spatial challenges inherent in flood mapping on high-resolution imagery. To fill this gap, we propose a spatial-logic-aware weakly supervised learning framework. Our framework integrates symbolic spatial logic inference into probabilistic learning in a weakly supervised setting. To reduce the time costs of logic inference on vast high-resolution pixels, we propose a multi-resolution spatial reasoning algorithm to infer true labels while training neural network parameters. Evaluations of real-world flood datasets show that our model outperforms several baselines in prediction accuracy. The code is available at https://github.com/spatialdatasciencegroup/SLWSL. 
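    The multi-resolution idea can be pictured, very loosely, as coarse-to-fine inference: blocks whose scores agree are resolved once, and only ambiguous blocks fall back to per-pixel decisions, so whatever reasoning runs per region touches far fewer units. The block size, threshold, and constraint stub below are all hypothetical, not the paper's algorithm.

```python
# Hypothetical sketch of coarse-to-fine label inference. check_constraints stands in
# for the (expensive) spatial-logic reasoning step: it runs once on homogeneous
# blocks and only per pixel inside ambiguous blocks.
import numpy as np

def check_constraints(scores):
    """Placeholder for a spatial-logic check; here just a majority vote."""
    return float(scores.mean() > 0.5)

def infer_labels(scores, block=8, agree_thresh=0.9):
    h, w = scores.shape
    labels = np.zeros_like(scores)
    for i in range(0, h, block):
        for j in range(0, w, block):
            tile = scores[i:i+block, j:j+block]
            frac = (tile > 0.5).mean()
            if frac > agree_thresh or frac < 1 - agree_thresh:
                # Homogeneous block: one coarse decision covers every pixel in it.
                labels[i:i+block, j:j+block] = check_constraints(tile)
            else:
                # Ambiguous block: refine at full resolution, one check per pixel.
                for di in range(tile.shape[0]):
                    for dj in range(tile.shape[1]):
                        labels[i + di, j + dj] = check_constraints(tile[di:di+1, dj:dj+1])
    return labels

rng = np.random.default_rng(0)
noisy_scores = rng.random((64, 64))   # stand-in for the base model's flood probabilities
print(infer_labels(noisy_scores).mean())
```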
  3. Large language models (LLMs) require alignment to effectively and safely follow user instructions. This process necessitates training an aligned version for every base model, resulting in significant computational overhead. In this work, we propose NUDGING, a simple, training-free algorithm that aligns any base model at inference time using a small aligned model. NUDGING is motivated by recent findings that alignment primarily alters the model’s behavior on a small subset of stylistic tokens (e.g., discourse markers). We find that base models are significantly more uncertain when generating these tokens. Building on this insight, NUDGING employs a small aligned model to generate nudging tokens to guide the base model’s output during decoding when the base model’s uncertainty is high, with only a minor additional inference overhead. We evaluate NUDGING across 3 model families on a diverse range of open-instruction tasks. Without any training, nudging a large base model with a 7×-14× smaller aligned model achieves zero-shot performance comparable to, and sometimes surpassing, that of large aligned models. By operating at the token level, NUDGING enables off-the-shelf collaboration between model families. For instance, nudging Gemma-2-27b with Llama-2-7b-chat outperforms Llama-2-70b-chat on various tasks. Overall, our work offers a modular and cost-efficient solution to LLM alignment. Our code and demo are available at: https://fywalter.github.io/nudging/.
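    The token-level gating can be pictured roughly as follows: the base model proposes each next token, and when its confidence drops below a threshold the token is taken from the small aligned model instead. The threshold, the dummy model callables, and the single-token nudge length below are assumptions, not the paper's exact recipe.

```python
# Minimal sketch of uncertainty-gated, token-level collaboration between a base
# model and a small aligned model, in the spirit of the approach described above.
from typing import Callable, List, Tuple

def nudged_decode(base_next: Callable[[List[str]], Tuple[str, float]],
                  aligned_next: Callable[[List[str]], Tuple[str, float]],
                  prompt_tokens: List[str],
                  conf_thresh: float = 0.4,
                  max_new_tokens: int = 20) -> List[str]:
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        tok, prob = base_next(tokens)          # base model proposes a token + its probability
        if prob < conf_thresh:                 # base model is uncertain -> nudge
            tok, _ = aligned_next(tokens)      # small aligned model supplies the token
        if tok == "<eos>":
            break
        tokens.append(tok)
    return tokens

# Toy stand-ins for the two models, just to make the loop runnable.
def base_next(toks):    return ("answer:", 0.2) if toks[-1] == "Q:" else ("42", 0.9)
def aligned_next(toks): return ("Sure,", 0.99)

print(nudged_decode(base_next, aligned_next, ["Q:"], max_new_tokens=3))
```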
  4. Globerson, A; Mackey, L; Belgrave, D; Fan, A; Paquet, U; Tomczak, J; Zhang, C (Ed.)
    Last-layer retraining methods have emerged as an efficient framework for correcting existing base models. Within this framework, several methods have been proposed to deal with correcting models for subgroup fairness with and without group membership information. Importantly, prior work has demonstrated that many methods are susceptible to noisy labels. To this end, we propose a drop-in correction for label noise in last-layer retraining, and demonstrate that it achieves state-of-the-art worst-group accuracy for a broad range of symmetric label noise and across a wide variety of datasets exhibiting spurious correlations. Our proposed approach uses label spreading on a latent nearest neighbors graph and has minimal computational overhead compared to existing methods.
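    The correction step can be approximated with off-the-shelf tools: build a k-NN graph over last-layer (latent) features, spread labels over it so noisy labels get revised toward their neighbors, and refit a linear head on the cleaned labels. The synthetic features, k, and alpha below are illustrative choices, not the paper's settings.

```python
# Rough sketch of label-noise correction via label spreading on a latent k-NN graph,
# followed by a last-layer (linear head) refit on the cleaned labels.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.semi_supervised import LabelSpreading
from sklearn.linear_model import LogisticRegression

# Stand-in for last-layer features of a frozen base model, with 10% label noise.
feats, clean_y = make_blobs(n_samples=500, centers=3, random_state=0)
rng = np.random.default_rng(0)
noisy_y = clean_y.copy()
flip = rng.random(len(noisy_y)) < 0.10
noisy_y[flip] = rng.integers(0, 3, size=flip.sum())

# Spread labels over the latent k-NN graph; alpha < 1 lets noisy labels be revised.
spreader = LabelSpreading(kernel="knn", n_neighbors=15, alpha=0.3)
spreader.fit(feats, noisy_y)
cleaned_y = spreader.transduction_

# "Last-layer retraining" stand-in: refit a linear head on the cleaned labels.
head = LogisticRegression(max_iter=1000).fit(feats, cleaned_y)
print("agreement of cleaned labels with clean labels:", (cleaned_y == clean_y).mean())
```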
  5. Large-scale machine learning (ML) models are increasingly being used in critical domains like education, lending, recruitment, healthcare, criminal justice, etc. However, the training, deployment, and utilization of these models demand substantial computational resources. To decrease computation and memory costs, machine learning models with sparse weight matrices are widely used in the literature. Among sparse models, those with special sparse structures (e.g., models with block-wise sparse weight matrices) fit hardware accelerators better and can decrease memory and computation costs during inference. Unfortunately, while there are several efficient training methods, none of them are designed to train a block-wise sparse model efficiently. As a result, current methods for training block-wise sparse models start with full, dense models, leading to inefficient training. In this work, we focus on training models with block-wise sparse matrices and propose an efficient training algorithm to decrease both computation and memory costs during training and inference. In addition, we show that our proposed method can efficiently find the right block size for the sparsity pattern during the training process. Our extensive empirical and theoretical analyses show that our algorithms can decrease the computation and memory costs significantly without a performance drop compared to baselines.
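    The sparsity structure in question can be sketched as a linear layer whose weight matrix is masked block by block; the mask below is chosen at random for illustration, whereas the proposed method learns which blocks (and which block size) to keep during training.

```python
# Hedged sketch of a block-wise sparse linear layer: the weight matrix is split into
# fixed-size blocks and a binary block mask zeroes entire blocks, the structure that
# the abstract notes maps well onto hardware accelerators.
import torch
import torch.nn as nn

class BlockSparseLinear(nn.Module):
    def __init__(self, in_dim, out_dim, block=16, keep_ratio=0.25):
        super().__init__()
        assert in_dim % block == 0 and out_dim % block == 0
        self.weight = nn.Parameter(torch.randn(out_dim, in_dim) * 0.02)
        self.bias = nn.Parameter(torch.zeros(out_dim))
        # One 0/1 entry per (block-row, block-column); random here, learned in the paper.
        block_mask = (torch.rand(out_dim // block, in_dim // block) < keep_ratio).float()
        # Expand each block entry into a block x block patch of the full mask.
        self.register_buffer(
            "mask", block_mask.repeat_interleave(block, 0).repeat_interleave(block, 1)
        )

    def forward(self, x):
        return x @ (self.weight * self.mask).t() + self.bias

layer = BlockSparseLinear(128, 64)
out = layer(torch.randn(8, 128))
print(out.shape, "fraction of nonzero weights:", layer.mask.mean().item())
```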