Toponym detection in scientific papers is an open task and a key first step in place entity enrichment of documents. We examine three common neural architectures in NLP: 1) convolutional neural network, 2) multi-layer perceptron (both applied in a sliding window context) and 3) bidirectional LSTM and apply contextual and non-contextual word embedding layers to these models. We find that deep contextual word embeddings improve the performance of the bi-LSTM with CRF neural architecture achieving the best performance when multiple layers of deep contextual embeddings are concatenated. Our best performing model achieves an average F1 of 0.910 when evaluated on overlap macro exceeding previous state-of-the-art models in the toponym detection task.
more »
« less
Improving Interpretability via Explicit Word Interaction Graph Layer
Recent NLP literature has seen growing interest in improving model interpretability. Along this direction, we propose a trainable neural network layer that learns a global interaction graph between words and then selects more informative words using the learned word interactions. Our layer, we call WIGRAPH, can plug into any neural network-based NLP text classifiers right after its word embedding layer. Across multiple SOTA NLP models and various NLP datasets, we demonstrate that adding the WIGRAPH layer substantially improves NLP models' interpretability and enhances models' prediction performance at the same time.
more »
« less
- Award ID(s):
- 2124538
- PAR ID:
- 10482171
- Publisher / Repository:
- AAAI
- Date Published:
- Journal Name:
- Proceedings of the AAAI Conference on Artificial Intelligence
- Volume:
- 37
- Issue:
- 11
- ISSN:
- 2159-5399
- Page Range / eLocation ID:
- 13528 to 13537
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
We show how the spellings of known words can help us deal with unknown words in open-vocabulary NLP tasks. The method we propose can be used to extend any closedvocabulary generative model, but in this paper we specifically consider the case of neural language modeling. Our Bayesian generative story combines a standard RNN language model (generating the word tokens in each sentence) with an RNNbased spelling model (generating the letters in each word type). These two RNNs respectively capture sentence structure and word structure, and are kept separate as in linguistics. By invoking the second RNN to generate spellings for novel words in context, we obtain an open-vocabulary language model. For known words, embeddings are naturally inferred by combining evidence from type spelling and token context. Comparing to baselines (including a novel strong baseline), we beat previous work and establish state-of-the-art results on multiple datasets.more » « less
-
As we begin to see low-powered computing paradigms (Neuromorphic Computing, Spiking Neural Networks, etc.) becoming more popular, learning binary word embeddings has become increasingly important for supporting NLP applications at the edge. Existing binary word embeddings are mostly derived from pretrained real-valued embeddings through different simple transformations, which often break the semantic consistency and the so-called ``arithmetic'' properties learned by the original, real-valued embeddings. This paper aims to address this limitation by introducing a new approach to learn binary embeddings from scratch, preserving the semantic relationships between words as well as the arithmetic properties of the embeddings themselves. To achieve this, we propose a novel genetic algorithm to learn the relationships between words from existing word analogy data-sets, carefully making sure that the arithmetic properties of the relationships are preserved. Evaluating our generated 16, 32, and 64-bit binary word embeddings on Mikolov's word analogy task shows that more than 95% of the time, the best fit for the analogy is ranked in the top 5 most similar words in terms of cosine similarity."more » « less
-
What are the units of text that we want to model? From bytes to multi-word expressions, text can be analyzed and generated at many granularities. Until recently, most natural language processing (NLP) models operated over words, treating those as discrete and atomic tokens, but starting with byte-pair encoding (BPE), subword-based approaches have become dominant in many areas, enabling small vocabularies while still allowing for fast inference. Is the end of the road character-level model or byte-level processing? In this survey, we connect several lines of work from the pre-neural and neural era, by showing how hybrid approaches of words and characters as well as subword-based approaches based on learned segmentation have been proposed and evaluated. We conclude that there is and likely will never be a silver bullet singular solution for all applications and that thinking seriously about tokenization remains important for many applications.more » « less
-
Social media is the ultimate challenge for many natural language processing tools. The constant emergence of linguistic constructs challenge even the most sophisticated NLP tools. Predicting word embeddings for out of vocabulary words is one of those challenges. Word embedding models only include terms that occur a sufficient number of times in their training corpora. Word embedding vector models are unable to directly provide any useful information about a word not in their vocabularies. We propose a fast method for predicting vectors for out of vocabulary terms that makes use of the surrounding terms of the unknown term and the hidden context layer of the word2vec model. We propose this method as a strong baseline in the sense that 1) while it does not surpass all state-of-the-art methods, it surpasses several techniques for vector prediction on benchmark tasks, 2) even when it underperforms, the margin is very small retaining competitive performance in downstream tasks, and 3) it is inexpensive to compute, requiring no additional training stage. We also show that our technique can be incorporated into existing methods to achieve a new state-of-the-art on the word vector prediction problem.more » « less
An official website of the United States government

