
This content will become publicly available on June 1, 2023

Title: Do Transformer Networks Improve the Discovery of Inference Rules from Text?
With their Discovery of Inference Rules from Text (DIRT) algorithm, Lin and Pantel (2001) made a seminal contribution to the field of rule acquisition from text, by adapting the distributional hypothesis of Harris (1954) to patterns that model binary relations such as X treat Y, where patterns are implemented as syntactic dependency paths. DIRT’s relevance is renewed in today’s neural era given the recent focus on interpretability in the field of natural language processing. We propose a novel take on the DIRT algorithm, where we implement the distributional hypothesis using the contextualized embeddings provided by BERT, a transformer-network-based language model (Vaswani et al., 2017; Devlin et al., 2018). In particular, we change the similarity measure between pairs of slots (i.e., the set of words matched by a pattern) from the original formula that relies on lexical items to a formula computed using contextualized embeddings. We empirically demonstrate that this new similarity method yields a better implementation of the distributional hypothesis, and this, in turn, yields patterns that outperform the original algorithm in the question answering-based evaluation proposed by Lin and Pantel (2001).
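The core change described above, replacing DIRT's lexical slot-similarity with one computed over contextualized embeddings, can be sketched as follows. This is a minimal illustration, not the paper's exact formula: the aggregation (mean pairwise cosine) and the toy vectors standing in for BERT embeddings are assumptions for demonstration purposes.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def slot_similarity(slot_a_vecs, slot_b_vecs):
    """Similarity between two pattern slots, computed over the
    contextualized embeddings of the words that fill each slot.
    Mean pairwise cosine is a hypothetical aggregation choice;
    the paper's formula may differ."""
    sims = [cosine(u, v) for u in slot_a_vecs for v in slot_b_vecs]
    return sum(sims) / len(sims)

# Toy stand-ins for BERT embeddings of the fillers of slot X in
# "X treat Y" and slot X in "X cure Y" (real embeddings would come
# from a BERT forward pass over the matching sentences).
treat_X = [np.array([0.9, 0.1, 0.2]), np.array([0.8, 0.2, 0.1])]
cure_X = [np.array([0.85, 0.15, 0.15])]
print(round(slot_similarity(treat_X, cure_X), 3))
```

With real contextualized embeddings, two slots filled by the same word in different senses would no longer be spuriously similar, which is the advantage over the original lexical-overlap formula.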
Journal Name:
LREC proceedings
Sponsoring Org:
National Science Foundation
More Like this
  1. Computational models of verbal analogy and relational similarity judgments can employ different types of vector representations of word meanings (embeddings) generated by machine-learning algorithms. An important question is whether human-like relational processing depends on explicit representations of relations (i.e., representations separable from those of the concepts being related), or whether implicit relation representations suffice. Earlier machine-learning models produced static embeddings for individual words, identical across all contexts. However, more recent Large Language Models (LLMs), which use transformer architectures applied to much larger training corpora, are able to produce contextualized embeddings that have the potential to capture implicit knowledge of semantic relations. Here we compare multiple models based on different types of embeddings to human data concerning judgments of relational similarity and solutions of verbal analogy problems. For two datasets, a model that learns explicit representations of relations, Bayesian Analogy with Relational Transformations (BART), captured human performance more successfully than either a model using static embeddings (Word2vec) or models using contextualized embeddings created by LLMs (BERT, RoBERTa, and GPT-2). These findings support the proposal that human thinking depends on representations that separate relations from the concepts they relate.
  2. Abstract
  Excessive phosphorus (P) applications to croplands can contribute to eutrophication of surface waters through surface runoff and subsurface (leaching) losses. We analyzed leaching losses of total dissolved P (TDP) from no-till corn, hybrid poplar (Populus nigra × P. maximowiczii), switchgrass (Panicum virgatum), miscanthus (Miscanthus giganteus), native grasses, and restored prairie, all planted in 2008 on former cropland in Michigan, USA. All crops except corn (13 kg P ha−1 year−1) were grown without P fertilization. Biomass was harvested at the end of each growing season except for poplar. Soil water at 1.2 m depth was sampled weekly to biweekly for TDP determination during March–November 2009–2016 using tension lysimeters. Soil test P (STP; 0–25 cm depth) was measured every autumn. Soil water TDP concentrations were usually below levels where eutrophication of surface waters is frequently observed (> 0.02 mg L−1) but often higher than in deep groundwater or nearby streams and lakes. Rates of P leaching, estimated from measured concentrations and modeled drainage, did not differ statistically among cropping systems across years; 7-year cropping system means ranged from 0.035 to 0.072 kg P ha−1 year−1 with large interannual variation. Leached P was positively related to STP, which decreased over the 7 years in all systems. These results indicate that both P-fertilized and unfertilized cropping systems may …
  3. Abstract Two new schemes for identifying field lines involved in eruptions, the r-scheme and q-scheme, are proposed to analyze the eruptive and confined nature of solar flares, as extensions to the original r_m scheme proposed in Lin et al. Motivated by three solar flares originating from NOAA Active Region 12192 that are misclassified by r_m, we introduce refinements to the r-scheme employing the "magnetic twist flux" to approximate the force balance acting on a magnetic flux rope (MFR); in the q-scheme, the reconnected field is represented by those field lines that anchor in the flare ribbons. Based on data obtained by the Solar Dynamics Observatory/Helioseismic and Magnetic Imager, the coronal magnetic field for 51 flares larger than M5.0 class, from 29 distinct active regions, is constructed using a nonlinear force-free field extrapolation model. Statistical analysis based on linear discriminant function analysis is then performed, revealing that despite both schemes providing moderately successful classifications for the 51 flares, the coronal mass ejection-eruptivity classification for the three target events can only be improved with the q-scheme. We find that the highly twisted field lines and the flare-ribbon field lines have equal average force-free constant α, but all of the flare-ribbon-related field lines are shorter than 150 Mm in length. The findings lead us to conclude that it is challenging to distinguish the MFR from the ambient magnetic field using any quantity based on common magnetic nonpotentiality measures.
  4. In standard methodology for natural language processing, entities in text are typically embedded in dense vector spaces with pre-trained models. The embeddings produced this way are effective when fed into downstream models, but they require end-task fine-tuning and are fundamentally difficult to interpret. In this paper, we present an approach to creating entity representations that are human readable and achieve high performance on entity-related tasks out of the box. Our representations are vectors whose values correspond to posterior probabilities over fine-grained entity types, indicating the confidence of a typing model’s decision that the entity belongs to the corresponding type. We obtain these representations using a fine-grained entity typing model, trained either on supervised ultra-fine entity typing data (Choi et al., 2018) or distantly-supervised examples from Wikipedia. On entity probing tasks involving recognizing entity identity, our embeddings used in parameter-free downstream models achieve competitive performance with ELMo- and BERT-based embeddings in trained models. We also show that it is possible to reduce the size of our type set in a learning-based way for particular domains. Finally, we show that these embeddings can be post-hoc modified through a small number of rules to incorporate domain knowledge and improve performance.
  5. While a reinvigoration of ocean circulation and CO2 outgassing is the leading explanation for atmospheric CO2 rise since the Last Glacial Maximum (LGM), there is also evidence of marine geologic carbon release over the last 20,000 years. Much of this evidence points to regions of the mid-depth Pacific Ocean, where multiple radiocarbon (14C) records show anomalously low 14C/C values, potentially caused by the addition of 14C-free geologic carbon [1,2]. To better constrain this geologic carbon release hypothesis, we aim to place an upper bound on the amount of carbon that may have been added, in addition to the geochemical pathway of that carbon. To do so, we numerically invert a carbon cycle model based on observational atmospheric CO2 and 14C records. Given these observational constraints, we use data assimilation techniques and an optimization algorithm to calculate the rate of carbon addition and its alkalinity-to-carbon ratio (R_A/C) over the last 20,000 years. Using the modeled planetary radiocarbon budget calculated in Hain et al. [3], we find observations allow for only ~300 Pg of carbon to be added, as a majority of the deglacial atmospheric 14C decline is already explained by magnetic field strength changes and ocean circulation changes [3]. However, when we adjust the initial state of the model by increasing 14C by 75‰ to match the observational 14C records, we find that observations allow for ~3500 Pg of carbon addition with an average R_A/C of ~1.4. These results allow for the possibility of a large release of 14C-free geologic carbon, which could provide local and regional 14C anomalies, as the records have in the Pacific [1,2].
As this geologic carbon was added with an R_A/C of ~1.4, these results also imply that 14C evidence for significant geologic carbon release since the LGM may not be taken as contributing to deglacial CO2 rise, unless there is evidence for significant local acidification and corrosion of seafloor sediments. If the geologic carbon cycle is indeed more dynamic than previously thought, we may also need to rethink the approach to estimating the land/ocean carbon repartitioning from the deglacial stable carbon isotope budget. [1] Rafter et al. (2019), GRL 46(23), 13950–13960. [2] Ronge et al. (2016), Nature Communications 7(1), 11487. [3] Hain et al. (2014), EPSL 394, 198–208.
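The phosphorus-leaching rates in item 2 come from combining measured TDP concentrations with modeled drainage. A back-of-envelope check of the unit conversion, using illustrative values near the study's reported ranges rather than its actual data:

```python
def leaching_rate_kg_per_ha(tdp_mg_per_l, drainage_mm):
    """P leaching flux = concentration x drainage.
    1 mm of drainage over 1 ha equals 10,000 L of water, so
    mg/L * mm * 10,000 L/(ha*mm) gives mg/ha; divide by 1e6 for kg/ha."""
    return tdp_mg_per_l * drainage_mm * 10_000 / 1e6

# Illustrative inputs: ~0.02 mg/L TDP and ~300 mm annual drainage.
print(leaching_rate_kg_per_ha(0.02, 300))  # 0.06 kg P/ha/yr
```

The result (0.06 kg P ha−1 year−1) falls inside the 0.035–0.072 range the abstract reports, which suggests the concentration-times-drainage arithmetic is the right mental model for those numbers.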
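Item 1 contrasts explicit relation representations (BART) with raw word embeddings. A common embedding-only baseline for verbal analogy a:b :: c:d, often used with static vectors like Word2vec, scores the analogy by the cosine of the two difference vectors. The sketch below uses toy 2-D vectors; it illustrates the baseline that such models are compared against, not BART or the paper's method:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def analogy_score(a, b, c, d):
    """Relational similarity of a:b vs c:d as the cosine of the two
    difference vectors (the 'parallelogram' baseline)."""
    return cosine(b - a, d - c)

# Toy embeddings in which the "capital-of" offset is roughly shared.
paris, france = np.array([1.0, 0.0]), np.array([0.0, 1.0])
tokyo, japan = np.array([1.1, 0.1]), np.array([0.1, 1.1])
print(analogy_score(france, paris, japan, tokyo) > 0.99)  # True
```

A model with explicit relation representations goes beyond this: it learns a transformation for the relation itself rather than relying on the raw offset between concept vectors.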