-
Much recent work in NLP has documented dataset artifacts, bias, and spurious correlations between input features and output labels. However, how to tell which features have “spurious” instead of legitimate correlations is typically left unspecified. In this work we argue that for complex language understanding tasks, all simple feature correlations are spurious, and we formalize this notion into a class of problems which we call competency problems. For example, the word “amazing” on its own should not give information about a sentiment label independent of the context in which it appears, which could include negation, metaphor, sarcasm, etc. We theoretically analyze the difficulty of creating data for competency problems when human bias is taken into account, showing that realistic datasets will increasingly deviate from competency problems as dataset size increases. This analysis gives us a simple statistical test for dataset artifacts, which we use to show more subtle biases than were described in prior work, including demonstrating that models are inappropriately affected by these less extreme biases. Our theoretical treatment of this problem also allows us to analyze proposed solutions, such as making local edits to dataset instances, and to give recommendations for future data collection and model design efforts that target competency problems.
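As an illustration of what a simple statistical test for dataset artifacts could look like, the sketch below computes a one-sample proportion z-statistic for a single feature. It assumes the null hypothesis that, in a competency problem, a feature carries no label information, so p(label | feature) equals the uniform rate 1/num_labels. The function names, the threshold of 3.0, and the example counts are hypothetical choices for illustration. The abstract does not spell out the test, so this is not necessarily the authors' exact procedure, and it applies no correction for testing many features at once.

```python
import math


def artifact_z_score(feature_count, label_count, num_labels=2):
    """z-statistic for H0: p(label | feature) == 1 / num_labels.

    feature_count: number of instances that contain the feature.
    label_count:   how many of those instances carry the label of interest.
    """
    if feature_count <= 0:
        raise ValueError("feature_count must be positive")
    if not 0 <= label_count <= feature_count:
        raise ValueError("label_count must lie in [0, feature_count]")
    p0 = 1.0 / num_labels                      # rate expected under H0
    p_hat = label_count / feature_count        # observed rate
    standard_error = math.sqrt(p0 * (1.0 - p0) / feature_count)
    return (p_hat - p0) / standard_error


def is_artifact(feature_count, label_count, num_labels=2, z_threshold=3.0):
    """Flag a feature whose label rate is significantly above uniform.

    z_threshold is an assumed cutoff, not a value taken from the paper.
    """
    return artifact_z_score(feature_count, label_count, num_labels) > z_threshold


if __name__ == "__main__":
    # Hypothetical counts: a word seen 100 times, 70 of them in positive reviews.
    print(artifact_z_score(100, 70))
    print(is_artifact(100, 70))
```

Because the standard error shrinks as the feature count grows, a fixed deviation from the uniform rate becomes more significant in larger datasets, which is consistent with the abstract's claim that realistic datasets deviate further from competency problems as they grow.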
-
Modeling human language requires the ability to not only generate fluent text but also encode factual knowledge. However, traditional language models are only capable of remembering facts seen at training time, and often have difficulty recalling them. To address this, we introduce the knowledge graph language model (KGLM), a neural language model with mechanisms for selecting and copying facts from a knowledge graph that are relevant to the context. These mechanisms enable the model to render information it has never seen before, as well as generate out-of-vocabulary tokens. We also introduce the Linked WikiText-2 dataset, a corpus of annotated text aligned to the Wikidata knowledge graph whose contents (roughly) match the popular WikiText-2 benchmark. In experiments, we demonstrate that the KGLM achieves significantly better performance than a strong baseline language model. We additionally compare different language models’ ability to complete sentences requiring factual knowledge, showing that the KGLM outperforms even very large language models in generating facts.
-
Contextual word representations, typically trained on unstructured, unlabeled text, do not contain any explicit grounding to real-world entities and are often unable to remember facts about those entities. We propose a general method to embed multiple knowledge bases (KBs) into large-scale models, and thereby enhance their representations with structured, human-curated knowledge. For each KB, we first use an integrated entity linker to retrieve relevant entity embeddings, then update contextual word representations via a form of word-to-entity attention. In contrast to previous approaches, the entity linkers and self-supervised language modeling objective are jointly trained end-to-end in a multitask setting that combines a small amount of entity linking supervision with a large amount of raw text. After integrating WordNet and a subset of Wikipedia into BERT, the knowledge-enhanced BERT (KnowBert) demonstrates improved perplexity, improved ability to recall facts as measured in a probing task, and improved downstream performance on relationship extraction, entity typing, and word sense disambiguation. KnowBert’s runtime is comparable to BERT’s and it scales to large KBs.
-
Abstract Bismuth telluride is the working material for most Peltier cooling devices and thermoelectric generators. This is because Bi₂Te₃ (or more precisely its alloys with Sb₂Te₃ for p‐type and Bi₂Se₃ for n‐type material) has the highest thermoelectric figure of merit, zT, of any material around room temperature. Since thermoelectric technology will be greatly enhanced by improving Bi₂Te₃ or finding a superior material, this review aims to identify and quantify the key material properties that make Bi₂Te₃ such a good thermoelectric. The large zT can be traced to the high band degeneracy, low effective mass, high carrier mobility, and relatively low lattice thermal conductivity, which all contribute to its remarkably high thermoelectric quality factor. Using literature data augmented with newer results, these material parameters are quantified, giving clear insight into the tailoring of the electronic band structure of Bi₂Te₃ by alloying, or reducing thermal conductivity by nanostructuring. For example, this analysis clearly shows that the minority carrier excitation across the small bandgap significantly limits the thermoelectric performance of Bi₂Te₃, even at room temperature, showing that larger bandgap alloys are needed for higher temperature operation. Such effective material parameters can also be used for benchmarking future improvements in Bi₂Te₃ or new replacement materials.
-
Abstract We synthesize insights from current understanding of drought impacts at stand‐to‐biogeographic scales, including management options, and we identify challenges to be addressed with new research. Large stand‐level shifts underway in western forests already are showing the importance of interactions involving drought, insects, and fire. Diebacks, changes in composition and structure, and shifting range limits are widely observed. In the eastern US, the effects of increasing drought are becoming better understood at the level of individual trees, but this knowledge cannot yet be confidently translated to predictions of changing structure and diversity of forest stands. While eastern forests have not experienced the types of changes seen in western forests in recent decades, they too are vulnerable to drought and could experience significant changes with increased severity, frequency, or duration in drought. Throughout the continental United States, the combination of projected large climate‐induced shifts in suitable habitat from modeling studies and limited potential for the rapid migration of tree populations suggests that changing tree and forest biogeography could substantially lag habitat shifts already underway. Forest management practices can partially ameliorate drought impacts through reductions in stand density, selection of drought‐tolerant species and genotypes, artificial regeneration, and the development of multistructured stands. However, silvicultural treatments also could exacerbate drought impacts unless implemented with careful attention to site and stand characteristics. Gaps in our understanding should motivate new research on the effects of interactions involving climate and other species at the stand scale and how interactions and multiple responses are represented in models. This assessment indicates that, without a stronger empirical basis for drought impacts at the stand scale, more complex models may provide limited guidance.