
Title: “Measuring the Evolution of a Scientific Field through Citation Frames.” Transactions of the Association for Computational Linguistics (TACL).
Citations have long been used to characterize the state of a scientific field and to identify influential works. However, writers use citations for different purposes, and these differing purposes influence uptake by future scholars. Unfortunately, our understanding of how scholars use and frame citations has been limited to small-scale manual citation analysis of individual papers. We perform the largest behavioral study of citations to date, analyzing how scientific works frame their contributions through different types of citations and how this framing affects the field as a whole. We introduce a new dataset of nearly 2,000 citations annotated for their function and use it to develop a state-of-the-art classifier, which we apply to label the papers of an entire field: Natural Language Processing. We then show how differences in framing affect scientific uptake and reveal how publication venues and the field as a whole have evolved. We demonstrate that authors are sensitive to discourse structure and publication venue when citing, and that how a paper frames its work through citations is predictive of the citation count it will receive. Finally, we use changes in citation framing to show that the field of NLP is undergoing a significant increase in consensus.
Sponsoring Org: National Science Foundation
More Like This
  1. Citations of scientific papers and patents reveal the flow of knowledge and usually serve as a metric for evaluating novelty and impact in a field. Citation forecasting therefore has many real-world applications. Existing work on citation forecasting typically exploits the sequential properties of citation events without exploring the citation network. In this paper, we propose to exploit both the citation network and the related citation event sequences, which provide valuable information for forecasting future citations. We propose a novel Citation Network and Event Sequence (CINES) model that encodes signals from the citation network and the related citation event sequences into several types of embeddings, which are then decoded into the arrivals of future citations. Moreover, we propose a temporal network attention mechanism and three alternative designs of bidirectional feature propagation to aggregate the retrospective and prospective aspects of publications in the citation network; these are coupled with citation event sequence embeddings learned by a two-level attention mechanism. We evaluate our models and baselines on a U.S. patent dataset and a DBLP dataset. Experimental results show that our models outperform state-of-the-art methods (RMTPP, CYAN-RNN, Intensity-RNN, and PC-RNN), reducing forecasting error by 37.76% to 75.32%.
  2. Sutherland, Mary Elizabeth (Ed.)
    Theories of scientific and technological change view discovery and invention as endogenous processes [1,2], wherein prior accumulated knowledge enables future progress by allowing researchers to, in Newton’s words, “stand on the shoulders of giants” [3–7]. Recent decades have witnessed exponential growth in the volume of new scientific and technological knowledge, thereby creating conditions that should be ripe for major advances [8,9]. Yet contrary to this view, studies suggest that progress is slowing in several major fields [10,11]. Here, we analyze these claims at scale across 6 decades, using data on 45 million papers and 3.9 million patents from 6 large-scale datasets, together with a novel quantitative metric, the CD index [12], that characterizes how papers and patents change networks of citations in science and technology. We find that papers and patents are increasingly less likely to break with the past in ways that push science and technology in new directions. This pattern holds universally across fields and is robust across multiple different citation- and text-based metrics. Subsequently, we link this decline in disruptiveness to a narrowing in the use of prior knowledge, allowing us to reconcile the patterns we observe with the “shoulders of giants” view. We find that the observed declines are unlikely to be driven by changes in the quality of published science, citation practices, or field-specific factors. Overall, our results suggest that slowing rates of disruption may reflect a fundamental shift in the nature of science and technology.
  3. Accurate prediction of scientific impact is important for scientists, academic recommender systems, and granting organizations alike. Existing approaches rely on many years of leading citation values to predict a scientific paper’s citations (a proxy for impact), even though most papers make their largest contributions in the first few years after publication. In this paper, we tackle a new problem: predicting a new paper’s citation time series from the date of publication (i.e., without leading values). We propose HINTS, a novel end-to-end deep learning framework that converts citation signals from dynamic heterogeneous information networks (DHIN) into citation time series. HINTS imputes pseudo-leading values for a paper in the years before it is published from DHIN embeddings, and then transforms these embeddings into the parameters of a formal model that can predict citation counts immediately after publication. Empirical analysis on two real-world datasets from Computer Science and Physics shows that HINTS is competitive with baseline citation prediction models. While we focus on citations, our approach generalizes to other “cold start” time series prediction tasks where relational data is available and accurate prediction at early timestamps is crucial.
  4. Searching for relevant literature is a fundamental part of academic research, and it is becoming more difficult and time-consuming as millions of articles are published each year. Recommendation systems for academic papers aim to help researchers find relevant papers quickly. This paper focuses on graph-based recommendation systems for academic papers that use citation networks, in which a graph of papers linked by citations is used to produce a list of relevant papers. In this study, we explore recommendation using citation networks that incorporate citation relations. We define the citation relation as the number of times the origin paper cites the reference paper, and use it to measure the strength of the relationship between the two papers. We then build a weighted network in which each citation edge is weighted by this citation relation. We evaluate our proposed method on a real-world publication dataset and conduct an extensive comparison with three state-of-the-art baseline methods. Our results show that citation-network-based recommendation using citation weights performs better than current methods.
  5. Communication of scientific findings is fundamental to scholarly discourse. In this article, we show that academic review articles, a quintessential form of interpretive scholarly output, perform curatorial work that substantially transforms the research communities they aim to summarize. Using a corpus of millions of journal articles, we analyze the consequences of review articles for the publications they cite, focusing on citation and co-citation as indicators of scholarly attention. Our analysis shows that, on the one hand, papers cited by formal review articles generally experience a dramatic loss in future citations. Typically, the review gets cited instead of the specific articles mentioned in the review. On the other hand, reviews curate, synthesize, and simplify the literature concerning a research topic. Most reviews identify distinct clusters of work and highlight exemplary bridges that integrate the topic as a whole. These bridging works, in addition to the review, become a shorthand characterization of the topic going forward and receive disproportionate attention. In this manner, formal reviews perform creative destruction so as to render increasingly expansive and redundant bodies of knowledge distinct and comprehensible.
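Item 2 relies on the CD index without defining it. As a rough illustration, the sketch below assumes the standard definition from Funk and Owen-Smith: over the n later works that cite a focal work, its references, or both, CD = (1/n) Σ_i (−2·f_i·b_i + f_i), where f_i = 1 if work i cites the focal work and b_i = 1 if it cites at least one of the focal work's references. A score near +1 means later work cites the focal paper while ignoring its predecessors (disruptive); near −1 means later work cites both (consolidating). The dict-of-sets graph format, the name `cd_index`, and the toy data are hypothetical, and the caller is assumed to pass only works published after the focal one within the chosen time window.

```python
from typing import Dict, Iterable, Set

# Hypothetical input format: each paper ID maps to the set of IDs it cites.
CitationGraph = Dict[str, Set[str]]


def cd_index(focal: str, cites: CitationGraph, later: Iterable[str]) -> float:
    """CD = (1/n) * sum_i (-2 * f_i * b_i + f_i).

    f_i = 1 if later paper i cites the focal paper, else 0.
    b_i = 1 if paper i cites at least one of the focal paper's references.
    n   = number of later papers with f_i = 1 or b_i = 1.
    `later` should hold only papers published after `focal` in the window.
    """
    predecessors = cites.get(focal, set())
    total, n = 0, 0
    for paper in later:
        refs = cites.get(paper, set())
        f = 1 if focal in refs else 0
        b = 1 if refs & predecessors else 0
        if f == 0 and b == 0:
            continue  # paper lies outside the focal paper's citation neighborhood
        n += 1
        total += -2 * f * b + f
    if n == 0:
        raise ValueError("no later paper cites the focal paper or its references")
    return total / n


# Toy example: focal paper F builds on A and B.
EXAMPLE_CITES: CitationGraph = {
    "F": {"A", "B"},
    "P1": {"F"},        # cites F only: contributes +1 (disruptive)
    "P2": {"F", "A"},   # cites F and a predecessor: contributes -1 (consolidating)
    "P3": {"B"},        # cites a predecessor only: contributes 0
}
print(cd_index("F", EXAMPLE_CITES, ["P1", "P2", "P3"]))  # → 0.0
```

Papers that cite only the predecessors still count toward n, so they pull the score toward zero without pushing it in either direction.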
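Item 4 weights each citation edge by how many times the origin paper cites the reference paper. The sketch below shows one way to build such a weighted network from a list of in-text citation mentions, plus a deliberately simple one-hop ranking. Both the `(citing, cited)` mention format and the `recommend` scoring rule are hypothetical illustrations; the abstract does not describe the authors' actual recommendation algorithm.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# One in-text citation: (citing paper ID, cited paper ID). Hypothetical format.
Mention = Tuple[str, str]
WeightedGraph = Dict[str, Dict[str, int]]


def build_weighted_network(mentions: Iterable[Mention]) -> WeightedGraph:
    """Edge weight = number of times the citing paper mentions the cited one."""
    graph: WeightedGraph = {}
    for (src, dst), weight in Counter(mentions).items():
        graph.setdefault(src, {})[dst] = weight
    return graph


def recommend(graph: WeightedGraph, seed: str, k: int = 3) -> List[str]:
    """Hypothetical one-hop ranking: score every paper linked to `seed`
    (as a reference or as a citer) by the weight of the connecting edge."""
    scores: Counter = Counter()
    for dst, weight in graph.get(seed, {}).items():
        scores[dst] += weight                # papers the seed cites
    for src, out_edges in graph.items():
        if seed in out_edges:
            scores[src] += out_edges[seed]   # papers that cite the seed
    scores.pop(seed, None)                   # never recommend the seed itself
    return [paper for paper, _ in scores.most_common(k)]


mentions = [("P", "A"), ("P", "A"), ("P", "A"), ("P", "B"), ("C", "P"), ("C", "P")]
graph = build_weighted_network(mentions)
print(recommend(graph, "P"))  # → ['A', 'C', 'B']
```

In an unweighted network, A, B, and C would all tie for paper P; the mention counts are what separate them.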
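Item 5 uses citation and co-citation as indicators of scholarly attention. Two papers are co-cited when a third paper cites both, and a pair's co-citation count is the number of such papers. The sketch below computes raw co-citation counts over a toy corpus. It assumes a hypothetical format in which each paper ID maps to the set of IDs it cites; the abstract does not say how the authors normalize or time-window these counts, so this is only the basic quantity.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Set


def cocitation_counts(cites: Dict[str, Set[str]]) -> Counter:
    """Map each unordered pair (a, b), with a < b, to the number of papers citing both."""
    counts: Counter = Counter()
    for refs in cites.values():
        # Sorting makes each pair canonical, so (A, B) and (B, A) are one key.
        for pair in combinations(sorted(refs), 2):
            counts[pair] += 1
    return counts


# Toy corpus: a review R and two ordinary papers X and Y.
corpus = {"R": {"A", "B", "C"}, "X": {"A", "B"}, "Y": {"B", "C"}}
counts = cocitation_counts(corpus)
print(counts[("A", "B")], counts[("A", "C")], counts[("B", "C")])  # → 2 1 2
```

Because keys are sorted pairs, look up a pair with its IDs in sorted order; a reversed key such as `("B", "A")` returns the `Counter` default of 0.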