ABSTRACT The citation of scientific papers is considered a simple and direct indicator of papers' impact. This paper predicts papers' citations through team‐related variables, team composition, and team structure. Team composition includes team size, male/female dominance, academia/industry collaboration, unique race number, and unique country number. Team structures are made up of team power level and team power hierarchy. Team members' previous citation number, H‐index, previous collaborators, career age, and previous paper numbers are a proxy of team power. We calculated the mean value and Gini coefficient to represent team power level (the collective team capability) and team power hierarchy (the vertical difference of power distribution within a team). Taking 1,675,035 CS teams in the DBLP dataset, we trained the XGBoost model to predict high/low citation. Our model has reached 0.71 in AUC and 70.45% in accuracy rate. Utilizing Explainable AI method SHAP to evaluate features' relative importance in predicting team citation categories, we found that team structure plays a more critical role than team composition in predicting team citation. High team power level, flat team power structure, diverse race background, large team, collaboration with industry, and male‐dominated teams can bring higher team citations. Our project can provide insights into how to form the best scientific teams and maximize team impact from team composition and team structure.
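The two team-structure summaries named in the abstract, a mean for power level and a Gini coefficient for power hierarchy, can be illustrated with a minimal sketch. The function names and the sample H-index values below are hypothetical and are not taken from the paper; the Gini formula is the standard one for non-negative values, and the paper's exact computation may differ.

```python
from statistics import mean


def gini(values):
    """Gini coefficient of non-negative values (0 = perfectly flat)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        # Assumption: treat an empty or all-zero team as perfectly flat.
        return 0.0
    # Standard rank-weighted formula over the sorted values.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def team_power_summary(member_scores):
    """Return (power level, power hierarchy) for one proxy, e.g. H-index."""
    return mean(member_scores), gini(member_scores)


# Hypothetical H-index values for a four-person team.
level, hierarchy = team_power_summary([2, 5, 9, 40])
```

A team whose members all have the same score yields a hierarchy of 0, while a team in which one member holds all of the measured power approaches the maximum of (n - 1) / n.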
The Disruption Index measures displacement between a paper and its most-cited reference
Abstract The Disruption Index (D-index) provides the first quantitative framework for identifying breakthroughs in science and technology. As its use expands, questions have emerged about its meaning, strengths, and limitations. Because the D-index measures how a focal paper competes with its references for citation attention, some worry that it is distorted by historical changes in citation practices. For example, if papers cite more references over time—a trend known as “citation inflation”—then newer papers might appear less disruptive even when equally inventive. We show that this concern is unfounded. Citation counts follow a long-tailed distribution, meaning competition is overwhelmingly shaped by the focal paper and its most-cited reference, while other references are negligible. Thus, the D-index captures whether a paper overturns a dominant idea in its field. The metric is fundamentally relational: It measures competition with predecessors rather than innovation in a vacuum. From this perspective, breakthroughs arise not only from generating novel ideas but also from replacing established ones—much like light bulbs replacing candles. We support this interpretation with mathematical analysis and large-scale bibliometric evidence.
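The abstract does not state the formula, so the following is a minimal sketch based on the commonly used formulation of the D-index, D = (n_i - n_j) / (n_i + n_j + n_k), where n_i counts later papers citing the focal paper but none of its references, n_j counts those citing both, and n_k counts those citing at least one reference but not the focal paper. The function name, the input layout, and the toy paper identifiers are hypothetical.

```python
def disruption_index(focal, references, citing):
    """Compute the D-index of `focal`.

    `citing` maps each later paper id to the collection of paper ids it cites.
    """
    refs = set(references)
    n_i = n_j = n_k = 0
    for cited in citing.values():
        cited = set(cited)
        cites_focal = focal in cited
        cites_refs = bool(refs & cited)
        if cites_focal and not cites_refs:
            n_i += 1  # cites the focal paper only
        elif cites_focal and cites_refs:
            n_j += 1  # cites the focal paper and its references
        elif cites_refs:
            n_k += 1  # cites the references but skips the focal paper
    total = n_i + n_j + n_k
    # Assumption: return 0.0 when no later paper cites the focal paper or its references.
    return (n_i - n_j) / total if total else 0.0


# Toy citation data with hypothetical paper ids.
later_papers = {
    "A": {"F"},
    "B": {"F"},
    "C": {"F", "R1"},
    "D": {"R1"},
    "E": {"R2", "X"},
    "G": {"X"},
}
d = disruption_index("F", {"R1", "R2"}, later_papers)
```

In this toy data n_i = 2, n_j = 1, and n_k = 2. Because citation counts are long-tailed, most papers contributing to n_j and n_k in real data would cite the single most-cited reference, which is the point the abstract makes about displacement.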
- Award ID(s):
- 2239418
- PAR ID:
- 10668235
- Publisher / Repository:
- MIT Press
- Date Published:
- Journal Name:
- Quantitative Science Studies
- ISSN:
- 2641-3337
- Page Range / eLocation ID:
- 1 to 11
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
Abstract ChatGPT has arrived in quantitative research evaluation. With the exploration in this Letter to the Editor, we would like to widen the spectrum of possible uses of ChatGPT in bibliometrics by applying it to identify disruptive papers. The identification of disruptive papers using publication and citation counts has become a popular topic in scientometrics. The disadvantage of the quantitative approach is its computational complexity. The use of ChatGPT might be an easy-to-use alternative.
-
Searching for relevant literature is a fundamental part of academic research. The search for relevant literature is becoming a more difficult and time-consuming task as millions of articles are published each year. As a solution, recommendation systems for academic papers attempt to help researchers find relevant papers quickly. This paper focuses on graph-based recommendation systems for academic papers using citation networks. This type of paper recommendation system leverages a graph of papers linked by citations to create a list of relevant papers. In this study, we explore recommendation systems for academic papers using citation networks incorporating citation relations. We define citation relation based on the number of times the origin paper cites the reference paper, and use this citation relation to measure the strength of the relation between the papers. We created a weighted network using citation relation as citation weight on edges. We evaluate our proposed method on a real-world publication data set, and conduct an extensive comparison with three state-of-the-art baseline methods. Our results show that citation network-based recommendation systems using citation weights perform better than the current methods.
-
Abstract Whether citations objectively and reliably reflect the quality of articles and researchers is questionable. Even so, citation counts are widely used to estimate the productivity of researchers and institutions, which creates a ‘grubby’ motivation to be well-cited. We examine this motivation using a generative model of citation that is agent-based. In this model, new nodes are added to an existing citation network. These new nodes act as autonomous agents that cite other nodes based on a composite bias for preferential attachment, recency, fitness (epistemic quality), and community structure. We use the model to ask whether strategic citation behaviors can support an interest in being well-cited. Results from this model suggest that while fitness is influential, the number of references and community effects are also influential in attracting citations. These results raise questions about similar effects in the real world.
-
A standard measure of the influence of a research paper is the number of times it is cited. However, papers may be cited for many reasons, and citation count offers limited information about the extent to which a paper affected the content of subsequent publications. We therefore propose a novel method to quantify linguistic influence in timestamped document collections. There are two main steps: first, identify lexical and semantic changes using contextual embeddings and word frequencies; second, aggregate information about these changes into per-document influence scores by estimating a high-dimensional Hawkes process with a low-rank parameter matrix. We show that this measure of linguistic influence is predictive of future citations: the estimate of linguistic influence from the two years after a paper’s publication is correlated with and predictive of its citation count in the following three years. This is demonstrated using an online evaluation with incremental temporal training/test splits, in comparison with a strong baseline that includes predictors for initial citation counts, topics, and lexical features.

