skip to main content


Title: Is Self-Citation Biased? An Investigation via the Lens of Citation Polarity, Density, and Location
Traditional citation analysis methods have been criticized because their theoretical base of statistical counts does not reflect the motive or judgment of citing authors. In particular, self-citations may give undue credits to a cited article or mislead scientific development. This research aims to answer the question of whether self-citation is biased by probing into the motives and context of citations. It takes an integrated and fine-grained view of self-citations by examining them via multiple lenses—polarity, density, and location of citations. In addition, it explores potential moderating effects of citation level and associations among location contexts of citations to the same references for the first time. We analyzed academic publications across different topics and disciplines using both qualitative and quantitative methods. The results provide evidence that self-citations are free of bias in terms of citation density and polarity uncertainty, but they can be biased with respect to positivity and negativity of citations. Furthermore, this study reveals impacts of self-citing behavior on some citation patterns involving citation density, location concentration, and associations. The examination of self-citing behavior from those new perspectives shed new lights on the nature and function of self-citing behavior.  more » « less
Award ID(s):
1912898
NSF-PAR ID:
10095438
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Information Systems Frontiers
ISSN:
1387-3326
Page Range / eLocation ID:
1-14
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Citations have long been used to characterize the state of a scientific field and to identify influential works. However, writers use citations for different purposes, and this varied purpose influences uptake by future scholars. Unfortunately, our understanding of how scholars use and frame citations has been limited to small-scale manual citation analysis of individual papers. We perform the largest behavioral study of citations to date, analyzing how scientific works frame their contributions through different types of citations and how this framing affects the field as a whole. We introduce a new dataset of nearly 2,000 citations annotated for their function, and use it to develop a state-of-the-art classifier and label the papers of an entire field: Natural Language Processing. We then show how differences in framing affect scientific uptake and reveal the evolution of the publication venues and the field as a whole. We demonstrate that authors are sensitive to discourse structure and publication venue when citing, and that how a paper frames its work through citations is predictive of the citation count it will receive. Finally, we use changes in citation framing to show that the field of NLP is undergoing a significant increase in consensus. 
    more » « less
  2. Abstract Commonly used data citation practices rely on unverifiable retrieval methods which are susceptible to content drift, which occurs when the data associated with an identifier have been allowed to change. Based on our earlier work on reliable dataset identifiers, we propose signed citations, i.e., customary data citations extended to also include a standards-based, verifiable, unique, and fixed-length digital content signature. We show that content signatures enable independent verification of the cited content and can improve the persistence of the citation. Because content signatures are location- and storage-medium-agnostic, cited data can be copied to new locations to ensure their persistence across current and future storage media and data networks. As a result, content signatures can be leveraged to help scalably store, locate, access, and independently verify content across new and existing data infrastructures. Content signatures can also be embedded inside content to create robust, distributed knowledge graphs that can be cited using a single signed citation. We describe applications of signed citations to solve real-world data collection, identification, and citation challenges. 
    more » « less
  3. Accessibility research sits at the junction of several disciplines, drawing influence from HCI, disability studies, psychology, education, and more. To characterize the influences and extensions of accessibility research, we undertake a study of citation trends for accessibility and related HCI communities. We assess the diversity of venues and fields of study represented among the referenced and citing papers of 836 accessibility research papers from ASSETS and CHI, finding that though publications in computer science dominate these citation relationships, the relative proportion of citations from papers on psychology and medicine has grown over time. Though ASSETS is a more niche venue than CHI in terms of citational diversity, both conferences display standard levels of diversity among their incoming and outgoing citations when analyzed in the context of 53K papers from 13 accessibility and HCI conference venues. 
    more » « less
  4. Citations of scientific papers and patents reveal the knowledge flow and usually serve as the metric for evaluating their novelty and impacts in the field. Citation Forecasting thus has various applications in the real world. Existing works on citation forecasting typically exploit the sequential properties of citation events, without exploring the citation network. In this paper, we propose to explore both the citation network and the related citation event sequences which provide valuable information for future citation forecasting. We propose a novel Citation Network and Event Sequence (CINES) Model to encode signals in the citation network and related citation event sequences into various types of embeddings for decoding to the arrivals of future citations. Moreover, we propose a temporal network attention and three alternative designs of bidirectional feature propagation to aggregate the retrospective and prospective aspects of publications in the citation network, coupled with the citation event sequence embeddings learned by a two-level attention mechanism for the citation forecasting. We evaluate our models and baselines on both a U.S. patent dataset and a DBLP dataset. Experimental results show that our models outperform the state-of-the-art methods, i.e., RMTPP, CYAN-RNN, Intensity-RNN, and PC-RNN, reducing the forecasting error by 37.76% - 75.32%. 
    more » « less
  5. null (Ed.)
    Accurate prediction of scientific impact is important for scientists, academic recommender systems, and granting organizations alike. Existing approaches rely on many years of leading citation values to predict a scientific paper’s citations (a proxy for impact), even though most papers make their largest contributions in the first few years after they are published. In this paper, we tackle a new problem: predicting a new paper’s citation time series from the date of publication (i.e., without leading values). We propose HINTS, a novel end-to-end deep learning framework that converts citation signals from dynamic heterogeneous information networks (DHIN) into citation time series. HINTS imputes pseudo-leading values for a paper in the years before it is published from DHIN embeddings, and then transforms these embeddings into the parameters of a formal model that can predict citation counts immediately after publication. Empirical analysis on two real-world datasets from Computer Science and Physics show that HINTS is competitive with baseline citation prediction models. While we focus on citations, our approach generalizes to other “cold start” time series prediction tasks where relational data is available and accurate prediction in early timestamps is crucial. 
    more » « less