
Title: YouTube and science: models for research impact
Video communication has been rapidly increasing over the past decade, with YouTube providing a medium where users can post, discover, share, and react to videos. There has also been an increase in the number of videos citing research articles, especially since it has become relatively commonplace for academic conferences to require video submissions. However, the relationship between research articles and YouTube videos is not clear, and the purpose of the present paper is to address this issue. We created new datasets using YouTube videos and mentions of research articles on various online platforms. We found that most of the articles cited in the videos are related to medicine and biochemistry. We analyzed these datasets through statistical techniques and visualization, and built machine learning models to predict (1) whether a research article is cited in videos, (2) whether a research article cited in a video achieves a level of popularity, and (3) whether a video citing a research article becomes popular. The best models achieved F1 scores between 80% and 94%. According to our results, research articles mentioned in more tweets and news coverage have a higher chance of receiving video citations. We also found that video views are important for predicting citations and increasing research articles’ popularity and public engagement with science.
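The abstract reports model quality as F1 scores. As a reminder of what that figure measures, here is a minimal Python sketch of the binary F1 score (the harmonic mean of precision and recall) computed from confusion-matrix counts. The function name `f1_score` and the example counts are hypothetical illustrations, not data or code from the paper, and the abstract does not say whether the reported scores are binary, macro-averaged, or weighted.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Hypothetical helper: binary F1 from true-positive, false-positive
    and false-negative counts (harmonic mean of precision and recall)."""
    if tp == 0:
        # No true positives: precision and recall are 0 or undefined, so report 0.
        return 0.0
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of actual positives that were found
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for a classifier answering "is this article cited in a video?"
print(round(f1_score(tp=80, fp=10, fn=20), 3))  # prints 0.842
```

Equivalently, F1 = 2·TP / (2·TP + FP + FN), so with these made-up counts it is 160 / 190 ≈ 0.842, i.e. about 84%.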
Award ID(s):
2022443
NSF-PAR ID:
10482189
Publisher / Repository:
Springer https://link.springer.com/article/10.1007/s11192-022-04574-5#citeas
Journal Name:
Scientometrics
Volume:
128
Issue:
2
ISSN:
1588-2861
Page Range / eLocation ID:
933 to 955
Sponsoring Org:
National Science Foundation
More Like this
  1. Traditional citation analysis methods have been criticized because their theoretical base of statistical counts does not reflect the motives or judgments of citing authors. In particular, self-citations may give undue credit to a cited article or mislead scientific development. This research asks whether self-citation is biased by probing the motives and context of citations. It takes an integrated, fine-grained view of self-citations by examining them through multiple lenses: the polarity, density, and location of citations. In addition, it explores, for the first time, potential moderating effects of citation level and associations among the location contexts of citations to the same references. We analyzed academic publications across different topics and disciplines using both qualitative and quantitative methods. The results provide evidence that self-citations are free of bias in terms of citation density and polarity uncertainty, but can be biased with respect to the positivity and negativity of citations. Furthermore, this study reveals impacts of self-citing behavior on some citation patterns involving citation density, location concentration, and associations. Examining self-citing behavior from these new perspectives sheds new light on its nature and function.
  2. Objective: This study reviews novel coronavirus disease (COVID-19) datasets extracted from PubMed Central articles, providing a quantitative analysis of dataset contents, accessibility and citations.
    Methods: We downloaded COVID-19-related full-text articles published until 31 May 2020 from PubMed Central. Dataset URLs mentioned in the full-text articles were extracted, and each dataset was manually reviewed to record 10 variables: (1) type of the dataset, (2) geographic region where the data were collected, (3) whether the dataset was immediately downloadable, (4) format of the dataset files, (5) where the dataset was hosted, (6) whether the dataset was updated regularly, (7) the type of license used, (8) whether the metadata were explicitly provided, (9) whether there was a PubMed Central paper describing the dataset and (10) the number of times the dataset was cited by PubMed Central articles. Descriptive statistics for these 10 variables were reported for all extracted datasets.
    Results: We found that 28.5% of 12 324 COVID-19 full-text articles in PubMed Central provided at least one dataset link. In total, 128 unique dataset links were mentioned in these 12 324 articles. Further analysis showed that epidemiological datasets accounted for the largest portion (53.9%) of the dataset collection, and most datasets (84.4%) were available for immediate download. GitHub was the most popular repository for hosting COVID-19 datasets. CSV, XLSX and JSON were the most popular data formats. Additionally, citation patterns of COVID-19 datasets varied depending on the specific dataset.
    Conclusion: PubMed Central articles are an important source of COVID-19 datasets, but there is significant heterogeneity in the way these datasets are mentioned, shared, updated and cited.
  3. When previous research is cited incorrectly, misinformation can infiltrate scientific discourse and undermine scholarly knowledge. One of the more damaging citation issues is incorrectly citing article content (known as quotation errors), so investigating quotation accuracy is an important research endeavor. One field where quotation accuracy matters is the learning sciences, given their impact on pedagogy. An integral article in pedagogical discussions of how to teach at the college level is the meta-analysis on active learning by Freeman et al., which compared active learning with traditional lecture in terms of effects on student learning and has been important in national initiatives on STEM (science, technology, engineering, and mathematics) reform. Given its influence, coupled with the impact quotation errors could have on scientific discourse, we used citation context analysis to determine whether assertions in citing texts about the efficacy of lecture and active learning were supported by what was explicitly stated in the cited meta-analysis. Assertions were classified as supported, unsupported, or irrelevant for the purposes of the study. The most prevalent supported category related to active learning being more effective than lecture; the most prevalent unsupported category related to the effectiveness of specific activities or approaches other than the general approach of active learning. Overall, 47.67% of assertions were supported and 26.01% were unsupported. Furthermore, 34.77% of articles contained at least one unsupported assertion. Proactive measures are needed to reduce the incidence of quotation errors and ensure robust scientific integrity.
  4. Communication of scientific findings is fundamental to scholarly discourse. In this article, we show that academic review articles, a quintessential form of interpretive scholarly output, perform curatorial work that substantially transforms the research communities they aim to summarize. Using a corpus of millions of journal articles, we analyze the consequences of review articles for the publications they cite, focusing on citation and co-citation as indicators of scholarly attention. Our analysis shows that, on the one hand, papers cited by formal review articles generally experience a dramatic loss in future citations. Typically, the review gets cited instead of the specific articles mentioned in the review. On the other hand, reviews curate, synthesize, and simplify the literature concerning a research topic. Most reviews identify distinct clusters of work and highlight exemplary bridges that integrate the topic as a whole. These bridging works, in addition to the review, become a shorthand characterization of the topic going forward and receive disproportionate attention. In this manner, formal reviews perform creative destruction so as to render increasingly expansive and redundant bodies of knowledge distinct and comprehensible.
  5. YouTube is the most popular video-sharing platform, with more than 2 billion active users and 1 billion hours of video content watched daily. YouTube's dominance has had a big impact on the performance of Internet protocols, algorithms, and systems, so understanding how users interact with YouTube is of considerable interest to the research community. In this context, we collect YouTube watch-history data from 243 users spanning a 1.5-year period. The dataset comprises a total of 1.8 million videos. We use the dataset to analyze and present key insights about user-level usage behavior. We also show that researchers can use our analysis to tackle a myriad of problems in the general domains of networking and communication. We present baseline characteristics and substantiated directions for solving a few representative problems related to local caching techniques, prefetching strategies, the performance of YouTube's recommendation engine, the variability of users' video preferences, and application-specific load provisioning.