Title: Tracking researchers and their outputs: new insights from ORCIDs
The ability to identify scholarly authors is central to bibliometric analysis. Efforts to disambiguate author names using algorithms or national or societal registries become less effective with increases in the number of publications from China and other nations where shared and similar names are prevalent. This work analyzes the adoption and integration of an open source, cross-national identification system, the Open Researcher and Contributor ID system (ORCID), in Web of Science metadata. Results at the article level show greater adoption, to date, of the ORCID iD in Europe as compared with Asia and the US. Focusing analysis on individual highly cited researchers with the shared Chinese surname “Wang,” results indicate wide scope for greater adoption of ORCID. The mechanisms for integrating ORCID iDs into articles also come into question in an analysis of co-authors of one particular highly cited researcher who have varying percentages of articles with ORCID iDs attached. These results suggest that systematic variations in adoption and integration of ORCID into publication metadata should be considered in any bibliometric analysis based on it.
Award ID(s):
1645237
PAR ID:
10036464
Journal Name:
Scientometrics
ISSN:
0138-9130
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. How can we evaluate the performance of a disambiguation method implemented on big bibliographic data? This study suggests that the open researcher profile system, ORCID, can be used as an authority source to label name instances at scale. This study demonstrates the potential by evaluating the disambiguation performance of Author-ity2009 (which algorithmically disambiguates author names in MEDLINE) using 3 million name instances that are automatically labeled through linkage to 5 million ORCID researcher profiles. Results show that although ORCID-linked labeled data do not effectively represent the population of name instances in Author-ity2009, they do effectively capture the ‘high precision over high recall’ performance of Author-ity2009. In addition, ORCID-linked labeled data can provide nuanced details about Author-ity2009’s performance when name instances are evaluated within and across ethnicity categories. As ORCID continues to be expanded to include more researchers, labeled data via ORCID-linkage can be improved in representing the full population of disambiguated data and updated on a regular basis. This can benefit author name disambiguation researchers and practitioners who need large-scale labeled data but lack resources for manual labeling or access to other authority sources for linkage-based labeling. The ORCID-linked labeled data for Author-ity2009 are publicly available for validation and reuse.
  2. This article describes the motivation, design, and progress of the Journal of Open Source Software (JOSS). JOSS is a free and open-access journal that publishes articles describing research software. It has the dual goals of improving the quality of the software submitted and providing a mechanism for research software developers to receive credit. While designed to work within the current merit system of science, JOSS addresses the dearth of rewards for key contributions to science made in the form of software. JOSS publishes articles that encapsulate scholarship contained in the software itself, and its rigorous peer review targets the software components: functionality, documentation, tests, continuous integration, and the license. A JOSS article contains an abstract describing the purpose and functionality of the software, references, and a link to the software archive. The article is the entry point of a JOSS submission, which encompasses the full set of software artifacts. Submission and review proceed in the open, on GitHub. Editors, reviewers, and authors work collaboratively and openly. Unlike other journals, JOSS does not reject articles requiring major revision; while not yet accepted, articles remain visible and under review until the authors make adequate changes (or withdraw, if unable to meet requirements). Once an article is accepted, JOSS gives it a digital object identifier (DOI), deposits its metadata in Crossref, and the article can begin collecting citations on indexers like Google Scholar and other services. Authors retain copyright of their JOSS article, releasing it under a Creative Commons Attribution 4.0 International License. In its first year, starting in May 2016, JOSS published 111 articles, with more than 40 additional articles under review. JOSS is a sponsored project of the nonprofit organization NumFOCUS and is an affiliate of the Open Source Initiative (OSI).
  3. Context: This research was conducted within the NSF-SEEKCommons Project, a research initiative dedicated to supporting Open Science and Open Access in disciplinary research. The project has a special interest in understanding the role that critical infrastructure has in supporting open initiatives. Open Journal Systems (OJS) serves as a long-standing fundamental piece for Open Access throughout the globe. Hence, it provides valuable information about experiences developing, deploying, and maintaining open technologies.
     Methods: We used mixed methods for our research, triangulating repository data, installation data, interviews, and documentary analysis. We collected repository data using a report generator (Kopp [2018] 2024) that uses repository metadata to present general statistics about a Git project. The resulting information was manually curated, disambiguated, and annotated to have a homogeneous set of developers with information about their institutional affiliation and country. Names are normalized based on the information in qualitative interviews and by browsing the full extent of commits in the GitHub repository. Other sources for this were the institutional materials (available in current and archived versions of the PKP website), meeting minutes, the user forum, and further project documentation available online. GitHub handles are homologated to their most comprehensive version. For institutional and country affiliation, we resorted to GitHub profiles, PKP documentation and forums, institutional domains available in emails, and researchers' ORCID iDs.
     Available files:
     - Information about the codebase (number of files, lines of code, and timestamp) organized by month, quarter, and semester. See file: OJS_GitStats_04-24.csv
     - Information about the historical evolution of the codebase (number of files, lines of code, and timestamp), including a description of the top committers for each month. Committers are described by including their institutional affiliation and country of origin. See file: OJS_DevStats_Institution-Country_1.tsv
     - Information about the historical evolution of the codebase focusing on top committers, along with their institution and country. This file is formatted to map the co-occurrence of developers and attributes by month between 2004 and 2024. See file: OJS_DevStats_Institution-Country_2.tsv
     - Selected fields to describe working and regularly maintained plugins for OJS as of October 2024. Includes name of the plugin, homepage, description, maintainer, and institutional affiliation. See file: OJS_Plugins_2024_Processed.tsv
     - Details of the aggregated information included in Table 5 of the article. See file: OJS_Plugins_2024_Table5.tsv
     - XML snapshot of the OJS plugin gallery (October 21), retrieved from the PKP website (Smecher 2024). See file: OJS_Plugins_2024.csv
     Funding: The SEEKCommons Project is funded by the U.S. National Science Foundation (NSF), grant #2226425.
  4. Peer-reviewed publications and patents serve as important signatures of knowledge generation, and therefore the authors and their organizations can represent agents of intellectual transformation. Accurate tracking of these players enables scholars to follow knowledge evolution. However, while author name disambiguation has been discussed extensively, less is known about the impact of organization name on bibliometric studies. We expand here on the recently defined phenomenon of "onomastic profusion," high-frequency words used in organization names for semantic reasons, and thus contributing a non-random source of error to bibliographic studies. We use the Small Business Innovation Research (SBIR) Phase I awardees of the National Aeronautics and Space Administration (NASA) as a use case in the field of engineering innovation. We find that firms in California or Massachusetts experience a six percent decrease in the likelihood of using the word "Technologies" in their names. Furthermore, use of the words "Research" and "Science" is linked to doubling the number of awards. We illustrate that, in aggregate, firms executing rational strategic naming decisions can create deterministic bibliometric challenges. 
  5. Objective: This study aims at reviewing novel coronavirus disease (COVID-19) datasets extracted from PubMed Central articles, thus providing quantitative analysis to answer questions related to dataset contents, accessibility and citations.
     Methods: We downloaded COVID-19-related full-text articles published until 31 May 2020 from PubMed Central. Dataset URL links mentioned in full-text articles were extracted, and each dataset was manually reviewed to provide information on 10 variables: (1) type of the dataset, (2) geographic region where the data were collected, (3) whether the dataset was immediately downloadable, (4) format of the dataset files, (5) where the dataset was hosted, (6) whether the dataset was updated regularly, (7) the type of license used, (8) whether the metadata were explicitly provided, (9) whether there was a PubMed Central paper describing the dataset and (10) the number of times the dataset was cited by PubMed Central articles. Descriptive statistics about these 10 variables were reported for all extracted datasets.
     Results: We found that 28.5% of 12 324 COVID-19 full-text articles in PubMed Central provided at least one dataset link. In total, 128 unique dataset links were mentioned in 12 324 COVID-19 full-text articles in PubMed Central. Further analysis showed that epidemiological datasets accounted for the largest portion (53.9%) in the dataset collection, and most datasets (84.4%) were available for immediate download. GitHub was the most popular repository for hosting COVID-19 datasets. CSV, XLSX and JSON were the most popular data formats. Additionally, citation patterns of COVID-19 datasets varied depending on specific datasets.
     Conclusion: PubMed Central articles are an important source of COVID-19 datasets, but there is significant heterogeneity in the way these datasets are mentioned, shared, updated and cited.