
Title: Writing a massively multi‐authored paper: Overcoming barriers to meaningful authorship for all

The value of large‐scale collaborations for solving complex problems is widely recognized, but many barriers hinder meaningful authorship for all on the resulting multi‐author publications. Because many professional benefits arise from authorship, much of the literature on this topic has focused on cheating, conflict and effort documentation. However, approaches specifically recognizing and creatively overcoming barriers to meaningful authorship have received little attention.

We have developed an inclusive authorship approach arising from 15 years of experience coordinating the publication of over 100 papers produced by a long‐term, international collaboration of hundreds of scientists.

This method of sharing a paper initially as a storyboard with clear expectations, assignments and deadlines fosters communication and creates unambiguous opportunities for all authors to contribute intellectually. By documenting contributions through this multi‐step process, this approach ensures meaningful engagement by each author listed on a publication.

The perception that co‐authors on large authorship publications have not meaningfully contributed underlies widespread institutional bias against multi‐authored papers, disincentivizing large collaborations despite their widely recognized value for advancing knowledge. Our approach identifies and overcomes key barriers to meaningful contributions, protecting the value of authorship even on massively multi‐authored publications.

Journal Name: Methods in Ecology and Evolution
Pages: 1432-1442
Sponsoring Org: National Science Foundation
More Like this
  1. Abstract
     Purpose
     The ability to identify the scholarship of individual authors is essential for performance evaluation. A number of factors hinder this endeavor. Common and similarly spelled surnames make it difficult to isolate the scholarship of individual authors indexed on large databases. Variations in name spelling of individual scholars further complicate matters. Common family names in scientific powerhouses like China make it problematic to distinguish between authors possessing ubiquitous and/or anglicized surnames (as well as the same or similar first names). The assignment of unique author identifiers provides a major step toward resolving these difficulties. We maintain, however, that in and of themselves, author identifiers are not sufficient to fully address the author uncertainty problem. In this study we build on the author identifier approach by considering commonalities in fielded data between authors sharing the same surname and first initial of their first name. We illustrate our approach using three case studies.
     Design/methodology/approach
     The approach we advance in this study is based on commonalities among fielded data in search results. We cast a broad initial net: a Web of Science (WOS) search for a given author's last name, followed by a comma, followed by the first initial of his or her first name (e.g., a search for 'John Doe' would assume the form: 'Doe, J'). Results for this search typically contain all of the scholarship legitimately belonging to this author in the given database (i.e., all of his or her true positives), along with a large amount of noise, or scholarship not belonging to this author (i.e., a large number of false positives). From this corpus we proceed to iteratively weed out false positives and retain true positives.
     Author identifiers provide a good starting point; e.g., if 'Doe, J' and 'Doe, John' share the same author identifier, this is sufficient for us to conclude these are one and the same individual. We find email addresses similarly adequate: if two author names that share the same surname and first initial also have an email address in common, we conclude these authors are the same person. Author identifier and email address data are not always available, however. When this occurs, other fields are used to address the author uncertainty problem. Commonalities among author data other than unique identifiers and email addresses are less conclusive for name consolidation purposes. For example, if 'Doe, John' and 'Doe, J' have an affiliation in common, do we conclude that these names belong to the same person? They may or may not; a single institution may employ two or more faculty members sharing the same surname and first initial. Similarly, it is conceivable that two individuals with the same last name and first initial publish in the same journal, publish with the same co-authors, and/or cite the same references. Should we then ignore commonalities among these fields and conclude they are too imprecise for name consolidation purposes? It is our position that such commonalities are indeed valuable for addressing the author uncertainty problem, but more so when used in combination. Our approach makes use of automation as well as manual inspection, relying initially on author identifiers, then on commonalities among fielded data other than author identifiers, and finally on manual verification. To achieve name consolidation independent of author identifier matches, we have developed a procedure used with the bibliometric software VantagePoint. While the application of our technique does not exclusively depend on VantagePoint, it is the software we find most efficient in this study.
     The script we developed implements our name disambiguation procedure in a way that significantly reduces manual effort on the user's part. Those who seek to replicate our procedure independent of VantagePoint can do so by manually following the method we outline, but we note that the manual application of our procedure takes a significant amount of time and effort, especially when working with larger datasets. Our script begins by prompting the user for a surname and a first initial (for any author of interest). It then prompts the user to select a WOS field on which to consolidate author names. After this the user is prompted to point to the name of the authors field, and finally asked to identify a specific author name (referred to by the script as the primary author) within this field whom the user knows to be a true positive (a suggested approach is to point to an author name associated with one of the records that has the author's ORCID iD or email address attached to it). The script proceeds to identify and combine all author names sharing the primary author's surname and first initial of his or her first name who share commonalities in the WOS field on which the user was prompted to consolidate author names. This typically results in a significant reduction in the initial dataset size. After the procedure completes, the user is usually left with a much smaller (and more manageable) dataset to manually inspect (and/or apply additional name disambiguation techniques to).
     Research limitations
     Match field coverage can be an issue. When field coverage is paltry, dataset reduction is less significant, which results in more manual inspection on the user's part. Our procedure does not lend itself to scholars who have had a legal family name change (after marriage, for example). Moreover, the technique we advance is (sometimes, but not always) likely to have a difficult time dealing with scholars who have changed careers or fields dramatically, as well as scholars whose work is highly interdisciplinary.
     Practical implications
     The procedure we advance can save a significant amount of time and effort for individuals engaged in name disambiguation research, especially when the name under consideration is a more common family name. It is more effective when match field coverage is high and a number of match fields exist.
     Originality/value
     Once again, the procedure we advance can save a significant amount of time and effort for individuals engaged in name disambiguation research. It combines preexisting approaches with more recent ones, harnessing the benefits of both.
     Findings
     Our study applies the name disambiguation procedure we advance to three case studies. Ideal match fields are not the same for each of our case studies. We find that match field effectiveness is in large part a function of field coverage. The original dataset sizes, the timeframes analyzed, and the subject areas in which the authors publish all differ across the case studies. Our procedure is most effective when applied to our third case study, both in terms of list reduction and 100% retention of true positives. We attribute this to excellent match field coverage, especially in more specific match fields, as well as a more modest/manageable number of publications. While machine learning is considered authoritative by many, we do not see it as practical or replicable. The procedure advanced herein is practical, replicable, and relatively user-friendly. It might be categorized into a space between ORCID and machine learning. Machine learning approaches typically look for commonalities among citation data, which is not always available, structured, or easy to work with. The procedure we advance is intended to be applied across numerous fields in a dataset of interest (e.g., emails, co-authors, affiliations), resulting in multiple rounds of reduction. Results indicate that effective match fields include author identifiers, emails, source titles, co-authors, and ISSNs. While the script we present is not likely to result in a dataset consisting solely of true positives (at least for more common surnames), it does significantly reduce manual effort on the user's part. Dataset reduction (after our procedure is applied) is in large part a function of (a) field availability and (b) field coverage.
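The iterative weeding the abstract describes can be sketched in a few lines. This is only a minimal illustration of the field-commonality idea, assuming a simple record layout; it is not the authors' VantagePoint script, and the `consolidate` function and its fields are hypothetical.

```python
# Minimal sketch of field-commonality name consolidation. The record
# layout and function name are illustrative assumptions, not the authors'
# VantagePoint script or the actual Web of Science schema.

def consolidate(records, primary_name, match_field):
    """Attribute records to the primary author when they share a value in
    `match_field` with a record already attributed to that author; repeat
    until no further records can be absorbed."""
    keep = [r for r in records if r["author"] == primary_name]
    noise = [r for r in records if r["author"] != primary_name]
    changed = True
    while changed:
        changed = False
        # Values of the match field seen among records attributed so far.
        known = {r[match_field] for r in keep if r.get(match_field)}
        for r in noise[:]:
            if r.get(match_field) in known:
                keep.append(r)
                noise.remove(r)
                changed = True
    return keep, noise  # retained true positives, remaining candidates
```

In practice a pass like this would be repeated over several match fields (author identifier, email, source title, co-authors, ISSN), shrinking the candidate list each round before manual inspection.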
  2.
    Although computer science education (CSEd) is growing rapidly as a discipline, presently there are a limited number of formal programs available for students to pursue graduate degrees. To explore what options exist, we sought to develop a better understanding of the researchers and institutions currently working in CSEd. We collected publication data between 2015 and 2020 from the Innovation and Technology in Computer Science Education (ITiCSE) and ACM International Computing Education Research (ICER) conferences, and from the ACM Transactions on Computing Education (TOCE) journal. Using a total of 1,099 publications, we analyzed the authorship blocks and their affiliations. We created a comprehensive database, which we used to analyze recent contributions to CSEd research. Among other findings, we observed that 2,068 distinct authors contributed, spanning 578 global institutions. Of these, 963 authors came from 236 distinct universities in the United States. Moreover, we found that new growth from international contributions most often resulted from the participation of additional universities, whereas in the United States most growth was the result of new contributors from the same universities. The results of this research are intended to encourage global collaborations, to provide an informative guide about recent publications in the field, and to serve as a guidepost for graduate recruitment and further exploration into CSEd research and programs.
  3. Information about grants funded by NSF to support SES research from 2000-2015. The grants included in this dataset are a subset that we identified as having an SES research focus from a set of grants accessed using the Dimensions platform. CSV file with 35 columns and names in header row:
     "Grant Searched": the granting NSF program (text)
     "Grant Searched 2": a secondary granting NSF program, if applicable (text)
     "Grant ID": the ID from the Dimensions platform (string)
     "Grant Number": the NSF Award number (integer)
     "Number of Papers (NSF)": the count of publications listed under "PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH" on the NSF Award Search page for the grant (integer)
     "Number of Pubs Tracked": the count of publications from "Number of Papers (NSF)" included in our analysis (integer)
     "Publication notes": our notes about the publication information; "subset" denotes a grant associated with >10 publications, for which we used a random sample of 10 publications in our analysis (text)
     "Unique ID": our unique identifier for each grant in the dataset (integer)
     "Collaborative/Cross Program": whether the grant was submitted as part of a set of collaborative or cross-program proposals; in this case, all linked proposals are given the same unique identifier and treated together in the analysis (text)
     "Title": the title of the grant (text)
     "Title translated": the title of the grant translated to English, where applicable (text)
     "Abstract": the abstract of the grant (text)
     "Abstract translated": the abstract of the grant translated to English, where applicable (text)
     "Funding Amount": the numeric value of funding awarded to the grant (integer)
     "Currency": the currency associated with the field "Funding Amount" (text)
     "Funding Amount in USD": the numeric value of funding awarded to the grant expressed in US Dollars (integer)
     "Start Date": the start date of the grant (mm/dd/yyyy)
     "Start Year": the year in which grant funding began (year)
     "End Date": the end date of the grant (mm/dd/yyyy)
     "End Year": the year in which the term of the grant expired (year)
     "Researchers": the Principal Investigators on the grant in First Name Last Name format, separated by semi-colons (text)
     "Research Organization - original": the affiliation of the lead PI as listed in the grant (text)
     "Research Organization - standardized": the affiliation of each PI in the list, separated by semi-colons (text)
     "GRID ID": the unique identifier for each Research Organization in the Global Research Identifier Database, separated by semi-colons (string)
     "Country of Research organization": the countries in which each Research Organization is located, separated by semi-colons (text)
     "Funder": the NSF Directorate that funded the grant (text)
     "Source Linkout": a link to the NSF Award Search page with information about the grant (URL)
     "Dimensions URL": a link to information about the grant in Dimensions (URL)
     "FOR (ANZSRC) Categories": Field of Research categories [from the Australian and New Zealand Standard Research Classification (ANZSRC) system] associated with each grant, separated by semi-colons (string)
     "FOR [1-5]": the FOR categories separated into individual columns
     "NOTES": any other notes added by the authors of this dataset during our processing of these data
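As a rough sketch, the semicolon-separated multi-value columns described above can be split into lists at load time. The column names follow the dataset's documented header row, but the loader function itself is an assumption for illustration, not part of the published dataset.

```python
import csv
import io

# Sketch of loading the grants CSV and splitting the semicolon-separated
# multi-value columns into Python lists. The loader is illustrative; the
# column names follow the dataset's documented header row.

MULTI_VALUE = {"Researchers", "Research Organization - standardized",
               "GRID ID", "Country of Research organization",
               "FOR (ANZSRC) Categories"}

def load_grants(csv_text):
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Only split columns documented as semicolon-separated.
        for col in MULTI_VALUE & row.keys():
            row[col] = [v.strip() for v in row[col].split(";") if v.strip()]
        rows.append(row)
    return rows
```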
  4. From co-authored publications to sponsored projects involving multiple partner institutions, collaborative practice is an expected part of work in the academy. As evaluators of a National Science Foundation (NSF) Alliances for Graduate Education and the Professoriate (AGEP) grant awarded to four university partners in a large southern state, the authors recognized the increasing value of collaborative practice in the design, implementation, evaluation, and dissemination of findings in the partnership over time. When planning a program among partnering institutions, stakeholders may underestimate the need for, and value of, collaborative practice in facilitating partnership functioning. This method paper outlines an evaluative model to increase the use of collaborative practice in funded academic partnership programs. The model highlights collaborative practice across multiple stakeholder groups in the academic ecology: Sponsors of funded programs (S), Program partners and participants (P), Assessment and evaluation professionals (A), academic Researchers (R), and the national and global Community (C). The SPARC model emphasizes evidence-based benefits of collaborative practice across multiple outcome domains. Tools and frameworks for evaluating collaborative practice focus on optimizing partnership operational performance in achieving stated goals. Collaborative practice can also be an integral element of program activities that support the academic success and scholarly productivity, psychosocial adjustment, and physical and psychological well-being of stakeholders participating in the program. Given the goal of our alliance to promote diversification of the professoriate, the model highlights the use of collaborative practice in supporting stakeholders from groups historically underrepresented in STEM fields across these outcome domains.
    Using data from a mixed-methods program evaluation of our AGEP alliance over 4 years, the authors provide concrete examples of collaborative practice and their measurement. The results highlight important themes regarding collaborative practice that emerged in each stakeholder group. The authors operationalize the SPARC model with a checklist to assist program stakeholders in designing for and assessing collaborative practice in support of project goals in funded academic partnership projects, emphasizing the contributions of collaborative practice in promoting diversification of the professoriate.
  5. Zhang, Yuji (Ed.)
    In recent years, United States federal funding agencies, including the National Institutes of Health (NIH) and the National Science Foundation (NSF), have implemented public access policies to make research supported by funding from these federal agencies freely available to the public. Enforcement is primarily through annual and final reports submitted to these funding agencies, where all peer-reviewed publications must be registered through the appropriate mechanism as required by the specific federal funding agency. Unreported and/or incorrectly reported papers can result in delayed acceptance of annual and final reports and even funding delays for current and new research grants. It is therefore important to make sure every peer-reviewed publication is reported properly and in a timely manner. For large collaborative research efforts, the tracking and proper registration of peer-reviewed publications, along with generation of accurate annual and final reports, can create a large administrative burden. With large collaborative teams, it is easy for these administrative tasks to be overlooked, forgotten, or lost in the shuffle. To help with this reporting burden, we have developed the Academic Tracker software package, implemented in the Python 3 programming language and supporting Linux, Windows, and Mac operating systems. Academic Tracker helps with publication tracking and reporting by comprehensively searching major peer-reviewed publication tracking web portals, including PubMed, Crossref, ORCID, and Google Scholar, given a list of authors. Academic Tracker provides highly customizable reporting templates so information about the resulting publications is easily transformed into appropriate formats for tracking and reporting purposes. The source code and extensive documentation are hosted on GitHub and are also available on the Python Package Index for easy installation.
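The reporting-template idea can be illustrated with ordinary `str.format` placeholders. This is only a hedged sketch of template-driven report generation under assumed publication fields; the function name, fields, and template syntax here are illustrative, not Academic Tracker's actual API.

```python
# Illustrative sketch of template-based publication reporting, in the
# spirit of the customizable reporting templates described above. The
# publication fields and function name are assumptions, not Academic
# Tracker's actual API or template syntax.

def render_report(publications, template):
    """Render one line per publication using str.format placeholders."""
    return "\n".join(template.format(**pub) for pub in publications)

pubs = [{"year": 2022, "title": "A Study", "journal": "J. Examples",
         "doi": "10.1000/xyz"}]
print(render_report(pubs, "{year}: {title}. {journal}. doi:{doi}"))
# prints: 2022: A Study. J. Examples. doi:10.1000/xyz
```

Swapping the template string is all it takes to retarget the same publication data at a different agency's report format, which is the design motivation the abstract describes.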