skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: How Data Scientists Review the Scholarly Literature
Keeping up with the research literature plays an important role in the workflow of scientists – allowing them to understand a field, formulate the problems they focus on, and develop the solutions that they contribute, which in turn shape the nature of the discipline. In this paper, we examine the literature review practices of data scientists. Data science represents a field seeing an exponential rise in papers, and increasingly drawing on and being applied in numerous diverse disciplines. Recent efforts have seen the development of several tools intended to help data scientists cope with a deluge of research and coordinated efforts to develop AI tools intended to uncover the research frontier. Despite these trends indicative of the information overload faced by data scientists, no prior work has examined the specific practices and challenges faced by these scientists in an interdisciplinary field with evolving scholarly norms. In this paper, we close this gap through a set of semi-structured interviews and think-aloud protocols of industry and academic data scientists (N = 20). Our results while corroborating other knowledge workers’ practices uncover several novel findings: individuals (1) are challenged in seeking and sensemaking of papers beyond their disciplinary bubbles, (2) struggle to understand papers in the face of missing details and mathematical content, (3) grapple with the deluge by leveraging the knowledge context in code, blogs, and talks, and (4) lean on their peers online and in-person. Furthermore, we outline future directions likely to help data scientists cope with the burgeoning research literature.  more » « less
Award ID(s):
1922090
PAR ID:
10473017
Author(s) / Creator(s):
; ; ; ; ;
Editor(s):
Gwizdka, Jacek; Rieh, Soo Young
Publisher / Repository:
ACM
Date Published:
Format(s):
Medium: X
Location:
Austin TX USA
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    In many academic fields, the number of papers published each year has increased significantly over time. Policy measures aim to increase the quantity of scientists, research funding, and scientific output, which is measured by the number of papers produced. These quantitative metrics determine the career trajectories of scholars and evaluations of academic departments, institutions, and nations. Whether and how these increases in the numbers of scientists and papers translate into advances in knowledge is unclear, however. Here, we first lay out a theoretical argument for why too many papers published each year in a field can lead to stagnation rather than advance. The deluge of new papers may deprive reviewers and readers the cognitive slack required to fully recognize and understand novel ideas. Competition among many new ideas may prevent the gradual accumulation of focused attention on a promising new idea. Then, we show data supporting the predictions of this theory. When the number of papers published per year in a scientific field grows large, citations flow disproportionately to already well-cited papers; the list of most-cited papers ossifies; new papers are unlikely to ever become highly cited, and when they do, it is not through a gradual, cumulative process of attention gathering; and newly published papers become unlikely to disrupt existing work. These findings suggest that the progress of large scientific fields may be slowed, trapped in existing canon. Policy measures shifting how scientific work is produced, disseminated, consumed, and rewarded may be called for to push fields into new, more fertile areas of study. 
    more » « less
  2. Exposure to ideas in domains outside a scientist's own may benefit her in reformulating existing research problems in novel ways and discovering new application domains for existing solution ideas. While improved performance in scholarly search engines can help scientists efficiently identify relevant advances in domains they may already be familiar with, it may fall short of helping them explore diverse ideas \textit{outside} such domains. In this paper we explore the design of systems aimed at augmenting the end-user ability in cross-domain exploration with flexible query specification. To this end, we develop an exploratory search system in which end-users can select a portion of text core to their interest from a paper abstract and retrieve papers that have a high similarity to the user-selected core aspect but differ in terms of domains. Furthermore, end-users can `zoom in' to specific domain clusters to retrieve more papers from them and understand nuanced differences within the clusters. Our case studies with scientists uncover opportunities and design implications for systems aimed at facilitating cross-domain exploration and inspiration. 
    more » « less
  3. This special issue aims to extend the active discourse on applying behavioral science-based tools to policymaking in the fields of food, agriculture, and agri-environmental issues. Papers in this special issue evaluate the impact of behavioral science-based tools to understand their effectiveness and limitations. Additionally, for this introductory paper, we collected and analyzed data from the 91 submissions we received for this special issue to identify knowledge gaps and priorities for future policy research. Our findings show that behavioral interventions have small effect sizes but, when coupled with other policy tools, can have larger effects. We highlight that future research in these areas must aim to overcome the current shortcomings of the literature in terms of the use of hypothetical or low stakes in experiments, the focus on only measuring short-term behavior change, and the general lack of discussion on cost-effectiveness and mechanisms. Furthermore, we found that most behavioral science interventions in these submitted papers focused either on consumers or producers and thus offered little insight into other actors in the supply chain. We argue that a focus on better research practices is needed to improve policy-oriented behavioral science-based research in the future and note that accepted papers in this special issue were more likely to employ these practices. Finally, we offer six insights and recommendations for researchers and practitioners that arise from this special issue 
    more » « less
  4. Science gateways have been a crucial tool that lowers the barriers of computer language proficiency for researchers and scientists alike to implement digital tools to further their research agendas. However, gateways remain somewhat esoteric and difficult to use for many potential users. A chatbot has been proposed as a solution to aid gateway users and for the improvement of gateway usability. Via in-depth interviews with 10 medical professionals, we investigated the challenges they faced when extracting data, namely, slow speed, limited scope, and mixed quality of data. We suggest future gateway developments to address the issues that medical professionals face when searching for publications and data. Findings suggest that gateways could serve practitioners (i.e., clinicians, healthcare providers in this case), beyond the original vision for research and education. Moreover, gateway projects could consider conducting similar market research interviews to better understand the work context (including challenges) faced by the intended users of specific gateways. 
    more » « less
  5. Background: Code refactoring is widely recognized as an essential software engineering practice to improve the understandability and maintainability of the source code. The Extract Method refactoring is considered as “Swiss army knife” of refactorings, as developers often apply it to improve their code quality, e.g., decompose long code fragments, reduce code complexity, eliminate duplicated code, etc. In recent years, several studies attempted to recommend Extract Method refactorings allowing the collection, analysis, and revelation of actionable data-driven insights about refactoring practices within software projects. Aim: In this paper, we aim at reviewing the current body of knowledge on existing Extract Method refactoring research and explore their limitations and potential improvement opportunities for future research efforts. That is, Extract Method is considered one of the most widely-used refactorings, but difficult to apply in practice as it involves low-level code changes such as statements, variables, parameters, return types, etc. Hence, researchers and practitioners begin to be aware of the state-of-the-art and identify new research opportunities in this context. Method: We review the body of knowledge related to Extract Method refactoring in the form of a systematic literature review (SLR). After compiling an initial pool of 1,367 papers, we conducted a systematic selection and our final pool included 83 primary studies. We define three sets of research questions and systematically develop and refine a classification schema based on several criteria including their methodology, applicability, and degree of automation. Results: The results construct a catalog of 83 Extract Method approaches indicating that several techniques have been proposed in the literature. Our results show that: (i) 38.6% of Extract Method refactoring studies primarily focus on addressing code clones; (ii) Several of the Extract Method tools incorporate the developer's involvement in the decision-making process when applying the method extraction, and (iii) the existing benchmarks are heterogeneous and do not contain the same type of information, making standardizing them for the purpose of benchmarking difficult. Conclusions: Our study serves as an “index” to the body of knowledge in this area for researchers and practitioners in determining the Extract Method refactoring approach that is most appropriate for their needs. Our findings also empower the community with information to guide the future development of refactoring tools. 
    more » « less