NSF PAR Search | NSF Public Access Repository


Navigating the Landscape of Reproducible Research: A Predictive Modeling Approach

https://doi.org/10.1145/3627673.3679831

Akella, Akhil Pandey; Choudhury, Sagnik Ray; Koop, David; Alhoori, Hamed (October 2024, ACM CIKM)

The reproducibility of scientific articles is central to the advancement of science. Despite this importance, evaluating reproducibility remains challenging due to the scarcity of ground truth data. Predictive models can address this limitation by streamlining the tedious evaluation process. Typically, a paper’s reproducibility is inferred based on the availability of artifacts such as code, data, or supplemental information, often without extensive empirical investigation. To address these issues, we utilized artifacts of papers as fundamental units to develop a novel, dual-spectrum framework that focuses on author-centric and external-agent perspectives. We used the author-centric spectrum, followed by the external-agent spectrum, to guide a structured, model-based approach to quantify and assess reproducibility. We explored the interdependencies between different factors influencing reproducibility and found that linguistic features such as readability and lexical diversity are strongly correlated with papers achieving the highest statuses on both spectrums. Our work provides a model-driven pathway for evaluating the reproducibility of scientific research.
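As a rough illustration of one linguistic feature named above, lexical diversity is often approximated by the type-token ratio (distinct words divided by total words). The abstract does not say which measure the authors used, so the sketch below is an assumption: the function name `type_token_ratio` and the simple regex tokenizer are hypothetical choices.

```python
import re


def type_token_ratio(text):
    """Return distinct words / total words for a piece of text.

    Assumption: tokens are lowercase runs of letters and apostrophes.
    This is one common lexical-diversity measure, not necessarily the
    one used in the paper.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0  # avoid division by zero on empty input
    return len(set(tokens)) / len(tokens)


if __name__ == "__main__":
    sample = "the cat saw the dog"
    print(type_token_ratio(sample))
```

The ratio is sensitive to text length, so comparing documents of very different sizes would likely need a length-normalized variant.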
Full Text Available
Laying Foundations to Quantify the “Effort of Reproducibility”

https://doi.org/10.1109/JCDL57899.2023.00018

Akella, Akhil Pandey; Koop, David; Alhoori, Hamed (October 2023, 2023 ACM/IEEE Joint Conference on Digital Libraries (JCDL))

Why are some research studies easy to reproduce while others are difficult? Casting doubt on the accuracy of scientific work is not fruitful, especially when an individual researcher cannot reproduce the claims made in the paper. There could be many subjective reasons behind the inability to reproduce a scientific paper. The field of Machine Learning (ML) faces a reproducibility crisis, and surveys of published articles have led to a shared realization that, although sharing code repositories is valuable, code bases alone do not determine the reproducibility of an article. Various parties involved in the publication process have come forward to address the reproducibility crisis, and measures such as badging articles as reproducible, reproducibility checklists at conferences (NeurIPS, ICML, ICLR, etc.), and sharing artifacts on OpenReview come across as promising solutions to the core problem. The bulk of the literature on reproducibility focuses on measures required to avoid irreproducibility, and there is not much research into the effort behind reproducing these articles. In this paper, we investigate the factors that contribute to the ease and difficulty of reproducing previously published studies and report on the foundational framework to quantify the effort of reproducibility.
Full Text Available
Facilitating Dependency Exploration in Computational Notebooks

https://doi.org/10.1145/3597465.3605222

Brown, Colin; Alhoori, Hamed; Koop, David (June 2023, HILDA '23: Proceedings of the Workshop on Human-In-the-Loop Data Analytics)

Computational notebooks promote exploration by structuring code, output, and explanatory text into cells. The input code and rich outputs help users iteratively investigate ideas as they explore or analyze data. The links between these cells, that is, how the cells depend on each other, are important in understanding how analyses have been developed and how the results can be reproduced. Specifically, a code cell that uses a particular identifier depends on the cell where that identifier is defined or mutated. Because notebooks promote fluid editing where cells can be moved and run in any order, cell dependencies are not always clear or easy to follow. We examine different tools that seek to address this problem by extending Jupyter notebooks and evaluate how well they support users in accomplishing tasks that require understanding dependencies. We also evaluate visualization techniques that provide views of the dependencies to help users navigate cell dependencies.
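The dependency rule stated in this abstract (a cell that uses an identifier depends on the cell that defined it) can be sketched with Python's standard `ast` module. This is a minimal illustration, not any of the tools the paper evaluates: the helper names `names_defined_and_used` and `cell_dependencies` are hypothetical, cells are assumed to run once from top to bottom, and mutation through method calls (for example `items.append(1)`) is not tracked.

```python
import ast


def names_defined_and_used(source):
    """Return (defined, used) identifier sets for one cell's source code."""
    defined, used = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defined.add(node.id)
            else:
                used.add(node.id)
        elif isinstance(node, ast.AugAssign) and isinstance(node.target, ast.Name):
            # "x += 1" both reads and rebinds x
            used.add(node.target.id)
        elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            defined.add(node.name)
        elif isinstance(node, ast.alias):
            # "import a.b" binds "a"; "import a as c" binds "c"
            defined.add((node.asname or node.name).split(".")[0])
    return defined, used


def cell_dependencies(cells):
    """Map each cell index to the earlier cell indices it depends on.

    Simplifications (assumptions of this sketch): top-to-bottom execution,
    and no ordering analysis inside a cell, so a name that a cell both
    assigns and reads is linked to the earlier cell that last defined it.
    """
    last_definer = {}  # identifier -> index of the latest cell defining it
    deps = {}
    for index, source in enumerate(cells):
        defined, used = names_defined_and_used(source)
        deps[index] = {last_definer[n] for n in used if n in last_definer}
        for name in defined:
            last_definer[name] = index
    return deps


if __name__ == "__main__":
    notebook = ["import math\nx = 4", "y = math.sqrt(x)", "print(y)"]
    print(cell_dependencies(notebook))
```

Real notebooks complicate this picture because cells can be run out of order, which is exactly the situation the abstract describes as making dependencies hard to follow.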
Full Text Available
Reproducibility Signals in Science: A preliminary analysis

Akella, Akhil Pandey; Alhoori, Hamed; Koop, David (November 2022, The first Workshop on Information Extraction from Scientific Publications)
Tirthankar Ghosal, Sergi Blanco-Cuaresma (Ed.)
Reproducibility is an important feature of science; experiments are retested, and analyses are repeated. Trust in the findings increases when consistent results are achieved. Despite the importance of reproducibility, significant work is often involved in these efforts, and some published findings may not be reproducible due to oversights or errors. In this paper, we examine a myriad of features in scholarly articles published in computer science conferences and journals and test how they correlate with reproducibility. We collected data from three different sources that labeled publications as either reproducible or irreproducible and employed statistical significance tests to identify features of those publications that hold clues about reproducibility. We found the readability of the scholarly article and accessibility of the software artifacts through hyperlinks to be strong signals noticeable amongst reproducible scholarly articles.
Full Text Available
Toward Systematic Design Considerations of Organizing Multiple Views

https://doi.org/10.1109/VIS54862.2022.00030

Shaikh, Abdul Rahman; Koop, David; Alhoori, Hamed; Sun, Maoyuan (October 2022, 2022 IEEE Visualization and Visual Analytics (VIS))

Multiple-view visualization (MV) has been used for visual analytics in various fields (e.g., bioinformatics, cybersecurity, and intelligence analysis). Because each view encodes data from a particular perspective, analysts often use a set of views laid out in 2D space to link and synthesize information. The difficulty of this process is impacted by the spatial organization of these views. For instance, connecting information from views far from each other can be more challenging than neighboring ones. However, most visual analysis tools currently either fix the positions of the views or completely delegate this organization of views to users (who must manually drag and move views). This either limits user involvement in managing the layout of MV or is overly flexible without much guidance. Then, a key design challenge in MV layout is determining the factors in a spatial organization that impact understanding. To address this, we review a set of MV-based systems and identify considerations for MV layout rooted in two key concerns: perception, which considers how users perceive view relationships, and content, which considers the relationships in the data. We show how these allow us to study and analyze the design of MV layout systematically.
Full Text Available
Towards Systematic Design Considerations for Visualizing Cross-View Data Relationships

https://doi.org/10.1109/TVCG.2021.3102966

Sun, Maoyuan; Namburi, Akhil; Koop, David; Zhao, Jian; Li, Tianyi; Chung, Haeyong (December 2022, IEEE Transactions on Visualization and Computer Graphics)

Full Text Available
Notebook Archaeology: Inferring Provenance from Computational Notebooks

https://doi.org/10.1007/978-3-030-80960-7_7

Koop, David (July 2021, Provenance and Annotation of Data and Processes. IPAW 2020, IPAW 2021.)
Full Text Available
Interactive Bicluster Aggregation in Bipartite Graphs

https://doi.org/10.1109/VISUAL.2019.8933546

Sun, Maoyuan; Koop, David; Zhao, Jian; North, Chris; Ramakrishnan, Naren (December 2019, 2019 IEEE Visualization Conference (VIS))

Exploring coordinated relationships is important for sense making of data in various fields, such as intelligence analysis. To support such investigations, visual analysis tools use biclustering to mine relationships in bipartite graphs and visualize the resulting biclusters with standard graph visualization techniques. Due to overlaps among biclusters, such visualizations can be cluttered (e.g., with many edge crossings), when there are a large number of biclusters. Prior work attempted to resolve this problem by automatically ordering nodes in a bipartite graph. However, visual clutter is still a serious problem, since the number of displayed biclusters remains unchanged. We propose bicluster aggregation as an alternative approach, and have developed two methods of interactively merging biclusters. These interactive bicluster aggregations help organize similar biclusters and reduce the number of displayed biclusters. Initial expert feedback indicates potential usefulness of these techniques in practice.
Full Text Available
