Motivation: Annotations of biochemical models provide details of chemical species, documentation of chemical reactions, and other essential information. Unfortunately, the vast majority of biochemical models have few, if any, annotations, or the annotations provide insufficient detail to understand the limitations of the model. The quality and quantity of annotations can be improved by developing tools that recommend annotations. For example, recommender tools have been developed for annotations of genes. Although annotating genes is conceptually similar to annotating biochemical models, there are important technical differences that make it difficult to directly apply this prior work.
Results: We present AMAS, a system that predicts annotations for elements of models represented in the Systems Biology Markup Language (SBML) community standard. We provide a general framework for predicting model annotations for a query element based on a database of annotated reference elements and a match score function that calculates the similarity between the query element and reference elements. The framework is instantiated for specific element types (e.g., species, reactions) by specifying the reference database (e.g., ChEBI for species) and the match score function (e.g., string similarity). We analyze the computational efficiency and prediction quality of AMAS for species and reactions in BiGG and BioModels and find that it has sub-second response times and accuracy between 80% and 95%, depending on the specifics of what is predicted. We have incorporated AMAS into an open-source, pip-installable Python package that can run as a command-line tool that predicts annotations for species and reactions and adds them to an SBML model.
Availability: Our project is hosted at https://github.com/sys-bio/AMAS, where we provide examples, documentation, and source code files. Our source code is licensed under the MIT open-source license.
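The general framework described in the abstract above (a reference database plus a match score function) can be illustrated with a minimal sketch. This is not the AMAS API: the `REFERENCE` table, its `REF:n` identifiers, and the function names `match_score` and `predict_annotations` are hypothetical, and the standard library's `difflib.SequenceMatcher` stands in for whatever string-similarity measure the real system uses.

```python
from difflib import SequenceMatcher

# Hypothetical toy reference database: identifier -> known names.
# A real system would draw on a resource such as ChEBI instead.
REFERENCE = {
    "REF:1": ["water", "h2o"],
    "REF:2": ["glucose", "d-glucose"],
    "REF:3": ["atp", "adenosine triphosphate"],
}


def match_score(query, names):
    """Best string similarity (0..1) between the query and any reference name."""
    q = query.lower()
    return max(SequenceMatcher(None, q, n.lower()).ratio() for n in names)


def predict_annotations(query, reference=REFERENCE, top_k=2):
    """Rank reference identifiers by match score and keep the top_k candidates."""
    scored = [(ref_id, match_score(query, names))
              for ref_id, names in reference.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Print ranked candidate identifiers for a query species name.
    for ref_id, score in predict_annotations("Glucose"):
        print(ref_id, round(score, 2))
```

Swapping the reference table and the body of `match_score` is the only change needed to target a different element type in this sketch, which mirrors the instantiation step the abstract describes.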
Papercode: Generating Paper-Based User Interfaces for Code Review, Annotation, and Teaching
Paper can be a powerful and flexible user interface that lets programmers read through large amounts of code. Using off-the-shelf equipment, how can we generate a paper-based UI that supports code review, annotation, and teaching? To address this question, we ran formative studies and developed Papercode, a system that formats source code for printing on standard paper. Users can interact with that code on paper, make freehand annotations, then transfer annotations back to the computer by taking photos with a normal phone camera. Papercode optimizes source code for on-paper readability with tunable heuristics such as code-aware line wraps and page breaks, quick references to function and global definitions, moving comments and short function calls into margins, and topologically sorting functions in dependency order.
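One of the heuristics mentioned above, topologically sorting functions in dependency order, can be sketched as follows. This is an illustration only, not Papercode's implementation: the `calls` graph and the `dependency_order` helper are hypothetical, and the sketch assumes that callees should be printed before their callers, which the abstract does not specify.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical call graph: each function maps to the functions it calls.
calls = {
    "main": {"parse", "render"},
    "render": {"escape"},
    "parse": set(),
    "escape": set(),
}


def dependency_order(call_graph):
    """Return function names so that every callee precedes its callers."""
    # TopologicalSorter treats the mapped values as predecessors,
    # so the functions a caller depends on are emitted first.
    return list(TopologicalSorter(call_graph).static_order())


if __name__ == "__main__":
    print(dependency_order(calls))
```

Mutually recursive functions form a cycle, in which case `static_order()` raises `graphlib.CycleError`; a real tool would need a fallback ordering for that case.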
- Award ID(s): 1845900
- PAR ID: 10210726
- Date Published:
- Journal Name: ACM Symposium on User Interface Software and Technology (adjunct proceedings)
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Much software, whether beneficent or malevolent, is distributed only as binaries, sans source code. Absent source code, understanding binaries' behavior can be quite challenging, especially when compiled under higher levels of compiler optimization. These optimizations can transform comprehensible, "natural" source constructions into something entirely unrecognizable. Reverse engineering binaries, especially those suspected of being malevolent or guilty of intellectual property theft, is an important and time-consuming task. There is a great deal of interest in tools to "decompile" binaries back into more natural source code to aid reverse engineering. Decompilation involves several desirable steps, including recreating source-language constructions, variable names, and perhaps even comments. One central step in creating binaries is optimizing function calls, using steps such as inlining. Recovering these (possibly inlined) function calls from optimized binaries is an essential task that most state-of-the-art decompiler tools try to do but do not perform very well. In this paper, we evaluate a supervised learning approach to the problem of recovering optimized function calls. We leverage open-source software and develop an automated labeling scheme to generate a reasonably large dataset of binaries labeled with actual function usages. We augment this large but limited labeled dataset with a pre-training step, which learns the decompiled code statistics from a much larger unlabeled dataset. Thus augmented, our learned labeling model can be combined with an existing decompilation tool, Ghidra, to achieve substantially improved performance in function call recovery, especially at higher levels of optimization.
Gradual typing allows programs to enjoy the benefits of both static typing and dynamic typing. While it is often desirable to migrate a program from more dynamically typed to more statically typed or vice versa, gradual typing itself does not provide a way to facilitate this migration. This places the burden on programmers who have to manually add or remove type annotations. Besides the general challenge of adding type annotations to dynamically typed code, there are subtle interactions between these annotations in gradually typed code that exacerbate the situation. For example, to migrate a program to be as static as possible, in general, all possible combinations of adding or removing type annotations from parameters must be tried out and compared. In this paper, we address this problem by developing migrational typing, which efficiently types all possible ways of replacing dynamic types with fully static types for a gradually typed program. The typing result supports automatically migrating a program to be as static as possible or introducing the least number of dynamic types necessary to remove a type error. The approach can be extended to support user-defined criteria about which annotations to modify. We have implemented migrational typing and evaluated it on large programs. The results show that migrational typing scales linearly with the size of the program and takes only 2–4 times longer than plain gradual typing.
Recent research in empirical software engineering is applying techniques from neurocognitive science and breaking new ground in the ways that researchers can model and analyze the cognitive processes of developers as they interact with software artifacts. However, given the novelty of this line of research, only one tool exists to help researchers represent and analyze this kind of multi-modal biometric data. While this tool does help with visualizing temporal eye-tracking and physiological data, it does not allow for the mapping of physiological data to source code elements, instead projecting information over images of code. One drawback of this is that researchers are still unable to meaningfully combine and map physiological and eye-tracking data to source code artifacts. The use of images also bars the support of long or multiple code files, which prevents researchers from analyzing data from experiments conducted in realistic settings. To address these drawbacks, we propose VITALSE, a tool for the interactive visualization of combined multi-modal biometric data for software engineering tasks. VITALSE provides interactive and customizable temporal heatmaps created with synchronized eye-tracking and biometric data. The tool supports analysis on multiple files, user-defined annotations for points of interest over source code elements, and high-level customizable metric summaries for the provided dataset. VITALSE, a video demonstration, and sample data to demonstrate its capabilities can be found at http://www.vitalse.app.
To accelerate software development, much research has been performed to help people understand and reuse the huge amount of available code resources. Two important tasks have been widely studied: code retrieval, which aims to retrieve code snippets relevant to a given natural language query from a code base, and code annotation, where the goal is to annotate a code snippet with a natural language description. Despite their advancement in recent years, the two tasks are mostly explored separately. In this work, we investigate a novel perspective of Code annotation for Code retrieval (hence called "CoaCor"), where a code annotation model is trained to generate a natural language annotation that can represent the semantic meaning of a given code snippet and can be leveraged by a code retrieval model to better distinguish relevant code snippets from others. To this end, we propose an effective framework based on reinforcement learning, which explicitly encourages the code annotation model to generate annotations that can be used for the retrieval task. Through extensive experiments, we show that code annotations generated by our framework are much more detailed and more useful for code retrieval, and they can further improve the performance of existing code retrieval models significantly.