Title: Let’s Do Ranking & Selection
Many tutorials and survey papers have been written on ranking & selection because it is such a useful tool for simulation optimization when the number of feasible solutions or “systems” is small enough that all of them can be simulated. Cheap, ubiquitous, parallel computing has greatly increased the “all of them can be simulated” limit. Naturally these tutorials and surveys have focused on the underlying theory of R&S and have provided pseudocode procedures. This tutorial, by contrast, emphasizes applications, programming and interpretation of R&S, using the R programming language for illustration. Readers (and the audience) can download the code and follow along with the examples, but no experience with R is needed.
Award ID(s):
1854562
NSF-PAR ID:
10427805
Author(s) / Creator(s):
Editor(s):
Feng, B.; Pedrielli, G.; Peng, Y.; Shashaani, S.; Song, E.; Corlu, C.; Lee, L.; Chew, E.; Roeder, T.; Lendermann, P.
Date Published:
Journal Name:
Proceedings of the 2022 Winter Simulation Conference
Page Range / eLocation ID:
180-191
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Feng, B.; Pedrielli, G.; Peng, Y.; Shashaani, S.; Song, E.; Corlu, C.; Lee, L.; Chew, E.; Roeder, T.; Lendermann, P. (Ed.)
    Many tutorials and survey papers have been written on ranking & selection because it is such a useful tool for simulation optimization when the number of feasible solutions or “systems” is small enough that all of them can be simulated. Cheap, ubiquitous, parallel computing has greatly increased the “all of them can be simulated” limit. Naturally these tutorials and surveys have focused on the underlying theory of R&S and have provided pseudocode procedures. This tutorial, by contrast, emphasizes applications, programming and interpretation of R&S, using the R programming language for illustration. Readers (and the audience) can download the code and follow along with the examples, but no experience with R is needed. 
  2. Data science courses and tutorials have grown popular in recent years, yet they are still taught using production-grade programming tools (e.g., R, MATLAB, and Python IDEs) within desktop computing environments. Although powerful, these tools present high barriers to entry for novices, forcing them to grapple with the extrinsic complexities of software installation and configuration, data file management, data parsing, and Unix-like command-line interfaces. To lower the barrier for novices to get started with learning data science, we created DS.js, a bookmarklet that embeds a data science programming environment directly into any existing webpage. By transforming any webpage into an example-centric IDE, DS.js eliminates the aforementioned complexities of desktop-based environments and turns the entire web into a rich substrate for learning data science. DS.js automatically parses HTML tables and CSV/TSV data sets on the target webpage, attaches code editors to each data set, provides a data table manipulation and visualization API designed for novices, and gives instructional scaffolding in the form of bidirectional previews of how the user’s code and data relate.
  3. 'Ecological Dynamics and Forecasting' is a semester-long course to introduce students to the fundamentals of ecological dynamics and forecasting. This course implements paper-based discussion to introduce students to concepts and ideas and R-based tutorials for hands-on application and training. The course material includes a reading list with prompting questions for discussions, teacher's notes for guiding discussions, lecture notes for live coding demonstrations, and video presentations of all R tutorials. This course material can be used either as self-directed learning or as all or part of a college or university course. Individual learners have access to all of the necessary material, including discussion questions and instructor notes, on the website. The course focuses on papers with an open-access or free-to-read version where possible, though some materials still rely on access to closed-access papers. The course is structured around two sessions per week, with most weeks consisting of a one-hour paper discussion session and a 1-2 hour session focused on applications in R. R tutorials use publicly available ecological datasets to provide realistic applications. Because the material is organized around content themes, instructors can modify and remix materials based on their course goals and student levels of background knowledge. These course materials have been taught for several years at the authors’ university and have also generated significant online engagement, with course videos viewed tens of thousands of times.
  4. Background

    Exploring metagenomic contigs and “binning” them into metagenome-assembled genomes (MAGs) are essential for the delineation of functional and evolutionary guilds within microbial communities. Despite the advances in automated binning algorithms, their capabilities in recovering MAGs with accuracy and biological relevance are so far limited. Researchers often find that human involvement is necessary to achieve representative binning results. This manual process however is expertise demanding and labor intensive, and it deserves to be supported by software infrastructure.

    Results

    We present BinaRena, a comprehensive and versatile graphic interface dedicated to aiding human operators to explore metagenome assemblies via customizable visualization and to associate contigs with bins. Contigs are rendered as an interactive scatter plot based on various data types, including sequence metrics, coverage profiles, taxonomic assignments, and functional annotations. Various contig-level operations are permitted, such as selection, masking, highlighting, focusing, and searching. Binning plans can be conveniently edited, inspected, and compared visually or using metrics including silhouette coefficient and adjusted Rand index. Completeness and contamination of user-selected contigs can be calculated in real time.

    In demonstration of BinaRena’s usability, we show that it facilitated biological pattern discovery, hypothesis generation, and bin refinement in a complex tropical peatland metagenome. It enabled isolation of pathogenic genomes within closely related populations from the gut microbiota of diarrheal human subjects. It significantly improved overall binning quality after curating results of automated binners using a simulated marine dataset.

    Conclusions

    BinaRena is an installation-free, dependency-free, client-end web application that operates directly in any modern web browser, facilitating ease of deployment and accessibility for researchers of all skill levels. The program is hosted at https://github.com/qiyunlab/binarena, together with documentation, tutorials, example data, and a live demo. It effectively supports human researchers in intuitive interpretation and fine tuning of metagenomic data.

     
  5. The importance of fault tolerance continues to increase for HPC applications. The continued growth in size and complexity of HPC systems, and of the applications themselves, is leading to an increased likelihood of failures during execution. However, most HPC programming models do not have a built-in fault tolerance mechanism. Instead, application developers usually rely on external support such as application-level checkpoint-restart (C/R) libraries to make their codes fault tolerant. However, this increases the burden on the application developer, who must use the libraries carefully to ensure correct behavior and to minimize the overheads. The C/R routines will be employed to save the values of all needed program variables at the places in the code where they are invoked. It is important for correctness that the program data is in a consistent state at these places. It is non-trivial to determine such points in OpenSHMEM, which relies upon single-sided communications to provide high performance. The amount of data to be collected, and the frequency with which this is performed, must also be carefully tuned, as the overheads introduced by C/R calls can be extremely high. There is very little prior work on checkpoint-restart support in the context of the OpenSHMEM programming interface. In this paper, we introduce OpenSHMEM and describe the challenges it poses for checkpointing. We identify the safest places for inserting C/R calls in an OpenSHMEM program and describe a straightforward approach for identifying the data that needs to be checkpointed at these positions in the code. We provide these two functionalities in a tool that exploits compiler analyses to propose checkpoints and the sets of data for saving at them, to the application developer.