

Title: Randomization‐based statistical inference: A resampling and simulation infrastructure
Summary

Statistical inference involves drawing scientifically‐based conclusions describing natural processes or observable phenomena from datasets with intrinsic random variation.

We designed, implemented and validated a new portable randomization‐based statistical inference infrastructure (http://socr.umich.edu/HTML5/Resampling_Webapp) that blends research‐driven data analytics and interactive learning, and provides a backend computational library for managing large amounts of simulated or user‐provided data. The core of this framework is a modern randomization webapp, which may be invoked on any device supporting a JavaScript‐enabled web browser. We demonstrate the use of these resources to analyse proportion, mean and other statistics using simulated (virtual experiments) and observed (e.g. Acute Myocardial Infarction, Job Rankings) data. Finally, we draw parallels between parametric inference methods and their distribution‐free alternatives.
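As an illustration of the randomization approach described above, the following is a minimal Python sketch of a two‐sample permutation test for a difference in means. It is not the webapp's implementation (which runs in a JavaScript‐enabled browser); the function name `permutation_test`, its parameters and the sample values are hypothetical and chosen only for illustration.

```python
import random


def permutation_test(group_a, group_b, n_resamples=10_000, seed=0):
    """Two-sided randomization test for a difference in group means.

    Hypothetical helper for illustration only; not part of the SOCR webapp.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n_a = len(group_a)
    observed = sum(group_a) / n_a - sum(group_b) / len(group_b)

    pooled = list(group_a) + list(group_b)
    n_b = len(pooled) - n_a
    extreme = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the group labels are exchangeable,
        # so reshuffle the pooled values and split them again.
        rng.shuffle(pooled)
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b
        # Small tolerance guards against floating-point rounding in ties.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1

    # Add-one correction keeps the Monte Carlo p-value strictly positive.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value


# Made-up measurements, used only to exercise the function.
treatment = [12.1, 14.3, 13.8, 15.2, 14.9, 13.5]
control = [11.0, 12.2, 11.8, 12.9, 11.5, 12.4]

observed_diff, p = permutation_test(treatment, control)
print(f"observed difference in means: {observed_diff:.3f}")
print(f"approximate two-sided p-value: {p:.4f}")
```

The parametric counterpart of this sketch would be a two‐sample t‐test; the permutation version replaces the assumed sampling distribution with one generated by reshuffling the observed data.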

The Randomization and Resampling webapp can be used for data analytics, as well as for formal, in‐class and informal, out‐of‐the‐classroom learning and teaching of different scientific concepts. Such concepts include sampling, random variation, computational statistical inference and data‐driven analytics. The entire scientific community may utilize, test, expand, modify or embed these resources (data, source‐code, learning activity, webapp) without any restrictions.

 
NSF-PAR ID:
10246842
Publisher / Repository:
Wiley-Blackwell
Date Published:
Journal Name:
Teaching Statistics
Volume:
40
Issue:
2
ISSN:
0141-982X
Page Range / eLocation ID:
p. 64-73
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Background

    Direct-sequencing technologies, such as Oxford Nanopore’s, are delivering long RNA reads with great efficacy and convenience. These technologies afford an ability to detect post-transcriptional modifications at a single-molecule resolution, promising new insights into the functional roles of RNA. However, realizing this potential requires new tools to analyze and explore this type of data.

    Result

    Here, we present Sequoia, a visual analytics tool that allows users to interactively explore nanopore sequences. Sequoia combines a Python-based backend with a multi-view visualization interface, enabling users to import raw nanopore sequencing data in Fast5 format, cluster sequences based on electric-current similarities, and drill down into signals to identify properties of interest. We demonstrate the application of Sequoia by generating and analyzing ~500k reads from direct RNA sequencing data of the human HeLa cell line. We focus on comparing signal features from m6A and m5C RNA modifications as the first step towards building automated classifiers. We show how, through iterative visual exploration and tuning of dimensionality reduction parameters, we can separate modified RNA sequences from their unmodified counterparts. We also document new, qualitative signal signatures that distinguish these modifications from otherwise normal RNA bases, which we were able to discover from the visualization.

    Conclusions

    Sequoia’s interactive features complement existing computational approaches in nanopore-based RNA workflows. The insights gleaned through visual analysis should help users in developing rationales, hypotheses, and insights into the dynamic nature of RNA. Sequoia is available at https://github.com/dnonatar/Sequoia.

     
  2. Abstract

    Communicating and interpreting uncertainty in ecological model predictions is notoriously challenging, motivating the need for new educational tools, which introduce ecology students to core concepts in uncertainty communication. Ecological forecasting, an emerging approach to estimate future states of ecological systems with uncertainty, provides a relevant and engaging framework for introducing uncertainty communication to undergraduate students, as forecasts can be used as decision support tools for addressing real‐world ecological problems and are inherently uncertain. To provide critical training on uncertainty communication and introduce undergraduate students to the use of ecological forecasts for guiding decision‐making, we developed a hands‐on teaching module within the Macrosystems Environmental Data‐Driven Inquiry and Exploration (EDDIE; MacrosystemsEDDIE.org) educational program. Our module used an active learning approach by embedding forecasting activities in an R Shiny application to engage ecology students in introductory data science, ecological modeling, and forecasting concepts without needing advanced computational or programming skills. Pre‐ and post‐module assessment data from more than 250 undergraduate students enrolled in ecology, freshwater ecology, and zoology courses indicate that the module significantly increased students' ability to interpret forecast visualizations with uncertainty, identify different ways to communicate forecast uncertainty for diverse users, and correctly define ecological forecasting terms. Specifically, students were more likely to describe visual, numeric, and probabilistic methods of uncertainty communication following module completion. Students were also able to identify more benefits of ecological forecasting following module completion, with the key benefits of using forecasts for prediction and decision‐making most commonly described. These results show promise for introducing ecological model uncertainty, data visualizations, and forecasting into undergraduate ecology curricula via software‐based learning, which can increase students' ability to engage and understand complex ecological concepts.

     
  3. Abstract

    Floral odours play an important role in attracting insect pollinators. Because pollinators visit flowers to obtain pollen and nectar rewards, they should prefer floral odour profiles associated with the highest‐rewarding flowers (honest signals). In previous work, bumblebees exhibited a preference for flowers from outbred over inbred Mimulus guttatus plants. Pollen is the only floral reward in M. guttatus, and pollen viability (a reliable indicator of protein content) is reduced in inbred plants. Yet, differences in pollen viability did not explain the observed preferences.

    In this study, we examined the floral volatile profiles of inbred and outbred M. guttatus to identify inbreeding effects and associations between volatile compounds and the number of viable pollen grains per flower, designated “PRQ” (pollen reward quality). We also conducted pairwise choice tests with Bombus impatiens to evaluate the ability of bees to discriminate between odours of rewarding and non‐rewarding flowers and to determine whether bumblebee preferences are explained by differences in the floral odours of inbred and outbred plants.

    Inbred plants exhibited reduced emission of β‐trans‐bergamotene, the second‐most abundant compound in the volatile blend of outbred plants. Furthermore, pollen and fertile anthers emitted nonadecane. Six other compounds in the floral blend were positively correlated with PRQ. There was no overlap between compounds affected by inbreeding and compounds associated with PRQ.

    Even when given prior experience foraging on M. guttatus, bumblebees did not distinguish between the floral odours of rewarding and non‐rewarding outbred plants. However, they preferred floral odours from non‐rewarding outbred plants over rewarding inbred plants. Bumblebees without prior experience of flowers preferred volatile blends with higher versus lower amounts of β‐trans‐bergamotene.

    Taken together, these results suggest that the volatile emissions of M. guttatus provide reliable indicators of pollen rewards (potential honest signals), but that the preference of bumblebees for outbred plants is not driven by these cues but rather by a sensory bias for β‐trans‐bergamotene. This may represent a subtle form of deceit‐pollination that allows plants to attract pollinators while minimizing investment in costly rewards.

    A plain language summary is available for this article.

     
  4. Abstract

    The Soybean Gene Atlas project provides a comprehensive map for understanding gene expression patterns in major soybean tissues from flower, root, leaf, nodule, seed, and shoot and stem. The RNA‐Seq data generated in the project serve as a valuable resource for discovering tissue‐specific transcriptome behavior of soybean genes in different tissues. We developed a computational pipeline for Soybean context‐specific network (SoyCSN) inference with a suite of prediction tools to analyze, annotate, retrieve, and visualize soybean context‐specific networks at both transcriptome and interactome levels. BicMix and Cross‐Conditions Cluster Detection algorithms were applied to detect modules based on co‐expression relationships across all the tissues. Soybean context‐specific interactomes were predicted by combining soybean tissue gene expression and protein–protein interaction data. Functional analyses of these predicted networks provide insights into soybean tissue specificities. For example, under symbiotic, nitrogen‐fixing conditions, the constructed soybean leaf network highlights the connection between the photosynthesis function and rhizobium–legume symbiosis. SoyCSN data and all its results are publicly available via an interactive web service within the Soybean Knowledge Base (SoyKB) at http://soykb.org/SoyCSN. SoyCSN provides a useful web‐based access for exploring context specificities systematically in gene regulatory mechanisms and gene relationships for soybean researchers and molecular breeders.

     
  5. Abstract

    Toytree is a lightweight Python library for programmatically visualizing and manipulating tree‐based data structures. It implements a minimalist design aesthetic and modern plotting architecture suited for interactive coding in IPython/Jupyter.

    Tree drawings are generated in HTML using the toyplot library backend, and display natively in Jupyter notebooks with interactivity features. Tree drawings can be combined with other plotting functions from the toyplot library (e.g. scatterplots, histograms) to create composite figures on a shared coordinate grid, and can be exported to additional formats including PNG, PDF and SVG.

    To parse and store tree data, toytree uses a modified fork of the ete3 TreeNode object, which includes functions for manipulating, annotating and comparing trees. Toytree integrates these functions with a plotting layout to allow node values to be extracted from trees in the correct order to style nodes for plotting. In addition, toytree provides functions for parsing additional tree formats, generating random trees, inferring consensus trees and drawing grids or clouds from multiple trees to visualize discordance.

    The goal of toytree is to provide a simple Python equivalent to commonly used tree manipulation and plotting libraries in R, and in doing so, to promote further development of phylogenetic and other tree‐based methods in Python. Toytree is released under the GPLv3 license. Source code is available on GitHub and documentation is available at https://toytree.readthedocs.io.

     