Title: Experimental Survey on Power Dissipation of k-mer-Handling Data Structures for Mobile Bioinformatics
Mobile sequencing technologies, including Oxford Nanopore’s MinION, Mk1C, and SmidgION, are bringing genomics into the palm of a hand, opening unprecedented opportunities in clinical and ecological research and translational applications. While sequencers now need only a USB outlet and provide on-board preprocessing (e.g., base calling), the main data analysis phases are tied to an available broadband Internet connection and cloud computing. Yet the ubiquity of tablets and smartphones, along with their increase in computational power, makes them a perfect candidate for enabling mobile/edge bioinformatics analytics. Also, in on-site experimental settings tablets and smartphones are preferable to standard computers due to resilience to humidity or spills, and ease of sterilization. We here present an experimental study on power dissipation, aiming at reducing the battery consumption that currently impedes the execution of intensive bioinformatics analytics pipelines. In particular, we investigated the effects of assorted data structures (including hash tables, vectors, balanced trees, tries) employed in some of the most common tasks of a bioinformatics pipeline: k-mer representation and counting. By employing a thermal camera, we show how different k-mer-handling data structures impact the power dissipation on a smartphone, finding that a cache-oblivious data structure reduces power dissipation (up to 26% better than others). In conclusion, the choice of data structures in mobile bioinformatics must consider not only computing efficiency (e.g., succinct data structures to reduce RAM usage), but also power consumption of mobile devices that heavily rely on batteries in order to function.
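To make the k-mer counting task concrete, the following is a minimal sketch in Python, not the authors' implementation. It counts k-mers with two of the container families named in the abstract: a hash table and a sorted vector. The function names `count_kmers_hash` and `count_kmers_sorted` are hypothetical, and the sketch says nothing about power dissipation, which the study measured on a smartphone with a thermal camera.

```python
from collections import Counter


def count_kmers_hash(seq, k):
    """Count k-mers with a hash table (Counter is a dict subclass)."""
    counts = Counter()
    # Slide a window of length k over the sequence, one base at a time.
    for i in range(len(seq) - k + 1):
        counts[seq[i:i + k]] += 1
    return counts


def count_kmers_sorted(seq, k):
    """Count k-mers with a sorted vector followed by run-length counting."""
    kmers = sorted(seq[i:i + k] for i in range(len(seq) - k + 1))
    runs = []
    for kmer in kmers:
        if runs and runs[-1][0] == kmer:
            runs[-1][1] += 1        # same k-mer as the previous one: extend the run
        else:
            runs.append([kmer, 1])  # new k-mer: start a new run
    return [(kmer, count) for kmer, count in runs]


if __name__ == "__main__":
    read = "ACGTACGT"  # illustrative toy read, not data from the study
    print(count_kmers_hash(read, 3))
    print(count_kmers_sorted(read, 3))
```

Both functions return the same counts; they differ in memory access pattern (scattered hash lookups versus a sort followed by one sequential scan), which is the kind of difference the study relates to power dissipation.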
Award ID(s):
2013998
NSF-PAR ID:
10389121
Author(s) / Creator(s):
;
Date Published:
Journal Name:
2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
Page Range / eLocation ID:
3201 to 3206
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Autonomous mobile robots (AMRs) have been widely utilized in industry to execute various on-board computer-vision applications including autonomous guidance, security patrol, object detection, and face recognition. Most of the applications executed by an AMR involve the analysis of camera images through trained machine learning models. Many research studies on machine learning focus either on performance without considering energy efficiency or on techniques such as pruning and compression to make the model more energy-efficient. However, most previous work does not study the root causes of energy inefficiency for the execution of those applications on AMRs. The computing stack on an AMR accounts for 33% of the total energy consumption and can thus highly impact the battery life of the robot. Because recharging an AMR may disrupt the application execution, it is important to efficiently utilize the available energy for maximized battery life. In this paper, we first analyze the breakdown of power dissipation for the execution of computer-vision applications on AMRs and discover three main root causes of energy inefficiency: uncoordinated access to sensor data, performance-oriented model inference execution, and uncoordinated execution of concurrent jobs. In order to fix these three inefficiencies, we propose E2M, an energy-efficient middleware software stack for autonomous mobile robots. First, E2M regulates the access of different processes to sensor data, e.g., camera frames, so that the amount of data actually captured by concurrently executing jobs can be minimized. Second, based on a predefined per-process performance metric (e.g., safety, accuracy) and desired target, E2M manipulates the process execution period to find the best energy-performance trade-off. Third, E2M coordinates the execution of the concurrent processes to maximize the total contiguous sleep time of the computing hardware for maximized energy savings.
We have implemented a prototype of E2M on a real-world AMR. Our experimental results show that, compared to several baselines, E2M leads to 24% energy savings for the computing platform, which translates into an extra 11.5% of battery time and 14 extra minutes of robot runtime, with a performance degradation lower than 7.9% for safety and 1.84% for accuracy. 
  2. Motivation: Despite numerous RNA-seq samples available at large databases, most RNA-seq analysis tools are evaluated on a limited number of RNA-seq samples. This drives a need for methods to select a representative subset from all available RNA-seq samples to facilitate comprehensive, unbiased evaluation of bioinformatics tools. In sequence-based approaches for representative set selection (e.g. a k-mer counting approach that selects a subset based on k-mer similarities between RNA-seq samples), because of the large numbers of available RNA-seq samples and of k-mers/sequences in each sample, computing the full similarity matrix using k-mers/sequences for the entire set of RNA-seq samples in a large database (e.g. the SRA) has memory and runtime challenges; this makes direct representative set selection infeasible with limited computing resources. Results: We developed a novel computational method called ‘hierarchical representative set selection’ to handle this challenge. Hierarchical representative set selection is a divide-and-conquer-like algorithm that breaks representative set selection into sub-selections and hierarchically selects representative samples through multiple levels. We demonstrate that hierarchical representative set selection can achieve summarization quality close to that of direct representative set selection, while largely reducing runtime and memory requirements of computing the full similarity matrix (up to 8.4× runtime reduction and 5.35× memory reduction for 10 000 and 12 000 samples respectively that could be practically run with direct subset selection). We show that hierarchical representative set selection substantially outperforms random sampling on the entire SRA set of RNA-seq samples, making it a practical solution to representative set selection on large databases like the SRA.
Availability and implementation: The code is available at https://github.com/Kingsford-Group/hierrepsetselection and https://github.com/Kingsford-Group/jellyfishsim. Supplementary information: Supplementary data are available at Bioinformatics online.
  3. Characterization of antibiotic resistance genes (ARGs) from high-throughput sequencing data of metagenomics and cultured bacterial samples is a challenging task, with the need to account for both computational (e.g., string algorithms) and biological (e.g., gene transfers, rearrangements) aspects. Curated ARG databases exist together with assorted ARG classification approaches (e.g., database alignment, machine learning). Besides ARGs that naturally occur in bacterial strains or are acquired through mobile elements, there are chromosomal genes that can render a bacterium resistant to antibiotics through point mutations, i.e., ARG variants (ARGVs). While ARG repositories also collect ARGVs, there are only a few tools that are able to identify ARGVs from metagenomics and high throughput sequencing data, with a number of limitations (e.g., pre-assembly, a posteriori verification of mutations, or specification of species). In this work we present the k-mer, i.e., strings of fixed length k, ARGV analyzer – KARGVA – an open-source, multi-platform tool that provides: (i) an ad hoc, large ARGV database derived from multiple sources; (ii) input capability for various types of high-throughput sequencing data; (iii) a three-way, hash-based, k-mer search setup to process data efficiently, linking k-mers to ARGVs, k-mers to point mutations, and ARGVs to k-mers, respectively; (iv) a statistical filter on sequence classification to reduce type I and II errors. On semi-synthetic data, KARGVA provides very high accuracy even in presence of high sequencing errors or mutations (99.2 and 86.6% accuracy within 1 and 5% base change rates, respectively), and genome rearrangements (98.2% accuracy), with robust performance on ad hoc false positive sets.
On data from the worldwide MetaSUB consortium, comprising 3,700+ metagenomics experiments, KARGVA identifies more ARGVs than Resistance Gene Identifier (4.8x) and PointFinder (6.8x), yet all predictions are below the expected false positive estimates. The prevalence of ARGVs is correlated to ARGs, but ecological characteristics do not explain ARGV variance well. KARGVA is publicly available at https://github.com/DataIntellSystLab/KARGVA under MIT license.
  4. Although Augmented Reality (AR) can be easily implemented with most smartphones and tablets today, the investigation of distance perception with these types of devices has been limited. In this paper, we question whether the distance of a virtual human, e.g., avatar, seen through a smartphone or tablet display is perceived accurately. We also investigate, due to the Covid-19 pandemic and increased sensitivity to distances to others, whether a coughing avatar that either does or does not have a mask on affects distance estimates compared to a static avatar. We performed an experiment in which all participants estimated the distances to avatars that were either static or coughing, with and without masks on. Avatars were placed at a range of distances that would be typical for interaction, i.e., action space. Data on judgments of distance to the varying avatars was collected in a distributed manner by deploying an app for smartphones. Results showed that participants were fairly accurate in estimating the distance to all avatars, regardless of coughing condition or mask condition. Such findings suggest that mobile AR applications can be used to obtain accurate estimations of distances to virtual others "in the wild," which is promising for using AR for simulations and training applications that require precise distance estimates. 
  5. Recent advances in computer vision have led to a growth of interest in deploying visual analytics models on mobile devices. However, most mobile devices have limited computing power, which prohibits them from running large-scale visual analytics neural networks. An emerging approach to solve this problem is to offload the computation of these neural networks to computing resources at an edge server. Efficient computation offloading requires optimizing the trade-off between multiple objectives including compressed data rate, analytics performance, and computation speed. In this work, we consider a “split computation” system to offload a part of the computation of the YOLO object detection model. We propose a learnable feature compression approach to compress the intermediate YOLO features with lightweight computation. We train the feature compression and decompression module together with the YOLO model to optimize the object detection accuracy under a rate constraint. Compared to baseline methods that apply either standard image compression or learned image compression at the mobile device and perform image decompression and YOLO at the edge, the proposed system achieves higher detection accuracy at the low to medium rate range. Furthermore, the proposed system requires substantially lower computation time on the mobile device with CPU only.