skip to main content


Title: Visualizing the tape of life: exploring evolutionary history with virtual reality
Understanding the evolutionary dynamics created by a given evolutionary algorithm is a critical step in determining which ones are most likely to produce desirable outcomes for a given problem. While it is relatively easy to come up with hypotheses that could plausibly explain observed evolutionary outcomes, we often fail to take the next step of confirming that our proposed mechanism accurately describes the underlying evolutionary dynamics. Visualization is a powerful tool for exploring evolutionary history as it actually played out. We can create visualizations that summarize the evolutionary history of a population or group of populations by drawing representative lineages on top of the fitness landscape being traversed. This approach integrates information about the adaptations that took place with information about the evolutionary pressures they were being subjected to as they evolved. However, these visualizations can be challenging to depict on a two-dimensional surface, as they integrate multiple forms of three-dimensional (or more) data. Here, we propose an alternative: taking advantage of recent advances in virtual reality to view evolutionary history in three dimensions. This technique produces an intuitive and detailed illustration of evolutionary processes. A demo of our visualization is available here: https://emilydolson.github.io/fitness_landscape_visualizations.  more » « less
Award ID(s):
1655715
NSF-PAR ID:
10104620
Author(s) / Creator(s):
;
Date Published:
Journal Name:
Proceedings of the Genetic and Evolutionary Computation Conference Companion on - GECCO '18
Page Range / eLocation ID:
1553 to 1559
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The ongoing COVID-19 pandemic has justifiably captured the attention of people around the world since late 2019. It has produced in many people a new perspective on or, indeed, a new realization about our potential vulnerability to emerging infectious diseases. However, our species has experienced numerous catastrophic disease pandemics in the past, and in addition to concerns about the harm being produced during the pandemic and the potential long-term sequelae of the disease, what has been frustrating for many public health experts, anthropologists, and historians is awareness that many of the outcomes of COVID-19 are not inevitable and might have been preventable had we actually heeded lessons from the past. We are currently witnessing variation in exposure risk, symptoms, and mortality from COVID-19, but these patterns are not surprising given what we know about past pandemics. We review here the literature on the demographic and evolutionary consequences of the Second Pandemic of Plague (ca. fourteenth–nineteenth centuries C.E.) and the 1918 influenza pandemic, two of the most devastating pandemics in recorded human history. These both provide case studies of the ways in which sociocultural and environmental contexts shape the experiences and outcomes of pandemic disease. Many of the factors at work during these past pandemics continue to be reproduced in modern contexts, and ultimately our hope is that by highlighting the outcomes that are at least theoretically preventable, we can leverage our knowledge about past experiences to prepare for and respond to disease today.

     
    more » « less
  2. Obeid, Iyad ; Selesnick, Ivan ; Picone, Joseph (Ed.)
    The Temple University Hospital Seizure Detection Corpus (TUSZ) [1] has been in distribution since April 2017. It is a subset of the TUH EEG Corpus (TUEG) [2] and the most frequently requested corpus from our 3,000+ subscribers. It was recently featured as the challenge task in the Neureka 2020 Epilepsy Challenge [3]. A summary of the development of the corpus is shown below in Table 1. The TUSZ Corpus is a fully annotated corpus, which means every seizure event that occurs within its files has been annotated. The data is selected from TUEG using a screening process that identifies files most likely to contain seizures [1]. Approximately 7% of the TUEG data contains a seizure event, so it is important we triage TUEG for high yield data. One hour of EEG data requires approximately one hour of human labor to complete annotation using the pipeline described below, so it is important from a financial standpoint that we accurately triage data. A summary of the labels being used to annotate the data is shown in Table 2. Certain standards are put into place to optimize the annotation process while not sacrificing consistency. Due to the nature of EEG recordings, some records start off with a segment of calibration. This portion of the EEG is instantly recognizable and transitions from what resembles lead artifact to a flat line on all the channels. For the sake of seizure annotation, the calibration is ignored, and no time is wasted on it. During the identification of seizure events, a hard “3 second rule” is used to determine whether two events should be combined into a single larger event. This greatly reduces the time that it takes to annotate a file with multiple events occurring in succession. In addition to the required minimum 3 second gap between seizures, part of our standard dictates that no seizure less than 3 seconds be annotated. Although there is no universally accepted definition for how long a seizure must be, we find that it is difficult to discern with confidence between burst suppression or other morphologically similar impressions when the event is only a couple seconds long. This is due to several reasons, the most notable being the lack of evolution which is oftentimes crucial for the determination of a seizure. After the EEG files have been triaged, a team of annotators at NEDC is provided with the files to begin data annotation. An example of an annotation is shown in Figure 1. A summary of the workflow for our annotation process is shown in Figure 2. Several passes are performed over the data to ensure the annotations are accurate. Each file undergoes three passes to ensure that no seizures were missed or misidentified. The first pass of TUSZ involves identifying which files contain seizures and annotating them using our annotation tool. The time it takes to fully annotate a file can vary drastically depending on the specific characteristics of each file; however, on average a file containing multiple seizures takes 7 minutes to fully annotate. This includes the time that it takes to read the patient report as well as traverse through the entire file. Once an event has been identified, the start and stop time for the seizure is stored in our annotation tool. This is done on a channel by channel basis resulting in an accurate representation of the seizure spreading across different parts of the brain. Files that do not contain any seizures take approximately 3 minutes to complete. Even though there is no annotation being made, the file is still carefully examined to make sure that nothing was overlooked. In addition to solely scrolling through a file from start to finish, a file is often examined through different lenses. Depending on the situation, low pass filters are used, as well as increasing the amplitude of certain channels. These techniques are never used in isolation and are meant to further increase our confidence that nothing was missed. Once each file in a given set has been looked at once, the annotators start the review process. The reviewer checks a file and comments any changes that they recommend. This takes about 3 minutes per seizure containing file, which is significantly less time than the first pass. After each file has been commented on, the third pass commences. This step takes about 5 minutes per seizure file and requires the reviewer to accept or reject the changes that the second reviewer suggested. Since tangible changes are made to the annotation using the annotation tool, this step takes a bit longer than the previous one. Assuming 18% of the files contain seizures, a set of 1,000 files takes roughly 127 work hours to annotate. Before an annotator contributes to the data interpretation pipeline, they are trained for several weeks on previous datasets. A new annotator is able to be trained using data that resembles what they would see under normal circumstances. An additional benefit of using released data to train is that it serves as a means of constantly checking our work. If a trainee stumbles across an event that was not previously annotated, it is promptly added, and the data release is updated. It takes about three months to train an annotator to a point where their annotations can be trusted. Even though we carefully screen potential annotators during the hiring process, only about 25% of the annotators we hire survive more than one year doing this work. To ensure that the annotators are consistent in their annotations, the team conducts an interrater agreement evaluation periodically to ensure that there is a consensus within the team. The annotation standards are discussed in Ochal et al. [4]. An extended discussion of interrater agreement can be found in Shah et al. [5]. The most recent release of TUSZ, v1.5.2, represents our efforts to review the quality of the annotations for two upcoming challenges we hosted: an internal deep learning challenge at IBM [6] and the Neureka 2020 Epilepsy Challenge [3]. One of the biggest changes that was made to the annotations was the imposition of a stricter standard for determining the start and stop time of a seizure. Although evolution is still included in the annotations, the start times were altered to start when the spike-wave pattern becomes distinct as opposed to merely when the signal starts to shift from background. This cuts down on background that was mislabeled as a seizure. For seizure end times, all post ictal slowing that was included was removed. The recent release of v1.5.2 did not include any additional data files. Two EEG files had been added because, originally, they were corrupted in v1.5.1 but were able to be retrieved and added for the latest release. The progression from v1.5.0 to v1.5.1 and later to v1.5.2, included the re-annotation of all of the EEG files in order to develop a confident dataset regarding seizure identification. Starting with v1.4.0, we have also developed a blind evaluation set that is withheld for use in competitions. The annotation team is currently working on the next release for TUSZ, v1.6.0, which is expected to occur in August 2020. It will include new data from 2016 to mid-2019. This release will contain 2,296 files from 2016 as well as several thousand files representing the remaining data through mid-2019. In addition to files that were obtained with our standard triaging process, a part of this release consists of EEG files that do not have associated patient reports. Since actual seizure events are in short supply, we are mining a large chunk of data for which we have EEG recordings but no reports. Some of this data contains interesting seizure events collected during long-term EEG sessions or data collected from patients with a history of frequent seizures. It is being mined to increase the number of files in the corpus that have at least one seizure event. We expect v1.6.0 to be released before IEEE SPMB 2020. The TUAR Corpus is an open-source database that is currently available for use by any registered member of our consortium. To register and receive access, please follow the instructions provided at this web page: https://www.isip.piconepress.com/projects/tuh_eeg/html/downloads.shtml. The data is located here: https://www.isip.piconepress.com/projects/tuh_eeg/downloads/tuh_eeg_artifact/v2.0.0/. 
    more » « less
  3. Fine-scale evolutionary dynamics can be challenging to tease out when focused on the broad brush strokes of whole populations over long time spans. We propose a suite of diagnostic analysis techniques that operate on lineages and phylogenies in digital evolution experiments, with the aim of improving our capacity to quantitatively explore the nuances of evolutionary histories in digital evolution experiments. We present three types of lineage measurements: lineage length, mutation accumulation, and phenotypic volatility. Additionally, we suggest the adoption of four phylogeny measurements from biology: phylogenetic richness, phylogenetic divergence, phylogenetic regularity, and depth of the most-recent common ancestor. In addition to quantitative metrics, we also discuss several existing data visualizations that are useful for understanding lineages and phylogenies: state sequence visualizations, fitness landscape overlays, phylogenetic trees, and Muller plots. We examine the behavior of these metrics (with the aid of data visualizations) in two well-studied computational contexts: (1) a set of two-dimensional, real-valued optimization problems under a range of mutation rates and selection strengths, and (2) a set of qualitatively different environments in the Avida digital evolution platform. These results confirm our intuition about how these metrics respond to various evolutionary conditions and indicate their broad value. 
    more » « less
  4. Abstract. While data on human behavior in COVID-19 rich environments have been captured and publicly released, spatial components of such data are recorded in two-dimensions. Thus, the complete roles of the built and natural environment cannot be readily ascertained. This paper introduces a mechanism for the three-dimensional (3D) visualization of egress behaviors of individuals leaving a COVID-19 exposed healthcare facility in Spring 2020 in New York City. Behavioral data were extracted and projected onto a 3D aerial laser scanning point cloud of the surrounding area rendered with Potree, a readily available open-source Web Graphics Library (WebGL) point cloud viewer. The outcomes were 3D heatmap visualizations of the built environment that indicated the event locations of individuals exhibiting specific characteristics (e.g., men vs. women; public transit users vs. private vehicle users). These visualizations enabled interactive navigation through the space accessible through any modern web browser supporting WebGL. Visualizing egress behavior in this manner may highlight patterns indicative of correlations between the environment, human behavior, and transmissible diseases. Findings using such tools have the potential to identify high-exposure areas and surfaces such as doors, railings, and other physical features. Providing flexible visualization capabilities with 3D spatial context can enable analysts to quickly advise and communicate vital information across a broad range of use cases. This paper presents such an application to extract the public health information necessary to form localized responses to reduce COVID-19 infection and transmission rates in urban areas. 
    more » « less
  5. null (Ed.)
    In the United States, every state has a tourism website. These sites highlight the main attractions of the state, travel tips, and blog posts among other relevant information. The funding for these websites often comes from occupancy taxes, a form of taxes that comes from tourists who stay in hotels and visit attractions. Therefore, current and past tourists fund the efforts to draw future tourists into the state. Since state tourism is funded by the success of past tourism efforts, it is important for researchers to spend their time and resources on finding out what efforts were successful and which weren’t. With this comes the importance of seeing trends in past tourism endeavors. By examining past tourism websites, patterns can be drawn about information that changed, from season to season and year to year. These patterns can be used to see what researchers deemed as successful tourism efforts, and help guide future state tourism decisions. Our client, Dr. Florian Zach of the Howard Feiertag Department of Hospitality and Tourism Management, wants to use this historical analysis on state tourism information to help with his research on trends in state tourism website content. Iterations of the California state tourism website, among other sites, are stored as snapshots on the Internet Archive and can be accessed to see changes in websites over time. Our team was given Parquet files of these snapshots dating back to 2008. The goal of the project was to assist Dr. Zach by using the California state tourism website, visitcalifornia.com, and these snapshots as an avenue to explore data extraction and visualization techniques on tourism patterns to later be expanded to other states’ tourism websites. Python’s Pandas library was utilized to examine and extract relevant pieces of data from the given Parquet files. Once the data was extracted, we used Python’s Natural Language Processing Toolkit to remove non-English words, punctuation, and a set of unimportant “stop words”. With this refined data, we were able to make visualizations regarding the frequency of words in the headers and body of the website snapshots. The data was examined in its entirety as well as in groups of seasons and years. Microsoft Excel functions were utilized to examine and visualize the data in these formats. These data extraction and visualization techniques that we became familiar with will be passed down to a future team. The research on state tourism site information can be expanded to different metadata sets and to other states. 
    more » « less