

Search for: All records

Award ID contains: 2007481


  1. Abstract

    Current complex prediction models are the result of fitting deep neural networks, graph convolutional networks or transducers to a set of training data. A key challenge with these models is that they are highly parameterized, which makes describing and interpreting the prediction strategies difficult. We use topological data analysis to transform these complex prediction models into a simplified topological view of the prediction landscape. The result is a map of the predictions that enables inspection of the model results with more specificity than dimensionality-reduction methods such as tSNE and UMAP. The methods scale up to large datasets across different domains. We present a case study of a transformer-based model previously designed to predict expression levels of a piece of DNA in thousands of genomic tracks. When the model is used to study mutations in the BRCA1 gene, our topological analysis shows that it is sensitive to the location of a mutation and the exon structure of BRCA1 in ways that cannot be found with tools based on dimensionality reduction. Moreover, the topological framework offers multiple ways to inspect results, including an error estimate that is more accurate than model uncertainty. Further studies show how these ideas produce useful results in graph-based learning and image classification.

     
    Free, publicly-accessible full text available November 17, 2024
  2. Free, publicly-accessible full text available May 13, 2025
  3. We provide a processed JSON version of the 3234-page PDF document of Anthony Fauci's emails, which were released in 2021 to provide a better understanding of the United States government response to the COVID-19 pandemic. The main JSON file contains 1289 email threads comprising 2761 emails, 101 of which are duplicates. For each email, we provide the sender, recipients, CC-list, subject, email body text, and email time stamp (when available). We also provide a number of derived datasets stored in individual JSON files: 5 different types of derived email networks, 1 email hypergraph, 1 temporal graph, and 3 tensors. Details of the data conversion process, the construction of the derived datasets, and subsequent analyses can all be found in an online technical report at https://arxiv.org/abs/2108.01239. Updated code for processing and analyzing the data can be found at https://github.com/nveldt/fauci-email.

    Research additionally supported by ARO Award W911NF-19-1-0057, ARO MURI, NSF CAREER Award IIS-2045555, NSF awards CCF-1909528 and IIS-2007481, and the Sloan Foundation.
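As a rough illustration of how a derived email network might be built from per-email fields like those listed in the email dataset record above (sender, recipients, CC-list), here is a minimal Python sketch. The JSON layout, the key names (`sender`, `recipients`, `cc`), the placeholder people, and the function `build_email_network` are all assumptions made for this example; they are not taken from the released files, whose actual schema is documented in the technical report and repository linked in that record.

```python
import json
from collections import Counter

# Toy stand-in for the main JSON file: a list of threads, each a list of emails.
# ASSUMPTION: this layout and these key names are hypothetical, and the people
# and messages are placeholders, not data from the released dataset.
SAMPLE = json.loads("""
[
  [
    {"sender": "person_a", "recipients": ["person_b", "person_c"],
     "cc": ["person_d"], "subject": "example", "body": "text", "time": "2020-03-01T09:00:00"},
    {"sender": "person_b", "recipients": ["person_a"],
     "cc": [], "subject": "Re: example", "body": "text", "time": null}
  ],
  [
    {"sender": "person_c", "recipients": ["person_a"],
     "cc": ["person_b"], "subject": "other", "body": "text", "time": null}
  ]
]
""")


def build_email_network(threads, include_cc=False):
    """Count directed sender -> recipient edges over all emails in all threads.

    If include_cc is True, CC'd addresses are treated as recipients too.
    Self-loops (a sender listed among their own recipients) are skipped.
    """
    edges = Counter()
    for thread in threads:
        for email in thread:
            targets = list(email["recipients"])
            if include_cc:
                targets += email["cc"]
            for target in targets:
                if target != email["sender"]:
                    edges[(email["sender"], target)] += 1
    return edges


edges = build_email_network(SAMPLE)
edges_with_cc = build_email_network(SAMPLE, include_cc=True)
total_emails = sum(len(thread) for thread in SAMPLE)
```

Whether CC'd addresses count as edge targets is one plausible way that several network variants could arise from the same emails; the sketch exposes that choice as a flag rather than asserting how the dataset's own derived networks were constructed.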