


Creators/Authors contains: "Shasha, Dennis"


  1. Database and data structure research can improve machine learning performance in many ways. One way is to design better algorithms on data structures. This paper combines incremental computation with sequential and probabilistic filtering to enable “forgetful” tree-based learning algorithms to cope with streaming data that suffers from concept drift. (Concept drift occurs when the functional mapping from input to classification changes over time.) The forgetful algorithms described in this paper achieve high performance while maintaining high-quality predictions on streaming data. Specifically, the algorithms are up to 24 times faster than state-of-the-art incremental algorithms with at most a 2% loss of accuracy, or at least twice as fast with no loss of accuracy. This makes such structures suitable for high-volume streaming applications.
    Free, publicly-accessible full text available June 1, 2024
  2. Abstract

    Film post-production can be slow and expensive. Much of the work involves a person or group of people, called metadata taggers, going through each individual piece of media and marking it up with relevant tags, such as the scene number, transcripts, and the type of shot for video footage. This task is particularly time-consuming for films with high shooting ratios (i.e., footage shot/footage shown). AutoTag automates much of the tagging process across 16 languages, saving both time and money. We describe the algorithms and implementation of AutoTag and report on some case studies.

     
  3. Abstract

    This study evaluates both a variety of existing base causal inference methods and a variety of ensemble methods. We show that: (i) base network inference methods vary in their performance across different datasets, so a method that works poorly on one dataset may work well on another; (ii) a non-homogeneous ensemble method in the form of a Naive Bayes classifier leads overall to as good or better results than using the best single base method or any other ensemble method; (iii) for the best results, the ensemble method should integrate all methods that satisfy a statistical test of normality on training data. The resulting ensemble model EnsInfer easily integrates all kinds of RNA-seq data as well as new and existing inference methods. The paper categorizes and reviews state-of-the-art underlying methods, describes the EnsInfer ensemble approach in detail, and presents experimental results. The source code and data used will be made available to the community upon publication.

     
  4. Multicopy search structures such as log-structured merge (LSM) trees are optimized for high insert/update/delete (collectively known as upsert) performance. In such data structures, an upsert on key k, which adds (k, v) where v can be a value or a tombstone, is added to the root node even if k is already present in other nodes. Thus there may be multiple copies of k in the search structure. A search on k aims to return the value associated with the most recent upsert. We present a general framework for verifying linearizability of concurrent multicopy search structures that abstracts from the underlying representation of the data structure in memory, enabling proof reuse across diverse implementations. Based on our framework, we propose template algorithms for (a) LSM structures forming arbitrary directed acyclic graphs and (b) differential file structures, and formally verify these templates in the concurrent separation logic Iris. We also instantiate the LSM template to obtain the first verified concurrent in-memory LSM tree implementation.
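The upsert and search behavior described in item 4 can be illustrated with a minimal, single-threaded Python sketch. It is not the paper's template algorithm and is neither concurrent nor verified. The class name `MultiCopyStore`, the `TOMBSTONE` marker, the `root_capacity` parameter, and the "push a fresh root when the current one is full" policy are all assumptions made for illustration only.

```python
from typing import Dict, Hashable, List, Optional

# Hypothetical marker meaning "this key was deleted".
TOMBSTONE = object()


class MultiCopyStore:
    """Toy multicopy structure: a chain of nodes, newest (root) first."""

    def __init__(self, root_capacity: int = 2) -> None:
        self.root_capacity = root_capacity
        # nodes[0] is the root; later entries are progressively older nodes.
        self.nodes: List[Dict[Hashable, object]] = [{}]

    def upsert(self, key: Hashable, value: object) -> None:
        """Write (key, value) to the root, even if older nodes hold the key."""
        root = self.nodes[0]
        if key not in root and len(root) >= self.root_capacity:
            # Assumed flush policy: the full root becomes an older node.
            self.nodes.insert(0, {})
        self.nodes[0][key] = value

    def delete(self, key: Hashable) -> None:
        """A delete is an upsert whose value is a tombstone."""
        self.upsert(key, TOMBSTONE)

    def search(self, key: Hashable) -> Optional[object]:
        """Return the value of the most recent upsert, or None if absent."""
        for node in self.nodes:  # newest to oldest
            if key in node:
                value = node[key]
                return None if value is TOMBSTONE else value
        return None

    def copies(self, key: Hashable) -> int:
        """Count how many nodes currently hold a copy of the key."""
        return sum(1 for node in self.nodes if key in node)


if __name__ == "__main__":
    store = MultiCopyStore(root_capacity=2)
    for k, v in [("a", 1), ("b", 2), ("c", 3), ("a", 10)]:
        store.upsert(k, v)
    print(store.search("a"), store.copies("a"))
```

Because a search walks the nodes from newest to oldest and stops at the first copy it finds, older copies of a key are shadowed and never need to be updated in place. That is the property that keeps upserts cheap in this kind of structure.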
  5.