Title: Discovery and Utilization of Jazz Motifs for Computer-Generated Solos
Building on previous work in computer-generated jazz solos using probabilistic grammars, this paper describes research that extends the learning process and grammar representation used in the Impro-Visor educational music software with the concepts of motifs and motif patterns. The approach uses clustering, best-match search techniques, and probabilistic grammar rules to identify motifs and incorporate them into computer-generated solos. Motif patterns extend the technique further: they induce coherence in generated solos by learning the patterns in which motifs were used in a given set of transcriptions. This approach is implemented as a feature of the Impro-Visor software.
Award ID(s):
1659805
NSF-PAR ID:
10089617
Journal Name:
Computer Simulation of Musical Creativity
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
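
The abstract's notion of motif patterns can be illustrated with a minimal sketch. It assumes that motifs have already been identified and labeled, and that each pattern is a sequence of motif labels with a learned probability. The motif contents, pattern weights, and function names below are hypothetical and are not taken from Impro-Visor.

```python
import random

# Hypothetical motifs: each is a short list of (scale_degree, beats) pairs.
# Every motif here lasts 2 beats in total.
MOTIFS = {
    "A": [(1, 0.5), (3, 0.5), (5, 1.0)],
    "B": [(5, 0.5), (4, 0.5), (2, 1.0)],
}

# Hypothetical motif patterns with assumed probabilities, standing in for
# patterns that would be learned from a set of transcriptions.
PATTERNS = [
    (("A", "A", "B", "A"), 0.5),
    (("A", "B", "A", "B"), 0.3),
    (("A", "B", "B", "A"), 0.2),
]


def sample_pattern(rng):
    """Pick one motif pattern according to its probability."""
    patterns = [p for p, _ in PATTERNS]
    weights = [w for _, w in PATTERNS]
    return rng.choices(patterns, weights=weights, k=1)[0]


def generate_solo(rng):
    """Expand a sampled pattern into a flat list of notes."""
    pattern = sample_pattern(rng)
    notes = []
    for label in pattern:
        notes.extend(MOTIFS[label])
    return pattern, notes


if __name__ == "__main__":
    pattern, notes = generate_solo(random.Random(0))
    print(pattern)
    print(notes)
```

Because every pattern reuses the same small set of motifs, the generated line repeats material in a structured way, which is the kind of coherence the abstract attributes to motif patterns.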
More Like this
  2. Abstract Motivation

    MHC Class I proteins play an important role in immunotherapy by presenting immunogenic peptides to anti-tumor immune cells. The repertoires of peptides for various MHC Class I proteins are distinct, which is reflected in their diverse binding motifs. To characterize binding motifs for MHC Class I proteins, in vitro experiments have been conducted to screen peptides with high binding affinities to hundreds of given MHC Class I proteins. However, given the tens of thousands of known MHC Class I proteins, conducting in vitro experiments for all of them is infeasible, and thus a more efficient and scalable way to characterize binding motifs is needed.

    Results

    We present a de novo generation framework, coined PepPPO, to characterize binding motifs for any given MHC Class I protein by generating the repertoire of peptides it presents. PepPPO leverages a reinforcement learning agent with a mutation policy to mutate random input peptides into positively presented ones. Using PepPPO, we characterized binding motifs for around 10 000 known human MHC Class I proteins with and without experimental data. These computed motifs demonstrated high similarity to those derived from experimental data. In addition, we found that the motifs could be used for the rapid screening of neoantigens at a much lower time cost than previous deep-learning methods.

    Availability and implementation

    The software is available at https://github.com/minrq/pMHC.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
  3. Abstract Motivation

    Given a protein of unknown function, fast identification of similar protein structures from the Protein Data Bank (PDB) is a critical step for inferring its biological function. Such structural neighbors can provide evolutionary insights into protein conformation, interfaces and binding sites that are not detectable from sequence similarity. However, the computational cost of performing pairwise structural alignment against all structures in the PDB is prohibitive. Alignment-free approaches have been introduced to enable fast but coarse comparisons by representing each protein as a vector of structure features or fingerprints and only computing similarity between vectors. As a notable example, FragBag represents each protein by a ‘bag of fragments’, which is a vector of frequencies of contiguous short backbone fragments from a predetermined library. Despite being efficient, FragBag has unsatisfactory accuracy because its backbone fragment library may not be optimally constructed and long-range interacting patterns are omitted.

    Results

    Here we present a new approach to learning effective structural motif presentations using deep learning. We develop DeepFold, a deep convolutional neural network model to extract structural motif features of a protein structure. We demonstrate that DeepFold substantially outperforms FragBag on protein structural search on a non-redundant protein structure database and a set of newly released structures. Remarkably, DeepFold not only extracts meaningful backbone segments but also finds important long-range interacting motifs for structural comparison. We expect that DeepFold will provide new insights into the evolution and hierarchical organization of protein structural motifs.

    Availability and implementation

    https://github.com/largelymfs/DeepFold

     
  4. Abstract Background

    Identification of motifs and quantification of their occurrences are important for the study of genetic diseases, gene evolution, transcription sites, and other biological mechanisms. Exact formulae for estimating count distributions of motifs under Markovian assumptions have high computational complexity and are impractical for large motif sets. Approximate formulae, e.g. based on the compound Poisson distribution, are faster, but reliable p-value calculation remains challenging. Here, we introduce ‘motif_prob’, a fast implementation of an exact formula for motif count distribution through progressive approximation with arbitrary precision. Our implementation speeds up the exact calculation, which is usually impractical, making it feasible and poised to replace currently employed heuristics.

    Results

    We implement motif_prob in both Perl and C++, using an efficient error-bound iterative process for the exact formula, and compare it with state-of-the-art tools (e.g. MoSDi) in terms of precision and run time benchmarks, along with a real-world use case on bacterial motif characterization. Our software can process a million motifs (13–31 bases) over genome lengths of 5 million bases within a minute on a regular laptop, and the run times for both the Perl and C++ code are several orders of magnitude smaller (50–1000× faster) than MoSDi, even when using their fast compound Poisson approximation (60–120× faster). In the real-world use cases, we first show the consistency of motif_prob with MoSDi, and then how p-value quantification is crucial for enrichment quantification when bacteria have different GC content, using motifs found in antimicrobial resistance genes. The software and the code sources are available under the MIT license at https://github.com/DataIntellSystLab/motif_prob .

    Conclusions

    The motif_prob software is a multi-platform and efficient open source solution for calculating exact frequency distributions of motifs. It can be integrated with motif discovery/characterization tools for quantifying enrichment and deviation from expected frequency ranges with exact p values, without loss in data processing efficiency.
  5. Al-Kadi, Omar Sultan (Ed.)
    In this paper, we capture and explore the painterly depictions of materials to enable the study of depiction and perception of materials through the artists’ eye. We annotated a dataset of 19k paintings with 200k+ bounding boxes from which polygon segments were automatically extracted. Each bounding box was assigned a coarse material label (e.g., fabric) and half were also assigned a fine-grained label (e.g., velvety, silky). The dataset in its entirety is available for browsing and downloading at materialsinpaintings.tudelft.nl . We demonstrate the cross-disciplinary utility of our dataset by presenting novel findings across human perception, art history, and computer vision. Our experiments include a demonstration of how painters create convincing depictions using a stylized approach. We further provide an analysis of the spatial and probabilistic distributions of materials depicted in paintings, showing, for example, that strong patterns exist for material presence and location. Furthermore, we demonstrate how paintings could be used to build more robust computer vision classifiers by learning a more perceptually relevant feature representation. Additionally, we demonstrate that training classifiers on paintings could be used to uncover hidden perceptual cues by visualizing the features used by the classifiers. We conclude that our dataset of painterly material depictions is a rich source for gaining insights into the depiction and perception of materials across multiple disciplines and hope that the release of this dataset will drive multidisciplinary research.