Title: Decomposing and Recomposing Event Structure
Abstract: We present an event structure classification empirically derived from inferential properties annotated on sentence- and document-level Universal Decompositional Semantics (UDS) graphs. We induce this classification jointly with semantic role, entity, and event-event relation classifications using a document-level generative model structured by these graphs. To support this induction, we augment existing annotations found in the UDS1.0 dataset, which covers the entirety of the English Web Treebank, with an array of inferential properties capturing fine-grained aspects of the temporal and aspectual structure of events. The resulting dataset (available at decomp.io) is the largest annotation of event structure and (partial) event coreference to date.
Award ID(s):
2040831
NSF-PAR ID:
10331160
Author(s) / Creator(s):
Date Published:
Journal Name:
Transactions of the Association for Computational Linguistics
Volume:
10
ISSN:
2307-387X
Page Range / eLocation ID:
17 to 34
Sponsoring Org:
National Science Foundation
More Like this
  1. Survival models are used to analyze time-to-event data in a variety of disciplines. Proportional hazard models provide interpretable parameter estimates, but proportional hazard assumptions are not always appropriate. Non-parametric models are more flexible but often lack a clear inferential framework. We propose a Bayesian treed hazards partition model that is both flexible and inferential. Inference is obtained through the posterior tree structure and flexibility is preserved by modeling the log-hazard function in each partition using a latent Gaussian process. An efficient reversible jump Markov chain Monte Carlo algorithm is accomplished by marginalizing the parameters in each partition element via a Laplace approximation. Consistency properties for the estimator are established. The method can be used to help determine subgroups as well as prognostic and/or predictive biomarkers in time-to-event data. The method is compared with some existing methods on simulated data and a liver cirrhosis dataset.
  2. Federated learning has emerged as an important paradigm for training machine learning models in different domains. For graph-level tasks such as graph classification, graphs can also be regarded as a special type of data samples, which can be collected and stored in separate local systems. Similar to other domains, multiple local systems, each holding a small set of graphs, may benefit from collaboratively training a powerful graph mining model, such as the popular graph neural networks (GNNs). To provide more motivation towards such endeavors, we analyze real-world graphs from different domains to confirm that they indeed share certain graph properties that are statistically significant compared with random graphs. However, we also find that different sets of graphs, even from the same domain or same dataset, are non-IID regarding both graph structures and node features. To handle this, we propose a graph clustered federated learning (GCFL) framework that dynamically finds clusters of local systems based on the gradients of GNNs, and theoretically justify that such clusters can reduce the structure and feature heterogeneity among graphs owned by the local systems. Moreover, we observe the gradients of GNNs to be rather fluctuating in GCFL which impedes high-quality clustering, and design a gradient sequence-based clustering mechanism based on dynamic time warping (GCFL+). Extensive experimental results and in-depth analysis demonstrate the effectiveness of our proposed frameworks. 
  3. This document analyzes how machine learning principles and computer network analysis can be combined to detect network anomalies, such as network attacks. It highlights the key elements of network analysis and gives an overview of common network analysis techniques. Specifically, it describes a study conducted by the University of Luxembourg and an attempt to recreate that study with a slightly different set of parameters and a different dataset, for network anomaly detection using NetFlow data. Alongside network analysis is the emerging field of machine learning. The document investigates common machine learning techniques and implements a support vector machine (SVM) algorithm to detect anomalies and intrusions within the network. MATLAB was used as the machine learning tool for working out how to combine network analysis data with SVMs. The resulting graphs represent tests conducted using SVMs in a manner similar to that of the University of Luxembourg study. The tests differ in the metrics used for anomaly detection: the University of Luxembourg used the IP addresses and the traffic volume of a specific NetFlow dataset, whereas the tests here use a metric based on the duration of transmitted bytes and the ratio of incoming to outgoing bytes during the transmission. The algorithm and the defined metrics proved less effective than planned against the NetFlow dataset, and the tests did not yield a clear classification of an anomaly. However, many other factors contributing to network anomalies were highlighted.
  4. We introduce a new method based on nonnegative matrix factorization, Neural NMF, for detecting latent hierarchical structure in data. Datasets with hierarchical structure arise in a wide variety of fields, such as document classification, image processing, and bioinformatics. Neural NMF recursively applies NMF in layers to discover overarching topics encompassing the lower-level features. We derive a backpropagation optimization scheme that allows us to frame hierarchical NMF as a neural network. We test Neural NMF on a synthetic hierarchical dataset, the 20 Newsgroups dataset, and the MyLymeData symptoms dataset. Numerical results demonstrate that Neural NMF outperforms other hierarchical NMF methods on these data sets and offers better learned hierarchical structure and interpretability of topics.
  5. Developing algorithms that are able to generalize to a novel task given only a few labeled examples represents a fundamental challenge in closing the gap between machine- and human-level performance. The core of human cognition lies in the structured, reusable concepts that help us to rapidly adapt to new tasks and provide reasoning behind our decisions. However, existing meta-learning methods learn complex representations across prior labeled tasks without imposing any structure on the learned representations. Here we propose COMET, a meta-learning method that improves generalization ability by learning to learn along human-interpretable concept dimensions. Instead of learning a joint unstructured metric space, COMET learns mappings of high-level concepts into semi-structured metric spaces, and effectively combines the outputs of independent concept learners. We evaluate our model on few-shot tasks from diverse domains, including fine-grained image classification, document categorization and cell type annotation on a novel dataset from a biological domain developed in our work. COMET significantly outperforms strong meta-learning baselines, achieving 6–15% relative improvement on the most challenging 1-shot learning tasks, while unlike existing methods providing interpretations behind the model’s predictions.