Multi-label classification is a challenging structured prediction task in which a set of output class labels are predicted for each input. Real-world datasets often have natural or latent taxonomic relationships between labels, making it desirable for models to employ label representations capable of capturing such taxonomies. Most existing multi-label classification methods do not do so, resulting in label predictions that are inconsistent with the taxonomic constraints, thus failing to accurately represent the fundamentals of problem setting. In this work, we introduce the multi-label box model (MBM), a multi-label classification method that combines the encoding power of neural networks with the inductive bias and probabilistic semantics of box embeddings (Vilnis, et al 2018). Box embeddings can be understood as trainable Venn-diagrams based on hyper-rectangles. Representing labels by boxes rather than vectors, MBM is able to capture taxonomic relations among labels. Furthermore, since box embeddings allow these relations to be learned by stochastic gradient descent from data, and to be read as calibrated conditional probabilities, our model is endowed with a high degree of interpretability. This interpretability also facilitates the injection of partial information about label-label relationships into model training, to further improve its consistency. We provide theoretical grounding for our methodmore »
Learning a Compressed Sensing Measurement Matrix via Gradient Unrolling
Linear encoding of sparse vectors is widely popular, but is commonly data-independent – missing any possible extra (but a priori unknown) structure beyond sparsity. In this paper we present a new method to learn linear encoders that adapt to data, while still performing well with the widely used l1 decoder. The convex l1 decoder prevents gradient propagation as needed in standard gradient-based training. Our method is based on the insight that unrolling the convex decoder into T projected subgradient steps can address this issue. Our method can be seen as a data-driven way to learn a compressed sensing measurement matrix. We compare the empirical performance of 10 algorithms over 6 sparse datasets (3 synthetic and 3 real). Our experiments show that there is indeed additional structure beyond sparsity in the real datasets; our method is able to discover it and exploit it to create excellent reconstructions with fewer measurements (by a factor of 1.1-3x) compared to the previous state-of-the-art methods. We illustrate an application of our method in learning label embeddings for extreme multi-label classification, and empirically show that our method is able to match or outperform the precision scores of SLEEC, which is one of the state-of-the-art embedding-based approaches.
- Award ID(s):
- Publication Date:
- NSF-PAR ID:
- Journal Name:
- International Conference on Machine Learning (ICML)
- Sponsoring Org:
- National Science Foundation
More Like this
Obeid, Iyad Selesnick (Ed.)Electroencephalography (EEG) is a popular clinical monitoring tool used for diagnosing brain-related disorders such as epilepsy . As monitoring EEGs in a critical-care setting is an expensive and tedious task, there is a great interest in developing real-time EEG monitoring tools to improve patient care quality and efficiency . However, clinicians require automatic seizure detection tools that provide decisions with at least 75% sensitivity and less than 1 false alarm (FA) per 24 hours . Some commercial tools recently claim to reach such performance levels, including the Olympic Brainz Monitor  and Persyst 14 . In this abstract, we describe our efforts to transform a high-performance offline seizure detection system  into a low latency real-time or online seizure detection system. An overview of the system is shown in Figure 1. The main difference between an online versus offline system is that an online system should always be causal and has minimum latency which is often defined by domain experts. The offline system, shown in Figure 2, uses two phases of deep learning models with postprocessing . The channel-based long short term memory (LSTM) model (Phase 1 or P1) processes linear frequency cepstral coefficients (LFCC)  features from each EEGmore »
A major challenge in real-world reinforcement learning (RL) is the sparsity of reward feedback. Often, what is available is an intuitive but sparse reward function that only indicates whether the task is completed partially or fully. However, the lack of carefully designed, fine grain feedback implies that most existing RL algorithms fail to learn an acceptable policy in a reasonable time frame. This is because of the large number of exploration actions that the policy has to perform before it gets any useful feedback that it can learn from. In this work, we address this challenging problem by developing an algorithm that exploits the offline demonstration data generated by a sub-optimal behavior policy for faster and efficient online RL in such sparse reward settings. The proposed algorithm, which we call the Learning Online with Guidance Offline (LOGO) algorithm, merges a policy improvement step with an additional policy guidance step by using the offline demonstration data. The key idea is that by obtaining guidance from - not imitating - the offline data, LOGO orients its policy in the manner of the sub-optimal policy, while yet being able to learn beyond and approach optimality. We provide a theoretical analysis of our algorithm,more »
A Manifold Laplacian Regularized Semi-supervised Sparse Image Classification Method with a Variant Trace Lasso NormSince the cost of labeling data is getting higher and higher, we hope to make full use of the large amount of unlabeled data and improve image classification effect through adding some unlabeled samples for training. In addition, we expect to uniformly realize two tasks, namely the clustering of the unlabeled data and the recognition of the query image. We achieve the goal by designing a novel sparse model based on manifold assumption, which has been proved to work well in many tasks. Based on the assumption that images of the same class lie on a sub-manifold and an image can be approximately represented as the linear combination of its neighboring data due to the local linear property of manifold, we proposed a sparse representation model on manifold. Specifically, there are two regularizations, i.e., a variant Trace lasso norm and the manifold Laplacian regularization. The first regularization term enables the representation coefficients satisfying sparsity between groups and density within a group. And the second term is manifold Laplacian regularization by which label can be accurately propagated from labeled data to unlabeled data. Augmented Lagrange Multiplier (ALM) scheme and Gauss Seidel Alternating Direction Method of Multiplier (GS-ADMM) are given to solvemore »
While deep neural networks (DNNs) have achieved state-of-the-art results in many fields, they are typically over-parameterized. Parameter redundancy, in turn, leads to inefficiency. Sparse signal recovery (SSR) techniques, on the other hand, find compact solutions to over-complete linear problems. Therefore, a logical step is to draw the connection between SSR and DNNs. In this paper, we explore the application of iterative reweighting methods popular in SSR to learning efficient DNNs. By efficient, we mean sparse networks that require less computation and storage than the original, dense network. We propose a reweighting framework to learn sparse connections within a given architecture without biasing the optimization process, by utilizing the affine scaling transformation strategy. The resulting algorithm, referred to as Sparsity-promoting Stochastic Gradient Descent (SSGD), has simple gradient-based updates which can be easily implemented in existing deep learning libraries. We demonstrate the sparsification ability of SSGD on image classification tasks and show that it outperforms existing methods on the MNIST and CIFAR-10 datasets.