Convolutional neural networks (CNNs) rely on architectural depth to obtain complex features, which makes the resulting models computationally expensive for low-resource IoT devices. Convolutional operators are local and have a restricted receptive field that grows only with depth. We explore partial differential equations (PDEs) that offer a global receptive field without the added overhead of maintaining large-kernel convolutional filters. We propose a new feature layer, called the Global layer, that enforces PDE constraints on the feature maps, resulting in rich features. These constraints are solved by embedding iterative schemes in the network. The proposed layer can be embedded in any deep CNN to transform it into a shallower network, resulting in compact and computationally efficient architectures that achieve performance similar to the original network. Our experimental evaluation demonstrates that architectures with Global layers require 2-5× less computational and storage budget without any significant loss in performance.
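To make the idea of solving a PDE constraint with an embedded iterative scheme concrete, the following is a minimal sketch and not the authors' Global layer. It assumes, purely for illustration, a screened Poisson equation u - alpha * Laplacian(u) = f on a single-channel feature map with periodic boundaries, solved by Jacobi iteration. The function names, the choice of PDE, and the values of `alpha` and `num_iters` are all hypothetical.

```python
import numpy as np


def neighbor_sum(u):
    """Sum of the four axis-aligned neighbours, with periodic boundaries."""
    return (np.roll(u, 1, axis=0) + np.roll(u, -1, axis=0)
            + np.roll(u, 1, axis=1) + np.roll(u, -1, axis=1))


def jacobi_screened_poisson(f, alpha=1.0, num_iters=200):
    """Approximately solve u - alpha * Laplacian(u) = f by Jacobi iteration.

    With the 5-point Laplacian, the equation at each cell reads
    (1 + 4*alpha) * u = f + alpha * neighbor_sum(u), which gives the update.
    """
    u = np.zeros_like(f)
    for _ in range(num_iters):
        u = (f + alpha * neighbor_sum(u)) / (1.0 + 4.0 * alpha)
    return u


# Hypothetical input: one active cell in an otherwise empty 16x16 feature map.
feature_map = np.zeros((16, 16))
feature_map[0, 0] = 1.0
alpha = 1.0

smoothed = jacobi_screened_poisson(feature_map, alpha=alpha)

# Residual of the discretised PDE, to check that the iteration has converged.
residual = ((1.0 + 4.0 * alpha) * smoothed
            - alpha * neighbor_sum(smoothed) - feature_map)
max_residual = np.abs(residual).max()

# Response at the cell farthest from the impulse on the periodic grid.
far_response = smoothed[8, 8]
```

Each update touches only the four nearest neighbours, yet the converged solution depends on every input cell, which is the sense in which a PDE solve can provide a global receptive field without large kernels.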
This content will become publicly available on May 3, 2025
MetaStore: Analyzing Deep Learning Meta-Data at Scale
The process of training deep learning models produces a huge amount of meta-data, including but not limited to losses, hidden feature embeddings, and gradients. Model diagnosis tools have been developed to analyze losses and feature embeddings with the aim of improving the performance of these models. However, gradients, despite carrying rich information that is potentially relevant for model interpretation and data debugging, have yet to be fully explored due to their size and complexity. Each single gradient is as large as the number of parameters of the neural network, often measured in the tens of millions. This makes it extremely challenging to efficiently collect, store, and analyze large numbers of gradients in these models. In this work, we develop MetaStore to fill this gap. MetaStore leverages our observation that storing certain compact intermediate results produced in the backpropagation process, namely the prefix and suffix gradients, is sufficient for the exact restoration of the original gradient. These prefix and suffix gradients are much more compact than the original gradients, thus allowing us to address the gradient collection and storage challenges. Furthermore, MetaStore features a rich set of analytics operators that allow users to analyze the gradients for data debugging or model interpretation. Rather than first having to restore the original gradients and then run analytics on top of this decompressed view, MetaStore directly executes these operators on the compact prefix and suffix structures, making gradient-based analytics efficient and scalable. Our experiments on popular deep learning models such as VGG, BERT, and ResNet and on benchmark image and text datasets demonstrate that MetaStore outperforms strong baseline methods by 4x to 678x in storage cost and by 2x to 1000x in running time.
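The abstract does not define the prefix and suffix gradients, so the following is only a sketch of the general idea under a stated assumption: for a single fully connected layer y = W x, the weight gradient is the outer product of the backpropagated error and the layer input, so storing those two vectors is enough to rebuild it exactly. The names `layer_input` and `backprop_error` are hypothetical stand-ins, and this is not MetaStore's API or storage format.

```python
import numpy as np


def dense_layer_gradient(layer_input, backprop_error):
    """Rebuild dL/dW for y = W @ x from its two factors (an outer product)."""
    return np.outer(backprop_error, layer_input)


rng = np.random.default_rng(0)
n_in, n_out = 512, 256

# Hypothetical stand-ins for the two compact factors of one layer's gradient.
layer_input = rng.standard_normal(n_in)      # activation entering the layer
backprop_error = rng.standard_normal(n_out)  # dL/dy arriving from later layers

# Exact restoration of the full weight gradient from the stored factors.
full_gradient = dense_layer_gradient(layer_input, backprop_error)

# Storage comparison: two vectors versus the full n_out x n_in matrix.
stored_values = layer_input.size + backprop_error.size
full_values = full_gradient.size

# An analytic computed directly on the factors, without restoring the matrix:
# the Frobenius norm of an outer product is the product of the vector norms.
norm_from_factors = np.linalg.norm(backprop_error) * np.linalg.norm(layer_input)
```

In this toy layer the two factors hold 768 numbers while the restored gradient holds 131,072, and the gradient norm is available from the factors alone. That mirrors, in a simplified form, the claim that analytics can run on compact structures instead of decompressed gradients.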
- Award ID(s): 1910880
- PAR ID: 10535331
- Publisher / Repository: Proceedings of the VLDB Endowment
- Date Published:
- Journal Name: Proceedings of the VLDB Endowment
- Volume: 17
- Issue: 6
- ISSN: 2150-8097
- Page Range / eLocation ID: 1446–1459
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Accurate selectivity estimation for string predicates is a long-standing research challenge in databases. Supporting pattern matching on strings (such as prefix, substring, and suffix) makes this problem much more challenging, thereby necessitating a dedicated study. Traditional approaches often build pruned summary data structures such as tries, followed by selectivity estimation using statistical correlations. However, this produces insufficiently accurate cardinality estimates, resulting in the selection of sub-optimal plans by the query optimizer. Recently proposed deep learning based approaches leverage techniques from natural language processing, such as embeddings, to encode the strings and use them to train a model. While this is an improvement over traditional approaches, there is still considerable room for improvement. We propose Astrid, a framework for string selectivity estimation that synthesizes ideas from traditional and deep learning based approaches. We make two complementary contributions. First, we propose an embedding algorithm that is query-type (prefix, substring, and suffix) and selectivity aware. Consider three strings 'ab', 'abc' and 'abd' whose prefix frequencies are 1000, 800 and 100 respectively. Our approach would ensure that the embedding for 'ab' is closer to 'abc' than to 'abd'. Second, we describe how neural language models could be used for selectivity estimation. While they work well for prefix queries, their performance for substring queries is sub-optimal. We modify the objective function of the neural language model so that it could be used for estimating selectivities of pattern matching queries. We also propose a novel and efficient algorithm for optimizing the new objective function. We conduct extensive experiments over benchmark datasets and show that our proposed approaches achieve state-of-the-art results.
-
Telemetry systems are widely used to collect data from distributed endpoints, analyze the data jointly to gain valuable insights, and store it for historical analytics. These systems consist of four stages (Figure 1): collection, transmission, analysis, and storage. Collectors at the endpoint collect various types of data, which is then transmitted to a central server for analysis. This data is used for multiple downstream tasks, such as dashboard monitoring and anomaly detection. Finally, this data is stored in long-term storage to aid retrospective analytics and debugging.
-
Annotating medical images for the purpose of training computer vision models is an extremely laborious task that takes time and resources away from expert clinicians. Active learning (AL) is a machine learning paradigm that mitigates this problem by deliberately proposing data points that should be labeled in order to maximize model performance. We propose a novel AL algorithm for segmentation, ALGES, that utilizes gradient embeddings to effectively select laparoscopic images to be labeled by some external oracle while reducing annotation effort. Given any unlabeled image, our algorithm treats predicted segmentations as truth and computes gradients with respect to the model parameters of the last layer in a segmentation network. The norms of these per-pixel gradient vectors correspond to the magnitude of the induced change in model parameters and contain rich information about the model’s predictive uncertainty. Our algorithm then computes gradient embeddings in two ways, and we employ a center-finding algorithm with these embeddings to procure representative and diverse batches in each round of AL. An advantage of our approach is extensibility to any model architecture and differentiable loss scheme for semantic segmentation. We apply our approach to a public data set of laparoscopic cholecystectomy images and show that it outperforms current AL algorithms in selecting the most informative data points for improving the segmentation model. Our code is available at https://github.com/josaklil-ai/surg-active-learning.
-
Deep learning (DL) models have achieved paradigm-changing performance in many fields with high-dimensional data, such as images, audio, and text. However, the black-box nature of deep neural networks is not only a barrier to adoption in applications such as medical diagnosis, where interpretability is essential, but it also impedes diagnosis of underperforming models. The task of diagnosing or explaining DL models requires the computation of additional artifacts, such as activation values and gradients. These artifacts are large in volume, and their computation, storage, and querying raise significant data management challenges. In this paper, we develop a novel data sampling technique that produces approximate but accurate results for these model debugging queries. Our sampling technique utilizes the lower-dimensional representation learned by the DL model and focuses on model decision boundaries for the data in this lower-dimensional space.