Many researchers have shown that transformers perform as well as convolutional neural networks on many computer vision tasks. However, the large computational cost of the attention module hinders further study and deployment on edge devices. Some pruning methods have been developed to construct efficient vision transformers, but most of them consider only image classification. Motivated by these results, we propose SiDT, a method for pruning vision transformer backbones for more complex vision tasks such as object detection, based on a search over transformer dimensions. Experiments on the CIFAR-100 and COCO datasets show that backbones with 20% or 40% of their dimensions/parameters pruned can perform similarly to, or even better than, the unpruned models. We also provide a complexity analysis and comparisons with previous pruning methods.
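The abstract does not spell out how the dimension search is carried out; as a rough illustration only, a magnitude-based stand-in that keeps the highest-norm output dimensions of a projection layer is sketched below. The function name, pruning criterion, and keep ratio are hypothetical and are not SiDT's actual procedure.

```python
import torch
import torch.nn as nn

def prune_linear_out_dims(layer: nn.Linear, keep_ratio: float = 0.8) -> nn.Linear:
    """Keep the output dimensions of a Linear layer with the largest L2 weight norm.

    Illustrative stand-in for a dimension search: SiDT's actual criterion and
    search procedure are not specified in the abstract.
    """
    scores = layer.weight.norm(dim=1)                  # one importance score per output dim
    n_keep = max(1, int(keep_ratio * layer.out_features))
    keep = torch.topk(scores, n_keep).indices.sort().values
    pruned = nn.Linear(layer.in_features, n_keep, bias=layer.bias is not None)
    with torch.no_grad():
        pruned.weight.copy_(layer.weight[keep])
        if layer.bias is not None:
            pruned.bias.copy_(layer.bias[keep])
    return pruned

# Example: prune 20% of the output dimensions of one attention projection.
proj = nn.Linear(384, 384)
proj_pruned = prune_linear_out_dims(proj, keep_ratio=0.8)  # 384 -> 307 output dims
```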
PGT: Pseudo relevance feedback using a graph-based transformer
Most research on pseudo relevance feedback (PRF) has been done in vector space and probabilistic retrieval models. This paper shows that Transformer-based rerankers can also benefit from the extra context that PRF provides. It presents PGT, a graph-based Transformer that sparsifies attention between graph nodes to enable PRF while avoiding the high computational complexity of most Transformer architectures. Experiments show that PGT improves upon a non-PRF Transformer reranker, and it is at least as accurate as Transformer PRF models that use full attention, but with lower computational costs.
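As a rough illustration of the sparsification idea described above, attention can be restricted to graph edges with a mask. The minimal sketch below assumes a simple 0/1 adjacency over nodes; the helper name and toy graph layout are hypothetical, and PGT's actual graph construction and attention pattern are not specified here.

```python
import torch
import torch.nn.functional as F

def graph_masked_attention(q, k, v, adj):
    """Scaled dot-product attention restricted to graph edges.

    q, k, v: (n_nodes, d) node representations; adj: (n_nodes, n_nodes) 0/1
    adjacency with self-loops. Positions where adj == 0 are masked out, so each
    node attends only to its neighbors instead of all nodes.
    """
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5        # (n_nodes, n_nodes)
    scores = scores.masked_fill(adj == 0, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

# Toy usage: 4 nodes (e.g., a query plus 3 feedback documents) on a chain graph.
n, d = 4, 16
x = torch.randn(n, d)
adj = torch.eye(n) + torch.diag(torch.ones(n - 1), 1) + torch.diag(torch.ones(n - 1), -1)
out = graph_masked_attention(x, x, x, adj)
```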
- Award ID(s):
- 1815528
- PAR ID:
- 10273607
- Date Published:
- Journal Name:
- Advances in Information Retrieval – 43rd European Conference on IR Research
- Page Range / eLocation ID:
- 440-447
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
- Quantum computing has attracted much research attention because of its potential to achieve fundamental speed and efficiency improvements in various domains. Among different quantum algorithms, Parameterized Quantum Circuits (PQC) for Quantum Machine Learning (QML) show promise for realizing quantum advantages on current Noisy Intermediate-Scale Quantum (NISQ) machines. Therefore, to facilitate QML and PQC research, a recent Python library called TorchQuantum has been released. It can construct, simulate, and train PQCs for machine learning tasks with high speed and convenient debugging support. Beyond quantum for ML, we also want to draw the community's attention to the reverse direction: ML for quantum. Specifically, the TorchQuantum library also supports using data-driven ML models to solve problems in quantum system research, such as predicting the impact of quantum noise on circuit fidelity and improving quantum circuit compilation efficiency. This paper presents a case study of the ML-for-quantum part of TorchQuantum. Since estimating the impact of noise on circuit reliability is an essential step toward understanding and mitigating noise, we propose to leverage classical ML to predict the noise impact on circuit fidelity. Inspired by the natural graph representation of quantum circuits, we propose a graph transformer model to predict noisy circuit fidelity. We first collect a large dataset with a variety of quantum circuits and obtain their fidelity on noisy simulators and real machines. We then embed each circuit into a graph with gate and noise properties as node features and adopt a graph transformer to predict the fidelity. This avoids the exponential cost of classical simulation and estimates fidelity efficiently with polynomial complexity. Evaluated on 5,000 random and algorithm circuits, the graph transformer predictor provides accurate fidelity estimation with an RMSE of 0.04, outperforming a simple neural-network-based model by 0.02 on average. It achieves R² scores of 0.99 and 0.95 for random and algorithm circuits, respectively. Compared with circuit simulators, the predictor offers over 200× speedup for estimating fidelity. The datasets and predictors can be accessed in the TorchQuantum library. (An illustrative sketch of a graph-based fidelity predictor is given after this list.)
- Smart grids can be vulnerable to attacks and accidents, and any initial failure in a smart grid can grow into a large blackout through cascading failure. Because of the importance of smart grids in modern society, it is crucial to protect them against cascading failures. Simulating cascading failures can help identify the most vulnerable transmission lines and guide prioritization in protection planning, making it an effective approach to protecting smart grids from cascading failures. However, because of the enormous number of ways a smart grid may fail initially, it is infeasible to simulate cascading failures at a large scale or to identify the most vulnerable lines efficiently. In this paper, we aim to 1) develop a method to run cascading failure simulations at scale and 2) build simplified, diffusion-based cascading failure models to support efficient and theoretically bounded identification of the most vulnerable lines. We achieve these goals by first constructing a novel connection between cascading failures and natural language, and then adapting the powerful transformer model from NLP to learn from cascading failure data. Our trained transformer models accurately predict the total number of failed lines in a cascade and identify the most vulnerable lines. We also construct independent cascade (IC) diffusion models based on the attention matrices of the transformer models to support efficient vulnerability analysis with performance bounds. (A sketch of deriving an IC model from an attention matrix appears after this list.)
- Dense retrieval systems conduct first-stage retrieval using embedded representations and simple similarity metrics to match a query to documents. Their effectiveness depends on the encoded embeddings capturing the semantics of queries and documents, a challenging task due to the shortness and ambiguity of search queries. This paper proposes ANCE-PRF, a new query encoder that uses pseudo relevance feedback (PRF) to improve query representations for dense retrieval. ANCE-PRF uses a BERT encoder that consumes the query and the top retrieved documents from a dense retrieval model, ANCE, and it learns to produce better query embeddings directly from relevance labels. It also keeps the document index unchanged to reduce overhead. ANCE-PRF significantly outperforms ANCE and other recent dense retrieval systems on several datasets. Analysis shows that the PRF encoder effectively captures relevant and complementary information from the PRF documents while ignoring noise with its learned attention mechanism. (A minimal sketch of such a PRF query encoder appears after this list.)
- Graph Transformer (GT) has recently emerged as a new paradigm of graph learning algorithm, outperforming the previously popular Message Passing Neural Network (MPNN) on multiple benchmarks. Previous work shows that, with proper position embeddings, GT can approximate MPNN arbitrarily well, implying that GT is at least as powerful as MPNN. In this paper, we study the inverse connection and show that MPNN with a virtual node (VN), a commonly used heuristic with little theoretical understanding, is powerful enough to arbitrarily approximate the self-attention layer of GT. In particular, we first show that if we consider one type of linear transformer, the so-called Performer/Linear Transformer, then MPNN+VN with only O(1) depth and O(1) width can approximate a self-attention layer in the Performer/Linear Transformer. Next, via a connection between MPNN+VN and DeepSets, we prove that MPNN+VN with O(n^d) width and O(1) depth can approximate the self-attention layer arbitrarily well, where d is the input feature dimension. Lastly, under some assumptions, we provide an explicit construction of MPNN+VN with O(1) width and O(n) depth approximating the self-attention layer in GT arbitrarily well. On the empirical side, we demonstrate that 1) MPNN+VN is a surprisingly strong baseline, outperforming GT on the recently proposed Long Range Graph Benchmark (LRGB) dataset, 2) our MPNN+VN improves over an early implementation on a wide range of OGB datasets, and 3) MPNN+VN outperforms Linear Transformer and MPNN on the climate modeling task. (A sketch of the MPNN+VN heuristic itself appears after this list.)
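For the TorchQuantum fidelity-prediction abstract above, a minimal sketch of the overall pipeline is given below: each circuit becomes a set of gate nodes with feature vectors, which a transformer encoder pools into a fidelity estimate. The class name, feature layout, and hyperparameters are hypothetical simplifications, and plain self-attention stands in for the paper's graph transformer.

```python
import torch
import torch.nn as nn

class FidelityRegressor(nn.Module):
    """Tiny stand-in for a graph-transformer circuit-fidelity predictor.

    Each circuit is a set of gate nodes with a feature vector (e.g., a
    hypothetical mix of one-hot gate type, qubit indices, and per-gate error
    rate); a transformer encoder contextualizes the nodes and a pooled head
    predicts a fidelity in (0, 1). The real TorchQuantum predictor differs in
    features, attention structure, and scale.
    """
    def __init__(self, feat_dim: int = 16, d_model: int = 64):
        super().__init__()
        self.embed = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Sequential(nn.Linear(d_model, 1), nn.Sigmoid())

    def forward(self, node_feats):                   # (batch, n_gates, feat_dim)
        h = self.encoder(self.embed(node_feats))     # contextualize gate nodes
        return self.head(h.mean(dim=1)).squeeze(-1)  # mean-pool, predict fidelity

model = FidelityRegressor()
circuits = torch.randn(8, 20, 16)    # 8 circuits, 20 gates each, 16 features per gate
pred_fidelity = model(circuits)      # shape (8,), values in (0, 1)
```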
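For the cascading-failure abstract above, one plausible reading of "IC diffusion models based on the attention matrices" is to rescale attention weights into edge propagation probabilities. The sketch below is an assumption-laden illustration, with hypothetical function names and rescaling rule, not the paper's construction.

```python
import numpy as np

def attention_to_ic_probs(attn: np.ndarray, cap: float = 0.9) -> np.ndarray:
    """Turn an (n_lines x n_lines) attention matrix into independent-cascade
    edge probabilities: attention mass from line u to line v is rescaled into a
    propagation probability p(u -> v), capped below 1."""
    p = attn / (attn.max() + 1e-12) * cap   # largest attention weight maps to `cap`
    np.fill_diagonal(p, 0.0)                # a line does not propagate failure to itself
    return p

def simulate_ic(p: np.ndarray, seeds, rng=np.random.default_rng(0)):
    """One independent-cascade run; `seeds` are the initially failed lines."""
    failed, frontier = set(seeds), list(seeds)
    while frontier:
        u = frontier.pop()
        for v in range(p.shape[0]):
            if v not in failed and rng.random() < p[u, v]:
                failed.add(v)
                frontier.append(v)
    return failed

attn = np.abs(np.random.default_rng(1).normal(size=(10, 10)))  # stand-in attention matrix
failed_lines = simulate_ic(attention_to_ic_probs(attn), seeds=[3])
```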
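For the ANCE-PRF abstract above, a minimal sketch of a PRF query encoder that feeds the query and feedback documents through BERT and takes the CLS vector as the new query embedding is shown below. A generic bert-base-uncased checkpoint and CLS pooling are stand-ins; the actual ANCE-PRF encoder is initialized from ANCE and fine-tuned on relevance labels.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def prf_query_embedding(query: str, feedback_docs) -> torch.Tensor:
    """Encode "[CLS] query [SEP] doc_1 [SEP] ... doc_k" and return the CLS vector.

    Sketch only: the checkpoint, input format, and pooling here are stand-ins
    for the trained ANCE-PRF encoder.
    """
    text = tokenizer.sep_token.join([query] + list(feedback_docs))
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state   # (1, seq_len, 768)
    return hidden[:, 0]                                # CLS token as the new query embedding

q_emb = prf_query_embedding(
    "effects of caffeine on sleep",
    ["Caffeine is a stimulant that ...", "Sleep latency increases when ..."],
)
```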
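For the MPNN+VN abstract above, the sketch below shows only the heuristic itself: a virtual node connected to every real node gathers a global state and rebroadcasts it each layer. Layer sizes and update functions are arbitrary choices, and this is not the paper's approximation construction.

```python
import torch
import torch.nn as nn

class MPNNWithVirtualNode(nn.Module):
    """One message-passing layer augmented with a virtual node.

    The virtual node sums the states of all real nodes, updates its own state,
    and broadcasts it back, giving every node access to global information in a
    single layer (the commonly used heuristic the paper analyzes).
    """
    def __init__(self, dim: int):
        super().__init__()
        self.msg = nn.Linear(dim, dim)        # neighbor message function
        self.upd = nn.Linear(2 * dim, dim)    # node update from (self, aggregated messages)
        self.vn_upd = nn.Linear(dim, dim)     # virtual-node update

    def forward(self, x, adj, vn):
        # x: (n, dim) node states, adj: (n, n) adjacency, vn: (dim,) virtual-node state
        vn = torch.tanh(self.vn_upd(vn + x.sum(dim=0)))               # gather all nodes
        neigh = adj @ self.msg(x)                                     # neighbor aggregation
        x = torch.tanh(self.upd(torch.cat([x, neigh], dim=-1))) + vn  # broadcast vn back
        return x, vn

layer = MPNNWithVirtualNode(dim=32)
x, vn = torch.randn(6, 32), torch.zeros(32)
adj = (torch.rand(6, 6) > 0.5).float()
x, vn = layer(x, adj, vn)
```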