Title: Attention-based quantum tomography
Abstract: With rapid progress across platforms for quantum systems, the problem of many-body quantum state reconstruction for noisy quantum states becomes an important challenge. There has been a growing interest in approaching the problem of quantum state reconstruction using generative neural network models. Here we propose the ‘attention-based quantum tomography’ (AQT), a quantum state reconstruction using an attention mechanism-based generative network that learns the mixed state density matrix of a noisy quantum state. AQT is based on the model proposed in ‘Attention is all you need’ by Vaswani et al (2017 NIPS) that is designed to learn long-range correlations in natural language sentences and thereby outperform previous natural language processing (NLP) models. We demonstrate not only that AQT outperforms earlier neural-network-based quantum state reconstruction on identical tasks but that AQT can accurately reconstruct the density matrix associated with a noisy quantum state experimentally realized in an IBMQ quantum computer. We speculate the success of the AQT stems from its ability to model quantum entanglement across the entire quantum system much as the attention model for NLP captures the correlations among words in a sentence.
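The attention mechanism that the abstract refers to can be illustrated with scaled dot-product attention, the core operation of the Vaswani et al model. The following is a minimal sketch in plain Python, not the AQT implementation: the function names `softmax` and `scaled_dot_product_attention` and the toy inputs are assumptions made for illustration. Each query is compared against every key, so every position can weight every other position regardless of distance.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Return (outputs, weights) for lists of equal-length float vectors.

    weights[i][j] is how strongly query i attends to key j;
    outputs[i] is the corresponding weighted average of the value vectors.
    """
    d_k = len(keys[0])  # key dimension, used for the 1/sqrt(d_k) scaling
    outputs, weights = [], []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        w = softmax(scores)
        weights.append(w)
        outputs.append([
            sum(wj * v[c] for wj, v in zip(w, values))
            for c in range(len(values[0]))
        ])
    return outputs, weights


# Toy usage: one query attending over two identical keys (hypothetical data).
outputs, weights = scaled_dot_product_attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 0.0], [1.0, 0.0]],
    values=[[2.0], [4.0]],
)
```

Because the two keys in the toy call are identical, the query weights them equally and the output is the plain average of the two values.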
Award ID(s):
1719875 1934714
PAR ID:
10325397
Journal Name:
Machine Learning: Science and Technology
Volume:
3
Issue:
1
ISSN:
2632-2153
Page Range / eLocation ID:
01LT01
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Social media offer an abundant source of valuable raw data; however, informal writing can quickly become a bottleneck for many natural language processing (NLP) tasks. Off-the-shelf tools are usually trained on formal text and cannot explicitly handle noise found in short online posts. Moreover, the variety of frequently occurring linguistic variations presents several challenges, even for humans who might not be able to comprehend the meaning of such posts, especially when they contain slang and abbreviations. Text Normalization aims to transform online user-generated text to a canonical form. Current text normalization systems rely on string or phonetic similarity and classification models that work in a local fashion. We argue that processing contextual information is crucial for this task and introduce a social media text normalization hybrid word-character attention-based encoder-decoder model that can serve as a pre-processing step for NLP applications to adapt to noisy text in social media. Our character-based component is trained on synthetic adversarial examples that are designed to capture errors commonly found in online user-generated text. Experiments show that our model surpasses neural architectures designed for text normalization and achieves comparable performance with state-of-the-art related work.
  2. Assessing the correctness of student answers in a dialog-based intelligent tutoring system (ITS) is a well-defined Natural Language Processing (NLP) task that has attracted the attention of many researchers in the field. Inspired by Vaswani’s transformer, we propose in this paper an attention-based transformer neural network with a multi-head attention mechanism for the task of student answer assessment. Results show the competitiveness of our proposed model. The highest accuracy, 71.5%, was achieved when using ELMo embeddings, 10 heads of attention, and 2 layers. This is very competitive and rivals the highest accuracy achieved by a previously proposed BI-GRU-Capsnet deep network (72.5%) on the same dataset. The main advantages of using transformers over BI-GRU-Capsnet are reduced training time and more room for parallelization.
  3. Self‐supervised neural language models have recently achieved unprecedented success from natural language processing to learning the languages of biological sequences and organic molecules. These models have demonstrated superior performance in the generation, structure classification, and functional predictions for proteins and molecules with learned representations. However, most of the masking‐based pre‐trained language models are not designed for generative design, and their black‐box nature makes it difficult to interpret their design logic. Here a Blank‐filling Language Model for Materials (BLMM) Crystal Transformer is proposed, a neural network‐based probabilistic generative model for generative and tinkering design of inorganic materials. The model is built on the blank‐filling language model for text generation and has demonstrated unique advantages in learning the “materials grammars” together with high‐quality generation, interpretability, and data efficiency. It can generate chemically valid materials compositions with as high as 89.7% charge neutrality and 84.8% balanced electronegativity, which are more than four and eight times higher compared to a pseudo‐random sampling baseline. The probabilistic generation process of BLMM allows it to recommend materials tinkering operations based on learned materials chemistry, which makes it useful for materials doping. The model is applied to discover a set of new materials as validated using the Density Functional Theory (DFT) calculations. This work thus brings the unsupervised transformer language models based generative artificial intelligence to inorganic materials. A user‐friendly web app for tinkering materials design has been developed and can be accessed freely at www.materialsatlas.org/blmtinker.
  4. Deep neural networks provide state-of-the-art performance for image denoising, where the goal is to recover a near noise-free image from a noisy observation. The underlying principle is that neural networks trained on large data sets have empirically been shown to be able to generate natural images well from a low-dimensional latent representation of the image. Given such a generator network, a noisy image can be denoised by (i) finding the closest image in the range of the generator or by (ii) passing it through an encoder-generator architecture (known as an autoencoder). However, there is little theory to justify this success, let alone to predict the denoising performance as a function of the network parameters. In this paper, we consider the problem of denoising an image from additive Gaussian noise using the two generator-based approaches. In both cases, we assume the image is well described by a deep neural network with ReLU activation functions, mapping a $k$-dimensional code to an $n$-dimensional image. In the case of the autoencoder, we show that the feedforward network reduces noise energy by a factor of $O(k/n)$. In the case of optimizing over the range of a generative model, we state and analyze a simple gradient algorithm that minimizes a non-convex loss function and provably reduces noise energy by a factor of $O(k/n)$. We also demonstrate in numerical experiments that this denoising performance is, indeed, achieved by generative priors learned from data.
  5. Large language models have substantially advanced nuance and context understanding in natural language processing (NLP), further fueling the growth of intelligent conversational interfaces and virtual assistants. However, their hefty computational and memory demands make them potentially expensive to deploy on cloudless edge platforms with strict latency and energy requirements. For example, an inference pass using the state-of-the-art BERT-base model must serially traverse through 12 computationally intensive transformer layers, each layer containing 12 parallel attention heads whose outputs concatenate to drive a large feed-forward network. To reduce computation latency, several algorithmic optimizations have been proposed, e.g., a recent algorithm dynamically matches linguistic complexity with model sizes via entropy-based early exit. Deploying such transformer models on edge platforms requires careful co-design and optimizations from algorithms to circuits, where energy consumption is a key design consideration. 
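The entropy-based early exit mentioned in item 5 can be pictured as follows. This is a rough sketch under stated assumptions, not the algorithm from that work: it assumes each transformer layer has its own classifier producing a probability distribution, and that inference stops at the first layer whose prediction entropy falls below a chosen threshold. The names `entropy` and `early_exit`, the threshold value, and the example distributions are all hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def early_exit(per_layer_probs, threshold):
    """Return (layer_index, probs) for the first sufficiently confident layer.

    per_layer_probs: one class distribution per layer, in layer order
    (assumed to come from hypothetical per-layer classifiers).
    Falls back to the final layer if no layer is confident enough.
    """
    for index, probs in enumerate(per_layer_probs):
        if entropy(probs) < threshold:
            return index, probs
    last = len(per_layer_probs) - 1
    return last, per_layer_probs[last]


# Hypothetical per-layer outputs for a two-class task: confidence grows with depth.
layer_outputs = [[0.5, 0.5], [0.9, 0.1], [0.99, 0.01]]
exit_layer, prediction = early_exit(layer_outputs, threshold=0.4)
```

With these example distributions, the second layer (index 1) is the first whose entropy drops below 0.4, so the remaining layer is skipped. A lower threshold trades latency for confidence by forcing deeper traversal.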