Social media offer an abundant source of valuable raw data; however, informal writing can quickly become a bottleneck for many natural language processing (NLP) tasks. Off-the-shelf tools are usually trained on formal text and cannot explicitly handle the noise found in short online posts. Moreover, the variety of frequently occurring linguistic variations presents several challenges, even for humans, who may be unable to comprehend the meaning of such posts, especially when they contain slang and abbreviations. Text normalization aims to transform online user-generated text into a canonical form. Current text normalization systems rely on string or phonetic similarity and on classification models that operate in a local fashion. We argue that processing contextual information is crucial for this task and introduce a hybrid word-character attention-based encoder-decoder model for social media text normalization that can serve as a pre-processing step allowing NLP applications to adapt to noisy text in social media. Our character-based component is trained on synthetic adversarial examples designed to capture errors commonly found in online user-generated text. Experiments show that our model surpasses neural architectures designed for text normalization and achieves performance comparable to state-of-the-art related work.
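As an illustration of how such synthetic training noise could be produced, the sketch below injects character-level perturbations (drops, repeats, transpositions, keyboard-neighbor substitutions) into clean tokens; the specific perturbation types and rates are assumptions for illustration, not the authors' exact procedure.

```python
import random

def add_social_media_noise(word, rate=0.3, seed=None):
    """Return a noisy variant of `word` using perturbations commonly seen in
    user-generated text: drops, repeats, transpositions, and vowel
    keyboard-neighbor substitutions (illustrative choices only)."""
    rng = random.Random(seed)
    neighbors = {"a": "qs", "e": "wr", "i": "uo", "o": "ip", "u": "yi"}
    chars = list(word)
    out = []
    for i, c in enumerate(chars):
        if not c or rng.random() > rate:       # keep character unchanged
            out.append(c)
            continue
        op = rng.choice(["drop", "repeat", "swap", "substitute"])
        if op == "drop":
            continue                           # delete the character
        if op == "repeat":
            out.extend([c, c])                 # "soo goood"-style lengthening
        elif op == "swap" and i + 1 < len(chars):
            out.extend([chars[i + 1], c])      # transpose with the next character
            chars[i + 1] = ""                  # mark the next character as consumed
        else:
            out.append(rng.choice(neighbors.get(c.lower(), c)))
    return "".join(out) or word

# Example: build (noisy, clean) pairs to train a character-level normalizer
clean = ["tomorrow", "because", "people"]
pairs = [(add_social_media_noise(w, seed=i), w) for i, w in enumerate(clean)]
print(pairs)
```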
Attention-based quantum tomography
Abstract: With rapid progress across platforms for quantum systems, the problem of many-body quantum state reconstruction for noisy quantum states becomes an important challenge. There has been growing interest in approaching the problem of quantum state reconstruction using generative neural network models. Here we propose 'attention-based quantum tomography' (AQT), a quantum state reconstruction technique that uses an attention-mechanism-based generative network to learn the mixed-state density matrix of a noisy quantum state. AQT is based on the model proposed in 'Attention is all you need' by Vaswani et al (2017 NIPS), which is designed to learn long-range correlations in natural language sentences and thereby outperform previous natural language processing (NLP) models. We demonstrate not only that AQT outperforms earlier neural-network-based quantum state reconstruction on identical tasks but also that AQT can accurately reconstruct the density matrix associated with a noisy quantum state experimentally realized on an IBMQ quantum computer. We speculate that the success of AQT stems from its ability to model quantum entanglement across the entire quantum system, much as the attention model for NLP captures the correlations among words in a sentence.
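As a rough illustration of the architecture class involved (not the authors' implementation), the sketch below uses a small autoregressive Transformer to model the joint distribution of per-qubit measurement outcomes; in AQT-style tomography such a learned distribution over an informationally complete measurement would then be inverted to recover the density matrix, a step omitted here.

```python
import torch
import torch.nn as nn

class OutcomeTransformer(nn.Module):
    """Autoregressive Transformer over per-qubit measurement outcomes.
    Illustrative sketch only: sizes are arbitrary, and the POVM inversion
    needed to recover a density matrix from the learned distribution is omitted."""
    def __init__(self, n_qubits, n_outcomes=4, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.n_qubits, self.start_idx = n_qubits, n_outcomes
        self.embed = nn.Embedding(n_outcomes + 1, d_model)       # +1 for a start token
        self.pos = nn.Parameter(torch.zeros(n_qubits, d_model))  # learned positions
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_outcomes)

    def forward(self, outcomes):
        # outcomes: (batch, n_qubits) integer outcome indices per qubit
        start = torch.full((outcomes.size(0), 1), self.start_idx, dtype=torch.long)
        x = torch.cat([start, outcomes[:, :-1]], dim=1)           # shift right
        causal = torch.triu(torch.full((self.n_qubits, self.n_qubits),
                                       float("-inf")), diagonal=1)
        h = self.encoder(self.embed(x) + self.pos, mask=causal)   # causal self-attention
        return self.head(h)                                       # per-site outcome logits

# Maximum-likelihood training step on (placeholder) measurement records
model = OutcomeTransformer(n_qubits=4)
records = torch.randint(0, 4, (32, 4))
logits = model(records)
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 4), records.reshape(-1))
loss.backward()
```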
- PAR ID: 10325397
- Date Published:
- Journal Name: Machine Learning: Science and Technology
- Volume: 3
- Issue: 1
- ISSN: 2632-2153
- Page Range / eLocation ID: 01LT01
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Assessing the correctness of student answers in a dialog-based intelligent tutoring system (ITS) is a well-defined Natural Language Processing (NLP) task that has attracted the attention of many researchers in the field. Inspired by Vaswani's transformer, we propose in this paper an attention-based transformer neural network with a multi-head attention mechanism for the task of student answer assessment. Results show the competitiveness of our proposed model. The highest accuracy of 71.5% was achieved when using ELMo embeddings, 10 attention heads, and 2 layers. This is very competitive and rivals the highest accuracy achieved by a previously proposed BI-GRU-Capsnet deep network (72.5%) on the same dataset. The main advantages of using transformers over BI-GRU-Capsnet are reduced training time and greater scope for parallelization.
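For readers unfamiliar with this architecture class, a minimal sketch of a multi-head attention classifier over pre-computed contextual embeddings is given below; the projection layer, dimensions, and pooling are illustrative assumptions rather than the configuration reported above.

```python
import torch
import torch.nn as nn

class AnswerAssessor(nn.Module):
    """Multi-head self-attention over pre-computed token embeddings
    (e.g. ELMo-style), mean-pooled into a correctness classifier.
    The projection and all dimensions are illustrative assumptions."""
    def __init__(self, emb_dim=1024, d_model=640, n_heads=10, n_layers=2, n_classes=2):
        super().__init__()
        self.proj = nn.Linear(emb_dim, d_model)          # make d_model divisible by n_heads
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, token_embeddings, padding_mask=None):
        # token_embeddings: (batch, seq_len, emb_dim)
        h = self.encoder(self.proj(token_embeddings),
                         src_key_padding_mask=padding_mask)
        return self.classifier(h.mean(dim=1))            # pool over tokens, classify

model = AnswerAssessor()
dummy = torch.randn(8, 40, 1024)    # 8 student answers, 40 tokens each
print(model(dummy).shape)           # torch.Size([8, 2])
```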
Abstract: Deep neural networks provide state-of-the-art performance for image denoising, where the goal is to recover a near noise-free image from a noisy observation. The underlying principle is that neural networks trained on large data sets have empirically been shown to generate natural images well from a low-dimensional latent representation of the image. Given such a generator network, a noisy image can be denoised by (i) finding the closest image in the range of the generator or by (ii) passing it through an encoder-generator architecture (known as an autoencoder). However, there is little theory to justify this success, let alone to predict the denoising performance as a function of the network parameters. In this paper, we consider the problem of denoising an image corrupted by additive Gaussian noise using the two generator-based approaches. In both cases, we assume the image is well described by a deep neural network with ReLU activation functions, mapping a $k$-dimensional code to an $n$-dimensional image. In the case of the autoencoder, we show that the feedforward network reduces noise energy by a factor of $O(k/n)$. In the case of optimizing over the range of a generative model, we state and analyze a simple gradient algorithm that minimizes a non-convex loss function and provably reduces noise energy by a factor of $O(k/n)$. We also demonstrate in numerical experiments that this denoising performance is, indeed, achieved by generative priors learned from data.
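A minimal sketch of approach (i), projecting the noisy observation onto the range of a trained generator by gradient descent over the latent code, might look as follows; the untrained stand-in generator, step count, and optimizer are illustrative assumptions.

```python
import torch
import torch.nn as nn

def denoise_with_generator(G, y, k, steps=500, lr=0.05):
    """Find z minimizing ||G(z) - y||^2, i.e. the closest image in the
    range of the generator to the noisy observation y (approach (i))."""
    z = torch.zeros(1, k, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((G(z) - y) ** 2).sum()   # non-convex in z; plain gradient steps
        loss.backward()
        opt.step()
    return G(z).detach()

# Stand-in expansive ReLU generator mapping a k-dim code to an n-dim image
k, n = 16, 1024
G = nn.Sequential(nn.Linear(k, 256), nn.ReLU(), nn.Linear(256, n), nn.ReLU())
clean = G(torch.randn(1, k)).detach()            # an image in the generator's range
noisy = clean + 0.1 * torch.randn_like(clean)    # additive Gaussian noise
denoised = denoise_with_generator(G, noisy, k)
print(((denoised - clean) ** 2).mean(), ((noisy - clean) ** 2).mean())
```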
Large language models have substantially advanced the understanding of nuance and context in natural language processing (NLP), further fueling the growth of intelligent conversational interfaces and virtual assistants. However, their hefty computational and memory demands make them potentially expensive to deploy on cloudless edge platforms with strict latency and energy requirements. For example, an inference pass using the state-of-the-art BERT-base model must serially traverse 12 computationally intensive transformer layers, each containing 12 parallel attention heads whose outputs concatenate to drive a large feed-forward network. To reduce computation latency, several algorithmic optimizations have been proposed; e.g., a recent algorithm dynamically matches linguistic complexity with model size via entropy-based early exit. Deploying such transformer models on edge platforms requires careful co-design and optimization from algorithms to circuits, where energy consumption is a key design consideration.
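A minimal sketch of the entropy-based early-exit idea mentioned above is shown below; the per-layer exit heads, pooling, threshold, and shapes are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def early_exit_inference(layers, exit_heads, x, entropy_threshold=0.3):
    """Run transformer layers sequentially; after each layer, an exit head
    produces class probabilities, and inference stops once the prediction
    entropy (uncertainty) drops below the threshold. Illustrative sketch."""
    for depth, (layer, head) in enumerate(zip(layers, exit_heads), start=1):
        x = layer(x)
        probs = F.softmax(head(x.mean(dim=1)), dim=-1)        # pooled logits -> probs
        entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
        if entropy.item() < entropy_threshold:                # confident enough: exit
            return probs, depth
    return probs, depth                                       # fell through all layers

# Toy stand-ins for 12 transformer layers and their exit classifiers
d_model, n_classes = 64, 2
layers = [torch.nn.TransformerEncoderLayer(d_model, 4, batch_first=True) for _ in range(12)]
heads = [torch.nn.Linear(d_model, n_classes) for _ in range(12)]
probs, used = early_exit_inference(layers, heads, torch.randn(1, 16, d_model))
print(f"exited after {used} layers with probs {probs}")
```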
Say What? Automatic Modeling of Collaborative Problem Solving Skills from Student Speech in the Wild
We investigated the feasibility of using automatic speech recognition (ASR) and natural language processing (NLP) to classify collaborative problem solving (CPS) skills from recorded speech in noisy environments. We analyzed data from 44 dyads of middle and high school students who used videoconferencing to collaboratively solve physics and math problems (35 and 9 dyads in school and lab environments, respectively). Trained coders identified seven cognitive and social CPS skills (e.g., sharing information) in 8,660 utterances. We used a state-of-the-art deep transfer learning approach for NLP, Bidirectional Encoder Representations from Transformers (BERT), with a special input representation enabling the model to analyze adjacent utterances for contextual cues. We achieved a micro-average AUROC score (across the seven CPS skills) of .80 using ASR transcripts, compared to .91 for human transcripts, indicating a decrease in performance attributable to ASR error. We found that the noisy school setting introduced additional ASR error, which reduced model performance (micro-average AUROC of .78) compared to the lab (AUROC = .83). We discuss implications for real-time CPS assessment and support in schools.
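A minimal sketch of how adjacent utterances could be supplied to BERT as a sentence-pair input for contextual cues is given below; the model checkpoint, label count, and pairing scheme are illustrative assumptions, not the authors' exact input representation.

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Hypothetical example: classify the current utterance, with the previous
# utterance supplied as context via BERT's sentence-pair input.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=7)   # assumed one label per CPS skill

previous_utt = "I think the force has to balance the weight."
current_utt = "Right, so we should set the two equations equal."

# [CLS] previous [SEP] current [SEP] -- the pair gives the model contextual cues
inputs = tokenizer(previous_utt, current_utt, return_tensors="pt",
                   truncation=True, max_length=128)
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.shape)   # torch.Size([1, 7])
```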