Title: BERT & Family Eat Word Salad: Experiments with Text Understanding
In this paper, we study the response of large models from the BERT family to incoherent inputs that should confuse any model that claims to understand natural language. We define simple heuristics to construct such examples. Our experiments show that state-of-the-art models consistently fail to recognize them as ill-formed, and instead produce high confidence predictions on them. As a consequence of this phenomenon, models trained on sentences with randomly permuted word order perform close to state-of-the-art models. To alleviate these issues, we show that if models are explicitly trained to recognize invalid inputs, they can be robust to such attacks without a drop in performance.
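The probe described above is straightforward to reproduce. Below is a minimal sketch (not the authors' released code) of the core experiment: shuffle a sentence's word order and check whether a fine-tuned BERT classifier still returns a high-confidence prediction. The checkpoint name is an assumed example; any fine-tuned BERT-family sequence classifier would serve.

```python
import random
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumed example checkpoint: any fine-tuned BERT-family classifier works here.
MODEL = "textattack/bert-base-uncased-SST-2"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)
model.eval()

def permute_words(sentence: str, seed: int = 0) -> str:
    """Destroy word order while keeping the bag of words intact."""
    words = sentence.split()
    random.Random(seed).shuffle(words)
    return " ".join(words)

def top_confidence(sentence: str) -> float:
    """Probability the classifier assigns to its predicted class."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1).max().item()

original = "The movie was surprisingly good and the acting was superb."
salad = permute_words(original)
print(f"original: {top_confidence(original):.3f}  word salad: {top_confidence(salad):.3f}")
# The paper's finding is that both confidences tend to be high, i.e. the model
# does not treat the permuted input as ill-formed.
```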
Gao, Luyu; Dai, Zhuyun; Callan, Jamie
(ACM SIGIR International Conference on the Theory of Information Retrieval)
Deep language models, such as BERT pre-trained on large corpora, have given a huge performance boost to state-of-the-art information retrieval ranking systems. Knowledge embedded in such models allows them to pick up complex matching signals between passages and queries. However, the high computation cost during inference limits their deployment in real-world search scenarios. In this paper, we study if and how the knowledge for search within BERT can be transferred to a smaller ranker through distillation. Our experiments demonstrate that it is crucial to use a proper distillation procedure, which produces up to nine times speed-up while preserving the state-of-the-art performance.
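As a rough illustration of the kind of distillation the abstract refers to (a generic score-distillation sketch, not necessarily the specific procedure the authors found crucial), a small student ranker can be trained to match the teacher BERT's relevance distribution over candidate passages:

```python
import torch
import torch.nn.functional as F

def distill_ranking_step(student_scores, teacher_scores, optimizer, temperature=2.0):
    """One distillation step.

    student_scores, teacher_scores: (batch, n_candidates) relevance scores for
    the same queries and candidate passages, produced by the small student and
    the large BERT teacher respectively. Only the student is being trained.
    """
    loss = F.kl_div(
        F.log_softmax(student_scores / temperature, dim=-1),
        F.softmax(teacher_scores.detach() / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2  # standard rescaling when distilling with a temperature
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```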
Chen, Qibin; Lacomis, Jeremy; Schwartz, Edward J.; Neubig, Graham; Vasilescu, Bogdan; Le Goues, Claire
(International Conference on Software Engineering)
Variable names are critical for conveying intended program behavior. Machine learning-based program analysis methods use variable name representations for a wide range of tasks, such as suggesting new variable names and bug detection. Ideally, such methods could capture semantic relationships between names beyond syntactic similarity, e.g., the fact that the names average and mean are similar. Unfortunately, previous work has found that even the best of previous representation approaches primarily capture "relatedness" (whether two variables are linked at all), rather than "similarity" (whether they actually have the same meaning). We propose VarCLR, a new approach for learning semantic representations of variable names that effectively captures variable similarity in this stricter sense. We observe that this problem is an excellent fit for contrastive learning, which aims to minimize the distance between explicitly similar inputs, while maximizing the distance between dissimilar inputs. This requires labeled training data, and thus we construct a novel, weakly-supervised variable renaming dataset mined from GitHub edits. We show that VarCLR enables the effective application of sophisticated, general-purpose language models like BERT to variable name representation, and thus also to related downstream tasks like variable name similarity search or spelling correction. VarCLR produces models that significantly outperform the state-of-the-art on IdBench, an existing benchmark that explicitly captures variable similarity (as distinct from relatedness). Finally, we contribute a release of all data, code, and pre-trained models, aiming to provide a drop-in replacement for variable representations used in either existing or future program analyses that rely on variable names.
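The contrastive objective the abstract describes can be sketched as an InfoNCE-style loss over batches of name pairs (an illustrative sketch, not the released VarCLR code; the encoder producing the embeddings is assumed to be a BERT-style model over sub-tokenized identifiers):

```python
import torch
import torch.nn.functional as F

def variable_name_contrastive_loss(anchor_emb, positive_emb, temperature=0.07):
    """InfoNCE loss over a batch of variable-name pairs.

    anchor_emb, positive_emb: (batch, dim) embeddings of names labeled as
    similar (e.g. from GitHub renamings); every other name in the batch
    serves as a negative.
    """
    anchor = F.normalize(anchor_emb, dim=-1)
    positive = F.normalize(positive_emb, dim=-1)
    logits = anchor @ positive.t() / temperature   # all-pairs similarities
    targets = torch.arange(anchor.size(0), device=anchor.device)
    return F.cross_entropy(logits, targets)        # matching pair is on the diagonal
```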
Jiang, Haoming; He, Pengcheng; Chen, Weizhu; Liu, Xiaodong; Gao, Jianfeng; Zhao, Tuo
(Annual Meeting of the Association for Computational Linguistics)
Transfer learning has fundamentally changed the landscape of natural language processing (NLP). Many state-of-the-art models are first pre-trained on a large text corpus and then fine-tuned on downstream tasks. However, due to limited data resources from downstream tasks and the extremely high complexity of pre-trained models, aggressive fine-tuning often causes the fine-tuned model to overfit the training data of downstream tasks and fail to generalize to unseen data. To address such an issue in a principled manner, we propose a new learning framework for robust and efficient fine-tuning of pre-trained models to attain better generalization performance. The proposed framework contains two important ingredients: 1. Smoothness-inducing regularization, which effectively manages the complexity of the model; 2. Bregman proximal point optimization, which is an instance of trust-region methods and can prevent aggressive updating. Our experiments show that the proposed framework achieves new state-of-the-art performance on a number of NLP tasks including GLUE, SNLI, SciTail and ANLI. Moreover, it also outperforms the state-of-the-art T5 model, which is the largest pre-trained model containing 11 billion parameters, on GLUE.
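A compressed sketch of the two ingredients (with simplifying assumptions: a single random perturbation instead of an adversarial inner loop, symmetric KL as the divergence; not the authors' implementation):

```python
import torch
import torch.nn.functional as F

def symmetric_kl(p_logits, q_logits):
    p, q = F.log_softmax(p_logits, dim=-1), F.log_softmax(q_logits, dim=-1)
    return (F.kl_div(p, q.exp(), reduction="batchmean")
            + F.kl_div(q, p.exp(), reduction="batchmean"))

def robust_finetune_loss(model, prev_model, embeds, labels,
                         eps=1e-3, lambda_s=1.0, mu=1.0):
    """Task loss + smoothness regularizer + proximal term.

    model: the classifier being fine-tuned (accepts inputs_embeds).
    prev_model: a frozen copy from the previous proximal-point iterate.
    """
    logits = model(inputs_embeds=embeds).logits
    task = F.cross_entropy(logits, labels)
    # 1. Smoothness-inducing term: predictions should barely change under a
    #    small perturbation of the input embeddings.
    noisy = embeds + eps * torch.randn_like(embeds)
    smooth = symmetric_kl(model(inputs_embeds=noisy).logits, logits)
    # 2. Bregman proximal term: stay close to the previous iterate's
    #    predictions to avoid aggressive updates.
    with torch.no_grad():
        prev_logits = prev_model(inputs_embeds=embeds).logits
    proximal = symmetric_kl(logits, prev_logits)
    return task + lambda_s * smooth + mu * proximal
```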
Fatima, Sabahat
(ACM SIGACCESS Conference on Computers and Accessibility (ASSETS '21))
Self-tracking using commodity wearables such as smartwatches can help older adults reduce sedentary behaviors and engage in physical activity. However, activity recognition applications that are typically deployed in these wearables tend to be trained on datasets that best represent younger adults. We explore how our activity recognition model, a hybrid of long short-term memory and convolutional layers pre-trained on smartwatch data from younger adults, performs on older adult data. We report results on week-long data from two older adults collected in a preliminary study in the wild, with ground-truth annotations based on activPAL, a thigh-worn sensor. We find that activity recognition for older adults remains challenging even when comparing our model’s performance to state-of-the-art deployed models such as the Google Activity Recognition API. Moreover, we show that models trained on younger adults tend to perform worse on older adults.
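The hybrid architecture mentioned above could look roughly like the following (layer sizes, window length, and class count are assumptions for illustration, not the authors' exact network): convolutions extract local motion patterns from raw accelerometer windows, and an LSTM summarizes them over time.

```python
import torch
import torch.nn as nn

class ConvLSTMActivityNet(nn.Module):
    """Hybrid convolutional + LSTM classifier over sensor windows."""

    def __init__(self, n_channels=3, n_classes=4, hidden=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
        )
        self.lstm = nn.LSTM(64, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):               # x: (batch, channels, time)
        feats = self.conv(x)            # (batch, 64, time)
        feats = feats.transpose(1, 2)   # (batch, time, 64)
        _, (h, _) = self.lstm(feats)
        return self.head(h[-1])         # per-window activity logits

model = ConvLSTMActivityNet()
windows = torch.randn(8, 3, 128)        # eight 128-sample, 3-axis windows
print(model(windows).shape)             # torch.Size([8, 4])
```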
Mohan, Sreyas; Vincent, Joshua L.; Manzorro, Ramon; Crozier, Peter; Fernandez-Granda, Carlos; Simoncelli, Eero
(Advances in Neural Information Processing Systems)
Deep convolutional neural networks (CNNs) for image denoising are typically trained on large datasets. These models achieve the current state of the art, but they do not generalize well to data that deviate from the training distribution. Recent work has shown that it is possible to train denoisers on a single noisy image. These models adapt to the features of the test image, but their performance is limited by the small amount of information used to train them. Here we propose "GainTuning", a methodology by which CNN models pre-trained on large datasets can be adaptively and selectively adjusted for individual test images. To avoid overfitting, GainTuning optimizes a single multiplicative scaling parameter (the "Gain") of each channel in the convolutional layers of the CNN. We show that GainTuning improves state-of-the-art CNNs on standard image-denoising benchmarks, boosting their denoising performance on nearly every image in a held-out test set. These adaptive improvements are even more substantial for test images differing systematically from the training data, either in noise level or image type. We illustrate the potential of adaptive GainTuning in a scientific application to transmission-electron-microscope images, using a CNN that is pre-trained on synthetic data. In contrast to the existing methodology, GainTuning is able to faithfully reconstruct the structure of catalytic nanoparticles from these data at extremely low signal-to-noise ratios.
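The core mechanism, one learnable multiplicative gain per channel of each convolutional layer, can be sketched as below. The gain attachment follows the abstract's description; the self-supervised objective used to tune it on a single test image is a masked-pixel stand-in, not the paper's loss.

```python
import torch
import torch.nn as nn

def attach_gains(model: nn.Module):
    """Freeze a pre-trained denoiser and add one gain per conv channel."""
    for p in model.parameters():
        p.requires_grad_(False)
    gains = []
    for module in model.modules():
        if isinstance(module, nn.Conv2d):
            gain = nn.Parameter(torch.ones(module.out_channels, 1, 1))
            module.register_parameter("gain", gain)
            # Scale each output channel by its learnable gain.
            module.register_forward_hook(lambda m, inp, out: out * m.gain)
            gains.append(gain)
    return gains

def gain_tune(model, noisy_image, steps=100, lr=1e-2):
    """Adapt only the gains to a single noisy test image."""
    gains = attach_gains(model)
    opt = torch.optim.Adam(gains, lr=lr)
    for _ in range(steps):
        # Stand-in self-supervised objective: hide a few pixels and ask the
        # network to predict them from their neighbors.
        mask = (torch.rand_like(noisy_image) < 0.05).float()
        pred = model(noisy_image * (1 - mask))
        loss = (mask * (pred - noisy_image) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```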
Gupta, Ashim, Kvernadze, Giorgi, and Srikumar, Vivek. BERT & Family Eat Word Salad: Experiments with Text Understanding. Retrieved from https://par.nsf.gov/biblio/10283844. Proceedings of the AAAI Conference on Artificial Intelligence.
Gupta, Ashim, Kvernadze, Giorgi, & Srikumar, Vivek. BERT & Family Eat Word Salad: Experiments with Text Understanding. Proceedings of the AAAI Conference on Artificial Intelligence. Retrieved from https://par.nsf.gov/biblio/10283844.
Gupta, Ashim, Kvernadze, Giorgi, and Srikumar, Vivek.
"BERT & Family Eat Word Salad: Experiments with Text Understanding". Proceedings of the AAAI Conference on Artificial Intelligence (). Country unknown/Code not available. https://par.nsf.gov/biblio/10283844.
@article{osti_10283844,
title = {BERT & Family Eat Word Salad: Experiments with Text Understanding},
url = {https://par.nsf.gov/biblio/10283844},
abstractNote = {In this paper, we study the response of large models from the BERT family to incoherent inputs that should confuse any model that claims to understand natural language. We define simple heuristics to construct such examples. Our experiments show that state-of-the-art models consistently fail to recognize them as ill-formed, and instead produce high confidence predictions on them. As a consequence of this phenomenon, models trained on sentences with randomly permuted word order perform close to state-of-the-art models. To alleviate these issues, we show that if models are explicitly trained to recognize invalid inputs, they can be robust to such attacks without a drop in performance.},
journal = {Proceedings of the AAAI Conference on Artificial Intelligence},
author = {Gupta, Ashim and Kvernadze, Giorgi and Srikumar, Vivek}
}