NSF PAR Search | NSF Public Access Repository

Life after BERT: What do Other Muppets Understand about Language?

https://doi.org/10.18653/v1/2022.acl-long.227

Lialin, Vladislav; Zhao, Kevin; Shivagunde, Namrata; Rumshisky, Anna (April 2022, Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers))

Full Text Available

BERT Busters: Outlier Dimensions that Disrupt Transformers

https://doi.org/10.18653/v1/2021.findings-acl.300

Kovaleva, Olga; Kulshreshtha, Saurabh; Rogers, Anna; Rumshisky, Anna (August 2021, Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021)

Full Text Available

A Primer in BERTology: What We Know About How BERT Works

https://doi.org/10.1162/tacl_a_00349

Rogers, Anna; Kovaleva, Olga; Rumshisky, Anna (December 2020, Transactions of the Association for Computational Linguistics)
Das, Dipanjan (Ed.)
Transformer-based models have pushed state of the art in many areas of NLP, but our understanding of what is behind their success is still limited. This paper is the first survey of over 150 studies of the popular BERT model. We review the current state of knowledge about how BERT works, what kind of information it learns and how it is represented, common modifications to its training objectives and architecture, the overparameterization issue, and approaches to compression. We then outline directions for future research.
Full Text Available

When BERT Plays the Lottery, All Tickets Are Winning

https://doi.org/10.18653/v1/2020.emnlp-main.259

Prasanna, Sai; Rogers, Anna; Rumshisky, Anna (November 2020, Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP))
Large Transformer-based models were shown to be reducible to a smaller number of self-attention heads and layers. We consider this phenomenon from the perspective of the lottery ticket hypothesis, using both structured and magnitude pruning. For fine-tuned BERT, we show that (a) it is possible to find subnetworks achieving performance that is comparable with that of the full model, and (b) similarly-sized subnetworks sampled from the rest of the model perform worse. Strikingly, with structured pruning even the worst possible subnetworks remain highly trainable, indicating that most pre-trained BERT weights are potentially useful. We also study the “good” subnetworks to see if their success can be attributed to superior linguistic knowledge, but find them unstable, and not explained by meaningful self-attention patterns.
Full Text Available

Revealing the Dark Secrets of BERT

https://doi.org/10.18653/v1/D19-1445

Kovaleva, Olga; Romanov, Alexey; Rogers, Anna; Rumshisky, Anna (January 2019, Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP))

Full Text Available