When BERT Plays the Lottery, All Tickets Are Winning

Prasanna, Sai; Rogers, Anna; Rumshisky, Anna

doi:10.18653/v1/2020.emnlp-main.259

Large Transformer-based models were shown to be reducible to a smaller number of self-attention heads and layers. We consider this phenomenon from the perspective of the lottery ticket hypothesis, using both structured and magnitude pruning. For fine-tuned BERT, we show that (a) it is possible to find subnetworks achieving performance that is comparable with that of the full model, and (b) similarly-sized subnetworks sampled from the rest of the model perform worse. Strikingly, with structured pruning even the worst possible subnetworks remain highly trainable, indicating that most pre-trained BERT weights are potentially useful. We also study the “good” subnetworks to see if their success can be attributed to superior linguistic knowledge, but find them unstable, and not explained by meaningful self-attention patterns.

Award ID(s): 1844740

PAR ID: 10216558

Author(s) / Creator(s): Prasanna, Sai; Rogers, Anna; Rumshisky, Anna

Date Published: 2020-11-01

Journal Name: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Page Range / eLocation ID: 3208 to 3229

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper:
https://doi.org/10.18653/v1/2020.emnlp-main.259
