Yang, Hongru, Kailkhura, Bhavya, Wang, Zhangyang, and Liang, Yingbin. Training dynamics of transformers to recognize word co-occurrence via gradient flow analysis. Retrieved from https://par.nsf.gov/biblio/10599606.
Yang, Hongru, Kailkhura, Bhavya, Wang, Zhangyang, & Liang, Yingbin. Training dynamics of transformers to recognize word co-occurrence via gradient flow analysis. Retrieved from https://par.nsf.gov/biblio/10599606.
Yang, Hongru, Kailkhura, Bhavya, Wang, Zhangyang, and Liang, Yingbin. "Training dynamics of transformers to recognize word co-occurrence via gradient flow analysis." Advances in Neural Information Processing Systems (NeurIPS). https://par.nsf.gov/biblio/10599606.
@inproceedings{osti_10599606,
title = {Training dynamics of transformers to recognize word co-occurrence via gradient flow analysis},
author = {Yang, Hongru and Kailkhura, Bhavya and Wang, Zhangyang and Liang, Yingbin},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
url = {https://par.nsf.gov/biblio/10599606},
}