NSF PAR Search | NSF Public Access Repository

Hypothesis Generation with Large Language Models

https://doi.org/10.18653/v1/2024.nlp4science-1.10

Zhou, Yangqiaoyu; Liu, Haokun; Srivastava, Tejes; Mei, Hongyuan; Tan, Chenhao (November 2024, Association for Computational Linguistics)

Full Text Available
Beyond Ensemble Averages: Leveraging Climate Model Ensembles for Subseasonal Forecasting

https://doi.org/10.1175/AIES-D-23-0103.1

Orlova, Elena; Liu, Haokun; Rossellini, Raphael; Cash, Benjamin A.; Willett, Rebecca (October 2024, Artificial Intelligence for the Earth Systems)

Abstract: Producing high-quality forecasts of key climate variables, such as temperature and precipitation, on subseasonal time scales has long been a gap in operational forecasting. This study explores an application of machine learning (ML) models as postprocessing tools for subseasonal forecasting. Lagged numerical ensemble forecasts (i.e., an ensemble where the members have different initialization dates) and observational data, including relative humidity, pressure at sea level, and geopotential height, are incorporated into various ML methods to predict monthly average precipitation and 2-m temperature 2 weeks in advance for the continental United States. For regression, quantile regression, and tercile classification tasks, we consider using linear models, random forests, convolutional neural networks, and stacked models (a multimodel approach based on the prediction of the individual ML models). Unlike previous ML approaches that often use ensemble mean alone, we leverage information embedded in the ensemble forecasts to enhance prediction accuracy. Additionally, we investigate extreme event predictions that are crucial for planning and mitigation efforts. Considering ensemble members as a collection of spatial forecasts, we explore different approaches to using spatial information. Trade-offs between different approaches may be mitigated with model stacking. Our proposed models outperform standard baselines such as climatological forecasts and ensemble means. In addition, we investigate feature importance, trade-offs between using the full ensemble or only the ensemble mean, and different modes of accounting for spatial variability.

Significance Statement: Accurately forecasting temperature and precipitation on subseasonal time scales (2 weeks to 2 months in advance) is extremely challenging. These forecasts would have immense value in agriculture, insurance, and economics. Our paper describes an application of machine learning techniques to improve forecasts of monthly average precipitation and 2-m temperature using lagged physics-based predictions and observational data 2 weeks in advance for the entire continental United States. For lagged ensembles, the proposed models outperform standard benchmarks such as historical averages and averages of physics-based predictions. Our findings suggest that utilizing the full set of physics-based predictions instead of the average enhances the accuracy of the final forecast.
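As a rough illustration of the tercile classification task mentioned in the abstract, the sketch below labels a forecast value as below, near, or above normal relative to a historical record, and turns a set of ensemble members into tercile vote fractions. This is not the authors' method: the function names, the toy data, and the vote-counting baseline are all assumptions made for illustration only.

```python
from statistics import quantiles
from typing import Dict, List, Sequence

CATEGORIES = ("below normal", "near normal", "above normal")


def tercile_thresholds(climatology: Sequence[float]) -> List[float]:
    """Return the two cut points that split a historical record into thirds."""
    return quantiles(climatology, n=3)


def tercile_label(value: float, thresholds: Sequence[float]) -> str:
    """Assign a value to one of three tercile categories."""
    lower, upper = thresholds
    if value < lower:
        return CATEGORIES[0]
    if value > upper:
        return CATEGORIES[2]
    return CATEGORIES[1]


def ensemble_tercile_fractions(
    members: Sequence[float], thresholds: Sequence[float]
) -> Dict[str, float]:
    """Fraction of ensemble members falling in each tercile (hypothetical baseline)."""
    labels = [tercile_label(m, thresholds) for m in members]
    return {name: labels.count(name) / len(labels) for name in CATEGORIES}


# Toy data (assumed): a historical record and four ensemble member forecasts.
history = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
cuts = tercile_thresholds(history)
fractions = ensemble_tercile_fractions([2.0, 5.0, 8.0, 9.0], cuts)
```

Using every member's vote, rather than classifying only the ensemble mean, is one simple way to keep information about ensemble spread; the models described in the abstract are considerably richer than this.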
Human mobility and COVID-19 transmission: a systematic review and future directions

https://doi.org/10.1080/19475683.2022.2041725

Zhang, Mengxi; Wang, Siqin; Hu, Tao; Fu, Xiaokang; Wang, Xiaoyue; Hu, Yaxin; Halloran, Briana; Li, Zhenlong; Cui, Yunhe; Liu, Haokun; et al (October 2022, Annals of GIS)

Full Text Available
Fine-Tuned Transformers Show Clusters of Similar Representations Across Layers

https://doi.org/10.18653/v1/2021.blackboxnlp-1.42

Phang, Jason; Liu, Haokun; Bowman, Samuel R. (January 2021, Proceedings of the Fourth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP)

Full Text Available
Comparing Test Sets with Item Response Theory

Vania, Clara; Htut, Phu Mon; Huang, William; Mungra, Dhara; Pang, Richard Yuanzhe; Phang, Jason; Liu, Haokun; Cho, Kyunghyun; Bowman, Samuel R. (June 2021, Annual Meeting of the Association for Computational Linguistics)
Full Text Available
BLiMP: The Benchmark of Linguistic Minimal Pairs for English

https://doi.org/10.1162/tacl_a_00321

Warstadt, Alex; Parrish, Alicia; Liu, Haokun; Mohananey, Anhad; Peng, Wei; Wang, Sheng-Fu; Bowman, Samuel R. (December 2020, Transactions of the Association for Computational Linguistics)
We introduce The Benchmark of Linguistic Minimal Pairs (BLiMP), a challenge set for evaluating the linguistic knowledge of language models (LMs) on major grammatical phenomena in English. BLiMP consists of 67 individual datasets, each containing 1,000 minimal pairs, that is, pairs of minimally different sentences that contrast in grammatical acceptability and isolate a specific phenomenon in syntax, morphology, or semantics. We generate the data according to linguist-crafted grammar templates, and human aggregate agreement with the labels is 96.4%. We evaluate n-gram, LSTM, and Transformer (GPT-2 and Transformer-XL) LMs by observing whether they assign a higher probability to the acceptable sentence in each minimal pair. We find that state-of-the-art models identify morphological contrasts related to agreement reliably, but they struggle with some subtle semantic and syntactic phenomena, such as negative polarity items and extraction islands.
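The evaluation rule described in this abstract (a model is correct on a pair when it assigns higher probability to the acceptable sentence) can be sketched in a few lines. The code below is an illustration under stated assumptions, not the BLiMP codebase: the add-one-smoothed unigram scorer, the function names, and the toy sentences are all hypothetical stand-ins for a real language model and the real data.

```python
import math
from collections import Counter
from typing import Callable, Iterable, List, Tuple


def make_unigram_scorer(corpus: Iterable[str]) -> Callable[[str], float]:
    """Build a toy sentence scorer: add-one-smoothed unigram log-probability."""
    counts = Counter(tok for sent in corpus for tok in sent.lower().split())
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # one extra slot for unseen tokens

    def log_prob(sentence: str) -> float:
        return sum(
            math.log((counts[tok] + 1) / (total + vocab_size))
            for tok in sentence.lower().split()
        )

    return log_prob


def minimal_pair_accuracy(
    pairs: Iterable[Tuple[str, str]], log_prob: Callable[[str], float]
) -> float:
    """Share of (acceptable, unacceptable) pairs where the acceptable one scores higher."""
    pair_list: List[Tuple[str, str]] = list(pairs)
    correct = sum(log_prob(good) > log_prob(bad) for good, bad in pair_list)
    return correct / len(pair_list)


# Toy corpus and a single toy pair (assumed data, not drawn from BLiMP).
scorer = make_unigram_scorer(
    ["the cat sleeps", "the cat sleeps", "the cat sleep"]
)
accuracy = minimal_pair_accuracy([("the cat sleeps", "the cat sleep")], scorer)
```

Any scorer that maps a sentence to a log-probability can be passed in place of the toy unigram model, so the same accuracy function would apply to an n-gram, LSTM, or Transformer LM wrapper.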
Full Text Available
Learning Which Features Matter: RoBERTa Acquires a Preference for Linguistic Generalizations (Eventually)

https://doi.org/10.18653/v1/2020.emnlp-main.16

Warstadt, Alex; Zhang, Yian; Li, Xiaocheng; Liu, Haokun; Bowman, Samuel R. (January 2020, Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP))
One reason pretraining on self-supervised linguistic tasks is effective is that it teaches models features that are helpful for language understanding. However, we want pretrained models to learn not only to represent linguistic features, but also to use those features preferentially during fine-tuning. With this goal in mind, we introduce a new English-language diagnostic set called MSGS (the Mixed Signals Generalization Set), which consists of 20 ambiguous binary classification tasks that we use to test whether a pretrained model prefers linguistic or surface generalizations during fine-tuning. We pretrain RoBERTa from scratch on quantities of data ranging from 1M to 1B words and compare their performance on MSGS to the publicly available RoBERTa_BASE. We find that models can learn to represent linguistic features with little pretraining data, but require far more data to learn to prefer linguistic generalizations over surface ones. Eventually, with about 30B words of pretraining data, RoBERTa_BASE does demonstrate a linguistic bias with some regularity. We conclude that while self-supervised pretraining is an effective way to learn helpful inductive biases, there is likely room to improve the rate at which models learn which features matter.
Full Text Available
Investigating BERT’s Knowledge of Language: Five Analysis Methods with NPIs

https://doi.org/10.18653/v1/D19-1286

Warstadt, Alex; Cao, Yu; Grosu, Ioana; Peng, Wei; Blix, Hagen; Nie, Yining; Alsop, Anna; Bordia, Shikha; Liu, Haokun; Parrish, Alicia; et al (November 2019, Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP))

Full Text Available
