This content will become publicly available on April 1, 2026

Title: A Comparison of AI Weather Prediction and Numerical Weather Prediction Models for 1–7-Day Precipitation Forecasts
Abstract Pure artificial intelligence (AI)-based weather prediction (AIWP) models have made waves within the scientific community and the media, claiming superior performance to numerical weather prediction (NWP) models. However, these models often lack impactful output variables such as precipitation. One exception is Google DeepMind’s GraphCast model, which became the first mainstream AIWP model to predict precipitation, but performed only limited verification. We present an analysis of the ECMWF’s Integrated Forecasting System (IFS)-initialized (GRAPIFS) and the NCEP’s Global Forecast System (GFS)-initialized (GRAPGFS) GraphCast precipitation forecasts over the contiguous United States and compare to results from the GFS and IFS models using 1) grid-based, 2) neighborhood, and 3) object-oriented metrics verified against the fifth major global reanalysis produced by ECMWF (ERA5) and the NCEP/Environmental Modeling Center (EMC) stage IV precipitation analysis datasets. We affirmed that GRAPGFS and GRAPIFS perform better than the GFS and IFS in terms of root-mean-square error and stable equitable errors in probability space, but the GFS and IFS precipitation distributions more closely align with the ERA5 and stage IV distributions. Equitable threat score also generally favored GraphCast, particularly for lower accumulation thresholds. Fractions skill score for increasing neighborhood sizes shows greater gains for the GFS and IFS than GraphCast, suggesting the NWP models may have a better handle on intensity but struggle with the location. Object-based verification for GraphCast found positive area biases at low accumulation thresholds and large negative biases at high accumulation thresholds. GRAPGFS saw similar performance gains to GRAPIFS when compared to their NWP counterparts, but initializing with the less familiar GFS conditions appeared to lead to an increase in light precipitation.
Significance Statement Pure artificial intelligence (AI)-based weather prediction (AIWP) has exploded in popularity with promises of better performance and faster run times than numerical weather prediction (NWP) models. However, less attention has been paid to their capability to predict impactful, sensible weather like precipitation, precipitation type, or specific meteorological features. We seek to address this gap by comparing precipitation forecast performance by an AI model called GraphCast to the Global Forecast System (GFS) and the Integrated Forecasting System (IFS) NWP models. While GraphCast does perform better on many verification metrics, it has some limitations for intense precipitation forecasts. In particular, it less frequently predicts intense precipitation events than the GFS or IFS. Overall, this article emphasizes the promise of AIWP while at the same time stressing the need for robust verification by domain experts.
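As a rough illustration of two of the metrics named in the abstract, the sketch below computes an equitable threat score (grid-based) and a fractions skill score (neighborhood) from their standard textbook definitions. It is not the authors' verification code: the function names, the zero padding at the domain edges, and the toy 5 x 5 fields are assumptions made only for this example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def equitable_threat_score(forecast, observed, threshold):
    """ETS for one accumulation threshold (hypothetical helper)."""
    f = np.asarray(forecast) >= threshold
    o = np.asarray(observed) >= threshold
    hits = np.sum(f & o)
    false_alarms = np.sum(f & ~o)
    misses = np.sum(~f & o)
    # Hits expected by chance, given the forecast and observed event counts.
    hits_random = (hits + false_alarms) * (hits + misses) / f.size
    denom = hits + false_alarms + misses - hits_random
    return float((hits - hits_random) / denom) if denom != 0 else float("nan")


def neighborhood_fraction(binary, size):
    """Fraction of event points in a size x size window (size must be odd).

    Assumption: points outside the domain count as non-events (zero padding).
    """
    pad = size // 2
    padded = np.pad(binary.astype(float), pad, mode="constant")
    windows = sliding_window_view(padded, (size, size))
    return windows.mean(axis=(2, 3))


def fractions_skill_score(forecast, observed, threshold, size):
    """FSS for one threshold and one odd neighborhood size (hypothetical helper)."""
    pf = neighborhood_fraction(np.asarray(forecast) >= threshold, size)
    po = neighborhood_fraction(np.asarray(observed) >= threshold, size)
    num = np.sum((pf - po) ** 2)
    denom = np.sum(pf ** 2) + np.sum(po ** 2)
    return float(1.0 - num / denom) if denom != 0 else float("nan")


# Toy fields: one observed event and a forecast displaced by one grid point.
obs = np.zeros((5, 5))
obs[2, 2] = 10.0
fcst = np.zeros((5, 5))
fcst[2, 3] = 10.0

ets_displaced = equitable_threat_score(fcst, obs, threshold=1.0)
fss_1 = fractions_skill_score(fcst, obs, threshold=1.0, size=1)
fss_3 = fractions_skill_score(fcst, obs, threshold=1.0, size=3)
```

In this toy case the displaced forecast scores no grid-point hits, so the grid-based score gives it no credit, while the fractions skill score rises as the neighborhood widens. That is the general sense in which a gain in fractions skill score with neighborhood size points to location error rather than intensity error.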
Award ID(s):
2019758
PAR ID:
10596371
Publisher / Repository:
AMS
Journal Name:
Weather and Forecasting
Volume:
40
Issue:
4
ISSN:
0882-8156
Page Range / eLocation ID:
561 to 575
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Numerous artificial intelligence-based weather prediction (AIWP) models have emerged over the past 2 years, mostly in the private sector. There is an urgent need to evaluate these models from a meteorological perspective, but access to the output of these models is limited. We detail two new resources to facilitate access to AIWP model output data in the hope of accelerating the investigation of AIWP models by the meteorological community. First, a 3-yr (and growing) reforecast archive beginning in October 2020 containing twice daily 10-day forecasts for FourCastNet v2-small, Pangu-Weather, and GraphCast Operational is now available via an Amazon Simple Storage Service (S3) bucket through NOAA’s Open Data Dissemination (NODD) program (https://noaa-oar-mlwp-data.s3.amazonaws.com/index.html). This reforecast archive was initialized with both the NOAA’s Global Forecast System (GFS) and ECMWF’s Integrated Forecasting System (IFS) initial conditions in the hope that users can begin to perform the feature-based verification of impactful meteorological phenomena. Second, real-time output for these three models is visualized on our web page (https://aiweather.cira.colostate.edu) along with output from the GFS and the IFS. This allows users to easily compare output between each AIWP model and traditional, physics-based models with the goal of familiarizing users with the characteristics of AIWP models and determining whether the output aligns with expectations, is physically consistent and reasonable, and/or is trustworthy. We view these two efforts as a first step toward evaluating whether these new AIWP tools have a place in forecast operations.
  2. Abstract The development of deep learning (DL) weather forecasting models has made rapid progress and achieved comparable or better skill than traditional numerical weather prediction (NWP) models, which are generally computationally intensive. However, applications of these DL models have yet to be fully explored, including for severe convective events. We evaluate the DL model Pangu‐Weather in forecasting tornadic environments with one‐day lead times using convective available potential energy (CAPE), 0–6 km bulk wind difference (BWD6), and 0–3 km storm‐relative helicity (SRH3). We also compare its performance to the National Centers for Environmental Prediction (NCEP)'s Global Forecast System (GFS), a traditional NWP model. Pangu‐Weather generally outperforms GFS in predicting BWD6 and SRH3 at the closest grid point and hour of the storm report. However, Pangu‐Weather tends to underpredict the maximum values of all convective parameters in the 1–2 hr before the storm across the surrounding grid points compared to the GFS.
  3. Abstract Sierras de Córdoba (Argentina) is characterized by the occurrence of extreme precipitation events during the austral warm season. Heavy precipitation in the region has a large societal impact, causing flash floods. This motivates the forecast performance evaluation of 24-h accumulated precipitation and vertical profiles of atmospheric variables from different numerical weather prediction (NWP) models with the final aim of helping water management in the region. The NWP models evaluated include the Global Forecast System (GFS), which parameterizes convection, and convection-permitting simulations of the Weather Research and Forecasting (WRF) Model configured by three institutions: University of Illinois at Urbana–Champaign (UIUC), Colorado State University (CSU), and National Meteorological Service of Argentina (SMN). These models were verified with daily accumulated precipitation data from rain gauges and soundings during the RELAMPAGO-CACTI field campaign. Generally all configurations of the higher-resolution WRFs outperformed the lower-resolution GFS based on multiple metrics. Among the convection-permitting WRF Models, results varied with respect to rainfall threshold and forecast lead time, but the WRFUIUC mostly performed the best. However, elevation-dependent biases existed among the models that may impact the use of the data for different applications. There is a dry (moist) bias in lower (upper) pressure levels which is most pronounced in the GFS. For Córdoba an overestimation of the northern flow forecasted by the NWP configurations at lower levels was encountered. These results show the importance of convection-permitting forecasts in this region, which should be complementary to the coarser-resolution global model forecasts to help various users and decision-makers. 
  4. Abstract Global Forecast System (GFS), North American Mesoscale Forecast System (NAM), and High-Resolution Rapid Refresh (HRRR) 2-m temperature, 10-m wind speed, and precipitation accumulation forecasts initialized at 1200 UTC are verified against New York State Mesonet (NYSM) observations from 1 January 2018 through 31 December 2021. NYSM observations at 126 site locations are used to calculate standard error statistics (e.g., forecast error, root-mean-square error) for temperature and wind speed and contingency table statistics for precipitation across forecast hours, meteorological seasons, and regions. The majority of the focus is placed on the first 18 forecast hours to allow for comparison among all three models. A daily NYSM station-mean temperature error analysis identified a slight cold bias at temperatures below 25°C in the GFS, a cool-to-warm bias as forecast temperatures warm in the HRRR, and a warm bias at temperatures above 30°C in each model. Differences arise when considering temperature biases with respect to lead times and seasons. Wind speeds are overforecast at all ranges in each season, and forecast wind speeds ≥ 18 m s−1 are rarely observed. Performance diagrams indicate overall good forecast performance at precipitation thresholds of 0.1–1.5 mm, but with a high frequency bias in the GFS and NAM. This paper provides an overview of deterministic forecast performance across New York State, with the aim of sharing common biases associated with temperature, wind speed, and precipitation with operational forecasters and is the first step in developing a real-time model forecast uncertainty prediction tool.
  5. Abstract The traditional method for estimating weather forecast sensitivity to initial conditions uses adjoint models, which are limited to short lead times due to linearization around a control forecast. The advent of deep‐learning frameworks enables a new approach using backpropagation and gradient descent to iteratively optimize initial conditions, minimizing forecast errors. We apply this approach to the June 2021 Pacific Northwest heatwave using the GraphCast model, yielding over 90% reduction in 10‐day forecast errors over the Pacific Northwest. Similar improvements are found for Pangu‐Weather model forecasts initialized with the GraphCast‐derived optimal, suggesting that model error is an unimportant part of the perturbations. Eliminating small scales from the perturbations also yields similar forecast improvements. Extending the length of the optimization window, we find forecast improvement to about 23 days, suggesting atmospheric predictability at the upper end of recent estimates. 