This content will become publicly available on February 1, 2026

Title: Data Augmentation Strategies for Improved PM2.5 Forecasting Using Transformer Architectures
Breathing in fine particulate matter with a diameter of less than 2.5 µm (PM2.5) greatly increases an individual’s risk of cardiovascular and respiratory diseases. As climate change progresses, extreme weather events, including wildfires, are expected to increase, exacerbating air pollution. However, models often struggle to capture extreme pollution events due to the rarity of high PM2.5 levels in training datasets. To address this, we implemented cluster-based undersampling and trained Transformer models to improve extreme event prediction using various cutoff thresholds (12.1 µg/m3 and 35.5 µg/m3) and partial sampling ratios (10/90, 20/80, 30/70, 40/60, 50/50). Our results demonstrate that the 35.5 µg/m3 threshold, paired with a 20/80 partial sampling ratio, achieved the best performance, with an RMSE of 2.080, MAE of 1.386, and R2 of 0.914, particularly excelling in forecasting high PM2.5 events. Overall, models trained on augmented data significantly outperformed those trained on original data, highlighting the importance of resampling techniques in improving air quality forecasting accuracy, especially for high-pollution scenarios. These findings provide critical insights into optimizing air quality forecasting models, enabling more reliable predictions of extreme pollution events. By advancing the ability to forecast high PM2.5 levels, this study contributes to the development of more informed public health and environmental policies to mitigate the impacts of air pollution, and advances the technology for building better air quality digital twins.
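The resampling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several points are assumptions made only for illustration: that a ratio such as 20/80 means high-PM2.5 samples (at or above the cutoff) make up about 20% of the resampled set, that all high samples are kept, and that the low samples are clustered on the PM2.5 value alone with a small 1-D k-means. The function names `kmeans_1d` and `cluster_undersample` and the parameter `high_share` are hypothetical.

```python
import random


def kmeans_1d(values, k, iters=20, seed=0):
    """Tiny 1-D k-means (illustrative only); returns one label per value."""
    rng = random.Random(seed)
    centers = rng.sample(values, k)
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign each value to its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Move each center to the mean of its members (skip empty clusters).
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels


def cluster_undersample(pm25, threshold=35.5, high_share=0.20, k=3, seed=0):
    """Return sorted indices of a resampled set.

    Assumption: samples at or above `threshold` are all kept, and samples
    below it are thinned so that the high samples form roughly `high_share`
    of the result.
    """
    rng = random.Random(seed)
    high = [i for i, v in enumerate(pm25) if v >= threshold]
    low = [i for i, v in enumerate(pm25) if v < threshold]
    if not high or not low:
        return sorted(high + low)  # nothing to rebalance

    # Number of low samples needed to reach the requested share.
    n_low_target = min(len(low), round(len(high) * (1 - high_share) / high_share))

    k = min(k, len(low))
    labels = kmeans_1d([pm25[i] for i in low], k, seed=seed)

    kept = []
    for c in range(k):
        members = [i for i, lab in zip(low, labels) if lab == c]
        # Draw from each cluster in proportion to its size, so the thinned
        # low samples still cover the whole low-concentration range.
        quota = round(n_low_target * len(members) / len(low))
        kept.extend(rng.sample(members, min(quota, len(members))))
    return sorted(high + kept)
```

Calling `cluster_undersample(values, threshold=12.1, high_share=0.5)` would correspond, under the same assumptions, to the 12.1 µg/m3 cutoff with a 50/50 ratio. Because per-cluster quotas are rounded, the achieved share can differ slightly from the requested one.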
Award ID(s):
1841520
PAR ID:
10574332
Author(s) / Creator(s):
; ;
Publisher / Repository:
Atmosphere
Date Published:
Journal Name:
Atmosphere
Volume:
16
Issue:
2
ISSN:
2073-4433
Page Range / eLocation ID:
127
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Accurate air pollution monitoring is critical to understand and mitigate the impacts of air pollution on human health and ecosystems. Due to the limited number and geographical coverage of advanced, highly accurate sensors monitoring air pollutants, many low-cost and low-accuracy sensors have been deployed. Calibrating low-cost sensors is essential to fill the geographical gap in sensor coverage. We systematically examined how different machine learning (ML) models and open-source packages could help improve the accuracy of particulate matter (PM) 2.5 data collected by Purple Air sensors. Eleven ML models and five packages were examined. This systematic study found that both models and packages impacted accuracy, while the random training/testing split ratio (e.g., 80/20 vs. 70/30) had minimal impact (0.745% difference for R2). Long Short-Term Memory (LSTM) models trained in RStudio and TensorFlow excelled, with high R2 scores of 0.856 and 0.857 and low Root Mean Squared Errors (RMSEs) of 4.25 µg/m3 and 4.26 µg/m3, respectively. However, LSTM models may be too slow (1.5 h) or computation-intensive for applications with fast response requirements. Tree-boosted models including XGBoost (0.7612, 5.377 µg/m3) in RStudio and Random Forest (RF) (0.7632, 5.366 µg/m3) in TensorFlow offered good performance with shorter training times (<1 min) and may be suitable for such applications. These findings suggest that AI/ML models, particularly LSTM models, can effectively calibrate low-cost sensors to produce precise, localized air quality data. This research is among the most comprehensive studies on AI/ML for air pollutant calibration. We also discussed limitations, applicability to other sensors, and the explanations for good model performances. This research can be adapted to enhance air quality monitoring for public health risk assessments, support broader environmental health initiatives, and inform policy decisions. 
  2. In recent years, air pollution has caused more than 1 million deaths per year in China, making it a major focus of public health efforts. However, future climate change may exacerbate such human health impacts by increasing the frequency and duration of weather conditions that enhance air pollution exposure. Here, we use a combination of climate, air quality, and epidemiological models to assess future air pollution deaths in a changing climate under Representative Concentration Pathway 4.5 (RCP4.5). We find that, assuming pollution emissions and population are held constant at current levels, climate change would adversely affect future air quality for >85% of China’s population (∼55% of land area) by the middle of the century, and would increase by 3% and 4% the population-weighted average concentrations of fine particulate matter (PM2.5) and ozone, respectively. As a result, we estimate an additional 12,100 and 8,900 Chinese (95% confidence interval: 10,300 to 13,800 and 2,300 to 14,700, respectively) will die per year from PM2.5 and ozone exposure, respectively. The important underlying climate mechanisms are changes in extreme conditions such as atmospheric stagnation and heat waves (contributing 39% and 6%, respectively, to the increase in mortality). Additionally, greater vulnerability of China’s aging population will further increase the estimated deaths from PM2.5 and ozone in 2050 by factors of 1 and 3, respectively. Our results indicate that climate change and more intense extremes are likely to increase the risk of severe pollution events in China. Managing air quality in China in a changing climate will thus become more challenging. 
  3. Short-term exposure to fine particulate matter (PM2.5) pollution is linked to numerous adverse health effects. Pollution episodes, such as wildfires, can lead to substantial increases in PM2.5 levels. However, sparse regulatory measurements provide an incomplete understanding of pollution gradients. Here, we demonstrate an infrastructure that integrates community-based measurements from a network of low-cost PM2.5 sensors with rigorous calibration and a Gaussian process model to understand neighborhood-scale PM2.5 concentrations during three pollution episodes (July 4, 2018, fireworks; July 5 and 6, 2018, wildfire; Jan 3−7, 2019, persistent cold air pool, PCAP). The firework/wildfire events included 118 sensors in 84 locations, while the PCAP event included 218 sensors in 138 locations. The model results accurately predict reference measurements during the fireworks (n: 16, hourly root-mean-square error, RMSE: 12.3−21.5 μg/m3; normalized RMSE, nRMSE: 9−24%), the wildfire (n: 46, RMSE: 2.6−4.0 μg/m3; nRMSE: 13.1−22.9%), and the PCAP (n: 96, RMSE: 4.9−5.7 μg/m3; nRMSE: 20.2−21.3%). They also revealed dramatic geospatial differences in PM2.5 concentrations that are not apparent when only considering government measurements or viewing the US Environmental Protection Agency’s AirNow visualizations. Complementing the PM2.5 estimates and visualizations are highly resolved uncertainty maps. Together, these results illustrate the potential for low-cost sensor networks that, combined with a data-fusion algorithm and appropriate calibration and training, can estimate PM2.5 concentrations dynamically and with improved accuracy during pollution episodes. These highly resolved uncertainty estimates can provide a much-needed strategy to communicate uncertainty to end users.
  4. In urban areas like Chicago, daily life extends above ground level due to the prevalence of high-rise buildings where residents and commuters live and work. This study examines the variation in fine particulate matter (PM2.5) concentrations across building stories. PM2.5 levels were measured using PurpleAir sensors, installed between 8 April and 7 May 2023, on floors one, four, six, and nine of an office building in Chicago. Additionally, data were collected from a public outdoor PurpleAir sensor on the fourteenth floor of a condominium located 800 m away. The results show that outdoor PM2.5 concentrations peak at 14 m height, and then decline by 0.11 μg/m3 per meter elevation, especially noticeable from midnight to 8 a.m. under stable atmospheric conditions. Indoor PM2.5 concentrations increase steadily by 0.02 μg/m3 per meter elevation, particularly during peak work hours, likely caused by greater infiltration rates at higher floors. Both outdoor and indoor concentrations peak around noon. We find that indoor and outdoor PM2.5 are positively correlated, with indoor levels consistently remaining lower than outside levels. These findings align with previous research suggesting decreasing outdoor air pollution concentrations with increasing height. The study informs decision-making by community members and policymakers regarding air pollution exposure in urban settings. 
  5. A regional modeling system that integrates state-of-the-art emissions processing (SMOKE), climate (CWRF), and air quality (CMAQ) models has been combined with satellite measurements of fire activities to assess the impact of fire emissions on the contiguous United States (CONUS) air quality during 1997–2016. The system realistically reproduced the spatiotemporal distributions of the observed meteorology and surface air quality, with a slight overestimate of surface ozone (O3) by ~4% and underestimate of surface PM2.5 by ~10%. The system simulation showed that the fire impacts on primary pollutants such as CO were generally confined to the fire source areas, but their effects on secondary pollutants like O3 spread more broadly. The fire contribution to air quality varied greatly during 1997–2016 and occasionally accounted for more than 100 ppbv of monthly mean surface CO and over 20 µg m−3 of monthly mean PM2.5 in the Northwest U.S. and Northern California, two regions susceptible to frequent fires. Fire emissions also had implications for air quality compliance. From 1997 to 2016, fire emissions increased surface 8-hour O3 standard exceedances by 10% and 24-hour PM2.5 exceedances by 33% over CONUS.