
Title: Machine-learning based reconstructions of primary and secondary climate variables from North American and European fossil pollen data

We test several quantitative algorithms as palaeoclimate reconstruction tools for North American and European fossil pollen data, using both classical methods and newer machine-learning approaches based on regression tree ensembles and artificial neural networks. We focus on the reconstruction of secondary climate variables (here, January temperature and annual water balance), as their smaller ecological influence compared to the primary variable (July temperature) presents special challenges to palaeo-reconstructions. We test the pollen–climate models using a novel and comprehensive cross-validation approach, running a series of h-block cross-validations using h values of 100–1500 km. Our study illustrates major benefits of this variable h-block cross-validation scheme, as the effect of spatial autocorrelation is minimized, while the cross-validations with increasing h values can reveal instabilities in the calibration model and approximate challenges faced in palaeo-reconstructions with poor modern analogues. We achieve well-performing calibration models for both primary and secondary climate variables, with boosted regression trees providing the overall most robust performance, while the palaeoclimate reconstructions from fossil datasets show major independent features for the primary and secondary variables. Our results suggest that with careful variable selection and consideration of ecological processes, robust reconstruction of both primary and secondary climate variables is possible.
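
The variable h-block cross-validation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the site coordinates, predictors and response are synthetic, the function names (`haversine_km`, `h_block_cv_rmse`, `linear_fit_predict`) are hypothetical, and an ordinary least-squares fit stands in for the regression tree and neural network models tested in the paper. The only idea taken from the abstract is that each test site is predicted from a model trained without any site lying within h km of it, repeated for several h values.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def linear_fit_predict(X_train, y_train, X_test):
    """Stand-in model (assumption): least-squares linear regression with intercept."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef


def h_block_cv_rmse(X, y, lat, lon, h_km, fit_predict):
    """Leave-one-out CV that also drops all training sites within h_km of the test site."""
    n = len(y)
    preds = np.full(n, np.nan)
    for i in range(n):
        dist = haversine_km(lat[i], lon[i], lat, lon)
        train = dist > h_km   # keep only sites farther away than h_km
        train[i] = False      # the test site itself is never used for training
        if not train.any():
            continue          # no eligible training sites; skip this test site
        preds[i] = fit_predict(X[train], y[train], X[i:i + 1])[0]
    ok = ~np.isnan(preds)
    return float(np.sqrt(np.mean((preds[ok] - y[ok]) ** 2)))


# Synthetic calibration set (illustrative only, not real pollen data).
rng = np.random.default_rng(0)
n_sites = 60
lat = rng.uniform(35.0, 70.0, n_sites)
lon = rng.uniform(-10.0, 40.0, n_sites)
X = rng.random((n_sites, 3))  # stand-in for pollen-derived predictors
y = 2.0 * X[:, 0] - X[:, 1] + 0.1 * lat + rng.normal(0.0, 0.2, n_sites)

# Repeat the cross-validation over several block radii within the 100-1500 km range.
rmse_by_h = {h: h_block_cv_rmse(X, y, lat, lon, h, linear_fit_predict)
             for h in (100, 500, 1000, 1500)}
for h, rmse in rmse_by_h.items():
    print(f"h = {h:4d} km  RMSE = {rmse:.3f}")
```

Comparing the error across h values is the point of the exercise: a model whose error rises sharply as h grows is leaning on nearby, spatially autocorrelated sites rather than on a transferable pollen–climate relationship.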

Publication Date:
Journal Name:
Scientific Reports
Nature Publishing Group
Sponsoring Org:
National Science Foundation
More Like this
  1. In north-western North America, the so-called divergence problem (DP) is expressed in tree ring width (RW) as an unstable temperature signal in recent decades. Maximum latewood density (MXD), from the same region, shows minimal evidence of DP. While MXD is a superior proxy for summer temperatures, there are very few long MXD records from North America. Latewood blue intensity (LWB) measures similar wood properties as MXD, expresses a similar climate response, is much cheaper to generate and thereby could provide the means to profoundly expand the extant network of temperature sensitive tree-ring (TR) chronologies in North America. In this study, LWB is measured from 17 white spruce sites (Picea glauca) in south-western Yukon to test whether LWB is immune to the temporal calibration instabilities observed in RW. A number of detrending methodologies are examined. The strongest calibration results for both RW and LWB are consistently returned using age-dependent spline (ADS) detrending within the signal-free (SF) framework. RW data calibrate best with June–July maximum temperatures (Tmax), explaining up to 28% variance, but all models fail validation and residual analysis. In comparison, LWB calibrates strongly (explaining 43–51% of May–August Tmax) and validates well. The reconstruction extends to 1337 CE, but uncertainties increase substantially before the early 17th century because of low replication. RW-, MXD- and LWB-based summer temperature reconstructions from the Gulf of Alaska, the Wrangell Mountains and Northern Alaska display good agreement at multi-decadal and higher frequencies, but the Yukon LWB reconstruction appears potentially limited in its expression of centennial-scale variation. While LWB improves dendroclimatic calibration, future work must focus on suitably preserved sub-fossil material to increase replication prior to 1650 CE.
  2. Abstract. The Prairie Pothole Region (PPR), located in central North America, is an important region hydrologically and ecologically. Millions of wetlands, many containing ponds, are located here, and they serve as habitats for various biota and breeding grounds for waterfowl. They also provide carbon sequestration, sediment and nutrient attenuation, and floodwater storage. Land modification and climate change are threatening the PPR, and water and wildlife managers face important conservation decisions due to these threats. We developed predictive, multisite forecasting models using canonical correlation analysis (CCA) for pond counts in the southeast PPR, the portion located within the United States, to aid in these important decisions. These forecast models predict spring (May) and summer (July) pond counts for each region (stratum) of the United States Fish and Wildlife Service’s pond and waterfowl surveys using a suite of antecedent, large-scale climate variables and indices including 500 millibar heights, sea surface temperatures (SSTs), and Palmer Drought Severity Index (PDSI). Models were developed to issue forecasts at the start of all preceding months beginning on March 1st. The models were evaluated for their performance in a predictive mode by leave-one-out cross-validation. The models exhibited good performance (R values above 0.6 for May forecasts and 0.4 for July forecasts), with performance increasing as lead time decreased. This simple and versatile modeling approach offers a robust tool for efficient management and sustainability of ecology and natural resources. It demonstrates the ability to use large-scale climate variables to predict a local variable in a skilful way and could serve as an example to develop similar models for use in management and conservation decisions in other regions and sectors of the environment.
  3. Although extended or ‘protracted’ El Niño and La Niña episodes were first suggested nearly 20 years ago, they have not received the attention of other ‘flavours’ of the El Niño–Southern Oscillation (ENSO) or low-frequency ‘ENSO-like’ phenomena. In this study, instrumental variables and palaeoclimatic reconstructions are used to investigate the most recent ‘protracted’ El Niño episode in 2014–2016, and place it into a longer historical context. Although just reaching the threshold for such an episode, the 2014–2016 ‘protracted’ El Niño had very severe societal, agricultural, environmental and ecological impacts, particularly in western Pacific regions like eastern Australia. We show that although ‘protracted’ ENSO episodes of either phase cause similar, near-global modulations of weather and climate as during more ‘classical’ events, impacts associated with ‘protracted’ episodes last longer, with strong influences in eastern Australia. The latter is a response to the dominance of Niño 4 sea surface temperature (SST) and associated atmospheric teleconnection anomalies during ‘protracted’ ENSO episodes. Importantly, while Niño 4 SST anomalies recorded during the austral summer of 2016 were the highest values on record, an analysis of long-term palaeoclimate records indicates that there may have been episodes of greater magnitude and duration than seen in instrumental observations. This suggests that shorter instrumental observations may underestimate the risks of possible future ENSO extremes compared with those observed from multi-century palaeoclimate records. Improved knowledge of ENSO and the potential to forecast ‘protracted’ episodes would be of immense practical benefit to communities affected by the severe impacts of ENSO extremes.
  4. Abstract. The Last Millennium Reanalysis (LMR) utilizes an ensemble methodology to assimilate paleoclimate data for the production of annually resolved climate field reconstructions of the Common Era. Two key elements are the focus of this work: the set of assimilated proxy records and the forward models that map climate variables to proxy measurements. Results based on an updated proxy database and seasonal regression-based forward models are compared to the LMR prototype, which was based on a smaller set of proxy records and simpler proxy models formulated as univariate linear regressions against annual temperature. Validation against various instrumental-era gridded analyses shows that the new reconstructions of surface air temperature and 500 hPa geopotential height are significantly improved (from 10 % to more than 100 %), while improvements in reconstruction of the Palmer Drought Severity Index are more modest. Additional experiments designed to isolate the sources of improvement reveal the importance of the updated proxy records, including coral records for improving tropical reconstructions, and tree-ring density records for temperature reconstructions, particularly in high northern latitudes. Proxy forward models that account for seasonal responses, and dependence on both temperature and moisture for tree-ring width, also contribute to improvements in reconstructed thermodynamic and hydroclimate variables in midlatitudes. The variability of temperature at multidecadal to centennial scales is also shown to be sensitive to the set of assimilated proxies, especially to the inclusion of primarily moisture-sensitive tree-ring-width records.
  5. Variable selection plays a fundamental role in high-dimensional data analysis. Various methods have been developed for variable selection in recent years. Well-known examples are forward stepwise regression (FSR) and least angle regression (LARS), among others. These methods typically add variables into the model one by one. For such selection procedures, it is crucial to find a stopping criterion that controls model complexity. One of the most commonly used techniques to this end is cross-validation (CV) which, in spite of its popularity, has two major drawbacks: expensive computational cost and lack of statistical interpretation. To overcome these drawbacks, we introduce a flexible and efficient test-based variable selection approach that can be incorporated into any sequential selection procedure. The test, which is on the overall signal in the remaining inactive variables, is based on the maximal absolute partial correlation between the inactive variables and the response given active variables. We develop the asymptotic null distribution of the proposed test statistic as the dimension tends to infinity uniformly in the sample size. We also show that the test is consistent. With this test, at each step of the selection, a new variable is included if and only if the p-value is below some pre-defined level. Numerical studies show that the proposed method delivers very competitive performance in terms of variable selection accuracy and computational complexity compared to CV.