

This content will become publicly available on October 11, 2024

Title: The challenge of balancing model sensitivity and robustness in predicting yields: a benchmarking study of amide coupling reactions

A sensitive model captures the reactivity cliffs but overfits to yield outliers. A robust model, on the other hand, disregards the yield outliers but underfits the reactivity cliffs.

 
Award ID(s):
2202693
NSF-PAR ID:
10494606
Author(s) / Creator(s):
; ;
Publisher / Repository:
Royal Society of Chemistry
Date Published:
Journal Name:
Chemical Science
Volume:
14
Issue:
39
ISSN:
2041-6520
Page Range / eLocation ID:
10835 to 10846
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Chemists often use statistical analysis of reaction data with molecular descriptors to identify structure-reactivity relationships, which can enable prediction and mechanistic understanding. In this study, we developed a broadly applicable and quantitative classification workflow that identifies reactivity cliffs in 11 Ni- and Pd-catalyzed cross-coupling datasets using monodentate phosphine ligands. A distinctive ligand steric descriptor, minimum percent buried volume [%Vbur(min)], is found to divide these datasets into active and inactive regions at a similar threshold value. Organometallic studies demonstrate that this threshold corresponds to the binary outcome of bisligated versus monoligated metal and that %Vbur(min) is a physically meaningful and predictive representation of ligand structure in catalysis.
  2. Abstract

    Debris flows are powered by sediment supplied from steep hillslopes where soils are often patchy and interrupted by bare‐bedrock cliffs. The role of patchy soils and cliffs in supplying sediment to channels remains unclear, particularly surrounding wildfire disturbances that heighten debris‐flow hazards by increasing sediment supply to channels. Here, we examine how variation in soil cover on hillslopes affects sediment sizes in channels surrounding the 2020 El Dorado wildfire, which burned debris‐flow prone slopes in the San Bernardino Mountains, California. We focus on six headwater catchments (<0.1 km²) where hillslope sources ranged from a continuous soil mantle to 95% bare‐bedrock cliffs. At each site, we measured sediment grain size distributions at the same channel locations before and immediately following the wildfire. We compared results to a mixing model that accounts for three distinct hillslope sediment sources distinguished by local slope thresholds. We find that channel sediment in fully soil‐mantled catchments reflects hillslope soils (D50 = 0.1–0.2 cm) both before and after the wildfire. In steeper catchments with cliffs, channel sediment is consistently coarse prior to fire (D50 = 6–32 cm) and reflects bedrock fracture spacing, despite cliffs representing anywhere from 5% to 95% of the sediment source area. Following the fire, channel sediment size reduces most (5‐ to 20‐fold) in catchments where hillslope sources are predominantly soil covered but with patches of cliffs. The abrupt fining of channel sediment is thought to facilitate postfire debris‐flow initiation, and our results imply that this effect is greatest where bare‐bedrock cliffs are present but not dominant. A patchwork of bare‐bedrock cliffs is common in steeplands where hillslopes respond to channel incision by landsliding. We show how local slope thresholds applied to such terrain aid in estimating sediment supply conditions before two destructive debris flows that eventually nucleated in these study catchments in 2022.

     
  3. Abstract

    The accelerated calving of ice shelves buttressing the Antarctic Ice Sheet may form unstable ice cliffs. The marine ice cliff instability hypothesis posits that cliffs taller than a critical height (~90 m) will undergo structural collapse, initiating runaway retreat in ice‐sheet models. This critical height is based on inferences from preexisting, static ice cliffs. Here we show how the critical height increases with the timescale of ice‐shelf collapse. We model failure mechanisms within an ice cliff deforming after removal of ice‐shelf buttressing stresses. If removal occurs rapidly, the cliff deforms primarily elastically and fails through tensile‐brittle fracture, even at relatively small cliff heights. As the ice‐shelf removal timescale increases, viscous relaxation dominates, and the critical height increases to ~540 m for timescales greater than days. A 90‐m critical height implies ice‐shelf removal in under an hour. Incorporation of ice‐shelf collapse timescales in prognostic ice‐sheet models will mitigate the marine ice cliff instability, implying less ice mass loss.

     
  4. Abstract Common designs of model evaluation typically focus on monolingual settings, where different models are compared according to their performance on a single data set that is assumed to be representative of all possible data for the task at hand. While this may be reasonable for a large data set, this assumption is difficult to maintain in low-resource scenarios, where artifacts of the data collection can yield data sets that are outliers, potentially making conclusions about model performance coincidental. To address these concerns, we investigate model generalizability in crosslinguistic low-resource scenarios. Using morphological segmentation as the test case, we compare three broad classes of models with different parameterizations, taking data from 11 languages across 6 language families. In each experimental setting, we evaluate all models on a first data set, then examine their performance consistency when introducing new randomly sampled data sets with the same size and when applying the trained models to unseen test sets of varying sizes. The results demonstrate that the extent of model generalization depends on the characteristics of the data set, and does not necessarily rely heavily on the data set size. Among the characteristics that we studied, the ratio of morpheme overlap and that of the average number of morphemes per word between the training and test sets are the two most prominent factors. Our findings suggest that future work should adopt random sampling to construct data sets with different sizes in order to make more responsible claims about model evaluation. 
  5. Abstract

    Nitryl chloride (ClNO2) plays an important role in the budget and distribution of tropospheric oxidants, halogens, and reactive nitrogen species. ClNO2 is formed from the heterogeneous uptake and reaction of dinitrogen pentoxide (N2O5) on chloride‐containing aerosol, with a production yield, ϕ(ClNO2), defined as the moles of ClNO2 produced relative to N2O5 lost. The ϕ(ClNO2) has been increasingly incorporated into 3‐D chemical models where it is parameterized based on laboratory‐derived kinetics and currently accepted aqueous‐phase formation mechanism. This parameterization models ϕ(ClNO2) as a function of the aerosol chloride to water molar ratio. Box model simulations of night flights during the 2015 Wintertime INvestigation of Transport, Emissions, and Reactivity (WINTER) aircraft campaign derived 3,425 individual ϕ(ClNO2) values with a median of 0.138 and range of 0.003 to 1. Comparison of the box model median to those predicted by two other field‐based ϕ(ClNO2) derivation methods agreed within a factor of 1.3, within the uncertainties of each method. In contrast, the box model median was 75–84% lower than predictions from the laboratory‐based parameterization (i.e., [parameterization − box model]/parameterization). An evaluation of factors influencing this difference reveals a positive dependence of ϕ(ClNO2) on aerosol water, opposite to the currently parameterized trend. Additional factors may include aqueous‐phase competition reactions for the nitronium ion intermediate and/or direct ClNO2 loss mechanisms. Further laboratory studies of ClNO2 formation and the impacts of aerosol water, sulfate, organics, and ClNO2 aqueous‐phase reactions are required to elucidate and quantify these processes on ambient aerosol, critical for the development of a robust ϕ(ClNO2) parameterization.

     