NSF PAR Search | NSF Public Access Repository


Can One Hear the Shape of a Molecule (from its Coulomb Matrix Eigenvalues)?

https://doi.org/10.1021/acs.jcim.0c00631

Schrier, Joshua (August 2020, Journal of Chemical Information and Modeling)

Full Text Available
Shapley Residuals: Quantifying the limits of the Shapley value for explanations.

Kumar, I.E.; Scheidegger, C.; Venkatasubramanian, S.; Friedler, S. (January 2020, ICML Workshop on Human Interpretability in Machine Learning (WHI))

Popular feature importance techniques compute additive approximations to nonlinear models by first defining a cooperative game describing the value of different subsets of the model’s features, then calculating the resulting game’s Shapley values to attribute credit additively between the features. However, the specific modeling settings in which the Shapley values are a poor approximation for the true game have not been well-described. In this paper we utilize an interpretation of Shapley values as the result of an orthogonal projection between vector spaces to calculate a residual representing the kernel component of that projection. We provide an algorithm for computing these residuals, characterize different modeling settings based on the value of the residuals, and demonstrate that they capture information about model predictions that Shapley values cannot.
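
As a rough illustration of the setup the abstract describes (a cooperative game over feature subsets, with Shapley values attributing credit additively), the sketch below computes exact Shapley values for a small toy game. The game `toy_value`, the feature names, and the helper `shapley_values` are hypothetical and are not taken from the paper; the paper's residual algorithm is not reproduced here.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average each player's marginal contribution
    over every ordering in which the players can join the coalition."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: total / len(orderings) for p, total in totals.items()}


def toy_value(coalition):
    """Hypothetical game: 'a' and 'b' have main effects plus an interaction;
    'c' contributes nothing."""
    v = 0.0
    if "a" in coalition:
        v += 1.0
    if "b" in coalition:
        v += 2.0
    if "a" in coalition and "b" in coalition:
        v += 3.0  # interaction term that no single feature owns
    return v


phi = shapley_values(["a", "b", "c"], toy_value)
print(phi)
```

In this toy game the interaction term of 3 is split evenly between "a" and "b". The attributions still sum to the value of the full coalition, but an additive attribution cannot by itself show that the credit came from an interaction, which is the kind of gap the abstract says the residuals are meant to quantify.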
Full Text Available
Understanding structural adaptability: a reactant informatics approach to experiment design

https://doi.org/10.1039/C7ME00127D

Xu, Rosalind J.; Olshansky, Jacob H.; Adler, Philip D.; Huang, Yongjia; Smith, Matthew D.; Zeller, Matthias; Schrier, Joshua; Norquist, Alexander J. (January 2018, Molecular Systems Design & Engineering)

The structural and electronic adaptability ranges of a [VO(SeO3)(HSeO3)] framework found in organically templated vanadium selenites were determined using a three-step approach, informed by cheminformatics descriptors, involving (i) the extraction of the most important reaction parameters from historical reaction data, (ii) a fractional factorial design on those parameters to better explore chemical space, and (iii) decision tree construction on organic molecular properties to determine the factors governing framework formation. This process enabled the elucidation of both the structural and electronic adaptability ranges and provided the context to extract chemical understanding from the structural features that give rise to these respective ranges. This work resulted in the synthesis and structural determination of five new compounds.
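
Step (ii) refers to a fractional factorial design. As a minimal sketch of the general idea, the code below builds a two-level half-fraction design for three factors using the generator C = A*B. The factor names `A`, `B`, `C` and the choice of a 2^(3-1) design are assumptions for illustration; the abstract does not state which reaction parameters or which design resolution the authors used.

```python
from itertools import product


def half_fraction_design():
    """Build a 2^(3-1) fractional factorial design.

    A and B take every combination of low (-1) and high (+1) levels;
    the third factor is set by the generator C = A * B, so only 4 of
    the 8 full-factorial runs are needed.
    """
    runs = []
    for a, b in product((-1, 1), repeat=2):
        runs.append({"A": a, "B": b, "C": a * b})
    return runs


design = half_fraction_design()
for run in design:
    print(run)
```

Each factor appears at its low and high level equally often, so main effects can still be estimated from half the runs, at the cost of confounding each main effect with a two-factor interaction.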
Full Text Available
Interpretable Active Learning

Phillips, Richard; Chang, Kyu Hyun; Friedler, Sorelle A. (January 2018, Conference on Fairness, Accountability, and Transparency)

Active learning has long been a topic of study in machine learning. However, as increasingly complex and opaque models have become standard practice, the process of active learning, too, has become more opaque. There has been little investigation into interpreting what specific trends and patterns an active learning strategy may be exploring. This work expands on the Local Interpretable Model-agnostic Explanations framework (LIME) to provide explanations for active learning recommendations. We demonstrate how LIME can be used to generate locally faithful explanations for an active learning strategy, and how these explanations can be used to understand how different models and datasets explore a problem space over time. These explanations can also be used to generate batches based on common sources of uncertainty. These regions of common uncertainty can be useful for understanding a model’s current weaknesses. To quantify the per-subgroup differences in how an active learning strategy queries spatial regions, we introduce a notion of uncertainty bias (based on disparate impact) to measure the discrepancy in the confidence of a model’s predictions between one subgroup and another. Using the uncertainty bias measure, we show that our query explanations accurately reflect the subgroup focus of the active learning queries, allowing for an interpretable explanation of what is being learned as points with similar sources of uncertainty have their uncertainty bias resolved. We demonstrate that this technique can be applied to track uncertainty bias over user-defined clusters or automatically generated clusters based on the source of uncertainty. We also measure how the choice of initial labeled examples affects groups over time.
Full Text Available
