<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Combining Machine Learning and Computational Chemistry for Predictive Insights Into Chemical Systems</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>08/25/2021</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10310190</idno>
					<idno type="doi">10.1021/acs.chemrev.1c00107</idno>
					<title level='j'>Chemical Reviews</title>
<idno>0009-2665</idno>
<biblScope unit="volume">121</biblScope>
<biblScope unit="issue">16</biblScope>					

					<author>John A. Keith</author><author>Valentin Vassilev-Galindo</author><author>Bingqing Cheng</author><author>Stefan Chmiela</author><author>Michael Gastegger</author><author>Klaus-Robert Müller</author><author>Alexandre Tkatchenko</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Machine learning models are poised to make a transformative impact on chemical sciences by dramatically accelerating computational algorithms and amplifying insights available from computational chemistry methods. However, achieving this requires a confluence and coaction of expertise in computer science and physical sciences. This Review is written for new and experienced researchers working at the intersection of both fields. We first provide concise tutorials of computational chemistry and machine learning methods, showing how insights involving both can be achieved. We follow with a critical review of noteworthy applications that demonstrate how computational chemistry and machine learning can be used together to provide insightful (and useful) predictions in molecular and materials modeling, retrosyntheses, catalysis, and drug design.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>1. Introduction 1.1. Background 1.2. Motivation for This Review 2. CompChem and Notable Intersections with ML 2.1. Computational Modeling, Data, and Information Across Many Scales 2.1.1. Models and Levels of Abstraction 2.1.2. CompChem Representations 2.1.3. Method Accuracy 2.1.4. Precision and Reproducibility 2.2. Hierarchies of Methods 2.2.1. Wavefunction Theory Methods 2.2.2. Correlated Wavefunction Methods 2.2.3. Density Functional Theory 2.2.4. Semiempirical Methods 2.2.5. Nuclear Quantum Effects 2.2.6. Interatomic Potentials 2.3. Response Properties 2.4. Solvation Models 2.5. Insightful Predictions for Molecular and Material Properties 3. Machine Learning Tutorial and Intersections with Chemistry 3.1. What is ML? 3.1.1. What Does ML Do Well? 3.1.2. What Does ML Do Poorly? 3.2. Types of Learning 3.2.1. Supervised Learning 3.2.2. Unsupervised Learning 3.2.3. Reinforcement Learning 3.3. Universal Approximators 3.4. ML Workflow 3.4.1. Data Sets 3.4.2. Descriptors 3.4.3. Training 4. Applications of Machine Learning to Chemical Systems 4.1. Representing Chemical Systems 4.1.1. Descriptors 4.1.2. Representing Local Environments 4.1.3. Locality Approximation 4.1.4. Advantages of Built-In Symmetries 4.1.5. End-to-End NN Representations 4.2. From Descriptors to Predictions 4.3. CompChem Data 4.3.1. Benchmark Data Sets 4.3.2. Visualization of Data Sets 4.3.3. Text and Data Mining for Chemistry 4.4. Transforming Atomistic Modeling 4.4.1. Predicting Thermodynamic Properties 4.4.2. Nuclear Quantum Effects</p><p>Special Issue: Machine Learning at the Atomic Scale</p><p>1. INTRODUCTION</p><p>1.1. 
Background</p><p>A lasting challenge in applied physical and chemical sciences has been to answer the question: how can one identify and make chemical compounds or materials that have optimal properties for a given purpose? A substantial part of research in physics, chemistry, and materials science concerns the discovery and characterization of novel compounds that can benefit society, but most advances are still generally attributed to trial-and-error experimentation, which requires significant time and cost. Current global challenges create greater urgency for faster, better, and less expensive research and development efforts. Computational chemistry (CompChem) methods have significantly improved over time, and they promise paradigm shifts in how compounds are fundamentally understood and designed for specific applications. Machine learning (ML) methods have in the past decades witnessed an unprecedented technological evolution enabling a plethora of applications, some of which have become daily companions in our lives. <ref type="bibr">[1]</ref><ref type="bibr">[2]</ref><ref type="bibr">[3]</ref> Applications of ML span technological fields, such as web search, translation, natural language processing, self-driving vehicles, and control architectures, as well as the sciences, for example, medical diagnostics, <ref type="bibr">[4]</ref><ref type="bibr">[5]</ref><ref type="bibr">[6]</ref><ref type="bibr">[7]</ref><ref type="bibr">[8]</ref> particle physics, <ref type="bibr">9</ref> nanosciences, <ref type="bibr">10</ref> bioinformatics, <ref type="bibr">11,</ref><ref type="bibr">12</ref> brain-computer interfaces, <ref type="bibr">13</ref> social media analysis, <ref type="bibr">14</ref> robotics, <ref type="bibr">15,</ref><ref type="bibr">16</ref> and team, social, or board games. 
<ref type="bibr">[17]</ref><ref type="bibr">[18]</ref><ref type="bibr">[19]</ref> These methods have also become popular for accelerating the discovery and design of new materials, chemicals, and chemical processes. <ref type="bibr">20</ref> At the same time, we have witnessed hype, criticism, and misunderstanding about how ML tools are to be used in chemical research. From this, we see a need for researchers working at the intersection of CompChem+ML to more critically recognize the true strengths and weaknesses of each component in any given study. Specifically, we wanted to review why and how CompChem+ML can provide useful insights into the study of molecules and materials.</p><p>While developing this Review, we polled the scientific community with an anonymous online survey that asked for questions and concerns regarding the use of ML models with chemistry applications. Respondents raised excellent points including:</p><p>1. ML methods are becoming less understood while they are also more regularly used as black box tools.</p><p>2. Many publications show inadequate technical expertise in ML (e.g., inappropriate splitting of training, testing, and validation sets).</p><p>3. It can be difficult to compare different ML methods and know which is the best for a particular application or whether ML should even be used at all.</p><p>4. Data quality and context are often missing from ML modeling, and data sets need to be made freely available and clearly explained.</p><p>Additionally, when asked about the most exciting active and emerging areas of ML in the next five years, respondents mentioned a wide range of topics, including catalysis discovery, drug and peptide design, "above the arrow" reaction predictions, and generative models that promise to fundamentally transform chemical discovery. 
When asked about challenges that ML will not surmount in the next five years, respondents mentioned modeling complex photochemical and electrochemical environments, discovering exact exchange-correlation functionals, and completely autonomous reaction discovery. This Review will give our perspective on many of these topics.</p><p>As context for this Review, Figure <ref type="figure">1</ref> shows a heatmap depicting the frequency of ML keywords found in scientific articles that also have keywords associated with different American Chemical Society (ACS) technical divisions. Preparing this figure required several steps. First, lists of ML keywords were chosen. Second, lists of division keywords were created by perusing ACS division symposium titles from the past five years. Third, Python scripts used Scopus Application Programming Interfaces (APIs) to identify the number of scientific publications that matched sets of ML and division symposium keywords. Figure <ref type="figure">1</ref> elucidates several interesting points. First, the most popular ML approaches across all divisions are clearly neural networks, followed by genetic algorithms and support vector machines/kernel methods. Second, divisions such as physical (PHYS), analytical (ANYL), and environmental (ENVR) are already using diverse sets of ML approaches, whereas divisions such as inorganic (INOR), nuclear (NUCL), and carbohydrate (CARB) primarily employ more specific subsets of approaches. Other divisions, such as educational (CHED), history (HIST), law (CHAL), and the business-oriented divisions (BMGT and SCHB), which produce far fewer scholarly journal articles, are not linked to publications that mention ML. Third, ML has become more prevalent across practically all divisions over time. 
For further insight, Table <ref type="table">1</ref> lists the top four keywords obtained from recent ACS symposium titles, as well as their respective contribution percentage reflected in Figure <ref type="figure">1</ref>. There, one sees that a handful of keywords can significantly overshadow matches in some of the bins, for example, "electro", "sensor", "protein", and "plastic". With any ML application, there will be a risk of imperfect data or user bias, but this is a useful launch point to appreciate how and where ML is being used in chemical sciences. A key takeaway is that we have witnessed an unprecedented surge of interest in ML over the last ten years (e.g., Figure <ref type="figure">1c</ref>) thanks to improved understanding of the intersectionality of traditional science and engineering disciplines with rapidly evolving disciplines such as CompChem and data science.</p><p>The scripts used to generate Figure <ref type="figure">1</ref> and Table <ref type="table">1</ref> are freely available with a Creative Commons attribution license. Readers are welcome to use, adapt, and share these scripts with appropriate attribution: <ref type="url">https://github.com/keithgroup/scopus_searching_ML_in_chem_literature</ref>.</p><p>CompChem, ML, and chemical and physical intuition (CPI). This Review will classify concepts using a rendition of a "data to wisdom" hierarchy, Figure <ref type="figure">2</ref>. Scholars have noted shortcomings with similar constructs, <ref type="bibr">21</ref> but we use it to reflect a stepladder for scientific progress, starting from collecting data and ending with overall impact. CompChem, ML, and CPI each have different strengths and weaknesses and bring synergistic opportunities. CPI alone can be employed to climb the ladder from data to impact, but current CPI may only provide limited understanding or applicability outside of available data sets. 
However, CompChem is extraordinarily well-suited for generating high quality data that contain useful information (vide infra, section 2), often more easily than via traditional experimentation. ML is likewise extremely well-suited for recognizing and accurately quantifying nonlinear relationships (vide infra, section 3), a task that is especially difficult for even the most expert-level CPI alone. A key opportunity is that useful ML requires robust data sets, and these can be provided by CompChem as long as the CPI component is selecting and correctly interpreting appropriate methods for the task at hand to productively climb the ladder toward impact (vide infra, section 4). We stress that the impact generation process shown in Figure <ref type="figure">2</ref> is by no means a linear one; on the contrary, it contains many loops and dead ends. As we show later (in section 4), within the troika of CompChem+ML+CPI, ML acts as a catalyst that accelerates explorative, data-driven hypothesis generation. Automatically generated hypotheses are then validated and calibrated with CompChem and CPI to yield further improved ML modeling (enriched by more physical prior knowledge), which then loops back with improved hypotheses. This feedback loop is the key to modern knowledge discovery, leading to insight, wisdom, and hopefully positive impacts on society.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">COMPCHEM AND NOTABLE INTERSECTIONS WITH ML</head><p>2.1. Computational Modeling, Data, and Information Across Many Scales</p><p>We consider quantum mechanics as described by the nonrelativistic time-independent Schrödinger equation as our "standard model" because it accurately represents the physics of charged particles (electrons and nuclei) that make up almost all molecules and materials. Indeed, this opinion has been held by some for almost a century:</p><p>The fundamental laws necessary for the mathematical treatment of a large part of physics and the whole of chemistry are thus completely known, and the difficulty lies only in the fact that application of these laws leads to equations that are too complex to be solved.</p><p>P. A. M. Dirac, 1929</p><p>Any theoretical method for predicting molecular or material phenomena must first be rooted in quantum mechanics theory and then suitably coarse-grained and approximated so that it can be applied in a practical setting. CompChem, or more precisely, computational quantum chemistry, defines computationally driven numerical analyses based on quantum mechanics. In this section, we will explain how and why different CompChem methods capture different aspects of underlying physics. Specifically, this section provides a concise overview of the broad range of CompChem methods that are available for generating data sets that would be useful for ML-assisted studies of molecules and materials.</p><p>2.1.1. Models and Levels of Abstraction. Models extract information from data. The renowned statistician George Box famously discussed "good models" as those characterized as "simple", "illuminating", and "useful". <ref type="bibr">22</ref> Good models should be parsimonious and describe essential relationships without overelaboration. The ideal gas equation, PV = nRT, exemplifies a good model. 
The ideal gas equation relates macroscopic pressure (P), volume (V), amount of gas (n, in moles), and temperature (T) of gases under idealized conditions, without requiring explicit knowledge of the processes occurring on an atomic scale. Its simple functional form needs just one parameter, the ideal gas constant R, and this makes it possible to formulate useful insights, such as how at constant pressure a gas expands with rising temperature. On the other hand, this elegant equation only holds for conditions where the gas behaves as an ideal gas. The derivation of more accurate models of gases requires more mathematically complicated equations of state that rely on more free parameters <ref type="bibr">23</ref> that in turn obfuscate physical insights, require more computational effort to solve, and thus make the model less "good". This example also offers a convenient connection to ML models that will be discussed later in section 3. As mathematical models for complex phenomena become more complicated and less intuitive to derive, ML models that infer nonlinear relationships from data become more applicable when increasing amounts of empirical data become available.</p><p>Alternatively, the conventional CompChem treatment entails first determining the system's relevant geometry and its total ground state energy; from these, physical properties of interest (e.g., pressure, volume, band gap, polarizability, etc.) can be obtained using quantum and statistical mechanics. In this section, we discuss the relevant CompChem methods for these. While the mathematical physics for these methods might occasionally be too complicated for a user to fully understand, many algorithms exist so that they can still be easily run in a "black-box" way with modern computational chemistry software and accompanying tutorials. 
<ref type="bibr">[24]</ref><ref type="bibr">[25]</ref><ref type="bibr">[26]</ref><ref type="bibr">[27]</ref> CompChem thus serves as an invaluable tool to generate data and information for knowledge and insights across many length and time scales. Figure <ref type="figure">3</ref> is an adaptation of a multiscale hierarchy of different classes of CompChem methods. It shows their applicability for modeling different length and time scales and depicts how large scale models may be developed based on smaller scale theories.</p><p>2.1.2. CompChem Representations. Integral to every CompChem study is the user's representation for the system, that is, how the user chooses to describe the system. CompChem representations can range from simple and lucid (e.g., a precise chemical system such as a water molecule isolated in a vacuum) to complex and ambiguous (e.g., a putative but speculative depiction of a solid-liquid interface under electrochemical conditions). Approximate wavefunctions (expressed on a basis set of mathematical functions) or approximate Hamiltonians (referred to as levels of theory) as described below in this section can also be considered representations. One might then say that many representations for different components of a system will constitute an overall representation, and this is true. The point we make is that the validity of any computational result depends on the overall representation, and sometimes an incorrect representation may provide a correct result due to "fortuitous error cancellation". In CompChem studies, a valid representation is one that captures the nature of the physical phenomena of a system. For a molecular example, if one is determining the bond energy of a large biodiesel molecule using CompChem methods, <ref type="bibr">28</ref> it may or may not be justified to approximate a nearby long-chain alkyl group (-CnH2n+1) simply as a methyl (-CH3) or even a hydrogen atom. 
Indeed, choosing such a representation can sometimes be a useful example of CPI since alkyl bonds usually exhibit relatively short-ranged interactions (a feature that will be discussed in the context of ML in more detail in section 4.1.3). An atomic scale geometry with fewer atoms would reduce the computational cost of the study or allow a more accurate but more computationally expensive calculation to be run. On the other hand, it might also be a poor choice if the chemical group (for example, a substituted alkyl group) participated in physical organic interactions, such as subtle steric, induction, or resonance effects. <ref type="bibr">29</ref> For a solid-state example, a user might exercise good CPI by assuming that a relatively small unit cell under periodic boundary conditions would capture salient features of a bulk material or a material surface (as is often the case for many metals). On the other hand, subtle symmetry-breaking effects in materials (e.g., distortions arising from tilting octahedral groups in perovskites, <ref type="bibr">30</ref> or surface reconstruction phenomena that occur on single crystals) <ref type="bibr">31</ref> might only be observed when considering larger and more computationally expensive unit cells. Relevant to both examples, it may also be that the CompChem method itself brings errors that obfuscate phenomena that the user intends to model. In general, CompChem errors may be due to (1) errors introduced by the user in the initial setup of the CompChem application, or (2) errors in the CompChem method when treating the physics of the system. In section 3, we will discuss how the choice of ML representation also plays similarly critical roles in determining whether and to what extent an ML model is useful.</p><p>2.1.3. Method Accuracy. The quantitative accuracy of a CompChem model stems from its suitability in describing the system. As explained above, an observed accuracy will depend on the representation being used. 
High-quality CompChem calculations have traditionally been benchmarked against data sets that consist of well-controlled and relatively precise thermochemistry experiments on small, isolated molecules. <ref type="bibr">32,</ref><ref type="bibr">33</ref> The error bars for standard calorimetry experiments are approximately 4 kJ/mol (or 1 kcal/mol or 0.04 eV), and computational methods that can provide greater accuracy than this are said to achieve "chemical accuracy". Note that this term should be used when describing the accuracy of the method compared to the most accurate data possible; for example, if one CompChem method was found to reproduce another CompChem method within 1 kJ/mol, but both methods reproduce experimental data with errors of 20 kJ/mol, then neither method should be called chemically accurate. There are many well-established reasons why CompChem models can bring errors. For example, errors may be due to size consistency <ref type="bibr">34</ref> or size extensivity <ref type="bibr">35</ref> problems that are intrinsic to the CompChem method. Larger systems may also embody significant medium- and long-range interactions (e.g., van der Waals forces) <ref type="bibr">36</ref> or self-interaction errors <ref type="bibr">37</ref> that might not be noticeable in small test cases. The recommended path forward is to consider which fundamental interactions are in play in the system and then use a CompChem model that is adequate at describing those interactions. Besides this, users should make use of existing tutorial references that provide practical knowledge about which parameters in a CompChem calculation should be carefully noted, for example ref 38. Historically, the most popular CompChem methods for molecular and materials modeling (the B3LYP <ref type="bibr">39</ref> and PBE <ref type="bibr">40</ref> exchange-correlation functionals, see section 2.2.3) 
are often said to have an expected accuracy of about 10-15 kJ/mol (or 2-4 kcal/mol or 0.1-0.2 eV) when modeling differences between the total energies of two similar systems, and errors are expected to be somewhat larger when considering transition state energies. Though this is used as a simple rule, it is obviously an oversimplification, and actual accuracy can only be assessed by thoughtful benchmarking of the case being considered. <ref type="bibr">[41]</ref><ref type="bibr">[42]</ref><ref type="bibr">[43]</ref><ref type="bibr">[44]</ref><ref type="bibr">[45]</ref></p><p>2.1.4. Precision and Reproducibility. In CompChem, one normally assumes that any two users using the same representation for the system with the same code on the same computing architecture will obtain the exact same result within the numerical precision of the computers being used. This is not always the case, especially for molecular dynamics (MD) simulations that often rely on stochastic methods. <ref type="bibr">46</ref> Computational precision also becomes more concerning when there are different versions of codes in circulation, errors that might arise from different compilers and libraries, and a lack of consensus in the community about which computational methods and which default settings should be used for specific application systems, for example, grid density selections, <ref type="bibr">47</ref> or standard keywords for molecular dynamics simulations. <ref type="bibr">46,</ref><ref type="bibr">48</ref> There have been efforts to confirm that different codes can reproduce energies for the same system representation, <ref type="bibr">48,</ref><ref type="bibr">49</ref> but some commercial codes hold proprietary licenses that restrict publications that critically benchmark calculation accuracy and timings across different codes. 
A path forward to benefit the advancement of insight is the development of (open) source codes <ref type="bibr">50</ref> that perform as well if not better than commercial codes. While increased access to computational algorithms is beneficial, it also raises the need for enforcing high standards of quality and reproducibility. <ref type="bibr">51,</ref><ref type="bibr">52</ref> We are also glad to see active developments to more lucidly show how any set of computational data is generated, precisely with which codes, keywords, and auxiliary scripts and routines. <ref type="bibr">[53]</ref><ref type="bibr">[54]</ref><ref type="bibr">[55]</ref><ref type="bibr">[56]</ref> We are now in an era where truly massive amounts of data and information can be generated for CompChem+ML efforts. To go forward, one needs to know what constitutes good and useful data, and the next section provides an overview of how to do this using CompChem.</p></div>
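<div xmlns="http://www.tei-c.org/ns/1.0"><p>The distinction drawn in section 2.1.3 between agreement among CompChem methods and agreement with experiment can be illustrated with a minimal Python sketch. All energies below are made-up numbers, the helper names are hypothetical, and the 4 kJ/mol threshold is simply the value quoted above; this is not a benchmarking protocol.</p>
<eg><![CDATA[
```python
# Illustrative sketch only: all energies are made-up numbers in kJ/mol,
# and the helper names are hypothetical (not from any CompChem package).

KJ_PER_KCAL = 4.184          # thermochemical calorie
KJ_PER_MOL_PER_EV = 96.485   # 1 eV per particle expressed in kJ/mol (rounded)
CHEMICAL_ACCURACY_KJ = 4.0   # threshold quoted in section 2.1.3


def mean_absolute_error(predicted, reference):
    """Mean absolute deviation between two equally long lists of energies."""
    if len(predicted) != len(reference):
        raise ValueError("lists must have the same length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


def is_chemically_accurate(predicted, experiment, threshold=CHEMICAL_ACCURACY_KJ):
    """True only if the error against the experimental reference is below threshold."""
    return mean_absolute_error(predicted, experiment) < threshold


experiment = [100.0, 250.0, 400.0]   # hypothetical experimental reaction energies
method_a = [120.0, 270.0, 420.0]     # hypothetical method A
method_b = [121.0, 269.0, 421.0]     # hypothetical method B, close to method A

# Agreement between the two methods versus agreement with experiment
agreement_between_methods = mean_absolute_error(method_a, method_b)
error_a = mean_absolute_error(method_a, experiment)
error_b = mean_absolute_error(method_b, experiment)
```
]]></eg>
<p>In this toy data set the two methods agree with each other to within 1 kJ/mol, yet each deviates from the experimental reference by about 20 kJ/mol, so is_chemically_accurate returns False for both. Dividing the threshold by KJ_PER_KCAL or KJ_PER_MOL_PER_EV gives the approximate equivalents of 1 kcal/mol and 0.04 eV.</p></div>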
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Hierarchies of Methods</head><p>Earlier we mentioned that a usual task in CompChem is to calculate the ground state energy of an atomic scale system. Indeed, CompChem methods can determine the energy for a hypothetical configuration of atoms, and this constitutes the potential energy surface (PES) of the system (Figure <ref type="figure">4</ref>). The PES is a hypersurface spanning 3N dimensions, where N is the number of atoms in the system. Since the PES is used to analyze chemical bonding between atoms within the system, the PES can also be simplified by ignoring translational and rotational degrees of freedom for the entire system. This reduces the dimensionality of the PES from 3N to 3N - 5 for linear systems (e.g., diatomic molecules or perfectly linear molecules such as acetylene) or 3N - 6 for all nonlinear systems. Furthermore, since visualization is difficult beyond three dimensions, PES drawings will show a 1-D or 2-D projection of this hypersurface where the z-axis is conventionally used to represent the scale for system energy.</p><p>Any arbitrary PES will contain several interesting features. Minima on the PES correspond to mechanically stable configurations of a molecule or material, for example reactant and product states of a chemical reaction or different conformational isomers of a molecule. Because they are minima, the second derivative of the energy given by the PES with respect to any dimension will be positive. Minima can also be connected by pathways, which indicate chemical transformations (Figure <ref type="figure">4</ref>, red line). Along such pathways, the second derivative can be positive, zero, or negative, but all other second derivatives must be positive. Transition states are first-order saddle points and thus represent a maximum in one coordinate and a minimum along all others. 
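</p><p>As a minimal sketch of the second-derivative criteria described above, the following Python example classifies stationary points on a hypothetical two-dimensional double-well model PES by counting negative eigenvalues of a finite-difference Hessian. The energy function, step size, and tolerance are illustrative assumptions, not quantities from a real CompChem calculation, and the sketch assumes that the points passed in are already stationary (it does not check the gradient).</p>
<eg><![CDATA[
```python
import math


def energy(x, y):
    """Hypothetical double-well model PES (arbitrary units), not a real molecule."""
    return (x**2 - 1.0)**2 + y**2


def hessian(f, x, y, h=1e-4):
    """Central finite-difference second derivatives of f at (x, y)."""
    fxx = (f(x + h, y) - 2.0 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2.0 * f(x, y) + f(x, y - h)) / h**2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4.0 * h**2)
    return fxx, fxy, fyy


def eigenvalues_2x2(fxx, fxy, fyy):
    """Eigenvalues of the symmetric matrix [[fxx, fxy], [fxy, fyy]], ascending."""
    mean = 0.5 * (fxx + fyy)
    spread = math.hypot(0.5 * (fxx - fyy), fxy)
    return mean - spread, mean + spread


def classify(f, x, y, tol=1e-3):
    """Label a stationary point by its number of negative Hessian eigenvalues.

    Eigenvalues within tol of zero are not counted as negative.
    """
    n_negative = sum(1 for ev in eigenvalues_2x2(*hessian(f, x, y)) if ev < -tol)
    labels = {0: "minimum", 1: "first-order saddle point (transition state)"}
    return labels.get(n_negative, "higher-order saddle point")


# Barrier separating the two minima of the model surface
barrier = energy(0.0, 0.0) - energy(1.0, 0.0)
```
]]></eg>
<p>For this model surface, classify(energy, 1.0, 0.0) and classify(energy, -1.0, 0.0) identify the two minima, while classify(energy, 0.0, 0.0) identifies the first-order saddle point that connects them.</p><p>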
These saddle points correspond to the lowest energy barriers connecting two minima on the PES and are hence important for characterizing transitions between PES minima (e.g., chemical reactions). Second-order saddle points <ref type="bibr">57</ref> and bifurcating pathways <ref type="bibr">58</ref> can also exist, but these are not discussed further here.</p><p>A wide range of higher-level properties of the system can be predicted or derived using the PES, including predicted thermodynamic binding constants, kinetic rate constants for reactions, or properties based on dynamics of the system. The task is then to choose an appropriate CompChem method that can carry out energy and gradient calculations on the system's PES. Figure <ref type="figure">5</ref> shows several different hierarchies for CompChem methods capable of doing this. Note that all of the methods mentioned in this figure fall in the categories of the bottom two regions in the multiscale hierarchy in Figure <ref type="figure">3</ref>. All of these methods in principle could be used to develop coarse-grained or continuum models as well. Also note that methods in Figure <ref type="figure">5</ref> will bring very different computational costs and opportunities for methods involving ML.</p><p>2.2.1. Wavefunction Theory Methods. In standard computational quantum chemistry, a system's energy can be computed in terms of the Schrödinger equation. <ref type="bibr">[61]</ref><ref type="bibr">[62]</ref><ref type="bibr">[63]</ref> The wavefunction that will be used to represent the positions of electrons and nuclei in the system (&#936;(r, R)) is hard to intuit since it can be complex valued. However, its squared modulus describes the real probability density of the nuclear (R) and electronic positions (r). 
In a real system, the position and interactions of a single particle in the system with respect to all other particles will be correlated, and this makes exactly solving the Schrödinger equation impossible for almost all systems of practical interest. To make the problem more tractable, one may exploit the Born-Oppenheimer approximation; <ref type="bibr">64</ref> since nuclei are expected to move much more slowly than the electrons, they can be approximated as stationary at any point along the PES. This allows the energy to be calculated using the time-independent Schrödinger equation and solving the eigenvalue problem:</p><p><formula n="1" notation="TeX">\hat{H}\Psi(\mathbf{r}, \mathbf{R}) = E\Psi(\mathbf{r}, \mathbf{R})</formula></p><p>Here, the Hamiltonian operator (H&#770;) is the sum of the kinetic (T&#770;) and potential (V&#770;) operators, &#936; is the wavefunction (i.e., an eigenfunction) that represents particles in the system, and E is the energy (i.e., an eigenvalue). In this way, nuclei can be treated as fixed point charges, and then, eq 1 can be transformed into the so-called electronic Schrödinger equation, where the Hamiltonian <formula notation="TeX">\hat{H}_{\mathrm{el}}</formula> and wavefunction <formula notation="TeX">\Psi_{\mathrm{el}}(\mathbf{r}; \mathbf{R})</formula> now only depend on the nuclear coordinates R in a parametric fashion:</p><p><formula n="2" notation="TeX">\hat{H}_{\mathrm{el}}\Psi_{\mathrm{el}}(\mathbf{r}; \mathbf{R}) = \left[\hat{T}_{\mathrm{e}} + \hat{V}_{\mathrm{eN}} + \hat{V}_{\mathrm{NN}} + \hat{V}_{\mathrm{ee}}\right]\Psi_{\mathrm{el}}(\mathbf{r}; \mathbf{R}) = E_{\mathrm{el}}\Psi_{\mathrm{el}}(\mathbf{r}; \mathbf{R})</formula></p><p>The above expression has <formula notation="TeX">\hat{H}_{\mathrm{el}}</formula> composed of single electron (e) and pairwise electron-nuclear (eN), nuclear-nuclear (NN), and electron-electron (ee) terms. Here, we will now implicitly assume the Born-Oppenheimer approximation throughout and leave off the subscript indicating the electronic problem. However, we note that the Born-Oppenheimer approximation is not always sufficient and computationally intensive nonadiabatic quantum dynamics may be required. <ref type="bibr">65</ref> In certain cases, semiclassical treatments are appropriate; for example, nonadiabatic effects between electrons and nuclei can be considered using nuclear-electronic orbital methods. 
<ref type="bibr">66</ref> A second common approximation is to expand the total electronic wavefunction in terms of one-electron wavefunctions (i.e., spin orbitals), <formula notation="TeX">\phi(\mathbf{r}_i)</formula>. Electrons are fermions and therefore exhibit antisymmetry, which in turn results in the Pauli exclusion principle. Antisymmetry means that the interchange of any two electrons within the system should bring an overall sign change to the wavefunction (i.e., from + to -, or vice versa). This property is conveniently captured mathematically by combining one-electron spin orbitals into the form of a Slater determinant:</p><p><formula n="3" notation="TeX"><![CDATA[\Psi(\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_N) = \frac{1}{\sqrt{N!}} \begin{vmatrix} \phi_1(\mathbf{r}_1) & \phi_2(\mathbf{r}_1) & \cdots & \phi_N(\mathbf{r}_1) \\ \phi_1(\mathbf{r}_2) & \phi_2(\mathbf{r}_2) & \cdots & \phi_N(\mathbf{r}_2) \\ \vdots & \vdots & \ddots & \vdots \\ \phi_1(\mathbf{r}_N) & \phi_2(\mathbf{r}_N) & \cdots & \phi_N(\mathbf{r}_N) \end{vmatrix}]]></formula></p><p>Note that a determinant's sign changes whenever two columns or rows are interchanged, and in a Slater determinant this corresponds to interchanging electrons and thus the physically appropriate sign change for the overall wavefunction. Additionally, <formula notation="TeX">1/\sqrt{N!}</formula> is a factor that ensures the wavefunction is normalized.</p><p>The spin orbitals can be treated as a mathematical expansion using a basis set of functions <formula notation="TeX">\chi_\mu</formula>, each weighted by a coefficient <formula notation="TeX">c_{\mu i}</formula>. The basis functions are generally Gaussian basis functions, <ref type="bibr">[67]</ref><ref type="bibr">[68]</ref><ref type="bibr">[69]</ref> Slater-type hydrogenic orbitals, <ref type="bibr">70</ref> or plane waves under periodic boundary conditions: <ref type="bibr">[71]</ref><ref type="bibr">[72]</ref><ref type="bibr">[73]</ref></p><p><formula n="4" notation="TeX">\phi_i(\mathbf{r}) = \sum_{\mu} c_{\mu i}\,\chi_\mu(\mathbf{r})</formula></p><p>The different types of mathematical functions bring different strengths and weaknesses, but these will not be discussed further here. A universal point is that larger basis sets will have more basis functions and thus give a more flexible and physical representation of electrons within the system. On one hand this can be crucial for capturing subtle electronic structure effects due to electron correlation. On the other hand, larger basis sets also necessitate significantly higher computational effort. 
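</p><p>The antisymmetry of a Slater determinant discussed above can be checked numerically with a short Python sketch. The three one-dimensional "orbitals" below are hypothetical, unnormalized toy functions that ignore spin, and the function names are illustrative only; the point is simply that exchanging two electron coordinates flips the sign of the determinant, and that placing two electrons at the same coordinate makes it vanish.</p>
<eg><![CDATA[
```python
import math


def orbitals(x):
    """Three toy one-electron functions of a 1-D coordinate (unnormalized, spin ignored)."""
    g = math.exp(-0.5 * x * x)
    return [g, x * g, (2.0 * x * x - 1.0) * g]


def determinant(m):
    """Determinant by Laplace expansion along the first row (fine for tiny matrices)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for col, value in enumerate(m[0]):
        minor = [row[:col] + row[col + 1:] for row in m[1:]]
        total += (-1) ** col * value * determinant(minor)
    return total


def slater_wavefunction(coords):
    """Psi(x_1..x_N) = det[phi_j(x_i)] / sqrt(N!), with one matrix row per electron."""
    n = len(coords)
    matrix = [orbitals(x)[:n] for x in coords]
    return determinant(matrix) / math.sqrt(math.factorial(n))


psi = slater_wavefunction([0.3, -0.7, 1.1])
psi_swapped = slater_wavefunction([-0.7, 0.3, 1.1])  # electrons 1 and 2 exchanged
psi_pauli = slater_wavefunction([0.3, 0.3, 1.1])     # two electrons at the same point
```
]]></eg>
<p>Here psi_swapped equals -psi to within floating-point rounding, and psi_pauli is zero to within rounding, which is the determinant's expression of the Pauli exclusion principle.</p><p>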
A standard technique to avoid high computational effort in electronic structure calculations is to replace nonreacting core electrons with analytic functions using effective core potentials (ECPs, i.e., pseudopotentials). <ref type="bibr">[74]</ref><ref type="bibr">[75]</ref><ref type="bibr">[76]</ref><ref type="bibr">[77]</ref><ref type="bibr">[78]</ref><ref type="bibr">[79]</ref><ref type="bibr">[80]</ref><ref type="bibr">[81]</ref><ref type="bibr">[82]</ref><ref type="bibr">[83]</ref><ref type="bibr">[84]</ref><ref type="bibr">[85]</ref><ref type="bibr">[86]</ref><ref type="bibr">[87]</ref><ref type="bibr">[88]</ref><ref type="bibr">[89]</ref> This requires reformulating the basis sets that describe the valence space of the atoms, for example see refs 90 and 91. Larger nuclei that bring higher atomic numbers and larger numbers of electrons will also exhibit relativistic effects, <ref type="bibr">92</ref> and relativistic Hamiltonians are based on the Dirac equation <ref type="bibr">93,</ref><ref type="bibr">94</ref> or quantum electrodynamics. <ref type="bibr">95</ref> These methods can range from reasonably cost-effective methods <ref type="bibr">96,</ref><ref type="bibr">97</ref> to those bringing extremely high computational cost. <ref type="bibr">98</ref> Practical applications have traditionally used standard nonrelativistic Hamiltonian methods, along with ECPs (or pseudopotentials) that have been explicitly developed to account for compressed core orbitals that result from relativistic effects.</p><p>Using the Born-Oppenheimer approximation (eq 2) together with a Slater determinant wavefunction (eq 3) expressed in a finite basis set (eq 4) brings about the simplest wavefunction based method, the Hartree-Fock (HF) approach (for historical context see refs 99-101). The HF method is a mean field approach, where each electron is treated as if it moves within the average field generated by all other electrons. 
It is generally considered inaccurate when describing many chemical systems, but it continues to serve as a critical pillar for CompChem electronic structure calculations since it either establishes the foundation for all other accurate methods or provides energy contributions (i.e., exact exchange) that are not provided in some CompChem methods. CompChem methods that achieve accuracy higher than HF theory are said to contain electron correlation, a critical component for understanding molecules and materials (as described in more detail in section 2.2.2). Expressing &#936; as a Slater determinant and rearranging eq 2 while temporarily neglecting nuclear-nuclear interactions allows one to define the HF energy in terms of integrals of the electronic spin orbitals:</p><p><formula notation="TeX">E_{HF} = \langle \Psi | \hat{T}_{e} + \hat{V}_{eN} + \hat{V}_{ee} | \Psi \rangle = -\frac{1}{2}\sum_{\alpha} \int \phi_{\alpha}^{*}(\mathbf{r})\, \nabla^{2} \phi_{\alpha}(\mathbf{r})\, d\mathbf{r} - \sum_{\alpha}\sum_{A} \int \phi_{\alpha}^{*}(\mathbf{r})\, \frac{Z_{A}}{|\mathbf{r} - \mathbf{R}_{A}|}\, \phi_{\alpha}(\mathbf{r})\, d\mathbf{r} + \sum_{\alpha}\sum_{\beta &gt; \alpha} \iint \phi_{\alpha}^{*}(\mathbf{r}_1)\phi_{\alpha}(\mathbf{r}_1)\, \frac{1}{|\mathbf{r}_1 - \mathbf{r}_2|}\, \phi_{\beta}^{*}(\mathbf{r}_2)\phi_{\beta}(\mathbf{r}_2)\, d\mathbf{r}_1 d\mathbf{r}_2 - \sum_{\alpha}\sum_{\beta &gt; \alpha} \iint \phi_{\alpha}^{*}(\mathbf{r}_1)\phi_{\beta}(\mathbf{r}_1)\, \frac{1}{|\mathbf{r}_1 - \mathbf{r}_2|}\, \phi_{\beta}^{*}(\mathbf{r}_2)\phi_{\alpha}(\mathbf{r}_2)\, d\mathbf{r}_1 d\mathbf{r}_2 \qquad (5)</formula></p><p>where the first two terms are referred to as one-electron integrals and represent the kinetic energy of the electrons and the potential energy contributions from electron-nuclei interactions. The remaining terms are two-electron integrals that describe the potential energy arising from electron-electron interactions and are called Coulomb and exchange integrals. Using Lagrange multipliers, one can express the HF equation in a compact matrix form, the so-called Roothaan-Hall equations, <ref type="bibr">[102]</ref><ref type="bibr">[103]</ref><ref type="bibr">[104]</ref> which allow for an efficient solution: <formula notation="TeX">\mathbf{F}\mathbf{C} = \mathbf{S}\mathbf{C}\boldsymbol{\epsilon} \qquad (6)</formula></p><p>Each matrix has a size of &#956; &#215; &#956;, where &#956; is the number of basis functions used to express the orbitals of the system. 
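</p><p>For a fixed Fock matrix, the Roothaan-Hall equations (eq 6) are a generalized symmetric eigenvalue problem. The sketch below solves it by symmetric orthogonalization with NumPy; the 2 &#215; 2 Fock and overlap matrices are hypothetical numbers chosen for illustration, and in an actual HF calculation the Fock matrix would be rebuilt from the coefficients and the solve repeated until self-consistency.</p>
<p>
```python
import numpy as np


def solve_roothaan_hall(fock, overlap):
    """Solve F C = S C eps for a fixed, symmetric Fock matrix.

    Uses symmetric orthogonalization: with X = S^(-1/2), the transformed
    problem (X F X) C' = C' eps is an ordinary eigenvalue problem and
    C = X C' are the coefficients in the original (nonorthogonal) basis.
    """
    s_vals, s_vecs = np.linalg.eigh(overlap)
    x = s_vecs @ np.diag(s_vals ** -0.5) @ s_vecs.T
    eps, c_ortho = np.linalg.eigh(x.T @ fock @ x)
    return eps, x @ c_ortho


# Hypothetical matrices for a two-function basis (illustration only).
overlap = np.array([[1.0, 0.4],
                    [0.4, 1.0]])
fock = np.array([[-1.0, -0.5],
                 [-0.5, -0.6]])

eps, coeffs = solve_roothaan_hall(fock, overlap)

# Residual of eq 6 and orthonormality of the orbitals in the metric S.
residual = fock @ coeffs - overlap @ coeffs @ np.diag(eps)
metric = coeffs.T @ overlap @ coeffs

print("orbital energies:", eps)
print("max |FC - SC eps|:", np.abs(residual).max())
```
</p><p>The returned orbital energies are sorted in ascending order, and the columns of the coefficient matrix satisfy C^T S C = 1, which is the orthonormality constraint that the Lagrange multipliers enforce.</p><p>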
C is a coefficient matrix collecting the basis coefficients c &#956;i (see eq 4), while S is the overlap matrix measuring the degree of overlap between individual basis functions and &#1013; is a diagonal matrix of the spin orbital energies. Finally, F is the Fock matrix, with elements of a similar form as in eq 5, but expressed in terms of basis functions &#967; &#956; . One important detail not readily apparent in eq 6 is that the Fock matrix depends on the orbital coefficients, which must be provided before eq 6 can be solved. As such, eq 6 cannot be solved in closed form but instead requires a so-called self-consistent field approach. Starting from an arbitrary set of trial (i.e., initial guess) functions, one iteratively solves for optimal molecular orbital coefficients, which are then used to construct a new Fock matrix, until a minimum energy is reached in accordance with the variational principle of quantum mechanics. Evaluating and transforming the two-electron integrals in eq 5 is a significant bottleneck for these calculations, and thus the computational effort of the HF method formally scales as O(&#956;^4) with the number of basis functions. This means that a calculation with twice as many basis functions will formally require about 2^4 = 16 times as much computing time. The electronic exchange interaction resulting from the antisymmetry of the wavefunction imposes a strong constraint on the mathematical form of ML models for electronic wavefunctions. Construction of efficient and reliable antisymmetric ML models for the many-body wavefunction is an important area of current research. <ref type="bibr">105,</ref><ref type="bibr">106</ref></p><p>2.2.2. Correlated Wavefunction Methods. The system's correlation energy is defined as the sum of electron-electron interactions that go beyond the mean-field approximation provided by HF theory. 
Although correlation energy makes up a rather small fraction of the overall energy of a system (usually about 1% of the total energy), internal energies in molecular and material systems are so large that this contribution becomes significant. As an example, most molecular crystals would be unstable as solids if calculated using the HF level of theory. The missing component is attractive forces that are obtained from levels of theory that account for correlation energy. Correlation energies are obtained by calculating additional electron-electron interaction energies that arise from different arrangements of electron configurations (i.e., different possible excited states) that are not treated with the mean field approach of HF theory.</p><p>The most complete correlation treatment is the full configuration interaction (FCI) method, which is the exact numerical solution of the electronic Schrödinger equation (in the complete basis limit) that considers interactions arising from all possible excited configurations of electrons. The FCI wavefunction takes the form of a linear combination of all possible excited Slater determinants that can be generated from a single HF reference wavefunction by electron excitations: <formula notation="TeX">\Psi_{FCI} = a_{0}\Psi_{HF} + \sum_{\alpha,\beta} a_{\alpha}^{\beta}\, \Psi_{\alpha}^{\beta} + \sum_{\alpha &lt; \gamma,\ \beta &lt; \delta} a_{\alpha\gamma}^{\beta\delta}\, \Psi_{\alpha\gamma}^{\beta\delta} + \cdots \qquad (7)</formula></p><p>where &#936; &#946; &#945; represents the Slater determinant obtained by exciting an electron from orbital &#945; into an unoccupied orbital &#946;, and the a terms are expansion coefficients determining the weight of the different contributing configurations. As expected, FCI calculations scale extremely poorly with the number of electrons in the system (O(n!)), as the number of possible configurations grows rapidly, making them feasible only for small molecules. As an example of the state of the art, FCI calculations have been used to benchmark highly accurate methods on a benzene molecule. 
<ref type="bibr">107</ref> Most correlated wavefunction methods use a subset of the possible configurations in eq 7 to be computationally tractable. The configuration interaction (CI) <ref type="bibr">108</ref> method, for example, only includes determinants up to a certain excitation level (e.g., single and double excitations in CISD). Alternatively, MPn <ref type="bibr">35</ref> (e.g., MP2) recovers the correlation energy by applying different orders of perturbation theory. Coupled cluster theory, another widely used post-HF method, includes additional electron configurations via cluster operators. <ref type="bibr">109</ref> One coupled cluster method that involves single, double, and perturbative triple excitations, CCSD(T), is referred to as the "gold-standard" approach for CompChem electronic structure methods since it brings high accuracy for molecular energies. However, there are many newer advances that improve upon CCSD(T). <ref type="bibr">107,</ref><ref type="bibr">110</ref> Note that just because a method has a reputation for being accurate does not mean that it will be accurate for all systems. For example, consider again the benzene molecule, which is best illustrated with dotted resonance bonds depicting a planar molecule with equal C-C bond lengths. Such a geometry will not be found to be stable with many different CompChem methods, in part because of subtle chemical bonding interactions or errors that arise from specific choices of basis sets used with different levels of theory. <ref type="bibr">111,</ref><ref type="bibr">112</ref> A key point to reiterate is that correlated wavefunction methods are founded on HF theory, and so they are even more computationally demanding than HF calculations, for example, O(n^5) for MP2, O(n^6) for CCSD and CISD, and O(n^7) for CCSD(T). 
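</p><p>The practical meaning of these formal scaling exponents can be made concrete with a few lines of arithmetic. The sketch below only evaluates the formal power laws quoted in the text (exponent 4 for HF, 5 for MP2, 6 for CCSD and CISD, 7 for CCSD(T)); it ignores prefactors and the efficiency-enhancing algorithms used in real codes, and it treats the system-size measure (basis functions for HF, electrons for the correlated methods) as a single generic size factor, which is a simplifying assumption.</p>
<p>
```python
# Formal scaling exponents as quoted in the text.
FORMAL_EXPONENTS = {
    "HF": 4,
    "MP2": 5,
    "CCSD": 6,
    "CISD": 6,
    "CCSD(T)": 7,
}


def relative_cost(method, size_factor):
    """Factor by which the formal cost grows when the system size is
    multiplied by size_factor (prefactors are ignored)."""
    return size_factor ** FORMAL_EXPONENTS[method]


# Cost increase when the system size is doubled.
doubling = {method: relative_cost(method, 2) for method in FORMAL_EXPONENTS}

for method, factor in doubling.items():
    print(f"{method:8s} doubling the system size costs {factor}x more")
```
</p><p>Under these formal power laws, doubling the system size multiplies the cost of an HF calculation by 16 but that of a CCSD(T) calculation by 128.</p><p>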
However, this computational expense is alleviated by continually improving computing resources (e.g., the usability of graphics processing units (GPUs)) <ref type="bibr">[113]</ref><ref type="bibr">[114]</ref><ref type="bibr">[115]</ref><ref type="bibr">[116]</ref> and the development of efficiency-enhancing algorithms, such as pseudospectral methods, <ref type="bibr">[117]</ref><ref type="bibr">[118]</ref><ref type="bibr">[119]</ref> resolution of the identity (RI), <ref type="bibr">120</ref> domain-based local pair natural orbital methods (DLPNO), <ref type="bibr">121</ref> and explicitly correlated R12/F12 methods. <ref type="bibr">122</ref> There are also ongoing efforts to develop other CompChem methods based on quantum Monte Carlo <ref type="bibr">123</ref> and density matrix renormalization group theory (DMRG) <ref type="bibr">124</ref> that provide high accuracy with scaling that is competitive with other computational methods. Efforts that use ML to accelerate these types of calculations are beginning to emerge. <ref type="bibr">105,</ref><ref type="bibr">106,</ref><ref type="bibr">[125]</ref><ref type="bibr">[126]</ref><ref type="bibr">[127]</ref><ref type="bibr">[128]</ref><ref type="bibr">[129]</ref> Schemes have also been developed to exploit systematic errors between different levels of theory with different basis sets so that approximations can be extrapolated toward an exact result. Examples include the complete basis set (CBS), <ref type="bibr">130</ref> Gaussian Gn, <ref type="bibr">131</ref> Weizmann (W-)n <ref type="bibr">132</ref> methods, and high accuracy extrapolated ab initio thermochemistry (HEAT) <ref type="bibr">133</ref> methods. For a recent review on these and other methods, see ref 134. These schemes are also becoming a target of recent work using ML methods. 
<ref type="bibr">135</ref> HF determinants provide good baseline approximations of the ground state electronic structure of many molecules, but they may poorly describe the more complicated bonding that arises during bond dissociation events, excited states, and conical intersections. <ref type="bibr">[136]</ref><ref type="bibr">[137]</ref><ref type="bibr">[138]</ref><ref type="bibr">[139]</ref> Some many-body wavefunctions are best described as a superposition of two or more configurations, for example, when other configurations in eq 7 have expansion coefficients a that are similar to or larger than that of the HF determinant. In such cases, high-quality single-reference methods like CCSD(T) can fail because the theory assumes that salient electronic effects are captured by the initial single HF configuration. (In fact, methods such as CCSD(T) have been implemented with diagnostic approaches that let users know when there may be cause for concern.) <ref type="bibr">[140]</ref><ref type="bibr">[141]</ref><ref type="bibr">[142]</ref> In these cases, it may no longer be trivial to find reliable black-box or automated procedures (e.g., in situations involving resonance states, chemical reactions, molecular excited states, transition metal complexes, and metallic materials). <ref type="bibr">136</ref> So-called multiconfiguration approaches, <ref type="bibr">136</ref> such as the generalized valence bond (GVB) method, <ref type="bibr">143</ref> the complete active space self-consistent field (CASSCF) method, <ref type="bibr">144</ref> multireference CI (MRCI) methods, <ref type="bibr">145</ref> complete active space perturbation theory (CASPT2), <ref type="bibr">146</ref> or multireference coupled cluster (MRCC), <ref type="bibr">147,</ref><ref type="bibr">148</ref> can more physically model these systems since they employ several suitable reference configurations with different degrees of correlation treatments. 
These methods are not black-box and should be expected to require an experienced practitioner with CPI to choose the reference states that can substantially influence the quality of results. <ref type="bibr">149</ref> This is an area though where ML can bring progress in automating the selections of physically justified active spaces. <ref type="bibr">129</ref> In closing, there are a large number of available correlated wavefunction methods but many are even more costly than HF theory by virtue of requiring an HF reference energy expression shown in eq 5. Figure <ref type="figure">5a</ref> depicts a so-called "magic cube" (that is an extension beyond a traditional "Pople diagram" <ref type="bibr">135,</ref><ref type="bibr">150</ref> ) that concisely shows a full hierarchy of computational approaches across different Hamiltonians, basis sets, and correlation treatment methods. This makes it easy to identify different wavefunction methods that should be more accurate and more likely to provide useful atomic scale insights (as well as those that would be more computationally intensive). Another important aspect highlighted in the "magic cube" is that higher level wavefunction methods require larger basis sets to successfully model electron correlation effects. A CCSD(T) computation carried out with a small basis set for example might only offer the same accuracy as MP2 while being two orders of magnitude more expensive to evaluate. <ref type="bibr">108</ref> As was mentioned earlier with the benzene system, spurious errors with different basis sets might still be found that indicate problems with specific combinations of levels of theory and basis sets. The deep complexity of correlated wavefunction methods makes this a promising area for continued efforts in CompChem+ML research.</p><p>2.2.3. Density Functional Theory. 
Density-functional theory (DFT) <ref type="bibr">151</ref> is another method to calculate the quantum mechanical internal energy of a system using an energy expression that relies on functionals (i.e., functions of a function) of the electronic density &#961; = |&#936; el (r; R)|^2 :</p><p>Compared to wavefunction theory, DFT should be far more efficient since the dimensionality of a density representation for electrons will always be three rather than the 3n dimensions for any n-electron system described by a many-body wavefunction method. DFT has an important drawback: the exact expression for the energy functional is currently unknown, and all approximations bring some degree of uncontrollable error. This has precipitated critical opinions from purists in chemical physics, especially those who are developing correlated wavefunction methods. However, there is also substantial evidence that DFT approximations are reasonably reliable and accurate for many practical applications that bring information, knowledge, and sometimes insight. We now provide a bird's-eye view of DFT-based methods.</p><p>One thrust of DFT developments since its inception has focused on designing accurate expressions strictly in terms of a density representation, and these approaches are referred to as "kinetic energy (KE-)" or "orbital-free (OF-)" DFT. <ref type="bibr">152</ref> Some energy contributions (e.g., nuclear-electron energy and classical electron-electron energy terms) can be expressed exactly, but other terms, such as the kinetic energy as a functional of the density, are not known and must be approximated. OF-DFT is very computationally efficient (these methods should scale linearly with system size <ref type="bibr">153,</ref><ref type="bibr">154</ref> ), but these formulations have not yet been developed to rival the accuracy or transferability of wavefunction methods, though they have been used for studying different classes of chemical and materials systems. 
<ref type="bibr">[155]</ref><ref type="bibr">[156]</ref><ref type="bibr">[157]</ref> OF-DFT methods are also used in exciting applications modeling chemistry and materials under extreme conditions. <ref type="bibr">[158]</ref><ref type="bibr">[159]</ref><ref type="bibr">[160]</ref> One should expect that once highly accurate forms are developed and matured, accurate CompChem calculations of the electronic structure of systems having more than a million atoms might become commonplace. Indeed, there are efforts to use ML to develop more physical OF-DFT methods. <ref type="bibr">161,</ref><ref type="bibr">162</ref></p><p>Chemical Reviews pubs.acs.org/CR Review <ref type="url">https://doi.org/10.1021/acs.chemrev.1c00107</ref> Chem. Rev. 2021, 121, 9816-9872</p><p>The most commonly used form of DFT (which is also one of the most widely used CompChem methods today) is called Kohn-Sham (KS-)DFT. <ref type="bibr">163</ref> In KS-DFT, one assumes a fictitious system of noninteracting electrons with the same ground state density as the real system of interest. This makes it possible to split the energy functional in eq 8 into a new form that involves an exact expression of the kinetic energy for noninteracting electrons: <formula notation="TeX">E[\rho] = T_{ni}[\rho] + V_{eN}[\rho] + V_{ee}[\rho] + \Delta T[\rho] + \Delta V_{ee}[\rho] \qquad (9)</formula></p><p>Here, T ni [&#961;] is the kinetic energy of the noninteracting electrons, V eN [&#961;] is the exact nuclear-electron potential, and V ee [&#961;] is the Coulombic (classical) energy of the noninteracting electrons. The last two terms are corrections due to the interacting nature of electrons and nonclassical electron-electron repulsion. KS-DFT also expands the three-dimensional electron density into a spin orbital basis &#981; similar to HF theory to define the one-electron kinetic energy in a straightforward manner. 
This allows the T ni , V eN , and V ee expressions to be evaluated exactly and one arrives at the KS energy:</p><p>The last two correction terms in eq 9 arise from electron interactions, and these are combined into the so-called "exchange-correlation" term (E xc ), which uniquely defines which scheme of KS-DFT is being used. In theory, an exact E xc term would capture all differences between the exact FCI energy and the system of noninteracting electrons for a ground state.</p><p>The KS-DFT equations can be cast in a similar form as the Roothaan-Hall equations (eq 6), which allows for a computationally efficient solution. Moreover, the elements of the KS matrix (which replaces the Fock matrix F) are easier to evaluate because several of the computationally intensive integrals are now accounted for via E xc . Hence, the formal scaling for KS-DFT is O(n^3) with respect to the number of electrons. Even though this is much poorer than the ideally linear scaling of OF-DFT, the exact treatment of noninteracting electrons makes KS-DFT more accurate. Furthermore, there are several modern exchange-correlation functionals that routinely achieve much higher accuracy than HF theory with less computational cost, and thus KS-DFT is a competitive alternative to many correlated wavefunction methods in many modern applications.</p><p>A remaining problem is constructing a practical expression for the exchange-correlation functional, as its exact functional form remains unknown. This has spawned a wealth of approximations that have been founded with different degrees of first principles and/or empirical schemes. 
Classes of KS-DFT functionals are defined by whether the exchange-correlation functional is based on just the homogeneous electron gas (i.e., the "local density approximation", LDA), on that and the derivative of the density (i.e., the "generalized gradient approximation", GGA), or on other additional terms that should result in physically improved descriptions or error cancellations. The resulting hierarchy of KS-DFT functionals is often referred to as a "Jacob's Ladder" of DFT (Figure <ref type="figure">5b</ref>). Generally, the higher up the ladder one goes, the more accurate but more computationally demanding the calculation. <ref type="bibr">164</ref> However, the intrinsic inexactness in DFT makes it difficult to assess which functionals are physically better than others. <ref type="bibr">165,</ref><ref type="bibr">166</ref> Nevertheless, the Jacob's Ladder hierarchy is useful for clearly designating how and why newer methods should perform in specific applications (for perspective see refs 167-169).</p><p>Indeed, by being based on a ground-state representation for the homogeneous electron gas, DFT calculations can sometimes more easily bring physical insight into some systems that are very challenging for wavefunction theory to examine (e.g., metals, where HF theory provides divergent exchange energy behaviors <ref type="bibr">170,</ref><ref type="bibr">171</ref> ). On the other hand, DFT is also generally not well-suited for studying physical phenomena involving localized orbitals or band structures such as those found in semiconducting materials with small band gaps, molecular or material excited charge transfer states, or interaction forces that can arise due to excited states, e.g., dispersion (or London) forces. The former features can normally be treated using Hubbard-corrected DFT+U models that require a system-specific U-J parameter <ref type="bibr">172,</ref><ref type="bibr">173</ref> or more generalizable but much more computationally expensive hybrid DFT approaches. 
Dispersion forces (i.e., van der Waals interactions) are nonexistent in semilocal DFT approximations, and it is now commonplace to introduce them into DFT calculations using a variety of different methods. <ref type="bibr">36</ref> There is also growing interest in using embedded CompChem calculation schemes that can partition systems into discrete regions that could be treated with highly accurate correlated wavefunction theory and computationally efficient KS-DFT schemes separately. <ref type="bibr">[174]</ref><ref type="bibr">[175]</ref><ref type="bibr">[176]</ref><ref type="bibr">[177]</ref><ref type="bibr">[178]</ref> DFT has also been extended to the modeling of excited states in the form of time-dependent (TD-)DFT. <ref type="bibr">179</ref> Similar to ground state DFT, TDDFT is a less computationally expensive alternative to excited state wavefunction-based methods. The approach yields reasonable results where excitations induce only small changes in the ground state density, e.g., low-lying excited states. <ref type="bibr">179,</ref><ref type="bibr">180</ref> However, due to its single-reference nature, TDDFT tends to break down in situations where more than one electronic configuration contributes significantly to the excited state. Just as with correlated wavefunction methods, there are already signs of CompChem+ML efforts to improve the applicability of DFT-based methods. <ref type="bibr">[181]</ref><ref type="bibr">[182]</ref><ref type="bibr">[183]</ref><ref type="bibr">[184]</ref><ref type="bibr">[185]</ref></p><p>2.2.4. Semiempirical Methods. Correlated wavefunction methods and, to a lesser degree, KS-DFT are still very computationally demanding and only of limited use for large scale simulations. Further approximations based on wavefunction and DFT methods have been developed to simplify and accelerate energy calculations. 
These so-called semiempirical methods still explicitly consider the electronic structure of a molecule but in a more approximate way than the methods described above.</p><p>Semiempirical approaches based on wavefunction theory include methods like extended Hückel theory and neglect of diatomic differential overlap (NDDO). <ref type="bibr">186</ref> Both approaches simplify the HF equations (eq 5) by introducing approximations to the different integrals. In the NDDO approach, <ref type="bibr">187</ref> the only two-electron integrals in eq 5 that are retained are those in which the two orbitals on each of the right- and left-hand sides are located on the same atom; the remaining two-center (and one-center) integrals are then approximated by introducing a set of empirical functions, one for each unique type of integral. Moreover, the overlap matrix in eq 6 is assumed to be diagonal, which greatly simplifies the energy evaluation. This reduces the required computational effort tremendously and allows the scaling of these approaches to be reduced to O(N^2). NDDO serves as a basis for more sophisticated semiempirical schemes, such as AM1, <ref type="bibr">188</ref> PM7, <ref type="bibr">189</ref> and MNDO, <ref type="bibr">190</ref> where the energy is usually determined self-consistently using a minimally sized basis set. Inadequacies in theory can be compensated by different empirical parametrization schemes that can allow these calculations to rival the accuracy of higher level theory for some systems. For example, Dral et al. <ref type="bibr">191</ref> provided a recent "big-data" analysis of the performance of several semiempirical methods with large data sets.</p><p>Semiempirical schemes are also carried over to approximate KS-DFT with so-called density functional tight binding (DFTB). 
<ref type="bibr">192</ref> DFTB simplifies the KS equations (eq 10) by decomposing the total electron density &#961; into a density of free and neutral atoms &#961; 0 and a small perturbation term &#948;&#961; 0 (&#961; = &#961; 0 + &#948;&#961; 0 ). Expanding eq 10 in the perturbation &#948;&#961; 0 makes it possible to partition the total energy into three terms amenable to different approximation schemes: <formula notation="TeX">E = E_{rep} + E_{Coul} + E_{BS} \qquad (11)</formula></p><p>E rep is a repulsive potential containing interactions between the nuclei and contributions from the exchange-correlation functional (these are typically approximated via pairwise potentials). The charge fluctuation term E Coul is modeled as a Coulomb potential of Gaussian charge distributions computed from the approximate density. Finally, E BS refers to the "band structure" term, which considers the electronic structure and contains contributions from T ni , V eN , and the exchange-correlation functional (see eq 10).</p><p>To compute E BS , the density is expressed in a minimal basis of atomic orbitals, similar to NDDO. The necessary Hamiltonian and overlap integrals are then evaluated via an approximate scheme based on Slater-Koster transformations. In addition to the energy, atomic partial charges are also computed in this step, which are then used in E Coul . As a consequence, DFTB equations can also be solved self-consistently. DFTB methods are parametrized by finding suitable forms for the repulsive potential and adjusting the parameters used in the Slater-Koster integrals. Non-self-consistent and self-consistent tight-binding DFT methods <ref type="bibr">193,</ref><ref type="bibr">194</ref> have been developed for simulating large scale systems. Semiempirical methods have also been a target of different ML schemes, yielding improved parametrization schemes and more accurate functional approximations. <ref type="bibr">[195]</ref><ref type="bibr">[196]</ref><ref type="bibr">[197]</ref><ref type="bibr">[198]</ref></p><p>2.2.5. Nuclear Quantum Effects. 
The quantum nature of lighter elements, such as H-Li, and even heavier elements that form strong chemical bonds (for example, the C-C bond in graphene <ref type="bibr">199</ref> ) gives rise to significant nuclear quantum effects (NQEs). Such effects are responsible for large differences from the Dulong-Petit limit of the heat capacity of solids, isotope effects, and the deviations of the particle momentum distribution from the Maxwell-Boltzmann distribution. <ref type="bibr">200</ref> To capture NQEs, path-integral molecular dynamics (PIMD) <ref type="bibr">201,</ref><ref type="bibr">202</ref> or centroid molecular dynamics (CMD) <ref type="bibr">203,</ref><ref type="bibr">204</ref> can be used, but these methods are associated with much higher computational costs (usually about 30 times higher) compared with classical MD simulations using point nuclei. Moreover, because systems may be influenced by competing NQEs, the extent of NQEs is sensitive to the potential energy surface assumed. (Semi)local DFT approaches may not even qualitatively predict isotope fractionation ratios, and usually hybrid DFT is needed to reach quantitative accuracy. <ref type="bibr">205</ref> However, employing hybrid DFT calculations or other high level methods in PIMD/CMD simulations can accrue extremely high computational costs. For this reason, ML force fields have been proposed as efficient means to carry out PIMD simulations, enabling essentially exact quantum-mechanical treatment of both electronic and nuclear degrees of freedom, at least for small molecules with dozens of atoms. <ref type="bibr">206,</ref><ref type="bibr">207</ref></p><p>2.2.6. Interatomic Potentials. Interatomic potentials introduce an additional level of abstraction compared to the methods described above. 
Instead of using exact quantum mechanical expressions to create the PES for the system, analytic functions are used to model a presupposed PES that contains explicit interactions between atoms, while electrons are treated in an implicit manner (sometimes using partial charge schemes). <ref type="bibr">[251]</ref><ref type="bibr">[252]</ref><ref type="bibr">[253]</ref><ref type="bibr">[254]</ref><ref type="bibr">[255]</ref><ref type="bibr">[256]</ref> Interatomic potentials thus are (oftentimes dramatically) more computationally efficient than correlated wavefunction, DFT, and semiempirical approaches. This efficiency makes it possible to study even larger systems of atoms (e.g., biomolecules, surfaces, and materials) than is possible with other computational methods. Note that different empirical potentials bring substantially different computational efficiencies; for example, Lennard-Jones (LJ) potentials are more efficient than classical force fields (FFs) like AMBER and CHARMM, while those are more efficient than most bond-order potentials, such as ReaxFF. <ref type="bibr">245,</ref><ref type="bibr">246</ref> The degree of efficiency arises from the balance of using accurate or physically justified functional forms, approximations, and model parametrizations. There are many different formulations (see Figure <ref type="figure">5c</ref>), and we will discuss the most general classes. An overview of the different types of potentials and their features is provided in Table <ref type="table">2</ref>.</p>
<table><head>Table 2 (partial)</head>
<row><cell></cell><cell></cell><cell></cell><cell><ref type="bibr">229</ref> classical Drude oscillator models, <ref type="bibr">230</ref> fluctuating charge (FQ) models, <ref type="bibr">231</ref> MB-Pol, <ref type="bibr">232</ref> distributed point polarizable models (DPP2), <ref type="bibr">233</ref> and many more <ref type="bibr">234</ref></cell></row>
<row><cell>embedded atom method (EAM)-like</cell><cell>yes</cell><cell>reactions within solid materials</cell><cell>EAM, <ref type="bibr">235</ref> MEAM, <ref type="bibr">236</ref> Finnis-Sinclair, <ref type="bibr">237</ref> Sutton-Chen <ref type="bibr">238</ref></cell></row>
<row><cell>bond-order potentials (BOPs)</cell><cell>yes</cell><cell>reactions within solids, liquids, gases</cell><cell>Brenner, <ref type="bibr">239</ref> Tersoff, <ref type="bibr">240,</ref><ref type="bibr">241</ref> REBO, <ref type="bibr">239,</ref><ref type="bibr">242</ref> COMB, <ref type="bibr">243,</ref><ref type="bibr">244</ref> ReaxFF, <ref type="bibr">245,</ref><ref type="bibr">246</ref> APT <ref type="bibr">247</ref></cell></row>
<row><cell>other quantum mechanics-derived force fields</cell><cell>yes</cell><cell>reactions within liquids and gases</cell><cell>EVB <ref type="bibr">248</ref> and related models <ref type="bibr">249,</ref><ref type="bibr">250</ref></cell></row>
</table>
<p>For extensive discussions on these methods including semiempirical approaches, we refer to the review by Akimov and Prezhdo (ref 257). An excellent review of interatomic potentials is provided by Harrison et al. (ref 258), and an excellent overview of modern methods can be found in a special issue of J. Chem. Phys. <ref type="bibr">259</ref> The distinctions between different types of FFs can sometimes be blurry, and we will differentiate categories in ascending complexity. One of the simplest interatomic potentials is the LJ potential: <ref type="bibr">260</ref> <formula notation="TeX">E_{LJ} = \sum_{i}\sum_{j &gt; i} 4\epsilon_{ij}\left[\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6}\right] \qquad (12)</formula></p><p>It models the total energy as the sum of all pairwise interactions between atoms i and j using an attractive and a repulsive term depending on the interatomic distance r ij . &#949; ij modulates the strength of the interaction, while &#963; ij sets its length scale and thereby the position of the minimum (at r ij = 2^(1/6) &#963; ij ). 
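</p><p>A pairwise LJ energy of the kind described above takes only a few lines to evaluate. The sketch below assumes the common 4&#949;[(&#963;/r)^12 - (&#963;/r)^6] form with a single atom type; in that form the pair energy crosses zero at r = &#963; and reaches its minimum of -&#949; at r = 2^(1/6) &#963;. The reduced units (&#949; = &#963; = 1) and the dimer geometry are hypothetical choices made for illustration.</p>
<p>
```python
import itertools
import math


def lj_pair_energy(r, epsilon, sigma):
    """LJ pair energy, assuming the 4*eps*[(sigma/r)**12 - (sigma/r)**6] form."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_total_energy(coords, epsilon, sigma):
    """Sum the pair energy over all unique atom pairs (one atom type)."""
    total = 0.0
    for a, b in itertools.combinations(coords, 2):
        total += lj_pair_energy(math.dist(a, b), epsilon, sigma)
    return total


# Reduced units; a dimer placed at the distance of the pair-energy minimum.
epsilon, sigma = 1.0, 1.0
r_min = 2.0 ** (1.0 / 6.0) * sigma
dimer = [(0.0, 0.0, 0.0), (r_min, 0.0, 0.0)]

e_dimer = lj_total_energy(dimer, epsilon, sigma)
e_at_sigma = lj_pair_energy(sigma, epsilon, sigma)

print("E(dimer at r_min) =", e_dimer)     # equals -epsilon
print("E(pair at sigma)  =", e_at_sigma)  # zero crossing
```
</p><p>The double loop over unique pairs makes the quadratic cost in the number of atoms explicit; production codes typically reduce this with distance cutoffs and neighbor lists.</p><p>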
The LJ potential is a prototypical "good model" of interatomic potentials, as it has a sufficiently simple physical form with only two parameters while still yielding useful results.</p><p>For covalent systems, such as bulk carbon or silicon, pairwise distances alone are not sufficient to capture the local coordination of the atoms, and many empirical potentials <ref type="bibr">212,</ref><ref type="bibr">213,</ref><ref type="bibr">261</ref> for these systems are expressed as a function of the pairwise distances and three-body terms within a certain cutoff distance. The pairwise term can take the form of LJ-type, electrostatic, or harmonic potentials, and the three-body term is usually a function of the angles formed by sets of three atoms.</p><p>So-called class I classical FFs introduce a more complicated energy expression:</p><formula notation="TeX" n="13">E_{\mathrm{FF}} = \sum_{\mathrm{bonds}} k_{ij} \left( r_{ij} - \bar{r}_{ij} \right)^{2} + \sum_{\mathrm{angles}} k_{ijk} \left( \theta_{ijk} - \bar{\theta}_{ijk} \right)^{2} + \sum_{\mathrm{dihedrals}} \sum_{n} k_{ijkl}^{(n)} \left[ 1 + \cos\left( n \phi_{ijkl} - \bar{\phi}_{ijkl}^{(n)} \right) \right] + \sum_{i &lt; j} \frac{q_i q_j}{r_{ij}} + E_{\mathrm{LJ}}</formula><p>The first three terms are the energy contributions of the distances (r ij ), angles (&#952; ijk ), and dihedral angles (&#981; ijkl ) between bonded atoms. Because of this, they are also referred to as bonded contributions. Bond and angle energies are modeled via harmonic potentials, where the k ij and k ijk parameters modulate the potential strength and r &#773; ij and &#952;&#773; ijk are the equilibrium distances and angles. The dihedral term is modeled with a Fourier series to capture the periodicity of dihedral angles, with k ijkl and the phase offsets &#981;&#773; ijkl as free parameters. The last two terms account for nonbonded interactions. The long-range electrostatics are modeled as the Coulomb energy between charges q i and q j , and the van der Waals energy is treated via a LJ potential (eq 12). In class I and II FFs, empirical parameters are tabulated for a variety of elements in wide ranges of chemical environments (for example, ref 262). Parameters for any one system should not necessarily be assumed to transfer well to other systems, and reparametrizations may be needed depending on the application. 
Different sets of parametrization schemes give rise to different types of classical FFs, with CHARMM, <ref type="bibr">217</ref> Amber, <ref type="bibr">214,</ref><ref type="bibr">215</ref> GROMOS, <ref type="bibr">[218]</ref><ref type="bibr">[219]</ref><ref type="bibr">[220]</ref> and OPLS <ref type="bibr">221,</ref><ref type="bibr">222</ref> being a few of many examples.</p><p>Class II (i.e., "polarizable") FFs extend these models by replacing the static charges with environment-dependent functions (e.g., AMOEBA <ref type="bibr">263</ref> ). A significant advantage of the class I and II types of FFs is that they are computationally efficient, which makes them well suited for MD simulations of complex and extended (bio)molecules, such as proteins, lipids, or polymers. Implementations of FF calculations on GPUs make these simulations extremely productive. <ref type="bibr">[264]</ref><ref type="bibr">[265]</ref><ref type="bibr">[266]</ref><ref type="bibr">[267]</ref><ref type="bibr">[268]</ref> A disadvantage of class I and II types of interatomic potentials is that they rely on predefined bonding patterns to compute the total energy, which limits their transferability. In general, bonds between atoms are defined at the beginning of the simulation run and cannot change. Furthermore, bonding terms make use of harmonic potentials that are not suitable for modeling bond dissociation.</p><p>Reactive potentials, which eschew harmonic potential dependencies and thus can describe the formation and breaking of chemical bonds, include the embedded atom method (EAM, Figure <ref type="figure">5c</ref>), which is used widely in materials science. <ref type="bibr">235</ref> EAM is a type of many-body potential primarily used for metals, where each atom is embedded in the environment of all others. 
The total energy is given by</p><formula notation="TeX" n="14">E_{\mathrm{EAM}} = \sum_{i} F_i\left( \tilde{\rho}_i \right) + \frac{1}{2} \sum_{i \neq j} V_{ij}\left( r_{ij} \right)</formula><p>F i is an embedding function and &#961;&#771; i an approximation to the local electron density based on the environment of atom i. F i (&#961;&#771; i ) can be seen as a contribution due to nonlocalized electrons in a metal. V ij is a term describing the core-core repulsion between atoms. An EAM potential is determined by the functional forms used for F i and V ij , as well as by how the density is expressed. Its dependence on the local environment, without the need for predefined bonds, makes EAM well suited for modeling material properties of metals. An extension of EAM is modified EAM (MEAM), <ref type="bibr">236</ref> which includes directional dependence in the description of the local density &#961;&#771; i , but this brings greater computational cost. EAMs also form the conceptual basis of the embedded atom neural network (EANN) machine learning potentials (MLPs). <ref type="bibr">269</ref> Another common type of reactive potential is the bond-order potential (BOP). In general, BOPs model the total energy of a system as interactions between the neighboring atoms:</p><formula notation="TeX" n="15">E_{\mathrm{BOP}} = \sum_{i &lt; j} f_{\mathrm{cut}}\left( r_{ij} \right) \left[ V_{\mathrm{rep}}\left( r_{ij} \right) + b_{ij(k)} V_{\mathrm{att}}\left( r_{ij} \right) \right]</formula><p>V rep and V att are repulsive and attractive potentials depending on the interatomic distance r ij . A cutoff function f cut restricts all interactions to the local atomic environment. b ij(k) is the bond order term, from which the potential takes its name. This term measures the bond order between atoms i and j (i.e., "1" for a single bond, "2" for a double bond, and "0.6" for a partially dissociated bond). Bond orders can also depend on neighboring atoms k in some implementations. BOPs are typically used for covalently bound systems, such as bulk solids and liquids containing hydrogen, carbon, or silicon (e.g., carbon nanotubes and graphene). 
Depending on the exact form of the expressions in eq 15, different types of BOPs are obtained, such as Tersoff <ref type="bibr">240,</ref><ref type="bibr">241</ref> and REBO <ref type="bibr">239,</ref><ref type="bibr">242</ref> potentials. BOPs can also be extended to incorporate dynamically assigned charges, yielding potentials like COMB <ref type="bibr">243,</ref><ref type="bibr">270</ref> or ReaxFF. <ref type="bibr">245,</ref><ref type="bibr">246</ref> As with EAMs, BOPs have also been used as a starting point for constructing more elaborate MLPs <ref type="bibr">[271]</ref><ref type="bibr">[272]</ref><ref type="bibr">[273]</ref> that will be discussed in more detail in section 3. While efficient and versatile, all interatomic potentials described above are inherently constrained by their functional forms. A different approach is pursued by MLPs, such as Behler-Parrinello neural networks, <ref type="bibr">274</ref> q-SNAP, <ref type="bibr">275</ref> and GAP potentials <ref type="bibr">276</ref> (Figure <ref type="figure">5c</ref>). In MLPs, suitable functional expressions for interactions and energy are determined in a fully data-driven manner and are ultimately limited only by the amount and quality of available reference data. One can then use substantially more data to generate a much more accurate MLP than would be possible when using, for instance, a ReaxFF potential trained on similar data sets. <ref type="bibr">277</ref> For the sake of completeness, we note that all approaches described here are fully atomistic: each atom is modeled as an individual entity. It is also possible to combine groups of atoms into pseudoparticles, giving rise to so-called coarse-grained methods. On an even higher level of abstraction, whole environments can be modeled as a single continuum. As such approaches are not the subject of the present review, we refer the interested reader, for example, to refs 278 and 279.</p></div>
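<div xmlns="http://www.tei-c.org/ns/1.0"><p>The pairwise LJ energy described in this section can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a reference implementation: the function name lennard_jones_energy, the use of NumPy, a single (&#949;, &#963;) pair shared by all atoms, and reduced units are choices made here. The pair energy is assumed to take the form &#949;[(&#963;/r)^12 - 2(&#963;/r)^6], so that &#963; marks the position of the minimum, as stated in the text.</p>
```python
import numpy as np


def lennard_jones_energy(positions, epsilon=1.0, sigma=1.0):
    """Total LJ energy summed over all unique atom pairs.

    Assumed pair form: eps * [(sigma/r)**12 - 2*(sigma/r)**6], which has
    its minimum of -eps at r = sigma (an assumption of this sketch).
    """
    positions = np.asarray(positions, dtype=float)
    # Indices of all unique pairs (i, j) with j greater than i.
    i_idx, j_idx = np.triu_indices(len(positions), k=1)
    r = np.linalg.norm(positions[i_idx] - positions[j_idx], axis=1)
    x6 = (sigma / r) ** 6
    return float(np.sum(epsilon * (x6 ** 2 - 2.0 * x6)))


# Two atoms placed exactly at the minimum-energy separation r = sigma.
dimer = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
e_dimer = lennard_jones_energy(dimer)
```
<p>With this convention, the dimer at r = &#963; has an energy of -&#949;, and the energy decays toward zero as the atoms are pulled apart. Because the loop over pairs is vectorized, the cost still grows quadratically with the number of atoms, which is why production codes add cutoffs and neighbor lists.</p></div>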
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.">Response Properties</head><p>Once an energy calculation is completed by one of the CompChem methods above, many other interesting molecular properties can be calculated. Most of these properties can be obtained as the response of the energy to a perturbation, for example, changes in nuclear coordinates R, external electric (&#1013;) or magnetic (B) fields, or the nuclear magnetic moments {I i }. Given an expression for the energy that depends on the above quantities, so-called response properties can be computed via the corresponding partial derivatives of the energy. A general response property &#928; then takes the form</p><formula notation="TeX">\Pi^{(n_{\mathbf{R}}, n_{\boldsymbol{\epsilon}}, n_{\mathbf{B}}, n_{\mathbf{I}})} = \frac{\partial^{\, n_{\mathbf{R}} + n_{\boldsymbol{\epsilon}} + n_{\mathbf{B}} + n_{\mathbf{I}}} E(\mathbf{R}, \boldsymbol{\epsilon}, \mathbf{B}, \mathbf{I})}{\partial \mathbf{R}^{n_{\mathbf{R}}} \, \partial \boldsymbol{\epsilon}^{n_{\boldsymbol{\epsilon}}} \, \partial \mathbf{B}^{n_{\mathbf{B}}} \, \partial \mathbf{I}^{n_{\mathbf{I}}}}</formula><p>where each n indicates the order of the partial derivative with respect to the quantity in its subscript. <ref type="bibr">102</ref> A common response property is the nuclear forces F = -&#928; (1, 0, 0, 0), the negative first derivatives of the energy with respect to the nuclear positions. Such calculations allow a plethora of different geometry optimization schemes for chemical structures on the PES. Hessian calculations, which correspond to the second derivative of the energy with respect to nuclear positions, are necessary to confirm the location of first-order saddle points on the PES. They also identify normal modes and their frequencies, which enter the vibrational partition functions used to model temperature dependencies based on statistical thermodynamics. Hessian calculations are computationally costly, since they normally rely on finite-difference methods involving many nuclear force calculations. Many methods have been developed to allow CompChem algorithms to sample minimum energy regions of the PES <ref type="bibr">[280]</ref><ref type="bibr">[281]</ref><ref type="bibr">[282]</ref><ref type="bibr">[283]</ref><ref type="bibr">[284]</ref> or precisely locate points of interest. 
<ref type="bibr">285,</ref><ref type="bibr">286</ref> Historically, many of these techniques have relied on approximate or full Hessian calculations, <ref type="bibr">287</ref> but other approaches, such as the nudged-elastic band <ref type="bibr">288,</ref><ref type="bibr">289</ref> and string <ref type="bibr">[290]</ref><ref type="bibr">[291]</ref><ref type="bibr">[292]</ref> methods, are popular alternatives that do not require a Hessian calculation. There have also been efforts using different forms of ML to accelerate procedures or overcome long-standing challenges in efficient sampling of and optimization on the PES. <ref type="bibr">[293]</ref><ref type="bibr">[294]</ref><ref type="bibr">[295]</ref><ref type="bibr">[296]</ref><ref type="bibr">[297]</ref><ref type="bibr">[298]</ref> The general expression above can provide a wealth of other quantities, some of which are relevant for molecular spectroscopy or provide a direct connection to experiment (see Table <ref type="table">3</ref>). Infrared spectra can be simulated based on dipole moments &#956; = -&#928; (0, 1, 0, 0), while molecular polarizabilities &#945; = -&#928; (0, 2, 0, 0) offer access to polarized and depolarized Raman spectra. Nuclear magnetic shielding tensors &#963; = &#928; (0,0,1,1) are a central response property with respect to a magnetic field. These allow the computation of chemical shifts recorded in nuclear magnetic resonance (NMR) spectroscopy via their trace</p><formula notation="TeX">\sigma_{\mathrm{iso}} = \frac{1}{3} \mathrm{Tr}\left( \boldsymbol{\sigma} \right)</formula><p>The beauty of this formalism lies in the fact that a single energy calculation method provides access to a wide range of quantum chemical properties in a highly systematic manner. A large number of modern MLPs use the response of the potential energy with respect to nuclear positions to obtain energy-conserving forces. However, far fewer applications model perturbations with respect to electric and magnetic fields. 
Ref 299 extends the descriptor used in the Faber-Christensen-Huang-Lilienfeld (FCHL) Kernel by adding an explicit field dependent term that makes it possible to predict dipole moments across chemical compound space. Ref 300 introduces a general neural network (NN) framework to model interactions of a system with vector fields, which was then used to predict dipole moments, polarizabilities and nuclear magnetic shielding tensors as response properties.</p></div>
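<div xmlns="http://www.tei-c.org/ns/1.0"><p>The relation between energies and forces discussed in this section, F = -dE/dR, can be illustrated with a short numerical sketch. The helper names numerical_forces and harmonic_bond_energy, the toy harmonic energy model, and the step size are assumptions introduced here for illustration; CompChem codes and MLPs typically use analytic or automatically differentiated gradients instead of finite differences.</p>
```python
import numpy as np


def numerical_forces(energy_fn, positions, step=1e-4):
    """Forces F = -dE/dR from central finite differences of energy_fn."""
    positions = np.asarray(positions, dtype=float)
    forces = np.zeros_like(positions)
    # Displace every Cartesian coordinate of every atom in turn.
    for idx in np.ndindex(positions.shape):
        shifted_up = positions.copy()
        shifted_down = positions.copy()
        shifted_up[idx] += step
        shifted_down[idx] -= step
        delta_e = energy_fn(shifted_up) - energy_fn(shifted_down)
        forces[idx] = -delta_e / (2.0 * step)
    return forces


def harmonic_bond_energy(positions, k=2.0, r0=1.0):
    """Toy energy model (hypothetical): one harmonic bond between atoms 0 and 1."""
    r = np.linalg.norm(positions[1] - positions[0])
    return 0.5 * k * (r - r0) ** 2


# A bond stretched beyond its equilibrium length r0 = 1.0.
stretched = np.array([[0.0, 0.0, 0.0], [1.2, 0.0, 0.0]])
forces = numerical_forces(harmonic_bond_energy, stretched)
```
<p>Each force component requires two energy evaluations, so a full gradient costs 6N evaluations for N atoms. Applying the same scheme to the forces to build a Hessian multiplies the cost again, which mirrors the remark above that finite-difference Hessians are expensive.</p></div>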
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4.">Solvation Models</head><p>An important aspect of CompChem is the description of molecules within a solution environment. Simulating a dynamical environment composed of many surrounding molecules is usually not feasible with electronic-structure methods. To circumvent this problem, solvation modeling schemes have been devised (see refs 301-306 for discussions on this topic).</p><p>The most popular approaches are so-called polarizable continuum solvent models (PCM). <ref type="bibr">279</ref> They model the electrostatic interaction of a solute molecule with its environment by representing the charge distribution of the solvent molecules as a continuous electric field, the reaction field. This dielectric continuum can be interpreted as a thermally averaged representation of the environment and is typically assigned a constant permittivity depending on the particular solvent to be modeled (&#949; = 80.4 for water). The solute is placed inside a cavity embedded in this continuum. The charge distribution of the molecule then polarizes the continuous medium, which in turn acts back on the molecule.</p><p>To compute the electrostatic interactions arising from this mutual polarization with electronic structure theory, a self-consistent scheme is employed. After constructing a suitable molecular cavity, a Poisson problem of the following form is solved:</p><formula notation="TeX" n="17">-\nabla \cdot \left[ \epsilon(\mathbf{r}) \nabla V(\mathbf{r}) \right] = 4 \pi \rho_m(\mathbf{r})</formula><p>Here, &#961; m (r) is the charge distribution of the solute and &#1013;(r) is the position-dependent permittivity, which usually is set to one within the cavity and to the &#949; of the solvent on the outside. V(r) is the electrostatic potential composed of the two terms</p><formula notation="TeX" n="18">V(\mathbf{r}) = V_m(\mathbf{r}) + V_s(\mathbf{r})</formula><p>where V m (r) is the solute potential and V s (r) is the apparent potential due to the surface charge distribution &#963;(s)</p><formula notation="TeX" n="19">V_s(\mathbf{r}) = \int_{\Gamma} \frac{\sigma(\mathbf{s})}{\left| \mathbf{r} - \mathbf{s} \right|} \, d^2 s</formula><p>&#915; indicates the surface of the cavity. Eq 17 is solved numerically to obtain the surface charge distribution &#963;(s). 
Once &#963;(s) has been determined in this fashion, the potential is computed according to eq 19 and used to construct an effective Hamiltonian of the form</p><formula notation="TeX" n="20">\hat{H}_{\mathrm{eff}} = \hat{H} + V_s(\mathbf{r})</formula><p>where H&#770; is the vacuum Hamiltonian. These equations are then solved self-consistently in a Roothaan-Hall or KS approach, yielding the electrostatic solvent-solute interaction energy. This scheme is also called the self-consistent reaction field (SCRF) approach. Continuum models differ in how the cavities are constructed and how eq 17 is solved to obtain the surface charge distribution. Variants include the original PCM model, also referred to as dielectric PCM (D-PCM), <ref type="bibr">307</ref> the integral equation formulation of PCM (IEFPCM), <ref type="bibr">308</ref> SMD, <ref type="bibr">309</ref> conductor PCM (C-PCM), <ref type="bibr">310</ref> and the conductor-like screening model (COSMO). <ref type="bibr">311</ref> The latter two approaches replace the dielectric medium by a perfect conductor to allow for a particularly efficient computation of &#963;(s). PCMs can be further extended with statistical thermodynamics treatments to account for differences in solute size and for concentration effects, which leads to models such as COSMO-RS. <ref type="bibr">312</ref> A drawback of most PCM-like approaches is that they neglect local solvent structures. Thus, they cannot reliably account for situations where explicit solvent interactions are important, for example, when specific sites of a transition state are stabilized through hydrogen bonding. <ref type="bibr">301</ref> Furthermore, while implicit models might be parametrized to fit bulk-like properties of mixed or ionic solvents (e.g., ref 313), the complex local solvent environment presented by these systems is better treated by other means. 
For mixed solvent systems, a range of hybrid schemes such as COSMO-RS, <ref type="bibr">305</ref> reference interaction site models (RISMs), <ref type="bibr">314,</ref><ref type="bibr">315</ref> or QM/MM <ref type="bibr">[316]</ref><ref type="bibr">[317]</ref><ref type="bibr">[318]</ref> approaches have been developed. As an in-depth discussion of these alternative schemes exceeds the scope of this Review, we instead refer to other references. <ref type="bibr">319,</ref><ref type="bibr">320</ref> ML models are increasingly used to describe solvent effects. Ref 300 introduces a continuum ML model based on a reaction field that can predict energies and response properties for continuum solvents. The model can extrapolate to solvents not seen during training, and it can be extended to operate in a QM/MM fashion to account for explicit solvent effects in a Claisen rearrangement reaction. Ref 321 implemented automatable calculation schemes and unsupervised ML to allow predictions of single ion solvation energies for monovalent and divalent cations and anions based on physically rigorous quasi-chemical theory. <ref type="bibr">322,</ref><ref type="bibr">323</ref> Ref 324 used convolutional NNs and MD simulations to carry out high-throughput screening of mixed solvent systems. Ref 325 implemented efficient ways to carry out ML-based QM/MM MD simulations.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.5.">Insightful Predictions for Molecular and Material Properties</head><p>By solving for electronic structures, by whatever means is appropriate, one obtains molecular energies and the energy spectrum (typically corresponding to quasiparticles given by KS or HF orbitals). From these, one can then compute molecular or material properties that arise from quantum mechanical and statistical operators, for example, thermodynamic energies, response properties, highest and lowest occupied molecular orbital energies, and band gaps, among other properties. Many properties are defined by the characters of the orbitals, and knowledge of these helps in deriving useful insight for designing molecules and materials for a particular function. Furthermore, one is often interested in how these molecules behave over time (i.e., the dynamics given some statistical ensemble that depends on temperature, pressure, etc.) over all possible degrees of freedom. By understanding how energies and forces change over time, one can predict thermal and pressure dependencies as well as spectroscopic properties, which builds toward insightful predictions.</p><p>Molecular and materials chemistry is vastly complex and variable, and one often faces the question of whether to span wider chemical spaces or to explore a specific phenomenon in greater depth. A key problem is that, even after the effort of either approach, it is often unclear how information for one system might be related to another to provide more knowledge. For instance, one may decide to calculate all possible properties of ethanol with a CompChem method, but understanding how any calculated property would be correlated to an analogous property of isopropanol is still usually difficult. 
There is great interest in understanding chemical and materials space through applications of quantitative structure activity/property relationships, <ref type="bibr">326,</ref><ref type="bibr">327</ref> cheminformatics, <ref type="bibr">328</ref> conceptual DFT, <ref type="bibr">329</ref> and alchemical perturbation DFT. <ref type="bibr">330</ref> All these applications benefit from greater access to CompChem data, and all have promise as being interfaced with ML for transformative applications to catalyze wisdom and impact.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">MACHINE LEARNING TUTORIAL AND INTERSECTIONS WITH CHEMISTRY</head><p>ML has had a dramatic impact on many aspects of our daily lives and has arguably become one of the most far-reaching technologies of our era. It is hard to overstate its importance in solving long-standing computer science challenges, such as image classification <ref type="bibr">[331]</ref><ref type="bibr">[332]</ref><ref type="bibr">[333]</ref><ref type="bibr">[334]</ref> or natural language processing, <ref type="bibr">[335]</ref><ref type="bibr">[336]</ref><ref type="bibr">[337]</ref><ref type="bibr">[338]</ref><ref type="bibr">[339]</ref> tasks that require knowledge that is hard to capture in a traditional computer program. <ref type="bibr">[340]</ref><ref type="bibr">[341]</ref><ref type="bibr">[342]</ref> Previous classical artificial intelligence (AI) approaches relied on very large sets of rules and heuristics, but these were unable to cover the full scope of these complex problems. Over the past decade, advances in ML algorithms and computer technology made it possible to learn underlying regularities and relevant patterns from massive data sets that enable automatic constructions of powerful models that can sometimes even outperform humans at those tasks. This development inspired researchers to approach challenges in science with the same tools, driven by the hope that ML would revolutionize their respective fields in a similar way. Here, we give an overview of these developments in chemistry and physics to serve as an orientation for newcomers to ML. We will start by introducing the field of ML in general terms and then dissect its strengths and weaknesses, explaining what tasks ML is good at and when it might not be the best solution to a problem.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">What is ML?</head><p>In the most general sense, ML algorithms estimate functional relationships without being given any explicit instructions of how to analyze or draw conclusions from the data. Learning algorithms can recover mappings between a set of inputs and corresponding outputs or just from the inputs alone. Without output labels, the algorithm is left on its own to discover structure in the data.</p><p>Universal approximators <ref type="bibr">343,</ref><ref type="bibr">344</ref> are commonly used for that purpose. These can reconstruct any function that fulfills a few basic properties, such as continuity and smoothness, as long as enough data is available. Smoothness is a crucial ingredient that makes a function learnable, because it implies that neighboring points are correlated in similar ways. That property means that one can draw successful conclusions about unknown points as long as they are close to the training data (coming from the same underlying probability distribution). <ref type="bibr">341</ref> In contrast, completely random processes in the above sense allow no predictions.</p><p>An association that immediately springs to mind is traditional regression analysis, but ML goes a step further. Regression analyses aim to reconstruct the function that goes through a set of known data points with the lowest error, but ML techniques aim to identify functions that predict interpolations between data points and thus minimize the prediction error for new data points that might later appear. <ref type="bibr">345</ref> Those contrasting objectives are mirrored in the different optimization targets. In traditional regression, the optimization task</p><formula notation="TeX" n="21">\hat{f} = \underset{f}{\arg\min} \sum_{i=1}^{N} \mathcal{L}\left( f(\mathbf{x}_i), y_i \right)</formula><p>only measures the fit to the data, but learning algorithms typically aim to find models f&#770; that satisfy</p><formula notation="TeX" n="22">\hat{f} = \underset{f}{\arg\min} \left[ \sum_{i=1}^{N} \mathcal{L}\left( f(\mathbf{x}_i), y_i \right) + \lVert \Gamma \Theta \rVert^{2} \right]</formula><p>Both optimization targets reward a close fit, often using the squared loss</p><formula notation="TeX">\mathcal{L}\left( \hat{f}(\mathbf{x}), y \right) = \left( \hat{f}(\mathbf{x}) - y \right)^{2}.</formula><p>
However, the key difference is an additional regularization term in eq 22, which influences the selection of candidate models by introducing additional properties that promote generalization. To understand why this is necessary, it is helpful to consider that eq 22 is only a proxy for the optimization problem</p><formula notation="TeX">\hat{f} = \underset{f}{\arg\min} \int \mathcal{L}\left( f(\mathbf{x}), y \right) p(\mathbf{x}, y) \, d\mathbf{x} \, dy</formula><p>that we would actually like to solve. In an ideal world, we would minimize the loss function over the complete distribution of inputs and labels p(x, y). However, this is impossible in practice because only a finite sample of that distribution is available, so we apply the principle of Occam's razor, which presumes that simpler (parsimonious) hypotheses are more likely to be correct. With this additional consideration we hope to be able to recover a reasonably general model, despite only having seen a finite training set. A common way to favor simpler models is via an additional term in the cost function, which is what &#8741;&#915;&#920;&#8741;&#178; in eq 22 expresses.</p><p>Here, &#915; is a matrix that defines "simplicity" with regard to the model parameters &#920;. Usually,</p><formula notation="TeX">\Gamma = \lambda I</formula><p>(where I is the identity matrix and &#955; &gt; 0) is chosen to simply favor a small L&#8322;-norm on the parameters, such that the solution does not rely on individual input features too strongly. This particular approach is called Tikhonov regularization, <ref type="bibr">[346]</ref><ref type="bibr">[347]</ref><ref type="bibr">[348]</ref> but other regularization techniques also exist. <ref type="bibr">349,</ref><ref type="bibr">350</ref> A model that is heavily regularized (i.e., using a large &#955;) will eventually become biased in that it is too simplistic to fit the data well. In contrast, a lack of regularization might yield an overly complex model with high variance. Such an overfit model will follow the data so closely that it also models the noise components and consequently fails to generalize (see Figure <ref type="figure">6</ref>). 
Finding the appropriate amount of regularization &#955; to manage under- and overfitting is known as attaining a good bias-variance trade-off. <ref type="bibr">351</ref> We will introduce a process called cross-validation to address this challenge further below (see section 3.4.3).</p><p>3.1.1. What Does ML Do Well? Implicit Knowledge from Data. ML algorithms can infer functional relationships from data in a statistically rigorous way without detailed knowledge about the problem at hand. ML thus captures implicit knowledge from a data set, even in aspects where CPI might not be available. Traditional modeling approaches, such as the classical force fields discussed in section 2.2.6, rely on preconceived notions about the PES that is being modeled and, thus, the way the physical system behaves. In contrast, ML algorithms start from a loss function and a much more general model class. Within the limits permitted by the noise inherent to the data, generalization can be improved to arbitrary accuracy given increasingly larger informative training data sets. This process allows us to explore a problem even before there is a reasonably full understanding. An ML predictor can serve as a starting point for theory building and be regarded as a versatile tool in the modeling loop: building predictive models, improving them, enriching them by formal insight, improving them further, and ultimately extracting a formal understanding. More and more research efforts combine data-driven learning algorithms with rigorous scientific or engineering theory to yield novel insights and applications. <ref type="bibr">10,</ref><ref type="bibr">16,</ref><ref type="bibr">352</ref> Redundancy in CompChem Calculations. To obtain a quantum chemical property for the compounds in a data set, CompChem calculations need to be repeated independently for each input, even if the inputs are very similar. No formally rigorous method exists to exploit redundancies in the calculations in such a scenario. 
The empiricism of learning algorithms, however, does provide a pathway to extract information based on compound structure similarity. A data-driven angle allows one to ask questions in new ways that give rise to new perspectives on established problems. For example, unsupervised algorithms like clustering or projection methods group objects according to latent structural patterns and provide insights that would remain hidden when only looking at individual compounds.</p><p>3.1.2. What Does ML Do Poorly? Lack of Generality and Precision. Some difficult problems in chemistry and physics can be solved accurately with CompChem, but doing so would require significant resources. For example, enumerating all pairwise interactions in a many-body system will inevitably scale quadratically, and there is no obvious path around this. One might ask if empirical approaches can address such fundamental problems more efficiently, but this is unfortunately not possible, since ML is more suited for finding solutions in general function spaces than for deterministic algorithms where constraints guide the solution process. However, if we are interested not in a full solution but rather in some aspect of it, the stochastic nature of ML can be beneficial. For instance, a traditional ML approach might not be the best tool for explicitly solving the Schr&#246;dinger equation, but it might be a far more useful tool for developing a force field that returns the energy of a system without the need for a cumbersome wavefunction and a self-consistent algorithm. As an example, Hermann et al. <ref type="bibr">105</ref> used deep NNs to show how ML methods may be suitable for overcoming challenges faced by traditional CompChem approaches.</p><p>Reliance on High-Quality Data. ML algorithms require a large amount of high-quality data, and it is hard to decide a priori when a data set is sufficient. 
Sometimes, a data set may be large, but it does not adequately sample all the relevant systems one intends to model. For example, an MD simulation might generate many thousands of molecular conformations used to train an ML force field, but perhaps that sampling only occurred in a local region of the PES. In this case, the ML force field would be effective at modeling the regions of the PES it was trained on but useless in other regions until more data and broader sampling are provided. This limitation is common to all empirical models, which are generally limited in their extrapolation abilities.</p><p>Inability to Derive High-Level Concepts. Standard ML algorithms cannot conceptualize knowledge from a data set. Two main reasons are the nonlinearity and excessive parametric complexity of most models, which allow many equally viable solutions for the same problem. <ref type="bibr">353,</ref><ref type="bibr">354</ref> It can be hard to gain insight into the modeled relationship because it is not based on a small set of simple rules. Techniques have emerged to make ML models interpretable (explainable AI, XAI <ref type="bibr">355</ref> ). While helpful, drawing scientific insight clearly still requires human expertise. <ref type="bibr">352,</ref><ref type="bibr">[355]</ref><ref type="bibr">[356]</ref><ref type="bibr">[357]</ref><ref type="bibr">[358]</ref><ref type="bibr">[359]</ref><ref type="bibr">[360]</ref><ref type="bibr">[361]</ref> Furthermore, the path from an ML model back to a physical set of equations is being explored, but it is far from being fully automated. <ref type="bibr">[362]</ref><ref type="bibr">[363]</ref><ref type="bibr">[364]</ref><ref type="bibr">[365]</ref><ref type="bibr">[366]</ref><ref type="bibr">[367]</ref><ref type="bibr">[368]</ref> Prone to Artifacts. Despite following the rules of best practice, ML algorithms can give unexpected and undesired results. 
Instead of extracting meaningful relationships, they may occasionally exploit nuisance patterns within the underlying experimental design, like the model architecture, the loss function, or artifacts in the data set. This results in a "clever Hans" predictor, <ref type="bibr">360</ref> which technically manages the learning problem but uses a trivial solution that is only applicable within the narrow scope of the particular experimental setup at hand. The predictor will appear to be performing well, while actually harvesting the wrong information and, therefore, not allowing any generalization or transferable insights.</p><p>For example, a recently proposed random forest predictor for the success of Buchwald-Hartwig coupling reactions <ref type="bibr">369</ref> was later revealed to give almost the same performance when the original inputs were replaced by Gaussian noise. <ref type="bibr">370,</ref><ref type="bibr">371</ref> This finding strongly suggested that the ML algorithm exploited some hidden underlying structure in the input data, irrespective of the chemical knowledge that was provided through the descriptor. Even though the model might appear quite useful, any conclusions that rely on the importance of the chemical features used in the model were thus rendered questionable at best. This example demonstrates that out-of-sample validation alone is often not sufficient to establish that a proposed model has indeed learned something meaningful. Therefore, the hypothesis described by the model must be challenged in extensive testing in practically relevant scenarios like actual physical simulations. In other words, the ML model needs to lead to a better understanding of the modeling itself and the underlying chemistry.</p></div>
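<div xmlns="http://www.tei-c.org/ns/1.0"><p>The effect of the regularization term in eq 22 can be illustrated with a small linear example. The sketch below is only one possible realization under stated assumptions: the function name fit_tikhonov, the polynomial features, the synthetic noisy sine data, and the two values of &#955; are all chosen here for illustration. With &#915; = &#955;I, the penalty &#8741;&#915;&#920;&#8741;&#178; equals &#955;&#178; times the squared L&#8322;-norm of the parameters, and the minimizer of the squared loss plus this penalty follows from the regularized normal equations.</p>
```python
import numpy as np


def fit_tikhonov(X, y, lam):
    """Minimize ||X @ theta - y||^2 + ||Gamma @ theta||^2 with Gamma = lam * I."""
    n_features = X.shape[1]
    gamma = lam * np.eye(n_features)
    # Regularized normal equations: (X^T X + Gamma^T Gamma) theta = X^T y.
    return np.linalg.solve(X.T @ X + gamma.T @ gamma, X.T @ y)


# Synthetic data (an assumption of this sketch): a noisy sine curve.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 15)
y = np.sin(np.pi * x) + 0.1 * rng.standard_normal(x.size)
# Degree-9 polynomial features: flexible enough to overfit 15 noisy points.
X = np.vander(x, 10, increasing=True)

theta_weak = fit_tikhonov(X, y, lam=1e-6)   # nearly unregularized fit
theta_strong = fit_tikhonov(X, y, lam=1.0)  # heavily regularized fit
```
<p>Comparing the two solutions shows the trade-off described in this section: the weakly regularized model follows the training points more closely, while the strongly regularized model has a smaller parameter norm and a larger training error. Choosing &#955; between these extremes is the bias-variance trade-off that cross-validation is meant to address.</p></div>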
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Types of Learning</head><p>ML models are classified by the type of learning problem they solve. Consider, for instance, a data scientist who develops an ML model that can predict acidity constants (p$K_\mathrm{a}$ values) for any molecule. A researcher with knowledge of physical organic chemistry might be aware of the empirical Taft equation <ref type="bibr">29</ref> that provides a linear free energy relationship between molecules on the basis of empirical parameters that account for a molecule's fundamental field, inductive, resonance, and steric effects (e.g., values related to Hammett &#961; and &#963; values). There are several ways the data scientist might develop an ML model for this or another application. Examples mentioned here include supervised, unsupervised, and reinforcement learning.</p><p>3.2.1. Supervised Learning. Supervised learning addresses learning problems where the ML model $\hat{f}_{\mathrm{ML}} : \mathcal{X} \rightarrow \mathcal{Y}$ connects a set of known inputs $\mathcal{X}$ and outputs $\mathcal{Y}$, either to perform a regression or a classification task. While the former maps onto a continuous space (e.g., energy, polarizability), the latter outputs a categorical value (e.g., acid or base; metal or insulator) for each data point.</p><p>Using the p$K_\mathrm{a}$ predictor example, a supervised learning algorithm could be trained to correlate recognizable chemical patterns or structures to experimentally known p$K_\mathrm{a}$ values. The goal would be to deduce the relationship between these inputs and outputs, such that the model is able to generalize beyond the known training set. A standard universal approximator has to accomplish this learning task without any preconceived notion about the problem at hand and will, therefore, likely require many examples before it can make accurate predictions. Much recent research investigates ways to incorporate high-level concepts into the learning algorithm in the form of prior knowledge. 
<ref type="bibr">207,</ref><ref type="bibr">372</ref> In this vein, one could take into account chemically relevant parameters, such as Hammett constants, so that the parametrized ML model incorporates the modified Hammett or Taft equation. An example of a classification problem in materials science is the categorization of materials, where identifying characteristics of the electronic structure can be used to distinguish between insulators and metals. <ref type="bibr">373</ref></p><p>3.2.2. Unsupervised Learning. Unsupervised learning describes problems in which only the inputs are known, with no corresponding labels. In this setting, the goal is to recover some of the underlying structure of the data to gain a higher-level understanding. Unsupervised learning problems are not as rigorously defined as supervised problems in the sense that there can be multiple correct answers, depending on the model and objective function that is applied.</p><p>For example, one might be interested in separating conformers of a molecule from an MD trajectory, given exclusively the positions of the atoms. A clustering algorithm (like the k-means algorithm) could identify those conformers by grouping the data based on common patterns. <ref type="bibr">374,</ref><ref type="bibr">375</ref> Alternatively, a projection technique could reveal a low-dimensional representation of the data set. <ref type="bibr">376</ref> Often data is represented in high dimension, despite being intrinsically low-dimensional. With the right projection technique, it is possible to retain the meaningful properties in a representation with fewer degrees of freedom. A conceptually simple embedding method is principal component analysis (PCA), in which the relationship that is sought to be preserved is the scalar product between the data points. 
<ref type="bibr">340</ref> There are many other linear and nonlinear projection methods, such as multidimensional scaling, <ref type="bibr">377</ref> kernel PCA (KPCA), <ref type="bibr">378,</ref><ref type="bibr">379</ref> t-distributed stochastic neighbor embedding (t-SNE), <ref type="bibr">380</ref> sketch-map, <ref type="bibr">381</ref> and the uniform manifold approximation and projection (UMAP). <ref type="bibr">382</ref> Finally, anomaly detection is another extension of unsupervised learning, where 'outliers' to the available data can be discovered. <ref type="bibr">383</ref> However, without knowing the labels (in this example, the potential energy associated with each geometry), there is no way to conclusively verify that the result is correct. The literature is gradually seeing more instances of unsupervised learning, particularly to reveal important chemical properties and to efficiently explore chemical/materials spaces.</p><p>3.2.3. Reinforcement Learning. Reinforcement learning (RL) describes problems that combine aspects of supervised and unsupervised learning. RL problems often involve defining an agent within an environment that learns by receiving feedback in the form of punishments and rewards. The progress of the agent is characterized by a combination of explorative activity and exploitation of already gathered knowledge. <ref type="bibr">384</ref> For chemistry applications, RL techniques are being increasingly used for finding molecules with desired properties in large chemical spaces. <ref type="bibr">10</ref> </p></div>
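<div xmlns="http://www.tei-c.org/ns/1.0"><p>The conformer-clustering example above can be illustrated with a minimal sketch of the k-means (Lloyd) algorithm. Everything in it is an assumption made for illustration: the function name kmeans_1d, the toy dihedral angles (in degrees), and the reduction of each MD frame to a single scalar. A real trajectory would be clustered in a suitable multidimensional descriptor space.</p><p>
```python
import random


def kmeans_1d(values, k, n_iter=50, seed=0):
    """Lloyd's algorithm for scalar data; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(values, k)  # initialize centers from the data
    labels = [0] * len(values)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c]))
                  for v in values]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return centers, labels


# Hypothetical dihedral angles (degrees) from two conformers of a molecule.
dihedrals = [58, 62, 61, 59, 178, 182, 181, 179]
centers, labels = kmeans_1d(dihedrals, k=2)
print(sorted(centers), labels)
```
</p><p>No labels are used at any point, so the grouping reflects only the chosen distance measure and the chosen number of clusters k. This is why, as noted above, the result cannot be verified conclusively without additional information.</p></div>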
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Universal Approximators</head><p>Universal approximators have their origins in the 1960s, when the hope was to construct "learning machines" with capabilities similar to those of the human brain. An early mathematical model of a single simplified neuron emerged that was called a perceptron (eq 24). <ref type="bibr">385,</ref><ref type="bibr">386</ref></p><formula notation="TeX">\hat{f}(\mathbf{x}) = \mathrm{sign}\left(\sum_{i=1}^{N} w_i x_i - b\right) <label>(24)</label></formula><p>Here, x denotes the N-dimensional input to the perceptron. It has N + 1 parameters consisting of w i (so-called weights) and a single b (a so-called threshold) that are adapted to the data. This adaptation process is typically called "learning" (vide infra), and it amounts to minimizing a predefined loss function.</p><p>In the 1960s, this simple NN had very limited use, as it was only able to model a linear separating hyperplane. Even simple nonlinear functions like the XOR were out of reach. <ref type="bibr">387</ref> Thus, excitement waned but then reappeared two decades later with the emergence of novel models consisting of more neurons and their arrangement in multilayer NN structures <ref type="bibr">388</ref> (see eq 25). Recent algorithmic and hardware advances now allow deep and increasingly complex architectures. <ref type="bibr">1,</ref><ref type="bibr">2</ref></p><formula notation="TeX">\hat{f}(\mathbf{x}) = \sum_{j=1}^{M} v_j\, g\left(\sum_{i=1}^{N} w_{ji} x_i - b_j\right) - b' <label>(25)</label></formula><p>In eq 25, g(&#8226;) denotes an activation function that is a nonlinear transformation that allows complex mappings between input and output. As with the perceptron, the parameters of multilayer NNs can be learned efficiently using iterative algorithms that compute the gradient of the loss function using the so-called back-propagation (BP) algorithm. 
<ref type="bibr">[388]</ref><ref type="bibr">[389]</ref><ref type="bibr">[390]</ref> In the late 1980s, artificial NNs were then proven to be universal approximators of smooth nonlinear functions, <ref type="bibr">343,</ref><ref type="bibr">391,</ref><ref type="bibr">392</ref> and so they gained broad interest even outside the ML community, which was then still relatively small. In 1995, a novel technique called the Support Vector Machine (SVM) <ref type="bibr">345,</ref><ref type="bibr">393</ref> and kernel-based learning were proposed, <ref type="bibr">379,</ref><ref type="bibr">[394]</ref><ref type="bibr">[395]</ref><ref type="bibr">[396]</ref> which came with some useful theoretical guarantees. SVMs implement a nonlinear predictor:</p><formula notation="TeX">\hat{f}(\mathbf{x}) = \sum_{j=1}^{M} \alpha_j K(\mathbf{x}_j, \mathbf{x}) - b <label>(26)</label></formula><p>where K is the so-called kernel. The kernel implicitly defines an inner product in some feature space and thus avoids an explicit mapping of the inputs. This "kernel trick" <ref type="bibr">397</ref> makes it possible to introduce nonlinearity into any learning algorithm that can be expressed in terms of inner products of the input. <ref type="bibr">379</ref> It has since been applied to many other algorithms beyond SVMs, <ref type="bibr">394</ref> such as Gaussian Processes (GP), <ref type="bibr">348</ref> PCA, <ref type="bibr">378,</ref><ref type="bibr">379</ref> and independent component analysis (ICA). <ref type="bibr">398</ref> The most effective kernels are tailored to the specific learning task at hand, but there are many generic choices, such as the polynomial kernel $K(\mathbf{x}_j, \mathbf{x}) = (\langle \mathbf{x}_j, \mathbf{x} \rangle - b)^d$, which describes inner products between degree d polynomials. Another popular choice is the Gaussian kernel $K(\mathbf{x}_j, \mathbf{x}) = \exp(-(\mathbf{x}_j - \mathbf{x})^2/(2\sigma^2))$. It is one of the most versatile kernels because it only imposes smoothness assumptions on the solution depending on the width parameter &#963;. 
<ref type="bibr">347,</ref><ref type="bibr">395</ref> As seen in eq 26, an SVM can also be understood as a shallow NN with a fixed set of nonlinearities. In other words, the kernel explicitly defines a similarity metric to compare data points, whereas NNs have more freedom to shape this transformation during training because they nest parametrizable nonlinear transformations on multiple scales. This difference gives both techniques unique strengths and drawbacks. Despite that, there exists a duality between both approaches that allows NNs to be translated into kernel machines and analyzed more formally (see refs 399-401).</p><p>In the context of CompChem, both NNs and kernel-based methods are the most used ML approaches. Simpler learners, such as nearest neighbor models or decision trees can still be surprisingly effective. Those have also been successfully used to solve a wide spectrum of problems including drug design, chemical synthesis planning, and crystal structure classification. <ref type="bibr">[402]</ref><ref type="bibr">[403]</ref><ref type="bibr">[404]</ref><ref type="bibr">[405]</ref><ref type="bibr">[406]</ref><ref type="bibr">[407]</ref> </p></div>
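<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the contrast between a linear perceptron and a kernel predictor concrete, the following sketch evaluates a kernel expansion with a Gaussian kernel on the XOR problem mentioned above. It is only an illustration under stated assumptions: the coefficients are simply set to the class labels rather than obtained from SVM training, the threshold is set to zero, and the width of 0.5 is an arbitrary choice. The names gaussian_kernel and kernel_predict are hypothetical.</p><p>
```python
import math


def gaussian_kernel(a, b, sigma=0.5):
    """Gaussian kernel between two points given as coordinate tuples."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def kernel_predict(x, support, alphas, b=0.0, kernel=gaussian_kernel):
    """Class (+1 or -1) from the sign of a kernel expansion around `support`."""
    score = sum(a * kernel(xj, x) for a, xj in zip(alphas, support)) - b
    return 1 if score > 0 else -1


# XOR: no single separating hyperplane reproduces these labels.
points = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [-1, 1, 1, -1]

# Assumption: coefficients are set by hand to the labels (no training).
alphas = [float(y) for y in labels]
predictions = [kernel_predict(x, points, alphas) for x in points]
print(predictions)
```
</p><p>Each data point contributes a localized similarity term, so the expansion can separate classes that a single hyperplane in the input space cannot.</p></div>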
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.">ML Workflow</head><p>In the following, we summarize the overall ML process, starting from a data set all the way to a trained and tested model. The ML workflow typically includes the following stages:</p><list type="ordered"><item>1. Gathering and preparing the data</item><item>2. Choosing a representation</item><item>3. Training the model<list><item>3a. Train model candidates</item><item>3b. Evaluate model accuracy</item><item>3c. Tune hyperparameters</item></list></item><item>4. Testing the model out of sample</item></list><p>Note that the progression to a good ML model is not necessarily linear, and some steps (except the out-of-sample test) may require reiteration as we learn about the problem at hand.</p><p>3.4.1. Data Sets. On a fundamental level, ML models could be simply regarded as sophisticated parametrizations of data sets. While the architectural details of the model matter, the reference data set forms the backbone that ultimately determines the model's effectiveness. If the data set is not representative of the problem at hand, the model will be incomplete and behave unpredictably in situations that have been improperly captured. The same applies to any other shortcomings of the data set, such as biases or noise artifacts that will also be reflected in the model. Some of these data set issues are likely to remain unnoticed when following the standard model selection protocol since training and test data sets are usually sampled from the same distribution. If the sampling method is too narrow, errors seen during the cross-validation procedure may appear to be encouragingly small, but the ML model will fail catastrophically when applied to a real problem. If the training and test sets come from different distributions, then techniques to compensate for this covariate shift can be used. <ref type="bibr">408,</ref><ref type="bibr">409</ref> Robust models can generally only be constructed from comprehensive data sets, but it is possible to incorporate certain patterns into models to make them more data-efficient. 
Prior scientific knowledge or intuition about specific problems can be used to reduce the function space from which an ML algorithm has to select a solution. If some of the unphysical solutions are removed a priori, less data are necessary to identify a good model. This is why NNs and kernel methods, despite both being broad universal function classes, exhibit different scaling behaviors. The choice of the kernel function provides a direct way to include prior knowledge such as invariances, symmetries, or conservation laws, whereas NNs are typically used if the learning problem cannot be characterized as specifically. <ref type="bibr">207,</ref><ref type="bibr">372,</ref><ref type="bibr">410</ref> In general, without prior knowledge, NNs often require larger data sets to produce the same accuracy as well-constrained kernel methods that embody problem knowledge. This consideration is particularly important if the data is expensive, for example, if it comes from high-quality experiments or expensive computations.</p><p>3.4.2. Descriptors. To apply ML, the data set needs to be encoded into a numerical representation (i.e., features/descriptors) that allows the learning algorithm to extract meaningful patterns and regularities. <ref type="bibr">[411]</ref><ref type="bibr">[412]</ref><ref type="bibr">[413]</ref><ref type="bibr">[414]</ref><ref type="bibr">[415]</ref><ref type="bibr">[416]</ref><ref type="bibr">[417]</ref><ref type="bibr">[418]</ref><ref type="bibr">[419]</ref> This is particularly challenging for unstructured data like molecular graphs that have well-defined invariant or equivariant characteristics that are hard to capture in a vectorial representation. For example, atoms of the same type are indistinguishable from each other, but it is hard to represent them without imposing some kind of order (which inevitably assigns an identity to each atom). 
Furthermore, physical systems can be translated and rotated in space without affecting many attributes. Only a representation that is adapted to those transformations can solve the learning problem efficiently.</p><p>It has turned out to be a major challenge to reconcile all invariances of molecular systems in a descriptor without sacrificing its uniqueness or computability. Some representations cannot avoid collisions, where multiple geometries map onto the same representation. Others are unique but prohibitively expensive to generate. Many solutions to this problem have been proposed, based on general strategies such as invariant integration, <ref type="bibr">207</ref> parameter sharing, <ref type="bibr">352,</ref><ref type="bibr">[421]</ref><ref type="bibr">[422]</ref><ref type="bibr">[423]</ref> density representations, <ref type="bibr">276</ref> or fingerprinting techniques. <ref type="bibr">[424]</ref><ref type="bibr">[425]</ref><ref type="bibr">[426]</ref><ref type="bibr">[427]</ref><ref type="bibr">[428]</ref><ref type="bibr">[429]</ref><ref type="bibr">[430]</ref><ref type="bibr">[431]</ref><ref type="bibr">[432]</ref><ref type="bibr">[433]</ref> Alternatively, an NN model can infer the representation from data. <ref type="bibr">352,</ref><ref type="bibr">424,</ref><ref type="bibr">434,</ref><ref type="bibr">435</ref> To date, none of the proposed approaches are without compromise, which is why the optimal choice of descriptor depends on the learning task at hand.</p></div>
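<div xmlns="http://www.tei-c.org/ns/1.0"><p>The invariance requirements discussed above can be checked numerically for a very simple descriptor. The sketch below uses the sorted list of interatomic distances as a stand-in descriptor and verifies that it is unchanged by a rotation, a translation, and a reordering of the atoms. The geometry is an approximate, illustrative water-like arrangement, element types are ignored for brevity, and all function names are hypothetical. Such a descriptor is invariant but not guaranteed to be unique.</p><p>
```python
import math
from itertools import combinations


def sorted_distances(coords):
    """Sorted list of all pairwise distances (a simple 2-body descriptor)."""
    return sorted(math.dist(a, b) for a, b in combinations(coords, 2))


def rotate_z(coords, angle):
    """Rotate all atoms about the z axis by `angle` (radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in coords]


def translate(coords, shift):
    """Shift all atoms by the same vector."""
    return [tuple(p + d for p, d in zip(atom, shift)) for atom in coords]


# Approximate water-like geometry in angstroms (illustrative values only).
water = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]

# Rotate, translate, and then reorder the atoms.
moved = translate(rotate_z(water, 0.7), (1.5, -2.0, 0.3))
moved = [moved[2], moved[0], moved[1]]

invariant = all(
    math.isclose(u, v, abs_tol=1e-9)
    for u, v in zip(sorted_distances(water), sorted_distances(moved))
)
print(invariant)
```
</p><p>The raw Cartesian coordinates of the two copies differ, while the descriptor values agree to numerical precision, which is the property a learning algorithm needs in order to treat both copies as the same system.</p></div>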
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.3.">Training.</head><p>The training process is the key step that ties together the data set and model architecture. Through the choice of the model architecture, we implicitly define a function space of possible solutions, which is then conditioned on the training data set by selecting suitable parameters. This optimization task is guided by a loss function that encodes our two somewhat opposing objectives: (1) achieving a good fit to the data, while (2) keeping the parametrization general enough such that the trained model becomes applicable to data that is not covered in the training set (see the two terms in eq 22). Satisfying the latter objective involves a process called model selection in which a suitable model is chosen from a set of variants that have been trained with exclusive focus on the first objective. Depending on the model architecture, more or less sophisticated optimization algorithms can be applied to train the set of model candidates.</p><p>Kernel-based learning algorithms are typically linear in their parameters &#945; (see eq 26). Coupled with a quadratic loss function, $\mathcal{L}(\hat{f}(\mathbf{x}), y) = (\hat{f}(\mathbf{x}) - y)^2$, they yield a convex optimization problem. Convex problems can be solved quickly and reliably due to only having a single solution that is guaranteed to be globally optimal. This solution can be found algebraically by taking the derivative of the loss function and setting it to zero. For example, kernel ridge regression (KRR) and GPs then yield a linear system of the form</p><formula notation="TeX">(\mathbf{K} + \lambda \mathbf{I})\, \boldsymbol{\alpha} = \mathbf{y} <label>(27)</label></formula><p>which is typically solved in a numerically robust way by factorizing the kernel matrix K. There exists a broad spectrum of matrix factorization algorithms, such as the Cholesky decomposition, that exploit the symmetry and positive definiteness properties of kernel matrices. 
<ref type="bibr">[436]</ref><ref type="bibr">[437]</ref><ref type="bibr">[438]</ref><ref type="bibr">[439]</ref><ref type="bibr">[440]</ref> Factorization approaches are, however, only feasible if enough memory is available to store the matrix factors, and this can be a limitation for large-scale problems. In that case, numerical optimization algorithms provide an alternative: they take a multistep approach to solve the optimization problem iteratively by following the gradient:</p><formula notation="TeX">\boldsymbol{\alpha}_{t+1} = \boldsymbol{\alpha}_{t} - \gamma \nabla_{\boldsymbol{\alpha}} \mathcal{L}(\hat{f}(\mathbf{x}), y) <label>(28)</label></formula><p>where &#947; is the step size (or learning rate). Iterative solvers follow the gradient of the loss function until it vanishes at a minimum, which is much less computationally demanding per step, because it only requires the evaluation of the model $\hat{f}$. In particular, kernel models can be evaluated without storing K (see eq 28).</p><p>NNs are constructed by nesting nonlinear functions in multiple layers, which yields nonconvex optimization problems. Closed-form solutions similar to eq 27 do not exist, which means that NNs can only be trained iteratively, that is, analogous to eq 28. Several variants of this standard gradient descent algorithm exist, including stochastic or mini-batch gradient descent, where only an n-sized portion of the training data $(\mathbf{x}, y)_{i:i+n}$ is considered in every step. Because of multiple local minima and saddle points on the loss surface, the global minimum is exponentially hard to obtain (since these algorithms usually converge to a local minimum). However, thanks to the strong modeling power of NNs, local solutions are usually good enough. <ref type="bibr">441</ref></p><p>Hyperparameters. In addition to the parameters that are determined when fitting an ML model to the data set (i.e., the node weights/biases or regression coefficients), many models contain so-called hyperparameters that need to be fixed before training. 
Two types of hyperparameters can be distinguished: ones that influence the model, such as the type of kernel or the NN architecture, and ones that affect the optimization algorithm, for example, the choice of regularization scheme or the aforementioned learning rate. Both tune a given model to the prior beliefs about the data set and thus play a significant role in model effectiveness. Hyperparameters can be used to gauge the generalization behavior of a model.</p><p>Hyperparameter spaces are often rather complex: certain parameters might need to be selected from unbounded value spaces, while others could be restricted to integers or have interdependencies. This is why they are usually optimized using primitive exhaustive search schemes like grid or random searches in combination with educated guesses for suitable search ranges. Common gradient-based optimization methods typically cannot be applied for this task. Instead, the performance of a given set of hyperparameters is measured by evaluating the respective model on a separate data set, held out from training, called the validation data set (see Figure <ref type="figure">6</ref>). This process is also referred to as model selection.</p></div>
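<div xmlns="http://www.tei-c.org/ns/1.0"><p>The closed-form training of a kernel model and the grid search over hyperparameters described above can be sketched together in a few lines. The example below is a minimal illustration, not the procedure of any particular package: it fits kernel ridge regression with a Gaussian kernel to a toy one-dimensional function by Cholesky factorization of the regularized kernel matrix and then selects the kernel width and regularization strength on a validation set. The toy data, the grid values, and all function names are assumptions.</p><p>
```python
import math


def gaussian_kernel(a, b, sigma):
    return math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2))


def cholesky(A):
    """Lower-triangular L with A = L L^T (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def solve_cholesky(L, y):
    """Solve (L L^T) alpha = y by forward and backward substitution."""
    n = len(y)
    z = [0.0] * n
    for i in range(n):
        z[i] = (y[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    alpha = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(L[k][i] * alpha[k] for k in range(i + 1, n))
        alpha[i] = (z[i] - tail) / L[i][i]
    return alpha


def fit_krr(x_train, y_train, sigma, lam):
    """Return the coefficients solving (K + lam I) alpha = y."""
    n = len(x_train)
    K = [[gaussian_kernel(x_train[i], x_train[j], sigma)
          + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    return solve_cholesky(cholesky(K), y_train)


def predict(x, x_train, alpha, sigma):
    return sum(a * gaussian_kernel(xj, x, sigma)
               for a, xj in zip(alpha, x_train))


def rmse(xs, ys, x_train, alpha, sigma):
    sq = [(predict(x, x_train, alpha, sigma) - y) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(sq) / len(sq))


# Toy data (assumption): samples of sin(x); validation points lie in between.
x_train = [0.5 * i for i in range(13)]
y_train = [math.sin(x) for x in x_train]
x_val = [0.5 * i + 0.25 for i in range(12)]
y_val = [math.sin(x) for x in x_val]

# Exhaustive grid search; the validation error decides which model is kept.
scores = {}
for sigma in (0.1, 0.5, 1.0, 2.0):
    for lam in (1e-6, 1e-3, 1e-1):
        alpha = fit_krr(x_train, y_train, sigma, lam)
        scores[(sigma, lam)] = rmse(x_val, y_val, x_train, alpha, sigma)
best_sigma, best_lam = min(scores, key=scores.get)
print(best_sigma, best_lam, scores[(best_sigma, best_lam)])
```
</p><p>The inner fit is convex and solved exactly, whereas the outer search over the width and the regularization strength is a plain enumeration, which mirrors the distinction between parameters and hyperparameters drawn above.</p></div>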
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Chemical Reviews pubs.acs.org/CR Review <ref type="url">https://doi.org/10.1021/acs.chemrev.1c00107</ref> Chem. Rev. 2021, 121, 9816-9872</p><p>Model Selection. Cross-validation or out-of-sample testing is a technique to assess how a trained ML model will generalize to previously unseen data. <ref type="bibr">340,</ref><ref type="bibr">395</ref> For a reasonably complex model, it is typically not challenging to generate the right responses for the data known from the training set. This is why the training error is not indicative of how the model will fulfill its ultimate purpose of predicting responses for new inputs. Alas, since the probability distribution of the data is typically unknown, it is not possible to determine this so-called generalization error exactly. Instead, this error is often estimated using an independent test subset that is held back and later passed through the trained model to compare its responses to the known test labels. If the model suffers from overfitting on the training data, this test will yield large errors. It is important to remember not to tweak any parameters in response to these test results, as this will skew this assessment of the model performance and will lead to overfitting on the test set. <ref type="bibr">442</ref> Besides cross-validation, there are alternative ways to estimate the generalization error, for example via maximization of the marginal likelihood in Bayesian inference. <ref type="bibr">[443]</ref><ref type="bibr">[444]</ref><ref type="bibr">[445]</ref> Some well-defined learning scenarios even allow the computation of rigorous upper bounds for the generalization error. <ref type="bibr">345,</ref><ref type="bibr">[446]</ref><ref type="bibr">[447]</ref><ref type="bibr">[448]</ref> </p></div>
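<div xmlns="http://www.tei-c.org/ns/1.0"><p>A k-fold variant of the cross-validation procedure described above can be sketched as follows. The helper names (k_fold_indices, cross_val_error, fit_line, predict_line), the number of folds, and the noise-free linear toy data are assumptions chosen so that the example stays short. In practice, a separate test set would still be held back and evaluated only once at the very end.</p><p>
```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs that partition the data into k folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def cross_val_error(xs, ys, fit, predict, k=5):
    """Mean squared validation error averaged over k folds."""
    errors = []
    for train, val in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        sq = [(predict(model, xs[i]) - ys[i]) ** 2 for i in val]
        errors.append(sum(sq) / len(sq))
    return sum(errors) / len(errors)


def fit_line(xs, ys):
    """Ordinary least-squares fit of a straight line (slope, intercept)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


def predict_line(model, x):
    slope, intercept = model
    return slope * x + intercept


# Noise-free toy data (assumption): y = 2x + 1.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
cv_error = cross_val_error(xs, ys, fit_line, predict_line, k=5)
print(cv_error)
```
</p><p>Every sample is used for validation exactly once and never by the model that was trained on it, which is what makes the averaged error an estimate of out-of-sample performance rather than of the training fit.</p></div>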
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">APPLICATIONS OF MACHINE LEARNING TO CHEMICAL SYSTEMS</head><p>We now discuss ways that CompChem methods described in section 2 and ML methods in section 3 can be implemented as CompChem+ML approaches for insights into chemical systems. We often notice a lack of detail about why an ML model is used and how it actually contributes to worthwhile and scientific insights. Thus, we will summarize the underlying attributes of conventional CompChem+ML efforts and then explain why these attributes are important for specific applications.</p><p>To begin, consider molecules or materials in a data set, where any entry will be related to another based on an abstract concept of "similarity". While similarity is an application-dependent concept, it should go hand in hand with CPI. For instance, physical properties of chemical systems can be attributed to the structure or composition of the chemical fragments within those systems. Thus, if chemical structures and compositions of two entries in the database were similar, then their physical properties would also likely be similar.</p><p>For CompChem+ML using a supervised algorithm, a CompChem prediction might be made on a hypothetical system, pinpointed by an ML model that was trained to identify chemical fragments that correlate with labeled physical properties. This would be a direct exploitation of chemical similarity. Alternatively, for CompChem+ML using an unsupervised algorithm, the ML model would identify an underlying distribution or key features based on the similarity between pairs of entries in the data set without labels. This would be a more nuanced leveraging of chemical similarity. 
In both cases the accuracy, efficiency and reliability of the ML models depend strongly on how similarity is defined and measured.</p><p>In this section, we will first describe state-of-the-art descriptors and kernels for atomic systems that can be used to quantify the similarity between chemical systems. We will then explain the essential attributes of good atomic descriptors. Lastly for this section, we will elucidate why and how specific combinations of these descriptors and ML algorithms are beginning to revolutionize the field of CompChem.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Representing Chemical Systems</head><p>In CompChem, molecules and materials are usually represented by the Cartesian coordinates and the chemical elements of all the atoms. Thus, the vector representations containing the coordinates, $\{\mathbf{r}_i\}_{i=1}^{N}$, and the nuclear charges, $\{Z_i\}_{i=1}^{N}$, will have sizes $3N$ and $N$, respectively, for a system of size N. Even though these atomic coordinates provide a complete description of the system, they are hardly ever used as the input of an ML model because this vector would introduce substantial redundancy. For instance, an ML model might treat two identical molecules that are rotated or translated as different molecules, and that in turn might cause the ML model to predict different physical properties for the two otherwise indistinguishable molecules. There are further difficulties when comparing molecules having different numbers of atoms. To work around these problems, atomic coordinates are usually converted into an appropriate representation &#968; that is suitable for a particular task. Such conversions are useful because they allow the incorporation of physical invariances. Mathematically speaking, the representation fulfills</p><formula notation="TeX">\psi(\{\mathbf{r}_i, Z_i\}) = \psi(S(\{\mathbf{r}_i, Z_i\}))</formula><p>where S indicates a symmetry operation, for example, a rigid rotation about an axis $C_i$, an exchange of two identical atoms, or a translation of the whole system in Cartesian space. It can also be advantageous to adopt a coarse-grained representation of the system. <ref type="bibr">449,</ref><ref type="bibr">450</ref> For example, dihedral angles of a peptide might be accounted for without the positions of the side-chains, positions of ions in a solution might be accounted for without the explicit coordinates of solvents, or just the center of mass for a water molecule might be accounted for in place of the full three-centered atomistic representation. 
The choice of these coarse-grained representations provides a way to incorporate prior knowledge of the data, or such representations can be learned from an unsupervised learning step. <ref type="bibr">451</ref></p><p>4.1.1. Descriptors. Atomistic systems can be represented in a myriad of ways. Some descriptions are designed to emphasize particular aspects of a system, while others aim to disambiguate similar chemical or physical principles across a wide range of molecules or materials. The set of desirable properties in a representation thus depends on the task at hand. All adhere to the aforementioned physical symmetries and invariances needed for chemical systems. Many have similar theoretical foundations that can be understood as the basis onto which the atomic density is projected, <ref type="bibr">452</ref> and the connection between them has been summarized in a recent review. <ref type="bibr">453</ref> Table <ref type="table">4</ref> gives a coarse characterization of popular representations. <ref type="bibr">276,</ref><ref type="bibr">411,</ref><ref type="bibr">412,</ref><ref type="bibr">415,</ref><ref type="bibr">417,</ref><ref type="bibr">418,</ref><ref type="bibr">454,</ref><ref type="bibr">455</ref> To create this overview, we had to adopt a reductionist perspective, which inevitably hides the complexities involved in developing robust atomistic representations. Whether a representation satisfies a particular property can sometimes not be answered unequivocally. For example, is a descriptor unique if the ML model showed pathologically erroneous results? Should a symmetry be perfectly satisfied, even if it is a bad ML feature? We therefore stress that the table simply presents representations and their attributes. A representation that satisfies more attributes is not necessarily better if it also lacks another important attribute. 
We kindly refer the reader to the respective original publications for more information.</p><p>The descriptors in Table <ref type="table">4</ref> can be classified into two categories: global and atomic (i.e., not global). Traditional descriptors used in cheminformatics are global descriptors based on the covalent connectivity of atoms. These include simple valence counting and common neighbor analysis, <ref type="bibr">456</ref> the presence or absence of predefined atomic fragments (e.g., the Morgan fingerprints <ref type="bibr">427</ref> ), pairwise distances between atoms (e.g., Coulomb Matrix, <ref type="bibr">413</ref> Sine Matrix, <ref type="bibr">414</ref> Ewald Sum Matrix, <ref type="bibr">414</ref> Bag of Bonds (BoB) <ref type="bibr">415</ref> ), etc. Coulomb matrices have known problems because of lack of smoothness, but these are partly addressed by employing the Wasserstein norm, rather than Euclidean or Manhattan norms. <ref type="bibr">457</ref> However, atomic descriptors <ref type="bibr">411,</ref><ref type="bibr">412,</ref><ref type="bibr">[416]</ref><ref type="bibr">[417]</ref><ref type="bibr">[418]</ref><ref type="bibr">[419]</ref><ref type="bibr">[420]</ref><ref type="bibr">458</ref> are generally more popular than the global ones in ML and CompChem. In atomic descriptors, a chemical system is described as a set of atomic environments, $\{\mathcal{X}_1, \ldots, \mathcal{X}_i, \ldots, \mathcal{X}_N\}$, and each consists of the atoms (chemical species and position) within a sphere of radius $r_{\mathrm{cut}}$ centered at a specific atom i. One needs to combine the set of atomic descriptors of all environments to construct a descriptor for the entire atomic structure. The most straightforward way to do this is to average the atomic descriptors,</p><formula notation="TeX">\Psi(A) = \frac{1}{N_A} \sum_{i \in A} \psi(\mathcal{X}_i)</formula><p>where the sum runs over all $N_A$ atoms i in structure A and $\mathcal{X}_i$ is the environment around atom i. 
When there are multiple chemical species, the descriptors for the local environments of different species can either be included in the single sum, or the averaging can be performed for the environments of each species separately and the species-specific averaged local descriptors can be concatenated. Alternatively, two structures can be compared through their sets of local descriptors by considering the root mean square displacement (RMSD), <ref type="bibr">454</ref> the best match between the environments of the two structures (best-match), <ref type="bibr">459</ref> or by combining local descriptors using a regularized entropy match (RE-Match). <ref type="bibr">459</ref></p><p>4.1.2. Representing Local Environments. We will now describe the Smooth Overlap of Atomic Positions (SOAP) descriptors <ref type="bibr">412</ref> since many other descriptors based on the atomic density are similar and differ mainly by how the density is projected onto basis functions. <ref type="bibr">420,</ref><ref type="bibr">452</ref> To construct SOAP descriptors, one first considers an atomic environment $\mathcal{X}$ that contains only one atomic species, and a Gaussian function of width &#963; is then placed on each atom i in $\mathcal{X}$ to make an atomic density function:</p><formula notation="TeX">\rho_{\mathcal{X}}(\mathbf{r}) = \sum_{i \in \mathcal{X}} \exp\left(-\frac{|\mathbf{r} - \mathbf{r}_i|^2}{2\sigma^2}\right) f_{\mathrm{cut}}(|\mathbf{r}|)</formula><p>Here, r denotes a point in Cartesian space, $\mathbf{r}_i$ is the position of atom i relative to the central atom of $\mathcal{X}$, and the cutoff function $f_{\mathrm{cut}}$ smoothly decays to zero beyond the cutoff radius $r_{\mathrm{cut}}$. This density representation ensures invariance with respect to translations and permutations of atoms of the same species but not rotations.</p><p>To obtain a rotationally invariant descriptor, one expands the density in a basis of spherical harmonics, $Y_{lm}(\hat{\mathbf{r}})$, and a set of orthogonal radial functions, $g_n(|\mathbf{r}|)$, as</p><formula notation="TeX">\rho_{\mathcal{X}}(\mathbf{r}) = \sum_{nlm} c_{nlm}\, g_n(|\mathbf{r}|)\, Y_{lm}(\hat{\mathbf{r}})</formula><p>to construct the power spectrum of the density using the expansion coefficients:</p><formula notation="TeX">\psi_{nn'l}(\mathcal{X}) = \pi \sqrt{\frac{8}{2l+1}} \sum_{m} (c_{nlm})^{*}\, c_{n'lm}</formula><p>One then obtains a vector of descriptors $\boldsymbol{\psi} = \{\psi_{nn'l}\}$ by considering all components $l \leq l_{\max}$ and $n, n' \leq n_{\max}$ that act as band limits that control the spatial resolution of the atomic density. The generalization to more than one chemical species is straightforward: <ref type="bibr">459</ref> one constructs separate densities for each species $\alpha$ and then computes the power spectra</p><formula notation="TeX">\psi^{\alpha\alpha'}_{nn'l}(\mathcal{X}) = \pi \sqrt{\frac{8}{2l+1}} \sum_{m} (c^{\alpha}_{nlm})^{*}\, c^{\alpha'}_{n'lm}</formula><p>for each pair of elements $\alpha$ and $\alpha'$, where the two species indices correspond to the c* and c coefficients, respectively. The resulting vectors corresponding to each of the $\alpha$ and $\alpha'$ pairs are then concatenated to obtain the descriptor vector of the complete environment.</p><p>Atom-centered symmetry function (ACSF) descriptors, <ref type="bibr">411</ref> sometimes called Behler-Parrinello symmetry functions, differ from SOAP in that they project the atomic densities onto selected 2-body or 3-body symmetry functions. FCHL <ref type="bibr">417</ref> descriptors follow similar principles while also considering the correlations between the atomic densities coming from different chemical species. The many-body tensor representation (MBTR) <ref type="bibr">418</ref> approach involves taking the histograms of atom counts, inverse pairwise distances, and angles. Atomic cluster expansion (ACE) descriptors <ref type="bibr">420</ref> first express atomic densities using spherical harmonics and then generate invariant products by contracting the spherical harmonics with the Clebsch-Gordan coefficients.</p><p>Length-Scale Hyperparameters. Most atomic descriptors use length-scale hyperparameters specifically chosen for a given problem and system. 
<ref type="bibr">276,</ref><ref type="bibr">411,</ref><ref type="bibr">412,</ref><ref type="bibr">415,</ref><ref type="bibr">417,</ref><ref type="bibr">418,</ref><ref type="bibr">454,</ref><ref type="bibr">455</ref> There are several ways to automate hyperparameter selection. Ref 374 introduced general heuristics for choosing the SOAP hyperparameters for a system with arbitrary chemical composition based on characteristic bond lengths. Ref 465 adopts the strategy of first generating a comprehensive set of ACSFs and then selecting a subset using sparsification methods such as farthest point sampling (FPS) <ref type="bibr">466</ref> and CUR matrix decomposition. <ref type="bibr">467</ref> Incompleteness of Atomic Descriptors. A structural descriptor is complete when there is no pair of configurations that produces the same descriptor. <ref type="bibr">468</ref> For atomic descriptors, this means that different atomic environments (after considering the invariances of rotation, translation, and permutation of identical atoms) should adopt distinct descriptors. Without completeness, any ML model using the descriptors as input will give identical predictions for physically different systems. Ensuring completeness while preserving the invariances is nontrivial, however. One of the simplest descriptors is based on permutationally invariant pairwise atomic distances (2-body descriptors), and ref 412 demonstrated that these are generally not complete since one can construct two distinct tetrahedra using the same set of distances. Many have assumed that permutationally invariant 3-body atomic descriptors uniquely specify atomic environments because of the tremendous success of ML models for chemical systems and particularly MLPs. However, refs 469 and 468 exemplify that structural degeneracies can be found even when using 3- or 4-body descriptors. 
This underscores an important shortcoming of state-of-the-art 3-body descriptors, such as ACSF, <ref type="bibr">411</ref> SOAP, <ref type="bibr">412</ref> FCHL, <ref type="bibr">417</ref> and MBTR. <ref type="bibr">418</ref> ACE <ref type="bibr">420</ref> should be a complete descriptor of local environments, but its reliance on spherical harmonic expansion and the subsequent contraction makes its evaluation expensive. Hence, there are still opportunities to develop improved atomic descriptors.</p><p>4.1.3. Locality Approximation. Representing a many-body chemical system in terms of atomic environments brings physical significance since certain extensive physical properties (e.g., the total energy, total electrostatic charge, and polarizability of a system) can be approximated by the sum of the atomic contributions coming from each atomic environment, for example, <formula notation="TeX">E \approx \sum_{i} E_i</formula></p><p>with $E_i$ the contribution of the environment centered on atom i. This approximation is valid because the atomic contribution associated with a central atom is largely determined by its neighbors, and long-range interactions can be approximated in a mean-field manner without explicitly considering distant atoms. Such "locality" is tacitly assumed in many ML models for CompChem, and it is a crucial necessity for most common atomistic potentials and MLPs (section 2.2.6). Most MLPs (e.g., BPNN, <ref type="bibr">274</ref> GAP, <ref type="bibr">276</ref> and DeepMD <ref type="bibr">462</ref>) approximate the total energy of a system as a sum of local atomic energies.</p><p>Figure <ref type="figure">7</ref> illustrates locality by showing a KPCA map of the atom environments of carbon in the QM9 set (see section 3.3 for more detailed descriptions of the data set). By color-coding the KPCA plot with the local energies from a SOAP-based GAP model trained on QM9 energies, <ref type="bibr">470</ref> one observes a systematic and smooth trend in energies across clusters. 
The total molecular energy can then be accurately predicted by the sum of local energies, which means the total energy can be approximated on the basis of all the local environments contained in the molecule. For example, an NN potential trained on liquid water simulations can predict the densities, lattice energies, and vibrational properties of diverse ice phases because the local atomic environments found in liquid water span environments similar to those observed in ice phases. <ref type="bibr">471</ref> Another GAP potential of carbon trained on amorphous structures and other crystalline phases predicted novel carbon structures in random structure searches as well as approximate reaction barriers. <ref type="bibr">472,</ref><ref type="bibr">473</ref> The locality approximation is typically rationalized based on the multiscale nature of interatomic interactions in chemical systems. It is generally expected that shorter interatomic distances correspond to stronger interactions, such that a cutoff may be imposed after a certain radial distance d given a certain energy accuracy threshold &#1013;. The multiscale nature of physical interactions underlies the usual classification of chemical interactions, from strong covalent bonds and ionic interactions to weaker noncovalent hydrogen bonds and van der Waals interactions. However, our understanding of noncovalent interactions in large molecules and materials is still emerging, <ref type="bibr">36</ref> and no general rules-of-thumb exist to define the cutoff distance d corresponding to a defined &#1013;. Moreover, the sufficiency of the locality argument also depends on the phase of the system and whether the system is extended or not. <ref type="bibr">474</ref> Hence, for systems having long-range interactions (which includes most chemical systems), the locality assumption needs revision. There are currently three main approaches to handling long-range interactions. 
The first is to use global ML models, such as (s-)GDML, <ref type="bibr">207,</ref><ref type="bibr">372</ref> which learn global interactions directly. Global models tend to be more data-efficient because they focus on learning a full molecular or material PES, but this significantly limits transferability since the ML model alone can only be used on the system it was trained upon. The second is to learn the charges <ref type="bibr">475,</ref><ref type="bibr">476</ref> and multipoles <ref type="bibr">477</ref> for each atom, and then the long-range electrostatic interactions based on environment-dependent charges or multipoles can be explicitly included using Coulomb's law. To ensure that the sum of the atomic charges reaches neutrality, charge equilibration schemes can be used. <ref type="bibr">478</ref> The third is to capture the long-range electrostatic effects by introducing a nonlocal long-distance equivariant (LODE) representation, <ref type="bibr">479,</ref><ref type="bibr">480</ref> which is dependent on the electrostatic field generated by the decorated atom density.</p><p>4.1.4. Advantages of Built-In Symmetries. Built-in symmetry in ML models substantially compresses the dimensionality of atomic representations and ensures that physically equivalent systems are predicted to have identical properties. One of the most rigorous ways of imposing symmetry onto a model f is via the invariant integration over the relevant group $S$: <formula notation="TeX">f_{\mathrm{sym}}(\mathbf{x}) = \int_{\pi \in S} f(\mathbf{P}_{\pi}\mathbf{x})\,\mathrm{d}\pi</formula></p><p>where $\mathbf{P}_{\pi}\mathbf{x}$ is a permutation of the input. However, the cardinality of even basic symmetry groups is exceedingly high, which makes this operation prohibitively expensive. This combinatorial challenge can be solved by limiting the invariant integral to the physical point group and fluxional symmetries that actually occur in the training data set, as done in sGDML. 
<ref type="bibr">207</ref> Alternative approaches, such as parameter sharing <ref type="bibr">352,</ref><ref type="bibr">[421]</ref><ref type="bibr">[422]</ref><ref type="bibr">[423]</ref> or density representations, <ref type="bibr">276</ref> have also proven effective. For example, the DeepMD potential has two versions: the Smooth Edition (DeepPot-SE), which explicitly preserves all the natural symmetries of the molecular system, and another version that does not. <ref type="bibr">462</ref> The DeepPot-SE offers much improved stability and accuracy. <ref type="bibr">207,</ref><ref type="bibr">462</ref> For ML predictions of scalar properties, the rotationally invariant atomic descriptor framework described earlier is appropriate. However, one may also wish to predict vectorial or tensorial properties including dipole moments, polarizability, and elasticity. In such cases, a covariant version of descriptors may be advantageous, and this can be expressed, for a descriptor $\boldsymbol{\psi}$ of an atomic environment $\mathcal{X}$, as <formula notation="TeX">\boldsymbol{\psi}(S\mathcal{X}) = S\,\boldsymbol{\psi}(\mathcal{X})</formula></p><p>where S indicates a symmetry operation such as a rigid rotation about an axis. Ref 481 proposed a general method for transforming a standard kernel for fitting scalar properties into a covariant one. Ref 482 derived a rotational-symmetry-adapted SOAP kernel, which can be understood as using the angular-dependent SOAP vectors based on spherical harmonics expansions as the descriptors. Note that the SOAP kernels for learning scalar properties introduced in ref 412 remove angular dependencies by summing up the SOAP vectors in separate spherical harmonics channels. Symmetry can be further exploited in "alchemical" representations that incorporate similarity between chemical species that are relatable by changing one atom into another. The FCHL <ref type="bibr">417</ref> representation considers the similarity between elements in the same rows and columns of the periodic table and performs very well on chemical compounds across chemical space. 
Ref 483 compiled a data-driven periodic table of the elements by fitting to an elpasolite data set using an alchemical representation.</p><p>4.1.5. End-to-End NN Representations. All descriptors introduced above rely on a suitable set of hyperparameters (e.g., length scales, radial and angular resolution). Determining an optimal set of hyperparameters can be a tedious process, especially when heuristics are unavailable or fail due to the structural and compositional complexity of the system. A poor choice of descriptors can limit the accuracy of the final ML model, for example, when certain interatomic distances cannot be resolved.</p><p>End-to-end NN representations follow a different strategy to learn a representation directly from reference data. Using atom types and positions of a system as inputs, end-to-end NNs construct a set of atom-wise features $\mathbf{x}_i$. These features are then used to predict the property of interest, for example, the energy as a sum of atom-wise contributions. Unlike static descriptors, the representation is also optimized as part of the overall training process. This way end-to-end NNs can adapt to structural features in the data and the target properties in a fully automatic fashion to eliminate the need for extensive feature engineering from the practitioner.</p><p>The deep tensor NN framework (DTNN) <ref type="bibr">352</ref> introduced a procedure to iteratively refine a set of atom-wise features $\{\mathbf{x}_i\}$ based on interactions with neighboring atoms. Higher-order interactions can then be captured in a hierarchical fashion. For example, a first information pass would only capture radial information, but further interactions would recover angular relations and so on. In DTNN, a learnable representation depending only on atom types, $\mathbf{x}_i^{0} = \mathbf{e}_{z_i}$, serves as an initial set of features. 
These are then refined by successive applications of an update function depending on the atomic environment that takes the general form <formula notation="TeX">\mathbf{x}_i^{l+1} = \mathbf{F}^{l}\Big(\mathbf{x}_i^{l} + \sum_{j} \mathbf{G}^{l}\big(\mathbf{x}_j^{l}, \mathbf{g}(|\mathbf{r}_i - \mathbf{r}_j|)\big)\, f_{\mathrm{cut}}(|\mathbf{r}_i - \mathbf{r}_j|)\Big)</formula></p><p>Here, l indicates the number of overall update steps. The sum runs over all atoms j in the local environment, and a cutoff function $f_{\mathrm{cut}}$ ensures smoothness of the representation. Each feature is updated with information from all neighboring atoms through the interaction function $\mathbf{G}$. Apart from the neighbor features $\mathbf{x}_j$, $\mathbf{G}$ also depends on the interatomic distance $|\mathbf{r}_i - \mathbf{r}_j|$, which is usually expressed in the form of a radial basis vector $\mathbf{g}$. After the update, an atom-wise transformation $\mathbf{F}$ can be applied to further modulate the features. Since each update depends only on the interatomic distances and the summation over neighboring atoms is commutative, end-to-end NNs of this type automatically achieve a representation that is invariant to rotation, translation, and permutations of atoms. Using these atom-type-dependent embeddings compactly encodes elemental information. This is advantageous for systems composed of many different chemical elements. Such multicomponent systems can be problematic to treat with predefined descriptors (e.g., ACSFs or SOAP), as these typically introduce additional entries for each possible combination of atom types, resulting in a large number of descriptor dimensions.</p><p>Since the introduction of DTNN, many different types of end-to-end NNs have been developed, and these vary by the choice for the functions $\mathbf{F}$ and $\mathbf{G}$. For example, SchNet <ref type="bibr">434</ref> uses continuous convolutions inspired by convolutional neural networks (CNNs) to describe the interatomic interactions. In this case, the update in eq 36 takes the form <formula notation="TeX">\mathbf{x}_i^{l+1} = \mathbf{x}_i^{l} + \sum_{j} \mathrm{NN}_{\mathrm{tr}}(\mathbf{x}_j^{l}) \circ \mathrm{NN}_{\mathrm{rad}}\big(\mathbf{g}(|\mathbf{r}_i - \mathbf{r}_j|)\big)\, f_{\mathrm{cut}}(|\mathbf{r}_i - \mathbf{r}_j|)</formula></p><p>where the feature transformation ($\mathrm{NN}_{\mathrm{tr}}$) and the radial dependence ($\mathrm{NN}_{\mathrm{rad}}$) are both modeled as trainable NNs.</p><p>Other ML models introduce additional physical information. 
The hierarchical interacting particle NN (HIP-NN) <ref type="bibr">484</ref> enforces a physically motivated partitioning of the overall energy between the different refinement steps, while the PhysNet architecture <ref type="bibr">485</ref> introduces explicit terms for long-range electrostatic and dispersion interactions. In ref 421, Gilmer et al. categorize graph networks of this general type as message-passing NNs (MPNNs) and introduce the concept of edge updates. These make it possible to use interatomic information besides the radial distance metric in the refinement procedure, and they have since been adapted for other architectures. <ref type="bibr">486</ref> Another interesting extension is end-to-end NNs that incorporate higher-order features besides the scalar $\mathbf{x}_i$ used in the original DTNN framework. These are equivariant features that encode rotational symmetry and can be based on angles, dipole moment vectors, or features that can be expressed as spherical harmonics with l &gt; 0. This makes it possible to exchange not only radial information between atoms in each interaction pass but also higher-order structural information, such as dipole-dipole interactions or angular information. In addition, equivariant end-to-end NNs can also be used to predict vectorial or tensorial properties in a manner similar to the rotational-symmetry-adapted SOAP kernel. Examples include TensorField networks, <ref type="bibr">464</ref> Cormorant, <ref type="bibr">463</ref> DimeNet, <ref type="bibr">487</ref> PiNet, <ref type="bibr">488</ref> and FieldSchNet. <ref type="bibr">300</ref> </p></div>
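<div xmlns="http://www.tei-c.org/ns/1.0"><p>The iterative feature refinement described in section 4.1.5 can be illustrated with a minimal NumPy sketch. It is not the DTNN or SchNet implementation: the embedding table, the random single-layer maps standing in for the trainable networks, the Gaussian radial basis, the cosine cutoff, and all function names are assumptions made for illustration. The sketch only demonstrates why an update built from interatomic distances and a sum over neighbors yields atom-wise features that are unchanged by rotation and translation and that simply reorder when the atoms are permuted.</p>

```python
import numpy as np


def cosine_cutoff(d, r_cut):
    """Assumed smooth cutoff: 1 at d = 0 and exactly 0 for d >= r_cut."""
    return np.where(d >= r_cut, 0.0, 0.5 * (np.cos(np.pi * d / r_cut) + 1.0))


def radial_basis(d, centers, gamma=10.0):
    """Expand distances of shape (N, N) in K Gaussians, giving (N, N, K)."""
    return np.exp(-gamma * (d[..., None] - centers) ** 2)


def interaction_pass(x, pos, w_tr, w_rad, centers, r_cut):
    """One residual update: x_i + sum_j tr(x_j) * rad(g(d_ij)) * f_cut(d_ij)."""
    d = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    filt = np.tanh(radial_basis(d, centers) @ w_rad)  # stand-in for the radial network
    msg = np.tanh(x @ w_tr)                           # stand-in for the feature network
    weight = cosine_cutoff(d, r_cut)
    np.fill_diagonal(weight, 0.0)                     # no self-interaction
    return x + np.einsum("ij,ijf,jf->if", weight, filt, msg)


def atomwise_features(z, pos, n_pass=3, n_feat=8, n_rbf=16, r_cut=5.0, seed=0):
    """Build features from atomic numbers z and positions pos (hypothetical helper)."""
    rng = np.random.default_rng(seed)        # fixed seed: same weights on every call
    embed = rng.normal(size=(100, n_feat))   # one embedding row per atomic number
    centers = np.linspace(0.0, r_cut, n_rbf)
    x = embed[z]
    for _ in range(n_pass):
        w_tr = rng.normal(size=(n_feat, n_feat)) / np.sqrt(n_feat)
        w_rad = rng.normal(size=(n_rbf, n_feat)) / np.sqrt(n_rbf)
        x = interaction_pass(x, pos, w_tr, w_rad, centers, r_cut)
    return x


# Water-like test geometry (coordinates chosen only for illustration).
z = np.array([8, 1, 1])
pos = np.array([[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]])
x_ref = atomwise_features(z, pos)

# Rigid rotation about the z axis followed by a translation.
theta = 0.7
rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta), np.cos(theta), 0.0],
                [0.0, 0.0, 1.0]])
x_rot = atomwise_features(z, pos @ rot.T + np.array([1.0, -2.0, 0.5]))

# Relabel the atoms.
perm = np.array([2, 0, 1])
x_perm = atomwise_features(z[perm], pos[perm])

print("max change under rotation + translation:", np.abs(x_ref - x_rot).max())
print("max change under permutation:", np.abs(x_ref[perm] - x_perm).max())
```

<p>Summing any atom-wise readout of these features, as is done for the energy, then gives a prediction that is invariant to all three operations.</p></div>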
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">From Descriptors to Predictions</head><p>After a descriptor vector for each chemical structure is defined, one can then construct the design matrix and the kernel matrix for a set of structures. These matrices can then be used as the input of ML models. As described in section 2, supervised ML methods, such as NNs and GPs, can be used to approximate nonlinear and high-dimensional functions, particularly when massive amounts of training data become available. Thus, one should expect that using CompChem would be very useful for generating a large amount of almost noise-free training data of specific systems or atomic configurations, as long as a physically accurate method is being applied in the right way with appropriate computational resources. In contrast, experimental observations can be difficult to measure and reproduce precisely. Note that the aim of most CompChem+ML efforts has a similar scope as decades-old quantitative structure activity/property relationship (QSAR/QSPR) models that are often based on experiments or CompChem modeling. <ref type="bibr">326,</ref><ref type="bibr">327,</ref><ref type="bibr">489</ref> Thus, researchers in CompChem+ML should be aware of potentially relatable work done by the QSAR/QSPR communities, and of the extent to which the questions being posed have already been answered. On the other hand, ML usually provides higher accuracy than non-ML statistical models, and so QSAR/QSPR efforts have been turning toward ML models as well. <ref type="bibr">490</ref> We have explained how data from different CompChem methods, each containing different degrees of physical rigor, can be used to train ML models. ML models in turn can be created to approximate underlying high-dimensional functions intrinsic to physical systems. 
For example, research efforts are directed toward learning electron densities, <ref type="bibr">491</ref> density functionals, <ref type="bibr">162</ref> and molecular polarizabilities. <ref type="bibr">492</ref> Besides these direct learning strategies, ML has been used to enhance the performance and suitability of CompChem models. As mentioned in section 1, the &#916;-ML <ref type="bibr">493</ref> approach is now a common technique in which an ML model is trained to improve the quality of a theoretically insufficient but computationally affordable method. This approach has been used to learn many-body corrections for water molecules to allow a relatively inexpensive KS-DFT approach like BLYP to more accurately reproduce CCSD(T) data. <ref type="bibr">494</ref> Along similar lines, Shaw and co-workers used CompChem features along with an NN to reweight terms from an MP2 interaction energy to provide ML-enhanced methods with increased performance. <ref type="bibr">126</ref> Miller and co-workers have developed ML models where molecular orbitals themselves are learned to generate a density matrix functional that provides CCSD(T)-quality PESs with a single reference calculation. <ref type="bibr">495</ref> von Lilienfeld and co-workers have investigated how the choice of regressors and molecular representations for ML models impacts accuracy, and their findings suggest ways that ML models may be trained to be more accurate and less computationally expensive than hybrid DFT methods. <ref type="bibr">496</ref> Burke and co-workers have studied how ML methods can result in improved understanding and more physically exact KS-DFT <ref type="bibr">181,</ref><ref type="bibr">[497]</ref><ref type="bibr">[498]</ref><ref type="bibr">[499]</ref> and OFDFT functionals. <ref type="bibr">161</ref> Brockherde et al. have presented an approach where ML models directly learn the Hohenberg-Kohn map from the one-body potential to efficiently find the functional and its derivative. 
<ref type="bibr">162,</ref><ref type="bibr">184</ref> Akashi and co-workers have also reported the out-of-training transferability of NNs that capture total energies, which shows a path forward to generalizable methods. <ref type="bibr">500</ref> Toward predictive insights, there are many other approaches that are broadly useful. One can exploit the "universal approximator" nature of ML architectures to find a function that gives the best solution in a variational setting. For instance, restricted Boltzmann machines <ref type="bibr">501</ref> or deep NNs can serve as a basis representation of wavefunctions <ref type="bibr">105,</ref><ref type="bibr">106,</ref><ref type="bibr">502</ref> in quantum Monte Carlo calculations. Alternatively, the use of active learning might increase the efficiency, accuracy, scalability, and transferability of ML models. <ref type="bibr">[503]</ref><ref type="bibr">[504]</ref><ref type="bibr">[505]</ref>  </p></div>
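<div xmlns="http://www.tei-c.org/ns/1.0"><p>The idea behind the &#916;-ML approach mentioned in section 4.2 can be sketched with kernel ridge regression on a toy problem. Everything specific here is an assumption for illustration: the one-dimensional coordinate, the two analytic functions standing in for a cheap and an expensive method, the Gaussian kernel, and its hyperparameters. No real CompChem data are involved. The sketch compares learning the expensive target directly with learning only the difference between the two methods and adding it to the cheap baseline.</p>

```python
import numpy as np


def gaussian_kernel(a, b, sigma=0.5):
    """Gaussian kernel matrix between two sets of 1D inputs."""
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2.0 * sigma ** 2))


def krr_fit(x_train, y_train, sigma=0.5, lam=1e-6):
    """Solve (K + lam * I) alpha = y for the regression weights."""
    k = gaussian_kernel(x_train, x_train, sigma)
    return np.linalg.solve(k + lam * np.eye(len(x_train)), y_train)


def krr_predict(x_train, alpha, x_new, sigma=0.5):
    """Evaluate the fitted model at new inputs."""
    return gaussian_kernel(x_new, x_train, sigma) @ alpha


def e_low(x):
    """Hypothetical cheap method: strongly structured toy curve."""
    return np.sin(5.0 * x) + x ** 2


def e_high(x):
    """Hypothetical expensive reference: cheap curve plus a smooth correction."""
    return e_low(x) + 0.1 * np.sin(x)


x_train = np.linspace(0.0, 3.0, 8)    # only a few expensive reference points
x_test = np.linspace(0.0, 3.0, 200)

# Direct learning: regress the expensive target itself.
alpha_direct = krr_fit(x_train, e_high(x_train))
pred_direct = krr_predict(x_train, alpha_direct, x_test)

# Delta learning: regress the difference, then add the cheap baseline back.
alpha_delta = krr_fit(x_train, e_high(x_train) - e_low(x_train))
pred_delta = e_low(x_test) + krr_predict(x_train, alpha_delta, x_test)

mae_direct = np.mean(np.abs(pred_direct - e_high(x_test)))
mae_delta = np.mean(np.abs(pred_delta - e_high(x_test)))
print("MAE, direct learning:", mae_direct)
print("MAE, delta learning: ", mae_delta)
```

<p>In this toy setting the correction is much smoother than the target, so the same eight reference points describe it far better. The price is that the cheap method must still be evaluated for every prediction.</p></div>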
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">CompChem Data</head><p>We have laid the general framework for CompChem+ML studies, but this direction would not be complete without more details about training data (i.e., garbage in, garbage out). We now review the landscape of data sets in CompChem and how they will likely evolve over time. The past decade has seen continually increasing usefulness and availability of "big data" from CompChem that include community-wide data repositories composed of millions of atomistic structures along with diverse physical and chemical properties. <ref type="bibr">[506]</ref><ref type="bibr">[507]</ref><ref type="bibr">[508]</ref><ref type="bibr">[509]</ref> Such repositories are becoming the norm, and it is more customary for different users to deposit raw or processed simulation data there for the benefit of the research community. This brings the possibility of robust validation tests for ML models, but it also necessitates approaches that are well-equipped to handle large and complex data sets. Typical data sets may come from diverse origins such as MD trajectories from ab initio simulations, data sets of small molecules and molecular conformers, or other training sets used for developing ML and non-ML FFs for specific applications. As the data sets grow, so does the scope of publications that involve ML as shown in Figure <ref type="figure">1</ref>. 4.3.1. Benchmark Data Sets. ML models must be validated before they can be trusted for predictions. Validations of descriptors or model trainings are performed on benchmark data sets, and several popular ones are summarized in Table <ref type="table">5</ref>. These allow ML models to be compared on the same ground and provide large amounts of data for robust training. Their availability to the public also ensures that the data sets can evolve with time and be extended as a part of community efforts. 
<ref type="bibr">529</ref> Among the entries in Table <ref type="table">5</ref>, the most often used one is the QM9 set, which consists of approximately 134 000 of the smallest organic molecules that contain up to 9 heavy atoms (C, O, N, or F; excluding H) along with their CompChem-computed molecular properties such as total energies, dipole moments, HOMO-LUMO gaps, etc. Several ML studies have already been published using this data set (see Figure 8, ref 496). A popular challenge associated with QM9 is to develop a next-generation ML model that learns the electronic energies of random assortments of organic molecules with higher accuracy and less required training data than other existing models. Doing so tests next-generation molecular representations and training algorithms. Figure <ref type="figure">8</ref> illustrates how the choice of architecture and descriptors can influence the predictive performance and data efficiency of ML models using different properties of the QM9 data set as examples. The next significant advance will potentially be due to a combination of supervised and unsupervised learning models.</p><p>4.3.2. Visualization of Data Sets. As the structural data sets grow, it becomes infeasible to manually identify hidden patterns or curate the data. Data-driven and automated frameworks for visualizing these data sets are becoming increasingly popular. <ref type="bibr">[530]</ref><ref type="bibr">[531]</ref><ref type="bibr">[532]</ref><ref type="bibr">[533]</ref> Dimensionality reduction effectively translates the high-dimensional data (i.e., the xyz-coordinates for molecules or materials in different atomic configurations) into a low-dimensional space easily visualized on paper or a computer screen. In this way, entries such as those in the QM9 set can be shown (see Figure <ref type="figure">9</ref>). 
The KPCA maps in Figure <ref type="figure">9</ref> are based on the dimensionality reduction of the global SOAP descriptors, which are constructed by combining all the atomic SOAP descriptors using eq 30. Each dot represents a small molecule in the QM9 set, and the maps, thus, illustrate the similarity between the molecules, instead of the relations between the carbon atomic environments in Figure <ref type="figure">7</ref>. The maps in Figure <ref type="figure">9</ref> are color-coded using different molecular properties, such as the atomization energies, composition, and optical properties, and these properties are strongly correlated with the principal axes. These KPCA maps are, therefore, an intuitive and condensed way to help navigate the QM9 set. Similarly, ref 321 used SOAP-sketchmaps in conjunction with quasi-chemical theory to visualize similarities in local solvation structures and thus show an unsupervised learning procedure to identify structures that significantly impact solvation energies of small ions.</p><p>Generally speaking, these data-driven maps are generated by processing the design matrix (or kernel matrix) associated with a data set using dimensionality reduction techniques introduced in section 3.2. A simple option is to use the ASAP code, <ref type="bibr">374</ref> a Python-based command line tool, that automates analysis and mapping. Figures <ref type="figure">7</ref> and <ref type="figure">9</ref> were generated using ASAP using only two commands that are displayed in the figure. Data sets can also be explored in an intuitive manner using interactive visualizers <ref type="bibr">534</ref> that run in a web browser and display 3D-structures corresponding to each atomistic structure in the data set.</p><p>4.3.3. Text and Data Mining for Chemistry. Conventional publications are an essential part of any CompChem knowledge base, and ML is becoming useful at accelerating information extraction from the scientific literature via text mining. 
<ref type="bibr">[535]</ref><ref type="bibr">[536]</ref><ref type="bibr">[537]</ref> This topic was previously comprehensively reviewed in the context of cheminformatics. <ref type="bibr">538,</ref><ref type="bibr">539</ref> Natural language processing has already driven text-mining efforts for materials science discovery <ref type="bibr">538</ref> and experimental synthesis conditions of oxides. <ref type="bibr">528,</ref><ref type="bibr">540</ref> CompChem+ML can also amplify existing efforts in chemometrics, <ref type="bibr">541</ref> the science of data-driven extraction of chemical information. <ref type="bibr">542</ref> This area has also branched into related disciplines of data mining for specific classes of materials <ref type="bibr">543</ref> and catalysis informatics. <ref type="bibr">544</ref> These approaches have great promise, especially for deriving information and knowledge from data, but it remains challenging to implement these in ways that achieve insight (and true impact).</p><p>Some have shown paths forward for doing so. For example, ML models can obtain knowledge from failed experimental data more reliably than humans who are more susceptible to survivor bias, <ref type="bibr">545</ref> and it can also be used to distill physical laws and fundamental equations using experimental <ref type="bibr">363</ref> and computational data. <ref type="bibr">546</ref> ML models can also be used to reliably predict SMILES representations (a string-based representation of molecular graphs) that allow encoded information to be derived from low-resolution images found in the literature. <ref type="bibr">547</ref> ML models can interpret experimental X-ray absorption near edge structure (XANES) data and predict real space information about coordination environments. 
<ref type="bibr">548</ref> Likewise, scanning tunneling microscopy (STM) data can be used to classify structural and rotational states on surfaces, <ref type="bibr">549</ref> and name indicators can be used to predict tandem mass spectrometry (MS/MS) properties. <ref type="bibr">550</ref> In closing, we see exciting opportunities for future applications that link data and text mining to chemometrics across chemical space.</p></div>
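<div xmlns="http://www.tei-c.org/ns/1.0"><p>The data-driven maps discussed in section 4.3.2 are obtained by applying dimensionality reduction to a kernel matrix. The following NumPy sketch shows the kernel PCA step only. It does not reproduce the ASAP tool or its commands: the random vectors are stand-ins for global structural descriptors, and the normalized polynomial kernel and the function names are assumptions made for illustration.</p>

```python
import numpy as np


def normalized_kernel(desc, zeta=2):
    """Assumed similarity: normalized dot product raised to the power zeta."""
    unit = desc / np.linalg.norm(desc, axis=1, keepdims=True)
    return (unit @ unit.T) ** zeta


def kernel_pca(kernel, n_components=2):
    """Project the structures behind `kernel` onto its leading principal axes."""
    n = kernel.shape[0]
    one = np.full((n, n), 1.0 / n)
    # Center the kernel, which centers the data in the implicit feature space.
    centered = kernel - one @ kernel - kernel @ one + one @ kernel @ one
    evals, evecs = np.linalg.eigh(centered)
    order = np.argsort(evals)[::-1][:n_components]
    evals, evecs = evals[order], evecs[:, order]
    # Coordinates of each structure along the selected components.
    return evecs * np.sqrt(np.clip(evals, 0.0, None))


# Two synthetic families of "structures" (20 each) in a 6-dimensional
# descriptor space; these are random stand-ins, not real descriptors.
rng = np.random.default_rng(0)
family_a = np.eye(6)[0] + 0.05 * rng.normal(size=(20, 6))
family_b = np.eye(6)[1] + 0.05 * rng.normal(size=(20, 6))
desc = np.vstack([family_a, family_b])

coords = kernel_pca(normalized_kernel(desc))
print("map shape:", coords.shape)
print("family A, mean of first component:", coords[:20, 0].mean())
print("family B, mean of first component:", coords[20:, 0].mean())
```

<p>Plotting the two columns of <hi rend="italic">coords</hi> against each other and coloring the points by a property of interest gives a map of the same kind as those described above. The two synthetic families fall on opposite sides of the first component.</p></div>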
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4.">Transforming Atomistic Modeling</head><p>We previously mentioned that ML can handle large data sets and extract insights while circumventing the high cost of quantum-mechanical calculations by statistical learning. CompChem+ML also has great potential in developing MLPs. Car and Parrinello proposed running MD using electronic-structure methods in 1985. <ref type="bibr">551</ref> These are now mainstream but also quite computationally demanding and normally restricted to small system sizes (&#8764;100 atoms) and short simulation times (&#8764;$10^{-12}$ s). Alternatively, accurate atomistic potentials introduced in section 2.2.6 have been developed to allow Monte Carlo and MD simulations, but sufficiently accurate potentials are sometimes not available. MLPs have emerged as a way to achieve accuracy as high as KS-DFT or correlated wavefunction methods but at a fraction of the cost. MLPs have been constructed for a wide range of systems, from small organic molecules to bulk condensed materials and interfaces. <ref type="bibr">433,</ref><ref type="bibr">552,</ref><ref type="bibr">553</ref> Several of the coauthors of the current review have also written a separate review focused more narrowly on this topic, <ref type="bibr">554</ref> and so, we only provide a brief overview here.</p><p>Training an MLP to reproduce a system's PES usually requires generating diverse and high quality CompChem data points that cover the relevant temperature and pressure conditions, reaction pathways, polymorphs, defects, compositions, etc. 
<ref type="bibr">[555]</ref><ref type="bibr">[556]</ref><ref type="bibr">[557]</ref><ref type="bibr">[558]</ref><ref type="bibr">[559]</ref><ref type="bibr">[560]</ref><ref type="bibr">[561]</ref><ref type="bibr">[562]</ref> After data points composed of atomic configurations, system energies, and forces are obtained, different methods for constructing MLPs employ either different descriptors (see a list of examples in Table <ref type="table">4</ref>) or different ML architectures to perform interpolations of the full PES. Again, smoothness is an essential feature for any PES, so special considerations are needed to avoid numerical noise that would result in discontinuities. <ref type="bibr">563,</ref><ref type="bibr">564</ref> Kernel method-based MLPs, such as GAP <ref type="bibr">276,</ref><ref type="bibr">565</ref> and sGDML, <ref type="bibr">207,</ref><ref type="bibr">372,</ref><ref type="bibr">566</ref> ensure smoothness by relying on smoothly varying basis functions, but kernel-based methods scale poorly with the number of training points unless reduction mechanisms are used. <ref type="bibr">396,</ref><ref type="bibr">567</ref> As a much more efficient but somewhat less accurate alternative to GAP, SNAP <ref type="bibr">568</ref> uses the coefficients of the SOAP descriptors and assumes a linear or quadratic relation between energies and the SOAP bispectrum components. <ref type="bibr">569</ref> The most popular MLPs are currently NN-based due to their flexibility and capacity to train based on large amounts of data. Among these, ANI <ref type="bibr">511,</ref><ref type="bibr">513</ref> and BPNN <ref type="bibr">274,</ref><ref type="bibr">433,</ref><ref type="bibr">570</ref> potentials use ACSF descriptors as inputs, while deep NNs, such as SchNet <ref type="bibr">422,</ref><ref type="bibr">434,</ref><ref type="bibr">571</ref> and DeepMD, <ref type="bibr">572</ref> use the coordinates and nuclear charges of atoms. 
We now focus on a few example applications.</p><p>4.4.1. Predicting Thermodynamic Properties. Many CompChem efforts focus on predicting thermodynamic properties at finite temperatures, such as heat capacity, density, and chemical potential. Although many physical properties are already accessible from MD simulations, estimating the free energies that establish the relative stability of different states using electronic structure methods remains difficult. The configurational part of the Gibbs free energy of a bulk system that has N distinguishable particles with atomic coordinates $\mathbf{r} = \{\mathbf{r}_{1 \ldots N}\}$ and the associated potential energy $U(\mathbf{r})$ can be expressed as <formula notation="TeX">G = -k_{\mathrm{B}} T \ln \int \mathrm{d}\mathbf{r}\, \exp\left[-\frac{U(\mathbf{r})}{k_{\mathrm{B}} T}\right]</formula></p><p>integrated over all possible coordinates $\mathbf{r}$, where $k_{\mathrm{B}}$ is the Boltzmann constant and $T$ is the temperature. In order to rigorously determine G, one must exhaustively sample the configuration space that has relatively high weight arising from the Boltzmann factor $\exp[-U(\mathbf{r})/k_{\mathrm{B}} T]$.</p><p>This normally requires thermodynamic integration or enhanced sampling methods (e.g., umbrella sampling, <ref type="bibr">573</ref> metadynamics, <ref type="bibr">574</ref> or transition path sampling <ref type="bibr">575</ref> ) that require simulation times and scales far beyond what is accessible with MD simulations based on KS-DFT or correlated wavefunction methods.</p><p>However, MLPs have lifted the limits on both time scale and system size. An early example <ref type="bibr">576</ref> used an MLP with umbrella sampling <ref type="bibr">573</ref> and the free energy perturbation method <ref type="bibr">577</ref> to reveal the influence of van der Waals corrections on the thermodynamic properties of liquid water. Later, the combination of an MLP trained from hybrid DFT data and free energy methods reproduced several thermodynamic properties of water from quantum mechanics, including the density of ice and water, the difference in melting temperature for normal and heavy water, and the stability of different forms of ice. 
<ref type="bibr">578,</ref><ref type="bibr">579</ref> Ref <ref type="bibr">580</ref> employed the DeepMD approach to study the relatively long time-scale nucleation of gallium. MLPs for high-pressure hydrogen provided evidence on how hydrogen gradually turns into a metal in giant planets. <ref type="bibr">581</ref> In all these examples, high accuracy and long time scales were required to model the specific phenomena and reveal physical insights, and it is precisely the combination of CompChem+ML that enables both.</p><p>4.4.2. Nuclear Quantum Effects. As mentioned in section 2.2.5, NQEs of chemical systems having light elements bring challenges for atomistic modeling because the added mobility of lighter atoms in dynamics simulations is more computationally costly to treat. To complicate matters further, many atomistic potentials (see section 2.2.6), particularly the ones for water or organic molecules, cannot be used to model NQEs because they often describe covalent bonds as rigid and thus cannot describe the fluctuations of the bond lengths and angles. As a remedy, several studies have trained an MLP using higher rungs of KS-DFT (e.g., hybrid-DFT or meta-GGA) and then used this potential in PIMD simulations. <ref type="bibr">578,</ref><ref type="bibr">[582]</ref><ref type="bibr">[583]</ref><ref type="bibr">[584]</ref> The study of water mentioned in the previous section, which used MLPs trained from hybrid DFT, revealed that NQEs were critical for promoting the hexagonal packing of molecules inside ice that ultimately leads to the 6-fold symmetry of snowflakes. <ref type="bibr">578</ref> Highly data-efficient ML potentials can even be trained on reference data at the computationally very expensive quantum-chemical CCSD(T) level of accuracy. 
For example, the sGDML <ref type="bibr">206,</ref><ref type="bibr">207,</ref><ref type="bibr">585</ref> approach has been shown to faithfully reproduce such FFs for small molecules, which were then used to perform simulations with effectively fully quantized electrons and nuclei.</p><p>4.5. ML for Structure Search, Sampling, and Generation. Locating stationary points on the PES is a frequent task in CompChem, since these are needed for explaining reaction kinetics. Explorations for stationary points normally require many energy and force evaluations. ML approaches are being implemented to dramatically accelerate minimum-energy as well as saddle-point optimizations. <ref type="bibr">[293]</ref><ref type="bibr">[294]</ref><ref type="bibr">[295]</ref><ref type="bibr">565,</ref><ref type="bibr">[586]</ref><ref type="bibr">[587]</ref><ref type="bibr">[588]</ref> Bernstein et al. proposed an automated protocol that iteratively explores structural space using a GAP potential. <ref type="bibr">565</ref> Bisbo and Hammer employed an actively learned surrogate model of the PES to perform local relaxations while only performing single-point quantum-mechanical calculations for selected structures with high values of the acquisition function. <ref type="bibr">586</ref> Work in refs 293 and 295-297 accelerated nudged elastic band (NEB) calculations by incorporating surrogate ML models.</p><p>ML can also help with the challenge of efficiently sampling equilibrium or transition states by accelerating enhanced sampling methods such as umbrella sampling <ref type="bibr">573</ref> and metadynamics. <ref type="bibr">574</ref> These procedures make use of collective variables (CVs) that define a reaction coordinate, and computing the associated free energy surface (FES) amounts to generating the marginal probability distribution in these CVs. 
Unfortunately, the choice of the CVs is not always clear for specific systems, and ML has shown some promise in guiding their determination. <ref type="bibr">[589]</ref><ref type="bibr">[590]</ref><ref type="bibr">[591]</ref> Another direction is to exploit that ML models can be considered as universal approximators of FESs. <ref type="bibr">592</ref> For example, there are reports of adaptive enhanced sampling methods using a Gaussian Mixture model, <ref type="bibr">593</ref> using an NN architecture to represent the FES <ref type="bibr">594</ref> or the bias function in variational sampling simulations. <ref type="bibr">595</ref> ML methods also offer fundamentally new ways to explore chemical compound and configuration space. Generative models can learn the structural and elemental distribution underlying chemical systems, and once trained, these models can then be used to directly sample from this distribution. It is furthermore possible to bias the generated structures toward exhibiting desired properties, for example, drug activity or thermal conductivity. As a consequence, generative models offer exciting new avenues in drug and materials design. <ref type="bibr">596,</ref><ref type="bibr">597</ref> Generative methods in CompChem include recurrent neural networks (RNNs), which can be used for the sequential generation of molecules encoded as SMILES strings. <ref type="bibr">[598]</ref><ref type="bibr">[599]</ref><ref type="bibr">[600]</ref> Segler et al. demonstrated how such a recurrent model can first learn general molecular motifs and then be fine-tuned to sample molecules exhibiting activity against a variety of medical targets. <ref type="bibr">599</ref> Autoencoders (AE) are another frequently used ML method for molecular generation. AEs learn to transform molecular graphs or SMILES into a low-dimensional feature space and backward. 
The resulting feature vector represents a smooth encoding of the molecular distribution and can be used to effectively sample chemical space. <ref type="bibr">[601]</ref><ref type="bibr">[602]</ref><ref type="bibr">[603]</ref><ref type="bibr">[604]</ref><ref type="bibr">[605]</ref><ref type="bibr">[606]</ref> By applying a variational AE to the QM9 and ZINC databases, Gomez-Bombarelli et al. could generate several optimized functional compounds. <ref type="bibr">607</ref> Conditional AEs are an interesting extension of AEs that capture not only the distribution of molecular structures but also dependencies on various properties. <ref type="bibr">426,</ref><ref type="bibr">608</ref> This makes it possible to directly generate structures exhibiting certain property ranges or combinations without the need for biasing or additional optimization steps. AEs can also form the basis of another approach for exploring chemical space called generative adversarial networks (GANs). <ref type="bibr">609,</ref><ref type="bibr">610</ref> In a GAN, a generator model (often an AE) attempts to create samples that closely match the underlying data, while a discriminator tries to distinguish true from generated samples. These architectures can be enhanced by using RL objectives. RL learns an optimal sequence of actions (e.g., placement of atoms) leading to a desired outcome (e.g., a molecule with a certain property). This makes it possible to drive generative processes toward certain objectives, allowing for the targeted generation of molecules with particular properties. <ref type="bibr">[611]</ref><ref type="bibr">[612]</ref><ref type="bibr">[613]</ref><ref type="bibr">[614]</ref> RL in general is a promising alternative strategy for generative models, <ref type="bibr">615,</ref><ref type="bibr">616</ref> and it offers the possibility of tight integration into drug design cycles. 
<ref type="bibr">617</ref> Alternative approaches combine autoregressive models with graph convolution networks. <ref type="bibr">618,</ref><ref type="bibr">619</ref> While these methods use SMILES or graphs to encode molecular structures, generative models have recently been extended to operate on 3D coordinates of molecules and materials. <ref type="bibr">620,</ref><ref type="bibr">621</ref> Gebauer et al. proposed an autoregressive generative model based on the SchNet architecture, called g-SchNet. <ref type="bibr">622</ref> Once trained on the QM9 data set, g-SchNet was able to generate equilibrium structures without the need for optimization procedures. It was further found that the model could be biased toward certain properties. In another promising approach, Noé et al. used an invertible NN based on normalizing flows to learn the distribution of atomic positions (e.g., sampled from an MD trajectory). This network can then be used to directly sample molecular configurations from this distribution without performing costly simulations. <ref type="bibr">298</ref> </p></div>
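<div xmlns="http://www.tei-c.org/ns/1.0"><p>The statement in this section that computing an FES amounts to generating the marginal probability distribution in the CVs can be illustrated with a minimal sketch. It assumes unbiased samples of a single CV and uses F(s) = -k_B T ln P(s); samples from biased methods such as umbrella sampling or metadynamics would first need to be reweighted. The function name, the bin count, and the two-basin toy data are hypothetical choices made only for illustration.</p><p><![CDATA[
```python
import math
import random

KB = 0.0083144626  # Boltzmann constant in kJ/(mol K)


def free_energy_profile(cv_samples, temperature=300.0, n_bins=40):
    """Estimate F(s) = -kB*T*ln P(s) from unbiased samples of one CV."""
    lo, hi = min(cv_samples), max(cv_samples)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for s in cv_samples:
        # Clamp so that the largest sample falls into the last bin
        counts[min(int((s - lo) / width), n_bins - 1)] += 1
    norm = len(cv_samples) * width  # converts counts to a probability density
    centers = [lo + (i + 0.5) * width for i in range(n_bins)]
    # Empty bins were never visited, so their free energy is set to infinity
    fes = [
        -KB * temperature * math.log(c / norm) if c > 0 else math.inf
        for c in counts
    ]
    f_min = min(fes)
    return centers, [f - f_min for f in fes]  # global minimum shifted to zero


# Hypothetical toy data: a CV sampled from two Gaussian basins (60% / 40%)
random.seed(0)
samples = [random.gauss(-1.0, 0.2) for _ in range(60000)]
samples += [random.gauss(1.0, 0.2) for _ in range(40000)]
centers, fes = free_energy_profile(samples)
```
]]></p><p>In this toy case the less populated basin should lie roughly k_B T ln(1.5), or about 1 kJ/mol at 300 K, above the more populated one. Bins between the basins that are never visited are assigned an infinite free energy, which is the practical reason enhanced sampling is needed for barrier regions.</p></div>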
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.6.">Multiscale Modeling</head><p>Multiscale modeling is a term for including simulation or information from different scales (see Figure <ref type="figure">3</ref>). ML has been introduced into QM/MM-like schemes that enable improved multiscale simulations, <ref type="bibr">300,</ref><ref type="bibr">325,</ref><ref type="bibr">623</ref> as well as into coarse-graining. <ref type="bibr">624</ref> Different coarse-graining potentials have been developed, <ref type="bibr">625</ref> but the inherent functional form for these potentials relies on CPI as well as trial-and-error procedures. Several works used ML for constructing coarse-grained potentials by matching mean forces. <ref type="bibr">449,</ref><ref type="bibr">450,</ref><ref type="bibr">626,</ref><ref type="bibr">627</ref> In closing, we see promise for incorporating experimental priors into ML models, for instance, improving an ML PES by complementing computed reference data with experimental measurements. We are not aware of such efforts for developing highly accurate MLPs beyond the atomic scale, although much work has been done along this line to refine FFs of RNAs and proteins, often incorporating methods from ML, including the maximum entropy approach. <ref type="bibr">628</ref> </p></div>
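<div xmlns="http://www.tei-c.org/ns/1.0"><p>The idea of constructing a coarse-grained potential by matching mean forces can be sketched in one dimension. The example below is an illustration under stated assumptions, not the method of the cited works: it replaces the ML model with a potential that is linear in two basis functions, U(x) = a x^2 + b x^4, and fits a and b by least squares to noisy instantaneous forces whose average is the mean force. The function name, the functional form, and the synthetic data are all hypothetical.</p><p><![CDATA[
```python
import random


def fit_force_matching(positions, forces):
    """Fit U(x) = a*x**2 + b*x**4 by least squares on the forces -dU/dx."""
    s11 = s12 = s22 = r1 = r2 = 0.0
    for x, f in zip(positions, forces):
        # The model force is linear in (a, b): F(x) = a*g1 + b*g2
        g1, g2 = -2.0 * x, -4.0 * x ** 3
        s11 += g1 * g1
        s12 += g1 * g2
        s22 += g2 * g2
        r1 += g1 * f
        r2 += g2 * f
    # Solve the 2x2 normal equations by Cramer's rule
    det = s11 * s22 - s12 * s12
    a = (r1 * s22 - r2 * s12) / det
    b = (s11 * r2 - s12 * r1) / det
    return a, b


# Hypothetical toy data: noisy instantaneous forces along one CG coordinate
random.seed(1)
TRUE_A, TRUE_B = 1.0, 0.5
positions = [random.uniform(-1.5, 1.5) for _ in range(5000)]
forces = [
    -(2.0 * TRUE_A * x + 4.0 * TRUE_B * x ** 3) + random.gauss(0.0, 1.0)
    for x in positions
]
a_fit, b_fit = fit_force_matching(positions, forces)
```
]]></p><p>Because the least-squares loss averages over the noise, the fitted force approaches the conditional mean of the instantaneous forces. This is why force matching recovers a potential of mean force even though no individual sample carries it. An NN or kernel model would replace the two-term basis in a realistic setting.</p></div>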
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">SELECTED APPLICATIONS AND PATHS TOWARD INSIGHTS</head><p>The central challenge posed at the beginning of this review was how to identify and make chemical compounds or materials having optimal properties for a given purpose. To do so would help address critical and broad issues from pollution to global warming to human diseases. Traditional developments are often slow, expensive, and restricted by nontransferable empirical optimizations, and so efforts have turned to CompChem+ML to alleviate this. <ref type="bibr">515,</ref><ref type="bibr">525,</ref><ref type="bibr">629,</ref><ref type="bibr">630</ref> CompChem+ML approaches are enabling searches through larger areas of chemical space much faster than before. <ref type="bibr">20,</ref><ref type="bibr">[631]</ref><ref type="bibr">[632]</ref><ref type="bibr">[633]</ref><ref type="bibr">[634]</ref> This section is not intended to extensively review the large amount of work using CompChem+ML in these different areas, but rather to highlight examples of applications that have resulted in notable insights so that others might use these works as templates for future efforts.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1.">Molecular and Material Design</head><p>Molecular and materials design is usually considered to be an optimization problem. <ref type="bibr">270,</ref><ref type="bibr">426,</ref><ref type="bibr">602,</ref><ref type="bibr">607,</ref><ref type="bibr">635</ref> Thus, a comprehensive understanding of chemical space is needed to identify compounds with desired properties that are subject to certain required constraints (e.g., a specific thermal stability or a suitable optical gap for absorbing sunlight). Those properties will also depend on many key variables (e.g., constitutive elements, crystal forms, geometrical and electronic characteristics, among others), which make the property prediction complex. <ref type="bibr">531</ref> CompChem calculations as explained in section 2 should provide a continuous description of properties across a continuous representation (i.e., a descriptor or fingerprint) of molecules that is used to map molecular configurations to target properties, and vice versa. ML methods then can be implemented to search large databases to extract structure-property relationships for designing compounds with specific characteristics. <ref type="bibr">531,</ref><ref type="bibr">[635]</ref><ref type="bibr">[636]</ref><ref type="bibr">[637]</ref> Optimizations would then be performed on the structure-based function learned from training configurations, and the composition of the chemical compound would then be recovered back from the continuous representation.</p><p>As a prototypical example of molecular design via high-throughput screening, Gomez-Bombarelli et al. <ref type="bibr">632</ref> showed a computation-driven search for novel thermally activated delayed fluorescence organic light-emitting diode (OLED) emitters. 
That work first filtered a search space of 1.6 million molecules down to approximately 400 000 candidates using ML to anticipate criteria for desirable OLEDs. For the purpose of evaluating candidates, they estimated an upper bound on the delayed fluorescence rate constant (k_TADF). TD-DFT calculations were then used to provide refined predictions of specific properties of thousands of promising novel OLED molecules across the visible spectrum so that synthetic chemists, device scientists, and industry partners would be able to choose the most promising molecules for experimental validation and implementation. Notably, this example of CompChem+ML resulted in new devices that exhibited an external quantum efficiency of over 22%. Figure <ref type="figure">10</ref> shows the high accuracy of ML in predicting useful properties for high-throughput screening of molecules and materials based on k_TADF calculations. This work exemplifies how ML can accelerate the design of novel compounds in a way that would not be possible using traditional CompChem methods alone.</p><p>Integrating features relevant to the learning task allows one to improve the accuracy of ML predictions for a given target property. Park and Wolverton <ref type="bibr">638</ref> improved the performance of the crystal graph convolution neural network (CGCNN) <ref type="bibr">639</ref> by adding to the original framework information about the Voronoi tessellated crystal structures, which are explicit 3-body correlations of neighboring constituent atoms, and an optimized representation of interatomic bonds. The new approach, labeled iCGCNN, achieved a predictive accuracy 20% higher than that of the original CGCNN when determining thermodynamic stabilities of compounds (i.e., predictions of hull distances). When used for high-throughput searches, iCGCNN exhibited a success rate higher than an undirected high-throughput search and higher than that of CGCNN. 
Figure <ref type="figure">11</ref> shows the improvement in predictions of nearly stable compounds after using more appropriate descriptors. This study showcases how descriptors can be tailored to further enhance the success of ML-aided high-throughput screening.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.">Retrosynthetic Technologies</head><p>A grand challenge in chemistry is to understand synthetic pathways to desired molecules. <ref type="bibr">640,</ref><ref type="bibr">641</ref> Retrosynthesis involves the design of chemical steps to produce molecules and materials that would be crucial to drug discovery, medicinal chemistry, and materials science. As a different kind of optimization problem, the general tactic is to analyze atomic scale compounds recursively, map them onto synthetically achievable building blocks, and then assemble those blocks into the desired compound. <ref type="bibr">[642]</ref><ref type="bibr">[643]</ref><ref type="bibr">[644]</ref> Three main issues make retrosynthesis a formidable intellectual challenge. <ref type="bibr">645</ref> First, simple combinatorics make the space of possible reactions greater than the space of possible molecules. Second, reactants seldom contain only one reactive functional group, and thus require predictions of multiple functional groups. Third, one failed step in the route can invalidate the entire synthesis because organic synthesis is a multistep process.</p><p>Given these challenges, ML is becoming more established in determining reaction rules from CompChem data. <ref type="bibr">641</ref> Computer-aided synthesis planning was actually first attempted in the 1960s. <ref type="bibr">646</ref> Many have since attempted to formalize chemical perception and synthetic thinking using computer programs. <ref type="bibr">[647]</ref><ref type="bibr">[648]</ref><ref type="bibr">[649]</ref> These programs are typically based on one of three possible algorithms: <ref type="bibr">649</ref> 1. Algorithms that use reaction rules (manually encoded or automatically derived from databases). 2. Algorithms that use principles of physical chemistry based on ab initio calculations to predict energy barriers. 3. Algorithms based on ML techniques. 
ML approaches are used to try to overcome the generalization issues of rule-based algorithms (that normally suffer from incompleteness, infeasible suggestions, and human bias) while also avoiding the high cost of CompChem calculations. It is now possible to obtain purely data-driven approaches for synthesis planning, which are promoting a rapid advancement in the field. For example, Coley and co-workers <ref type="bibr">650</ref> designed a data-driven metric, SCScore, for describing a real synthesis modeled after the idea that products are, on average, more synthetically complex than each of their reactants. The definition of a metric for selecting the most promising disconnections that produce easily synthesizable compounds is crucial for avoiding combinatorial explosions. Figure <ref type="figure">12</ref> shows that a data-driven metric, the SCScore, is more suitable than other heuristic metrics to perceive the complexity of each step in a given synthesis. This work offered a valuable contribution to the retrosynthesis working pipeline by providing a method that implicitly learns what structures and motifs are more prevalent as reactants.</p><p>Apart from isolated approaches or algorithms to deal with specific tasks within retrosynthesis, there is already software available to advance this field. One example is the Chematica program, <ref type="bibr">651</ref> which has implemented a new module that combines network theory, modern high-power computing, AI, and expert chemical knowledge to design synthetic pathways. A scoring function is used to promote synthetic brevity and penalize any reactivity conflicts or nonselectivities, thus allowing it to find solutions that might be hard for a human to identify. Figure <ref type="figure">13A</ref> shows the decision tree for one of the almost 50 000 reaction rules used in Chematica. 
Reaction rules can be considered as the allowed moves from which the synthetic pathways are built, and such moves lead to an enormous synthetic space (the number of possibilities within n steps scales as 100^n) such as the one shown by the graph in Figure <ref type="figure">13B</ref>. Chematica explores this large synthetic space by truncating and reverting from unpromising connections and drives its searches to the most efficient sequences of steps. Moreover, in the pathways presented to the user, each substance can be further analyzed with molecular mechanics tools. This software was used to obtain insights into the synthetic pathways to eight targets (seven bioactive substances and one natural product). All of the computer-planned routes were not only successfully carried out in the laboratory, but they also resulted in improved yields and cost savings over previously known paths. This work opened an avenue for chemists to finally obtain reliable pathways from in silico retrosynthesis. For further reading we recommend the two-part reviews of Coley and co-workers. <ref type="bibr">652,</ref><ref type="bibr">653</ref> </p></div>
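<div xmlns="http://www.tei-c.org/ns/1.0"><p>The growth of the synthetic space with the number of steps, and the value of truncating unpromising branches, can be made concrete with a short counting sketch. The branching factor of 100 is taken from the scaling quoted above. The pruning rule (keep at most a fixed number of intermediates per step, as in a beam search) is an assumption chosen for illustration and is not a description of the search strategy actually implemented in Chematica.</p><p><![CDATA[
```python
def pathways_explored(branching, depth, beam_width=None):
    """Count candidates generated in a tree search of a given depth.

    With beam_width=None every candidate is expanded (exhaustive search).
    Otherwise only beam_width candidates survive each step (hypothetical
    pruning rule used here as a stand-in for a scoring function).
    """
    frontier, total = 1, 0
    for _ in range(depth):
        frontier *= branching      # every surviving node spawns new candidates
        total += frontier
        if beam_width is not None:
            frontier = min(frontier, beam_width)  # discard low-scoring ones
    return total


exhaustive = pathways_explored(branching=100, depth=5)
pruned = pathways_explored(branching=100, depth=5, beam_width=50)
```
]]></p><p>For five steps the exhaustive count is 100 + 100^2 + ... + 100^5, on the order of 10^10 candidates, whereas the pruned search generates about 2 x 10^4. The quality of the scoring function then decides whether the pruned search still contains the best route.</p></div>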
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3.">Catalysis</head><p>Catalysis research involves understanding how to impact chemical product yields and selectivities. <ref type="bibr">654</ref> Traditional catalysis is normally discussed in textbooks in terms of homogeneous (i.e., within a solution phase), heterogeneous (occurring at a solid/liquid interface), and biological (occurring within enzymes and ribozymes), but it is best not to use these terms too strictly because actual reaction mechanisms can be quite complex and overall processes may sometimes exhibit characteristics of two or more of these classical processes. <ref type="bibr">[655]</ref><ref type="bibr">[656]</ref><ref type="bibr">[657]</ref> Modern research in catalysis has been interested in studying chemical reactivity and reaction selectivity arising from stimuli from solar-thermal energy, <ref type="bibr">658,</ref><ref type="bibr">659</ref> electrochemical potentials, <ref type="bibr">660</ref> photons, <ref type="bibr">[661]</ref><ref type="bibr">[662]</ref><ref type="bibr">[663]</ref><ref type="bibr">[664]</ref> plasmas, <ref type="bibr">665,</ref><ref type="bibr">666</ref> or other external resonances. <ref type="bibr">667</ref> Catalysis makes up roughly 35% of the world's gross domestic product, <ref type="bibr">668</ref> and it is important to guide the field toward the end goal of achieving greater sustainability with catalytic processes. <ref type="bibr">[669]</ref><ref type="bibr">[670]</ref><ref type="bibr">[671]</ref> These reasons help make catalysis a fertile training ground for applying and developing theoretical models (e.g., refs 672-674) that can be used along with CompChem or CompChem+ML. The research field is also burgeoning with many reports and review articles 544,675-679 that discuss perspectives and progress using ML methods for catalysis science. Here, we will mention notable examples. 
For example, CompChem+ML methods are enabling more data generation by allowing costly CompChem calculations to be run more efficiently, and more information means more comprehensive predictions of chemical and materials phase diagrams for catalysis <ref type="bibr">680,</ref><ref type="bibr">681</ref> as well as stability and reactivity descriptors identified on the fly. <ref type="bibr">[682]</ref><ref type="bibr">[683]</ref><ref type="bibr">[684]</ref><ref type="bibr">[685]</ref><ref type="bibr">[686]</ref> Figure <ref type="figure">14</ref> shows examples of the palettes of insight available using state-of-the-art CompChem+ML modeling for identifying activity and selectivity maps, as well as visualizations of data using t-SNE. <ref type="bibr">687</ref> Regarding modeling of deeply complex chemical environments, Artrith and Kolpak developed MLPs for investigating the relationships between solvent, surface composition and morphology, surface electronic structure, and catalytic activity in systems composed of thousands of atoms interfaces. <ref type="bibr">689</ref> We expect such simulations for electro- and photocatalysis elucidation will continue to improve in size, scale, and accuracy. For other physical insights, new approaches by Kulik, Getman, and co-workers have also focused on developing ML models appropriate for elucidating complex d-orbital participation in homogeneous catalysis. <ref type="bibr">690</ref> Rappe and co-workers have used regularized random forests to analyze how local chemical pressure affects adsorbate states on surface sites for the hydrogen evolution reaction. 
<ref type="bibr">691</ref> Almost trivially simple ML approaches can be used in catalysis studies to deduce insights into interaction trends between single metal atoms and oxide supports, <ref type="bibr">692</ref> to identify the significance of features (e.g., adsorbate type or coverage) where CompChem theories break down, <ref type="bibr">693</ref> or to identify trends that result in optimal catalysis across multiple objectives, such as activity and cost (Figure <ref type="figure">15</ref>). <ref type="bibr">694</ref> ML is also opening opportunities for CompChem+ML studies on highly detailed and complex networks of reactions. <ref type="bibr">[695]</ref><ref type="bibr">[696]</ref><ref type="bibr">[697]</ref><ref type="bibr">[698]</ref><ref type="bibr">[699]</ref><ref type="bibr">[700]</ref> Such models in principle can then significantly extend the range of utility of microkinetics modeling for predictions of products from catalysis. <ref type="bibr">701,</ref><ref type="bibr">702</ref> ML also enables studies of complicated reaction networks that can allow predictions of regioselective products based on CompChem data, <ref type="bibr">703</ref> asymmetric catalysis important for natural product synthesis, <ref type="bibr">704,</ref><ref type="bibr">705</ref> and biochemical reactions. <ref type="bibr">706</ref> Efforts to better understand "above-the-arrow" optimizations of reaction conditions relate back to the challenges of retrosynthesis. <ref type="bibr">707,</ref><ref type="bibr">708</ref> Ideally, these efforts will continue while making use of rapid advances in CompChem+ML that enable predictive atomistic simulations to be run faster and more accurately. We see reason for excitement for different approaches, but we again stress the importance of ensuring that models will provide unique and physical results (see section 3 where we discuss the risk of "clever Hans" predictors <ref type="bibr">360</ref> ).</p></div>
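<div xmlns="http://www.tei-c.org/ns/1.0"><p>Identifying optimal catalysts across multiple objectives, such as activity and cost, is commonly framed as finding a Pareto front: the set of candidates for which no other candidate is at least as good in every objective and strictly better in one. The sketch below shows this selection step only. It is not taken from the cited study, and the candidate names and numbers are hypothetical placeholders rather than data.</p><p><![CDATA[
```python
def pareto_front(candidates):
    """Return names of candidates not dominated in (higher activity, lower cost).

    Each candidate is a tuple (name, activity, cost).
    """
    front = []
    for name, activity, cost in candidates:
        dominated = any(
            (a >= activity and c <= cost) and (a > activity or c < cost)
            for _, a, c in candidates
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical candidates: (label, predicted activity, cost per unit mass)
catalysts = [
    ("A", 0.90, 100.0),
    ("B", 0.70, 20.0),
    ("C", 0.60, 30.0),   # dominated by B: less active and more expensive
    ("D", 0.95, 400.0),
    ("E", 0.50, 5.0),
]
front = pareto_front(catalysts)
```
]]></p><p>Here only candidate C is removed, because B is both more active and cheaper. The remaining candidates represent different trade-offs, and choosing among them requires a preference that the data alone do not supply.</p></div>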
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.4.">Drug Design</head><p>The central objective for drug discovery is to find structurally novel molecules with precise selectivity for a medicinal function. This involves identifying new chemical entities and obtaining structures with different physicochemical and polypharmacological properties (i.e., combinations of beneficial pharmacological effects or adverse side-effects). <ref type="bibr">709,</ref><ref type="bibr">710</ref> Drug discovery involves the identification of targets (a property optimization task, as in material design) and the determination of compounds with good on-target effects and minimal off-target effects. <ref type="bibr">711</ref> Traditionally, a drug discovery program may take around six years before a drug candidate can be used in clinical trials, and six or seven more years are required for three clinical phases. Thus, it is important to identify adverse effects as soon as possible to minimize time and monetary costs. <ref type="bibr">712</ref> Accelerating drug discovery relies on predicting how and where a certain drug binds to more than one protein, a phenomenon that sometimes results in polypharmacology. Researchers are developing ready-to-use tools aimed at facilitating research for drug discovery, <ref type="bibr">713</ref> but CompChem+ML is expected to continue providing even more benefits to the drug development pipeline. <ref type="bibr">714</ref> In a recent study, Zhavoronkov et al. <ref type="bibr">617</ref> developed a deep generative model for de novo small-molecule design: the generative tensorial reinforcement learning (GENTRL) model that was used to discover potent inhibitors of discoidin domain receptor 1 (DDR1), a kinase target implicated in fibrosis and other diseases. 
The drug discovery process was carried out in only 46 days, beginning with the collection of appropriate data for training and finishing with the synthesis and experimental testing of some compounds (Figure <ref type="figure">16A</ref>). GENTRL was used to screen a total of 30 000 structures (some examples compared to the parent DDR1 kinase inhibitor are shown in Figure <ref type="figure">16B</ref>) down to only 40 structures that were randomly selected ensuring a coverage of the resulting chemical space and distribution of root-mean-square deviation values. Six of these molecules were then selected for experimental validation (see Figure <ref type="figure">16C</ref>), with one of them demonstrating favorable pharmacokinetics in mice. The predicted conformation of the successful compound according to pharmacophore modeling was very similar to the one predicted to be preferred and stable by CompChem methods. This work illustrates the utility of CompChem+ML approaches to give insights into drug design by rapidly giving compound candidates that are synthetically feasible and active against a desired target.</p><p>Besides generating new chemical structures with favorable pharmacokinetics, ML methods are also used in pharmaceutical research and development for peptide design, compound activity prediction, and assisting the scoring of protein-ligand interactions (docking). <ref type="bibr">709,</ref><ref type="bibr">[715]</ref><ref type="bibr">[716]</ref><ref type="bibr">[717]</ref> An example of the latter was proposed by Batra et al. <ref type="bibr">718</ref> for efficiently identifying ligands that can potentially limit the host-virus interactions of SARS-CoV-2. Those authors designed a high-throughput strategy based on CompChem+ML that involved high-fidelity docking studies to find candidates displaying high binding affinities. 
The ML model was used to search through thousands of ligands approved by the Food and Drug Administration (FDA) and a million biomolecules in the BindingDB database. <ref type="bibr">514</ref> From these, insights were obtained for more than 19 000 molecules satisfying the Vina score (i.e., an important physicochemical measure of the therapeutic process of a molecule that is used to rank molecular conformations and predict free energy of binding). Figure <ref type="figure">17</ref> shows the Vina score predictions that led to the selection of the best candidates, some of which are also illustrated in the figure. The Vina scores for the top ligands were further confirmed using expensive docking approaches, resulting in the identification of 75 FDA-approved and 100 other ligands potentially useful to treat SARS-CoV-2. This study highlights a reasonable CompChem+ML strategy for making useful suggestions to help expert biologists and medical professionals focus on fewer candidates when performing either robust CompChem efforts or synthesis and trial experiments.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">CONCLUSIONS AND OUTLOOK</head><p>Recent CompChem methods, algorithms, and codes have empowered new studies for a wealth of physical and chemical insights into molecules and materials. Today, the combination of CompChem+ML can be equipped to address new and more challenging questions in different domains of physics, materials science, chemistry, biology, and medicine. Productive research efforts in this direction necessitate interdisciplinary teams and increasing availability of high-quality data across appropriate regions of chemical compound space. Discovering new chemicals and materials requires thorough investigations. One needs to predict reaction pathways and interactions between molecules, optimize environmental conditions for catalytic reactions, enhance selectivities that eliminate undesired side reactions or side effects, and navigate other system-specific degrees of freedom. Addressing this complexity calls for a statistical view on chemical design and discovery, and CompChem+ML provides a natural synergy for obtaining predictive insights to lead to wisdom and impact.</p><p>This Review provided a bird's-eye view of CompChem and ML and how they can be used together to make transformative impacts in the chemical sciences. The successes of CompChem+ML are particularly visible in physical chemistry and include drastic acceleration of molecular and materials modeling, discovery and prediction of chemicals with desired properties, prediction of reaction pathways, and design of new catalysts and drug candidates. Nevertheless, we have only begun to scratch the surface of how successful applications of ML in chemistry can bring impact. There are many conceptual, theoretical, and practical challenges waiting to be solved to enable further synergies within the troika of CompChem, ML, and CPI. 
Here we enumerate some of the challenges that we consider to be the most pressing and interesting at this moment:</p><p>1. Reliance on ML in CompChem algorithms must be increased: ML algorithms can be integrated into CompChem algorithms at almost any simulation level (Figure <ref type="figure">3</ref>). ML algorithms are already available to accelerate calculations of CompChem energies, navigation along reaction pathways, and sampling of larger regions of the PES, but reluctance to use them impedes progress. In general, these algorithms must be made more effective, efficient, accessible, user-friendly, and reproducible to benefit fundamental and applied research (see, for example, ref 719).</p><p>2. More general ML approaches are needed: ML methods must continue to evolve beyond now-common applications of learning a narrow region of a PES or identifying straightforward structure/property relationships. New ML methods should have the capacity to predict energetic and electronic properties and their more convoluted relationships across chemical space. Such approaches should grow toward uniformly describing compositional (chemical arrangement of atoms in a molecule) and configurational (physical arrangement of atoms in space) degrees of freedom on equal footing. Further progress in this field requires developing new universal ML models suitable for insights across diverse systems and physicochemical properties.</p><p>3. ML representations must include the right physics: ML methods that are claimed to be accurate but incorrectly describe the true physics of a system will eventually fail to achieve meaningful insights while lowering the reputation of other work in the field. Current ML representations (descriptors) can successfully describe local chemical bonding, but few if any treat the long-range electrostatics, polarization, and van der Waals dispersion interactions that are critical for rationalizing physical systems, both large and small. 
Combining intermolecular interaction theory (a key focus of advanced CompChem methods) with ML is an important direction for future progress toward studying complex molecular systems.</p><p>4. CompChem+ML applications need to strive toward achieving realistic complexity: Investigations using highly accurate CompChem methods normally require overly simplified model systems, while more realistic model systems necessitate less accurate but computationally efficient CompChem methods. This compromise should no longer be necessary. We are due for a paradigm shift in how thermodynamics, kinetics, and dynamics of systems in complex chemical environments (e.g., for multiscale biological processes like drug design and/or catalytic processes at solid-liquid interfaces under photochemical excitations, etc.) can be treated more faithfully with less corner-cutting. An emerging idea is to embed ML approaches in computationally efficient model Hamiltonians for electronic interactions based on correlated wavefunction, KS-DFT, tight-binding, molecular orbital techniques, and/or the many-body dispersion method. ML can predict Hamiltonian parameters, and the quantum-mechanical observables would then be calculated via diagonalization of the corresponding Hamiltonian. The challenge is to find an appropriate balance between prediction accuracy and computational efficiency to dramatically enhance larger scale simulations.</p><p>5. Much more experimental data is needed: Validations of ML predictions require extensive comparisons with experimental observables such as reaction rates, spectroscopic observations, solvation energies, and melting temperatures. Such experiments may have previously been considered too routine, too mundane, or not insightful enough alone, but all high-quality data brings great value for future CompChem+ML efforts that tightly integrate quantum mechanics, statistical simulations, and fast ML predictions, all within a comprehensive molecular simulation framework. <ref type="bibr">720</ref></p><p>6. 
Much more comprehensive data sets need to be assembled and curated: Current CompChem+ML efforts have profited heavily from the availability of benchmark data sets for relatively small molecules that allow a comparison of existing models. <ref type="bibr">413,</ref><ref type="bibr">527</ref> While efforts fixated on boosting prediction accuracies and shrinking requisite training set sizes for ML models have had their merits, it is time to move on, as further improvements are meaningless if the ML models are not making useful and insightful predictions themselves. More useful predictions will require knowledge from larger data sets, and these will inevitably contain heterogeneous combinations of different levels of theory or experiments that must be analyzed and "cleaned", with uncertainties adequately quantified, for models to learn productively. Such hybrid data sets may be the key to arriving at novel hypotheses in chemistry that could then be experimentally tested.</p><p>7. Bolder and deeper explorations of chemical space are needed: So far, most efforts to generate chemical data have focused on exploring parts of chemical space for new compounds for a targeted purpose. This should change. Combining ML model uncertainty estimates across broader swaths of chemical space could open pathways for fruitful statistical explorations, say, in an active learning framework. This could reveal synergies between data that would otherwise go unnoticed, enabling advances in scientific understanding and improved ML models. Generative models can bridge the gap between sampling and targeted structure generation imposing optimal compound properties, for example, for inverse chemical design. 
<ref type="bibr">125,</ref><ref type="bibr">621,</ref><ref type="bibr">622</ref></p><p>This and other reviews <ref type="bibr">20,</ref><ref type="bibr">554,</ref><ref type="bibr">634,</ref><ref type="bibr">[720]</ref><ref type="bibr">[721]</ref><ref type="bibr">[722]</ref><ref type="bibr">[723]</ref><ref type="bibr">[724]</ref> have stated how ML has become instrumental for recent progress in CompChem. We would also like to mention inspirations that ML has drawn from being applied to physical and chemical problems.</p><p>ML methods generally assume that data is subject to measurement noise, while CompChem data is generally approximate but noise-free from a statistical perspective. ML modeling still requires regularization, but regularizers should reflect the underlying physics of molecular and materials systems. ML models used in vision applications contain discrete convolution filters that are suboptimal for chemical modeling, but recognition of this shortcoming has led to novel continuous convolution filters that are well suited for chemistry and have also become a popular novel architecture for core ML methods. <ref type="bibr">434</ref> Furthermore, invariances, symmetries, and conservation laws are key ingredients of physical and chemical systems. Incorporating them into ML has led to novel and useful models for chemistry since they can learn from significantly less data, which then makes it possible to build force fields at unprecedentedly high levels of theory. <ref type="bibr">206,</ref><ref type="bibr">207,</ref><ref type="bibr">372</ref> The use of these powerful ML techniques in computer vision, natural language processing, and other applications is currently being explored. Structural information from molecular graphs provides the basis for novel tensor NNs or message passing architectures, <ref type="bibr">352,</ref><ref type="bibr">421</ref> as well as graph explanation methods. 
<ref type="bibr">725</ref> Many further challenges exist that have led or will lead to mutual cross-fertilization between ML and chemistry. These interdisciplinary efforts also initiate progress in the respective application domains. The power of this path is that solving a burning problem in chemistry with a newly crafted ML model may also result in unforeseen insights into how to better design core ML methods. Interestingly, the exploratory usage of ML for knowledge discovery in chemistry typically requires novel ML models and unforeseen scientific innovations, and this can lead to interesting insight that is not necessarily limited to chemistry alone; rather, it is likely to reach beyond it.</p><p>To conclude, the past decade has shown that simply applying existing ML algorithms is not enough; breakthroughs are happening through a handshaking of innovations that yields novel ML algorithms and architectures, driven by the pursuit of novel insights in chemistry while retaining a deep understanding of the underlying physical and chemical principles. Research programs that foster interdisciplinary exchange, such as IPAM (<ref type="url">www.ipam.ucla.edu</ref>), have seeded this progress, and these should be continued. Mixed teams with members educated in different aspects of physics, chemistry, and ML have been instrumental. This also brings a new educational challenge: developing new generations of researchers with an academic curriculum that interweaves chemistry, physics, and computer science to enable a meaningful (multilingual) research contribution to this exciting emerging field.</p><p>Tkatchenko serves on editorial boards of two society journals: Physical Review Letters (APS) and Science Advances (AAAS). 
He has received a number of awards, including election as a Fellow of the American Physical Society, the 2020 Dirac Medal from WATOC, the Gerhard Ertl Young Investigator Award of the German Physical Society, and two flagship grants from the European Research Council: a Starting Grant in 2011 and a Consolidator Grant in 2017. His group pushes the boundaries of quantum mechanics, statistical mechanics, and machine learning to develop efficient methods to enable accurate modeling and obtain new insights into complex materials.</p><p>ACRONYMS: ACE, atomic cluster expansion; ACS, American Chemical Society; ACSF, atom-centered symmetry function; AE, autoencoders; AI, artificial intelligence; API, application programming interfaces; BoB, bag of bonds; BOP, bond order potential; BP, back-propagation; CGCNN, crystal graph convolution neural network; CNN, convolutional neural network; COSMO, conductor-like screening model; C-PCM, conductor polarizable continuum solvent model; CASPT2, complete active space perturbation theory; CASSCF, complete active space self-consistent field; CBS, complete basis set; CI, configuration interaction; CMD, centroid molecular dynamics; CompChem, computational chemistry; CPI, chemical and physical intuition; CV, collective variable; DDR1, discoidin domain receptor 1; DeepPot-SE, smooth edition version of the DeepMD potential; D-PCM, dielectric polarizable continuum solvent model; DFT, density-functional theory; DFTB, density functional tight binding; DLPNO, domain-based local pair natural orbital; DMRG, density matrix renormalization group theory; DTNN, deep tensor neural network; EAM, embedded atom method; EANN, embedded atom neural network; ECP, effective core potential; FCHL, Faber-Christensen-Huang-Lilienfeld; FCI, full configuration interaction; FDA, Food and Drug Administration; FES, free energy surface; FF, force field; FPS, farthest point sampling; GAN, generative adversarial network; GENTRL, generative tensorial reinforcement learning; GGA, generalized gradient approximation; GP, Gaussian processes; GPU, graphical processing units; GVB, 
generalized valence bond; HEAT, high accuracy extrapolated ab initio thermochemistry; HF, Hartree-Fock; HIP-NN, hierarchical interacting particle neural network; ICA, independent component analysis; IEFPCM, integral equation formulation of polarizable continuum solvent model; KE, kinetic energy; KPCA, kernel principal component analysis; KRR, kernel ridge regression; KS, Kohn-Sham; LDA, local density approximation; LJ, Lennard-Jones; MBTR, many-body tensor representation; MD, molecular dynamics; MEAM, modified embedded atom method; ML, machine learning; MLP, machine learning potential; MPNN, message-passing neural network; MRCC, multireference coupled cluster; MRCI, multireference configuration interaction; MS/MS, tandem mass spectroscopy; NDDO, neglect of diatomic differential overlap; NEB, nudged elastic band; NMR, nuclear magnetic resonance; NN, neural network; NQE, nuclear quantum effect; OF, orbital-free; OLED, organic light-emitting diode; PCA, principal component analysis; PCM, polarizable continuum solvent model; PES, potential energy surface; PIMD, path integral molecular dynamics; QM, quantum mechanics; QSAR/QSPR, quantitative structure activity/property relationship; RE-Match, regularized entropy match; RI, resolution of the identity; RISM, reference interaction site model; RL, reinforcement learning; RMSD, root mean squared displacement; RNN, recurrent neural network; SCRF, self-consistent reaction field; SOAP, smooth overlap of atomic positions; STM, scanning tunneling microscopy; SVM, support vector machine; t-SNE, t-distributed stochastic neighbor embedding; TD, time-dependent; UMAP, uniform manifold approximation and projection; XAI, explainable artificial intelligence; XANES, X-ray absorption near edge structure</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_0"><p>https://doi.org/10.1021/acs.chemrev.1c00107 Chem. Rev. 2021, 121, 9816-9872</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_1"><p>&#8730; &#8730; X &#8730; similarity metrics root mean square deviation of atomic positions (RMSD) 454 &#9398; 1,2-body terms, input matching X X &#9675; &#9675; X &#8730; X overlap matrix 454 &#9398; 1,2-body terms, input matching X X &#8730; &#8730; &#8730; &#8730; X</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_2"><p>&#8730;&#9675; f &#8730; &#8730; a "&#8730;" = satisfies condition; "&#9675;" = partially satisfies condition; "X" = does not satisfy condition. b Computational efficiency ranks with grades &#9398;-&#9401; in descending order. The efficiency class reflects the extent that the descriptor requires expensive operations (e.g., a hierarchical processing or matching of inputs). c Descriptor has been used within periodic boundary conditions. d "T" = translational; "R" = rotational; "P" = permutational. e In this context, a descriptor is referred to as smooth if its first derivative with respect to nuclear positions is continuous. f Only invariant to permutations represented in the training data.</p></note>
		</body>
		</text>
</TEI>
