<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Composition design of high-entropy alloys with deep sets learning</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>12/01/2022</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10323134</idno>
					<idno type="doi">10.1038/s41524-022-00779-7</idno>
					<title level='j'>npj Computational Materials</title>
<idno>2057-3960</idno>
<biblScope unit="volume">8</biblScope>
<biblScope unit="issue">1</biblScope>					

					<author>Jie Zhang</author><author>Chen Cai</author><author>George Kim</author><author>Yusu Wang</author><author>Wei Chen</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[High-entropy alloys (HEAs) are an important material class in the development of next-generation structural materials, but the astronomically large composition space cannot be efficiently explored by experiments or first-principles calculations. Machine learning (ML) methods might address this challenge, but ML of HEAs has been hindered by the scarcity of HEA property data. In this work, the EMTO-CPA method was used to generate a large HEA dataset (spanning a composition space of 14 elements) containing 7086 cubic HEA structures with structural properties, 1911 of which have the complete elastic tensor calculated. The elastic property dataset was used to train an ML model with the Deep Sets architecture. The Deep Sets model has better predictive performance and generalizability compared to other ML models. Association rule mining was applied to the model predictions to describe the compositional dependence of HEA elastic properties and to demonstrate the potential for data-driven alloy design.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>INTRODUCTION</head><p>High-entropy alloys (HEAs) are a new class of multi-principal element materials with diverse and fascinating structure-property relationships <ref type="bibr">1</ref>. The term "high entropy" was coined based on the idea that a single solid-solution phase can be stabilized with a high configurational entropy associated with the random mixing of multiple elements at similar atomic fractions <ref type="bibr">2</ref>. Numerous studies have revealed a wide variety of structural and functional properties <ref type="bibr">[2]</ref><ref type="bibr">[3]</ref><ref type="bibr">[4]</ref><ref type="bibr">[5]</ref><ref type="bibr">[6]</ref><ref type="bibr">[7]</ref> in this large space of materials, ranging from cryogenic ductility <ref type="bibr">8</ref>, high strength <ref type="bibr">9,</ref><ref type="bibr">10</ref>, and corrosion resistance <ref type="bibr">11,</ref><ref type="bibr">12</ref> to excellent wear behavior <ref type="bibr">13</ref> and thermoelectric properties <ref type="bibr">14</ref>. High-entropy materials represent a fast-growing field in materials research, covering both alloys and ceramics such as high-entropy nitrides, carbides, and oxides <ref type="bibr">15,</ref><ref type="bibr">16</ref>, and have many important applications such as protective coatings and energy storage <ref type="bibr">17</ref>.</p><p>Designing an HEA often involves painstaking experimental and computational studies using a trial-and-error approach to explore the astronomical composition space. The experimental approach involves expensive and time-consuming processes of synthesis, characterization, and analysis, which often cover only a few candidate compositions at a time. Even with the explosive growth of computing power, high-throughput density functional theory (DFT) calculations are still incapable of computing the complete multi-element alloy space. 
Empirical models that predict the general trends in the multi-principal element space are instrumental in accelerating the exploration of the HEA composition space. For example, Senkov et al. performed high-throughput CALPHAD calculations and concluded that more elements do not necessarily stabilize solid-solution phases in HEAs <ref type="bibr">18</ref>. Lederer et al. proposed the "LTVC" model that can accurately predict the transition temperatures of solid-solution HEAs <ref type="bibr">19</ref>. By employing state-of-the-art machine learning (ML) algorithms, it is possible to explore the high-dimensional composition space much more efficiently <ref type="bibr">20</ref>. However, the application of ML to HEA studies is often hindered by the scarcity of HEA property data, especially high-quality experimental data. Two recently compiled experimental HEA datasets contain the phase composition for 401 HEAs <ref type="bibr">4</ref> and mechanical properties for 630 HEAs <ref type="bibr">21</ref>. These datasets have enabled the development of predictive models of the phase selection rules by ML methods such as artificial neural networks <ref type="bibr">22</ref> and Gaussian process classification <ref type="bibr">23</ref>. Nonetheless, ML models trained with small datasets usually do not generalize well. In many cases, researchers have to limit the scope of their ML models to specific alloy systems <ref type="bibr">24,</ref><ref type="bibr">25</ref>.</p><p>The goal of this study is to integrate high-throughput first-principles calculations and the Deep Sets architecture to understand the effects of elemental combinations on the HEA properties over a broad composition space. We choose the elastic properties of HEAs as a case study. Elasticity describes the resistance to deformation between atoms before yielding, providing a critical starting point to study the mechanical properties of HEAs <ref type="bibr">26</ref>. 
For instance, the ductility of an alloy can be estimated using Pugh's ratio, which can be tailored by doping HEAs <ref type="bibr">27</ref>. Perfect elastic isotropy has also been achieved in HEAs by composition design with the aim of controlling the deformation behavior <ref type="bibr">28,</ref><ref type="bibr">29</ref>. First-principles DFT calculations are predictive methods that can reliably compute the elastic constants of HEAs <ref type="bibr">28,</ref><ref type="bibr">[30]</ref><ref type="bibr">[31]</ref><ref type="bibr">[32]</ref>. DFT calculations with the coherent potential approximation (CPA) and supercell methods usually give similar results for the elastic moduli of disordered HEA systems <ref type="bibr">26</ref>. Our recent study on the elasticity of the Al 0.3 CoCrFeNi HEA also showed that first-principles and ML methods can give accurate predictions of HEA elasticity that are comparable to neutron diffraction measurements. Owing to the rapid development of high-throughput first-principles calculations <ref type="bibr">33</ref>, accurate elasticity data for ordered inorganic structures are readily available for data-driven materials research <ref type="bibr">34,</ref><ref type="bibr">35</ref>. However, similar datasets are not yet available for HEAs. Many studies still rely on estimations with Vegard's law to predict the HEA elastic modulus as a compositionally weighted average of elemental properties <ref type="bibr">36,</ref><ref type="bibr">37</ref>.</p><p>In this study, we quantitatively map the elastic properties of a 14-element HEA space with the integration of high-throughput first-principles calculations, deep learning, and association-rule analysis (Fig. <ref type="figure">1</ref>). We generate a dataset of 3579 quaternary HEA compositions from high-throughput DFT calculations with the exact muffin-tin orbitals and coherent potential approximation (EMTO-CPA) method <ref type="bibr">38,</ref><ref type="bibr">39</ref>. 
Recent progress in graph representation learning <ref type="bibr">[40]</ref><ref type="bibr">[41]</ref><ref type="bibr">[42]</ref><ref type="bibr">[43]</ref><ref type="bibr">[44]</ref> for link prediction <ref type="bibr">45</ref>, graph classification <ref type="bibr">46</ref>, physics simulation <ref type="bibr">47</ref>, and combinatorial optimization <ref type="bibr">48,</ref><ref type="bibr">49</ref> has produced a growing body of literature applying graph neural networks to property prediction for materials <ref type="bibr">50,</ref><ref type="bibr">51</ref> and molecules <ref type="bibr">52,</ref><ref type="bibr">53</ref>. One notable challenge when applying graph representation learning to HEAs is that they usually have a simple geometrical lattice but random elemental site occupancy. Representing HEAs as neighborhood graphs <ref type="bibr">50</ref> is inefficient because the final representation will be one specific instantiation of the underlying random configurations. Conventional ML architectures use either engineered elemental properties (e.g., different types of means) or compound properties as features. Directly using elemental properties as features introduces permutation variance in the model, making predictions dependent on the specific order of the elements in the feature vector. The feature engineering of elemental properties can also be tedious and inefficient. If there were data for a very large number of HEA configurations, a neural network would theoretically be able to capture the permutation invariance of elements. However, the amount of available material property data is often insufficient in reality.</p><p>To overcome this problem, we represent HEAs as sets of elements and employ the Deep Sets <ref type="bibr">54</ref> architecture for predicting elastic properties. Deep Sets is a recently developed deep learning architecture that can represent any permutation-invariant function over a set. 
Compared with other ML models, our Deep Sets models show superior predictive performance in the broad HEA space. We further perform association-rule analysis to understand the trends of elemental effects on the elastic properties of HEAs and leverage these insights on the composition design of HEAs with targeted properties. Our study showcases an efficient, accurate, and generalizable approach to study multi-element materials systems.</p></div>
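<div xmlns="http://www.tei-c.org/ns/1.0"><p>The set-based idea described above (a shared per-element map &#981;, a sum-pooling step, and a readout &#961;) can be illustrated with a minimal sketch. It is not the trained model from this work: the layer sizes, the random weights, the three-component element descriptors, and the choice to weight each embedding by its atomic fraction are all assumptions made only for illustration. The point of the sketch is that reordering the elements leaves the prediction unchanged.</p>
<eg><![CDATA[
```python
import math
import random

def dense(weights, bias, x):
    """Affine layer: one output per (row, bias) pair."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def relu(values):
    return [max(0.0, v) for v in values]

def random_layer(n_out, n_in, rng):
    """Random placeholder weights; a real model would learn these."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)]
               for _ in range(n_out)]
    bias = [rng.uniform(-1.0, 1.0) for _ in range(n_out)]
    return weights, bias

rng = random.Random(0)
N_FEATURES, N_HIDDEN = 3, 8                    # hypothetical sizes
PHI = random_layer(N_HIDDEN, N_FEATURES, rng)  # shared per-element map (phi)
RHO = random_layer(1, N_HIDDEN, rng)           # readout after pooling (rho)

def deep_sets_predict(alloy):
    """alloy: iterable of (atomic_fraction, feature_vector) pairs."""
    pooled = [0.0] * N_HIDDEN
    for fraction, features in alloy:
        embedded = relu(dense(PHI[0], PHI[1], features))
        # Fraction-weighted sum pooling (the weighting is an assumption).
        pooled = [p + fraction * e for p, e in zip(pooled, embedded)]
    return dense(RHO[0], RHO[1], pooled)[0]

# Hypothetical descriptors for four unnamed elements at equal fractions.
alloy = [
    (0.25, [1.0, 0.2, 0.5]),
    (0.25, [0.3, 0.9, 0.1]),
    (0.25, [0.7, 0.4, 0.8]),
    (0.25, [0.1, 0.6, 0.3]),
]
shuffled = [alloy[2], alloy[0], alloy[3], alloy[1]]

# The two orderings agree up to floating-point rounding.
print(math.isclose(deep_sets_predict(alloy), deep_sets_predict(shuffled),
                   rel_tol=1e-9, abs_tol=1e-12))
```
]]></eg>
<p>Because the pooling step is a sum, the element order only affects floating-point rounding, which is why the comparison uses a tolerance rather than exact equality.</p></div>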
<div xmlns="http://www.tei-c.org/ns/1.0"><head>RESULTS</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Validations of HEA property predictions from EMTO-CPA calculations</head><p>In total, the equation of state (EOS) and bulk modulus (B) of 7086 cubic quaternary HEA phases were successfully calculated, corresponding to 3579 compositions, 75 of which contained Ta. Because too few Ta-containing compositions were successfully calculated, Ta-containing compositions were not used in the training of the Deep Sets model. Of the 996 equimolar compositions completed, 962 were more stable in the body-centered cubic (BCC) phase. The elastic properties of all 996 compositions were calculated using the stable phase for each composition. It should be noted that there are 1001 possible quaternary equimolar compositions given 14 elements, but five compositions (AlCuMnNi, CoCuNiZr, AlCrMoV, CoCrNbNi, and CoCuFeHf) are missing from the dataset due to convergence issues. Of the 2508 non-equimolar compositions that were completed, 2331 compositions were more stable in the BCC phase. The cubic elastic constants were calculated for 840 non-equimolar compositions whose formation energy is lower than 0.15 eV/atom. To our knowledge, this is the largest dataset of HEA structures with calculated stability and elastic property information with 7086 structures and over 3579 compositions. Additionally, a set of 264 structures (132 compositions) was separately calculated for validation. The elastic properties, crystal structure information, and total energy of each structure are organized into JSON files using key-value pairs; Supplementary Table <ref type="table">1</ref> shows the labels used as keys and a description of the values.</p><p>The high-throughput EMTO-CPA results were validated by comparing with reported experimental and computational results in the literature. 
As shown in Supplementary Table <ref type="table">2</ref>, the EMTO-CPA predicted preferred cubic-phase type and lattice parameters agree well with reported experimental results <ref type="bibr">[55]</ref><ref type="bibr">[56]</ref><ref type="bibr">[57]</ref><ref type="bibr">[58]</ref><ref type="bibr">[59]</ref><ref type="bibr">[60]</ref><ref type="bibr">[61]</ref><ref type="bibr">[62]</ref><ref type="bibr">[63]</ref><ref type="bibr">[64]</ref>. EMTO-CPA gives the correct phase for all HEA systems and a small mean absolute error (MAE) of 1.1% for the lattice parameters.</p><p>Fig. <ref type="figure">1</ref> Data-driven workflow to map the elastic properties of the high-entropy alloy space. A dataset of the elastic properties of quaternary HEA compositions, containing elements from the set of the 14 highlighted elements, is created using high-throughput EMTO-CPA calculations. After training Deep Sets models on this dataset, association rule mining is used on Deep Sets model predictions to discover trends between elemental combinations and elastic properties. These trends are in the form of association rules which are then visualized using network graphs. Deep Sets neural network architecture explanation: Each element feature vector in the input set is transformed by the same mapping function, &#981;. The resulting vectors are summed in a pooling operation which ensures permutation invariance. Finally, the result of the pooling operation is passed to a MLP (multi-layer perceptron), &#961;, which maps the input to the prediction values.</p><p>In general, the elastic moduli predicted from DFT calculations are comparable to experiments, with an error smaller than 15% for most inorganic phases <ref type="bibr">34</ref>. 
For random alloys, EMTO-CPA calculations do not consider local lattice distortions, so they can slightly overestimate the elastic moduli relative to results from the supercell method, but the quantitative trends in elastic moduli are similar <ref type="bibr">31</ref>. Supplementary Table <ref type="table">3</ref>, Supplementary Table <ref type="table">4</ref>, and Supplementary Fig. <ref type="figure">1</ref> compare EMTO-CPA predicted HEA elastic properties from our high-throughput calculations with the literature <ref type="bibr">27,</ref><ref type="bibr">29,</ref><ref type="bibr">[65]</ref><ref type="bibr">[66]</ref><ref type="bibr">[67]</ref><ref type="bibr">[68]</ref><ref type="bibr">[69]</ref><ref type="bibr">[70]</ref><ref type="bibr">[71]</ref>. Good agreement is found between these EMTO-CPA results. For elastic constants C 11 and C 12, the MAEs are about 5%. C 44 shows a larger MAE of about 10%. The discrepancies can be attributed to different exchange-correlation functionals and numerical uncertainties. The MAEs for all polycrystalline elastic moduli are about 5%. For Poisson's ratio and Pugh's ratio, the MAEs are 1.8% and 4.0%, respectively. Owing to difficulties in measuring the elastic properties of HEAs, published experimental HEA elasticity data are scarce. Our results indicate that first-principles methods can be an efficient tool to generate reliable fundamental HEA data over a large composition space.</p><p>In the literature, Vegard's law, or the rule of mixture (ROM), is a popular method to estimate the elastic moduli of HEAs. Our EMTO-CPA dataset allows a quantitative assessment of the accuracy of ROM for HEAs. A ROM estimation of elastic moduli can be made using Eq. (1) <ref type="bibr">37</ref> or Eq. 
(<ref type="formula">2</ref>) <ref type="bibr">72</ref>, corresponding to the upper and lower limits of the estimation.</p><p>where c i is the molar fraction, V i is the molar volume, and M i is the elastic modulus of the ith element.</p><p>We used EMTO-CPA to calculate the equilibrium volume and bulk modulus of pure elements, which were then used to make ROM estimations of the Wigner-Seitz radius (sws) and bulk moduli of HEAs. Compared with other ROM estimations, we found the average value of Eqs. (<ref type="formula">1</ref>) and (<ref type="formula">2</ref>) gives slightly better agreement with the EMTO-CPA values, especially when Al or Ti is in the HEA. Figure <ref type="figure">2</ref> compares sws and B from the ROM estimations for quaternary equimolar HEAs with the EMTO-CPA predictions. The ROM estimation does not consider chemical interactions between different species, so a certain degree of discrepancy in B is expected, as is also seen in the ROM-estimated sws. However, the discrepancies in the ROM estimation of B are still quite significant. For many systems, the ROM estimates are significantly higher than the EMTO-CPA predicted B. Because local atomic relaxations are disregarded, the EMTO-CPA elastic moduli are already overestimated. When this factor is taken into account, the ROM estimation can overestimate the true HEA elastic modulus by a large fraction. While it is convenient to use ROM to estimate the elastic properties <ref type="bibr">36,</ref><ref type="bibr">37</ref>, our results show that reliable predictions should be made with DFT calculations or data-driven predictive modeling.</p></div>
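<div xmlns="http://www.tei-c.org/ns/1.0"><p>Equations (1) and (2) are not reproduced in this text, so the following is only a sketch of how upper and lower ROM limits of this kind are commonly computed. It assumes the volume-fraction-weighted arithmetic mean (Voigt-type upper limit) and harmonic mean (Reuss-type lower limit), built from the symbols c i, V i, and M i defined above; the function name and the numerical inputs are hypothetical placeholders, not values from the dataset.</p>
<eg><![CDATA[
```python
def rom_bounds(fractions, volumes, moduli):
    """Return (lower, upper, average) ROM estimates of an elastic modulus.

    Assumed forms (not confirmed by the source text):
      upper = sum(c_i V_i M_i) / sum(c_i V_i)      arithmetic, Voigt-type
      lower = sum(c_i V_i) / sum(c_i V_i / M_i)    harmonic, Reuss-type
    """
    weights = [c * v for c, v in zip(fractions, volumes)]
    total = sum(weights)
    upper = sum(w * m for w, m in zip(weights, moduli)) / total
    lower = total / sum(w / m for w, m in zip(weights, moduli))
    return lower, upper, 0.5 * (lower + upper)

# Placeholder inputs for an equimolar quaternary alloy (illustrative only).
fractions = [0.25, 0.25, 0.25, 0.25]
volumes = [10.0, 12.0, 11.0, 16.0]       # molar volumes, arbitrary units
moduli = [100.0, 200.0, 150.0, 80.0]     # elemental moduli, GPa

lower, upper, average = rom_bounds(fractions, volumes, moduli)
print(f"lower={lower:.1f} GPa, upper={upper:.1f} GPa, average={average:.1f} GPa")
```
]]></eg>
<p>The average of the two limits corresponds to the estimate that the text reports as agreeing slightly better with the EMTO-CPA values.</p></div>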
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Performance of Deep Sets prediction of HEA elastic properties</head><p>Tables <ref type="table">1</ref> and <ref type="table">2</ref> compare the performance of HEA property predictions between Deep Sets and other ML models. Regardless of the model chosen, the prediction MAE for non-equimolar quaternary HEAs is slightly larger than for equimolar systems, indicating that predicting properties for non-equimolar systems is more challenging. For all properties, simple models such as k-nearest neighbor (KNN) and linear regression (LR) generally perform much worse than random forest (RF), support vector machine (SVM), gradient boosting tree (GBT), and Deep Sets models. Among the ML models, the Deep Sets model shows superior predictive performance.</p><p>Lastly, we tested the generalization performance of our Deep Sets model on a widely studied HEA system, Al x CoCrFeNi. Table <ref type="table">3</ref> shows a comparison between the Deep Sets predictions and the EMTO-CPA calculations reported in the literature <ref type="bibr">65</ref>. The result shows a very encouraging agreement, especially considering that the Deep Sets model was not trained on any quinary HEA composition.</p><p>Fig. <ref type="figure">3</ref> Boxplots comparing the accuracy of four models (GBT, RF, KNN, and Deep Sets) predicting (from top to bottom, left to right) lattice constant, bulk modulus, and elastic constants C 11 , C 12 , and C 44 . The orange line in the boxplot represents the median value; the lower and upper limits of the box represent the 25th and 75th percentiles respectively, and the whiskers extend to 1.5 times the interquartile range. The color of the scatterplot points corresponds to the colorbar which represents the percentage deviation of the predicted value from the EMTO-CPA calculated value. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Elemental effects on HEA elasticity uncovered from association rule mining</head><p>The rules generated by association rule mining (ARM) are visualized in graph representations in Fig. <ref type="figure">4</ref>. The nodes represent element types. The size of a node represents the element fraction: the larger the node, the larger the fraction it represents. When a pair of nodes is connected by a line, referred to as an edge, the edge color and width represent the consequent and lift value of the association rule in which the pair of nodes makes up the antecedent. When the antecedent of an association rule is a single element, the elastic property consequent and lift value of the rule are represented by the color and width of the node outline. The redder (bluer) the color of the edge or node outline, the higher (lower) the value of the elastic property consequent. The node outlines and connections for Fig. <ref type="figure">4</ref>(f) are mapped to a different colorbar to emphasize rules that predict Zener ratios close to 1.0 for the isotropic case. Zener ratios lower than 1.0 are blue, those close to 1.0 are green, and those that are greater are mapped to yellow, orange, and red. From Fig. <ref type="figure">4</ref>(a) it is observed that most element pairs that decrease (increase) B involve elements with low (high) elemental B. For reference, the elements ranked from the lowest to the highest B (obtained for BCC using EMTO-CPA) are Al, Zr, Ti, Fe, Hf, Cu, Nb, Co, V, Ni, Mn, Mo, Cr, and W. However, it is observed that while elemental Mn has a relatively high B, Mn can lower B in an HEA on its own as well as when combined with Zr or Hf. Cr also has high B, but it decreases B when paired with Zr or Hf. These trends show that expectations based on ROM are sometimes contradicted and that the outcome depends on the specific combination. B is related to the bonding strength among atoms. 
W may be expected to have an attractive force on the other elements due to its high electronegativity, thus increasing B when paired with the 3d transition metals. The decrease in B that accompanies the (Mn, Zr), (Mn, Hf), (Cr, Zr), and (Cr, Hf) combinations may be due to the low B of Zr and Hf, and the half-filled 3d orbitals of Mn and Cr. A composition containing such a combination would have an element with low B (Hf or Zr) as well as an element (Cr or Mn) that is less likely to be attracted to other constituent elements. In the case of Cr, high B can still be achieved when it is paired with an element with very high electronegativity such as W.</p><p>Figure <ref type="figure">4</ref>(b) shows that Young's modulus, E, is decreased when Zr is combined with almost any other element. Even a small fraction of W results in lowered Young's modulus with Zr. Only when the composition contains Cr or a larger fraction of W is an increase in E predicted with Zr. E is related to bulk and shear moduli assuming a cubic symmetry and elastic isotropy (Eq. 14). W and Cr are expected to increase both B and G, so an increase in E due to W and Cr alloying is expected. An increase in E may be attributed to increased B and/or G which is in turn due to mechanisms such as an increased interatomic attraction, or an increase in the covalent characteristic of the bonding that results in stiffer bonds. Some differences between the figures for B and E are that a combination of Zr with Ni, V, or W can decrease E, but not B, while a combination of Zr and Cr can decrease B, but not E. These trends demonstrate the potential of tuning elastic properties by predictive models such as the Deep Sets model and descriptive models such as ARM.</p><p>Figure <ref type="figure">4</ref>(c) shows that Zr combined with Cu, Al, Hf, or Ni, and Hf combined with Cu, Al, or Ni lower G. On the other hand, Cr or W alone increases G. 
Notably, combinations involving Mn, which are present in the plots for E and B, are missing in the plot for G. This may be attributed to the half-filled 3d orbitals of Mn diminishing the effect Mn has on the characteristic nature of interatomic bonding, which in turn means that Mn does not affect G as much in the systems we considered.</p><p>Figure <ref type="figure">4</ref>(d) represents the association rules for Pugh's ratio B/G. Most notably, combinations of Cu and Ni increase B/G. Pugh's condition predicts that when the ratio B/G is greater than 1.75, the material will be ductile. A related criterion known as Pettifor's criterion states that when the Cauchy pressure C 12 &#8722; C 44 (Eq. 3) is positive, the bonding is predicted to be metallic in character and the material is predicted to be intrinsically ductile. Conversely, a negative Cauchy pressure suggests that the bonding is covalent in character and the material is predicted to be intrinsically brittle <ref type="bibr">75</ref>.</p><p><formula notation="TeX">C_{12} - C_{44} = B - \frac{5}{3} G_V \qquad (3)</formula></p><p>where G V is the Voigt-averaged shear modulus and B is the bulk modulus <ref type="bibr">75</ref>. The elements involved in combinations that increase B/G, such as Cu, Ni, and Co, are late transition metals with greater valence electron concentration (VEC). Barring strong interatomic interactions that cause directional bonding with angular characteristics, the increased electron density due to the increased VEC might be expected to increase the metallic character of interatomic bonding. Poisson's ratio, which is represented in Fig. <ref type="figure">4</ref>(e), shows mostly similar trends to Pugh's ratio, with a few differences. This observation is expected since Poisson's ratio can be expressed in terms of Pugh's ratio, <formula notation="TeX">\nu = \frac{3(B/G) - 2}{6(B/G) + 2}</formula>. Therefore, the same physics that underlies the B/G trends might be applicable to Poisson's ratio. Figure <ref type="figure">4</ref>(f) shows that combinations of V, W, and Cr are predicted to produce near-isotropic Zener ratios. 
Senkov and Miracle derived a modified expression for Pugh's condition to take elastic anisotropy into account (Eq. 4) <ref type="bibr">75</ref>:</p><p>Interestingly, from Eq. (<ref type="formula">4</ref>), increasing the elastic anisotropy also increases the Pugh's ratio threshold for ductile materials, which means that the more elastically anisotropic a material is, the greater the Pugh's ratio needed for the material to be ductile. This observation was confirmed with an analysis of 308 intermetallic compounds and 24 metals <ref type="bibr">75</ref>. Relating this result to the rules in Fig. <ref type="figure">4</ref>(f), compositions containing Al and Cu will need larger Pugh's ratios to satisfy the modified Pugh's condition, but as seen in Fig. <ref type="figure">4</ref>(d), Cu does increase the Pugh's ratio, as does Al combined with Zr.</p></div>
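<div xmlns="http://www.tei-c.org/ns/1.0"><p>The ductility indicators discussed in this section can be evaluated from a handful of elastic quantities. The sketch below applies Pugh's condition (B/G greater than 1.75), the sign of the Cauchy pressure, and the standard isotropic relation between Poisson's ratio and B/G. The function names and the input values are hypothetical and chosen for illustration; the anisotropy-modified condition of Eq. (4) is not included because its form is not reproduced in this text.</p>
<eg><![CDATA[
```python
PUGH_THRESHOLD = 1.75  # ductility threshold for B/G quoted in the text

def poisson_from_pugh(pugh):
    """Isotropic relation nu = (3k - 2) / (6k + 2), with k = B/G."""
    return (3.0 * pugh - 2.0) / (6.0 * pugh + 2.0)

def ductility_indicators(bulk, shear, c12, c44):
    """Collect the simple ductility indicators for one composition (GPa)."""
    pugh = bulk / shear
    cauchy_pressure = c12 - c44
    return {
        "pugh_ratio": pugh,
        "poisson_ratio": poisson_from_pugh(pugh),
        "cauchy_pressure": cauchy_pressure,
        "ductile_by_pugh": pugh > PUGH_THRESHOLD,
        "ductile_by_pettifor": cauchy_pressure > 0.0,
    }

# Placeholder elastic data (GPa), not taken from the dataset.
result = ductility_indicators(bulk=175.0, shear=80.0, c12=150.0, c44=100.0)
for name, value in result.items():
    print(name, value)
```
]]></eg>
<p>At the threshold B/G = 1.75 the relation gives a Poisson's ratio of 0.26, which is why the two quantities show closely related trends.</p></div>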
<div xmlns="http://www.tei-c.org/ns/1.0"><head>DISCUSSION</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Discovery of HEA compositions with tailored elasticity</head><p>Predictions from the Deep Sets model can be used to screen the HEA space in search of compositions with desired elastic properties. In combination with the association rules, one can further explain the predicted properties in the context of the general property trends in the whole composition space. However, we note that the predictive power of any ML model is limited by the training data. For example, we only consider single cubic phase solid solutions in this study. Additional thermodynamic modeling or experimental studies are necessary to determine the phase selection of the compositions.</p></div>
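<div xmlns="http://www.tei-c.org/ns/1.0"><p>A screening pass of the kind described above can be sketched as an enumeration of candidate compositions followed by a filter on predicted properties. The example below enumerates all equimolar quaternary combinations of the 14 elements and keeps those that meet a Pugh's ratio threshold and a minimum Young's modulus. The predictor used here is a deliberately meaningless stand-in so that the example runs on its own; in practice it would be replaced by a trained model. The function names, the dictionary keys, and the 150 GPa cutoff are assumptions.</p>
<eg><![CDATA[
```python
from itertools import combinations

ELEMENTS = ["Al", "Co", "Cr", "Cu", "Fe", "Hf", "Mn",
            "Mo", "Nb", "Ni", "Ti", "V", "W", "Zr"]

def equimolar_quaternaries(elements):
    """All unordered 4-element combinations, each as a sorted tuple."""
    return list(combinations(sorted(elements), 4))

def screen(candidates, predict, min_pugh=1.75, min_youngs=150.0):
    """Keep candidates whose predicted B/G and E pass both thresholds."""
    hits = []
    for composition in candidates:
        props = predict(composition)
        if props["B_over_G"] > min_pugh and props["E"] >= min_youngs:
            hits.append((composition, props))
    # Highest predicted Young's modulus first.
    return sorted(hits, key=lambda hit: hit[1]["E"], reverse=True)

def toy_predict(composition):
    """Stand-in predictor with NO physical meaning (illustration only)."""
    score = sum(ord(ch) for name in composition for ch in name)
    return {"E": 100.0 + score % 120, "B_over_G": 1.2 + (score % 17) / 10.0}

candidates = equimolar_quaternaries(ELEMENTS)
hits = screen(candidates, toy_predict)
print(len(candidates), "candidates,", len(hits), "pass the toy screen")
```
]]></eg>
<p>The candidate count produced by the enumeration matches the 1001 possible equimolar quaternary compositions noted earlier for 14 elements.</p></div>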
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Ductile HEAs with high Young's moduli</head><p>One of the most desirable property combinations of structural HEAs is high Young's modulus (or bulk modulus), high-temperature softening resistance, good ductility, and low density. An example of a refractory HEA with low density, high ductility, and temperature resistance reported in the literature is Nb 40 Ti 25 Al 15 V 10 Ta 5 Hf 3 W 2 <ref type="bibr">76</ref>. However, this HEA is a two-phase alloy with a BCC matrix and B2 nanoprecipitates. With our materials screening, it may be possible to discover a single-phase solid solution with comparable properties. When a system's Pugh's ratio B/G is greater than 1.75, the system can be considered ductile. Figure <ref type="figure">5</ref> visualizes the trends of B/G vs. E for the 14-element quaternary HEA space. Detailed screening results are in Supplementary Table <ref type="table">5</ref>. B/G generally decreases with E, indicating that high-E HEAs usually do not have good ductility. The inset of Fig. <ref type="figure">5</ref> highlights a region where a balance of high B/G and E can be achieved. In particular, the CrMnTiV system can have high Young's moduli and good ductility. The pale blue edges for (Cr, Mn) and (Mn, V) pairs in Fig. <ref type="figure">4</ref>(d) indicate these element pairs tend to produce intermediate to low Pugh's ratios, which is consistent with the screening. The Pugh's ratio of CrMnTiV also has a similar value of 2.530 from DFT calculations using a 64-atom special quasirandom structure. There is no publication reporting the experimental synthesis and elastic constants of the CrMnTiV HEA. Senkov et al. reported that CALPHAD modeling predicts the equimolar system CrMnTiV to be a single BCC phase <ref type="bibr">18</ref>, but Lederer et al. indicated the phase composition as inconclusive using their "LTVC" approach <ref type="bibr">19</ref>.</p><p>Fig. 
<ref type="figure">4</ref> Graph representations of association rules between elements and elastic properties of HEAs. Results for (a) bulk modulus, (b) Young's modulus, (c) shear modulus, (d) Pugh's ratio, (e) Poisson's ratio, and (f) Zener ratio, respectively. Node colors and sizes represent different elements (as shown in the legend) and fractions. The redder (bluer) the color of the node outlines and connections, the higher (lower) the value of the elastic property is predicted to be. The thicker the node outline or connection, the higher the lift value of the rule. The node outlines and connections for the Zener ratio are mapped to a separate color bar to emphasize rules that predict Zener ratios that are close to 1.0 for the isotropic case.</p></div>
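<div xmlns="http://www.tei-c.org/ns/1.0"><p>The lift value that sets the line widths in these graphs is a standard association-rule metric: the confidence of a rule divided by the overall frequency of its consequent. The sketch below computes support, confidence, and lift for one rule using the textbook definitions. The transactions are invented for illustration, and the item labels (such as "B=high") are a hypothetical encoding rather than the discretization used in this work.</p>
<eg><![CDATA[
```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = frozenset(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)

def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, and lift of the rule antecedent -> consequent."""
    s_antecedent = support(transactions, antecedent)
    s_consequent = support(transactions, consequent)
    if s_antecedent == 0.0 or s_consequent == 0.0:
        raise ValueError("antecedent and consequent must both occur")
    s_both = support(transactions, set(antecedent) | set(consequent))
    confidence = s_both / s_antecedent
    return {
        "support": s_both,
        "confidence": confidence,
        "lift": confidence / s_consequent,
    }

# Invented toy data: each transaction lists elements plus a property bin.
transactions = [
    frozenset({"Cr", "W", "B=high"}),
    frozenset({"Cr", "W", "B=high"}),
    frozenset({"Zr", "Hf", "B=low"}),
    frozenset({"Cr", "Zr", "B=low"}),
]
metrics = rule_metrics(transactions, {"Cr", "W"}, {"B=high"})
print(metrics)
```
]]></eg>
<p>A lift above 1 means the consequent occurs more often with the antecedent than it does overall, which is what a thick edge or node outline conveys in the figure.</p></div>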
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Elastically isotropic HEAs with low Young's moduli</head><p>Another group of interesting HEAs is elastically isotropic systems with low Young's moduli. These HEAs can potentially be used in biomedical applications <ref type="bibr">77</ref>. The inset of Fig. <ref type="figure">6</ref> shows a composition window that satisfies these conditions. Detailed screening results are in Supplementary Table <ref type="table">6</ref>. The (Cr, V) pair identified as related to elastically isotropic systems from ARM in Fig. <ref type="figure">4</ref>(f) also appears in HEAs within the composition window. From the screening, we notice that the densities of most HEAs in the composition window are relatively low, which can be another advantage for biomedical applications. The EMTO-CPA method usually overestimates the elastic moduli, suggesting that the selected compositions may have even lower E in real-world applications.</p><p>In summary, we present an efficient, generalizable, and accurate Deep Sets model that can predict the energetic, structural, and elastic properties of HEA compositions. To our knowledge, the Deep Sets model was trained on the largest dataset of first-principles HEA elastic properties. The present work also analyzed the elastic properties predicted by our Deep Sets model using ARM to demonstrate correlations between compositional trends and properties that may assist the ultimate goal of alloy design. Elasticity is the underlying drive for mechanical responses, and the huge composition space that HEAs span gives rise to the potential for tuning elastic properties for targeted applications. Effective alloy design must also be efficient, and our methodology represents a step forward in improving our ability to design HEAs.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>METHODS</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>High-throughput DFT calculations</head><p>High-throughput DFT calculations were performed using the EMTO-CPA method to predict the stability and elastic constants of selected HEA compositions. Fourteen elements (Al, Co, Cr, Cu, Fe, Hf, Mn, Mo, Nb, Ni, Ti, V, W, and Zr) were selected to create a large composition space of elements commonly found in 3d transition metal and refractory HEAs. All possible quaternary equimolar compositions of the 14 elements and more than 2000 sampled quaternary non-equimolar compositions of these elements were calculated. Non-equimolar compositions that are close to the center of the composition space, i.e., A 1 B x C y D z (0.6 &#8804; x, y, z &#8804; 1), were sampled more frequently. For each composition, we considered the face-centered cubic (FCC) and BCC random solid-solution phases for calculations. While realistic HEAs can crystallize as multi-phase alloys and with different lattices, the focus of the study is to understand the trends of elemental combinations on the properties of single-phase random alloys. The high-throughput workflow can be extended with additional DFT calculations and CALPHAD modeling to treat structurally complex HEAs <ref type="bibr">78</ref>.</p><p>Two sets of EMTO-CPA calculations were performed for each HEA composition. First, the EOS was calculated to determine the relative phase stability and the lattice parameters of the HEA. At least fifteen volumes were dynamically generated for both lattices, and the total energy was calculated at each volume <ref type="bibr">35</ref>. The energy, E, vs. volume, V, pairs were used to fit the Birch-Murnaghan EOS <ref type="bibr">79</ref> in Eq. (<ref type="formula">5</ref>), where the subscript 0 represents the equilibrium condition and B 0 ' is the derivative of the bulk modulus with respect to pressure. 
The bulk modulus $B_0$ was obtained from the EOS.</p><p>Next, EMTO-CPA calculations were performed to predict the elastic constants of the lower-energy cubic phase at its equilibrium volume. For non-equimolar compositions, the complete set of cubic elastic constants was calculated only for relatively stable compositions with formation energy &lt; 0.15 eV/atom. The elastic constants were found by fitting the energy change as a function of externally applied lattice deformations. The energy change with the orthorhombic deformation (Eq. 6) gives the tetragonal shear modulus $c'$ (Eq. 7). The elastic constants $C_{11}$ and $C_{12}$ can be derived using Eqs. (<ref type="formula">8</ref>) and (9). The energy change with the monoclinic deformation (Eq. 10) gives the elastic constant $C_{44}$ (Eq. 11). The deformation was applied to the equilibrium cell in three steps, $\delta_o$ (or $\delta_m$) = 0.00, 0.03, and 0.05.</p><p>The polycrystalline elastic moduli were obtained from the calculated cubic elastic constants. The Voigt and Reuss bounds of the polycrystalline bulk modulus of a cubic structure are the same, $B_R = B_V = B_0$ <ref type="bibr">80</ref>. We used the arithmetic Hill average for the polycrystalline shear modulus, $G = (G_R + G_V)/2$,</p><p>where $G_R$ is the Reuss bound (Eq. 12) and $G_V$ is the Voigt bound (Eq. 13). The Young's modulus E and Poisson's ratio &#957; were obtained from Eqs. (<ref type="formula">14</ref>) and (15). The Zener ratio $A_z$ (Eq. 16) was employed to assess the elastic anisotropy. When $A_z = 1$, the crystal lattice is elastically isotropic.</p><p>For all DFT calculations, the exchange-correlation energy was defined by the generalized gradient approximation (GGA) in the Perdew-Burke-Ernzerhof (PBE) parameterization <ref type="bibr">81</ref> with the full charge density (FCD) technique <ref type="bibr">39,</ref><ref type="bibr">82,</ref><ref type="bibr">83</ref>. The Monkhorst-Pack k-point grid was set to 17 &#215; 17 &#215; 17 and the energy was converged to $10^{-6}$ eV/atom. 
The screened impurity model parameter of CPA was 0.6 with the soft-core approximation. The magnetic moment was initialized as ferromagnetic. We developed a workflow based on pyEMTO <ref type="bibr">84</ref> and used it to perform and analyze more than 160,000 high-throughput EMTO-CPA calculations <ref type="bibr">85</ref>. The complete dataset is included in the supplementary information. Due to numerical issues, a small fraction of the calculations (especially for compositions with Ta) did not converge to the required accuracy. Results from all successful calculations are included in the reported dataset, but our ML modeling and analysis do not include data with Ta to avoid introducing bias.</p></div>
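<div xmlns="http://www.tei-c.org/ns/1.0"><p>The post-processing chain described above, from the fitted bulk modulus and shear constants to the polycrystalline moduli, can be sketched as follows. This is a minimal illustration that assumes the standard textbook relations for cubic crystals; the numbered equations of the paper are not reproduced in this text and may be written differently. The function names and the input values are hypothetical.</p>

```python
def constants_from_fits(bulk, c_prime, c44):
    """Recover C11 and C12 from B and c', assuming the standard cubic relations
    B = (C11 + 2*C12)/3 and c' = (C11 - C12)/2."""
    c11 = bulk + 4.0 * c_prime / 3.0
    c12 = bulk - 2.0 * c_prime / 3.0
    return c11, c12, c44


def cubic_polycrystal_moduli(c11, c12, c44):
    """Hill-averaged polycrystalline moduli of a cubic crystal (units follow the inputs)."""
    bulk = (c11 + 2.0 * c12) / 3.0  # Voigt and Reuss bulk bounds coincide for cubic symmetry
    c_prime = (c11 - c12) / 2.0     # tetragonal shear modulus c'
    g_voigt = (2.0 * c_prime + 3.0 * c44) / 5.0
    g_reuss = 5.0 * c_prime * c44 / (2.0 * c44 + 3.0 * c_prime)
    shear = 0.5 * (g_voigt + g_reuss)  # arithmetic Hill average
    young = 9.0 * bulk * shear / (3.0 * bulk + shear)
    poisson = (3.0 * bulk - 2.0 * shear) / (2.0 * (3.0 * bulk + shear))
    zener = c44 / c_prime  # same as 2*C44 / (C11 - C12); 1 means elastically isotropic
    return {"B": bulk, "G": shear, "E": young, "nu": poisson,
            "A_z": zener, "B/G": bulk / shear}


# Hypothetical inputs in GPa, chosen only to exercise the formulas.
c11, c12, c44 = constants_from_fits(bulk=150.0, c_prime=40.0, c44=60.0)
moduli = cubic_polycrystal_moduli(c11, c12, c44)
```

<p>Screening for elastically isotropic, low-modulus alloys then amounts to filtering on the returned A_z and E values.</p></div>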
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Deep sets model</head><p>The main idea of Deep Sets is that any permutation-invariant function f over a set X (e.g., the set of elements x in an equimolar HEA) can be represented as $\rho\left(\sum_{x \in X} \phi(x)\right)$, where $\rho$ and $\phi$ are two functions that are usually parametrized by deep neural networks. In this study, we used an MLP (multilayer perceptron) with the ELU (exponential linear unit) activation function for $\rho$ and an MLP with ReLU (rectified linear unit) for $\phi$. In particular, $\mathrm{ELU}(x) = x$ for $x &gt; 0$ and $\mathrm{ELU}(x) = \alpha(e^{x} - 1)$ for $x \le 0$, where $\alpha$ was set as 1, and $\mathrm{ReLU}(x) = \max(0, x)$. We used the Adam optimizer <ref type="bibr">86</ref> with batch size 32, learning rate $10^{-3}$, and weight decay rate $10^{-4}$. In the case of quaternary HEAs with non-equimolar ratios, we incorporated the weights by replacing $\rho\left(\sum_{x \in X} \phi(x)\right)$ with $\rho\left(\sum_{x \in X} \phi(w_x x)\right)$, where $w_x$ stands for the weight of element x in the HEA. Since we consider HEAs as a collection of atoms randomly decorated on a lattice, it is natural to represent HEAs as a set of weighted atoms, with weights representing the elemental concentration.</p><p>We predicted the following properties using the Deep Sets model: sws, lattice parameter (a), elastic constants ($C_{11}$, $C_{12}$, $C_{44}$), B, G, E, &#957;, and B/G. Each quaternary HEA system was represented as a set of size four, $\{u_1, u_2, u_3, u_4\}$, where $u_i$ is the feature vector for an element of the HEA. Elemental properties were encoded in $u_i$ using one-hot encoding <ref type="bibr">50</ref>. Table <ref type="table">4</ref> lists the elemental features we selected to train the model. For discrete properties, the property values were encoded according to the category the value belongs to; for continuous properties, the range of property values was evenly divided into 10 categories and the vectors were encoded accordingly. 
For instance, if we use the group number and period number as the elemental features, the atom feature vector for H will be a 27-dimensional vector with the 1st and 19th elements being 1 and the other elements being 0. If the atomic radius is 0.7 pm, then the atomic size feature vector will be a 10-dimensional vector with the 1st element being 1 and the other elements being 0. The training data consisted of EMTO-CPA computed property data for 1911 equimolar and non-equimolar quaternary HEAs with complete cubic elastic constants. We randomly split the dataset into training, validation, and test sets in a 60%/20%/20% ratio. We repeated the experiments 10 times and report the average MAE and the standard deviation. We compared our Deep Sets model with KNN, LR, RF, GBT, and SVM models. Table <ref type="table">5</ref> lists the hyper-parameter space that was explored for each algorithm. We first used the training set that contains the features listed in Table <ref type="table">4</ref> along with the stable lattice type (FCC or BCC) as an optional feature.</p><p>After training, we applied the Deep Sets model to predict the elastic properties of 369,369 HEA compositions for further analysis. This larger composition pool contains all quaternary compositions $A_1B_xC_yD_z$ ($0.6 \le x, y, z \le 1$ in increments of 0.1) of the 14 elements. Because the relative cubic phase stability is unknown for most compositions in the space, we used a Deep Sets model that does not include the stable lattice type for property prediction.</p></div>
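<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the weighted Deep Sets form concrete, the sketch below evaluates rho(sum over x of phi(w_x x)) in plain Python and checks that reordering the elements leaves the output unchanged. It is an illustration only, not the trained model: the layer sizes, the random untrained weights, the single-layer phi, and the example alloy are all assumptions, and only the group and period one-hot features from the worked example above are used.</p>

```python
import math
import random


def one_hot_group_period(group, period):
    """27-dimensional one-hot vector: 18 slots for the group, 9 for the period."""
    vec = [0.0] * 27
    vec[group - 1] = 1.0
    vec[18 + period - 1] = 1.0
    return vec


def init_layer(rng, n_out, n_in):
    """Random (untrained) weights and zero biases for one dense layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(layer, vec):
    """Affine map W @ vec + b written with plain lists."""
    weights, bias = layer
    return [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weights, bias)]


def relu(vec):
    return [max(0.0, x) for x in vec]


def elu(vec, alpha=1.0):
    return [x if x > 0.0 else alpha * (math.exp(x) - 1.0) for x in vec]


def deep_sets(features, fractions, phi_layer, rho_hidden, rho_out):
    """Evaluate rho(sum_x phi(w_x * x)) for one alloy."""
    pooled = [0.0] * len(phi_layer[1])
    for vec, w in zip(features, fractions):
        encoded = relu(dense(phi_layer, [w * x for x in vec]))  # phi with ReLU
        pooled = [p + e for p, e in zip(pooled, encoded)]        # order-independent sum
    return dense(rho_out, elu(dense(rho_hidden, pooled)))[0]     # rho with ELU


rng = random.Random(0)
phi_layer = init_layer(rng, 16, 27)   # hypothetical layer sizes
rho_hidden = init_layer(rng, 8, 16)
rho_out = init_layer(rng, 1, 8)

# Hypothetical Cr-V-Ti-Al alloy given as (group, period) pairs and atomic fractions.
features = [one_hot_group_period(g, p) for g, p in [(6, 4), (5, 4), (4, 4), (13, 3)]]
fractions = [0.30, 0.25, 0.25, 0.20]

prediction = deep_sets(features, fractions, phi_layer, rho_hidden, rho_out)
shuffled = deep_sets(features[::-1], fractions[::-1], phi_layer, rho_hidden, rho_out)
```

<p>Because the element encodings are summed before rho is applied, prediction and shuffled agree up to floating-point rounding, which is the property that makes the set representation suitable for random alloys.</p></div>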
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Association rule mining</head><p>ARM was used to identify meaningful correlations between the elemental combinations and the HEA elastic properties predicted by the Deep Sets model. ARM is a descriptive data mining method for discovering underlying relationships between different items in a dataset <ref type="bibr">87</ref>. In this study, the items are element fractions and target properties. The relationships or associations are represented by If-Then rules consisting of an antecedent and a consequent. For example, the rule shown below says: If an HEA composition contains low fractions of elements A and B, then the target elastic constant will have a relatively high value.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>$X \Rightarrow Y$</head><p>$X$: low fractions of elements A and B; $Y$: high value predicted for the target elastic property.</p><p>The condition of low fractions of the A and B elements is the antecedent, X, which is correlated with the consequent, Y, the high value of the target property. While these rules represent correlations and do not imply causation, they are directional: that antecedent X leads to consequent Y does not necessarily mean that Y leads to X.</p><p>The first step in ARM is rule generation. Due to the sheer number of feature combinations that can make up both the antecedents and consequents, it is impossible to evaluate the entire combinatorial space of rules. Therefore, the Apriori algorithm was used to generate rules from frequently occurring item sets <ref type="bibr">88</ref>. The Apriori algorithm relies on the property that all subsets of a frequently occurring item set are also frequent. In other words, a rule with fewer conditions is more general and is satisfied at least as frequently. It follows that if an item set is infrequent, its supersets will also be infrequent.</p><p>The Apriori algorithm performs a breadth-first search of rules, starting with item sets with few items and adding items to the item set, while only considering item sets that exceed a threshold of support in the dataset. The support of an item set is defined as the frequency of the item set within the dataset. If an item set does not have enough support, none of its supersets are considered. After the frequent item sets are found, relevant rules are evaluated based on metrics such as support, confidence, and lift:</p><p>$\mathrm{support}(X \Rightarrow Y) = P(X \cap Y)$, $\mathrm{confidence}(X \Rightarrow Y) = P(Y \mid X) = P(X \cap Y)/P(X)$, $\mathrm{lift}(X \Rightarrow Y) = P(X \cap Y)/\left(P(X)\,P(Y)\right)$</p><p>Confidence can be used to find rules with high probability of the consequent Y conditional on the antecedent X. 
Confidence has the drawback that if the consequent is frequent, the confidence for the rule may be high even though no true relation exists between the antecedent and the consequent. Lift is the ratio of the probability of events X and Y co-occurring to the product of the probability of event X and the probability of event Y. In other words, lift is a metric that shows whether the co-occurrence of events X and Y is more frequent than would be expected if the two events were statistically independent. Lift does not suffer from the same drawback as confidence.</p><p>The ARM methodology works on datasets with binary data rather than the continuous data that the Deep Sets model predicts. The aim of our ARM analysis is to produce descriptive trends rather than to train a separate quantitative predictive model. Therefore, the continuous data was first discretized into 10 bins, which transformed the continuous dataset into an ordinal dataset <ref type="bibr">89</ref>. Then, the ordinal data was further transformed by one-hot encoding. The labels of the one-hot encoded data correspond to the rank of the ordinal data. For example, the one-hot encoded feature labels 'Cr0' and 'Cr9' correspond to low and high Cr fractions, respectively. A low support threshold of 0.006 was used to filter the item sets, to ensure that meaningful rules were not missed due to the relative rarity of an item set. A support of 0.006 corresponds to 2,216 occurrences in our dataset of 369,369 compositions. The rules were then filtered using a lift threshold of 2.4. This lift threshold was chosen to keep the number of rules manageable; higher lift thresholds can leave too few rules.</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_0"><p>npj Computational Materials (2022) 89 Published in partnership with the Shanghai Institute of Ceramics of the Chinese Academy of Sciences</p></note>
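<div xmlns="http://www.tei-c.org/ns/1.0"><p>The rule metrics and the support threshold discussed in the association rule mining section can be checked with a short sketch. The first function counts the prediction pool under one reading that reproduces the stated total: for each of the quaternary systems drawn from 14 elements, every ratio tuple on a 0.1 grid from 0.6 to 1 whose largest entry is 1. That reading, the function names, the toy transactions, and the label 'C44_9' are assumptions made for illustration; the study itself used the Apriori algorithm on the full one-hot encoded dataset.</p>

```python
from itertools import product
from math import comb


def pool_size(n_elements=14, tenths=(6, 7, 8, 9, 10)):
    """Count quaternary compositions whose largest ratio is 1 (ratios stored in tenths)."""
    top = max(tenths)
    ratios_per_system = sum(1 for r in product(tenths, repeat=4) if max(r) == top)
    return comb(n_elements, 4) * ratios_per_system


def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    wanted = set(items)
    return sum(1 for t in transactions if wanted.issubset(t)) / len(transactions)


def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, and lift of the rule antecedent => consequent."""
    s_xy = support(transactions, set(antecedent).union(consequent))
    s_x = support(transactions, antecedent)
    s_y = support(transactions, consequent)
    return {"support": s_xy, "confidence": s_xy / s_x, "lift": s_xy / (s_x * s_y)}


n_compositions = pool_size()
min_count = int(0.006 * n_compositions)  # occurrences implied by the support threshold

# Toy one-hot "transactions"; 'C44_9' is a hypothetical label for the top C44 bin.
toy = [
    {"Cr0", "V0", "C44_9"},
    {"Cr0", "V0", "C44_9"},
    {"Cr0", "V9", "C44_0"},
    {"Cr9", "V0", "C44_0"},
    {"Cr9", "V9", "C44_0"},
]
metrics = rule_metrics(toy, {"Cr0", "V0"}, {"C44_9"})
```

<p>Under these assumptions the pool contains 369,369 compositions and the threshold corresponds to 2,216 occurrences, matching the figures quoted in the text. For the toy rule, the confidence is 1.0 and the lift is 2.5, so the rule would pass a lift threshold of 2.4.</p></div>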
		</body>
		</text>
</TEI>
