<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>ZeoNet: 3D convolutional neural networks for predicting adsorption in nanoporous zeolites</title></titleStmt>
			<publicationStmt>
				<publisher>Royal Society of Chemistry</publisher>
				<date>08/22/2023</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10638745</idno>
					<idno type="doi">10.1039/D3TA01911J</idno>
					<title level='j'>Journal of Materials Chemistry A</title>
<idno>2050-7488</idno>
<biblScope unit="volume">11</biblScope>
<biblScope unit="issue">33</biblScope>					

					<author>Yachan Liu</author><author>Gustavo Perez</author><author>Zezhou Cheng</author><author>Aaron Sun</author><author>Samuel C Hoover</author><author>Wei Fan</author><author>Subhransu Maji</author><author>Peng Bai</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[<p>ZeoNet, based on 3D convolutional neural networks and a volumetric distance-grid representation, delivers an exceptional performance in predicting Henry's constants for adsorption of long-chain hydrocarbon molecules in all-silica zeolites.</p>]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Nanoporous materials such as zeolites and metal-organic frameworks (MOFs) are important adsorbents and catalysts in the chemical industry, with applications such as gas storage, separation, and shape-selective catalysis. <ref type="bibr">1,</ref><ref type="bibr">2</ref> However, finding the best zeolite for a given application is challenging since the relationship between performance and structure is often unknown, and the space of potential structures is large. To date there are over 250 known zeolite framework topologies <ref type="bibr">3</ref> and hundreds of thousands of computationally predicted structures. <ref type="bibr">4</ref> Although the development of accurate, transferable intermolecular potentials <ref type="bibr">[5]</ref><ref type="bibr">[6]</ref><ref type="bibr">[7]</ref> has enabled the computational prediction of adsorption performance in zeolites for a diverse range of applications, <ref type="bibr">[8]</ref><ref type="bibr">[9]</ref><ref type="bibr">[10]</ref><ref type="bibr">[11]</ref><ref type="bibr">[12]</ref> physics-based simulations still require significant computational resources, especially when large materials databases or complex mixtures are involved. <ref type="bibr">8,</ref><ref type="bibr">13</ref> Machine learning (ML) is increasingly being used to predict structure-property relationships in a data-driven manner. Such efforts have roots in quantitative structure-activity relationships (QSAR) for drug design <ref type="bibr">14</ref> and other molecular property predictions. These cheminformatics and ML approaches have often used features of atoms and their connectivity such as electronegativity, bond order, molecular weights, and surface area as descriptors. Along a similar line but adapting for extended crystalline materials, Gaillac et al. 
<ref type="bibr">15</ref> selected 22 local descriptors, 19 global descriptors, and seven porosity descriptors including bond lengths, densities, pore volume, and accessible surface area to predict the mechanical properties of zeolites. Anderson et al. <ref type="bibr">16</ref> built a multi-layer perceptron (MLP) model using six textural properties (helium void fraction, gravimetric surface area, largest cavity diameter, pore limiting diameter, inverse framework density, and the pore size standard deviation) together with the number density of 17 distinct MOF chemical moieties to predict the adsorption isotherms in MOFs. While conceptually intuitive, these features are high-level, coarse-grained properties that may not accurately capture phenomena dominated by the structural details of a material. Adsorption by all-silica zeolites is one such example: the materials are chemically identical, all consisting of corner-sharing SiO&#8324; tetrahedra, and their dramatic molecular shape selectivity is completely controlled by how framework atoms are arranged in space. <ref type="bibr">1,</ref><ref type="bibr">17,</ref><ref type="bibr">18</ref> Given the structure of a material and an accurate intermolecular potential, the quantitative prediction of adsorption in porous materials is, to a large extent, a solved problem through the use of molecular simulations. <ref type="bibr">1</ref> For many adsorption systems, the assumption of a rigid framework structure allows one to pre-tabulate the energies felt by a probe molecule on a regular grid, a practice that improves the simulation efficiency by allowing the framework-sorbate interactions to be interpolated rather than computed. In other words, the energy grid contains complete information about a solid material that can be considered as rigid. Based on this insight, energy grids have been used as the input for ML models by Snurr et al. 
<ref type="bibr">19,</ref><ref type="bibr">20</ref> The interaction energy of a hydrogen probe at each grid point within the MOF unit cell was calculated and then summarized as an energy histogram. Bins of the energy histogram were used as the input to train a regression model to predict hydrogen and methane uptake with an accuracy within 3 g L&#8315;&#185;. They then extended this method to gas mixtures, such as binary mixtures of Xe and Kr, and to short linear alkanes up to propane. The selectivity for Xe over Kr in Xe/Kr mixtures and the single-component adsorption of ethane and propane can be predicted in good agreement with grand-canonical Monte Carlo simulations. However, energy grids are relatively expensive to compute, and condensing them into histograms may also discard 3D structural information.</p><p>Fig. <ref type="figure">1</ref> The ZeoNet pipeline for predicting adsorption in zeolites. The unit cell of a zeolite is replicated to obtain an extended material structure. A fixed-size volumetric chunk with random origins and orientations is converted to a distance grid representation, which is used to train 3D ConvNets on data collected from physical simulations in a supervised learning setting. In this work, Henry's constants for n-octadecane (C18) adsorption in more than 330 000 known and computationally predicted zeolites were used as the training data. For inference, ZeoNet is applied in a feed-forward manner on the distance grids without the translation and rotation augmentations.</p><p>To represent 3D structures directly, Lin et al. <ref type="bibr">21</ref> pioneered the use of 3D ConvNets with a binary occupancy grid, in which each grid location was marked as either zero or one depending on its distance to the nearest framework atom. They used a LeNet/AlexNet-based network to predict methane adsorption isotherms and were able to achieve an MSE of 0.015 mol kg&#8315;&#185; in loadings. This approach was recently extended to CO&#8322; adsorption in MOFs. 
<ref type="bibr">22</ref> Using both structural and energy grids, Kim et al. developed a generative adversarial network to produce plausible zeolite structures with user-specified heats of adsorption for methane. <ref type="bibr">23,</ref><ref type="bibr">24</ref> While these studies demonstrate the utility of modern ConvNets in representing materials structures, they have focused on the adsorption of small, relatively rigid molecules. It remains unclear how well 3D ConvNets perform for large, flexible molecules whose properties are expected to be influenced not only by the local pore dimensions, but also by larger structural features.</p><p>In this work, we propose a 3D structural representation learning method, ZeoNet, for the task of predicting the adsorption of n-octadecane, a long-chain hydrocarbon molecule, in all-silica zeolites (see Fig. <ref type="figure">1</ref>). We carried out a systematic evaluation of 3D ConvNets and benchmarked them against MLP and XGBoost regressors trained on high-level descriptors. Four 3D ConvNets were tested, which were 3D variants of the popular AlexNet, <ref type="bibr">25</ref> VGG Net, <ref type="bibr">26</ref> ResNet, <ref type="bibr">27</ref> and DenseNet. <ref type="bibr">28</ref> Two volumetric representations, one based on binary occupancy grids and the other based on distance grids, were compared. The effects of grid resolution, input size, and other hyperparameters such as batch size, learning rate, and optimizer were examined. As summarized in Table 1, ZeoNet vastly outperformed the MLP and XGBoost regressors, and among the various 3D ConvNets, modern deep networks provided a significant improvement in model accuracy compared to the older AlexNet without sacrificing inference speed.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">The ZeoNet framework</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Adsorption dataset</head><p>To study the ability of 3D ConvNets to capture spatial correlations of materials structures, a dataset of long-chain hydrocarbon adsorption was selected. This dataset was produced from a computational screening study that used Monte Carlo (MC) simulations to predict the adsorption of three normal alkanes from C18 to C30 and mono- and di-branched C18 isomers. <ref type="bibr">8</ref> The adsorption at both the infinite-dilution regime (as characterized by Henry's constants, k H, and heats of adsorption) and a high-pressure, liquid regime (as characterized by the loadings at p = 3 MPa for an equimolar, six-component mixture) was calculated. The intermolecular potentials used in this study were developed for a diverse range of molecules and zeolite structures and their accuracy has been validated extensively against experiments. <ref type="bibr">7,</ref><ref type="bibr">29</ref> In total, the study included 402 experimentally synthesized structures catalogued by the Structure Commission of the International Zeolite Association (IZA-SC) <ref type="bibr">3</ref> and 331 172 computationally predicted structures from the Predicted Crystallography Open Database (PCOD). <ref type="bibr">4</ref> Here, we focus on n-octadecane (C18), a linear hydrocarbon molecule that has a length of &#8764;2.2 nm when fully extended, and on predicting ln k H, as k H scales exponentially with the adsorption free energy. Because ln k H is undefined when k H = 0, zeolites for which k H = 0 were removed, leaving 100 520 structures (269 IZA zeolites and 100 251 PCOD zeolites). 
It is also worth noting that due to the stochastic nature of the simulations, the adsorption estimates have statistical uncertainties, not unlike experimental measurements, and zeolites with higher adsorption strengths tend to have smaller uncertainties; fortunately these are precisely the structures more important for the application.</p><p>The dataset was initially split randomly into 60% (60 312) for training, 20% (20 104) for validation, and 20% (20 104) for testing. The test set was then sub-divided in order to determine the minimum size needed to reach the level of precision desired for model evaluation. The first subdivision included ten sets, each containing 2000 samples, the second included four sets of 5000 samples each, and the third included two sets of 10 000 samples each. Based on the results discussed later, 10 104 testing samples were moved to the training set, resulting in a training/validation/testing split of roughly 7 : 2 : 1.</p></div>
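<div xmlns="http://www.tei-c.org/ns/1.0"><p>The split arithmetic described above can be checked with a short sketch. The function names are hypothetical and chosen for illustration only; the counts are those quoted in the text, and the random assignment of individual structures to each subset is not shown.</p>
<p><![CDATA[
```python
N_TOTAL = 100_520  # structures with k_H > 0, as quoted in the text


def split_counts(n_total, train_pct=60, val_pct=20):
    """Return (train, validation, test) sizes for a percentage split."""
    n_train = n_total * train_pct // 100
    n_val = n_total * val_pct // 100
    n_test = n_total - n_train - n_val  # remainder goes to the test set
    return n_train, n_val, n_test


def move_test_to_train(counts, n_moved):
    """Reassign n_moved samples from the test set to the training set."""
    n_train, n_val, n_test = counts
    return n_train + n_moved, n_val, n_test - n_moved


initial = split_counts(N_TOTAL)              # (60312, 20104, 20104)
final = move_test_to_train(initial, 10_104)  # (70416, 20104, 10000)
fractions = [count / N_TOTAL for count in final]
print(initial, final, [f"{f:.3f}" for f in fractions])
```
]]></p>
<p>The final fractions are close to, but not exactly, 0.7, 0.2, and 0.1, which is why the text describes the split as roughly 7 : 2 : 1.</p></div>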
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Volumetric grids and high-level feature descriptors</head><p>Zeo++, version 0.3, <ref type="bibr">30</ref> was used to calculate distance grids with a probe radius of 1.2 &#197; and a grid resolution of 0.15 &#197;, while distance grids with lower resolutions were obtained via downsampling using trilinear interpolation. In a distance grid, each grid location is assigned its shortest distance to the solvent-accessible surface formed by zeolite framework atoms. In this calculation, Si and O atoms have radii of 2.1 and 1.52 &#197;, respectively. The distance can be positive or negative, depending on whether the grid locations lie outside or inside the solvent-accessible surface. To construct the binary occupancy grid, we simply assign a value of one to all grid locations where distances are non-positive and zero where they are positive.</p><p>Zeo++ was also used to calculate the pore-limiting diameter (PLD, unit &#197;), the largest-cavity diameter (LCD, unit &#197;), surface area (unit m&#178; g&#8315;&#185;), and pore volume (unit cm&#179; g&#8315;&#185;) for each zeolite using a spherical probe with a radius of 1.2 &#197;, as well as the number density of framework Si atoms (&#961; Si, unit number per nm&#179;). These high-level aggregate feature descriptors were used to construct an MLP regressor and an XGBoost regressor as the performance baseline.</p></div>
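<div xmlns="http://www.tei-c.org/ns/1.0"><p>As a minimal sketch of the thresholding rule stated above, the following converts a distance grid into a binary occupancy grid. The nested-list layout and the function name are assumptions made for illustration; the example values are invented and do not come from Zeo++ output.</p>
<p><![CDATA[
```python
def occupancy_from_distance(distance_grid):
    """Map a 3D distance grid (nested lists) to a binary occupancy grid.

    Following the rule in the text: one where the distance is
    non-positive, zero where it is positive.
    """
    return [
        [[1 if d <= 0 else 0 for d in row] for row in plane]
        for plane in distance_grid
    ]


# Toy 2 x 2 x 2 grid with made-up distances (in angstrom).
toy_distances = [
    [[-0.5, 0.0], [0.3, 1.2]],
    [[-1.1, 0.7], [0.0, 2.4]],
]
toy_occupancy = occupancy_from_distance(toy_distances)
print(toy_occupancy)
```
]]></p>
<p>The occupancy grid keeps only the sign of each entry, so the magnitude of the distance to the surface is lost in this representation.</p></div>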
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">ConvNet architectures</head><p>3D variants of four ConvNet architectures, which have been used extensively for image recognition, were evaluated. These architectures are designed to work primarily with RGB images and employ 2D convolutions. To operate on 3D data, we replace the 2D convolutions and pooling operations in these networks with their 3D variants, similar to prior work that has extended these architectures to handle spatio-temporal data (e.g., for video understanding <ref type="bibr">31</ref> ). We briefly describe these architectures below.</p><p>2.3.1 AlexNet. AlexNet was the first large-scale model trained for image classification and won the 2012 ImageNet Challenge. <ref type="bibr">25</ref> Our implementation consists of seven 3D convolutional (Conv) layers and two fully-connected (FC) layers. Each Conv layer is followed by batch normalization and ReLU activation. Two max pooling layers are inserted after the second and fourth Conv layers. All Conv filters have 16 channels, a kernel size of 3, a stride of 1, and a padding of 1. The max pooling layers have a kernel size of 2 and a stride of 2.</p><p>2.3.2 VGG16. The VGG architectures <ref type="bibr">26</ref> were introduced as deeper variants of AlexNet with several design changes and outperformed AlexNet on the ImageNet challenge. This VGG16 architecture consists of five blocks and three FC layers. A dropout of 0.5 is added after each of the first two FC layers. The first two blocks each contain two Conv layers and the latter three contain three Conv layers. All Conv layers have a kernel size of 3, a stride of 1, and a padding of 1, and each is followed by batch normalization and ReLU activation. Each block is terminated by a max-pooling layer with a kernel size of 2 and a stride of 2. 
The first block has 64 output channels and each subsequent block doubles the number of output channels, until it reaches 512.</p><p>2.3.3 ResNet18. He et al. introduced residual blocks with skip connections for training substantially deeper networks. <ref type="bibr">27</ref> The ResNet18 architecture used in this work consists of a Conv layer with a kernel size of 7, a stride of 2, and a padding of 3, followed by a max pooling layer with a kernel size of 3, a stride of 2, a padding of 1, and a dilation of 1. This is followed by four modules that each contain two residual blocks, and finally, an average pooling layer and an FC layer. Each residual block contains two Conv layers with a kernel size of 3, a stride of 1 or 2, and a padding of 1. The number of output channels of the first residual module is 64, and it is doubled in each subsequent residual module by including a 1 &#215; 1 Conv layer in the first skip connection while the height, width, and depth are halved in the last Conv layer. Batch normalization and ReLU activation are used after all Conv layers.</p><p>2.3.4 DenseNet121. Extending the idea of residual connections, Huang et al. <ref type="bibr">28</ref> proposed densely-connected networks in which each layer's output is concatenated into the input of all subsequent layers in a feed-forward fashion. The DenseNet121 architecture used here consists of a Conv block, six dense blocks, a transition block, 12 dense blocks, a transition block, 24 dense blocks, a transition block, 16 dense blocks, and an FC layer. The first Conv layer has a kernel size of 7, a stride of 2, and a padding of 3. All dense blocks are identical, containing two Conv blocks, each using the modified ResNet structure <ref type="bibr">32</ref> of batch normalization, ReLU activation, and convolution. The Conv layer in the first block has a kernel size of 1 and a stride of 1 and that in the second block has a kernel size of 3, a stride of 1, and a padding of 1. 
The transition block contains batch normalization, ReLU activation, a Conv layer with a kernel size of 1 and a stride of 1, and an average pooling layer with a kernel size of 2 and a stride of 2, hence reducing the number of output channels. The numbers of output channels of the three transition blocks are 128, 256, and 512, respectively.</p></div>
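<div xmlns="http://www.tei-c.org/ns/1.0"><p>To illustrate how the kernel, stride, and padding settings listed above shrink a cubic input, the sketch below applies the standard output-size formula for convolution and pooling layers along one axis. The helper name is hypothetical, the 100-point input edge is taken from the input shape used later in this work, and the layer sequences are a reading of the descriptions above rather than the authors' code.</p>
<p><![CDATA[
```python
def out_size(n, kernel, stride=1, padding=0, dilation=1):
    """Output length along one axis for a convolution or pooling layer.

    Standard formula with floor rounding:
    floor((n + 2*padding - dilation*(kernel - 1) - 1) / stride) + 1
    """
    return (n + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


# VGG16-style path: 3x3x3 convs (stride 1, padding 1) keep the size,
# so only the five max-pooling layers (kernel 2, stride 2) change it.
n = 100
vgg_sizes = []
for _ in range(5):
    n = out_size(n, kernel=3, stride=1, padding=1)  # conv: size unchanged
    n = out_size(n, kernel=2, stride=2)             # max pooling
    vgg_sizes.append(n)

# ResNet18-style stem: conv (k=7, s=2, p=3) then max pooling (k=3, s=2, p=1).
stem_conv = out_size(100, kernel=7, stride=2, padding=3)
stem_pool = out_size(stem_conv, kernel=3, stride=2, padding=1)

print(vgg_sizes, stem_conv, stem_pool)
```
]]></p>
<p>Because the ResNet18 stem reduces each edge by a factor of four before the first residual module, its activations are much smaller than those of VGG16 at the same depth, which is consistent with the lower memory use reported for ResNet18.</p></div>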
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4">Training</head><p>All models were trained to predict ln k H using the mean squared error (MSE) as the loss function. The baseline MLP and XGBoost models used high-level aggregate features including PLD, LCD, density of framework Si atoms, surface area, and pore volume as input, while the four 3D ConvNets used distance grids as input.</p><p>During training of 3D ConvNets, random translations up to full unit cell lattice lengths and rotations covering all possible spherical angles were applied as data augmentation techniques. The resulting 3D grid was then tiled and cropped to create the desired input size (see Fig. <ref type="figure">1</ref>). The grids at this stage have the same lattice system as the materials themselves but were resampled into a cubic lattice. Trilinear interpolation was used for the translation, rotation, and re-sampling operations. For all modeling work, PyTorch v1.11.0 was used with an Nvidia RTX 2080 Ti or A100 GPU as the accelerator. All 3D ConvNets were trained for a total of 30 epochs with a batch size up to what is allowed by the GPU memory. Apart from the section on hyperparameter optimization, the Adam optimizer was used with a learning rate of 0.001 and a batch size of 16 for AlexNet and ResNet18, 4 for VGG16, and 8 for DenseNet121.</p></div>
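<div xmlns="http://www.tei-c.org/ns/1.0"><p>The translate, tile, and crop steps can be sketched with periodic indexing, which is equivalent to tiling a unit-cell grid and cutting a cube out of it. This is a simplified illustration under stated assumptions: the grid is already on an orthogonal lattice, translations are whole voxels (so no trilinear interpolation is needed), and rotations are omitted. The function names are hypothetical.</p>
<p><![CDATA[
```python
import random


def periodic_crop(grid, origin, size):
    """Cut a size^3 cube from an infinitely tiled unit-cell grid.

    Indices wrap around with the modulo operator, so the crop may be
    larger than the unit cell itself.
    """
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    ox, oy, oz = origin
    return [
        [
            [grid[(ox + i) % nx][(oy + j) % ny][(oz + k) % nz]
             for k in range(size)]
            for j in range(size)
        ]
        for i in range(size)
    ]


def random_translation_crop(grid, size, rng=random):
    """Pick a random origin inside the unit cell, then crop periodically."""
    origin = (
        rng.randrange(len(grid)),
        rng.randrange(len(grid[0])),
        rng.randrange(len(grid[0][0])),
    )
    return periodic_crop(grid, origin, size)


# Toy 2 x 2 x 2 unit cell; a 3 x 3 x 3 crop needs wrap-around.
cell = [[[0, 1], [2, 3]], [[4, 5], [6, 7]]]
crop = periodic_crop(cell, origin=(1, 0, 0), size=3)
augmented = random_translation_crop(cell, size=3, rng=random.Random(0))
print(crop[0][0], crop[1][0])
```
]]></p>
<p>A full implementation of the augmentation described in the text would also apply arbitrary rotations and fractional shifts, both of which require interpolation between grid points.</p></div>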
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Results and discussion</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">How large does the test set size need to be?</head><p>To maximize the number of training samples while also ensuring that the test set is large enough to allow for precise estimates of model performance, test sets of different sizes were used to evaluate an AlexNet model pre-trained using 60 312 training and 20 104 validation samples. As shown in Table <ref type="table">2</ref>, increasing the test set size from 2000 to 5000 leads to a roughly seven times more precise estimate of the model performance. With 5000 or 10 000 testing samples, r&#178; is accurate to the third decimal digit and MSE is accurate to the first decimal digit, which we consider adequate for comparing subsequent benchmarks.</p></div>
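<div xmlns="http://www.tei-c.org/ns/1.0"><p>One way to quantify the precision of a metric for a given test-set size is to evaluate it on several equally sized subsets and look at the spread. The sketch below assumes that r&#178; denotes the coefficient of determination and that precision is measured by the standard deviation across subsets; neither choice is confirmed by the text, and the function names are hypothetical.</p>
<p><![CDATA[
```python
from statistics import mean, stdev


def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination (assumed meaning of r^2 here)."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def metric_spread(y_true, y_pred, n_subsets, metric):
    """Evaluate a metric on consecutive equal subsets.

    Returns (mean, standard deviation) of the metric over the subsets.
    """
    size = len(y_true) // n_subsets
    values = [
        metric(y_true[i * size:(i + 1) * size], y_pred[i * size:(i + 1) * size])
        for i in range(n_subsets)
    ]
    return mean(values), stdev(values)


# Invented toy data, only to exercise the functions.
targets = [1.0, 2.0, 3.0, 4.0]
predictions = [1.0, 2.0, 3.0, 5.0]
print(mse(targets, predictions), r2(targets, predictions))
print(metric_spread(targets, predictions, n_subsets=2, metric=mse))
```
]]></p>
<p>With real predictions, shuffling the test set before splitting avoids any ordering effect in the subsets.</p></div>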
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Materials characteristics and MLP/XGBoost performance</head><p>Porous materials are conventionally characterized using geometric concepts. Viewing framework atoms as spheres of different radii, one can define the pore volume and surface area to be the unoccupied space (a 3D property) and exposed surface (a 2D property) per unit mass of the material. If a spherical probe is placed in the free space, the diameter of the largest sphere that can fit at a given location is defined as the local pore diameter (a 1D property), and since the interior of zeolites is not uniform, one can further distinguish between the pore-limiting diameter (PLD) and the largest-cavity diameter (LCD), which are the smallest and largest local pore diameters across an entire zeolite, respectively. Table <ref type="table">3</ref> gives a summary of the descriptive statistics of these high-level geometric features for all materials in the dataset and Fig. <ref type="figure">2</ref> compares the distributions of all the known zeolites and the computationally predicted ones. As shown in Fig. <ref type="figure">2</ref> and also noted by Pophale et al., <ref type="bibr">4</ref> the computationally generated PCOD database contains a larger fraction of smaller-pore zeolites, coincident with higher Si atom density, lower surface area, and smaller pore volume. A fraction of these zeolites contain channel systems inaccessible externally by a probe with a radius of 1.2 &#197;, which are assigned a value of zero in Fig. <ref type="figure">2</ref>.</p><p>The above geometric features are often used in scatter plots to construct structure-property relationships, although the resulting correlations are largely noisy and non-predictive (see Fig. <ref type="figure">3</ref> of ref. 8 as an example). 
However, to provide a baseline to compare with results obtained with 3D ConvNets, we trained an MLP model and an XGBoost model to predict ln k H, the logarithmic Henry's constant for the adsorption of n-octadecane.</p><p>The MLP achieved an r&#178; value of 0.783 and an MSE of 35.7, which corresponds to an error of &#8764;28.5 kJ mol&#8315;&#185; in the free energy of adsorption, &#916;G ads. The XGBoost model performs better, with r&#178; = 0.841 and MSE = 26.2. The performance of the two models is significantly better than can be expected from the broad scatter plots of individual geometric descriptors.</p></div>
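<div xmlns="http://www.tei-c.org/ns/1.0"><p>The conversion from an MSE in ln k H to an error in the adsorption free energy can be sketched as follows, assuming the error is taken as RT times the root-mean-squared error in ln k H. The temperature is not stated in this section; 573 K is an assumption, chosen because it reproduces the free-energy errors quoted in this paper for MSE values of 35.7, 4.4, and 3.8.</p>
<p><![CDATA[
```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1
T_ASSUMED_K = 573.0              # assumption: not stated in this section


def free_energy_error(mse_ln_kh, temperature=T_ASSUMED_K):
    """Convert an MSE in ln k_H to an error in free energy (kJ/mol).

    Uses error = R * T * sqrt(MSE), i.e. RT times the RMSE in ln k_H.
    """
    return R_KJ_PER_MOL_K * temperature * math.sqrt(mse_ln_kh)


for mse_value in (35.7, 4.4, 3.8):
    print(mse_value, round(free_energy_error(mse_value), 1))
```
]]></p>
<p>Under this assumption the three MSE values map to about 28.5, 10.0, and 9.3 kJ mol&#8315;&#185;, matching the figures reported for the MLP, ResNet18, and DenseNet121 models.</p></div>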
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Performance and optimization of ZeoNet</head><p>3.3.1 Comparing binary occupancy grids and distance grids. Among different volumetric representations, intuitively, energy grids would be expected to contain the most physical information, as they are widely used to speed up atomistic simulations. However, computing an energy grid involves calculating the interactions of a probe atom with all framework atoms and is thus computationally rather expensive. We therefore investigate two alternative volumetric representations that are easier to calculate, including binary occupancy grids that have been used by Lin's group <ref type="bibr">21</ref> and distance grids implemented by Zeo++ (see Computational Details for the calculation of both grid representations). All four 3D ConvNet models were trained using both representations with an input shape of 100 &#215; 100 &#215; 100. As shown in Table <ref type="table">4</ref>, distance grids outperform binary occupancy grids in almost all cases, with the only exception being VGG16 at a grid resolution of 1 &#197;. When using the default grid resolution in Zeo++, &#916;d = 0.15 &#197;, the values of r&#178; for distance grids exceed those for occupancy grids by 0.014-0.046, while MSE is lower by 2.2-7.6. The largest difference is found for AlexNet, which also shows the worst performance for both representations, with r&#178; &lt; 0.68 and MSE &gt; 54, while the deeper VGG16 model and the more modern architectures, ResNet18 and DenseNet121, exhibit a dramatic improvement, with r&#178; &gt; 0.83 and MSE &lt; 28. Also included in Table <ref type="table">4</ref> are the results obtained with the two representations down-sampled to a grid resolution of 1 &#197; (while keeping the same input grid dimension). The resulting coarser, but larger volumetric grids show even more pronounced improvements than achieved by the more modern 3D ConvNet architectures. 
The r&#178; values are larger than 0.91 and MSE lower than 13.5 in all cases, with a much smaller difference between the two representations. It is apparent that a large enough input volume is critical to ensure good performance, presumably due to the long-chain hydrocarbon molecule selected for the target application, which requires spatial learning of larger patches of the materials structure. As the input volume becomes more limited, the performance of the simplest AlexNet model suffers the most.</p><p>3.3.2 Effect of input volume and grid resolution. Following the observation that the size of the input volume greatly influences the performance of 3D ConvNets, in this section, the effect of input volume was systematically studied. We focus on the distance grid representation and vary the grid resolution from 0.15 to 1 &#197; while keeping its shape at 100 &#215; 100 &#215; 100. Consequently, the distance grids represent a cubic input volume with a linear dimension, L, ranging from 15 to 100 &#197; (see Fig. <ref type="figure">3</ref>). Fig. <ref type="figure">4</ref> (numerical data can be found in ESI Tables S5-S7 &#8224;) shows how the performance metrics vary with input volume: as L increases, r&#178; increases and MSE decreases sharply, by 0.09-0.27 and 15.5-45.0, respectively, until the model stabilizes roughly at L &#8764; 45 &#197; for AlexNet and VGG16 and at L &#8764; 30 &#197; for ResNet18 and DenseNet121. Above L &#8764; 45 &#197;, the performance of ResNet18 and DenseNet121 decreases slightly, by about 0.01 (r&#178;) and 1.9 (MSE) at L = 100 &#197;, indicating a potential loss of details due to the lower grid resolutions. The degradation is more significant for VGG16, with r&#178; decreasing by 0.04 and MSE increasing by 7.1, although this observation may be an idiosyncrasy of the specific training runs (see ESI Fig. <ref type="figure">S6 &#8224;</ref>). 
In contrast, the performance of AlexNet continues to improve, albeit slightly, up to the largest input volume tested, L = 100 &#197;. The relatively smaller depth of AlexNet might be limiting its ability to learn larger-scale features, leading to its lower accuracy than that of the ResNet models at all input volumes/grid resolutions.</p><p>Given the relatively similar performance, it is useful to compare the training speeds of the four 3D ConvNet models. With GPU acceleration using Nvidia RTX 2080 Ti, the ratio of training times per epoch is roughly 1 : 2 : 1.5 : 2.5 for AlexNet,  Using ResNet18, the effect of input volume was examined at a fixed grid resolution of 0.45 &#197;. This set of data is shown as filled up triangles in Fig. <ref type="figure">4</ref>, which largely fall onto the same trend line as the previous test with different grid resolutions but a fixed input grid shape. Differences become larger with smaller input volumes (those with L &lt; 30 &#197;): comparing L = 14.4 &#197; and &#916;d = 0.45 &#197; with L = 15 &#197; and &#916;d = 0.15 &#197;, the r&#178; and MSE values for the latter are better by 0.03 and 5.7, respectively. To compare model performance at exactly the same input volume, two additional tests were performed using a grid resolution of 0.3 &#197; and an input shape of 50&#179; or a grid resolution of 0.6 &#197; and an input shape of 25&#179;, for an input volume with L = 15 &#197;. As shown in Fig. <ref type="figure">4</ref>, the performance of grid resolutions of 0.15 and 0.3 &#197; is almost indistinguishable, while the grid resolution of 0.6 &#197; is slightly worse.</p></div>
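<div xmlns="http://www.tei-c.org/ns/1.0"><p>The relation between grid shape, grid resolution, and the linear dimension L of the cubic input volume is simply L = n &#215; &#916;d. The sketch below tabulates the combinations mentioned above. The 32-point edge for the L = 14.4 &#197; case is inferred from 14.4 / 0.45 and is an assumption, as the text does not give the grid shape for that test.</p>
<p><![CDATA[
```python
def box_length(n_points, resolution):
    """Edge length L (angstrom) of a cubic grid: points per edge times spacing."""
    return n_points * resolution


def n_voxels(n_points):
    """Total number of grid points in a cubic grid."""
    return n_points ** 3


# (points per edge, resolution in angstrom)
cases = [
    (100, 0.15),  # default Zeo++ resolution
    (50, 0.3),    # same volume, coarser grid
    (25, 0.6),    # same volume, coarsest grid
    (32, 0.45),   # assumed shape for the L = 14.4 angstrom test
    (100, 1.0),   # largest input volume tested
]
for n, spacing in cases:
    print(n, spacing, round(box_length(n, spacing), 2), n_voxels(n))
```
]]></p>
<p>The first three cases cover the same 15 &#197; volume, but the number of grid points, and hence the memory and compute cost, drops by a factor of eight each time the resolution is halved.</p></div>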
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.3">Optimization of hyperparameters.</head><p>Here, the performance of ResNet18 was further optimized by tuning the size of the mini batches, optimizer, and learning rate. Given the comparisons in the previous section, a grid resolution of 0.45 &#197; and an input shape of 100&#179; are considered nearly optimal and therefore used without change during the hyperparameter optimization process. Four optimizers were tested, including Adam, <ref type="bibr">33</ref> Adagrad, <ref type="bibr">34</ref> RMSprop, <ref type="bibr">35</ref> and vanilla stochastic gradient descent (SGD). <ref type="bibr">36</ref> Table <ref type="table">5</ref> summarizes the results obtained with the different hyperparameters. First, the effect of learning rate was examined with a batch size of 64 (cf. the last three rows), the largest that can fit into the GPU memory of an Nvidia A100. Next, the batch size was varied from 64 to 4, while the learning rate, according to the commonly used heuristic, was halved with every halving of batch size, resulting in a learning rate of 0.00025 for a batch size of 4 and a learning rate of 0.004 for a batch size of 64. Overall, the training of ResNet18 is largely insensitive to batch sizes and learning rates, achieving nearly identical results with all hyperparameters when the batch size is larger than 8. At the two smallest batch sizes, 4 and 8, the performance is slightly worse with the Adam or Adagrad optimizers. Adam is the best optimizer for this system, slightly outperforming the other three across combinations of batch sizes and learning rates. The best model was obtained using a batch size of 64 and a learning rate of 0.004, which achieved an r&#178; value of 0.974 and an MSE of 4.4 on the validation set. Very similar performance metrics (r&#178; = 0.973 and MSE = 4.4) were found for the test set, indicating a good model generalization. 
To gain a better understanding of model performance, a scatter plot was constructed to compare the target Henry's constants for n-octadecane adsorption from Monte Carlo (MC) simulations with the values predicted by the best ResNet18 model. As shown in Fig. <ref type="figure">5</ref>, the predictions from ResNet18 cluster nicely around the parity line, although they are substantially more accurate for zeolites with larger values of k H (i.e., stronger adsorption). For k H &lt; 1 mol kg&#8315;&#185; MPa&#8315;&#185;, the correlation is visibly noisier. It is worth noting that the ResNet18 model has a mean-squared error of 4.4 in ln k H (Table <ref type="table">1</ref>), or 10.0 kJ mol&#8315;&#185; in &#916;G ads, but as k H scales exponentially with &#916;G ads, even relatively small free energy differences manifest as large differences in Fig. <ref type="figure">5</ref>. To quantify the distribution of prediction errors, k H is grouped into nine classes and the resulting confusion matrix is shown in Fig. <ref type="figure">6</ref>. Both figures show that the majority of zeolites have k H &gt; 1 mol kg&#8315;&#185; MPa&#8315;&#185; and these materials were predicted very well by the ResNet18 model. As k H decreases, the prediction becomes less accurate and, interestingly, seems to be slightly positively biased (while still ranking near the bottom). Fig. <ref type="figure">6</ref> further demonstrates the generalizability of the trained model when it is applied to zeolites for which simulations predicted k H = 0. These materials were excluded from the supervised learning   Examining these zeolites with the worst-case errors suggests that their pore diameters are barely large enough to fit linear alkanes (e.g., AEN-1 has PLD = 3 and LCD = 3.93 &#197;) and increasing the pore sizes even slightly may lead to significant increases in k H (within the rigid-zeolite assumption). 
We thus speculate that the spatial resolution of the ResNet18 model, while optimized for the prediction accuracy and efficiency over the entire dataset, may not be adequate to resolve the cutoff pore diameters.</p><p>3.4.2 Effect of training set size. To investigate how many training samples are needed to achieve good model performance, the best ResNet18 model was retrained from scratch using the optimal hyperparameters but with decreasing amounts of training data. These tests maintained the 7 : 2 training/validation split and used the same test set that consists of 10 000 samples. As summarized in Fig. <ref type="figure">7</ref>, the model performance remains relatively unchanged as the number of training samples decreases from 70 416 to 17 500. Empirically, the minimum training set size for this adsorption system to achieve optimal results appears to be 10 000, below which the model performance degrades sharply. With 1050 training samples, the MSE in ln k H increases to above 15 and r&#178; drops below 0.93. Nonetheless, these values are still better than the best performance of the MLP and XGBoost models.   (1) given a zeolite structure, what role different regions of the material play in directing the ConvNet to make its prediction; and (2) what features the ConvNet activates most in order to make predictions. Fig. <ref type="figure">8</ref> shows the saliency maps for the MFI zeolite based on the best ResNet18 model. Saliency maps are a feature attribution technique that assigns an importance value to each grid location as the gradient of the model output with respect to the input grid value. <ref type="bibr">38</ref> The resulting 3D gradient fields thus characterize how much local changes of each grid influence the model prediction. By comparing with the corresponding distance grids, we found that, interestingly, the ResNet18 identifies the accessible pore volume and primarily relies on those regions to make its predictions. 
To answer the second question, one can look for the types of input structures that strongly activate a specified feature map, which can be obtained by solving an optimization problem starting from a random input (Fig. <ref type="figure">9a</ref>). Here, we focus on feature maps in the last Conv layer of the best ResNet18 model as they represent higher-level features that may be more relatable. As shown in Fig. <ref type="figure">9</ref>, the ConvNet appears to rely mostly on channels of different sizes and shapes (cylindrical vs. rectangular) to characterize zeolite structures. Finally, it is also worth noting that these visualizations are for structures that strongly, but not maximally, activate the given feature map, as we found that pushing the optimization to convergence often yields unrealistic structures, i.e., those with rapidly changing or even nearly discontinuous distance values, which may be due to strided convolutions and pooling operations.</p></div>
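<div xmlns="http://www.tei-c.org/ns/1.0"><p>The saliency-map definition used above (the gradient of the model output with respect to each input grid value) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: toy_model is a hypothetical stand-in for the trained ResNet18, the 2 x 2 x 2 grid and the probe_radius value are invented, and central finite differences replace the automatic differentiation that a deep-learning framework would normally provide.</p>
<eg xml:space="preserve"><![CDATA[
```python
# Toy saliency map: gradient of a scalar model output with respect to each
# voxel of a 3D distance grid, estimated by central finite differences.
# The "model" below is a hypothetical stand-in, NOT the trained ResNet18.


def toy_model(grid, probe_radius=1.5):
    """Hypothetical predictor: sums how far each voxel's distance value
    exceeds probe_radius, so only 'open' voxels contribute."""
    total = 0.0
    for plane in grid:
        for row in plane:
            for d in row:
                total += max(d - probe_radius, 0.0)
    return total


def saliency_map(model, grid, eps=1e-4):
    """Return d(model)/d(voxel) for every voxel, with the same shape as grid."""
    sal = [[[0.0 for _ in row] for row in plane] for plane in grid]
    for i, plane in enumerate(grid):
        for j, row in enumerate(plane):
            for k, d in enumerate(row):
                row[k] = d + eps
                up = model(grid)
                row[k] = d - eps
                down = model(grid)
                row[k] = d  # restore the original value
                sal[i][j][k] = (up - down) / (2.0 * eps)
    return sal


# Invented 2 x 2 x 2 distance grid (angstrom): small values sit close to
# framework atoms, large values sit inside a pore.
grid = [[[0.2, 0.4], [0.3, 2.5]],
        [[0.1, 3.0], [0.5, 2.0]]]
sal = saliency_map(toy_model, grid)
```
]]></eg>
<p>In this toy setting only the voxels whose distance value exceeds probe_radius receive a nonzero gradient, which mirrors, in a much simplified form, the observation that the saliency concentrates in the accessible pore volume.</p></div>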
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Conclusions</head><p>In this work, we developed the ZeoNet representation learning framework to predict the adsorption of long-chain hydrocarbon molecules in all-silica zeolites using 3D ConvNets with volumetric representations. Using the logarithms of Henry's constants, ln k H, for n-octadecane adsorption as the target property, we performed a comprehensive evaluation of different ConvNet architectures and optimized the grid representations and training hyperparameters. For almost all ConvNet models, distance grids, which store the distance from each grid point to the nearest solvent-accessible surface formed by zeolite framework atoms, outperformed the binary occupancy grids that have been used successfully for adsorption of small molecules (Table <ref type="table">4</ref>). Using the distance grid representation, we compared 3D variants of four popular ConvNet architectures: AlexNet, VGG16, ResNet18, and DenseNet121 (Table <ref type="table">1</ref>). These models all outperform a benchmark multi-layer perceptron trained on common geometric descriptors, including pore-limiting diameters, largest-cavity diameters, surface areas, pore volumes, and framework atom densities, which achieved a mean-squared error (MSE) of 35.7 and a correlation coefficient of r 2 = 0.783. The best prediction accuracy was obtained using DenseNet121, which reached r 2 = 0.977 and MSE = 3.8, corresponding to an error of 9.3 kJ mol -1 in adsorption free energy. AlexNet consistently underperformed the more modern ConvNets, with r 2 = 0.944 and MSE = 9.2. ResNet18 was found to provide the best balance between expressiveness and efficiency, reaching an accuracy of r 2 = 0.973 and MSE = 4.4 but with a 70% faster training speed and a 75% reduction in memory requirements relative to DenseNet121. 
All 3D ConvNet models require a minimum input volume to obtain good performance, with AlexNet and VGG16 reaching a performance plateau at a linear dimension L &gt; 45 &#197; and ResNet18 and DenseNet121 remaining relatively stable between L = 30 and 100 &#197; (Fig. <ref type="figure">4</ref>). The performance depends less sensitively on grid resolution, with a small benefit at &#916;d = 0.30-0.45 &#197;.</p><p>Analysis of the model performance (Fig. <ref type="figure">5</ref> and <ref type="figure">6</ref>) reveals that ZeoNet is exceptionally accurate for zeolites with strong adsorption (k H &gt; 1 mol kg -1 MPa -1) and slightly over-predicts compared to simulation results for weakly-adsorbing zeolites, which we argue may in fact be partly due to inadequate sampling by the grand-canonical Monte Carlo simulations for the more challenging adsorption systems. In addition, saliency maps suggest that the ConvNets mostly rely on the accessible pore volume to make predictions (Fig. <ref type="figure">8</ref>), and visualization of feature maps further indicates that geometric primitives such as channels of different sizes and shapes are features learned by the ConvNets (Fig. <ref type="figure">9</ref>). Finally, experiments with different training set and test set sizes suggest that at least 10 000 training samples are needed to reach peak accuracy and that at least 5000-10 000 test samples are needed to obtain a precise estimate of performance metrics (three decimal digits in r 2 and one in MSE). These results provide benchmark-quality data and comprehensive guidelines for using 3D ConvNets to model porous materials. ZeoNet and the associated dataset and software code provide a foundation for developing and comparing methods in future research efforts.</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_0"><p>This journal is &#169; The Royal Society of Chemistry 2023</p></note>
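<div xmlns="http://www.tei-c.org/ns/1.0"><p>The correspondence quoted in the Conclusions between an MSE in ln k H and an error in adsorption free energy can be checked with a short worked sketch. It rests on two assumptions that are not stated in this section: that the free energy varies as -RT ln k H plus a constant, so that an error in ln k H is scaled by RT, and that the temperature is 573 K (the name T_ASSUMED marks this value as hypothetical; it was chosen only because it is consistent with the quoted numbers). The helper free_energy_error is likewise illustrative and is not part of the ZeoNet code.</p>
<eg xml:space="preserve"><![CDATA[
```python
import math

R_KJ = 8.314462618e-3  # molar gas constant in kJ mol^-1 K^-1


def free_energy_error(mse_ln_kh, temperature_k):
    """Convert an MSE in ln k_H into an RMS error in free energy (kJ/mol),
    assuming dG = -R*T*ln(k_H) + const so that errors scale by R*T."""
    return R_KJ * temperature_k * math.sqrt(mse_ln_kh)


T_ASSUMED = 573.0  # K; an assumption, not a value given in this section

# MSE values in ln k_H as reported in the Conclusions.
reported_mse = {"ResNet18": 4.4, "DenseNet121": 3.8}
errors = {name: free_energy_error(mse, T_ASSUMED)
          for name, mse in reported_mse.items()}

for name, err in errors.items():
    print(f"{name}: {err:.1f} kJ/mol")
```
]]></eg>
<p>Under these assumptions the sketch gives 10.0 kJ mol -1 for ResNet18 and 9.3 kJ mol -1 for DenseNet121, which suggests that the quoted free energy errors are root-mean-square values obtained from the square root of the MSE.</p></div>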
		</body>
		</text>
</TEI>
