<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Machine Learning-Based Rapid Detection of Volatile Organic Compounds in a Graphene Electronic Nose</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>11/22/2022</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10411325</idno>
					<idno type="doi">10.1021/acsnano.2c10240</idno>
					<title level='j'>ACS Nano</title>
<idno>1936-0851</idno>
<biblScope unit="volume">16</biblScope>
<biblScope unit="issue">11</biblScope>					

					<author>Nyssa S. Capman</author><author>Xue V. Zhen</author><author>Justin T. Nelson</author><author>V. R. Chaganti</author><author>Raia C. Finc</author><author>Michael J. Lyden</author><author>Thomas L. Williams</author><author>Mike Freking</author><author>Gregory J. Sherwood</author><author>Philippe Bühlmann</author><author>Christopher J. Hogan</author><author>Steven J. Koester</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Rapid detection of volatile organic compounds (VOCs) is growing in importance in many sectors. Noninvasive medical diagnoses may be based upon particular combinations of VOCs in human breath; detecting VOCs emitted from environmental hazards such as fungal growth could prevent illness; and waste could be reduced by monitoring the gases produced during food storage. Electronic noses have been applied to such problems; however, a common limitation is their selectivity. Graphene is an adaptable material that can be functionalized with many chemical receptors. Here, we use this versatility to demonstrate selective and rapid detection of multiple VOCs at varying concentrations with arrays of graphene-based variable capacitors (varactors). Each array contains 108 sensors functionalized with 36 chemical receptors for cross-selectivity. Multiplexed data acquisition from all 108 sensors is accomplished in tens of seconds. While this rapid measurement reduces the signal magnitude, classification using supervised machine learning (Bootstrap Aggregated Random Forest) achieves 98% accuracy among 5 analytes (ethanol, hexanal, methyl ethyl ketone, toluene, and octane) at 4 concentrations each. With the addition of 1-octene, an analyte highly similar in structure to octane, an accuracy of 89% is achieved. These results demonstrate the important role of the choice of analysis method, particularly in the presence of noisy data. This is an important step toward fully utilizing graphene-based sensor arrays for rapid gas sensing applications ranging from environmental monitoring to disease detection in human breath.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Electronic-nose (e-nose) systems are promising technologies for early diagnosis of a variety of diseases. <ref type="bibr">[1]</ref><ref type="bibr">[2]</ref><ref type="bibr">[3]</ref> Most e-nose systems utilized for this purpose detect VOCs found in exhaled human breath. Thousands of VOCs can be found in breath, providing useful concentration patterns that reflect the body's metabolism. <ref type="bibr">4,</ref><ref type="bibr">5</ref> Their concentrations range from hundredths of a ppb to thousands of ppb, depending on the VOC. <ref type="bibr">6</ref> As such, they are potentially useful in screening for abnormalities ranging from cancers to cardiovascular and respiratory diseases. <ref type="bibr">[7]</ref><ref type="bibr">[8]</ref><ref type="bibr">[9]</ref> However, because VOC patterns may be similar between disease states, e-nose systems must be capable of distinguishing the pattern profiles of multiple analytes; if such systems can be realized, they could play an important role in future clinical healthcare applications.</p><p>The most common e-nose technologies described in the literature are based upon chemiresistive sensors such as metal-oxide (MOX) devices, <ref type="bibr">1,</ref><ref type="bibr">10</ref> polymer composites, <ref type="bibr">1</ref> nanotubes, <ref type="bibr">11</ref> and tunneling-based metal nanoparticles. <ref type="bibr">12,</ref><ref type="bibr">13</ref> These sensors are typically resistance-based devices that use a thin conducting channel of either oxidizing or reducing compounds. Upon exposure to VOCs, their resistance can change reversibly in proportion to the concentration of the target gas molecule. If several chemiresistors with sensitivities to different VOCs are used in combination, their responses form a "breathprint" that is associated with a particular disease. 
<ref type="bibr">14,</ref><ref type="bibr">15</ref> These sensors have several advantages, including a fairly wide range of usable sensor materials, sub-ppb sensitivity, relatively short response times, long lifetimes, and small sensor dimensions. <ref type="bibr">16</ref> Chemiresistive e-noses have been studied for numerous diagnostic or screening applications, including detection of lung cancer <ref type="bibr">17,</ref><ref type="bibr">18</ref> and other diseases. <ref type="bibr">19</ref> Despite these attributes, chemiresistive e-noses have several limitations. For instance, some MOX-based sensors require operation at high temperatures, increasing their power consumption and overall system complexity. They are also limited in the number of different sensor types that can easily be used, and as such, a very high level of multiplexing (many tens of sensors) is difficult to achieve in practice.</p><p>In light of the limitations of current e-nose technologies, graphene emerges as an attractive alternative. <ref type="bibr">2,</ref><ref type="bibr">[20]</ref><ref type="bibr">[21]</ref><ref type="bibr">[22]</ref><ref type="bibr">[23]</ref><ref type="bibr">[24]</ref> Graphene, while itself chemically inert, can be functionalized with a wide variety of chemical receptors self-assembled onto its surface. <ref type="bibr">[25]</ref><ref type="bibr">[26]</ref><ref type="bibr">[27]</ref><ref type="bibr">[28]</ref><ref type="bibr">[29]</ref><ref type="bibr">[30]</ref><ref type="bibr">[31]</ref><ref type="bibr">[32]</ref><ref type="bibr">[33]</ref> Such functionalizations include derivatives of pyrenes and cyclodextrins as well as porphyrins and numerous other receptors. 
The functionalizations interact with different VOCs through a range of mono- and multitopic interactions, including hydrogen-bond acceptance and donation, ion-dipole interactions, metal center ligation, inclusion complexation, steric repulsion, and covalent bond formation. <ref type="bibr">34</ref> The self-assembly of monolayers of these compounds on graphene is driven by noncovalent interactions between the receptors and graphene (such as &#960;-&#960; stacking) <ref type="bibr">30,</ref><ref type="bibr">35,</ref><ref type="bibr">36</ref> and, importantly, does not compromise graphene's high conductivity or significantly alter its band structure. <ref type="bibr">37</ref> This functionalization capability allows graphene sensors to be adapted easily to various sensing applications, since the underlying graphene platform is unchanged while the functionalizations can be replaced readily. This is simpler than in chemiresistive sensors, where adding another sensor type could require extensive process development or optimization.</p><p>Graphene-based sensors can respond extremely quickly because the gas interaction is surface driven, so gas diffusion into a bulk sensing material is not necessary (although slower effects can still be present due to intercalation and interaction with adjacent materials). The interaction of test gases with the surface changes the free carrier concentration of the graphene, altering its measurable electrical characteristics. These changes are detected by tracking various features extracted from sensor response curves, and multiple features describing different aspects of the graphene-gas interactions may be extracted from a single curve.</p><p>A major goal in gas sensing research is the ability to distinguish between many gas species and even between concentrations. Nallon et al. 
presented work with a single, unmodified graphene gas sensor that demonstrated clear separability between 11 chemically diverse gas species and between 9 chemically similar species with principal component analysis (PCA), albeit at only one concentration each. <ref type="bibr">38</ref> Their classification algorithms (including Random Forest) were also able to predict the identity of these gases with &gt;92% and &gt;88% accuracy for the chemically diverse and chemically similar gas sets, respectively. Nallon et al. point to the potential of graphene as a modifiable sensing material for use in sensor arrays, where chemical cross-reactivity between functionalizations and gases will be instrumental to distinguishing between similar gases and to sensing complex gas mixtures. However, graphene-based sensors still have limitations of their own. The first is the difficulty of producing a large number of devices with consistent characteristics. This is due to the relative immaturity of graphene device fabrication technology in commercial fabrication environments, though progress is being made rapidly. <ref type="bibr">39,</ref><ref type="bibr">40</ref> One of the largest arrays of graphene chemical gas sensors reported to date, by Kybert et al., consisted of 56 devices; however, only 10 devices were measured simultaneously, and only 4 functionalization receptors were used. <ref type="bibr">41</ref> Another study reported a sensor array based on single-layer graphene consisting of 100 individual devices that were probed simultaneously. <ref type="bibr">42</ref> However, the devices in that array were functionalized with only 4 different receptors; moreover, electrical measurements were taken only after the target analyte had been applied as a liquid, and no real-time data were recorded. 
Finally, in Mackin et al., chemiresistive graphene sensor arrays were fabricated and used to detect ammonia gas, but only a single Co-porphyrin functionalization was used. <ref type="bibr">43</ref> These studies point to a limitation of graphene sensors to date: a lack of chemical diversity in functionalizations. Such diversity may be important, since distinguishing between complex gas mixtures such as human breath may require a large number of receptors. Arrays of sensors with many receptors are still uncommon in the current literature; in the graphene array studies mentioned above, at most 4 different receptors were used.</p><p>Large sensor arrays necessarily collect large amounts of data, and an especially difficult aspect of this is the large number of possible data features. A considerable amount of machine learning research focuses on finding optimal ways of extracting scalar features to represent response data, and this task is different for every sensor system. <ref type="bibr">44</ref> One option is to extract an extensive list of features and then reduce or compress any redundant, noisy, or otherwise uninformative features through dimensionality reduction techniques such as PCA. <ref type="bibr">45</ref> While this may improve classification accuracies, such exhaustive, brute-force feature extraction methods do not always provide clear insight into which of the extracted features are truly important. Such information could provide a deeper understanding of sensor mechanisms, which could in turn help to fine-tune sensor design for better gas detection.</p><p>Finally, many sensing applications benefit from or require rapid measurement speeds. Because gas sensors need time to respond to the analytes, there is a trade-off between measurement speed and signal magnitude: typically, the faster the measurement, the smaller the signal. 
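This trade-off can be made concrete with a deliberately simple model. The sketch below assumes first-order (Langmuir-type) adsorption kinetics, in which the signal approaches its equilibrium value as 1 - exp(-t/tau). Both the model and the time constant tau are illustrative assumptions, not parameters measured in this work. The exposure times evaluated include the 40 s and 340 s exposures used later in this work.

```python
import math

def fractional_response(t_exposure_s: float, tau_s: float) -> float:
    """Fraction of the equilibrium signal reached after t_exposure_s seconds,
    assuming (hypothetically) first-order kinetics with time constant tau_s."""
    return 1.0 - math.exp(-t_exposure_s / tau_s)

TAU_S = 120.0  # hypothetical sensor time constant (s), chosen only for illustration

for t in (5.0, 40.0, 340.0):
    # Shorter exposures capture a smaller fraction of the equilibrium signal.
    print(f"{t:6.1f} s exposure: {fractional_response(t, TAU_S):.2f} of equilibrium")
```

Under this model, a sensor whose time constant is much longer than the exposure responds almost linearly in time, so shortening the exposure shrinks the signal nearly proportionally. A sensor whose time constant is much shorter than the exposure loses little signal.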
Even sensor materials with theoretically rapid response times, such as graphene, are limited in how quickly they can respond to brief exposures to the analyte: it has been suggested that adsorption barriers prevent some gas molecules from landing directly on graphene's defect sites, causing the responses to be diffusion limited. <ref type="bibr">46</ref> Indeed, long response times are seen in the graphene-based gas sensor array literature, where response times vary between 30 s and 20 min. <ref type="bibr">38,</ref><ref type="bibr">41,</ref><ref type="bibr">43,</ref><ref type="bibr">47,</ref><ref type="bibr">48</ref> Shortening these response times could further enhance graphene's utility in sensing.</p><p>In this work, we show the successful implementation of a graphene gas sensor array system that incorporates fast, real-time measurements, a large number of individual devices, and an extremely diverse set of chemical functionalizations. This is accomplished using 4 identical arrays, each containing 108 graphene variable capacitor (varactor) sensors with 36 distinct functionalizations. We also report the conditions necessary for applying each of these functionalizations to graphene. With so many devices available for simultaneous probing on a single array, the greater chemical diversity of functionalizations improves the cross-reactivity with target analytes. We show discrimination between six different VOCs, including an alcohol, a ketone, an aldehyde, and several hydrocarbons. Each of these was tested at four concentrations to help assess the array's sensitivity and the limits of its discrimination. Our results show that even an unsupervised machine learning algorithm (PCA) produces data clusters with good visual separation between most gases and concentrations. The trade-off for such a highly parallel, rapid measurement process is a reduced signal-to-noise ratio. 
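As a rough illustration of the unsupervised step, the dependency-free sketch below computes the first principal component of a small response matrix. Rows are measurements and columns are features, such as the nine C-V features described later. It uses power iteration on the sample covariance matrix. Power iteration is a swapped-in technique chosen for brevity; library PCA implementations typically use a singular value decomposition. The toy data are hypothetical.

```python
import math

def first_principal_component(rows, n_iter=200):
    """Mean vector and unit-length first principal axis of rows (a list of
    equal-length lists), found by power iteration on the sample covariance."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)] for i in range(d)]
    v = [1.0] * d  # starting guess; must not be orthogonal to the leading eigenvector
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v

def pc1_scores(rows, mean, axis):
    """Coordinate of each row along the first principal axis (its PC1 score)."""
    return [sum((r[j] - mean[j]) * axis[j] for j in range(len(axis))) for r in rows]

# Hypothetical two-feature responses lying along a single direction.
rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
mean, axis = first_principal_component(rows)
print(axis, pc1_scores(rows, mean, axis))
```

Plotting the first two or three such scores for every measurement is what produces the visual clusters that PCA is used to inspect.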
However, in this work, we show that a supervised algorithm based upon Random Forest classification achieves good separation with up to 89% accuracy for the tested system and with accuracy over 97% when omitting one gas with high chemical similarity to another in the data set. Machine learning allows us to reduce the time that the sensors are exposed to the analytes, despite the associated reduction in signal magnitude. These results are a promising start to utilizing large-scale arrays of graphene devices for gas sensing in complex, multi-gas environments where fast detection is required.</p></div>
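<div xmlns="http://www.tei-c.org/ns/1.0"><p>The supervised classifier used in this work is a bootstrap-aggregated Random Forest. The dependency-free sketch below illustrates only the ingredients named in that label: bootstrap resampling of the training set, a random choice of features for each base learner, and majority voting over the ensemble. It is a simplification under stated assumptions, not the authors' implementation. Single-split decision stumps stand in for full decision trees, and all data and parameter values (number of models, features per model, random seed) are hypothetical.

```python
import random
from collections import Counter

def fit_stump(X, y, feature_ids):
    """Single-feature threshold split with the fewest training errors.
    Returns (feature index, threshold, label at or below it, label above it)."""
    best = None
    for j in feature_ids:
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold midway between distinct values
            left = [lab for row, lab in zip(X, y) if t >= row[j]]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
            if best is None or best[0] > errors:
                best = (errors, j, t, l_lab, r_lab)
    if best is None:  # every sampled feature is constant: always vote for the majority class
        majority = Counter(y).most_common(1)[0][0]
        return (feature_ids[0], float("inf"), majority, majority)
    return best[1:]

def fit_bagged_stumps(X, y, n_models=25, n_features=None, seed=0):
    """Fit each stump on a bootstrap resample, restricted to a random feature subset."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    k = n_features or max(1, int(d ** 0.5))  # about sqrt(d) features, a common forest default
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # n rows drawn with replacement
        feats = rng.sample(range(d), k)
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return models

def predict(models, x):
    """Majority vote of the ensemble for one feature vector x."""
    votes = Counter(l_lab if t >= x[j] else r_lab for j, t, l_lab, r_lab in models)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated classes described by two features.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2], [0.0, 0.0],
     [5.0, 5.1], [5.2, 4.9], [4.8, 5.0], [5.1, 5.3], [4.9, 4.8]]
y = ["A"] * 5 + ["B"] * 5
models = fit_bagged_stumps(X, y)
print(predict(models, [0.15, 0.15]), predict(models, [5.0, 5.0]))
```

A standard Random Forest draws a fresh feature subset at every split of a fully grown tree. For a single-split stump, drawing once per model amounts to the same thing. Replacing the stumps with full decision trees (for example, scikit-learn's RandomForestClassifier) would be the usual next step.</p></div>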
<div xmlns="http://www.tei-c.org/ns/1.0"><head>RESULTS AND DISCUSSION</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Description of Varactors and Measurement System.</head><p>A diagram of the e-nose design and testing setup is shown in Figure <ref type="figure">1</ref>. The individual sensors are graphene varactors configured in a multi-finger geometry to enable high-speed operation. <ref type="bibr">49</ref> The varactors consist of a tungsten (W) local bottom gate electrode, a composite dielectric stack of Al₂O₃ and HfO₂ layers, and a graphene layer on top. The equivalent oxide thickness of the dielectric stack is &#8764;3.3 nm. Contacts to the graphene were made using interdigitated electrodes configured such that the graphene above the W electrode was exposed to the environment. The local gate electrode and HfO₂ dielectric layer were fabricated on 150 mm wafers in a commercial fabrication facility, while the remaining fabrication steps were performed on smaller samples in the Minnesota Nano Center. After fabrication and dicing, 36 chips containing three varactors each were functionalized with one of the 36 different chemical receptors listed in Table <ref type="table">1</ref>, and these functionalized chips were wire-bonded to a printed circuit board (PCB) that was probed from underneath. Details of the fabrication and functionalization procedures are provided in the Methods section.</p><p>[Table 1 footnotes: (a) Bare graphene is included to show its average V<hi rend="subscript">DF</hi> relative to the functionalized devices. (b) Such as amines, alcohols, and carbonyl compounds, including ketones, esters, amides, carboxylic acids, and aldehydes. <ref type="bibr">50</ref> (c) Such as primary and secondary amines, alcohols, and carboxylic acids. <ref type="bibr">50</ref> (d) Such as olefins, amines, heteroaromatics, and carbonyl compounds, including ketones, esters, amides, carboxylic acids, and aldehydes. <ref type="bibr">51</ref> (e) Such as compounds with phenyl and short alkyl groups. <ref type="bibr">52</ref> (f) Such as compounds with aromatic and long alkyl groups. <ref type="bibr">53</ref>]</p><p>We have previously reported a method for pyrene- and cyclodextrin-based functionalization, <ref type="bibr">29</ref> reported on the sources of disorder in graphene, <ref type="bibr">54</ref> and shown that pyrene-based functionalization does not degrade the quantum capacitance of graphene. <ref type="bibr">55</ref> Raman maps and spectra of the bare graphene used in this study are shown in Figure <ref type="figure">S1</ref> of the Supporting Information; they show that the graphene is generally of good quality, with only the slight D peak typical of graphene grown by chemical vapor deposition (CVD).</p><p>A custom-designed high-speed capacitance-measurement circuit applied voltage biases to each varactor in the array and recorded the capacitance responses in rapid succession. Four identical arrays with 108 varactors each were used in this study, for a total of 432 individual varactors tested. The measurements were performed as follows. A DC voltage of -1.5 V was applied simultaneously to the bottom electrodes of all varactors in an array, and the capacitance of each varactor was then measured by sequentially applying to each device a 250 kHz square-wave voltage with a peak-to-peak amplitude of 100 mV. The bottom-electrode DC voltage was then increased by 0.05 V and the process repeated, up to a maximum DC voltage of +1.5 V, after which the DC voltage was swept back to -1.5 V. We refer to the capacitance-voltage (C-V) curve obtained as the DC voltage goes from -1.5 V to +1.5 V as the forward sweep, and that obtained from +1.5 V to -1.5 V as the reverse sweep. Capacitance measurements for the entire array at each DC voltage step took approximately 38 ms, and a complete forward or reverse sweep took approximately 2.3 s. 
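The reported sweep duration follows from the bias schedule. The sketch below assumes equally spaced 0.05 V steps that include both end points, which the text does not state explicitly. This gives 61 bias points per sweep. At the reported ∼38 ms per array-wide step, that is about 2.3 s per forward or reverse sweep, or roughly 0.35 ms per varactor reading. The constant and function names are illustrative.

```python
V_START, V_STOP, V_STEP = -1.5, 1.5, 0.05  # DC gate-bias schedule (V)
T_PER_STEP_S = 0.038                       # approximate time to read every varactor at one bias (s)
N_VARACTORS = 108                          # varactors per array

def sweep_biases(v_start, v_stop, step):
    """Bias points of one sweep, end points included (assumed).
    Stepping by an integer index avoids accumulating floating-point error."""
    n_steps = round(abs(v_stop - v_start) / step)
    direction = 1 if v_stop > v_start else -1
    return [round(v_start + direction * k * step, 10) for k in range(n_steps + 1)]

forward = sweep_biases(V_START, V_STOP, V_STEP)
reverse = sweep_biases(V_STOP, V_START, V_STEP)

sweep_time_s = len(forward) * T_PER_STEP_S        # 61 points x 38 ms = 2.318 s
per_device_ms = T_PER_STEP_S / N_VARACTORS * 1e3  # time budget per varactor reading
print(len(forward), round(sweep_time_s, 3), round(per_device_ms, 2))
```

A forward-reverse pair therefore takes roughly 4.6 s, which is consistent with the ∼5 s per sweep cycle implied by the phase durations reported below.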
Images of the measurement circuit and fabricated varactor chips are shown in Figures <ref type="figure">S2</ref> and <ref type="figure">S3</ref> of the Supporting Information, respectively.</p><p>A total of 36 distinct functionalizations were used for this study. They include compounds from four main chemical groups: pillararenes, porphyrins, pyrenes, and cyclodextrins. These were selected for the wide range of receptor-analyte interactions that they provide, including hydrogen-bond donation and acceptance, dipole-dipole interactions, ligation to metal centers, and steric repulsion. We have previously shown the application of 10 of these functionalizations to graphene monolayers. <ref type="bibr">30</ref> In this work, we demonstrate the application of an additional 26. High-density monolayers of these compounds on graphene were obtained by self-assembly. Receptor adsorption onto the graphene was confirmed, and the surface concentration of the receptors was assessed, using contact angle <ref type="bibr">56,</ref><ref type="bibr">57</ref> and X-ray photoelectron spectroscopy (XPS) measurements. <ref type="bibr">30</ref> This made it possible to determine the optimum receptor concentration in the functionalization solutions to obtain dense monolayers while avoiding multilayer formation. A full list of the 36 receptors used and their expected chemical interactions with analyte gases is provided in Table <ref type="table">1</ref>, while Langmuir adsorption isotherms for each receptor can be found in Figures <ref type="figure">S4-S6</ref>. Although only 4 arrays were used in this study, each of the 36 functionalizations was represented by 3 duplicate sensors per array, for a total of 12 sensors per functionalization across the whole system. 
Furthermore, each of these 12 replicates was used to test 25 distinct gas-concentration pairs, which collectively provides a great deal of information for examining device consistency.</p><p>Feature Extraction. Several methods have been used previously to extract features from gas sensor response curves. <ref type="bibr">44</ref> These include coefficients from curve fitting and from transformations with different types of functions, as well as geometric values or slopes at certain points, ranges, or areas of the curve. One advantage of geometric features is their inherent interpretability, provided the values are chosen judiciously. Previous work on graphene varactors has already described a set of geometric features with well-understood physical interpretations that help to explain the underlying sensing mechanism. <ref type="bibr">55,</ref><ref type="bibr">58</ref> In this work, the main geometric features chosen to describe the C-V curves were the Dirac point, V<hi rend="subscript">D</hi>; the minimum capacitance, C<hi rend="subscript">min</hi>; and the maximum capacitance, C<hi rend="subscript">max</hi>. From the latter two values, an additional feature, the tuning range, TR = C<hi rend="subscript">max</hi>/C<hi rend="subscript">min</hi>, was also defined. Finally, because both forward and reverse sweeps were performed, additional information conveyed by the hysteresis could be determined. Therefore, V<hi rend="subscript">D</hi>, C<hi rend="subscript">min</hi>, C<hi rend="subscript">max</hi>, and TR were all recorded for both sweep directions, and the Dirac point hysteresis, &#916;V<hi rend="subscript">D</hi>, was also recorded, for a total of 9 features: V<hi rend="subscript">DF</hi>, V<hi rend="subscript">DR</hi>, C<hi rend="subscript">minF</hi>, C<hi rend="subscript">minR</hi>, C<hi rend="subscript">maxF</hi>, C<hi rend="subscript">maxR</hi>, TR<hi rend="subscript">F</hi>, TR<hi rend="subscript">R</hi>, and &#916;V<hi rend="subscript">D</hi>. A detailed description of the sweep parameters and definitions of the extracted features are provided in Figures <ref type="figure">S7</ref> and <ref type="figure">S8</ref> of the Supporting Information. 
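The following minimal sketch shows how the nine features could be computed from one forward-reverse pair of sampled C-V curves. It makes three assumptions: the Dirac point is the bias at which the sampled capacitance is smallest, C_max is simply the largest sampled capacitance, and the hysteresis is ΔV_D = V_DF - V_DR. The exact definitions used in this work are those of Figure S7. The V-shaped toy curves are hypothetical stand-ins for measured data.

```python
def sweep_features(biases, capacitances):
    """V_D, C_min, C_max and tuning range TR = C_max / C_min of one sweep."""
    i_min = min(range(len(capacitances)), key=capacitances.__getitem__)
    c_min, c_max = capacitances[i_min], max(capacitances)
    return {"V_D": biases[i_min], "C_min": c_min, "C_max": c_max, "TR": c_max / c_min}

def nine_features(fwd_biases, fwd_caps, rev_biases, rev_caps):
    """Forward (_F) and reverse (_R) features plus the Dirac point hysteresis."""
    fwd = sweep_features(fwd_biases, fwd_caps)
    rev = sweep_features(rev_biases, rev_caps)
    features = {name + "_F": value for name, value in fwd.items()}
    features.update({name + "_R": value for name, value in rev.items()})
    features["dV_D"] = fwd["V_D"] - rev["V_D"]  # sign convention assumed
    return features

# Hypothetical V-shaped toy curves (capacitance in arbitrary units).
biases = [round(-1.5 + 0.05 * k, 10) for k in range(61)]
fwd_caps = [10.0 + 4.0 * abs(v - 0.2) for v in biases]          # minimum at +0.2 V
rev_biases = biases[::-1]
rev_caps = [10.0 + 4.0 * abs(v - 0.1) for v in rev_biases]      # minimum at +0.1 V
print(nine_features(biases, fwd_caps, rev_biases, rev_caps))
```

Real C-V curves are noisy near the minimum, so a practical implementation might fit a smooth function around the minimum before locating V_D. That refinement is not shown here.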
Figure <ref type="figure">S7</ref> visually depicts how each feature is calculated from a typical C-V curve and also describes how each feature relates to the underlying physics of the graphene varactors and their interaction with the gas analytes. Figure <ref type="figure">S8</ref> demonstrates how the shape of the C-V curves may change under gas exposure.</p><p>Device Consistency and Effect of Functionalizations in Pretest. Before each gas sensing run, a set of initial pretest curves was taken in N₂. These curves show how the functionalizations themselves affected the features described above and also serve as the reference baseline from which the sensor response is determined. Five forward-reverse sweep cycles were recorded during each pretest phase. An example comparison of pretest C-V curves is shown in Figure <ref type="figure">2a</ref>, where the curve shown for each device is that of the final pretest sweep. This sweep was chosen to show the C-V characteristics after the device response to N₂ had settled but before the introduction of the gases, to best represent the initial condition of the device before the testing phase. Here, a pretest C-V curve for a bare (unfunctionalized) graphene varactor is shown in blue, and the curve for a functionalized varactor is shown in red. Arrows on each curve denote the forward and reverse voltage sweeps. Three of the features are indicated on the forward sweep of the bare device (V<hi rend="subscript">DF</hi>, C<hi rend="subscript">minF</hi>, and C<hi rend="subscript">maxF</hi>). In the case represented here, the functionalization both increases the overall capacitance and shifts the Dirac point to more positive values compared with the bare device.</p><p>The functionalization-dependent change in device behavior is further demonstrated in Figure <ref type="figure">S9</ref>, where a pretest C-V curve of every varactor on each sensor array card has been plotted. 
Each subplot of Figure <ref type="figure">S9</ref> includes data from all 120 varactors on a single sensor array. The 3 varactors on each array that were functionalized with the same chemical receptor are grouped together in the figure, and the 36 functionalization groups have been sorted by their average V<hi rend="subscript">DF</hi> across all 4 arrays, as in Table <ref type="table">1</ref>. The 12 bare devices on each array are also shown, highlighted by a pink box to help the reader compare them with the functionalized devices. This representation is intended to demonstrate that on each array, (1) the three duplicate varactors of the same functionalization type produce similar pretest C-V curves, and (2) pretest C-V curves from different functionalization types vary in shape and location on the voltage and capacitance axes. While not every curve is unique, there is a gradient in C-V curve behavior across functionalizations that is consistent within groups. This does not show how each functionalization responds to gas exposure; however, it does show that the pretest C-V curves have been altered in varying ways by the functionalizations. All four sensor arrays show very similar V<hi rend="subscript">DF</hi> trends.</p><p>Figure <ref type="figure">S10</ref> shows a forward pretest curve from a sample of varactors on all four sensor array cards, normalized by their minimum capacitances. This demonstrates that devices functionalized with different receptors produce distinct C-V curve shapes that are consistent between replicate varactors of the same functionalization and across all four sensor arrays. Scatter plots showing the means and standard deviations of one forward pretest sweep of the 12 devices of each functionalization group across the 4 arrays are shown in Figure <ref type="figure">S11</ref>. 
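The replicate structure behind these scatter plots can be summarized with a short grouping step: 36 receptors × 3 varactors per array × 4 arrays gives 12 devices per receptor. The sketch below pools one feature value per device by functionalization and reports the mean and sample standard deviation. The record layout (functionalization id, array id, value) is a hypothetical format chosen for illustration, and the values are placeholders, not measurements.

```python
from collections import defaultdict
from statistics import mean, stdev

def stats_by_functionalization(records):
    """records: iterable of (functionalization_id, array_id, value) tuples.
    Returns {functionalization_id: (mean, sample std, n)} pooled over arrays."""
    groups = defaultdict(list)
    for func_id, _array_id, value in records:
        groups[func_id].append(value)
    return {f: (mean(v), stdev(v), len(v)) for f, v in groups.items()}

# Placeholder layout: 36 receptors x 4 arrays x 3 replicate varactors.
records = [
    (func_id, array_id, 0.01 * func_id + 0.001 * replicate)  # dummy V_DF-like values
    for func_id in range(1, 37)
    for array_id in range(1, 5)
    for replicate in range(3)
]
summary = stats_by_functionalization(records)
print(len(records), len(summary), summary[1][2])
```

Grouping by (functionalization, array) instead would separate within-array from between-array variation, which is the comparison Figure S9 makes visually.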
Forward sweep pretest feature values of every varactor on all arrays are shown in Figure <ref type="figure">S12</ref>, and histograms of pretest V<hi rend="subscript">DF</hi> values from every varactor in each functionalization group are shown in Figure <ref type="figure">S13</ref>. These results strongly indicate a high yield of the graphene varactors, as well as consistent functionalization-induced changes in the C-V curves. For example, while some functionalizations share similar Dirac point values, as evidenced by their overlapping standard deviations in Figure <ref type="figure">S11</ref> (e.g., functionalizations 1-10), other groups do not overlap with this first group. Similarly, distinct capacitance values are seen for each functionalization in the C<hi rend="subscript">min</hi>, C<hi rend="subscript">max</hi>, and TR features. Similar shapes in the scatter plots of Figure <ref type="figure">S12</ref> suggest that devices of the same functionalization type produce similar pretest values across all four arrays. The histograms in Figure <ref type="figure">S13</ref> also show that the distribution of pretest values shifts from one functionalization to another.</p><p>We note that each functionalization need not produce a distinct pretest response, because it is the specific response patterns to the set of tested gas analytes that provide the information the classification algorithms need to distinguish between the gases. However, these figures demonstrate that even the pretest response has been consistently altered according to functionalization type.</p><p>Measurement Procedure and Gas Responses. The gas sensing measurements proceeded as follows. VOC vapor mixtures, produced by flowing N₂ carrier gas through a bubbler containing the neat liquid, were passed through the chamber, and as the gas molecules flow over the arrays, they interact with the functionalized sensor surfaces. 
These interactions alter the capacitance of the graphene and shift the recorded C-V curves relative to the pretest period. This shift varies with the functionalization and the gas molecules, so the different functionalization/gas combinations produce specific response patterns that were used as inputs to machine learning algorithms. Six gas species were chosen for testing at four concentrations each. These gases are listed in Table <ref type="table">S1</ref>, along with their associated concentrations in ppm, which correspond to 1, 10, 50, and 100% of the saturated vapor concentration of each gas. The concentrations tested range from approximately 145 to 103,000 ppm (depending on the gas) and so are not especially low compared with the concentrations at which these gases might be found in a real-world context. However, because a sensing array may be used to produce signals in response to the entire chemical content of a given sample through the cross-reactivity of every sensor on the array, extremely high sensitivity to single analytes may not be required to detect specific diseases. In total, 25 combinations of gases and concentrations were tested. The gas flow and measurement setup are shown in Figure <ref type="figure">S14</ref>, and results of a computational fluid dynamics simulation of the gas flow through the sensor array card chamber are shown in Figure <ref type="figure">S15</ref>; the simulation shows that the sensors are located in areas of uniform flow.</p><p>As an additional indicator of device yield and long-term stability over multiple measurements, Figure <ref type="figure">S16</ref> shows the averaged forward sweep pretest values for each of the 25 measurements of one representative bare device. Between measurements, the sensor was vacuum baked to desorb gas molecules and reset its electrical characteristics. 
The lack of trends in this figure demonstrates that the sensor behavior is not altered or degraded by repeated use. To show that the functionalization itself is not degraded, Figure <ref type="figure">S17</ref> shows the difference in forward sweep pretest values for each of the 25 measurements between two representative devices, one functionalized with 1-pyrenesulfonic acid and the other bare. A similar lack of trends in this figure demonstrates that the functionalization does not change with repeated use. To demonstrate the device behavior and drift during repeated exposures to varying analyte concentrations within one measurement, Figure <ref type="figure">S18</ref> shows data gathered from a single bare graphene device during repeated exposure to ethanol at 1%-10% of its saturated vapor concentration. These gas exposures were longer than those in the sensor array data (340 s instead of 40 s), so the drift in some features is more significant. Despite this, even after multiple exposures to the VOC, most features tend to return to the same baseline value during N₂ flow.</p><p>The 6 VOCs tested produced varying response types in each feature, as exemplified by Figure <ref type="figure">2b,</ref><ref type="figure">c</ref>, which show the ethanol and hexanal responses in V<hi rend="subscript">DF</hi> and C<hi rend="subscript">minF</hi>, respectively. In the pretest, exposure, and posttest measurement phases, 5, 8, and 6 complete forward-reverse sweep pairs were recorded, respectively. Each measurement set was completed in approximately 94 s (pretest: 24 s; exposure: 40 s; posttest: 30 s). The baseline values for all features were determined from the final pretest sweep, as it was the most representative of the initial condition before the introduction of the gases. Two types of responses were found, depending on the behavior during the exposure period relative to the baseline. 
A positive response was defined as one in which the value of the feature in question increased relative to baseline during gas exposure, and a negative response as one in which it decreased. For example, in the case of V<hi rend="subscript">D</hi>, a positive response shifts the C-V curve to the right during gas exposure, and a negative response shifts it to the left. Similarly, in the case of C<hi rend="subscript">min</hi>, the curves shift up during gas exposure for a positive response and down for a negative response. The behavior of a C-V curve undergoing a positive and a negative response in each feature is shown schematically in Figure <ref type="figure">S8</ref>. Response values were calculated as the difference between the maximum (minimum) value during the exposure period for a positive (negative) response curve and the value from the final pretest sweep. The data points used to calculate the responses for each curve in Figure <ref type="figure">2b</ref>,c are highlighted in red. The ethanol curve in Figure <ref type="figure">2b</ref> is an example of a positive response, and the hexanal curve in the same subfigure is an example of a negative response. Response data were calculated from each of the nine features for every gas-concentration response curve recorded from every sensor on the four sensor array cards. 
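The response rule above can be written as a short function. The sketch assumes a curve is classified as positive when its largest upward excursion from the baseline is at least as large as its largest downward excursion. The text does not say how the sign was assigned, so this criterion, like the sample values, is hypothetical.

```python
def feature_response(pretest_values, exposure_values):
    """Response of one feature for one gas-concentration run.

    pretest_values:  feature value from each pretest sweep (the last one is the baseline)
    exposure_values: feature value from each sweep during gas exposure
    Returns max(exposure) - baseline for a positive response, or
    min(exposure) - baseline (a negative number) for a negative response.
    """
    baseline = pretest_values[-1]           # final pretest sweep
    rise = max(exposure_values) - baseline
    fall = baseline - min(exposure_values)
    if rise >= fall:                        # assumed sign criterion
        return rise
    return -fall

# Hypothetical V_DF values (V): 5 pretest sweeps, then 8 exposure sweeps.
pretest = [0.212, 0.210, 0.209, 0.209, 0.208]
exposure = [0.215, 0.221, 0.226, 0.229, 0.231, 0.232, 0.232, 0.233]
print(round(feature_response(pretest, exposure), 3))
```

Applying this function to all nine features of every device gives the per-run response vector that a classifier would consume.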
Further examples of response data similar to Figure <ref type="figure">2b</ref>,c are shown in Figures <ref type="figure">S19-S24</ref>: representative bare-device response curves for each analyte at 100% concentration (Figures <ref type="figure">S19</ref> and <ref type="figure">S20</ref>) and 1% concentration (Figures <ref type="figure">S21</ref> and <ref type="figure">S22</ref>), as well as representative saturated ethanol vapor response curves for 1 bare device and 4 functionalized devices (Figures <ref type="figure">S23</ref> and <ref type="figure">S24</ref>). Figures <ref type="figure">S25</ref> and <ref type="figure">S26</ref> show scatter plots of the means and standard deviations of each functionalization's response values across the entire data set (similar to Figure <ref type="figure">S11</ref>, which shows only the baseline values). Additionally, every calculated response value in the data set is shown in Figure <ref type="figure">S27</ref>. Machine learning algorithms are needed to discriminate finely between gases; however, some patterns can still be seen by eye in Figure <ref type="figure">S27</ref>. For example, ethanol and hexanal responses have opposite signs and different magnitudes across several features.</p><p>Average estimates of the V<hi rend="subscript">DR</hi> limit of detection (LOD) for each functionalization in the sensor array under exposure to each of the 6 gases are provided in Figure <ref type="figure">S28</ref>. The V<hi rend="subscript">DR</hi> LODs estimated in this way are large, and the differences in sensitivity among functionalizations for each gas are subtle. This is due to the fast measurement speed and short gas exposure time.
However, as will be shown later, machine learning can still detect these subtle differences and successfully differentiate between gas and concentration classes with high accuracy, correctly classifying 6 gases at concentrations as low as 145 ppm.</p><p>In this work, we have opted for rapid sweeping to demonstrate the varactor capabilities for applications requiring short gas exposure and fast results. However, the sensitivity of the graphene varactors depends directly on the speed of data acquisition: the varactors need time to respond to the presence of gas molecules, so the shorter the gas exposure, the smaller the response signal. To illustrate this point, Figures <ref type="figure">S29</ref> and <ref type="figure">S30</ref> compare the rapidly measured sensor arrays with data collected from single (not multiplexed) varactors at slower measurement speeds. Whereas each voltage sweep of the sensor array data was completed in approximately 5 s, each sweep of the slower, single-sensor data in these figures took approximately 17 s. In Figure <ref type="figure">S29</ref>, the single-sensor measurements used 20 exposure sweeps, so that the total gas exposure times were 40 s (arrays) and 340 s (single sensors). The single sensors were exposed to ethanol at 1%, 4%, 7%, and 10% of the saturated vapor concentration (compared with 1%, 10%, 50%, and 100% for the array sensors). Over the shared concentration range of 1%-10%, the single sensors exhibit approximately 1.6 times the sensitivity of the sensor arrays, and at 10% concentration, the average signal magnitudes of the single sensors are approximately 4 times larger than those of the sensor arrays.
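</p><p>As a rough illustration of how sensitivity and LOD relate, sensitivity can be taken as the least-squares slope of response versus concentration, and an LOD as the concentration whose expected response equals k times the baseline noise. The sketch below assumes a generic 3-sigma convention (k = 3); the estimator actually used for Figures S28 and S30 is not specified in this section. The function names are hypothetical, and all numbers are invented, chosen so that the slope ratio mirrors the approximately 1.6-fold sensitivity difference quoted above.</p>
```python
from statistics import fmean


def sensitivity(conc, resp):
    """Least-squares slope of response versus concentration."""
    cx, cy = fmean(conc), fmean(resp)
    num = sum((x - cx) * (y - cy) for x, y in zip(conc, resp))
    den = sum((x - cx) ** 2 for x in conc)
    return num / den


def limit_of_detection(noise_sd, slope, k=3.0):
    """Concentration at which the expected signal equals k times the noise.

    ASSUMPTION: generic k-sigma convention, not necessarily the paper's estimator.
    """
    return k * noise_sd / abs(slope)


# Hypothetical data (not from the paper): response in mV versus ethanol in ppm.
conc_ppm = [20, 60, 100, 140, 180]
slow_mV = [2.0, 6.0, 10.0, 14.0, 18.0]     # slower, single-sensor sweeps
fast_mV = [1.25, 3.75, 6.25, 8.75, 11.25]  # faster, multiplexed sweeps

s_slow = sensitivity(conc_ppm, slow_mV)
s_fast = sensitivity(conc_ppm, fast_mV)
print(f"sensitivity ratio (slow/fast): {s_slow / s_fast:.2f}")
print(f"LOD, slow: {limit_of_detection(0.5, s_slow):.1f} ppm")
print(f"LOD, fast: {limit_of_detection(0.5, s_fast):.1f} ppm")
```
<p>Because the LOD scales inversely with the slope, a sensor with 1.6 times the sensitivity and the same noise level has a 1.6-fold lower LOD (15 ppm versus 24 ppm in this invented example).</p>
<p>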
Although the differences in sensitivity between the bare device and the devices functionalized with Receptors 2 and 26 are very subtle in the sensor arrays, the trend matches that seen in the single sensors: Receptor 26 &gt; bare &gt; Receptor 2.</p><p>In Figure <ref type="figure">S30</ref>, the slower, single-sensor measurements use exposure periods of 5, 10, and 20 sweeps (compared with 8 sweeps in the faster array data) to demonstrate how sensitivity changes with exposure duration. The total exposure times are 40 s (fast array measurements), 85 s (single-sensor measurements, 5 sweeps), 170 s (single-sensor measurements, 10 sweeps), and 340 s (single-sensor measurements, 20 sweeps). The single sensors were also tested at much lower ethanol concentrations (20-180 ppm) to obtain a more accurate LOD estimate. The lowest estimated LOD is 19 ppm, for the 10-sweep measurements in V<hi rend="subscript">DF</hi>. This shows that the varactors are highly sensitive when measured at slower speeds, so the balance between rapid detection and high sensitivity can be optimized. This figure also directly illustrates the trade-off between measurement speed and sensitivity. Within the slower, single-sensor measurements, the average LOD is larger for 5 sweeps than for 10 and 20 sweeps in all features except C<hi rend="subscript">minR</hi>, C<hi rend="subscript">maxF</hi>, and C<hi rend="subscript">maxR</hi>. These three features typically produce much noisier signals, so their responses are more variable. The difference between the 10- and 20-sweep versions is less clear, suggesting that the improvement in sensitivity saturates with increasing exposure time.</p><p>Parameter Selection. The responses relative to the baseline represent the changes in device behavior upon gas exposure. These distinct changes allow machine learning algorithms to discriminate between both gas species and concentrations. Two algorithms were chosen for this purpose: PCA and Random Forest classification (RF).
PCA reveals the inherent discrimination between gas species and concentrations without supervision, while the supervised RF technique can classify with greater accuracy. The number of parameters relative to data points was quite large: with 9 features for each of the 36 functionalizations, there were 324 possible parameters per measurement (although 73 parameters were excluded because of missing data, as explained in the Methods section, leaving 251 parameters). With 25 gas-concentration classes, 4 arrays, and each device of a functionalization triplicate regarded as a separate observation, we had 300 data points (25 × 4 × 3). Because of this high parameter-to-data-point ratio, we risked model overfitting, and the data set needed to be reduced for model efficiency. This reduction could be done by eliminating parameters or by using dimensionality reduction methods such as PCA. An additional goal of this analysis was to better understand how each parameter is altered by gas exposure, so eliminating parameters arbitrarily was not ideal. Likewise, compressing parameters with PCA would lose detailed information about the importance of each parameter. In contrast, the RF algorithm can handle many parameters because it effectively performs informed predictor selection during training, so it was chosen as the main classification method. Additionally, we used bootstrap aggregating to train 200 Random Forest ensembles, each with different, randomly sampled training and testing data sets. This method is commonly used to reduce overfitting. Further details on this procedure are provided in the Methods section.</p><p>A ranking of the most influential parameters in a data set can also be extracted from the RF analysis. This ranking can be used to reduce the parameter set, and it shows which parameters contribute most to distinguishing between the different data classes.
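</p><p>The permutation-importance ranking used for parameter selection can be illustrated with a short, self-contained sketch. Permutation importance is model-agnostic, so to keep the example dependency-free it uses a simple nearest-centroid classifier as a stand-in for the Random Forest. The data set (one strongly informative, one weakly informative, and one pure-noise parameter) and all function names are hypothetical. The final step mirrors the top/bottom 33% split used to build Figure 3.</p>
```python
import random
from statistics import fmean


def fit_centroids(X, y):
    """Stand-in classifier: one mean vector (centroid) per class. NOT a Random Forest."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [fmean(col) for col in zip(*rows)]
    return centroids


def accuracy(centroids, X, y):
    """Fraction of rows whose nearest centroid (squared Euclidean) has the true label."""
    def nearest(x):
        return min(centroids, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))
    return sum(nearest(x) == lab for x, lab in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean drop in held-out accuracy when one parameter's column is shuffled."""
    rng = random.Random(seed)
    base = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between parameter j and the labels
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(base - accuracy(model, X_perm, y))
        importances.append(fmean(drops))
    return importances


def top_bottom_third(scores):
    """Indices of the top and bottom 33% of parameters by importance score."""
    order = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
    k = max(1, len(scores) // 3)
    return order[:k], order[-k:]


# Hypothetical data: parameter 0 is strongly informative, 1 weakly, 2 is pure noise.
rng = random.Random(42)
X, y = [], []
for label, (m0, m1) in {"A": (0.0, 0.0), "B": (3.0, 0.5), "C": (6.0, 1.0)}.items():
    for _ in range(30):
        X.append([rng.gauss(m0, 0.5), rng.gauss(m1, 1.0), rng.gauss(0.0, 1.0)])
        y.append(label)

idx = list(range(len(y)))
rng.shuffle(idx)
train, test = idx[:63], idx[63:]  # random 70/30 train/test split
model = fit_centroids([X[i] for i in train], [y[i] for i in train])
scores = permutation_importance(model, [X[i] for i in test], [y[i] for i in test])
top, bottom = top_bottom_third(scores)
print("importance per parameter:", [round(s, 3) for s in scores])
print("top third:", top, "bottom third:", bottom)
```
<p>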
Before training the classification models to evaluate the sensor arrays, we first performed a broad parameter selection process to narrow down the list. After an RF model is trained, the importance of each input parameter can be extracted. Permuting the response values of one parameter breaks the relationship between that parameter and the data classes. Measuring the resulting change in model accuracy produces a permutation importance score for that parameter: a reduction in accuracy after permutation indicates that the parameter did contribute to explaining the data pattern. No change, or even an improvement, in accuracy after permutation indicates that the parameter is not informative; it may be too strongly affected by noise, or the information it conveys may be unrelated to the data classes. This permutation process is performed for each parameter in turn, yielding an importance score for each. Parameters may then be ranked by score to show their relative influence in the model. A similar ranking may also be obtained from PCA by using the magnitude of each parameter's loading coefficient in the most explanatory principal components (PCs). The larger a parameter's coefficient, the more influential it is in describing the data.</p><figure><head>Figure 3.</head><figDesc>(a, b) Number of parameters derived from each feature among the top (a) or bottom (b) 33% of the PCA predictor ranking. A weighted summation of the first two principal components was used to rank the predictors. Influential features include V<hi rend="subscript">DR</hi>, &#916;V<hi rend="subscript">D</hi>, C<hi rend="subscript">maxF</hi>, C<hi rend="subscript">maxR</hi>, and TR<hi rend="subscript">F</hi>, whereas V<hi rend="subscript">DF</hi>, C<hi rend="subscript">minF</hi>, C<hi rend="subscript">minR</hi>, and TR<hi rend="subscript">R</hi> do not appear highly influential. (c, d) Frequency with which each parameter appears in the top (c) or bottom (d) 33% of parameters for high-accuracy RF models only (accuracy &#8805;86%). In contrast to PCA, C<hi rend="subscript">maxF</hi>, C<hi rend="subscript">maxR</hi>, TR<hi rend="subscript">F</hi>, and TR<hi rend="subscript">R</hi> are not highly favored by the algorithm. This suggests that the C<hi rend="subscript">max</hi>- and TR-derived parameters help to explain data variation other than gas type (such as variations between sensor array cards).</figDesc></figure><p>The entire data set of 25 analyte gas/concentration combinations and all parameters was input into both PCA and RF, and importance scores were calculated from both. These scores were ranked for each algorithm from most to least influential. For PCA, the top and bottom 33% of the ranked parameters were recorded, and the number of parameters in each group corresponding to each feature was tallied. These feature frequencies are plotted in Figure <ref type="figure">3a</ref> (top 33%) and Figure <ref type="figure">3b</ref> (bottom 33%). For RF, 200 models were trained on different randomly selected subsets of the data to improve stability, and importance scores were calculated for each model. For each RF run, these scores were ranked and divided into the top and bottom 33%, as with PCA. After all 200 RF runs were completed, we tallied the number of times that each feature-functionalization parameter appeared in the top or bottom 33% of a "good" model's importance scores, where a "good" model was defined as one achieving a prediction accuracy &#8805;86%. These tallies are plotted in Figure <ref type="figure">3c,</ref><ref type="figure">d</ref>.</p><p>The PCA importance scores indicate that parameters derived from the V<hi rend="subscript">DR</hi>, C<hi rend="subscript">maxF</hi>, C<hi rend="subscript">maxR</hi>, and TR<hi rend="subscript">F</hi> features are frequently among the most influential parameters for PCA, whereas parameters derived from the V<hi rend="subscript">DF</hi>, C<hi rend="subscript">minF</hi>, C<hi rend="subscript">minR</hi>, and TR<hi rend="subscript">R</hi> features are frequently among the least influential. However, neither the C<hi rend="subscript">max</hi> nor the TR features (in both forward and reverse sweep versions) are clearly influential in the RF models. This disparity between the PCA and RF rankings is likely due to the blind nature of PCA; as an unsupervised method, PCA finds patterns in the data without any prior information about the true classes.
However, RF is supervised, meaning that a priori knowledge is used to identify the parameters that best discriminate between known classes. These results indicate not only which features best describe the variations in sensor behavior between gas samples but also which features may instead describe confounding factors, such as fabrication variations between arrays. Based on these rankings of parameter importance, parameters derived from the C<hi rend="subscript">max</hi> and TR features were omitted from most further analyses.</p><p>Gas and Concentration Distinctions with Unsupervised Learning. Although we chose not to use PCA as a dimensionality reduction technique, it can still be used to examine the inherent patterns present in the data. If patterns are found, we can then check whether they match our expectations based on the gases and concentrations tested. We performed PCA on a data set containing all 25 gas-concentration classes and all feature-functionalization parameters except those derived from the C<hi rend="subscript">max</hi> and TR features. Figure <ref type="figure">4</ref> shows the PCA score plots from various perspectives. Figure <ref type="figure">4a,</ref><ref type="figure">b</ref> shows the projection of the responses onto PCs 1 and 2, and Figure <ref type="figure">4c,</ref><ref type="figure">d</ref> shows the projection onto PCs 1 and 3. To show the tight cluster of points in more detail, Figure <ref type="figure">4b</ref>,d shows magnified views of Figure <ref type="figure">4a</ref>,c, respectively. Together, PCs 1, 2, and 3 explain 89.5% of the variation in the data. Each gas type is indicated with a different color, and marker shapes indicate the gas concentrations (circles: 100%; triangles: 50%; diamonds: 10%; squares: 1% of the respective saturated vapor concentration; Table S1 lists the corresponding concentrations of each gas in ppm).</p><figure><head>Figure 4.</head><figDesc>PCA score plots showing the projection of the responses onto PCs (a, b) 1 and 2 and (c, d) 1 and 3, which together explain 89.5% of the variation in the data. Marker symbols indicate gas concentration (circles: 100%; triangles: 50%; diamonds: 10%; squares: 1% saturated vapor concentration), and the colors indicate gas species as labeled. Arrows indicate the direction of increasing concentration within a gas species. The ethanol and MEK groups partly overlap in PC 1 and PC 2; however, they are distinct in the much weaker PC 3. The plots in (b) and (d) are magnified versions of (a) and (c), respectively, showing the details of the tight cluster containing octane, 1-octene, toluene, nitrogen, and the low-concentration measurements of the other three gases. The varying concentrations of octane and 1-octene form slightly distinct groups; however, the two species are indistinguishable within those groups.</figDesc></figure></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Colored ellipses have been drawn around groups of points to emphasize the relationships between classes. Ethanol (dark blue), MEK (light green), and hexanal (orange) form lobes radiating outward from the center cluster. Arrows indicate increasing gas concentration away from the center cluster, a trend consistent across all gas species. The tight center cluster consists of the lowest concentrations of ethanol, MEK, and hexanal, as well as the nitrogen control and all concentrations of toluene (light blue) and of octane and 1-octene (red and purple). Some overlap exists in the center cluster, namely, between toluene and the lowest concentrations of ethanol and MEK. Additionally, the lowest concentration of hexanal nearly overlaps with the octane/1-octene cluster. These overlaps are likely due in part to the low response values of the overlapping classes (see Figures <ref type="figure">S19-S22</ref>, which show the relative response magnitudes of all gases). However, nitrogen, toluene, and the octane/1-octene cluster are all well separated from each other.</p><p>Another interesting aspect of the results is the comparison of the octane and 1-octene responses. Although the octane and 1-octene responses overlap with each other, each concentration group is somewhat separated. For instance, octane/1-octene at 1% (squares) is distinct from octane/1-octene at 10% (diamonds). Both of these are also somewhat distinct from 50% (triangles), which is in turn somewhat distinct from 100% (circles). The overlap between the two species is not surprising given that these two hydrocarbons have the same number and connectivity of carbon atoms and differ only by two hydrogen atoms, as also evident from their very similar boiling points of 126 and 121 &#176;C for octane and 1-octene, respectively.
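</p><p>The score plots and the 89.5% figure above rest on a standard PCA computation: center the response matrix, find the leading eigenvectors of its covariance matrix, and report each component's eigenvalue as a fraction of the total variance. The sketch below does this with power iteration and deflation so that it needs only the Python standard library; in practice a library routine (for example, an SVD) would normally be used. The data are synthetic, and the function names are hypothetical.</p>
```python
import random


def center(X):
    """Subtract each column (parameter) mean from the data matrix."""
    means = [sum(col) / len(X) for col in zip(*X)]
    return [[v - m for v, m in zip(row, means)] for row in X]


def covariance(Xc):
    """Sample covariance matrix of centered data (rows = observations)."""
    n, p = len(Xc), len(Xc[0])
    return [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(p)] for i in range(p)]


def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def leading_components(C, k, iters=500, seed=0):
    """Top-k eigenpairs of a symmetric PSD matrix via power iteration with deflation."""
    rng = random.Random(seed)
    C = [row[:] for row in C]  # work on a copy
    vectors, values = [], []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in C]
        for _ in range(iters):
            w = matvec(C, v)
            norm = sum(x * x for x in w) ** 0.5
            v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, matvec(C, v)))  # Rayleigh quotient
        vectors.append(v)
        values.append(lam)
        # Deflation: subtract lam * v v^T so the next pass finds the next component.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(len(v))] for i in range(len(v))]
    return vectors, values


def project(Xc, vectors):
    """PC scores: each centered observation projected onto the principal axes."""
    return [[sum(a * b for a, b in zip(row, v)) for v in vectors] for row in Xc]


# Synthetic responses: 40 observations of 4 parameters driven mostly by one latent factor.
rng = random.Random(1)
X = []
for _ in range(40):
    t = rng.gauss(0.0, 3.0)
    X.append([t + rng.gauss(0, 0.1), 0.5 * t + rng.gauss(0, 0.1),
              -t + rng.gauss(0, 0.1), rng.gauss(0, 0.1)])

Xc = center(X)
C = covariance(Xc)
vectors, values = leading_components(C, k=3)
total_variance = sum(C[i][i] for i in range(len(C)))  # trace = total variance
explained = [lam / total_variance for lam in values]
pc_scores = project(Xc, vectors)
print("explained variance fractions:", [round(e, 3) for e in explained])
print("cumulative (PCs 1-3):", round(sum(explained), 3))
```
<p>The cumulative fraction printed last is the analogue of the 89.5% explained by PCs 1-3 in Figure 4, and the rows of pc_scores are the coordinates a score plot displays.</p>
<p>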
Despite the octane/1-octene overlap, most classes are well separated in the PCA, which demonstrates that even an unsupervised algorithm, given no prior knowledge of the class labels, can find distinct differences between most gas species and concentrations.</p><p>Random Forest Classification Models. Subsequently, six RF models were trained using different combinations of the data classes and parameters to examine different aspects of the data, as described below and summarized in Table <ref type="table">S2</ref>. Model 1 was trained on the full data set, including all gases, concentrations, and parameters. To examine how much the octane/1-octene confusion seen in the PCA reduced the RF classification accuracy in Model 1, Model 2 was trained on a data set that excluded the 1-octene classes. Model 3 included all gases and concentrations but excluded all parameters derived from C<hi rend="subscript">max</hi> and TR, to show that these two features are not necessary for classification. Model 4 was trained on the same data set and reduced parameter set as Model 3; however, the labels of the training set were shuffled before model training. This method, referred to as Y-scrambling, verifies that the model is not producing high prediction accuracy by chance. Shuffling the training labels breaks any relationship between the classes and the associated values of the input parameters. If the model trained on the shuffled labels achieves an accuracy similar to that before shuffling, then the input parameters are likely dominated by noise or otherwise unrelated to the data classes, and the original model's apparent success reflects random guessing. If, however, the accuracy of the shuffled-label model is much worse, this result suggests a real relationship between the class labels and the input parameters.
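</p><p>The Y-scrambling check can be sketched as follows. To keep the example dependency-free, a nearest-centroid classifier stands in for the Random Forest; the five-class data are synthetic, and the function names are hypothetical. The chance baseline is assumed here to be a uniform guess, 1/(number of classes), which gives 4% for 25 classes; the paper's own random-chance calculation is described in its Methods section. With real labels the stand-in classifier should score near 1, while the scrambled-label models should fall back toward the chance level.</p>
```python
import random
from statistics import fmean


def fit_centroids(X, y):
    """Nearest-centroid stand-in classifier (NOT the paper's Random Forest)."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [fmean(col) for col in zip(*rows)]
    return centroids


def accuracy(centroids, X, y):
    """Fraction of rows assigned to the class of their nearest centroid."""
    def nearest(x):
        return min(centroids, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))
    return sum(nearest(x) == lab for x, lab in zip(X, y)) / len(y)


rng = random.Random(7)
classes = ["c0", "c1", "c2", "c3", "c4"]

# Hypothetical, well-separated two-parameter data: 20 training and 10 test rows per class.
X_train, y_train, X_test, y_test = [], [], [], []
for k, label in enumerate(classes):
    for _ in range(20):
        X_train.append([rng.gauss(5.0 * k, 0.5), rng.gauss(-5.0 * k, 0.5)])
        y_train.append(label)
    for _ in range(10):
        X_test.append([rng.gauss(5.0 * k, 0.5), rng.gauss(-5.0 * k, 0.5)])
        y_test.append(label)

real = accuracy(fit_centroids(X_train, y_train), X_test, y_test)

scrambled = []
for _ in range(20):
    y_shuffled = y_train[:]
    rng.shuffle(y_shuffled)  # Y-scrambling: break the parameter/label relationship
    scrambled.append(accuracy(fit_centroids(X_train, y_shuffled), X_test, y_test))

chance = 1 / len(classes)  # ASSUMED uniform-guess baseline (0.2 for 5 classes)
print(f"real labels: {real:.2f}")
print(f"Y-scrambled mean: {fmean(scrambled):.2f} (chance: {chance:.2f})")
```
<p>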
Model 5 was trained on the reduced-parameter data set (without parameters derived from C<hi rend="subscript">max</hi> or TR), including all gases but only their two lowest concentrations (1% and 10%). This was to demonstrate that even for the most difficult task, classifying the classes with the smallest signals, the RF algorithm still achieves high accuracy. Finally, Model 6 was trained on all gases and concentrations with the reduced parameter set (without C<hi rend="subscript">max</hi>- or TR-derived features), further restricted to parameters from bare devices only. This demonstrates the effect of the functionalizations on classification accuracy. A summary of the data classes and parameters used in each of these models is provided in Table <ref type="table">S2</ref>.</p><p>The prediction accuracies of the six models are shown in Figure <ref type="figure">5a</ref> and Table <ref type="table">S2</ref>. To improve the stability of the results, 200 RF models were trained for each model type, using different training and testing data sets chosen at random from the corresponding data set. The bar chart gives the best accuracy of each model type, and the mean and standard deviation of all 200 models are shown in blue. To provide a baseline for success, the accuracy of a random chance model is shown in red for each model (see the Methods section for a description of its calculation). A confusion matrix from a successful Model 3 run (accuracy = 89%) is shown in Figure <ref type="figure">5b</ref> to illustrate the octane/1-octene confusion that persists even in the supervised RF models.</p><p>All of the RF models (excluding the Y-scrambled Model 4) performed much better than their respective random chance accuracies, giving confidence in the robustness of these models. Model 1 (full data set, full parameter set) produced an accuracy of 89%. When 1-octene was removed (Model 2), the accuracy was 97.6%.
The cause of this improvement is evident in the confusion matrix in Figure <ref type="figure">5b</ref>, where the octane/1-octene classes are highlighted. Even in this best model, octane and 1-octene are frequently confused; however, only three misclassifications involving other gases are observed. Two occur between concentrations of toluene (toluene 50% misclassified as toluene 100%), and one is between MEK 10% and toluene 50%.</p><p>By examining only the highlighted portion of Figure <ref type="figure">5b</ref>, we calculate an accuracy of 75% in discriminating among the 4 concentrations of octane and 1-octene. While this is not ideal, to our knowledge few other studies using graphene gas sensors in this way have tested such chemically similar compounds with this level of success. Nallon et al. tested a group of chemically similar analytes consisting of 9 monosubstituted benzene compounds with a single graphene sensor and achieved a classification accuracy of 88% using the RF algorithm, as in our study. Considering that each analyte in Nallon et al. was tested at only one concentration, whereas we tested each of our analytes at 4 concentrations, we believe our accuracy of 75% is reasonable.</p><p>Kybert et al. tested four carboxylic acids differing only in the lengths of their carbon chains and two structural isomers of pinene differing only in the location of a double bond. Through visual examination of response versus concentration curves, they were able to differentiate between each of these compounds. Their analyte exposure periods were each 200 s long. In our case, we opted for faster sensing (&#8764;40 s per exposure period), which reduced our sensitivity and likely made the discrimination of octane and 1-octene more difficult.
Despite this, we have still demonstrated moderately successful discrimination between two highly similar chemical compounds.</p><p>Finally, we note that octane and 1-octene are chemically much more similar than the compounds tested in either of the two studies mentioned above. Octane and 1-octene differ by only two hydrogen atoms, a similarity also reflected in their very similar boiling points (121 &#176;C for 1-octene and 126 &#176;C for octane). In contrast, the boiling points of the benzene compounds tested by Nallon et al. range between 111 and 210 &#176;C, and those of the 4 carboxylic acids tested by Kybert et al. range between 141 and 237 &#176;C. Further improvements to our system are needed; however, we believe this level of discrimination between octane and 1-octene shows excellent progress toward a highly selective sensing array.</p><p>When the parameter set was reduced by removing the parameters flagged by the broad selection in Figure <ref type="figure">3</ref>, the best accuracy of Models 1 and 3 was the same (89%), while the average accuracy was slightly higher for the reduced model (80% for Model 1, 82% for Model 3). The standard deviation also decreased for the reduced model (3.3 percentage points for Model 1, 2.9 for Model 3). These results further support the removal of the C<hi rend="subscript">max</hi>- and TR-derived parameters, since if those parameters contributed useful information, we would expect the classification accuracy to decrease upon their removal. Next, Model 4 shows that Y-scrambling the training labels produces, on average, the same accuracy as expected from random chance. This indicates that the other models are not achieving high accuracy simply by chance.</p><p>Model 5 achieved an accuracy of 94%, somewhat higher than Model 3, which was trained on the same data and parameter set except that Model 5 included only the lowest 2 concentrations of each gas (1% and 10%).
Because the lowest concentrations produce the smallest response signals, this classification success indicates that the RF algorithm is effective for classifying gases even with indistinct signals. These signals are not readily distinguishable upon visual inspection, as seen in Figures <ref type="figure">S21-S22</ref> and Figure <ref type="figure">S27</ref>, and yet the machine learning algorithm classified them with high accuracy.</p><p>Finally, Model 6 achieved an accuracy of 63%. This is much better than this model's random chance accuracy of 4%; however, it is much lower than that of Model 3 (89%). The difference between these two models is that Model 3 uses data from the functionalized sensors, while Model 6 uses only data from the bare sensors. The reduced accuracy when using only bare devices demonstrates that the functionalizations improve classification ability by increasing the selectivity to different analytes.</p><p>We note that in all of our machine learning algorithms, we treated each concentration of each gas as an individual class, for a total of 25 classes, rather than grouping all concentrations of one gas into one class, for a total of 7 classes. In this way, we differentiate between larger and smaller amounts of an analyte, rather than only its presence or absence. This would be helpful in a possible application such as disease diagnosis in breath; while certain analytes may be markers for some diseases, they are often still present in the breath of healthy patients, simply in larger or smaller amounts. For example, ethanol may be present in the breath of lung cancer patients at concentrations between 64 and 2160 ppb, but it may also appear at concentrations between 27 and 216 ppb in the breath of healthy subjects.
<ref type="bibr">6</ref> Similarly, octane and hexanal may be present at 0.57-2.87 ppb (octane) and 0.68-1.47 ppb (hexanal) in lung cancer patients, but also appear in healthy subjects at 0.1-1.29 ppb and 0.18-0.35 ppb, respectively. Other applications, such as environmental monitoring of toxic chemicals (e.g., detecting NO<hi rend="subscript">2</hi>), may have less need to resolve a wide range of concentrations.</p><p>Both the raw data and the classification results described here demonstrate that we have achieved a high yield of graphene sensors and that the wide range of applied functionalizations has produced consistent alterations to the inherent electrical characteristics of the sensors. Including the 12 unfunctionalized devices per sensor array card (which were not included in the PCA and RF analyses), a total of 480 varactors were tested across the 4 sensor array cards. Throughout the sequence of 25 measurement trials reported here, each using all 480 varactors, only 3.6% of the calculated response data points are missing due to malfunctioning or damaged devices. This value encompasses device loss throughout 49 total measurement trials with these 4 sensor array cards over the course of 7 months (including 24 additional measurement trials not reported here), and this level of loss is within the triplicate redundancy of each set of varactors on each array. This &gt;96% yield across many device uses is an improvement over other graphene array studies, which have shown &gt;90% yields across their sets of fabricated devices. <ref type="bibr">41,</ref><ref type="bibr">42</ref> Rather than performing an exhaustive feature selection followed by dimensionality reduction, as is common, we started with a short list of features selected intuitively based on previously described physical characteristics.
<ref type="bibr">55,</ref><ref type="bibr">58</ref> Even so, the large number of functionalizations combined with the 9 extracted features creates a set of over 300 parameters, so some narrowing down was necessary. We chose to reduce the number of parameters by examining feature importance in a supervised model. The broad parameter selection procedure performed here (Figure <ref type="figure">3</ref>) indicates that parameters derived from the C<hi rend="subscript">max</hi> and TR features are not informative when discriminating between the gas species and concentrations tested. The lack of importance of these features could be related to their correlation with minor fabrication differences between devices and cards, such as lithographic variations or HfO<hi rend="subscript">2</hi> thickness. For instance, thinner HfO<hi rend="subscript">2</hi> would tend to increase both C<hi rend="subscript">max</hi> and TR, and changes in these features would also tend to be amplified upon gas exposure.</p><p>Limitations and Optimization. While our sensing system provided high accuracy, a better understanding of the sensor arrays is still needed. For instance, a detailed comparison of the importance of each functionalization was not made here. Because of the complexity of the system, this study focused on demonstrating the feasibility of graphene arrays for rapid gas sensing, demonstrating the usefulness of machine learning for extracting information from low-magnitude signals that cannot be distinguished by eye, and exploring the importance of each feature. Further experiments might include measurements of single gases with single sensors to carefully characterize the relationship between each receptor and each gas. This would enable fine-tuning of the functionalization selection for arrays intended to detect different target analytes.</p><p>Future application of the graphene varactor arrays to real-world detection will, of course, require experiments with more complex and realistic gaseous environments.
Here, we have shown that the arrays can discriminate between single gases; however, they must also be able to detect target analytes against a variety of backgrounds. Human breath, for example, contains many interfering compounds not relevant to the diagnosis of diseases (e.g., water vapor, oxygen). As such, binary mixtures of a target analyte plus water vapor are a very appropriate test for gas sensor arrays, especially those intended for lung cancer or other breath-based diagnoses. <ref type="bibr">47,</ref><ref type="bibr">48</ref> Further relevant steps might include testing binary mixtures with varying background gas compositions or mixtures that imitate specific environments, such as simplified "human breath" mixtures. Ultimately, it will be crucial to understand the underlying sensing mechanisms involving O<hi rend="subscript">2</hi>, H<hi rend="subscript">2</hi>O, and VOCs. For this study, however, we felt that sensing in the presence of O<hi rend="subscript">2</hi> and H<hi rend="subscript">2</hi>O would add an unnecessary level of complexity and could also introduce unintentional biases into the classification analysis. We chose to focus on the pure VOC analytes before adding interfering environments; future work should include water and oxygen. Additionally, experiments at lower VOC concentrations may be useful, as disease-indicating analytes such as those tested here may be present in breath at levels ranging from 10<hi rend="superscript">0</hi> to 10<hi rend="superscript">3</hi> ppb, <ref type="bibr">59</ref> whereas the lowest levels tested here are around 10<hi rend="superscript">2</hi> ppm. However, one advantage of an array-based approach is that each sensor responds to the entire chemical content of the sample, so highly sensitive detection of any individual analyte may not be necessary.</p><p>Future work is needed to more fully explain the relationships between analyte/receptor binding and the resulting sensor responses. Although we have sought to describe the expected response behaviors in Table <ref type="table">1</ref>, these analyte binding/response interactions are quite complex.
Figure <ref type="figure">S28</ref> also provides some information on how the V<hi rend="subscript">DR</hi> response of each functionalization differs among the six gases. There are some variations in how each functionalized sensor responds to each gas; however, the signal magnitudes in this data set are quite low, so the differences are not particularly distinct. While the machine learning methods described in this work are effective at extracting information from noisy data, a careful examination of these analyte binding/response interactions will still require data collected at slower speeds. Some preliminary work on this is shown in Figure <ref type="figure">S29</ref>, where the slower measurement speed reveals the differences in sensitivity between functionalizations more clearly. Future work will explore these relationships with additional gases. In the present work, however, we have opted to explore the sensor arrays' capabilities for rapid data collection.</p><p>Additional future studies will also investigate the time response of the sensor arrays. Our current sensor system was specifically designed for fast readout; however, measuring the sensor response over a longer time could allow weak responses to emerge above the noise level, though at the expense of rapid identification. Figures <ref type="figure">S29</ref> and <ref type="figure">S30</ref> show preliminary work on this, demonstrating that when the varactors are measured more slowly, their sensitivity improves, reaching an LOD of 19 ppm for ethanol with a 170 s exposure. In previous studies, we have observed that sensor responses can evolve over several tens of minutes and that these longer measurement times could be used to improve sensitivity to lower-concentration gases or to further improve detection accuracy at a fixed concentration.
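</p><p>The exposure durations compared in Figures S29 and S30 follow directly from the number of exposure sweeps multiplied by the time per sweep. The short sketch below reproduces that arithmetic from the approximate per-sweep times quoted in this section (about 5 s for the multiplexed array and about 17 s for a single varactor); the constant and function names are illustrative only.</p>
```python
ARRAY_SWEEP_S = 5    # approx. seconds per sweep, multiplexed 108-sensor array
SINGLE_SWEEP_S = 17  # approx. seconds per sweep, single (non-multiplexed) varactor


def exposure_time(n_sweeps, seconds_per_sweep):
    """Total gas-exposure time when every sweep takes the same time."""
    return n_sweeps * seconds_per_sweep


schedules = {
    "array, 8 sweeps": exposure_time(8, ARRAY_SWEEP_S),
    "single sensor, 5 sweeps": exposure_time(5, SINGLE_SWEEP_S),
    "single sensor, 10 sweeps": exposure_time(10, SINGLE_SWEEP_S),
    "single sensor, 20 sweeps": exposure_time(20, SINGLE_SWEEP_S),
}
reference = schedules["array, 8 sweeps"]
for name, seconds in schedules.items():
    print(f"{name}: {seconds} s ({seconds / reference:.2f} x the array exposure)")
```
<p>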
At 5 s per sweep, our 8 exposure sweeps took a total of 40 s, whereas others have used exposure times of 30 s, <ref type="bibr">38</ref> 60 s, <ref type="bibr">43</ref> 200 s, <ref type="bibr">41</ref> 10 min, <ref type="bibr">48</ref> and 10-20 min. <ref type="bibr">47</ref> In all of these examples, the total exposure time was approximately equal to or much greater than ours. Furthermore, our sensor system acquires a full C-V sweep from each of the 108 sensors within this 5 s window. This is a massively parallel measurement configuration, and systems demonstrating rapid, near-simultaneous measurement of this many graphene devices are uncommon in the literature. Further improvements in response time are also possible, as this platform offers numerous ways to trade parallelism for speed, and vice versa. In addition to changing the total sweep time, the C-V voltage step size could be decreased from 50 mV to improve the accuracy of V<hi rend="subscript">D</hi> extraction, or the capacitance integration time could be increased to reduce measurement noise. Examples of the trade-off between measurement time and sensitivity are shown in Figures <ref type="figure">S18, S29,</ref> and <ref type="figure">S30</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>CONCLUSION</head><p>This work has shown the successful classification of VOC species and concentrations with a 108-device graphene-based sensor array swept at rapid speeds. The array was functionalized with 36 distinct chemical receptors to improve selectivity, and all devices were probed virtually simultaneously to collect a cross-reactive data set for input into machine learning algorithms. These arrays are among the largest graphene sensor arrays in terms of both device count and functionalization diversity. Of the 36 receptors used, 26 have not previously been applied to graphene sensors. We have also reported the parameters necessary to apply each functionalization to graphene with high surface coverage. Despite the reduction in signal magnitude caused by the rapid measurement, these large-scale arrays produced signal patterns distinct enough for machine learning algorithms to discriminate between gas analytes, and we have demonstrated consistent behavior across 400 individual devices. Nine data features were studied for their influence on the machine learning algorithms, and a parameter selection process was performed to determine which of them were not important for discriminating gas species. Two features were deemed unimportant, likely because fabrication variations strongly influence their values. After omitting the parameters derived from these features, gas discrimination remained highly successful for nearly all classes (prediction accuracy = 89%), although the classification algorithm had more difficulty distinguishing two highly similar gases (octane and 1-octene). Omitting 1-octene eliminated this confusion, and the resulting model achieved 98% prediction accuracy. Nevertheless, octane and 1-octene could still be discriminated with 75% accuracy across 4 concentrations of each gas, despite their extreme chemical similarity. 
By comparing model results with those of a Y-scrambled classification model, we found strong evidence that the models are robust and are in fact identifying true patterns in the data. We emphasize the importance of selecting an analysis method that can exploit signals that cannot readily be distinguished by visual inspection. These sensors can be further optimized to balance speed against sensitivity. These results are an important step toward developing large arrays of graphene-based chemical gas sensors for health care and environmental monitoring applications.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>METHODS</head><p>Sensor Fabrication and Functionalization. Sensors were fabricated on 150 mm-diameter silicon wafers using a hybrid process that started in a commercial fabrication facility. The process began with growth of a 1 &#956;m-thick SiO₂ layer by wet thermal oxidation. The local gate electrode was then formed by patterning and etching the SiO₂ to create 300 nm-deep trenches, depositing tungsten by CVD, and planarizing it by chemical-mechanical polishing. Next, a 9.9 nm composite dielectric stack of Al₂O₃ and HfO₂ was deposited by ALD. The composite stack was used to suppress crystallization of the HfO₂, thus minimizing gate leakage current. After dielectric deposition, unless otherwise noted, the remaining process steps were performed at the Minnesota Nano Center. Via openings were patterned and etched through the dielectric to the tungsten local gate using a BCl₃ reactive-ion etch. Next, single-layer CVD graphene was transferred onto the substrate by an outside vendor using an aqueous transfer process. The graphene regions were then patterned and etched using an O₂ plasma. Cr/Au (10/80 nm) contact electrodes were then patterned and lifted off. Finally, additional Cr/Al (10/1000 nm) layers were deposited onto the contact pads for subsequent wire bonding. Completed wafers were diced by an external vendor using a laser-based process that did not require deposition of an additional surface protection layer.</p><p>Functionalization of each sensor chip was performed as follows: for each receptor, at least 3 mL of functionalization solution was prepared in a 20 mL glass vial. Ten chips were immersed in the solution at a time, and the vial was covered with aluminum foil and left at room temperature overnight for self-assembly. 
After immersion, the functionalized chips were dipped twice into 3 mL of receptor-free self-assembly solvent to rinse off residual functionalization solution and then transferred into a 50 mL solvent bath in a crystallizing dish. The solvent bath was placed on an orbital shaker set to 90 rpm and shaken for 1 min to wash off any excess self-assembly solution. The chips were then removed from the bath and dried under a nitrogen flow. Contact angle measurements and XPS confirmed for several functionalizations that this process did not compromise monolayer quality. Functionalized sensor chips were attached to PCBAs using EPO-TEK H70E epoxy and ball bonded with 0.001" Au wire for testing.</p><p>Sensor Testing. Sensor arrays were vacuum baked overnight at 100 &#176;C and &#8764;10⁻⁶ Torr to remove adsorbed gases. Once cooled, the sensors were purged with N₂ for at least 10 min prior to testing. To test an array, the sensor card was loaded into the test stand and connected to a 1 L/min N₂ gas flow, and pretest C-V curves were measured. After the pretest measurements were complete, part of the N₂ flow was directed through a bubbler containing the neat liquid of the desired analyte. This generated a saturated vapor, which was then diluted in N₂ to the desired concentration. Once the exposure measurements were complete, the gas flow was switched back to pure N₂ for the posttest measurements. The total flow rate was held constant at 1 L/min throughout the experiment. Measurement data were automatically transferred to a PC for offline analysis.</p><p>Data Preparation. To extract the Dirac point (V<hi rend="subscript">D</hi>) from each C-V curve, the numerical derivative of the curve was first calculated and smoothed with a 5-point moving average. V<hi rend="subscript">D</hi> is the voltage at which this derivative equals zero. 
A straight line was fitted through the two data points on either side of the zero crossing of the derivative, and V<hi rend="subscript">D</hi> was calculated as the root of this line. The minimum capacitance (C<hi rend="subscript">min</hi>) is the capacitance at V<hi rend="subscript">D</hi>; it was calculated by fitting a second-degree polynomial to a 5-point window on either side of the zero crossing of the derivative and taking the minimum value of this fit as C<hi rend="subscript">min</hi>.</p><p>The maximum capacitance (C<hi rend="subscript">max</hi>) was not evaluated at a fixed voltage, since both lateral and vertical shifts of the C-V curves would influence its value. Instead, C<hi rend="subscript">max</hi> was defined as the capacitance 1.3 V to the left of V<hi rend="subscript">D</hi> for a given curve. The tuning range was defined as the ratio of the maximum and minimum capacitances, <formula notation="TeX">\mathrm{TR} = C_{\max}/C_{\min}</formula>. Finally, the lateral hysteresis of the C-V curve was calculated as the difference between the reverse and forward Dirac points, <formula notation="TeX">\Delta V_{\mathrm{D}} = V_{\mathrm{DR}} - V_{\mathrm{DF}}</formula>.</p><p>For a positive response, the response value was calculated as the difference between the maximum value during exposure and the last pretest data point, max(exposure) &#8722; last(pretest). Similarly, negative responses were calculated as the difference between the minimum value during exposure and the last pretest data point, min(exposure) &#8722; last(pretest). Signals were inspected by eye to determine the appropriate response type for each of the nine features and each of the seven gases (six VOCs plus the nitrogen control). Of the resulting 63 gas-feature combinations, all but 8 were determined to have positive responses. The eight negative responses were forward and reverse C<hi rend="subscript">min</hi> (ethanol, MEK, and toluene), V<hi rend="subscript">DF</hi> (hexanal), and &#916;V<hi rend="subscript">D</hi> (nitrogen).</p><p>For input into the machine learning algorithms, the data were arranged into an N &#215; p matrix, where N is the number of observations and p is the number of predictors, or parameters. 
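The curve-level definitions above (V_D from the smoothed derivative, C_min from a quadratic fit, C_max 1.3 V to the left of V_D, TR, and the max/min response) can be illustrated with the minimal Python sketch below. It is not the authors' MATLAB implementation: the function names are hypothetical, the moving average is assumed to shrink its window at the sweep ends, and "the two data points on either side of the zero crossing" and "a 5-point window on either side" are read as the pair of points bracketing the sign change and five points per side, respectively. The voltage list is assumed to be increasing, with the derivative changing sign from negative to positive at V_D.

```python
# Hypothetical sketch of the C-V feature extraction (not the authors' code).
# Assumptions: v is an increasing list of gate voltages, c the capacitances.


def derivative(v, c):
    """Numerical dC/dV: central differences inside, one-sided at the ends."""
    n = len(v)
    d = [0.0] * n
    d[0] = (c[1] - c[0]) / (v[1] - v[0])
    d[-1] = (c[-1] - c[-2]) / (v[-1] - v[-2])
    for i in range(1, n - 1):
        d[i] = (c[i + 1] - c[i - 1]) / (v[i + 1] - v[i - 1])
    return d


def moving_average(values, window=5):
    """Centered moving average; the window is truncated at the ends (assumed)."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def quad_fit(xs, ys):
    """Least-squares y = a*x**2 + b*x + c0 via the normal equations (Cramer)."""
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    m = [[s[4], s[3], s[2]], [s[3], s[2], s[1]], [s[2], s[1], s[0]]]
    rhs = [t[2], t[1], t[0]]
    coeffs = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]
        coeffs.append(det3(mc) / det3(m))
    return coeffs  # [a, b, c0]


def interp(v, c, x):
    """Linear interpolation of c(v) at voltage x."""
    for j in range(len(v) - 1):
        if v[j + 1] >= x >= v[j]:
            t = (x - v[j]) / (v[j + 1] - v[j])
            return c[j] + t * (c[j + 1] - c[j])
    raise ValueError("x lies outside the swept voltage range")


def extract_features(v, c, half_window=5, cmax_offset=1.3):
    d = moving_average(derivative(v, c), 5)
    # First negative-to-positive zero crossing of the smoothed derivative.
    for i in range(len(d) - 1):
        if 0 >= d[i] and d[i + 1] > 0:
            break
    else:
        raise ValueError("no zero crossing in the derivative")
    # V_D: root of the line through the two points bracketing the crossing.
    v_d = v[i] - d[i] * (v[i + 1] - v[i]) / (d[i + 1] - d[i])
    # C_min: vertex of a quadratic fitted to half_window points per side,
    # with voltages centred on V_D to keep the normal equations well scaled.
    lo, hi = max(0, i - half_window + 1), min(len(v), i + 1 + half_window)
    a, b, c0 = quad_fit([x - v_d for x in v[lo:hi]], c[lo:hi])
    c_min = c0 - b * b / (4 * a)
    # C_max: capacitance cmax_offset volts to the left of V_D.
    c_max = interp(v, c, v_d - cmax_offset)
    return {"V_D": v_d, "C_min": c_min, "C_max": c_max, "TR": c_max / c_min}


def response(pretest, exposure, positive=True):
    """max(exposure) - last(pretest), or min(exposure) - last(pretest)."""
    peak = max(exposure) if positive else min(exposure)
    return peak - pretest[-1]
```

Running extract_features on the forward and reverse sweeps of the same curve would give V_DF and V_DR, from which the hysteresis feature is simply V_DR minus V_DF.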
The number of observations was N = 300: there were 6 gases at 4 concentrations each, plus the nitrogen control, giving 25 conditions. Each condition was measured using four sensor array cards, and the three replicate varactors per functionalization on each array were treated as distinct observations. The total number of possible parameters was p = 324: a parameter was defined as a specific feature-functionalization pair, and there were 9 features and 36 functionalizations. However, some varactors across the four arrays did not produce viable data, for example because the varactors were physically damaged or because the wire bonds connecting them to the probes were damaged or missing. The missing data points due to such damage amounted to 3.6% of the entire calculated response data set. To retain as many parameters as possible while limiting the number of observations with missing data, parameters with 24 or more missing data points were removed from the data set, leaving p = 251 parameters.</p><p>Normalization was applied to the data set before any analysis was done. A Z-score normalization was applied separately to each group of parameters derived from the same feature: all V<hi rend="subscript">DF</hi>-derived parameters were normalized as one group, all V<hi rend="subscript">DR</hi>-derived parameters as another, and so on. The Z-score was calculated as <formula notation="TeX">z = \frac{x - \bar{x}}{s}</formula>, where x is the raw response value, <formula notation="TeX">\bar{x}</formula> is the sample mean of the feature group (e.g., the mean of all V<hi rend="subscript">DF</hi>-derived parameters), and s is the sample standard deviation of the feature group.</p><p>Machine Learning Models. PCA was performed using the pca function in the Statistics and Machine Learning Toolbox of MATLAB (version R2021a). To rank the parameters by influence, a weighted sum of each parameter's coefficients from the first two principal components (PCs) was calculated. 
Each PC was first weighted by its percentage of variance explained, and then the absolute values of the corresponding coefficients from the two weighted PCs were summed. These sums were sorted from largest to smallest, and the top 33% and bottom 33% of coefficients (84 coefficients each) were selected. Finally, the number of top and bottom coefficients corresponding to each feature was tallied.</p><p>RF classification was performed using MATLAB's TreeBagger function in the Statistics and Machine Learning Toolbox. This function creates a bootstrap aggregated ("bagged") ensemble of decision trees. Each ensemble was grown using 200 trees, and node splits were selected using an interaction test (interaction-curvature in MATLAB). Surrogate splits were not used. To further improve the stability of the results, 200 of these bagged tree ensembles were fitted, each with a different pair of training and test sets. Each training set was randomly sampled without replacement, and the remaining data points were held out as the test set, with a training-to-test ratio of 66.7%:33.3%. For the complete data set of 25 classes, this ratio resulted in 200:100 data points; for the reduced data set of 21 classes (excluding 1-octene), it resulted in 168:84 data points. In both cases, each class was represented by 8 data points in the training set and 4 data points in the test set. After each decision tree ensemble was trained, the labels of the test set were predicted with MATLAB's predict function. 
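The class-balanced resampling described above (8 training and 4 test observations per class, with a different random split for each of the 200 ensembles) can be sketched as follows. This is a hypothetical Python illustration using only the standard library; the bagged-tree classifier itself (MATLAB's TreeBagger) is not reproduced, and drawing the split separately within each class is our reading of the statement that every class contributes exactly 8 training and 4 test points.

```python
import random
from collections import defaultdict


def balanced_split(labels, n_train_per_class, rng):
    """Randomly pick n_train_per_class indices per class for training.

    The remaining indices of each class form the test set, so a class with
    12 observations and n_train_per_class=8 contributes 8 train / 4 test.
    Labels must be sortable (e.g. ints, strings, or tuples of these).
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, test = [], []
    for label in sorted(by_class):
        idxs = list(by_class[label])
        rng.shuffle(idxs)  # sampling without replacement within the class
        train.extend(idxs[:n_train_per_class])
        test.extend(idxs[n_train_per_class:])
    return train, test


def accuracy(predicted, true):
    """Correctly predicted labels divided by the number of test points."""
    return sum(p == t for p, t in zip(predicted, true)) / len(true)


if __name__ == "__main__":
    # 25 classes x 12 observations (4 arrays x 3 replicate varactors).
    labels = [k for k in range(25) for _ in range(12)]
    for ensemble in range(200):
        train, test = balanced_split(labels, 8, random.Random(ensemble))
        # A bagged-tree model would be fitted on `train` and scored on `test`.
    print(len(train), len(test))  # 200 100
```

Seeding one random.Random per ensemble (here with the ensemble index) keeps each of the 200 splits reproducible while still making them differ.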
The predicted labels were compared with the true test set labels, and the prediction accuracy was calculated as the number of correctly predicted labels divided by the total number of test data points.</p><p>To obtain the Y-scrambled prediction accuracy, the same process was followed, but the training labels were randomly shuffled for each tree ensemble before model training. Random chance accuracies were calculated for each reported model by assuming that a classifier predicting labels at random would be correct at a rate of 1/C, where C is the number of classes in the data set. For example, a random classifier is expected to correctly predict the labels of a balanced 2-class data set approximately 50% of the time. In this study, all reported models used balanced data sets.</p><p>To rank the RF parameters by influence, out-of-bag permutation importance values were calculated for each tree ensemble using the ComputeOOBPredictorImportance feature of MATLAB's TreeBagger function. This method measures how much the model's prediction accuracy changes when the out-of-bag values of a single parameter are permuted. These importance values are calculated for each tree in the ensemble and then averaged across the entire ensemble for each parameter. After this was calculated for a given ensemble, the parameters were ranked by importance, and the top and bottom 33% of parameters were recorded. After all 200 ensembles were trained, the number of times each parameter appeared in the top or bottom 33% was tallied, considering only ensembles with high prediction accuracy (accuracy &#8805;86%).</p></div>
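The chance-level baseline, the Y-scrambling control, and the top/bottom-33% tallies used for both the PCA-weighted coefficients and the RF permutation importances can be sketched in Python as below. Everything here is illustrative: parameters are assumed to be keyed by hypothetical (feature, receptor) tuples, the importance values are assumed to have already been computed (for example by TreeBagger's out-of-bag permutation), and rounding one third of the 251 parameters to 84 is our reading of "33%".

```python
import random
from collections import Counter


def chance_accuracy(n_classes):
    """Expected accuracy of random guessing on a balanced data set: 1/C."""
    return 1.0 / n_classes


def y_scramble(train_labels, rng):
    """Return a shuffled copy of the training labels (Y-scrambled control)."""
    shuffled = list(train_labels)
    rng.shuffle(shuffled)
    return shuffled


def pca_scores(pc1, pc2, explained1, explained2):
    """Sum of |explained-weighted coefficient| over the first two PCs."""
    return {p: abs(explained1 * pc1[p]) + abs(explained2 * pc2[p]) for p in pc1}


def top_bottom(scores, frac=1 / 3):
    """Parameters in the top and bottom `frac` of a {parameter: score} ranking."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    k = max(1, round(frac * len(ranked)))  # e.g. round(251 / 3) == 84
    return ranked[:k], ranked[-k:]


def per_feature(params):
    """Tally how many (feature, receptor) parameters belong to each feature."""
    return Counter(feature for feature, _receptor in params)


def tally_importance(ensembles, min_accuracy=0.86, frac=1 / 3):
    """Count top/bottom appearances, keeping only accurate ensembles.

    ensembles: iterable of (accuracy, {parameter: importance}) pairs.
    """
    top_count, bottom_count = Counter(), Counter()
    for acc, scores in ensembles:
        if acc >= min_accuracy:
            top, bottom = top_bottom(scores, frac)
            top_count.update(top)
            bottom_count.update(bottom)
    return top_count, bottom_count


if __name__ == "__main__":
    rng = random.Random(0)
    print(chance_accuracy(25), chance_accuracy(21))
    print(y_scramble(["octane", "octane", "1-octene", "1-octene"], rng))
```

For the PCA ranking, per_feature applied to each list returned by top_bottom(pca_scores(...)) gives the per-feature counts of top and bottom coefficients.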
<div xmlns="http://www.tei-c.org/ns/1.0"><head>ASSOCIATED CONTENT</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>* s&#305; Supporting Information</head><p>The Supporting Information is available free of charge at <ref type="url">https://pubs.acs.org/doi/10.1021/acsnano.2c10240</ref>.</p><p>Raman mapping and spectra of graphene used in this study; circuit block diagram of the array measurement system; images of dies and chips; Langmuir absorption curves; additional C-V curves under different conditions; Additional statistical analysis; gas species and associated concentrations; LOD analysis; summary of RF models (PDF)</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_0"><p>https://doi.org/10.1021/acsnano.2c10240ACS Nano 2022,<ref type="bibr">16,</ref>[19567][19568][19569][19570][19571][19572][19573][19574][19575][19576][19577][19578][19579][19580][19581][19582][19583] </p></note>
		</body>
		</text>
</TEI>
