<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>State of health and remaining useful life prediction of lithium-ion batteries with conditional graph convolutional network</title></titleStmt>
			<publicationStmt>
				<publisher>Elsevier</publisher>
				<date>03/01/2024</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10512338</idno>
					<idno type="doi">10.1016/j.eswa.2023.122041</idno>
					<title level='j'>Expert Systems with Applications</title>
					<idno type="issn">0957-4174</idno>
					<biblScope unit="volume">238</biblScope>
					<biblScope unit="issue">PD</biblScope>

					<author>Yupeng Wei</author><author>Dazhong Wu</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Graph convolutional networks (GCNs) have been increasingly used to predict the state of health (SOH) and remaining useful life (RUL) of batteries. However, conventional GCNs have limitations. Firstly, the correlation between features and the SOH or RUL is not considered. Secondly, temporal relationships among features are not considered when projecting aggregated temporal features into another dimensional space. To address these issues, two types of undirected graphs are introduced to simultaneously consider the correlation among features and the correlation between features and the SOH or RUL. A conditional GCN is built to analyze these graphs. A dual spectral graph convolutional operation is introduced to analyze the topological structures of these graphs. Additionally, a dilated convolutional operation is integrated with the conditional GCN to consider the temporal correlation among the aggregated features. Two battery datasets are used to evaluate the effectiveness of the proposed method. Experimental results show that the proposed method outperforms other machine learning methods reported in the literature.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.1.">Background</head><p>Lithium-ion batteries have been increasingly adopted as an energy resource for electric vehicles (EVs), drones, and portable electronics due to their high energy density, light weight, long lifetime, and low self-discharge rate <ref type="bibr">(Cui, Gao, Mao, &amp; Wang, 2022;</ref><ref type="bibr">Liu, Wu, Zhang, &amp; Chen, 2014;</ref><ref type="bibr">Xi, Wang, Fu, &amp; Mi, 2022)</ref>. Over the past decades, many efforts have been made in the design of lithium-ion batteries to improve energy efficiency <ref type="bibr">(Guo, Song, &amp; Chen, 2009;</ref><ref type="bibr">Wang &amp; Cao, 2008)</ref>. However, similar to other engineered systems, the performance of lithium-ion batteries deteriorates over time, a process known as battery aging, caused by the physical and chemical changes that result from daily usage and operations <ref type="bibr">(He, Williard, Chen, &amp; Pecht, 2014;</ref><ref type="bibr">Lee, Kim, &amp; Lee, 2022)</ref>. Battery aging may result in catastrophic failures such as fire hazards and outbursts. As a result, it is critical to estimate the state of health (SOH) and predict the remaining useful life (RUL) of batteries <ref type="bibr">(Chao &amp; Chen, 2011)</ref>.</p><p>Over the past few years, data-driven methods have shown superior performance for SOH and RUL predictions of lithium-ion batteries <ref type="bibr">(Khaleghi et al., 2022;</ref><ref type="bibr">Shen, Sadoughi, Li, Wang, &amp; Hu, 2020;</ref><ref type="bibr">Wang, Zhao, Yang, &amp; Tsui, 2017)</ref>. 
Data-driven SOH and RUL prediction methods can be classified into two categories: filter-based approaches and machine learning-based approaches. One of the advantages of filter-based methods is their self-correction capability; however, these methods have limitations in dealing with large volumes of data <ref type="bibr">(Lee, Kwon, &amp; Lee, 2023;</ref><ref type="bibr">Park, Lee, Kim, Park, &amp; Kim, 2020;</ref><ref type="bibr">Wei, Dong, &amp; Chen, 2017</ref>). To address this issue, machine learning methods <ref type="bibr">(Greenbank &amp; Howey, 2023;</ref><ref type="bibr">Luo, Fang, Deng, &amp; Tian, 2022)</ref>, especially deep learning methods <ref type="bibr">(Chang, Wang, Jiang, &amp; Wu, 2021;</ref><ref type="bibr">Xu, Yang, Fei, Huang, &amp; Tsui, 2021)</ref>, such as the convolutional neural network (CNN) <ref type="bibr">(Al-Dulaimi, Zabihi, Asif, &amp; Mohammadi, 2019;</ref><ref type="bibr">Ma et al., 2023)</ref>, recurrent neural network (RNN) <ref type="bibr">(Lu, Xiong, Tian, Wang, Hsu, Tsou, Sun, &amp; Li, 2022)</ref>, gated recurrent unit (GRU) <ref type="bibr">(Ungurean, Micea, &amp; Carstoiu, 2020)</ref>, long short-term memory (LSTM) <ref type="bibr">(Wei, 2023;</ref><ref type="bibr">Xia, Song, Zheng, Pan, &amp; Xi, 2020;</ref><ref type="bibr">Zhang, Jiang, et al., 2022)</ref>, bidirectional LSTM <ref type="bibr">(Guo, Wang, Yao, Fu, &amp; Ning, 2023)</ref>, and bidirectional GRU <ref type="bibr">(Zhang et al., 2023)</ref>, have been utilized to predict SOH and RUL, where the parameters of these methods can be trained and tuned with backpropagation <ref type="bibr">(Ma, Yao, Liu, &amp; Tang, 2022)</ref> or metaheuristics <ref type="bibr">(Raskar &amp; Nema, 2022;</ref><ref type="bibr">Zamfirache, Precup, Roman, &amp; Petriu, 2022)</ref>. For instance, <ref type="bibr">Li et al. 
(2022)</ref> presented a hybrid deep learning approach, where a one-dimensional CNN was integrated with LSTM to identify features related to the battery degradation phenomenon, and the Kolmogorov-Smirnov test was implemented to infer the prior distribution of hyperparameters used in the presented hybrid deep learning approach. <ref type="bibr">Eddahech, Briat, Bertrand, Delétage, and Vinassa (2012)</ref> proposed an RNN-based model to estimate the SOH of a high-energy-density battery cell. The RNN was implemented to track the degradation trajectory of several batteries in hybrid EV and EV usage. <ref type="bibr">Cheng, Wang, and He (2021)</ref> integrated the empirical mode decomposition (EMD) approach with an LSTM network for accurate SOH and RUL predictions of batteries. The voltage and current measurements were fed into an LSTM model for SOH predictions, and the predicted SOH was fed into the EMD approach to eliminate the randomness brought by the capacity regeneration phenomenon so that the RUL of a battery can be predicted precisely. <ref type="bibr">Duong and Raghavan (2018)</ref> combined a metaheuristic optimization approach with particle filtering methods to address the problem of sample degeneracy, thus enhancing the RUL prediction performance for lithium-ion batteries. Experimental results have demonstrated that the proposed method outperforms traditional metaheuristic approaches, such as the optimized particle filtering method.</p><p>One of the issues with the aforementioned deep learning methods is that they are not effective in revealing feature correlations. Such a correlation can be used to identify and aggregate features with high affinity and similarity to enhance the precision and robustness of a predictive model <ref type="bibr">(Li, Zhao, Sun, Yan, &amp; Chen, 2020;</ref><ref type="bibr">Wei &amp; Wu, 2023)</ref>. 
To reveal this correlation, undirected graphs have been increasingly used in current literature <ref type="bibr">(Wei, Wu, &amp; Terpenny, 2023)</ref>, where the graph nodes represent feature vectors and the graph edges denote the similarity or affinity between features. To handle these undirected graphs more effectively, the Graph Convolutional Network (GCN) is increasingly used to predict the RUL of complex systems, due to its ability to leverage the topology of undirected graphs and provide better insights into data correlation <ref type="bibr">(Wang, Cao, Xu, &amp; Liu, 2022)</ref>. For example, <ref type="bibr">Wei and Wu (2022b)</ref> proposed an optimization model to build an undirected graph by simultaneously minimizing the graph density and maximizing the graph entropy. The GCN was adopted to handle the constructed graph and predict the SOH and RUL of lithium-ion batteries. The experimental results have shown that the GCN enables accurate predictions of SOH and RUL for batteries. Similarly, <ref type="bibr">Li, Zhao, Sun, Yan, and Chen (2021)</ref> constructed multiple undirected graphs to represent the sensor and feature correlations in condition monitoring data collected from aircraft engines. The GCN was employed to handle these constructed undirected graphs and predict RUL for the engines. Numerical studies have demonstrated the capabilities of GCNs in dealing with these undirected graphs.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.2.">Research gap</head><p>While the effectiveness of GCNs in predicting RUL has been demonstrated, two issues remain to be addressed to enhance their robustness and precision. Firstly, the undirected graphs are usually constructed to represent the correlation among features. However, these graphs are not able to reveal the correlation between features and SOH or RUL. Revealing the correlation between features and SOH or RUL can help identify the most significant features that directly impact SOH or RUL. By understanding this correlation, predictive models can prioritize the important features during the prediction process, resulting in improved accuracy <ref type="bibr">(Wei, Wu, &amp; Terpenny, 2021)</ref>. Most of the current methods are unable to consider this correlation because true SOH and RUL data are only available during training instead of testing. Secondly, traditional GCNs typically stack several spectral graph convolutional layers, where each layer performs two operations: aggregation and projection. The features with high affinity or similarity are first aggregated based on a pre-constructed undirected graph, and then the aggregated features are projected onto another higher-dimensional space. While the feature correlation can be effectively considered through the repeated aggregation of similar features, traditional GCNs do not take into account the temporal correlation of these aggregated features. To address these issues, this work introduced a conditional GCN with a dilated convolution operation. To address the first issue, this work constructed two types of undirected graphs. The first type of graphs (denoted as 𝒢 1 ) was used to consider the correlation among features, and the second type of graphs (denoted as 𝒢 2 ) was used to consider the correlation between features and SOH/RUL. Two feature spaces were extracted from the two types of graphs, respectively. 
Then, the KL divergence was used to minimize the distance between the two feature spaces so that the feature space extracted from 𝒢 1 can approximate the feature space extracted from 𝒢 2 . Therefore, even without the SOH/RUL, the correlation between the features and SOH/RUL can be taken into account when the feature space extracted from 𝒢 1 is used for testing. To address the second issue, this work implemented the dilated convolutional operation to consider temporal correlations after aggregating similar features in GCNs. The dilated convolutional operation was implemented for two primary reasons. First, the condition monitoring data collected for battery health management is a time series. The dilated convolutional operation in the temporal convolutional network (TCN) is designed to deal with time series data, making it better able to capture long-term dependencies in time series data than GRU or LSTM <ref type="bibr">(Zhen, Fang, Zhao, Ge, &amp; Xiao, 2022)</ref>. Second, the dilated convolutional operation can expand the receptive field of convolutional layers without significantly increasing the number of parameters <ref type="bibr">(Bai, Kolter, &amp; Koltun, 2018)</ref>. This capability ensures that the proposed method is less susceptible to overfitting, which is particularly crucial when dealing with limited battery health data. The constraint of having limited battery health data is a common challenge in real-world applications due to the high cost and time-consuming nature of battery testing and monitoring. The major contributions of this work are outlined below:</p><p>• Two types of undirected graphs, denoted as 𝒢 1 and 𝒢 2 , were constructed. 𝒢 1 was used to capture correlations among features, while 𝒢 2 was used to capture correlations between features and SOH/RUL. 
• The KL divergence was introduced to minimize the distance between the two feature spaces extracted from the two types of graphs so that the feature space extracted from 𝒢 1 can approximate the feature space extracted from 𝒢 2 . • The dilated convolutional operation was implemented after aggregating similar features in GCNs to increase the receptive field of convolutional layers, thereby allowing the temporal correlation among the aggregated features to be considered.</p><p>The remaining sections of this paper are organized as follows. Section 2 introduces the proposed conditional graph convolutional network with dilated convolution operations. Section 3 utilizes the NASA battery dataset to demonstrate the effectiveness of the proposed method, and Section 4 uses the Oxford battery degradation dataset to further demonstrate the efficiency of the proposed method. Section 5 concludes with a summary of this work and an examination of future work.</p></div>
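The KL-divergence alignment in the contributions can be illustrated with a minimal numerical sketch. This is not the authors' implementation: treating each extracted feature vector as a softmax-normalized distribution, the direction of the divergence, and all names (`kl_alignment_loss`, `h1`, `h2`) are assumptions made for this example.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_alignment_loss(h1, h2, eps=1e-12):
    """KL divergence from the feature space extracted from the conditional
    graphs (h2) to the one extracted from the standard graphs (h1).
    Minimizing it drives the h1 features toward the h2 features."""
    p1 = softmax(h1)
    p2 = softmax(h2)
    return float(np.sum(p2 * (np.log(p2 + eps) - np.log(p1 + eps))))

# Identical feature spaces give (near-)zero divergence.
h = np.array([0.2, 1.0, -0.5])
assert abs(kl_alignment_loss(h, h)) < 1e-9
```

During training this scalar would be added to the prediction loss so that the features from 𝒢 1 learn to mimic those from 𝒢 2 ; at test time, when SOH/RUL labels are unavailable, only the 𝒢 1 branch is evaluated.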
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Conditional graph convolutional network with dilated convolutional operations</head><p>In this section, the conditional GCN with the dilated convolutional operations is introduced. Specifically, two types of undirected graphs are constructed in Section 2.1: standard undirected graphs and conditional undirected graphs. Next, in Section 2.2, the dual spectral graph convolution operation is presented, which is designed to deal with the topological structures of these graphs. Then, in Section 2.3, the dilated convolution operation is introduced. Finally, in Section 2.4, the training procedure for the conditional GCN with the dilated convolutional operation is outlined.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Constructing undirected graphs</head><p>In the field of battery health management, the most commonly used condition monitoring data include voltage, current, and temperature. Although numerous studies utilize both charge and discharge cycle data to estimate SOH and predict RUL, the availability of condition monitoring data during charge cycles is limited. Therefore, in this work, only condition monitoring data from discharge cycles was used for SOH estimations and RUL predictions. To capture the degradation trajectory of a battery, this work extracts several temporal features from each discharge cycle. These features include the time to reach the minimum voltage, the time discharged under constant or variable current modes, the time to reach the maximum temperature, the voltage decrease rate, and the temperature increase rate. These features have been proven successful in estimating SOH and predicting RUL <ref type="bibr">(Audin et al., 2021;</ref><ref type="bibr">Wei &amp; Wu, 2022b)</ref>.</p><p>Constructing undirected graphs involves initializing two types of graphs: standard undirected graphs and conditional undirected graphs. Standard undirected graphs are used to consider the correlation among features, while conditional undirected graphs are used to consider the correlation between features and SOH or RUL. The standard undirected graphs form the first graph set, denoted as 𝒢 1 , which consists of two graphs: the positively connected graph G 1,+ and the negatively connected graph G 1,-. Mathematically, 𝒢 1 can be represented as 𝒢 1 = {G 1,+ , G 1,-}. 
To construct the graph G 1,+ and the graph G 1,-, this work first selects two different temporal feature vectors, 𝐟 𝑖,𝑘 ∈ R 1×𝑇 𝑖 and 𝐟 𝑖,𝑘 ′ ∈ R 1×𝑇 𝑖 , from the extracted temporal feature matrix 𝐅 𝑖 ∈ R 𝐾×𝑇 𝑖 . Then their covariance, denoted as 𝒸 1 𝑘,𝑘 ′ , is calculated, which can be defined using Eq. ( <ref type="formula">1</ref>). In the equation, 𝐾 represents the number of extracted temporal features, 𝑇 𝑖 denotes the number of discharge cycles of battery cell 𝑖, 𝑓 𝑖,𝑗,𝑘 and 𝑓 𝑖,𝑗,𝑘 ′ refer to the 𝑘th and 𝑘 ′ th features of the feature matrix 𝐅 𝑖 for battery cell 𝑖 in discharge cycle 𝑗, 𝑁 represents the number of battery cells, and f̄ ⋅,⋅,𝑘 and f̄ ⋅,⋅,𝑘 ′ represent the expectations of the 𝑘th and 𝑘 ′ th features of the feature matrix. It is worth noting that covariance is utilized to determine edges because it can assist in identifying positively and negatively correlated features. Positively correlated features refer to those with the same monotonicity, while negatively correlated features represent those with different monotonicity. By identifying features with the same or different monotonicity and aggregating them in spectral graph convolutional operations, it can be guaranteed that features with the same monotonicity are summed together, and features with different monotonicity are subtracted from each other. In this way, the monotonicity of the aggregated features can be maximized, potentially improving prediction performance.</p><p>Then, the edges in the graph G 1,+ and the graph G 1,- can be determined by using Eq. 
( <ref type="formula">2</ref>), where 𝜖 1 denotes a non-negative threshold for determining the edge. If the covariance 𝒸 1 𝑘,𝑘 ′ is positive and greater than 𝜖 1 , a positive edge between the feature node 𝑘 and the feature node 𝑘 ′ in the graph G 1,+ is added, and the corresponding edge 𝑒 1,+ 𝑘,𝑘 ′ in the graph G 1,+ between these two nodes is assigned as 1. If the covariance 𝒸 1 𝑘,𝑘 ′ is negative and less than -𝜖 1 , a negative edge between the feature node 𝑘 and the feature node 𝑘 ′ in the graph G 1,- is added, and the corresponding edge 𝑒 1,- 𝑘,𝑘 ′ between these two nodes is assigned as -1.</p><p>𝑒 1,+ 𝑘,𝑘 ′ = 1 if 𝒸 1 𝑘,𝑘 ′ &gt; 𝜖 1 and 0 otherwise; 𝑒 1,- 𝑘,𝑘 ′ = -1 if 𝒸 1 𝑘,𝑘 ′ &lt; -𝜖 1 and 0 otherwise. (2)</p><p>The graph construction process will be repeated 𝐾(𝐾 - 1)∕2 times until all combinations of two features in the feature matrix have been examined. The constructed graphs G 1,+ and G 1,- are respectively represented as G 1,+ = {V 1 , E 1,+ , 𝐅 𝑖 } and G 1,- = {V 1 , E 1,- , 𝐅 𝑖 }, where V 1 refers to a set of feature nodes and 𝐅 𝑖 represents the extracted feature matrix. Moreover, E 1,+ and E 1,- refer to sets of edges for the graph G 1,+ and the graph G 1,-, respectively.</p><p>The conditional undirected graphs 𝒢 2 also consist of two graphs: the positively connected graph G 2,+ and the negatively connected graph G 2,-. Mathematically, 𝒢 2 is the second graph set and can be represented as 𝒢 2 = {G 2,+ , G 2,-}. To initialize the graph G 2,+ and the graph G 2,-, a conditional feature matrix 𝐂 𝑖 ∈ R (𝐾+1)×𝑇 𝑖 is first constructed by appending 𝐲 𝑖 to the feature matrix 𝐅 𝑖 as an additional row, where 𝐲 𝑖 refers to a time series vector of SOH or RUL. 
Similar to the step used to build the first graph set 𝒢 1 , two different vectors 𝐜 𝑖,𝑘 and 𝐜 𝑖,𝑘 ′ are selected from the conditional feature matrix 𝐂 𝑖 , and their covariance 𝒸 2 𝑘,𝑘 ′ is examined. Then, the edges in the graph G 2,+ and the graph G 2,- can be determined using Eq. (3), where 𝜖 2 represents a non-negative threshold for edge determination. If the covariance 𝒸 2 𝑘,𝑘 ′ is positive and greater than 𝜖 2 , a positive edge between node 𝑘 and node 𝑘 ′ in the graph G 2,+ is added, and the corresponding edge 𝑒 2,+ 𝑘,𝑘 ′ in G 2,+ is assigned a value of 1. If the covariance 𝒸 2 𝑘,𝑘 ′ is negative and less than -𝜖 2 , a negative edge between node 𝑘 and node 𝑘 ′ in the graph G 2,- is added, and the corresponding edge 𝑒 2,- 𝑘,𝑘 ′ between these two nodes is assigned a value of -1.</p><p>𝑒 2,+ 𝑘,𝑘 ′ = 1 if 𝒸 2 𝑘,𝑘 ′ &gt; 𝜖 2 and 0 otherwise; 𝑒 2,- 𝑘,𝑘 ′ = -1 if 𝒸 2 𝑘,𝑘 ′ &lt; -𝜖 2 and 0 otherwise. (3)</p><p>The graph construction process will be repeated 𝐾(𝐾 + 1)∕2 times until all combinations of two vectors in the conditional feature matrix 𝐂 𝑖 have been examined. The constructed graphs G 2,+ and G 2,- are respectively represented as G 2,+ = {V 2 , E 2,+ , 𝐂 𝑖 } and G 2,- = {V 2 , E 2,- , 𝐂 𝑖 }, where V 2 refers to a set of nodes, and 𝐂 𝑖 represents the conditional feature matrix. Moreover, E 2,+ and E 2,- refer to sets of edges for the graph G 2,+ and the graph G 2,-, respectively.</p></div>
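The covariance-thresholding rules of Eqs. (2) and (3) can be sketched with NumPy. This is an illustrative sketch, not the authors' code: the function name `build_graphs`, the use of `np.cov`, and the zeroed diagonal are assumptions; the conditional graphs of Eq. (3) are obtained by calling the same routine on 𝐂 𝑖 , i.e. 𝐅 𝑖 with the SOH/RUL series appended as an extra row.

```python
import numpy as np

def build_graphs(F, eps):
    """Threshold the feature covariance matrix to obtain the positively
    connected graph (edge value 1) and the negatively connected graph
    (edge value -1). F is a K x T matrix: rows are features, columns cycles."""
    Z = np.cov(F)                      # K x K covariance among the K features
    e_pos = (Z > eps).astype(float)    # edge 1 where covariance exceeds eps
    e_neg = -(Z < -eps).astype(float)  # edge -1 where covariance is below -eps
    np.fill_diagonal(e_pos, 0.0)       # self-connections are added later
    np.fill_diagonal(e_neg, 0.0)
    return e_pos, e_neg

# Two positively correlated features and one anti-correlated feature.
F = np.array([[1., 2., 3., 4.],
              [2., 4., 6., 8.],
              [4., 3., 2., 1.]])
A_pos, A_neg = build_graphs(F, eps=0.5)
assert A_pos[0, 1] == 1 and A_neg[0, 2] == -1
```

Each of the K(K - 1)/2 feature pairs is examined once, which the vectorized covariance matrix does implicitly.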
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Dual spectral graph convolutional operation</head><p>To utilize the topological structures of the first graph set 𝒢 1 and the second graph set 𝒢 2 , the dual spectral graph convolutional operation is introduced to handle 𝒢 1 and 𝒢 2 , respectively. The dual spectral graph convolutional operation performed on the graph set 𝒢 𝑛 can be written as Eq. ( <ref type="formula">4</ref>), where 𝑛 = 1 refers to the first graph set 𝒢 1 and 𝑛 = 2 refers to the second graph set 𝒢 2 . In this equation, 𝐅 (𝜔) 𝑖 ∈ R 𝐾×𝜏 refers to the 𝜔th sampled temporal feature matrix using a sliding window with a step length of one and a window length of 𝜏; 𝐂 (𝜔) 𝑖 ∈ R (𝐾+1)×𝜏 refers to the 𝜔th sampled conditional feature matrix using a sliding window with a step length of one and a window length of 𝜏; ℱ and ℱ -1 denote the Fourier transform and its inverse transform; 𝐠 𝑛 represents the graph filter from the graph set 𝒢 𝑛 ; 𝒮 is a set that refers to + or -; therefore, 𝑔 𝑛,+ and 𝑔 𝑛,- respectively denote the graph filters from the graph G 𝑛,+ and the graph G 𝑛,-, and || represents the concatenation operator.</p><p>The graph Fourier transform for the graph filter 𝑔 𝑛,𝑠 can be represented as 𝑈 𝑇 𝑛,𝑠 𝑔 𝑛,𝑠 , where 𝑈 𝑛,𝑠 refers to the eigenvectors of the Laplacian matrices. The graph Fourier transform for the sampled matrix can be written as 𝑈 𝑇 𝑛,𝑠 𝐗 𝑖 . By substituting the eigenvector 𝑈 𝑛,𝑠 into Eq. ( <ref type="formula">4</ref>), Eq. 
( <ref type="formula">5</ref>) can be obtained, where &#120556; &#119899;,&#119904; is the vector that stores the eigenvalues of the Laplacian matrix, and &#120595; &#119899;,&#119904; represents a collection of parameters provided by the graph filter &#119892; &#119899;,&#119904; . For the graph G &#119899;,&#119904; , the normalized Laplacian matrix &#8466; &#119899;,&#119904; is given by Eq. ( <ref type="formula">6</ref>), where &#119816; is the identity matrix; A &#119899;,&#119904; denotes the adjacency matrix of the graph G &#119899;,&#119904; , and the elements of this adjacency matrix are derived from the edge set E &#119899;,&#119904; .</p></div>
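The normalized Laplacian of Eq. (6) and the eigendecomposition of Eq. (8) can be checked numerically. A minimal sketch under assumptions: Eq. (7) defines the degree entries from the edge values, and taking absolute edge weights here (so that the negatively connected graphs also yield a valid degree matrix) is an illustrative choice, not necessarily the paper's exact definition.

```python
import numpy as np

def normalized_laplacian(A):
    """L = I - D^{-1/2} A D^{-1/2} for an adjacency matrix A (Eq. (6)).
    Degrees are taken from |edge weights| (assumption for illustration)."""
    d = np.abs(A).sum(axis=1)
    d_inv = np.where(d > 0, d ** -0.5, 0.0)
    return np.eye(len(A)) - (A * d_inv[:, None]) * d_inv[None, :]

A = np.array([[0., 1., 1.],
              [1., 0., 0.],
              [1., 0., 0.]])
L = normalized_laplacian(A)
# L is real and symmetric, so it admits L = U diag(lam) U^T (Eq. (8)).
lam, U = np.linalg.eigh(L)
assert np.allclose(U @ np.diag(lam) @ U.T, L)
```

The eigenvectors `U` and eigenvalues `lam` are exactly the quantities that enter the graph Fourier transform of the preceding section.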
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Moreover, 𝐷 𝑛,𝑠 refers to the degree matrix of A 𝑛,𝑠 ; the diagonal entries of the degree matrix can be written as Eq. ( <ref type="formula">7</ref>), where 𝑒 1,+ 𝑘,𝑘 ′ and 𝑒 1,- 𝑘,𝑘 ′ are obtained from Eq. ( <ref type="formula">2</ref>); and 𝑒 2,+ 𝑘,𝑘 ′ and 𝑒 2,- 𝑘,𝑘 ′ are obtained from Eq. ( <ref type="formula">3</ref>).</p><p>Because the Laplacian matrix ℒ 𝑛,𝑠 is real and symmetric, the eigendecomposition of this Laplacian matrix can be expressed as Eq. ( <ref type="formula">8</ref>).</p><p>By inserting Eq. ( <ref type="formula">8</ref>) into Eq. ( <ref type="formula">5</ref>), Eq. ( <ref type="formula">9</ref>) can be derived.</p><p>Previous research indicated that first-order Chebyshev polynomials are capable of reducing the computational cost of the spectral graph convolution <ref type="bibr">(Hammond, Vandergheynst, &amp; Gribonval, 2011)</ref>. By using the Chebyshev polynomials, Eq. ( <ref type="formula">9</ref>) can be rewritten as Eq. ( <ref type="formula">10</ref>), where 𝒞 𝑝 is the 𝑝th order Chebyshev polynomial and L̃ 𝑛,𝑠 represents the scaled Laplacian matrix.</p><p>To enable a feature to aggregate its adjacent features along with itself, this work incorporates a self-connection in the positively connected graphs G 𝑛,+ for all 𝑛. The updated adjacency matrix can be denoted as Ã 𝑛,𝑠 = A 𝑛,𝑠 + 𝐈 when 𝑠 = +, and Ã 𝑛,𝑠 = A 𝑛,𝑠 when 𝑠 = -. In this work, a self-connection is not included in the negatively connected graphs G 𝑛,- for all 𝑛 because the self-aggregation process is already completed in G 𝑛,+ . 
Next, 𝜓 𝑛,𝑠 is rewritten in a matrix format, and the dual graph convolutional operations can be expressed as Eq. ( <ref type="formula">11</ref>).</p><p>In Eq. ( <ref type="formula">11</ref>), Â 𝑛,𝑠 can be written as D̃ -1∕2 𝑛,𝑠 Ã 𝑛,𝑠 D̃ -1∕2 𝑛,𝑠 , where D̃ 𝑛,𝑠 denotes the degree matrix of the updated adjacency matrix Ã 𝑛,𝑠 . Additionally, 𝛹 𝑛,𝑠 ∈ R 𝜏×𝜏 ′ represents the graph filter parameters in matrix format. The dual spectral graph convolutional layer is introduced based on the dual spectral convolutional operation in Eq. ( <ref type="formula">11</ref>). To enhance the robustness of this layer, a bias weight vector 𝐛 𝑛,𝑠 and an activation function 𝜎 are incorporated. The output of a single dual spectral graph convolutional layer is expressed as Eq. ( <ref type="formula">12</ref>), where 𝜎 represents the activation function and 𝐛 𝑛,𝑠 denotes the bias weight vector.</p><p>In summary, two graph sets (𝒢 1 and 𝒢 2 ) are initialized, each consisting of two graphs with 𝑛 = 1, 2 and 𝑠 = +, -. The dual spectral graph convolutional operation is then performed on each graph set. As a result, there will be four outputs generated by the proposed dual spectral graph convolutional operation. These outputs can be mathematically represented as 𝐇 1,+ 𝑖 , 𝐇 1,- 𝑖 , 𝐇 2,+ 𝑖 , and 𝐇 2,- 𝑖 . Fig. <ref type="figure">1</ref> shows an example of the introduced dual spectral graph convolutional operation for the graph sets 𝒢 1 and 𝒢 2 with one single layer for the purpose of illustration. First of all, two different vectors are selected from the feature matrix 𝐅 𝑖 to examine their covariance. 
The edges of the graph G 1,+ and the graph G 1,- in the graph set 𝒢 1 are determined based on the covariance and the threshold 𝜖 1 . Likewise, two different vectors are selected from the conditional feature matrix 𝐂 𝑖 to examine their covariance. The edges of the graph G 2,+ and the graph G 2,- in the graph set 𝒢 2 are determined based on the covariance and the threshold 𝜖 2 . Next, the dual spectral graph convolution operation is utilized twice to respectively handle the topological structures of the graphs in the graph set 𝒢 1 and the graph set 𝒢 2 . For the positively connected graph G 𝑛,+ , ∀𝑛, a positive self-connected adjacency matrix Ã 𝑛,+ , ∀𝑛, is generated, and all the vectors in the sampled feature matrix or the sampled conditional feature matrix are aggregated and projected based on the constructed matrix Ã 𝑛,+ , ∀𝑛. With respect to the negatively correlated graph G 𝑛,- , ∀𝑛, a negative adjacency matrix Ã 𝑛,- , ∀𝑛, is generated, and all the vectors in the sampled feature matrix or the sampled conditional feature matrix are aggregated and projected based on the matrix Ã 𝑛,- , ∀𝑛. The outputs after performing the proposed dual spectral graph convolutional operation are denoted as 𝐇 1,+ 𝑖 , 𝐇 1,- 𝑖 , 𝐇 2,+ 𝑖 , and 𝐇 2,- 𝑖 .</p></div>
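One aggregation-and-projection step of the kind described above, with the self-connection added to the adjacency matrix and the renormalized form Â = D̃^-1∕2 Ã D̃^-1∕2, can be sketched as follows. This is an assumed illustration, not the paper's code: the tanh activation, the absolute-value degrees, and all names are choices made for the example; for the negatively connected graph one would pass `self_loop=False`, since the text adds the identity only for G 𝑛,+.

```python
import numpy as np

def normalized_adjacency(A, self_loop=True):
    """D^{-1/2} (A + I) D^{-1/2}; the identity is added only when the
    graph carries the self-connection, as in the text."""
    A_t = A + np.eye(A.shape[0]) if self_loop else A
    d = np.abs(A_t).sum(axis=1)          # degrees from |edge weights| (assumption)
    d_inv = np.where(d > 0, d ** -0.5, 0.0)
    return (A_t * d_inv[:, None]) * d_inv[None, :]

def graph_conv_layer(A, X, Theta, b):
    """One layer: aggregate feature rows over the graph, then project
    the window dimension with the filter parameters Theta (cf. Eq. (12))."""
    return np.tanh(normalized_adjacency(A) @ X @ Theta + b)

rng = np.random.default_rng(0)
A = np.array([[0., 1.], [1., 0.]])       # two positively connected feature nodes
X = rng.standard_normal((2, 4))          # K=2 features over a window of tau=4
H = graph_conv_layer(A, X, rng.standard_normal((4, 3)), 0.0)
assert H.shape == (2, 3)
```

Running the layer once per graph in each set yields the four tensors denoted 𝐇 1,+ 𝑖 , 𝐇 1,- 𝑖 , 𝐇 2,+ 𝑖 , and 𝐇 2,- 𝑖 above.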
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.">Dilated convolutional operation</head><p>The dilated convolutional operation is the most critical component of the temporal convolutional network, which has been demonstrated to outperform canonical recurrent neural networks such as RNN and LSTM in revealing temporal correlations in time series data <ref type="bibr">(Bai et al., 2018)</ref>. Therefore, the dilated convolutional operation is employed to more effectively consider the temporal correlation of the aggregated and projected features 𝐇 1,+ 𝑖 , 𝐇 1,- 𝑖 , 𝐇 2,+ 𝑖 , and 𝐇 2,- 𝑖 after performing the proposed dual spectral graph convolutional operation. The dilated convolutional operation is similar to the typical convolutional operation that uses filter matrices sweeping over the entire input matrix. The output of the dilated convolutional operation D can be mathematically represented as Eq. ( <ref type="formula">13</ref>), where 𝐇 𝑛,𝑠 𝑖,𝛾 is the 𝛾-th part of the resulting tensors 𝐇 𝑛,𝑠 𝑖 from the dual spectral graph convolutional operation. Here, 𝐾 is the total number of columns in the extracted feature matrix, 𝐹 refers to the filter size of the dilated convolutional operation, ℎ (𝛾) 𝑎,𝑏 represents one element in the matrix 𝐇 𝑛,𝑠 𝑖,𝛾 , 𝑤 𝑎,𝑏,𝛿 is one of the weight elements in the 𝛿-th filter matrix, and 𝐷 represents the dilation factor. This work sets 𝛿 = 1, …, 𝛥, where 𝛥 refers to the number of filters provided in the dilated convolutional operation, and sets 𝛾 = 1, …, 𝛤, where 𝛤 = 𝜏 ′ + 𝐹 - 1 refers to the total number of input matrices multiplied by the filter matrices. 
Therefore, the output of the dilated convolutional operation is a 𝛤-by-𝛥 matrix.</p><p>Moreover, the dilated convolutional filter matrix is a sparse matrix, which can be mathematically represented as Eq. ( <ref type="formula">14</ref>), where the weight is trainable if 𝑏 = 1 + 𝜋 ⋅ 𝐷 and 𝜋 = 1, …, 𝛱; otherwise, the weight is not trainable and constantly equals zero. In addition, the relation between the dilation factor 𝐷 and the filter size 𝐹 can be mathematically represented as 1 + 𝛱 ⋅ 𝐷 = 𝐹.</p><p>Fig. <ref type="figure">2</ref> shows an example of two consecutive dilated convolutional operations for illustration purposes. The first dilated convolutional operation has the dilation factor 𝐷 1 = 1 and the filter size 𝐹 1 = 3, and the second dilated convolutional operation has the dilation factor 𝐷 2 = 3 and the filter size 𝐹 2 = 5. The first dilated convolutional operation is performed on the tensors 𝐇 𝑛,𝑠 𝑖 , ∀𝑛, 𝑠 generated by the dual spectral graph convolutional operations. After performing the first dilated convolutional operation, the ReLU activation function and the dropout function are adopted, and the resulting tensor can be represented as 𝐎 𝑛,𝑠 𝑖,1 ∈ R 𝛤 1 ×𝛥 1 , where 𝛥 1 refers to the number of filters in the first dilated convolutional layer, and 𝛤 1 is the reduced time length after performing the first dilated convolutional operation, where 𝛤 1 equals 𝜏 ′ + 𝐹 1 - 1. Next, the second dilated convolutional operation is performed on the tensor 𝐎 𝑛,𝑠 𝑖,1 , ∀𝑛, 𝑠 generated by the first dilated convolutional operation. 
After the second dilated convolutional operation, the ReLU activation function and the dropout function are applied, and the resulting tensor can be represented as O_{i,2}^{n,s} ∈ R^{Γ_2 × Δ_2}. Here, Δ_2 refers to the number of filters in the second dilated convolutional operation and Γ_2 = Γ_1 + F_2 − 1 is the time length after the second dilated convolutional operation. Next, O_{i,𝒞}^{n,+} and O_{i,𝒞}^{n,-} are concatenated into a vector O_i^n for all n, which can be mathematically written as Eq. (<ref type="formula">15</ref>), where 𝒞 represents the number of dilated convolutional operations that have been applied.</p></div>
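To make the stacked operations above concrete, the following NumPy sketch implements a generic dilated 1-D convolution and stacks two layers with the dilation factors and filter sizes from Fig. 2 (D_1 = 1, F_1 = 3; D_2 = 3, F_2 = 5). The function and variable names are illustrative rather than the authors'; the sketch also assumes valid mode (no zero padding), so each layer shortens the sequence by (F − 1) ⋅ D, which differs from the Γ = τ′ + F − 1 indexing convention used in the text.

```python
import numpy as np

def dilated_conv1d(x, w, dilation=1):
    """Valid-mode 1-D dilated convolution.

    x: (L, C_in) input sequence; w: (F, C_in, C_out) filter bank.
    Each filter tap is spaced `dilation` steps apart, which enlarges
    the receptive field without adding parameters.
    """
    L, c_in = x.shape
    F, _, c_out = w.shape
    span = (F - 1) * dilation          # temporal span covered by one filter
    out_len = L - span                 # valid mode: no padding
    y = np.zeros((out_len, c_out))
    for t in range(out_len):
        for f in range(F):
            y[t] += x[t + f * dilation] @ w[f]
    return y

rng = np.random.default_rng(0)
x = rng.normal(size=(20, 4))           # tau' = 20 time steps, 4 channels
w1 = rng.normal(size=(3, 4, 8))        # F1 = 3, D1 = 1 (as in Fig. 2)
w2 = rng.normal(size=(5, 8, 8))        # F2 = 5, D2 = 3

h1 = np.maximum(dilated_conv1d(x, w1, dilation=1), 0.0)   # ReLU
h2 = np.maximum(dilated_conv1d(h1, w2, dilation=3), 0.0)
print(h1.shape, h2.shape)  # (18, 8) (6, 8)
```

Stacking the two layers gives a receptive field of (3 − 1) ⋅ 1 + (5 − 1) ⋅ 3 + 1 = 15 time steps, which is the usual motivation for increasing the dilation factor with depth.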
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4.">Training the conditional graph convolutional network with dilated convolutional operations</head><p>In summary, the proposed method initializes two sets of graphs: the undirected graph set 𝒢_1 and the conditional undirected graph set 𝒢_2.</p><p>Each graph set consists of two undirected graphs representing positively and negatively connected graphs. The purpose of 𝒢_1 is to consider the correlation among features, while 𝒢_2 considers the correlation between features and the SOH or RUL. To handle the topological structures of these two graph sets, the method introduces the dual spectral graph convolutional operation and adopts the dilated convolutional operation. The dual spectral graph convolutional operation is applied to 𝒢_1 and 𝒢_2, and the resulting tensors for battery unit i are denoted as O_i^1 and O_i^2, respectively. The dilated convolutional operation enables effective consideration of the temporal correlation among the aggregated features provided by the dual spectral graph convolutional operation. To train the proposed conditional GCN with dilated convolutional operations, the resulting tensors O_i^n for all n are flattened and passed through a fully connected (FC) layer for SOH estimation and RUL prediction. This process can be mathematically represented as Eq. (<ref type="formula">16</ref>), where ŷ_{i,n}^{(ω)} refers to the estimated SOH or predicted RUL provided by the n-th graph set for battery unit i in the ω-th sample, σ is the activation function, W_n denotes the weight matrix of the FC layer for the n-th graph set, and b_n denotes the bias vector of the FC layer for the n-th graph set. 
Therefore, the training loss ℒ_1 for the first graph set 𝒢_1 and the training loss ℒ_2 for the second graph set 𝒢_2 can be written as shown in Eq. (<ref type="formula">17</ref>), where I refers to the total number of battery units used for training and Ω denotes the total number of discharge cycles used for training.</p><p>The dual spectral graph convolutional operation performed on the second graph set 𝒢_2 involves the SOH and RUL in the conditional feature matrix. While the second graph set 𝒢_2 can be used for training, it cannot be used for prediction because the true SOH and RUL are not available at prediction time. Therefore, only the first graph set 𝒢_1 can be used for prediction. To leverage the correlation between features and the SOH or RUL from the second graph set 𝒢_2 during prediction, this work introduces a KL-divergence loss to minimize the divergence between O_i^1 and O_i^2, which can be written as Eq. (<ref type="formula">18</ref>).</p><p>In this equation, p_{Ψ_1}(O_i^1 | F_i^{(ω)}) represents the probability distribution of O_i^1 given the sampled feature matrix F_i^{(ω)}. Here, Ψ_1 denotes the collection of parameters in the first dual spectral graph convolutional operation with dilated convolutional operations performed on the first graph set 𝒢_1. Similarly, p_{Ψ_2}(O_i^2 | C_i^{(ω)}) denotes the probability distribution of O_i^2 given the sampled conditional feature matrix C_i^{(ω)}, where Ψ_2 represents the collection of parameters in the second dual spectral graph convolutional operation with dilated convolutional operations performed on the second graph set 𝒢_2. 
Since it is infeasible to directly determine the conditional distributions of O_i^1 and O_i^2, it is commonly assumed that the two distributions in the KL-divergence follow normal distributions <ref type="bibr">(Kusner, Paige, &amp; Hernández-Lobato, 2017;</ref><ref type="bibr">Wei &amp; Wu, 2022a)</ref>. This assumption can be written as Eq. (<ref type="formula">19</ref>), where μ_{i,ω}^1 and Σ_{i,ω}^1 are the resulting tensors from the first dilated convolutional operation and are used to sample O_i^1; they refer to the mean and variance of O_i^1. Similarly, μ_{i,ω}^2 and Σ_{i,ω}^2 are the resulting tensors from the second dilated convolutional operation and are used to sample O_i^2; they refer to the mean and variance of O_i^2.</p><p>By utilizing the reparameterization trick <ref type="bibr">(Huang, Wu, Wang, &amp; Tan, 2013;</ref><ref type="bibr">Kingma &amp; Welling, 2013)</ref>, the KL-divergence can be expressed as Eq. (<ref type="formula">20</ref>), where 𝒹 represents the dimensionality of the learned deep-level representations and tr(⋅) indicates the trace of a matrix.</p><p>Then, the overall training loss is a triplet loss, consisting of a KL-divergence loss and two prediction losses; it can be represented as Eq. (<ref type="formula">21</ref>).</p><p>Next, the obtained training loss is utilized to train the proposed method.</p><p>The training process of the proposed method involves two training steps. In the first training step, the collections of parameters Ψ_1 and Ψ_2 in the first and second dual spectral graph convolutional operations with dilated convolutional operations, as well as the parameters in the FC layers, are updated using the gradient descent method. 
The updating process can be described as Eq. (<ref type="formula">22</ref>), where α represents the learning rate.</p><p>In the second training step, only the training loss ℒ_1 is used: the parameters Ψ_1 learned in the first training step are kept, and the parameters W_n and b_n in the corresponding FC layer are retrained. This process can be mathematically represented as Eq. (<ref type="formula">23</ref>).</p><p>After completing the training process, the trained first dual spectral graph convolutional operation with dilated convolutional operations and the retrained FC layer are used to estimate the SOH and predict the RUL of batteries. In summary, the proposed method employs two dual spectral graph convolutional networks with dilated convolutional operations. The inputs of the first network are the temporal features extracted from the condition monitoring data and the initialized graph set 𝒢_1, and its output is a predicted SOH/RUL. The inputs of the second network are the extracted temporal features, the SOH/RUL, and the initialized graph set 𝒢_2, yielding another predicted SOH/RUL as the output. During training, both networks are used to minimize the prediction errors and to reduce the distance between the two feature spaces extracted from these networks through the triplet training loss. In contrast, during the testing phase, only the first dual spectral graph convolutional network with dilated convolutional operations is used for SOH/RUL predictions.</p><p>Fig. <ref type="figure">3</ref> illustrates the two steps used to train the conditional GCN. In the first training step, condition monitoring data are used to initialize two graph sets, 𝒢_1 and 𝒢_2. 
𝒢_1 captures the correlation among features, while 𝒢_2 reflects the correlation between features and the SOH or RUL. For each graph set, the proposed dual spectral graph convolutional operation is employed to aggregate data with high similarity, resulting in four tensors: H_i^{1,+}, H_i^{1,-}, H_i^{2,+}, and H_i^{2,-}. These four tensors are subsequently fed into the dilated convolutional operation to capture the temporal correlation of the aggregated data, yielding the tensors O_i^1 and O_i^2 and the corresponding predictions ŷ_{i,1}^{(ω)} and ŷ_{i,2}^{(ω)}. The triplet loss, which comprises two prediction losses and one KL-divergence loss, is employed to train the entire framework and update all trainable parameters. In the second training step, all parameters in the dual spectral graph convolutional operation with the dilated convolutional operation for 𝒢_1 are frozen and transferred. Only the first prediction loss, ℒ_1, is used to retrain the parameters in the corresponding FC layer. During prediction, only the trained dual spectral graph convolutional operation with the dilated convolutional operation for 𝒢_1 and the retrained FC layer are used to make SOH and RUL predictions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Case study I</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Data description</head><p>The battery dataset released by the NASA Ames Prognostics Center of Excellence (PCoE) <ref type="bibr">(Saha &amp; Goebel, 2007)</ref> was used to demonstrate the effectiveness of the conditional GCN. This dataset includes three subsets. Subset 1 includes condition monitoring data collected from four lithium-ion batteries (Batteries No. 5, No. 6, No. 7, and No. 18), subset 2 includes data obtained from four batteries (Batteries No. 29, No. 30, No. 31, and No. 32), and subset 3 includes data obtained from three batteries (Batteries No. 25, No. 26, and No. 27). These lithium-ion batteries underwent three distinct operational profiles: charging, discharging, and impedance measurement. In the charging and discharging cycles, current, voltage, and temperature data were collected. For all three subsets, the charging process was executed in a Constant Current (CC) mode at a rate of 1.5 A until the voltage reached 4.2 V, and then switched to a Constant Voltage (CV) mode until the current dropped below 20 mA.</p><p>For subset 1, the discharge process was performed in a CC mode at 2 A until the voltage reached 2.7 V for Battery No. 5, 2.5 V for Battery No. 6, 2.2 V for Battery No. 7, and 2.5 V for Battery No. 18. For subset 2, the discharge procedure was performed at a CC level of 4 A until the voltage reached 2.0 V, 2.2 V, 2.5 V, and 2.7 V for Batteries No. 29, No. 30, No. 31, and No. 32, respectively. This work used the data collected during the discharge cycles only. Fig. 
<ref type="figure">4</ref> shows the voltage, current, and temperature readings during the discharge cycles of Battery No. 5. For subset 3, the discharge process was performed under a square wave loading profile with a frequency of 0.05 Hz, an amplitude of 4 A, and a duty cycle of 50%. The process continued until the voltage reduced to 2.0 V, 2.2 V, 2.5 V, and 2.7 V for Batteries No. 25, No. 26, and No. 27, respectively. From Fig. <ref type="figure">5</ref>, it can be observed that the voltage does not decrease monotonically due to the 0.05 Hz square wave loading profile, and such a loading profile may introduce randomness and make SOH estimation more difficult. In addition, it can also be observed that the trajectory of the measured data changes as the number of discharge cycles increases. For example, as depicted in Fig. <ref type="figure">5(c</ref>), the time required to reach the maximum temperature in the initial discharge cycles is longer than the time needed after 20 discharge cycles. Therefore, the extracted temporal features remain informative for prediction.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Y. Wei and D. Wu</head><p>Table <ref type="table">1</ref> shows the operating conditions for the three subsets in charge and discharge cycles and the corresponding battery indices. In summary, subset 1 and subset 2 performed the discharging process under a constant loading profile, while subset 3 performed the discharging process under a square wave loading profile. In addition, for all the battery datasets, the charging process was conducted in a CC mode.</p><p>The degradation of lithium-ion batteries accelerated through repeated cycles of charging and discharging, leading to capacity loss or capacity fading. Fig. <ref type="figure">6</ref> shows the degradation trajectories of capacity for the batteries in the three subsets. For the batteries in subset 1, run-to-failure tests were conducted, and the experiments ended when the battery capacity had decreased by 30%. The capacity of these batteries was 2 Ahr at its peak, with an end-of-life (EOL) capacity of 1.4 Ahr. For the batteries in subset 2, run-to-failure tests were conducted, and the experiments were terminated when the battery capacity had decreased by 15%. The capacity of these batteries was 2 Ahr at its peak, with an end-of-life capacity of 1.7 Ahr. In this case study, the SOH is estimated and the RUL is predicted for the batteries in subset 1 and subset 2, whereas only the SOH is estimated for the batteries in subset 3 because run-to-failure tests were not conducted on them. Moreover, four-fold cross-validation was performed for the batteries in subset 1 and subset 2, while three-fold cross-validation was performed for the batteries in subset 3 to thoroughly evaluate the effectiveness of the proposed method.</p></div>
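The SOH and RUL targets described above follow directly from the capacity definitions: SOH is the ratio of current capacity to rated capacity, and EOL is the first cycle at which capacity falls below a fixed fraction of the rated value (70% of 2 Ahr, i.e. 1.4 Ahr, for subset 1; 85%, i.e. 1.7 Ahr, for subset 2). The sketch below encodes these definitions on a synthetic linear fade curve; the function names and the fade rate are illustrative, not the NASA data.

```python
def soh(capacity_ahr, rated_ahr=2.0):
    """State of health: current capacity relative to rated capacity."""
    return capacity_ahr / rated_ahr

def rul(capacities, eol_fraction=0.70, rated_ahr=2.0):
    """Number of cycles until capacity first drops below the EOL threshold.

    eol_fraction = 0.70 mimics subset 1 (EOL at 1.4 Ahr);
    use 0.85 for subset 2 (EOL at 1.7 Ahr).
    """
    threshold = eol_fraction * rated_ahr
    for k, c in enumerate(capacities):
        if c < threshold:
            return k
    return None  # EOL not reached within the recorded cycles

caps = [2.0 - 0.007 * k for k in range(140)]  # synthetic linear capacity fade
print(soh(caps[0]), rul(caps))  # 1.0 86
```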
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Feature extraction and network structure</head><p>In real-world applications, condition monitoring data from charge cycles are often unavailable. Therefore, we extracted features from the voltage, current, and temperature measurements collected during discharge cycles only. These features include the time to reach the minimum voltage, the discharge duration under constant or variable current conditions, the time to reach the maximum temperature, the voltage decrease rate, and the temperature increase rate. These features have been demonstrated to be effective in tracking the capacity trajectory of a battery <ref type="bibr">(Audin et al., 2021;</ref><ref type="bibr">Wei &amp; Wu, 2022b)</ref>. For example, the voltage decrease rate is calculated as the voltage drop divided by the discharge time, while the temperature increase rate is determined by dividing the temperature increment by the discharge time.</p><p>Next, the extracted features were fed into the conditional GCN with dilated convolutional operations to estimate the SOH and RUL. The details of the network structure and hyperparameters used in this case study for all three battery subsets are provided in Table <ref type="table">2</ref>. These hyperparameters were determined using the grid search method. As shown in this table, Batch refers to the batch size, which is 100. K = 5 represents the number of extracted features. τ = 20 represents the size of the sampling window. τ′ = 100 refers to the dimensionality after projection in the dual spectral graph convolutional operation. Γ_i and Γ_j refer to the reduced time lengths after performing the i-th and the j-th dilated convolutional operations, respectively. In addition to these parameters, the filter sizes in the dilated convolutional layers are F_1 = F_2 = F_3 = 10. 
The number of filters in each dilated convolutional layer is Δ_1 = Δ_2 = Δ_3 = 100, and the dilation factors are D_1 = 1, D_2 = 2, and D_3 = 4. Moreover, the learning rate is set to 5 × 10⁻³, the threshold level ϵ is set to zero for simplicity, and the Adam optimizer is adopted to train the proposed conditional graph convolutional network with dilated convolutional operations.</p></div>
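The hand-crafted features listed above can be sketched as simple array reductions over one discharge cycle. The code below is an illustrative implementation on synthetic signals; the feature names, signal shapes, and sampling choices are assumptions for the example, not the authors' extraction code.

```python
import numpy as np

def extract_features(t, voltage, temperature):
    """Hand-crafted features from one discharge cycle (illustrative names).

    t, voltage, temperature: 1-D arrays sampled over the discharge cycle.
    """
    duration = t[-1] - t[0]
    return {
        "time_to_min_voltage": t[np.argmin(voltage)] - t[0],
        "discharge_duration": duration,
        "time_to_max_temperature": t[np.argmax(temperature)] - t[0],
        # voltage drop divided by discharge time (as defined in the text)
        "voltage_decrease_rate": (voltage[0] - voltage.min()) / duration,
        # temperature increment divided by discharge time
        "temperature_increase_rate": (temperature.max() - temperature[0]) / duration,
    }

t = np.linspace(0.0, 3000.0, 301)               # seconds
v = 4.2 - 1.5 * (t / t[-1])                     # 4.2 V -> 2.7 V, linear sketch
temp = 24.0 + 10.0 * np.sin(np.pi * t / t[-1])  # temperature peaks mid-cycle
feats = extract_features(t, v, temp)
print(feats["voltage_decrease_rate"])  # ~0.0005 V/s for this synthetic cycle
```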
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">SOH estimation</head><p>Fig. <ref type="figure">7</ref> shows the SOH estimations for Batteries No. 5, No. 6, No. 7, and No. 18 in subset 1 and Batteries No. 29, No. 30, No. 31, and No. 32 in subset 2. The SOH estimations for the batteries in subset 1 start from the 20th discharge cycle since subset 1 has more discharge cycles, while those for the batteries in subset 2 start from the 5th discharge cycle since subset 2 has fewer discharge cycles. From Fig. <ref type="figure">7</ref>, it can be observed that the proposed method is capable of estimating the SOH of lithium-ion batteries with high accuracy. For example, for Battery No. 5, the estimated SOH matches the true SOH of 0.915 after 31 discharge cycles have been observed. Likewise, at the 20th discharge cycle of Battery No. 29, the estimated SOH is 0.906, closely matching the true SOH of 0.907.</p><p>Fig. <ref type="figure">8</ref> shows the SOH estimations for Batteries No. 25, No. 26, and No. 27 in subset 3. The SOH estimations for these batteries start from the 5th discharge cycle since they have fewer discharge cycles. Fig. <ref type="figure">8</ref> shows that the proposed method accurately estimates the SOH of batteries subjected to the square wave load in discharge cycles. For example, after 10 discharge cycles have been observed, the estimated SOH for Battery No. 25 is 0.895, while the actual SOH is 0.914. Although there is a gap between the estimated and true SOH trajectories for Battery No. 25 and Battery No. 27, the proposed method is still capable of tracking the fluctuations of the trajectories for these two batteries. There are two reasons for the estimation error. 
The first reason is that the square wave load in discharge cycles introduces randomness into SOH estimations. The second reason is that the experiments on these batteries were terminated at a very early degradation stage; as a result, this subset provides limited training data.</p><p>To further illustrate the performance of the conditional GCN with dilated convolutional operations, an ablation study was conducted. Table <ref type="table">3</ref> lists the methods used in this ablation study: CGCN-DCO refers to the proposed method, GCN-DCO represents the graph convolutional network with dilated convolutional operations, and CGCN denotes the conditional graph convolutional network. The comparison between CGCN-DCO and GCN-DCO aims to demonstrate the effectiveness of the proposed conditional graphs, while the comparison between CGCN-DCO and CGCN aims to showcase the effectiveness of the dilated convolutional operation. It should be noted that the conditional graphs cannot be used in the testing phase because they include SOH/RUL information, which is not available during testing. In addition to this ablation study, a comparative study was also conducted to show that the proposed method outperforms other deep learning methods, such as Transformer, MGCN, and CNN+LSTM. Table <ref type="table">4</ref> shows the RMSE, MAE, MSE, MedAE, and R2-score (R-squared score) for SOH estimations across all batteries in the three subsets, using the methods presented in Table <ref type="table">3</ref> and other deep learning methods. It can be concluded from this table that the proposed conditional graphs and the dilated convolutional operation enhance the SOH estimation performance of the graph convolutional network. For instance, the average MAE of the CGCN-DCO method across all batteries is 0.0078, whereas the average MAE of CGCN and GCN-DCO is 0.02228 and 0.01926, respectively. 
By employing the proposed conditional graphs and the dilated convolutional operation, the average prediction RMSE can be reduced by up to 73.5%, and the average R2-score can be increased by up to 29.4%. Furthermore, as indicated in Table <ref type="table">4</ref>, the proposed method also outperforms other deep learning methods. For instance, the average RMSE of the proposed method across all batteries is 0.00913, while the average RMSE of Transformer, MGCN, and CNN+LSTM is 0.08039, 0.01254, and 0.127, respectively. Fig. <ref type="figure">9</ref> shows a spider plot of the five evaluation metrics used to assess the SOH estimation performance of the methods employed in the ablation study: RMSE, MAE, MSE, MedAE, and R2-score. Based on this figure, it can also be concluded that both the proposed conditional graph and the dilated convolutional operation improve SOH estimation performance.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 4</head><p>The RMSE, MAE, MedAE, MSE, and R2-score of SOH estimations for all batteries in the three subsets using the methods in Table <ref type="table">3</ref>.</p><p>Both the proposed conditional graph and the dilated convolutional operation improve prediction performance across all evaluation metrics. For example, in the first subset of batteries, the proposed CGCN-DCO achieves a MedAE of 0.00383, while the MedAE of GCN-DCO and CGCN is 0.00422 and 0.00987, respectively. Furthermore, in the second subset of batteries, the R2-score of the proposed CGCN-DCO is 0.89677, whereas the R2-scores of GCN-DCO and CGCN are 0.89572 and 0.18335, respectively. In addition, a two-sample t-test was conducted to demonstrate that the proposed conditional graph and dilated convolutional operations statistically improve prediction performance. The p-value for the hypothesis test whose null hypothesis states that the average RMSE of the proposed CGCN-DCO is equal to or greater than the average RMSE of CGCN is 0.086 at a significance level of 0.1. Similarly, the p-value for the hypothesis test whose null hypothesis states that the average RMSE of the proposed CGCN-DCO is equal to or greater than the average RMSE of GCN-DCO is 0.095 at a significance level of 0.1. This implies that the confidence level for rejecting the null hypotheses is greater than 0.9, indicating that the proposed method can significantly reduce prediction errors.</p><p>In addition, a comparison of the proposed CGCN-DCO with other methods reported in the literature was also conducted. 
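The five evaluation metrics reported in this section can be reproduced with a few lines of NumPy. The helper below is a generic sketch with an illustrative name and synthetic values, not the authors' evaluation script.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """The five evaluation metrics used in the ablation study."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    err = y_true - y_pred
    mse = np.mean(err ** 2)
    ss_res = np.sum(err ** 2)                          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)     # total sum of squares
    return {
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "MAE": np.mean(np.abs(err)),
        "MedAE": np.median(np.abs(err)),
        "R2": 1.0 - ss_res / ss_tot,
    }

# Perfect predictions give zero error metrics and an R2-score of 1.
m = regression_metrics([1.0, 0.9, 0.8, 0.7], [1.0, 0.9, 0.8, 0.7])
print(m["RMSE"], m["R2"])  # 0.0 1.0
```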
Table <ref type="table">5</ref> shows the RMSE of the SOH estimation for the proposed method (CGCN-DCO), the graph convolutional network with dual attention mechanism (GCN-DA), multiple Gaussian regression models (MGP), logic regression with Gaussian process regression (LRGP), the gradient boosting decision tree (GBDT), the Gaussian process (GP), and the health index informed attention model (HIIA). Based on the RMSE of the SOH estimation presented in this table, it can be concluded that the proposed method (CGCN-DCO) exhibits superior performance compared with previously established methods. For instance, the average RMSE of the proposed method is 0.0049, whereas the average RMSE of the other methods documented in the literature ranges between 0.0135 and 0.0493, further highlighting the effectiveness of the proposed method.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.">RUL prediction</head><p>In this section, the effectiveness of the proposed method in predicting the RUL for subsets 1 and 2 is demonstrated. Because the number of discharge cycles for each battery in subset 3 is very small, subset 3 does not provide sufficient data to perform RUL predictions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 5</head><p>The RMSE of the SOH estimation of the proposed method (CGCN-DCO) and of CGCN, GCN-DCO, GCN-DA <ref type="bibr">(Wei &amp; Wu, 2022b)</ref>, MGP <ref type="bibr">(Zheng &amp; Deng, 2019)</ref>, LRGP <ref type="bibr">(Yu, 2018)</ref>, GBT <ref type="bibr">(Qin, Zhao, &amp; Liu, 2022)</ref>, GP <ref type="bibr">(Yu, 2018)</ref>, and HIIA <ref type="bibr">(Wei, 2023)</ref> for each battery.</p><p>For example, the number of discharge cycles for subset 3 is only 28, compared with over 100 for subset 1 and 40 for subset 2. Fig. <ref type="figure">10</ref> shows the box plot of the RUL prediction results for Batteries No. 5, No. 6, No. 7, and No. 18 in subset 1 and Batteries No. 29, No. 30, No. 31, and No. 32 in subset 2. Based on this figure, it can be observed that the proposed method achieves high prediction accuracy: the predicted RUL aligns with the true RUL, and the range of the box plot includes the true RUL. It should be noted that the RUL prediction accuracy is lower than the SOH estimation accuracy. This is because RUL prediction involves forecasting the future end-of-life (EOL) of a battery, which can be very challenging.</p><p>In addition, in this case study, the dataset is relatively small since each subset includes data collected from only four batteries.</p><p>Moreover, an ablation study was conducted to demonstrate the effectiveness of the proposed conditional graphs and the use of the dilated convolutional operation. In addition to this ablation study, a comparative study was also conducted to show that the proposed method outperforms other deep learning methods. Table <ref type="table">6</ref> shows the RMSE, MAE, MSE, MedAE, and R2-score for the RUL predictions using the methods listed in Table <ref type="table">3</ref> and other deep learning methods. 
From this table, it can be concluded that both the proposed conditional graphs and the dilated convolutional operation improve the RUL prediction performance. For example, the average prediction RMSE of the proposed method is 10.45, while the average RMSE of CGCN and GCN-DCO is 16.32 and 11.18, respectively. In addition, based on Table <ref type="table">6</ref>, it can also be observed that the proposed method outperforms other deep learning methods. For instance, the average MAE of the proposed method is 9.15, whereas the average MAE of Transformer, MGCN, and CNN+LSTM is 10.06, 9.65, and 11.13, respectively. Fig. <ref type="figure">11</ref> shows the spider plot of the five evaluation metrics used to assess the RUL prediction performance of the methods employed in this ablation study: RMSE, MAE, MSE, MedAE, and R2-score. Based on this figure, it can also be concluded that both the proposed conditional graph and the dilated convolutional operation improve RUL prediction performance across all evaluation metrics. For example, in the first subset of batteries, the proposed CGCN-DCO achieves an R2-score of 0.51, whereas the R2-scores of CGCN and GCN-DCO are -0.19 and 0.46, respectively. Furthermore, in the second subset of batteries, the MedAE of the proposed CGCN-DCO is 3.48, while the MedAEs of CGCN and GCN-DCO are 4.35 and 3.84, respectively.</p><p>To further demonstrate the effectiveness of the proposed method, a comparison was conducted between the proposed CGCN-DCO and other methods reported in the literature. Table <ref type="table">7</ref> provides the RMSEs of the RUL predictions for the proposed method CGCN-DCO, GCN-DCO, CGCN, logic regression with Gaussian process (LRGP), Gaussian process (GP), LSTM, and LSTM with dual attention mechanisms (LSTM-DA). 
Based on the table, it can be concluded that the proposed CGCN-DCO outperforms other methods. For instance, the average RMSE of the proposed method is 16.84, while the average RMSE of the other methods ranges from 17.07 to 30.27.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Case study II</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Data description</head><p>The proposed method is also demonstrated on the Oxford battery degradation dataset <ref type="bibr">(Birkl, 2017)</ref>. The Oxford battery dataset includes eight lithium-ion battery cells, each with a maximum capacity of 740 mAh. These battery cells underwent repeated charging and discharging operations, during which current, voltage, and temperature data were collected. The charging and discharging cycles exposed these battery cells to a CC and CV charging profile, followed by a drive-cycle discharging profile. More details about this dataset can be found in <ref type="bibr">Birkl, Roberts, McTurk, Bruce, and Howey (2017)</ref>. Similar to the batteries in the NASA dataset, the capacity of these battery cells decreased as the number of charging and discharging cycles increased. Fig. <ref type="figure">12</ref> displays the degradation trajectories of capacity and capacity fade for all the battery cells. For all battery cells in the Oxford battery dataset, run-to-failure tests were conducted, and the end-of-life (EOL) was reached when the capacity had decreased by 15%. Additionally, eight-fold cross-validation was performed to thoroughly evaluate the efficacy of the proposed method on all eight battery cells. In both SOH estimation and RUL prediction, the extracted features and network structure are the same as those used in the first case study. The only differences are that the number of filters in the dilated convolutional layers is set to 10 and the learning rate is set to 10⁻⁴.</p></div>
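The paper does not spell out how the cross-validation folds were formed; a natural reading of k-fold validation over k battery units (eight cells here, four batteries in NASA subsets 1 and 2) is leave-one-unit-out, sketched below with hypothetical cell identifiers.

```python
def battery_folds(unit_ids):
    """Leave-one-unit-out folds: each battery serves once as the test set.

    Sketch of the k-fold protocol described in the text, where k equals
    the number of battery units (e.g. 8 for the Oxford dataset).
    """
    for held_out in unit_ids:
        train = [u for u in unit_ids if u != held_out]
        yield train, [held_out]

cells = [f"Cell{i}" for i in range(1, 9)]   # hypothetical names for 8 cells
folds = list(battery_folds(cells))
print(len(folds), folds[0][1])  # 8 ['Cell1']
```

Splitting by battery unit rather than by cycle keeps entire degradation trajectories out of the training set, which avoids leaking a test battery's history into training.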
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">SOH estimation</head><p>Fig. <ref type="figure">13</ref> displays the SOH estimations for all battery cells in the Oxford battery dataset. The initial estimation point for these battery cells is 10 cycles. From Fig. <ref type="figure">13</ref>, it can be concluded that the proposed method is capable of estimating the SOH of lithium-ion batteries with high accuracy. Similar to the first case study, an ablation study and a comparative study were also conducted to demonstrate the effectiveness of the proposed method. Table <ref type="table">8</ref> shows the RMSE, MAE, MSE, MedAE, and R2-score of the SOH estimations for all battery cells in the Oxford battery dataset. From this table, it can be observed that the proposed method outperforms the methods listed in Table <ref type="table">3</ref> and other deep learning methods. For example, the average SOH estimation RMSE of the proposed method is 0.0150, whereas the average SOH estimation RMSE of Transformer and MGCN is 0.0235 and 0.0294, respectively. Moreover, the average R2-score of the proposed method is 0.9672, while the average R2-score of the other deep learning methods in this table ranges from 0.8093 to 0.9658.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">RUL prediction</head><p>Fig. <ref type="figure">14</ref> shows the box plot of RUL prediction results and prediction errors for battery cells in the Oxford battery dataset. From this figure, it can be concluded that the proposed method achieves relatively high prediction accuracy, as the mean of the predicted RUL aligns with the true RUL and the majority of prediction errors range from −5 to 5 cycles. It is worth noting that the RUL prediction performance is better than that in the first case study for two primary reasons. First, more battery cells are included in this case study, which provides more training data and thus improves prediction performance. Second, the degradation trajectories of these battery cells are close to each other, which reduces the difficulty of RUL prediction.</p><p>Similar to the first case study, an ablation study was conducted to demonstrate the effectiveness of the proposed conditional graphs and the dilated convolutional operation, and a comparative study was conducted to show that the proposed method outperforms other deep learning methods. Table <ref type="table">9</ref> displays the RMSE, MAE, MSE, MedAE, and R2-score for RUL prediction using the methods listed in Table <ref type="table">3</ref> and other deep learning methods. Based on this table, it can be observed that both the proposed conditional graphs and the dilated convolutional operation improve RUL prediction performance. For example, the average RUL prediction RMSE of the proposed method is 3.484, whereas the average RUL prediction RMSE of the CGCN and GCN-DCO is 3.519 and 3.633, respectively. From Table <ref type="table">9</ref>, it can also be concluded that the proposed method outperforms other deep learning methods: the average prediction MAE of the proposed method is 3.219, whereas the average prediction MAE of the Transformer, MGCN, and CNN+LSTM ranges from 3.873 to 4.579.</p></div>
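The dilated convolutional operation ablated above widens the temporal receptive field by spacing the kernel taps `dilation` steps apart, without adding weights. The following is a minimal one-dimensional sketch of the idea (not the network's actual layer; the function name and toy kernel are assumptions); deep learning frameworks expose the same mechanism, e.g. the `dilation` parameter of `torch.nn.Conv1d`.

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation=1):
    """'Valid' 1-D dilated convolution: kernel taps are spaced `dilation`
    time steps apart, so the receptive field spans (k-1)*dilation + 1 steps."""
    k = len(kernel)
    span = (k - 1) * dilation
    return np.array([
        sum(kernel[j] * x[i + j * dilation] for j in range(k))
        for i in range(len(x) - span)
    ])

x = np.arange(8, dtype=float)                          # toy temporal feature sequence
print(dilated_conv1d(x, [1.0, 1.0], dilation=1))       # sums of adjacent steps
print(dilated_conv1d(x, [1.0, 1.0], dilation=3))       # sums of steps 3 apart
```

Stacking layers with increasing dilation lets the network cover long temporal contexts with few parameters, which is why it pairs naturally with the aggregated GCN features.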
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusions and future work</head><p>In this work, two types of undirected graphs were introduced. The first type of graph (𝒢<hi rend="subscript">1</hi>) was used to consider the correlation among features, while the second type (𝒢<hi rend="subscript">2</hi>) was used to consider the correlation between features and SOH/RUL. Two feature spaces were extracted from the two types of graphs, respectively. KL-divergence was then adopted to minimize the distance between the two feature spaces, allowing the feature space extracted from 𝒢<hi rend="subscript">1</hi> to approximate the feature space extracted from 𝒢<hi rend="subscript">2</hi>. Even without SOH/RUL labels, the correlation between the features and SOH/RUL can thus be taken into account when the feature space extracted from 𝒢<hi rend="subscript">1</hi> is used for testing. Additionally, dilated convolutional operations were implemented after aggregating similar features in the GCN, allowing one to consider the temporal correlation among the aggregated features. To evaluate the effectiveness of the proposed method, two battery datasets (i.e., the NASA and Oxford battery datasets) were used, where the current, voltage, and temperature data in discharge cycles were used to predict the SOH and RUL. Experimental results have demonstrated that the proposed method outperforms other methods, such as the Transformer encoder, the multi-receptive-field GCN, and the convolutional neural network with long short-term memory, in terms of RMSE, MAE, MedAE, MSE, and R2-score. Furthermore, the experimental results have shown that the proposed method outperforms other machine learning methods reported in the literature. In the future, we will take into account the underlying physics of battery aging in the proposed method.</p></div>
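The KL-divergence alignment between the two feature spaces summarized above can be illustrated with a minimal NumPy sketch. The softmax normalization and the divergence direction KL(P₂‖P₁) are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())  # shift for numerical stability
    return z / z.sum()

def kl_alignment_loss(feat_g1, feat_g2, eps=1e-12):
    """KL(P2 || P1) between softmax-normalized feature vectors from the
    two graph branches. Minimizing this loss pulls the G1 (label-free)
    feature space toward the label-aware G2 feature space."""
    p = softmax(np.asarray(feat_g2, dtype=float))
    q = softmax(np.asarray(feat_g1, dtype=float))
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# Identical feature vectors give zero divergence; mismatched ones do not.
print(kl_alignment_loss(np.array([0.5, 1.0]), np.array([0.5, 1.0])))  # → 0.0
print(kl_alignment_loss(np.array([2.0, 0.0]), np.array([0.0, 2.0])))  # positive
```

Because the loss only needs the 𝒢₂ branch during training, the 𝒢₁ features alone suffice at test time, which matches the label-free testing scenario described in the conclusions.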
<div xmlns="http://www.tei-c.org/ns/1.0"><head>CRediT authorship contribution statement</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Declaration of competing interest</head><p>The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.</p></div></body>
		</text>
</TEI>
