Multiwavelet-based Operator Learning for Differential Equations
The solution of a partial differential equation can be obtained by computing the inverse operator map between the input and the solution space. Towards this end, we introduce a multiwavelet-based neural operator learning scheme that compresses the associated operator's kernel using fine-grained wavelets. By explicitly embedding the inverse multiwavelet filters, we learn the projection of the kernel onto fixed multiwavelet polynomial bases. The projected kernel is trained at multiple scales derived from repeated application of the multiwavelet transform. This allows learning the complex dependencies at various scales and yields a resolution-independent scheme. Compared to prior works, we exploit fundamental properties of the operator's kernel which enable a numerically efficient representation. We perform experiments on the Korteweg-de Vries (KdV) equation, Burgers' equation, Darcy Flow, and the Navier-Stokes equation. Compared with existing neural operator approaches, our model shows significantly higher accuracy and achieves state-of-the-art results on a range of datasets. For the time-varying equations, the proposed method exhibits a 2X-10X improvement (0.0018 (0.0033) relative L2 error for Burgers' (KdV) equation). By learning the mappings between function spaces, the proposed method can find the solution of a high-resolution input after learning from lower-resolution data.
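The repeated multiwavelet transform described in the abstract can be illustrated with its simplest single-wavelet special case, the Haar transform: each level splits a discretized function into a coarse approximation and fine-scale detail coefficients, and the inverse filters recover the input exactly. A minimal sketch under stated assumptions (the paper uses higher-order multiwavelet polynomial filters with learned kernels at each scale; the function names here are illustrative, not the paper's API):

```python
import numpy as np

def haar_decompose(signal, levels):
    """Repeatedly split a signal into coarse averages and detail
    coefficients -- a single-wavelet analogue of the multi-scale
    multiwavelet decomposition used for the kernel projection."""
    coarse = np.asarray(signal, dtype=float)
    details = []
    for _ in range(levels):
        even, odd = coarse[0::2], coarse[1::2]
        details.append((even - odd) / np.sqrt(2))   # fine-scale detail
        coarse = (even + odd) / np.sqrt(2)          # coarse approximation
    return coarse, details

def haar_reconstruct(coarse, details):
    """Inverse transform: interleave coarse and detail coefficients,
    mirroring the explicitly embedded inverse filters."""
    for d in reversed(details):
        out = np.empty(2 * coarse.size)
        out[0::2] = (coarse + d) / np.sqrt(2)
        out[1::2] = (coarse - d) / np.sqrt(2)
        coarse = out
    return coarse

x = np.random.default_rng(0).standard_normal(64)
c, ds = haar_decompose(x, levels=3)
assert np.allclose(haar_reconstruct(c, ds), x)  # perfect reconstruction
```

Because the same filters apply at every scale, a model trained on coefficients from coarse grids can be evaluated on finer ones, which is the source of the resolution independence claimed above.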
 Award ID(s):
 1936775
 NSF-PAR ID:
 10351666
 Date Published:
 Journal Name:
 Advances in neural information processing systems
 Volume:
 34
 ISSN:
 1049-5258
 Page Range / eLocation ID:
 24048-24062
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this


In this paper, we apply the self-attention from the state-of-the-art Transformer in Attention Is All You Need for the first time to a data-driven operator learning problem related to partial differential equations. An effort is put together to explain the heuristics of, and to improve the efficacy of, the attention mechanism. By employing the operator approximation theory in Hilbert spaces, it is demonstrated for the first time that the softmax normalization in the scaled dot-product attention is sufficient but not necessary. Without softmax, the approximation capacity of a linearized Transformer variant can be proved to be comparable layerwise to a Petrov-Galerkin projection, and the estimate is independent of the sequence length. A new layer normalization scheme mimicking the Petrov-Galerkin projection is proposed to allow a scaling to propagate through the attention layers, which helps the model achieve remarkable accuracy in operator learning tasks with unnormalized data. Finally, we present three operator learning experiments, including the viscid Burgers' equation, an interface Darcy flow, and an inverse interface coefficient identification problem. The newly proposed simple attention-based operator learner, Galerkin Transformer, shows significant improvements in both training cost and evaluation accuracy over its softmax-normalized counterparts.
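The softmax-free attention described above can be sketched in a few lines: layer-normalize the keys and values, contract them first into a small d-by-d matrix (which acts like a learned Petrov-Galerkin projection), then apply that matrix to the queries, making the cost linear in sequence length. A minimal NumPy sketch under stated assumptions (single head, no learned projection weights; `galerkin_attention` is an illustrative name, not the paper's API):

```python
import numpy as np

def layernorm(X, eps=1e-6):
    """Normalize each row to zero mean, unit variance."""
    mu = X.mean(axis=-1, keepdims=True)
    sd = X.std(axis=-1, keepdims=True) + eps
    return (X - mu) / sd

def galerkin_attention(Q, K, V):
    """Softmax-free, Galerkin-type attention.
    Q, K, V: (n, d). Contracting K^T V first gives a (d, d) matrix,
    so the overall cost is O(n d^2) instead of O(n^2 d)."""
    n = K.shape[0]
    return Q @ (layernorm(K).T @ layernorm(V)) / n   # (n, d)

rng = np.random.default_rng(1)
Q, K, V = (rng.standard_normal((128, 16)) for _ in range(3))
out = galerkin_attention(Q, K, V)
assert out.shape == (128, 16)
```

Note the contrast with standard attention, where softmax(QK^T/sqrt(d)) forms an n-by-n matrix; dropping softmax is what allows the reassociation that removes the quadratic dependence on n.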


Time-evolution of partial differential equations is fundamental for modeling many complex dynamical processes and for event forecasting, but the operators associated with such problems are nonlinear. We propose a Padé approximation based exponential neural operator scheme for efficiently learning the map between a given initial condition and the activities at a later time. Multiwavelet bases are used for space discretization. By explicitly embedding the exponential operators in the model, we reduce the number of training parameters and make the model more data-efficient, which is essential when dealing with scarce and noisy real-world datasets. The Padé exponential operator uses a recurrent structure with shared parameters to model the nonlinearity, compared to recent neural operators that rely on multiple linear operator layers in succession. We show theoretically that the gradients associated with the recurrent Padé network are bounded across the recurrent horizon. We perform experiments on nonlinear systems such as the Korteweg-de Vries (KdV) and Kuramoto-Sivashinsky (KS) equations to show that the proposed approach achieves the best performance and is at the same time data-efficient. We also show that urgent real-world problems like epidemic forecasting (for example, COVID-19) can be formulated as a 2D time-varying operator problem. The proposed Padé exponential operators yield better prediction results (53% (52%) better MAE than the best neural operator (non-neural-operator deep learning model)) compared to state-of-the-art forecasting models.
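The Padé approximation of an exponential operator mentioned above can be illustrated on a plain matrix exponential: the diagonal Padé(1,1) approximant exp(A) ~= (I - A/2)^{-1}(I + A/2) matches the true exponential to third order in A. A generic numerical sketch, not the paper's learned operator (which embeds such rational approximants inside a recurrent neural network):

```python
import math
import numpy as np

def pade_expm(A):
    """Diagonal Pade(1,1) approximant of the matrix exponential:
    exp(A) ~= (I - A/2)^{-1} (I + A/2).
    Accurate to O(||A||^3); for a time step dt, pass A = dt * L."""
    I = np.eye(A.shape[0])
    return np.linalg.solve(I - A / 2.0, I + A / 2.0)

# Small rotation generator: exp(A) is a rotation by 0.01 radians.
A = 0.01 * np.array([[0.0, 1.0], [-1.0, 0.0]])

# Reference: truncated Taylor series of exp(A).
expA = sum(np.linalg.matrix_power(A, k) / math.factorial(k) for k in range(10))
assert np.allclose(pade_expm(A), expA, atol=1e-6)
```

Applying `pade_expm(dt * L)` repeatedly advances a linear system du/dt = L u through time; the recurrent structure with shared parameters described above plays the analogous role when the operator must also capture nonlinearity.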