A Transformer-based deep direct sampling method is proposed for electrical impedance tomography, a well-known severely ill-posed nonlinear boundary value inverse problem. Real-time reconstruction is achieved by evaluating the learned inverse operator between carefully designed data and the reconstructed images. An effort is made to give a specific answer to a fundamental question: whether, and how, one can benefit from the theoretical structure of a mathematical problem to develop task-oriented and structure-conforming deep neural networks. Specifically, inspired by direct sampling methods for inverse problems, the 1D boundary data at different frequencies are preprocessed by a partial differential equation-based feature map to yield 2D harmonic extensions as different input channels. Then, by introducing learnable nonlocal kernels, the direct sampling is recast as a modified attention mechanism. The new method achieves superior accuracy over its predecessors and contemporary operator learners, and shows robustness to noise in benchmarks. This research strengthens the insight that, despite being invented for natural language processing tasks, the attention mechanism offers great flexibility to be modified in conformity with a priori mathematical knowledge, which ultimately leads to the design of more physics-compatible neural architectures.
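For intuition, here is a minimal sketch (not the authors' code) of the PDE-based feature map on a square domain: each 1D boundary measurement is harmonically extended into a 2D image channel by relaxing Laplace's equation. The grid size, the Jacobi solver, and the sine-wave boundary data are illustrative assumptions.

```python
import numpy as np

def harmonic_extension(top, bottom, left, right, iters=5000):
    """Harmonically extend 1D Dirichlet boundary data into the unit
    square by Jacobi relaxation of Laplace's equation. This mimics the
    PDE-based feature map: each frequency's boundary measurement
    becomes one 2D image channel."""
    n = len(top)
    u = np.zeros((n, n))
    u[0, :], u[-1, :], u[:, 0], u[:, -1] = top, bottom, left, right
    for _ in range(iters):
        # RHS is evaluated on the old array before assignment -> Jacobi
        u[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] +
                                u[1:-1, :-2] + u[1:-1, 2:])
    return u

# Stack extensions of boundary data at three "frequencies" as channels.
n = 64
theta = np.linspace(0.0, 1.0, n)
zero = np.zeros(n)
channels = np.stack([harmonic_extension(np.sin(2 * np.pi * k * theta),
                                        zero, zero, zero)
                     for k in (1, 2, 4)])
print(channels.shape)  # (3, 64, 64)
```

The stacked channels would then be fed to the attention-based network in place of the raw 1D traces.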
Nonlocal Attention Operator: Materializing Hidden Knowledge Towards Interpretable Physics Discovery
Despite the recent popularity of attention-based neural architectures in core AI fields like natural language processing (NLP) and computer vision (CV), their potential in modeling complex physical systems remains underexplored. Learning problems in physical systems are often characterized as discovering operators that map between function spaces based on a few instances of function pairs. This task frequently presents a severely ill-posed PDE inverse problem. In this work, we propose a novel neural operator architecture based on the attention mechanism, which we refer to as the Nonlocal Attention Operator (NAO), and explore its capability in developing a foundation model for physical systems. In particular, we show that the attention mechanism is equivalent to a double integral operator that enables nonlocal interactions among spatial tokens, with a data-dependent kernel characterizing the inverse mapping from data to the hidden parameter field of the underlying operator. As such, the attention mechanism extracts global prior information from training data generated by multiple systems, and suggests the exploratory space in the form of a nonlinear kernel map. Consequently, NAO can address ill-posedness and rank deficiency in inverse PDE problems by encoding regularization and achieving generalizability. We empirically demonstrate the advantages of NAO over baseline neural models in terms of generalizability to unseen data resolutions and system states. Our work not only suggests a novel neural operator architecture for learning interpretable foundation models of physical systems, but also offers a new perspective towards understanding the attention mechanism. Our code and data accompanying this paper are available at https://github.com/fishmoon1234/NAO.
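The double-integral reading of attention can be made concrete in a few lines. The sketch below is illustrative, not the NAO implementation; `W_q`, `W_k`, `W_v` are random stand-ins for learned weights. It assembles the data-dependent kernel from tokenized function values and applies it as a discrete integral operator.

```python
import numpy as np

def attention_kernel(f, d_k=16, seed=0):
    """One attention head read as a double integral operator:
    out(x_i) ~ sum_j k(x_i, x_j) v(x_j), where the kernel k is built
    from the input functions themselves (a data-dependent kernel map).
    f: (n_tokens, d_in) array of function values at spatial tokens."""
    rng = np.random.default_rng(seed)
    n, d_in = f.shape
    W_q, W_k, W_v = (rng.standard_normal((d_in, d_k)) / np.sqrt(d_in)
                     for _ in range(3))
    Q, K, V = f @ W_q, f @ W_k, f @ W_v
    scores = (Q @ K.T) / np.sqrt(d_k)
    kernel = np.exp(scores)
    # Each softmax row sums to one, acting like a discrete measure, so
    # the quadrature weight of the Riemann sum is absorbed.
    kernel /= kernel.sum(axis=1, keepdims=True)
    return kernel, kernel @ V

x = np.linspace(0, 1, 128)[:, None]
f = np.hstack([np.sin(np.pi * k * x) for k in range(1, 9)])  # token features
kernel, out = attention_kernel(f)
print(kernel.shape, out.shape)  # (128, 128) (128, 16)
```

Inspecting `kernel` directly is what makes this reading interpretable: it is the discrete stand-in for the map from data to the hidden parameter field.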
- Award ID(s): 2238486
- PAR ID: 10610736
- Publisher / Repository: 38th Conference on Neural Information Processing Systems (NeurIPS 2024)
- Date Published:
- ISSN: 1049-5258
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Nonlocal operators with integral kernels have become a popular tool for designing solution maps between function spaces, due to their efficiency in representing long-range dependence and the attractive feature of being resolution-invariant. In this work, we provide a rigorous identifiability analysis and convergence study for learning kernels in nonlocal operators. It is found that kernel estimation is an ill-posed or even ill-defined inverse problem, leading to divergent estimators in the presence of modeling errors or measurement noise. To resolve this issue, we propose a nonparametric regression algorithm with a novel data-adaptive RKHS Tikhonov regularization method based on the function space of identifiability. The method yields a noise-robust estimator of the kernel that converges as the data resolution is refined, on both synthetic and real-world datasets. In particular, the method successfully learns a homogenized model for stress wave propagation in a heterogeneous solid, revealing the unknown governing laws from real-world data at the microscale. Our regularization method outperforms baseline methods in robustness, generalizability, and accuracy.
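As a toy illustration of regularized kernel learning (not the paper's algorithm: the data-adaptive RKHS norm is replaced here by a plain identity penalty, and the basis, data, and parameter values are made up), one can expand the radial kernel on a small basis and solve a Tikhonov-regularized least-squares problem:

```python
import numpy as np

rng = np.random.default_rng(0)
n, delta = 200, 0.2                       # grid size, interaction radius
x = np.linspace(0.0, 1.0, n)
dx = x[1] - x[0]
r = np.abs(x[:, None] - x[None, :])       # pairwise distances |y - x|

# Expand phi on a few radial basis functions supported in [0, delta].
basis = [np.where(r <= delta, (1 - r / delta) ** p, 0.0) for p in (1, 2, 3)]
true_c = np.array([1.0, -0.5, 0.25])      # ground-truth coefficients

def apply_phi(K, u):
    """L_phi[u](x) = integral of phi(|y-x|)(u(y)-u(x)) dy, Riemann sum."""
    return (K * (u[None, :] - u[:, None])).sum(axis=1) * dx

# Data: several input functions u and noisy outputs f = L_phi[u] + noise.
us = [np.sin(2 * np.pi * k * x) for k in (1, 2, 3)]
A = np.vstack([np.column_stack([apply_phi(B, u) for B in basis]) for u in us])
f = A @ true_c + 1e-3 * rng.standard_normal(A.shape[0])

lam = 1e-8                                # Tikhonov strength
c_hat = np.linalg.solve(A.T @ A + lam * np.eye(3), A.T @ f)
print(np.round(c_hat, 3))                 # close to true_c in this toy setup
```

The ill-posedness the paper addresses shows up here as the conditioning of `A.T @ A`; the paper's contribution is choosing the penalty norm adaptively from the data rather than using the identity as above.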
-
Traditional data-driven deep learning models often struggle with high training costs, error accumulation, and poor generalizability in complex physical processes. Physics-informed deep learning (PiDL) addresses these challenges by incorporating physical principles into the model. Most PiDL approaches regularize training by embedding governing equations into the loss function, yet this depends heavily on extensive hyperparameter tuning to weigh each loss term. To this end, we propose to leverage prior physics knowledge by "baking" the discretized governing equations into the neural network architecture via the connection between partial differential equation (PDE) operators and network structures, resulting in a PDE-preserved neural network (PPNN). This method, embedding discretized PDEs through convolutional residual networks in a multi-resolution setting, largely improves generalizability and long-term prediction accuracy, outperforming conventional black-box models. The effectiveness and merit of the proposed method have been demonstrated across various spatiotemporal dynamical systems governed by spatiotemporal PDEs, including the reaction-diffusion, Burgers', and Navier-Stokes equations.
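A minimal sketch of the "baking in" idea, under assumed hyperparameters (`nu`, `dt`, unit grid spacing) and with a generic convolutional correction rather than the paper's multi-resolution design: one residual step advances the state with a fixed discretized PDE operator (here a 5-point Laplacian stencil for diffusion) plus a small trainable closure.

```python
import torch
import torch.nn as nn

class PDEPreservedBlock(nn.Module):
    """One residual step u_{t+1} = u_t + dt * (nu * Lap(u_t) + NN(u_t)).
    The Laplacian stencil is a frozen buffer, so the governing operator
    is preserved in the architecture rather than penalized in the loss."""
    def __init__(self, nu=0.1, dt=0.01):
        super().__init__()
        lap = torch.tensor([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]])
        self.register_buffer("lap", lap.view(1, 1, 3, 3))  # dx = 1 assumed
        self.nu, self.dt = nu, dt
        self.correction = nn.Sequential(      # learnable closure term
            nn.Conv2d(1, 16, 3, padding=1), nn.GELU(),
            nn.Conv2d(16, 1, 3, padding=1))

    def forward(self, u):                     # u: (batch, 1, H, W)
        lap_u = nn.functional.conv2d(u, self.lap, padding=1)
        return u + self.dt * (self.nu * lap_u + self.correction(u))

u0 = torch.randn(1, 1, 32, 32)
u1 = PDEPreservedBlock()(u0)                  # one preserved-PDE step
print(u1.shape)                               # torch.Size([1, 1, 32, 32])
```

Rolling the block out over many steps gives the long-horizon predictor; only `correction` is trained, while the stencil carries the known physics.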
-
Physics-informed neural networks (PINNs) have been widely utilized to numerically approximate PDE problems. While PINNs have achieved good results in producing solutions for many partial differential equations, studies have shown that they do not perform well on phase-field models. In this paper, we partially address this issue by introducing a modified physics-informed neural network. In particular, it is used to numerically approximate the Allen-Cahn-Ohta-Kawasaki (ACOK) equation with a volume constraint. Technically, the inverse of the Laplacian in the ACOK model presents many challenges to the baseline PINNs. To take the zero-mean condition of the inverse of the Laplacian, as well as the volume constraint, into consideration, we also introduce a separate neural network, which takes a second set of sampling points in the approximation process. Numerical results are shown to demonstrate the effectiveness of the modified PINNs. An additional benefit of this research is that the modified PINNs can also be applied to learn more general nonlocal phase-field models, even with an unknown nonlocal kernel.
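The two-network device can be sketched as follows; the ACOK residual terms are omitted and all names (`u_net`, `w_net`, `x_aux`) are hypothetical, but the sketch shows how a second network with its own set of sampling points can carry the inverse-Laplacian term together with its zero-mean condition.

```python
import torch
import torch.nn as nn

def mlp(width=64):
    return nn.Sequential(nn.Linear(2, width), nn.Tanh(),
                         nn.Linear(width, width), nn.Tanh(),
                         nn.Linear(width, 1))

u_net, w_net = mlp(), mlp()   # w_net represents the inverse-Laplacian term

x_pde = torch.rand(1024, 2, requires_grad=True)   # PDE collocation points
x_aux = torch.rand(1024, 2)                       # second sampling set

def value_and_laplacian(net, x):
    v = net(x)
    g = torch.autograd.grad(v.sum(), x, create_graph=True)[0]
    lap = sum(torch.autograd.grad(g[:, i].sum(), x,
                                  create_graph=True)[0][:, i:i + 1]
              for i in range(2))
    return v, lap

u, lap_u = value_and_laplacian(u_net, x_pde)
w, lap_w = value_and_laplacian(w_net, x_pde)

# Couple the networks through -Lap(w) = u - mean(u) instead of inverting
# the Laplacian directly; penalize the zero-mean condition on w using
# the second set of sampling points.
coupling = (-lap_w - (u - u.mean())).pow(2).mean()
zero_mean = w_net(x_aux).mean().pow(2)
loss = coupling + zero_mean   # + the omitted ACOK residual terms
loss.backward()
print(float(loss))
```

Splitting the constraint onto `x_aux` keeps the zero-mean penalty from competing pointwise with the PDE residual on the same collocation set.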
-
In [Antil et al., Inverse Probl. 35 (2019) 084003] we introduced a new notion of optimal control and source identification (inverse) problems in which the control/source is allowed to lie outside the domain where the fractional elliptic PDE is fulfilled. The current work extends this previous work to the parabolic case. Several new mathematical tools have been developed to handle the parabolic problem. We tackle the Dirichlet, Neumann, and Robin cases. The need for these novel optimal control concepts stems from the fact that classical PDE models only allow placing the control/source either on the boundary or in the interior where the PDE is satisfied. However, the nonlocal behavior of the fractional operator now allows placing the control/source in the exterior. We introduce the notions of weak and very-weak solutions to the fractional parabolic Dirichlet problem and present an approach to approximate the fractional parabolic Dirichlet solutions by fractional parabolic Robin solutions (with convergence rates). A complete analysis of the Dirichlet and Robin optimal control problems is provided. The numerical examples confirm our theoretical findings and further illustrate the potential benefits of nonlocal models over local ones.
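For orientation, a plausible form of the exterior-control Dirichlet problem (notation assumed here, not quoted from the paper) reads:

```latex
% Sketch of the exterior-control setup: the control z acts on the
% exterior of \Omega, which is meaningful because (-\Delta)^s is nonlocal.
\begin{aligned}
  \partial_t u + (-\Delta)^s u &= 0   && \text{in } \Omega \times (0,T),\\
  u &= z                              && \text{in } (\mathbb{R}^n \setminus \Omega) \times (0,T),\\
  u(\cdot, 0) &= u_0                  && \text{in } \Omega,
\end{aligned}
\qquad
(-\Delta)^s u(x) = C_{n,s}\,\mathrm{p.v.}\!\int_{\mathbb{R}^n}
  \frac{u(x) - u(y)}{|x - y|^{n + 2s}}\, dy .
```

The integral definition makes the point of the paper visible: the value of $(-\Delta)^s u$ inside $\Omega$ depends on $u$ everywhere, so prescribing $u = z$ in the exterior genuinely steers the state.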