-
Abstract: This work unifies the analysis of various randomized methods for solving linear and nonlinear inverse problems with Gaussian priors by framing the problem in a stochastic optimization setting. By doing so, we show that many randomized methods are variants of a sample average approximation (SAA). More importantly, we are able to prove a single theoretical result that guarantees the asymptotic convergence of a variety of randomized methods. Additionally, viewing randomized methods as an SAA enables us to prove, for the first time, a single non-asymptotic error result that holds for the randomized methods under consideration. Another important consequence of our unified framework is that it allows us to discover new randomization methods. We present various numerical results for linear, nonlinear, algebraic, and PDE-constrained inverse problems that verify the theoretical convergence results and provide a discussion of the apparently different convergence rates and behaviors of the various randomized methods.
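To make the sample average approximation viewpoint concrete, here is a minimal numerical sketch, assuming a small linear forward map with additive Gaussian noise and a Gaussian prior (the sizes, variable names, and the specific perturbation scheme are illustrative, not taken from the paper): the expectation over random perturbations of the data and prior mean is replaced by an empirical average over N draws, and the resulting SAA objective is minimized in closed form.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear inverse problem: d = A u_true + noise (all quantities hypothetical).
n_param, n_obs = 20, 50
A = rng.standard_normal((n_obs, n_param))
u_true = rng.standard_normal(n_param)
sigma, gamma = 0.1, 1.0                      # noise std and prior std
d = A @ u_true + sigma * rng.standard_normal(n_obs)
u0 = np.zeros(n_param)                       # prior mean

# Sample average approximation: average N randomly perturbed Tikhonov objectives
#   (1/N) sum_i [ ||A u - (d + eps_i)||^2 / (2 sigma^2) + ||u - (u0 + theta_i)||^2 / (2 gamma^2) ].
# For this linear case, minimizing the SAA objective reduces to normal equations
# with the averaged perturbed data and prior mean.
N = 200
d_pert = d[None, :] + sigma * rng.standard_normal((N, n_obs))
u0_pert = u0[None, :] + gamma * rng.standard_normal((N, n_param))

H = A.T @ A / sigma**2 + np.eye(n_param) / gamma**2
rhs_saa = A.T @ d_pert.mean(axis=0) / sigma**2 + u0_pert.mean(axis=0) / gamma**2
u_saa = np.linalg.solve(H, rhs_saa)

# Deterministic MAP / Tikhonov estimate for comparison; u_saa approaches u_map as N grows.
u_map = np.linalg.solve(H, A.T @ d / sigma**2 + u0 / gamma**2)
print("relative SAA vs MAP difference:", np.linalg.norm(u_saa - u_map) / np.linalg.norm(u_map))
```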
-
Abstract: One of the reasons why many neural networks are capable of replicating complicated tasks or functions is their universal approximation property. Although the past few decades have seen tremendous advances in theories of neural networks, a single constructive and elementary framework for neural network universality remains unavailable. This paper is an effort to provide a unified and constructive framework for the universality of a large class of activation functions, including most of the existing ones. At the heart of the framework is the concept of a neural network approximate identity (nAI). The main result is as follows: any nAI activation function is universal in the space of continuous functions on compacta. It turns out that most of the existing activation functions are nAI, and thus universal. The framework offers several advantages over contemporary counterparts. First, it is constructive, using elementary means from functional analysis, probability theory, and numerical analysis. Second, it is one of the first unified and constructive attempts that is valid for most of the existing activation functions. Third, it provides new proofs for most activation functions. Fourth, for a given activation function and error tolerance, the framework provides precisely the architecture of the corresponding one-hidden-layer neural network with a predetermined number of neurons and the values of the weights and biases. Fifth, the framework allows us to abstractly present the first universal approximation with a favorable non-asymptotic rate. Sixth, our framework also provides insights into, and hence constructive derivations of, some of the existing approaches.
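As a rough illustration of what a constructive universality statement delivers in practice, the sketch below builds, with explicit weights and biases and no training, a one-hidden-layer ReLU network that interpolates a target function on [0, 1]. It uses the classical piecewise-linear (hat-function) construction rather than the paper's nAI framework, so it should be read as an analogous example, not the paper's construction; all names are illustrative.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def build_one_hidden_layer(f, n_knots=65):
    """Explicitly construct a one-hidden-layer ReLU network that reproduces the
    piecewise-linear interpolant of f at uniform knots on [0, 1]."""
    x = np.linspace(0.0, 1.0, n_knots)
    v = f(x)
    s = np.diff(v) / np.diff(x)                # slope on each subinterval
    c = np.concatenate(([s[0]], np.diff(s)))   # output-layer weights (slope changes at kinks)
    b = -x[:-1]                                # hidden biases: relu(x + b_i) = relu(x - x_i)
    return v[0], c, b

def evaluate(net, xs):
    bias0, c, b = net
    return bias0 + relu(xs[:, None] + b[None, :]) @ c

f = lambda t: np.sin(2.0 * np.pi * t) + 0.3 * t
net = build_one_hidden_layer(f, n_knots=129)   # 128 hidden neurons, weights set explicitly
xs = np.linspace(0.0, 1.0, 2000)
print("max interpolation error:", np.max(np.abs(evaluate(net, xs) - f(xs))))
```

For smooth targets, doubling the number of knots roughly quarters the error, which is the kind of explicit architecture-versus-tolerance relationship a constructive framework provides.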
-
Abstract: Partial differential equation (PDE)-constrained inverse problems are some of the most challenging and computationally demanding problems in computational science today. The fine meshes required to accurately compute the PDE solution introduce an enormous number of parameters and demand large-scale computing resources, such as more processors and more memory, to solve such systems in a reasonable time. For inverse problems constrained by time-dependent PDEs, the adjoint method, often employed to compute gradients and higher-order derivatives efficiently, requires solving a time-reversed, so-called adjoint PDE that depends on the forward PDE solution at each timestep. This necessitates the storage of a high-dimensional forward solution vector at every timestep, a procedure that quickly exhausts the available memory resources. Several approaches that trade additional computation for a reduced memory footprint have been proposed to mitigate the memory bottleneck, including checkpointing and compression strategies. In this work, we propose a close-to-ideal scalable compression approach using autoencoders to eliminate the need for checkpointing and substantial memory storage, thereby reducing both the time-to-solution and the memory requirements. We compare our approach with checkpointing and an off-the-shelf compression approach on an earth-scale, ill-posed seismic inverse problem. The results verify the expected close-to-ideal speedup for the gradient and Hessian-vector product using the proposed autoencoder compression approach. To highlight the usefulness of the proposed approach, we combine the autoencoder compression with the data-informed active subspace (DIAS) prior, showing how the DIAS method can be affordably extended to large-scale problems without the need for checkpointing and large memory.
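The following sketch illustrates the storage pattern being replaced: instead of checkpointing full forward states for the reverse (adjoint) sweep, each state is encoded into a small latent vector and decoded when the adjoint step needs it. A fixed linear encoder/decoder built from an SVD basis stands in for the trained autoencoder of the paper, and the "PDE" is a toy linear time-stepper; all names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear "forward PDE" time-stepper u_{k+1} = M u_k (hypothetical stand-in).
n_state, n_steps, n_latent = 400, 200, 20
M = np.eye(n_state) + 0.01 * rng.standard_normal((n_state, n_state)) / np.sqrt(n_state)
u_init = rng.standard_normal(n_state)

# Stand-in "autoencoder": linear encode/decode from an SVD basis of a few snapshots
# (the paper trains a nonlinear autoencoder instead).
snaps, state = [], u_init.copy()
for k in range(n_steps):
    if k % 10 == 0:
        snaps.append(state.copy())
    state = M @ state
basis, _, _ = np.linalg.svd(np.stack(snaps, axis=1), full_matrices=False)
encode = lambda v: basis[:, :n_latent].T @ v
decode = lambda z: basis[:, :n_latent] @ z

# Forward sweep: store only the compressed states (no full-state checkpoints).
codes, state = [], u_init.copy()
for _ in range(n_steps):
    codes.append(encode(state))
    state = M @ state

# Adjoint (reverse) sweep: decode the forward state on the fly at each step.
lam = np.ones(n_state)                  # toy terminal adjoint condition
for k in reversed(range(n_steps)):
    u_k = decode(codes[k])              # reconstructed forward state (needed for nonlinear PDEs)
    lam = M.T @ lam                     # adjoint step; here linear, so u_k is unused

print("memory reduction factor per timestep:", n_state / n_latent)
```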
-
Efficient real-time solvers for forward and inverse problems are essential in engineering and science applications. Machine learning surrogate models have emerged as promising alternatives to traditional methods, offering substantially reduced computational time. Nevertheless, these models typically demand extensive training datasets to achieve robust generalization across diverse scenarios. While physics-based approaches can partially mitigate this data dependency and ensure physics-interpretable solutions, addressing scarce data regimes remains a challenge. Both purely data-driven and physics-based machine learning approaches demonstrate severe overfitting issues when trained with insufficient data. We propose a novel model-constrained Tikhonov autoencoder neural network framework, called TAEN, capable of learning both forward and inverse surrogate models using a single arbitrary observational sample. We develop comprehensive theoretical foundations, including forward and inverse inference error bounds, for the proposed approach in linear cases. For comparative analysis, we derive equivalent formulations for purely data-driven and model-constrained counterparts. At the heart of our approach is a data randomization strategy with theoretical justification, which functions as a generative mechanism for exploring the training data space, enabling effective training of both forward and inverse surrogate models even with a single observation, while regularizing the learning process. We validate our approach through extensive numerical experiments on two challenging inverse problems: 2D heat conductivity inversion and initial condition reconstruction for the time-dependent 2D Navier–Stokes equations. The results demonstrate that TAEN achieves accuracy comparable to traditional Tikhonov solvers and numerical forward solvers for both inverse and forward problems, respectively, while delivering orders-of-magnitude computational speedups.
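A minimal sketch of the data-randomization idea for the inverse surrogate, assuming a small linear forward map and a single observed data vector (the network size, noise level, and loss weights are illustrative, and the loss below is only a schematic Tikhonov-style, model-constrained objective, not the exact TAEN formulation): fresh perturbations of the lone observation are drawn at every optimization step, acting as a generative mechanism for the training data while the forward map constrains the predictions.

```python
import torch

torch.manual_seed(0)

# Hypothetical linear forward map F(u) = A u and a single observation y_obs.
n_param, n_obs = 16, 32
A = torch.randn(n_obs, n_param) / n_obs**0.5
u_ref = torch.randn(n_param)
y_obs = A @ u_ref + 0.01 * torch.randn(n_obs)

inverse_net = torch.nn.Sequential(torch.nn.Linear(n_obs, 64), torch.nn.Tanh(),
                                  torch.nn.Linear(64, n_param))
opt = torch.optim.Adam(inverse_net.parameters(), lr=1e-3)
alpha, noise_std = 1e-2, 0.05

for step in range(2000):
    # Data randomization: new perturbations of the single observation each step.
    y_batch = y_obs + noise_std * torch.randn(64, n_obs)
    u_pred = inverse_net(y_batch)
    # Schematic model-constrained Tikhonov-style loss: push predictions through the
    # forward map to match the (perturbed) data, plus a regularization term.
    data_misfit = ((u_pred @ A.T - y_batch) ** 2).sum(dim=1).mean()
    loss = data_misfit + alpha * (u_pred ** 2).sum(dim=1).mean()
    opt.zero_grad(); loss.backward(); opt.step()

print("final loss:", loss.item())
```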
-
This work presents a two-stage adaptive framework for progressively developing deep neural network (DNN) architectures that generalize well for a given training dataset. In the first stage, a layerwise training approach is adopted in which a new layer is added each time and trained independently by freezing the parameters in the previous layers. We impose desirable structures on the DNN by employing manifold regularization, sparsity regularization, and physics-informed terms. We introduce an $$\epsilon-\delta$$ stability-promoting concept as a desirable property for a learning algorithm and show that employing manifold regularization yields an $$\epsilon-\delta$$ stability-promoting algorithm. Furthermore, we derive the necessary conditions for the trainability of a newly added layer and investigate the training saturation problem. In the second stage of the algorithm (post-processing), a sequence of shallow networks is employed to extract information from the residual produced in the first stage, thereby improving the prediction accuracy. Numerical investigations on prototype regression and classification problems demonstrate that the proposed approach can outperform fully connected DNNs of the same size. Moreover, by equipping the physics-informed neural network (PINN) with the proposed adaptive architecture strategy to solve partial differential equations, we numerically show that adaptive PINNs are not only superior to standard PINNs but also produce interpretable hidden layers with provable stability. We also apply our architecture design strategy to solve inverse problems governed by elliptic partial differential equations.
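A compact sketch of the first-stage layerwise procedure on a toy regression problem, assuming a PyTorch setup (the widths, learning rates, and the omission of the manifold/sparsity/physics-informed regularization terms are all simplifications, and the names are illustrative): each stage appends one hidden layer, freezes all previously trained layers, and trains only the new layer together with a fresh output head.

```python
import torch

torch.manual_seed(0)
x = torch.linspace(-1.0, 1.0, 256).unsqueeze(1)
y = torch.sin(3.0 * x) + 0.05 * torch.randn_like(x)     # toy regression data

width, n_stages = 32, 4
hidden = torch.nn.ModuleList()

for stage in range(n_stages):
    # Freeze everything trained in earlier stages.
    for old_layer in hidden:
        for p in old_layer.parameters():
            p.requires_grad_(False)
    # Add one new hidden layer and a fresh output head; only these are trained now.
    in_dim = 1 if stage == 0 else width
    new_layer = torch.nn.Sequential(torch.nn.Linear(in_dim, width), torch.nn.Tanh())
    head = torch.nn.Linear(width, 1)
    hidden.append(new_layer)

    opt = torch.optim.Adam(list(new_layer.parameters()) + list(head.parameters()), lr=1e-2)
    for _ in range(500):
        h = x
        for layer in hidden:
            h = layer(h)
        loss = ((head(h) - y) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    print(f"stage {stage}: training loss {loss.item():.4f}")
```

The second (post-processing) stage, which fits shallow networks to the remaining residual, would follow the same pattern and is omitted here.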
-
Real-time accurate solutions of large-scale complex dynamical systems are critically needed for control, optimization, uncertainty quantification, and decision-making in practical engineering and science applications, particularly in digital twin contexts. Recent research on hybrid approaches combining numerical methods and machine learning in end-to-end training has shown significant improvements over either approach alone. However, using neural networks as surrogate models generally exhibits limitations in generalizability over different settings and in capturing the evolution of solution discontinuities. In this work, we develop a model-constrained discontinuous Galerkin network (DGNet) approach, a significant extension of our previous work, for compressible Euler equations with out-of-distribution generalization. The core of DGNet is the synergy of several key strategies: (i) leveraging time integration schemes to capture temporal correlation while taking advantage of neural network speed to reduce computation time; this is the key to the temporal discretization-invariant property of DGNet; (ii) employing a model-constrained approach to ensure that the learned tangent slope satisfies the governing equations; (iii) utilizing a DG-inspired architecture for the GNN in which edges represent Riemann solver surrogate models and nodes represent volume integration correction surrogate models, enabling discontinuity capturing, aliasing error reduction, and mesh discretization generalizability; such a design allows DGNet to learn the DG spatial discretization accurately; (iv) developing an input normalization strategy that allows surrogate models to generalize across different initial conditions, geometries, meshes, boundary conditions, and solution orders; in fact, the normalization is the key to the spatial discretization invariance of DGNet; and (v) incorporating a data randomization technique that not only implicitly promotes agreement between surrogate models and true numerical models up to second-order derivatives, ensuring long-term stability and prediction capacity, but also serves as a data generation engine during training, leading to enhanced generalization on unseen data. To validate the theoretical results and the effectiveness, stability, and generalizability of our novel DGNet approach, we present comprehensive numerical results for 1D and 2D compressible Euler equation problems, including Sod Shock Tube, Lax Shock Tube, Isentropic Vortex, Forward Facing Step, Scramjet, Airfoil, Euler Benchmarks, Double Mach Reflection, and a Hypersonic Sphere Cone benchmark.
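To illustrate strategy (i), the rollout pattern of a learned tangent slope combined with a standard time integrator might look like the sketch below (forward Euler and a plain MLP stand in for the DG-consistent integrator and the DG-inspired GNN of the paper; all sizes and names are illustrative).

```python
import torch

torch.manual_seed(0)

# Hypothetical learned "tangent slope": the network predicts du/dt from the state.
n_state = 64
slope_net = torch.nn.Sequential(torch.nn.Linear(n_state, 128), torch.nn.Tanh(),
                                torch.nn.Linear(128, n_state))

def rollout(u0, dt, n_steps):
    """Advance the state with a time integrator whose slope is the network:
    u_{k+1} = u_k + dt * N_theta(u_k) (forward Euler for simplicity)."""
    u, traj = u0, [u0]
    for _ in range(n_steps):
        u = u + dt * slope_net(u)
        traj.append(u)
    return torch.stack(traj)

trajectory = rollout(torch.randn(n_state), dt=1e-2, n_steps=50)
print(trajectory.shape)   # (51, 64): initial state plus 50 predicted steps
```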
-
We present a divergence-free and $$H(\mathrm{div})$$-conforming hybridized discontinuous Galerkin (HDG) method and a computationally efficient variant called embedded HDG (E-HDG) for solving the stationary incompressible visco-resistive magnetohydrodynamic (MHD) equations. The proposed E-HDG approach uses continuous facet unknowns for the vector-valued solutions (velocity and magnetic fields) while using discontinuous facet unknowns for the scalar variables (pressure and magnetic pressure). This choice of function spaces makes E-HDG computationally far more advantageous than its HDG counterpart, owing to the much smaller number of degrees of freedom. The benefit is even more significant for three-dimensional, high-order, or fine-mesh scenarios. On simplicial meshes, the proposed methods with a specific choice of approximation spaces are well-posed for linear(ized) MHD equations. For nonlinear MHD problems, we present a simple approach that exploits the proposed linear discretizations through a Picard iteration. The beauty of this approach is that the divergence-free and $$H(\mathrm{div})$$-conforming properties of the velocity and magnetic fields are automatically carried over to nonlinear MHD equations. We study the accuracy and convergence of our E-HDG method for both linear and nonlinear MHD cases through various numerical experiments, including two- and three-dimensional problems with smooth and singular solutions. The numerical examples show that the proposed methods are pressure robust and that the divergence of the resulting velocity and magnetic fields is machine zero for both smooth and singular problems.
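The Picard strategy mentioned for the nonlinear MHD case, i.e., freezing the nonlinearity at the previous iterate and repeatedly solving the resulting linear(ized) problem, is shown below on a deliberately small algebraic toy system (the matrix, the form of the nonlinearity, and the tolerances are all illustrative and unrelated to the E-HDG discretization).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy nonlinear system A(u) u = b with A(u) = A0 + diag(u^2) (hypothetical).
n = 50
A0 = 4.0 * np.eye(n) + 0.1 * rng.standard_normal((n, n))
b = rng.standard_normal(n)

u = np.zeros(n)
for it in range(50):
    A_lin = A0 + np.diag(u ** 2)            # nonlinearity lagged at the previous iterate
    u_new = np.linalg.solve(A_lin, b)       # one linear(ized) solve per Picard step
    if np.linalg.norm(u_new - u) <= 1e-12 * np.linalg.norm(u_new):
        u = u_new
        break
    u = u_new

residual = np.linalg.norm((A0 + np.diag(u ** 2)) @ u - b)
print(f"Picard iterations: {it + 1}, nonlinear residual: {residual:.2e}")
```

Each iteration reuses the same linearized solve, which is why structural properties built into the linear discretization can carry over unchanged to the nonlinear problem.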
-
Deep learning (DL), in particular deep neural networks (DNNs), is by default purely data-driven and in general does not require physics. This is the strength of DL but also one of its key limitations when applied to science and engineering problems in which underlying physical properties, such as stability, conservation, and positivity, as well as accuracy, are required. DL methods in their original forms are not capable of respecting the underlying mathematical models or achieving desired accuracy even in big-data regimes. On the other hand, many data-driven science and engineering problems, such as inverse problems, typically have limited experimental or observational data, and DL would overfit the data in this case. Leveraging information encoded in the underlying mathematical models, we argue, not only compensates for missing information in low-data regimes but also provides opportunities to equip DL methods with the underlying physics, and hence promotes better generalization. This paper develops a model-constrained deep learning approach and its variant TNet, a Tikhonov neural network, that are capable of learning not only information hidden in the training data but also information in the underlying mathematical models, in order to solve inverse problems governed by partial differential equations in low-data regimes. We provide the constructions and some theoretical results for the proposed approaches for both linear and nonlinear inverse problems. Since TNet is designed to learn the inverse solution with Tikhonov regularization, it is interpretable: in fact, it recovers Tikhonov solutions for linear cases while potentially approximating Tikhonov solutions to any desired accuracy for nonlinear inverse problems. We also prove that data randomization can enhance not only the smoothness of the networks but also their generalization. Comprehensive numerical results confirm the theoretical findings and show that, with as few as 1 training sample for 1D deconvolution, 5 for the inverse 2D heat conductivity problem, 100 for the inverse initial condition of the time-dependent 2D Burgers' equation, and 50 for the inverse initial condition of the 2D Navier-Stokes equations, TNet solutions can be as accurate as Tikhonov solutions while being several orders of magnitude faster. This is possible owing to the model-constrained term, replications, and randomization.
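Schematically, and with illustrative notation rather than the paper's exact formulation, the Tikhonov problem that TNet is built to emulate, and the closed-form solution it recovers in the linear case, are:

```latex
% Tikhonov-regularized inverse problem for data y, forward map F, prior mean u_0,
% and regularization parameter alpha (notation illustrative):
\begin{equation*}
  u^{\mathrm{Tik}}(y) \;=\; \operatorname*{arg\,min}_{u}\;
  \tfrac{1}{2}\,\bigl\lVert F(u) - y \bigr\rVert^{2}
  \;+\; \tfrac{\alpha}{2}\,\bigl\lVert u - u_{0} \bigr\rVert^{2}.
\end{equation*}
% For a linear forward map F(u) = A u, the minimizer has the closed form
\begin{equation*}
  u^{\mathrm{Tik}}(y) \;=\; \bigl(A^{\top} A + \alpha I\bigr)^{-1}
  \bigl(A^{\top} y + \alpha\, u_{0}\bigr),
\end{equation*}
% which is the solution the abstract states a trained TNet recovers for linear cases.
```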
-
This work proposes a unified hp-adaptivity framework for the hybridized discontinuous Galerkin (HDG) method for a large class of partial differential equations (PDEs) of Friedrichs' type. In particular, we present unified hp-HDG formulations for abstract one-field and two-field structures and prove their well-posedness. In order to handle non-conforming interfaces, we simply take advantage of the built-in mortar structures of HDG. With split-type mortars and the approximation space of the trace, a numerical flux can be derived via a Godunov approach and naturally employed without any additional treatment. As a consequence, the proposed formulations are parameter-free. We perform several numerical experiments for time-independent and linear PDEs, including elliptic, hyperbolic, and mixed-type equations, to verify the proposed unified hp-formulations and demonstrate the effectiveness of hp-adaptation. Two adaptivity criteria are considered: one is based on a simple and fast error indicator, while the other is rigorous but more expensive, using an adjoint-based error estimate. The numerical results show that these two approaches are comparable in terms of convergence rate, even for problems with strong gradients, discontinuities, and singularities.
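The solve-estimate-mark-refine loop driven by a simple error indicator (the first of the two adaptivity criteria) can be caricatured as follows, using plain piecewise-linear interpolation of a function with a sharp layer in place of an actual hp-HDG solve and restricting to h-refinement; every choice here is illustrative.

```python
import numpy as np

# Indicator-driven adaptive loop: "solve" -> estimate elementwise error -> mark -> refine.
f = lambda x: np.tanh(50.0 * (x - 0.5))        # target with a sharp internal layer

nodes = np.linspace(0.0, 1.0, 9)               # initial coarse mesh on [0, 1]
for cycle in range(6):
    mids = 0.5 * (nodes[:-1] + nodes[1:])
    # Error indicator: midpoint error of the piecewise-linear interpolant on each element.
    eta = np.abs(f(mids) - 0.5 * (f(nodes[:-1]) + f(nodes[1:])))
    # Mark the elements carrying the largest 30% of the indicator and bisect them.
    marked = eta >= np.quantile(eta, 0.7)
    nodes = np.sort(np.concatenate([nodes, mids[marked]]))
    print(f"cycle {cycle}: elements = {len(nodes) - 1}, max indicator = {eta.max():.3e}")
```

An hp version would additionally decide, per marked element, whether to subdivide the element (h) or raise its local polynomial order (p), for example based on an estimate of the local solution smoothness.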
