While cross entropy (CE) is the most commonly used loss function to train deep neural networks for classification tasks, many alternative losses have been developed to obtain better empirical performance. Among them, which one is the best to use is still a mystery, because there seem to be multiple factors affecting the answer, such as properties of the dataset, the choice of network architecture, and so on. This paper studies the choice of loss function by examining the last-layer features of deep networks, drawing inspiration from a recent line of work showing that the global optimal solutions of the CE and mean squared error (MSE) losses exhibit a Neural Collapse phenomenon. That is, for sufficiently large networks trained until convergence, (i) all features of the same class collapse to the corresponding class mean and (ii) the means associated with different classes are in a configuration where their pairwise distances are all equal and maximized. We extend such results and show, through global solution and landscape analyses, that a broad family of loss functions including the commonly used label smoothing (LS) and focal loss (FL) exhibits Neural Collapse. Hence, all relevant losses (i.e., CE, LS, FL, MSE) produce equivalent features on training data. In particular, based on the unconstrained feature model assumption, we provide a global landscape analysis for the LS loss and a local landscape analysis for the FL loss, showing that the (only!) global minimizers are Neural Collapse solutions, while all other critical points are strict saddles whose Hessians exhibit negative curvature directions, globally for the LS loss and locally for the FL loss near the optimal solution. The experiments further show that Neural Collapse features obtained from all relevant losses (i.e., CE, LS, FL, MSE) lead to largely identical performance on test data as well, provided that the network is sufficiently large and trained until convergence.
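The two collapse properties above can be checked numerically from the last-layer features of a trained network. Below is a small NumPy sketch, a hypothetical diagnostic rather than the paper's own evaluation code, that measures (i) within-class variability relative to between-class variability and (ii) how far the centered class means are from the equal-norm, equal-angle (simplex ETF) configuration:

```python
import numpy as np

def neural_collapse_metrics(features, labels):
    """Hypothetical diagnostics for the two Neural Collapse properties:
    (i) within-class features collapse onto their class mean, and
    (ii) the centered class means form an equal-norm, equal-angle
    (simplex ETF) configuration with pairwise cosines of -1/(K-1)."""
    features, labels = np.asarray(features), np.asarray(labels)
    classes = np.unique(labels)
    K = len(classes)

    global_mean = features.mean(axis=0)
    class_means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    centered = class_means - global_mean                       # K x d

    # (i) average within-class scatter relative to between-class scatter
    within = np.mean([((features[labels == c] - class_means[i]) ** 2).sum(axis=1).mean()
                      for i, c in enumerate(classes)])
    between = (centered ** 2).sum(axis=1).mean()
    collapse_ratio = within / between                           # -> 0 under property (i)

    # (ii) maximal deviation of pairwise cosines from the simplex ETF angle
    normed = centered / np.linalg.norm(centered, axis=1, keepdims=True)
    cosines = normed @ normed.T
    off_diag = cosines[~np.eye(K, dtype=bool)]
    etf_gap = np.abs(off_diag + 1.0 / (K - 1)).max()            # -> 0 under property (ii)

    return collapse_ratio, etf_gap
```

Both quantities approach zero as a sufficiently large network is trained toward convergence, under the paper's assumptions.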
CoPhy-PGNN: Learning Physics-guided Neural Networks with Competing Loss Functions for Solving Eigenvalue Problems
Physics-guided Neural Networks (PGNNs) represent an emerging class of neural networks that are trained using physics-guided (PG) loss functions (capturing violations of known physics in the network outputs), along with the supervision contained in data. Existing work in PGNNs has demonstrated the efficacy of adding single PG loss functions to neural network objectives, using constant trade-off parameters, to ensure better generalizability. However, in the presence of multiple PG functions with competing gradient directions, there is a need to adaptively tune the contribution of different PG loss functions during the course of training to arrive at generalizable solutions. We demonstrate the presence of competing PG losses in the generic neural network problem of solving for the lowest (or highest) eigenvector of a physics-based eigenvalue equation, which is commonly encountered in many scientific problems. We present a novel approach to handle competing PG losses and demonstrate its efficacy in learning generalizable solutions in two motivating applications of quantum mechanics and electromagnetic propagation. All the code and data used in this work are available at https://github.com/jayroxis/Cophy-PGNN.
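To make the role of the trade-off parameters concrete, here is a minimal PyTorch-style sketch of a training step that combines a supervised data loss with several PG losses under epoch-dependent weights. The simple exponential annealing schedule and all names here are placeholders for illustration; the actual adaptive weighting scheme is the one described in the paper and implemented in the linked repository.

```python
import math
import torch

def pgnn_training_step(model, optimizer, x, y, pg_loss_fns, epoch,
                       base_weights=(1.0, 1.0), decay_rates=(0.0, 0.05)):
    """One hypothetical PGNN update: a supervised data loss plus several
    physics-guided (PG) losses, each measuring violation of a known physical
    relation by the network outputs.  The exponential annealing schedule is
    only a stand-in for an adaptive trade-off scheme."""
    optimizer.zero_grad()
    pred = model(x)

    data_loss = torch.nn.functional.mse_loss(pred, y)

    total = data_loss
    for pg_fn, w0, rate in zip(pg_loss_fns, base_weights, decay_rates):
        weight = w0 * math.exp(-rate * epoch)   # epoch-dependent trade-off parameter
        total = total + weight * pg_fn(pred, x)

    total.backward()
    optimizer.step()
    return total.item()
```

With competing PG losses, a fixed weighting can let one term's gradient direction dominate; re-scheduling the weights each epoch is the basic mechanism the paper studies.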
- Award ID(s): 2026710
- PAR ID: 10227099
- Journal Name: ACM Transactions on Intelligent Systems and Technology
- ISSN: 2157-6904
- Sponsoring Org: National Science Foundation
More Like this
- Recent advancements in physics-informed machine learning have contributed to solving partial differential equations by means of a neural network. Several physics-informed neural network works have followed, solving inverse problems arising in structural health monitoring. Other works involving physics-informed neural networks solve the wave equation with partial data and model a wavefield data generator for efficient sound data generation. While much work has shown that partial differential equations can be solved and identified using a neural network, little work has done the same with more basic machine learning (ML) models. The advantage of basic ML models is that the parameters learned in a simpler model are both more interpretable and extensible. For applications such as ultrasonic nondestructive evaluation, this interpretability is essential for trustworthiness of the methods and characterization of the material system under test. In this work, we show an interpretable, physics-informed representation learning framework that can analyze data across multiple dimensions (e.g., two dimensions of space and one dimension of time). The algorithm comes with convergence guarantees. In addition, our algorithm provides interpretability of the learned model, as the parameters correspond to the individual solutions extracted from the data. We demonstrate how this algorithm functions with wavefield videos.
- The inception of physics-constrained or physics-informed machine learning represents a paradigm shift, addressing the challenges associated with data scarcity and enhancing model interpretability. This approach incorporates the fundamental laws of physics as constraints, guiding the training process of machine learning models. In this work, the physics-constrained convolutional recurrent neural network is further extended for solving spatial-temporal partial differential equations with arbitrary boundary conditions. Two notable advancements are introduced: the implementation of boundary conditions as soft constraints through finite-difference-based differentiation, and the establishment of an adaptive weighting mechanism for the optimal allocation of weights to the various losses (a hedged sketch of this combination appears after this list). These enhancements significantly augment the network's ability to manage intricate boundary conditions and expedite the training process. The efficacy of the proposed model is validated through its application to two-dimensional phase transition, fluid dynamics, and reaction-diffusion problems, which are pivotal in materials modeling. Compared to traditional physics-constrained neural networks, the physics-constrained convolutional recurrent neural network demonstrates a tenfold increase in prediction accuracy within a similar computational budget. Moreover, the model's strong performance in extrapolating solutions of the Burgers' equation underscores its utility. Therefore, this research establishes the physics-constrained recurrent neural network as a viable surrogate model for sophisticated spatial-temporal PDE systems, particularly beneficial in scenarios plagued by sparse and noisy datasets.
- Physics-informed neural networks (PINNs) have been widely used to solve partial differential equations (PDEs) in a forward and inverse manner using neural networks. However, balancing individual loss terms can be challenging, particularly when training these networks for stiff PDEs and scenarios requiring enforcement of numerous constraints. Even though statistical methods can be applied to assign relative weights to the regression loss for data, assigning relative weights to equation-based loss terms remains a formidable task. This paper proposes a method for assigning relative weights to the mean squared loss terms in the objective function used to train PINNs. Due to the presence of temporal gradients in the governing equation, the physics-informed loss can be recast using numerical integration through backward Euler discretization. The physics-uninformed and physics-informed networks should then yield identical predictions when assessed at corresponding spatiotemporal positions; we refer to this consistency as “temporal consistency” (a minimal sketch follows the list below). This approach introduces a unique way of training PINNs, redefining the loss function so that relative weights can be assigned using statistical properties of the observed data. In this work, we consider the two- and three-dimensional Navier–Stokes equations and determine the kinematic viscosity using spatiotemporal data on the velocity and pressure fields. We use numerical datasets to test our method and study its sensitivity to the timestep size, the number of timesteps, noise in the data, and spatial resolution. Finally, we use the velocity field obtained from particle image velocimetry experiments to generate a reference pressure field and test our framework using the velocity and pressure fields.
- The values of two-player general-sum differential games are viscosity solutions to Hamilton-Jacobi-Isaacs (HJI) equations. Value and policy approximations for such games suffer from the curse of dimensionality (CoD). Alleviating CoD through physics-informed neural networks (PINNs) encounters convergence issues when value discontinuity is present due to state constraints. On top of these challenges, it is often necessary to learn generalizable values and policies across a parametric space of games, e.g., for game parameter inference when information is incomplete. To address these challenges, we propose in this paper a Pontryagin-mode neural operator that outperforms the existing state of the art (SOTA) on safety performance across games with parametric state constraints. Our key contribution is the introduction of a costate loss defined on the discrepancy between forward and backward costate rollouts, which are computationally cheap (a rough sketch of such a loss follows this list). We show that the discontinuity of costate dynamics (in the presence of state constraints) effectively enables the learning of discontinuous values, without requiring the manually supervised data suggested by the current SOTA. More importantly, we show that the close relationship between costates and policies makes the former critical in learning feedback control policies with generalizable safety performance.
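For the physics-constrained convolutional recurrent network described above, the following is a hedged sketch of how a soft boundary-condition penalty and magnitude-based adaptive loss weighting could be combined. The tensor layout, `pde_residual_fn`, and the moving-average weighting rule are assumptions for illustration, not that paper's implementation:

```python
import torch

def adaptive_physics_loss(u_pred, pde_residual_fn, boundary_value, loss_ema, beta=0.9):
    """Hypothetical loss for a physics-constrained spatio-temporal network:
    a PDE-residual term plus a soft Dirichlet boundary-condition penalty,
    with weights adapted from running loss magnitudes so that neither term
    dominates.  `u_pred` is assumed to be a (batch, time, H, W) field."""
    # Interior residual of the governing PDE (e.g., finite differences inside pde_residual_fn).
    loss_pde = pde_residual_fn(u_pred).pow(2).mean()

    # Soft boundary condition: penalize deviation from the prescribed value on the spatial boundary.
    boundary = torch.cat([u_pred[..., 0, :], u_pred[..., -1, :],
                          u_pred[..., :, 0], u_pred[..., :, -1]], dim=-1)
    loss_bc = (boundary - boundary_value).pow(2).mean()

    # Adaptive weighting via exponential moving averages of the loss magnitudes.
    with torch.no_grad():
        loss_ema["pde"] = beta * loss_ema["pde"] + (1 - beta) * loss_pde.item()
        loss_ema["bc"] = beta * loss_ema["bc"] + (1 - beta) * loss_bc.item()
    w_pde = 1.0 / (loss_ema["pde"] + 1e-8)
    w_bc = 1.0 / (loss_ema["bc"] + 1e-8)

    return w_pde * loss_pde + w_bc * loss_bc
```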
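For the temporal-consistency idea in the PINN weighting paper above, here is a minimal sketch of a backward-Euler consistency residual, assuming a network with signature `net(x, t)` and a right-hand-side function `rhs_fn` that holds the unknown physical parameters (e.g., kinematic viscosity). These names are illustrative only:

```python
import torch

def temporal_consistency_loss(net, x, t, dt, rhs_fn):
    """Hypothetical backward-Euler consistency residual: the network's
    prediction at t + dt should match its prediction at t advanced by one
    implicit Euler step of the governing equation, u_{n+1} = u_n + dt * f(u_{n+1})."""
    u_now = net(x, t)          # physics-uninformed prediction at time t
    u_next = net(x, t + dt)    # physics-uninformed prediction at time t + dt

    # Physics-informed estimate of u(t + dt) via backward Euler.
    u_next_physics = u_now + dt * rhs_fn(u_next, x, t + dt)

    # "Temporal consistency": both estimates of u(t + dt) should coincide.
    return (u_next - u_next_physics).pow(2).mean()
```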
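For the costate loss in the differential-game paper above, here is a rough sketch of a discrepancy between forward and backward costate rollouts along a given state trajectory. `costate_net`, `costate_rhs` (standing for the Pontryagin costate dynamics -dH/dx), and `terminal_costate` are placeholders rather than that paper's API:

```python
import torch

def costate_discrepancy_loss(costate_net, states, times, dt, costate_rhs, terminal_costate):
    """Rough sketch: roll the costate ODE forward from the learned costate at
    the initial time and backward from the known terminal condition, then
    penalize their mismatch along the trajectory."""
    T = len(times)

    # Forward rollout, seeded by the network's costate prediction at t0.
    lam_fwd = [costate_net(states[0], times[0])]
    for k in range(T - 1):
        lam_fwd.append(lam_fwd[-1] + dt * costate_rhs(states[k], lam_fwd[-1], times[k]))

    # Backward rollout, seeded by the terminal costate (e.g., gradient of the terminal cost).
    lam_bwd = [terminal_costate]
    for k in range(T - 1, 0, -1):
        lam_bwd.append(lam_bwd[-1] - dt * costate_rhs(states[k], lam_bwd[-1], times[k]))
    lam_bwd = lam_bwd[::-1]

    # Discrepancy between the two rollouts along the trajectory.
    return torch.stack([(f - b).pow(2).mean() for f, b in zip(lam_fwd, lam_bwd)]).mean()
```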