NSF PAR Search | NSF Public Access Repository

Stable Minima of ReLU Neural Networks Suffer from the Curse of Dimensionality: The Neural Shattering Phenomenon

Liang, Tongtong; Qiao, Dan; Wang, Yu-Xiang; Parhi, Rahul (November 2025, Advances in neural information processing systems)

We study the implicit bias of flatness (low loss curvature) and its effects on generalization in two-layer overparameterized ReLU networks with multivariate inputs, a problem well motivated by the minima stability and edge-of-stability phenomena in gradient-descent training. Existing work either requires interpolation or focuses only on univariate inputs. This paper presents new and somewhat surprising theoretical results for multivariate inputs. In two natural settings, (1) the generalization gap for flat solutions and (2) the mean-squared error (MSE) in nonparametric function estimation by stable minima, we prove upper and lower bounds establishing that, while flatness does imply generalization, the resulting rates of convergence necessarily deteriorate exponentially as the input dimension grows. This gives an exponential separation between flat solutions and low-norm solutions (i.e., weight decay), which are known not to suffer from the curse of dimensionality. In particular, our minimax lower bound construction, based on a novel packing argument with boundary-localized ReLU neurons, reveals how flat solutions can exploit a kind of "neural shattering" in which neurons rarely activate but carry large weight magnitudes. This leads to poor performance in high dimensions. We corroborate these theoretical findings with extensive numerical simulations. To the best of our knowledge, our analysis provides the first systematic explanation for why flat minima may fail to generalize in high dimensions.
Free, publicly-accessible full text available November 30, 2026
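
The remark in the abstract above that neurons "rarely activate" in high dimensions can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the paper's construction: inputs are drawn uniformly from the unit sphere, the neuron's weight vector is the first coordinate axis, and the bias threshold of 0.5 is a hypothetical choice. The function name `activation_fraction` is likewise illustrative.

```python
import math
import random

def activation_fraction(dim, threshold=0.5, n_points=2000, seed=0):
    """Fraction of random unit vectors x on which ReLU(x[0] - threshold) is positive.

    Assumptions (not from the paper): inputs are uniform on the unit sphere,
    the neuron's weight is the first coordinate axis, and the bias is -threshold.
    """
    rng = random.Random(seed)
    active = 0
    for _ in range(n_points):
        g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(v * v for v in g))
        # Normalizing a Gaussian vector gives a uniform point on the unit sphere.
        if g[0] / norm > threshold:
            active += 1
    return active / n_points

# The active region is a spherical cap whose measure shrinks quickly with dimension.
for d in (2, 10, 50):
    print(d, activation_fraction(d))
```

With the same threshold, the neuron fires on a sizable share of inputs in two dimensions and on almost none in fifty, which is the geometric effect a boundary-localized neuron relies on.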
Revisiting Source-Free Domain Adaptation: Insights into Representativeness, Generalization, and Variety

Zhu, Ronghang; Hu, Mengxuan; Zhuang, Weiming; Lyu, Lingjuan; Yu, Xiang; Li, Sheng (June 2025, The IEEE/CVF Conference on Computer Vision and Pattern Recognition)

Domain adaptation addresses the challenge where the distribution of target inference data differs from that of the source training data. Recently, data privacy has become a significant constraint, limiting access to the source domain. To mitigate this issue, Source-Free Domain Adaptation (SFDA) methods bypass source domain data by generating source-like data or pseudo-labeling the unlabeled target domain. However, these approaches often lack theoretical grounding. In this work, we provide a theoretical analysis of the SFDA problem, focusing on the general empirical risk of the unlabeled target domain. Our analysis offers a comprehensive understanding of how representativeness, generalization, and variety contribute to controlling the upper bound of target domain empirical risk in SFDA settings. We further explore how to balance this trade-off from three perspectives: sample selection, semantic domain alignment, and a progressive learning framework. These insights inform the design of novel algorithms. Experimental results demonstrate that our proposed method achieves state-of-the-art performance on three benchmark datasets (Office-Home, DomainNet, and VisDA-C), yielding relative improvements of 3.2%, 9.1%, and 7.5%, respectively, over the representative SFDA method, SHOT.
Free, publicly-accessible full text available June 11, 2026
MoXCo: How I learned to stop exploring and love my local minima?

Singh, Esha; Sabach, Shoham; Wang, Yu-Xiang (March 2025, Proceedings of Machine Learning Research)

Deep neural networks are well known for their generalization capabilities, largely attributed to optimizers' ability to find "good" solutions in high-dimensional loss landscapes. This work aims to deepen the understanding of optimization specifically through the lens of loss landscapes. We propose a generalized framework for adaptive optimization that favors convergence to these "good" solutions. Our approach shifts the optimization paradigm from merely finding solutions quickly to discovering solutions that generalize well, establishing a careful balance between optimization efficiency and model generalization. We empirically validate our claims using a two-layer, fully connected neural network with ReLU activation and demonstrate practical applicability through binary quantization of ResNets. Our numerical results demonstrate that these adaptive optimizers facilitate exploration, leading to faster convergence and narrowing the generalization gap between stochastic gradient descent and other adaptive methods.
Free, publicly-accessible full text available March 24, 2026
Three-Dimensional Object Detection and High-Resolution Traffic Parameter Extraction Using Low-Resolution LiDAR Data

https://doi.org/10.1061/JTEPBS.TEENG-8662

Zhang, Linlin; Yu, Xiang; Aboah, Armstrong; Adu-Gyamfi, Yaw (March 2025, Journal of Transportation Engineering, Part A: Systems)

Free, publicly-accessible full text available March 1, 2026
Differentially Private Reinforcement Learning with Self-Play

Qiao, Dan; Wang, Yu-Xiang (December 2024, Advances in neural information processing systems)

Full Text Available
NetworkGym: Reinforcement Learning Environments for Multi-Access Traffic Management in Network Simulation

Haider, Momin; Yin, Ming; Zhang, Menglei; Gupta, Arpit; Zhu, Jing; Wang, Yu-Xiang (December 2024, Proceedings of Machine Learning Research)

Mobile devices such as smartphones, laptops, and tablets can often connect to multiple access networks (e.g., Wi-Fi, LTE, and 5G) simultaneously. Recent advancements facilitate seamless integration of these connections below the transport layer, enhancing the experience for apps that lack inherent multi-path support. This optimization hinges on dynamically determining the traffic distribution across networks for each device, a process referred to as multi-access traffic splitting. This paper introduces NetworkGym, a high-fidelity network environment simulator that facilitates generating multiple network traffic flows and multi-access traffic splitting. This simulator facilitates training and evaluating different RL-based solutions for the multi-access traffic splitting problem. Our initial explorations demonstrate that the majority of existing state-of-the-art offline RL algorithms (e.g., CQL) fail to outperform certain hand-crafted heuristic policies on average. This illustrates the urgent need to evaluate offline RL algorithms against a broader range of benchmarks, rather than relying solely on popular ones such as D4RL. We also propose an extension to the TD3+BC algorithm, named Pessimistic TD3 (PTD3), and demonstrate that it outperforms many state-of-the-art offline RL algorithms. PTD3's behavioral constraint mechanism, which relies on value-function pessimism, is theoretically motivated and relatively simple to implement.
Full Text Available
Stable Minima Cannot Overfit in Univariate ReLU Networks: Generalization by Large Step Sizes

Qiao, Dan; Zhang, Kaiqi; Singh, Esha; Soudry, Daniel; Wang, Yu-Xiang (October 2024, Advances in neural information processing systems)

We study the generalization of two-layer ReLU neural networks in a univariate nonparametric regression problem with noisy labels. This is a problem where kernels (e.g., NTK) are provably sub-optimal and benign overfitting does not happen, thus disqualifying existing theory for interpolating (0-loss, global optimal) solutions. We present a new theory of generalization for local minima that gradient descent with a constant learning rate can stably converge to. We show that gradient descent with a fixed learning rate $\eta$ can only find local minima that represent smooth functions with a certain weighted first-order total variation bounded by $1/\eta - 1/2 + \tilde{O}(\sigma + \sqrt{\mathrm{MSE}})$, where $\sigma$ is the label noise level, MSE is short for mean squared error against the ground truth, and $\tilde{O}(\cdot)$ hides a logarithmic factor. Under mild assumptions, we also prove a nearly optimal MSE bound of $\tilde{O}(n^{-4/5})$ within the strict interior of the support of the $n$ data points. Our theoretical results are validated by extensive simulations demonstrating that large-learning-rate training induces sparse linear spline fits. To the best of our knowledge, we are the first to obtain a generalization bound via minima stability in the non-interpolation case and the first to show that ReLU NNs without regularization can achieve near-optimal rates in nonparametric regression.
Full Text Available
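
The notion in the abstract above of minima that constant-step gradient descent can "stably" converge to can be illustrated on a one-dimensional quadratic. This is a minimal sketch, not the paper's analysis: it uses the standard linear-stability fact that gradient descent on f(x) = 0.5 * a * x^2 with step size eta contracts only when the curvature a is below 2/eta. The function names and the numerical values are illustrative assumptions.

```python
def gd_final_distance(curvature, lr, x0=1.0, steps=200):
    """Run gradient descent on f(x) = 0.5 * curvature * x**2; return |x| after `steps` updates."""
    x = x0
    for _ in range(steps):
        x -= lr * curvature * x  # the gradient of f at x is curvature * x
    return abs(x)

def is_linearly_stable(curvature, lr):
    """Standard stability condition for a quadratic: |1 - lr * curvature| < 1."""
    return 0.0 < curvature < 2.0 / lr

lr = 0.1  # hypothetical step size; the stability threshold is 2 / lr = 20
for curvature in (19.0, 21.0):
    print(curvature, is_linearly_stable(curvature, lr), gd_final_distance(curvature, lr))
```

A larger step size lowers the threshold 2/eta, so only flatter minima remain reachable, which is the mechanism behind the abstract's bound scaling with 1/eta.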
Near-Optimal Reinforcement Learning with Self-Play under Adaptivity Constraints

Qiao, Dan; Wang, Yu-Xiang (July 2024, Proceedings of Machine Learning Research)

We study the problem of multi-agent reinforcement learning (MARL) with adaptivity constraints, a new problem motivated by real-world applications where deployments of new policies are costly and the number of policy updates must be minimized. For two-player zero-sum Markov games, we design a (policy) elimination-based algorithm that achieves a regret of $\tilde{O}(\sqrt{H^3 S^2 ABK})$, while the batch complexity is only $O(H + \log\log K)$. In the above, $S$ denotes the number of states, $A$ and $B$ are the numbers of actions for the two players respectively, $H$ is the horizon, and $K$ is the number of episodes. Furthermore, we prove a batch complexity lower bound of $\Omega\left(\frac{H}{\log_A K} + \log\log K\right)$ for all algorithms with an $\tilde{O}(\sqrt{K})$ regret bound, which matches our upper bound up to logarithmic factors. As a byproduct, our techniques naturally extend to learning bandit games and reward-free MARL within near-optimal batch complexity. To the best of our knowledge, this is the first line of results towards understanding MARL with low adaptivity.
Full Text Available
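
To give a feel for how a log log K term can arise in a batch complexity like the one in the abstract above, the sketch below builds a batch grid with endpoints near K^(1 - 2^(-i)), a schedule commonly used in the batched-bandit literature. This is an assumption for illustration only: the abstract does not state which schedule the paper uses, and the function name `batch_endpoints` is hypothetical.

```python
import math

def batch_endpoints(num_episodes):
    """Return increasing batch endpoints t_i ~ K**(1 - 2**-i), with the last one equal to K.

    Illustrative schedule from the batched-bandit literature; not taken from the paper.
    """
    K = num_episodes
    if K < 4:
        return [K]  # too few episodes for the doubling-exponent grid to be meaningful
    num_inner = math.ceil(math.log2(math.log2(K)))
    endpoints = []
    for i in range(1, num_inner + 1):
        t = min(K, math.ceil(K ** (1.0 - 2.0 ** (-i))))
        if not endpoints or t > endpoints[-1]:
            endpoints.append(t)
    if endpoints[-1] < K:
        endpoints.append(K)  # final batch runs to the end of the horizon
    return endpoints

for K in (2 ** 16, 10 ** 6):
    grid = batch_endpoints(K)
    print(K, len(grid), grid)
```

The policy would be updated only at these endpoints, so the number of updates grows roughly like log log K rather than with K itself.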
Nonparametric Classification on Low Dimensional Manifolds using Overparameterized Convolutional Residual Networks

Zhang, Zixuan; Zhang, Kaiqi; Chen, Minshuo; Takeda, Yuma; Wang, Mengdi; Zhao, Tuo; Wang, Yu-Xiang (October 2024, Advances in neural information processing systems)

Convolutional residual neural networks (ConvResNets), though overparameterized, can achieve remarkable prediction performance in practice, which cannot be well explained by conventional wisdom. To bridge this gap, we study the performance of ConvResNeXts, which cover ConvResNets as a special case, trained with weight decay from the perspective of nonparametric classification. Our analysis allows for infinitely many building blocks in ConvResNeXts, and shows that weight decay implicitly enforces sparsity on these blocks. Specifically, we consider a smooth target function supported on a low-dimensional manifold, then prove that ConvResNeXts can adapt to the function smoothness and low-dimensional structures and efficiently learn the function without suffering from the curse of dimensionality. Our findings partially justify the advantage of overparameterized ConvResNeXts over conventional machine learning models.
Full Text Available
The Russia-Ukraine war reduced food production and exports with a disparate geographical impact worldwide

https://doi.org/10.1038/s43247-024-01915-5

Jia, Nan; Xia, Zilong; Li, Yinshuai; Yu, Xiang; Wu, Xutong; Li, Yingjie; Su, Rongfei; Wang, Mengting; Chen, Ruishan; Liu, Jianguo (December 2024, Communications Earth & Environment)

Full Text Available
