Search for: All records

Creators/Authors contains: "Zhou, Mo"

« Prev Next »

Total Resources

9

Resource Type
Conference Paper

6

Conference Proceeding

0

Dataset

0

Journal Article

3

Workshop Report

0

Availability
Full Text / Resource Available

8

Citation Only

1

Save Results
Excel (limit 2000)
CSV (limit 5000)
XML (limit 5000)

Have feedback or suggestions for a way to improve these results?
!

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Implicit Regularization Leads to Benign Overfitting for Sparse Linear Regression

Zhou, Mo ; Ge, Rong ( January 2023 , International Conference on Machine Learning)

In deep learning, often the training process nds an interpolator (a solution with 0 training loss), but the test loss is still low. This phenomenon, known as benign overfitting, is a major mystery that received a lot of recent attention. One common mechanism for benign overfitting is implicit regularization, where the training process leads to additional properties for the interpolator, often characterized by minimizing certain norms. However, even for a simple sparse linear regression problem y = Ax+ noise with sparse x , neither minimum l_1 orl_`2 norm interpolator gives the optimal test loss. In this work, we give a different parametrization of the model which leads to a new implicit regularization effect that combines the benefit of l_1 and l_2 interpolators. We show that training our new model via gradient descent leads to an interpolator with near-optimal test loss. Our result is based on careful analysis of the training dynamics and provides another example of implicit regularization effect that goes beyond norm minimization.
more » « less
Full Text Available
Understanding The Robustness of Self-supervised Learning Through Topic Modeling

Luo, Zeping ; Weng, Cindy ; Wu, Shiyou ; Zhou, Mo ; Ge, Rong ( January 2023 , International Conference on Learning Representations)

Self-supervised learning has significantly improved the performance of many NLP tasks. In this paper, we highlight a key advantage of self-supervised learning - when applied to data generated by topic models, self-supervised learning can be oblivious to the specific model, and hence is less susceptible to model misspecification. In particular, we prove that commonly used self-supervised objectives based on reconstruction or contrastive samples can both recover useful posterior information for general topic models. Empirically, we show that the same objectives can perform competitively against posterior inference using the correct model, while outperforming posterior inference using misspecified model.
more » « less
Full Text Available
Understanding Edge-of-Stability Training Dynamics with a Minimalist Example

Zhu, Xingyu ; Wang, Zixuan ; Wang, Xiang ; Zhou, Mo ; Ge, Rong ( January 2023 , International Conference on Learning Representations)

Recently, researchers observed that gradient descent for deep neural networks operates in an “edge-of-stability” (EoS) regime: the sharpness (maximum eigenvalue of the Hessian) is often larger than stability threshold 2/\eta (where \eta is the step size). Despite this, the loss oscillates and converges in the long run, and the sharpness at the end is just slightly below 2/\eta . While many other well-understood nonconvex objectives such as matrix factorization or two-layer networks can also converge despite large sharpness, there is often a larger gap between sharpness of the endpoint and 2/\eta . In this paper, we study EoS phenomenon by constructing a simple function that has the same behavior. We give rigorous analysis for its training dynamics in a large local region and explain why the fnal converging point has sharpness close to 2/\eta . Globally we observe that the training dynamics for our example have an interesting bifurcating behavior, which was also observed in the training of neural nets.
more » « less
Full Text Available
A Local Convergence Theory for Mildly Over-Parameterized Two-Layer Neural Network

Zhou, Mo ; Ge, Rong ; Jin, Chi ( July 2021 , COLT)

While over-parameterization is widely believed to be crucial for the success of optimization for the neural networks, most existing theories on over-parameterization do not fully explain the reason -- they either work in the Neural Tangent Kernel regime where neurons don't move much, or require an enormous number of neurons. In practice, when the data is generated using a teacher neural network, even mildly over-parameterized neural networks can achieve 0 loss and recover the directions of teacher neurons. In this paper we develop a local convergence theory for mildly over-parameterized two-layer neural net. We show that as long as the loss is already lower than a threshold (polynomial in relevant parameters), all student neurons in an over-parameterized two-layer neural network will converge to one of teacher neurons, and the loss will go to 0. Our result holds for any number of student neurons as long as it is at least as large as the number of teacher neurons, and our convergence rate is independent of the number of student neurons. A key component of our analysis is the new characterization of local optimization landscape -- we show the gradient satisfies a special case of Lojasiewicz property which is different from local strong convexity or PL conditions used in previous work.
more » « less
Full Text Available
Actor-Critic Method for High Dimensional Static Hamilton--Jacobi--Bellman Partial Differential Equations based on Neural Networks

https://doi.org/10.1137/21M1402303

Zhou, Mo ; Han, Jiequn ; Lu, Jianfeng ( January 2021 , SIAM Journal on Scientific Computing)

Full Text Available
Towards Understanding the Importance of Shortcut Connections in Residual Networks

Liu, Tianyi Liu ; Chen, Minshuo ; Zhou, Mo ; Du, Simon ; Zhou, Enlu ; Zhao, Tuo. ( December 2019 , Advances in neural information processing systems)

Full Text Available
Toward Understanding the Importance of Noise in Training Neural Networks

Zhou, Mo ; Liu, Tianyi ; Li, Yan ; Lin, Dachao ; Zhou, Enlu ; Zhao, Tuo ( June 2019 , International Conference on Machine Learning)

Numerous empirical evidence has corroborated that the noise plays a crucial rule in effective and efficient training of deep neural networks. The theory behind, however, is still largely unknown. This paper studies this fundamental problem through training a simple two-layer convolutional neural network model. Although training such a network requires to solve a non-convex optimization problem with a spurious local optimum and a global optimum, we prove that a perturbed gradient descent algorithm in conjunction with noise annealing is guaranteed to converge to a global optimum in polynomial time with arbitrary initialization. This implies that the noise enables the algorithm to efficiently escape from the spurious local optimum. Numerical experiments are provided to support our theory.
more » « less
Full Text Available
Tree diversity reduces variability in sapling survival under drought

https://doi.org/10.1111/1365-2745.14294

Blondeel, Haben ; Guillemot, Joannès ; Martin‐StPaul, Nicolas ; Druel, Arsène ; Bilodeau‐Gauthier, Simon ; Bauhus, Jürgen ; Grossiord, Charlotte ; Hector, Andrew ; Jactel, Hervé ; Jensen, Joel ; et al ( April 2024 , Journal of Ecology)

Abstract
Enhancing tree diversity may be important to fostering resilience to drought‐related climate extremes. So far, little attention has been given to whether tree diversity can increase the survival of trees and reduce its variability in young forest plantations.
We conducted an analysis of seedling and sapling survival from 34 globally distributed tree diversity experiments (363,167 trees, 168 species, 3744 plots, 7 biomes) to answer two questions: (1) Do drought and tree diversity alter the mean and variability in plot‐level tree survival, with higher and less variable survival as diversity increases? and (2) Do species that survive poorly in monocultures survive better in mixtures and do specific functional traits explain monoculture survival?
Tree species richness reduced variability in plot‐level survival, while functional diversity (Rao's Q entropy) increased survival and also reduced its variability. Importantly, the reduction in survival variability became stronger as drought severity increased. We found that species with low survival in monocultures survived comparatively better in mixtures when under drought. Species survival in monoculture was positively associated with drought resistance (indicated by hydraulic traits such as turgor loss point), plant height and conservative resource‐acquisition traits (e.g. low leaf nitrogen concentration and small leaf size).
Synthesis.The findings highlight: (1) The effectiveness of tree diversity for decreasing the variability in seedling and sapling survival under drought; and (2) the importance of drought resistance and associated traits to explain altered tree species survival in response to tree diversity and drought. From an ecological perspective, we recommend mixing be considered to stabilize tree survival, particularly when functionally diverse forests with drought‐resistant species also promote high survival of drought‐sensitive species.

more » « less
Co-limitation towards lower latitudes shapes global forest diversity gradients

https://doi.org/10.1038/s41559-022-01831-x

Liang, Jingjing ; Gamarra, Javier G. ; Picard, Nicolas ; Zhou, Mo ; Pijanowski, Bryan ; Jacobs, Douglass F. ; Reich, Peter B. ; Crowther, Thomas W. ; Nabuurs, Gert-Jan ; de-Miguel, Sergio ; et al ( October 2022 , Nature Ecology & Evolution)

Full Text Available