NSF PAR Search | NSF Public Access Repository

Information criteria for model selection

https://doi.org/10.1002/wics.1607

Zhang, Jiawei; Yang, Yuhong; Ding, Jie (February 2023, WIREs Computational Statistics)

Abstract The rapid development of modeling techniques has brought many opportunities for data‐driven discovery and prediction. However, this also leads to the challenge of selecting the most appropriate model for any particular data task. Information criteria, such as the Akaike information criterion (AIC) and Bayesian information criterion (BIC), have been developed as a general class of model selection methods with profound connections with foundational thoughts in statistics and information theory. Many perspectives and theoretical justifications have been developed to understand when and how to use information criteria, which often depend on particular data circumstances. This review article will revisit information criteria by summarizing their key concepts, evaluation metrics, fundamental properties, interconnections, recent advancements, and common misconceptions to enrich the understanding of model selection in general. This article is categorized under: Data: Types and Structure > Traditional Statistical Data; Statistical Learning and Exploratory Methods of the Data Sciences > Modeling Methods; Statistical and Graphical Methods of Data Analysis > Information Theoretic Methods; Statistical Models > Model Selection
Distributed Architecture Search Over Heterogeneous Distributions

Mushtaq, Erum; He, Chaoyang; Ding, Jie; Avestimehr, Salman (November 2023, Transactions on machine learning research)

Federated learning (FL) is an efficient learning framework that assists distributed machine learning when data cannot be shared with a centralized server. Recent advancements in FL use predefined architecture-based learning for all clients. However, given that clients’ data are invisible to the server and data distributions are non-identical across clients, a predefined architecture discovered in a centralized setting may not be an optimal solution for all the clients in FL. Motivated by this challenge, we introduce SPIDER, an algorithmic framework that aims to Search PersonalIzed neural architecture for feDERated learning. SPIDER is designed based on two unique features: (1) alternately optimizing one architecture-homogeneous global model in a generic FL manner and architecture-heterogeneous local models that are connected to the global model by weight-sharing-based regularization, (2) achieving architecture-heterogeneous local models by a perturbation-based neural architecture search method. Experimental results demonstrate superior prediction performance compared with other state-of-the-art personalization methods. Code is available at https://github.com/ErumMushtaq/SPIDER.git.
Full Text Available
Exploring Gradient Oscillation in Deep Neural Network Training

Morchdi, Chedi; Zhou, Yi; Ding, Jie; Wang, Bei (September 2023, IEEE)

Understanding optimization in deep learning is a fundamental problem, and recent findings have challenged the previously held belief that gradient descent stably trains deep networks. In this study, we delve deeper into the instability of gradient descent during the training of deep networks. By employing gradient descent to train various modern deep networks, we provide empirical evidence demonstrating that a significant portion of the optimization progress occurs through the utilization of oscillating gradients. These gradients exhibit a high negative correlation between adjacent iterations. Furthermore, we make the following noteworthy observations about these gradient oscillations (GO): (i) GO manifests in different training stages for networks with diverse architectures; (ii) when using a large learning rate, GO consistently emerges across all layers of the networks; and (iii) when employing a small learning rate, GO is more prominent in the input layers compared to the output layers. These discoveries indicate that GO is an inherent characteristic of training different types of neural networks and may serve as a source of inspiration for the development of novel optimizer designs.
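
One way to picture the "negative correlation between adjacent iterations" mentioned in this abstract is to track the cosine similarity between consecutive gradients. The sketch below is an illustration only, not the paper's experimental setup: it assumes a toy two-dimensional quadratic objective in place of a deep network, and the helper names (`cosine_similarity`, `adjacent_gradient_similarities`, `quad_grad`) and learning rates are hypothetical choices made for this example.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two gradient vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def adjacent_gradient_similarities(grad_fn, x0, lr, steps):
    """Run plain gradient descent and record the cosine similarity
    between each gradient and the one from the previous iteration."""
    x = list(x0)
    previous = None
    similarities = []
    for _ in range(steps):
        g = grad_fn(x)
        if previous is not None:
            similarities.append(cosine_similarity(previous, g))
        x = [xi - lr * gi for xi, gi in zip(x, g)]
        previous = g
    return similarities


# Assumed toy objective: f(x) = 0.5 * sum(c_i * x_i^2), so grad_i = c_i * x_i.
CURVATURES = [1.0, 10.0]


def quad_grad(x):
    return [c * xi for c, xi in zip(CURVATURES, x)]


# A large step overshoots along the steep direction on every iteration,
# so consecutive gradients point in nearly opposite directions.
large_lr = adjacent_gradient_similarities(quad_grad, [1.0, 1.0], lr=0.19, steps=20)
# A small step never overshoots, so consecutive gradients stay aligned.
small_lr = adjacent_gradient_similarities(quad_grad, [1.0, 1.0], lr=0.01, steps=20)

print(min(large_lr), max(large_lr))
print(min(small_lr), max(small_lr))
```

In this toy setting the large step size yields negative similarities throughout while the small one yields positive similarities. Whether the same diagnostic matches the measurements reported in the paper is not something this listing states.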
Full Text Available
Visualizing and Analyzing the Topology of Neuron Activations in Deep Adversarial Training

Zhou, Youjia; Zhou, Yi; Ding, Jie; Wang, Bei (July 2023, Workshop on Topology, Algebra, and Geometry in Machine Learning (TAG-ML) at the 40th International Conference on Machine Learning)

Deep models are known to be vulnerable to data adversarial attacks, and many adversarial training techniques have been developed to improve their adversarial robustness. While data adversaries attack model predictions by modifying data, little is known about their impact on the neuron activations produced by the model, which play a crucial role in determining the model’s predictions and interpretability. In this work, we aim to develop a topological understanding of adversarial training to enhance its interpretability. We analyze the topological structure—in particular, mapper graphs—of neuron activations of data samples produced by deep adversarial training. Each node of a mapper graph represents a cluster of activations, and two nodes are connected by an edge if their corresponding clusters have a nonempty intersection. We provide an interactive visualization tool that demonstrates the utility of our topological framework in exploring the activation space. We found that stronger attacks make the data samples less distinguishable in the neuron activation space, which leads to lower accuracy. Our tool also provides a natural way to identify the vulnerable data samples that may be useful in improving model robustness.
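
The edge rule stated in this abstract (two nodes are joined when their clusters share at least one sample) can be sketched in a few lines. This is a minimal illustration that assumes the clusters have already been computed and are given as sets of sample indices; the function name `mapper_edges` and the example clusters are hypothetical, and the covering and clustering steps that produce the clusters are not shown.

```python
from itertools import combinations


def mapper_edges(clusters):
    """Return the edges of a mapper graph.

    `clusters` maps a node label to the set of sample indices in that
    cluster. Two nodes are connected when their sets intersect.
    """
    return [
        (a, b)
        for a, b in combinations(sorted(clusters), 2)
        if clusters[a] & clusters[b]
    ]


# Assumed example: sample 2 belongs to both A and B; C overlaps with nothing.
example_clusters = {
    "A": {0, 1, 2},
    "B": {2, 3},
    "C": {4, 5},
}

edges = mapper_edges(example_clusters)
print(edges)
```

Here only nodes A and B are connected, since they are the only pair of clusters with a sample in common.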
Full Text Available
Assisted Learning for Organizations with Limited Imbalanced Data

Chen, C.; Zhou, J.; Ding, J.; Zhou, Y. (May 2023, Transactions on machine learning research)

Full Text Available
Provable Identifiability of Two-Layer ReLU Neural Networks via LASSO Regularization

https://doi.org/10.1109/TIT.2023.3274152

Li, Gen; Wang, Ganghua; Ding, Jie (January 2023, IEEE Transactions on Information Theory)

Full Text Available
Assisted Unsupervised Domain Adaptation

Chen, C.; Zhang, J.; Ding, J.; Zhou, Y. (January 2023, Abstracts of papers IEEE International Symposium on Information Theory)

Full Text Available
Mismatched Supervised Learning

https://doi.org/10.1109/ICASSP43922.2022.9747362

Xian, Xun; Hong, Mingyi; Ding, Jie (May 2022, International Conference on Acoustics, Speech, & Signal Processing (ICASSP))

Full Text Available
Parallel Assisted Learning

https://doi.org/10.1109/TSP.2022.3229637

Wang, Xinran; Zhang, Jiawei; Hong, Mingyi; Yang, Yuhong; Ding, Jie (January 2022, IEEE Transactions on Signal Processing)

Full Text Available
SemiFL: Semi-Supervised Federated Learning for Unlabeled Clients with Alternate Training

Diao, E.; Ding, J.; Tarokh, V. (January 2022, Advances in neural information processing systems)

Full Text Available
