Learning Sparse Nonparametric DAGs
We develop a framework for learning sparse nonparametric directed acyclic graphs (DAGs) from data. Our approach is based on a recent algebraic characterization of DAGs that led to a fully continuous program for score-based learning of DAG models parametrized by a linear structural equation model (SEM). We extend this algebraic characterization to nonparametric SEMs by leveraging nonparametric sparsity based on partial derivatives, resulting in a continuous optimization problem that can be applied to a variety of nonparametric and semiparametric models, including GLMs, additive noise models, and index models as special cases. Unlike existing approaches that require specific modeling choices, loss functions, or algorithms, we present a completely general framework that can be applied to general nonlinear models (e.g., without additive noise), general differentiable loss functions, and generic black-box optimization routines.
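The "algebraic characterization" the abstract refers to is the continuous acyclicity constraint introduced for the linear case in the earlier NOTEARS work, which this paper extends: a weighted adjacency matrix W encodes a DAG exactly when h(W) = tr(exp(W ∘ W)) − d = 0, where ∘ is the elementwise product and d the number of nodes. A minimal sketch (the function names and example matrices here are illustrative, not from the paper):

```python
import numpy as np
from scipy.linalg import expm


def acyclicity(W: np.ndarray) -> float:
    """Continuous acyclicity measure h(W) = tr(exp(W * W)) - d.

    h(W) = 0 exactly when W is the weighted adjacency matrix of a DAG;
    h(W) > 0 whenever the graph contains a directed cycle, which lets
    score-based structure learning be posed as a smooth constrained
    optimization problem instead of a combinatorial search.
    """
    d = W.shape[0]
    return float(np.trace(expm(W * W)) - d)  # W * W is elementwise


# A strictly upper-triangular W (a DAG ordering) gives h(W) = 0 ...
W_dag = np.array([[0.0, 1.5, 0.0],
                  [0.0, 0.0, -0.7],
                  [0.0, 0.0, 0.0]])

# ... while a 2-cycle between two nodes gives h(W) > 0.
W_cyclic = np.array([[0.0, 1.0],
                     [1.0, 0.0]])
```

In the nonparametric extension described above, the role of |W_{kj}| is played by a sparsity measure on the partial derivative of the j-th structural function with respect to x_k, so the same constraint h = 0 can be imposed on general (e.g. neural-network) models.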
 Award ID(s):
 1909816
 Publication Date:
 NSF-PAR ID:
 10197055
 Journal Name:
 Proceedings of Machine Learning Research
 Volume:
 108
 Page Range or eLocation ID:
 3414-3425
 ISSN:
 2640-3498
 Sponsoring Org:
 National Science Foundation