skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Convergence of graph Laplacian with kNN self-tuned kernels
Abstract Kernelized Gram matrix $$W$$ constructed from data points $$\{x_i\}_{i=1}^N$$ as $$W_{ij}= k_0( \frac{ \| x_i - x_j \|^2} {\sigma ^2} ) $$ is widely used in graph-based geometric data analysis and unsupervised learning. An important question is how to choose the kernel bandwidth $$\sigma $$, and a common practice called self-tuned kernel adaptively sets a $$\sigma _i$$ at each point $$x_i$$ by the $$k$$-nearest neighbor (kNN) distance. When $$x_i$$s are sampled from a $$d$$-dimensional manifold embedded in a possibly high-dimensional space, unlike with fixed-bandwidth kernels, theoretical results of graph Laplacian convergence with self-tuned kernels have been incomplete. This paper proves the convergence of graph Laplacian operator $$L_N$$ to manifold (weighted-)Laplacian for a new family of kNN self-tuned kernels $$W^{(\alpha )}_{ij} = k_0( \frac{ \| x_i - x_j \|^2}{ \epsilon \hat{\rho }(x_i) \hat{\rho }(x_j)})/\hat{\rho }(x_i)^\alpha \hat{\rho }(x_j)^\alpha $$, where $$\hat{\rho }$$ is the estimated bandwidth function by kNN and the limiting operator is also parametrized by $$\alpha $$. When $$\alpha = 1$$, the limiting operator is the weighted manifold Laplacian $$\varDelta _p$$. Specifically, we prove the point-wise convergence of $$L_N f $$ and convergence of the graph Dirichlet form with rates. Our analysis is based on first establishing a $C^0$ consistency for $$\hat{\rho }$$ which bounds the relative estimation error $$|\hat{\rho } - \bar{\rho }|/\bar{\rho }$$ uniformly with high probability, where $$\bar{\rho } = p^{-1/d}$$ and $$p$$ is the data density function. Our theoretical results reveal the advantage of the self-tuned kernel over the fixed-bandwidth kernel via smaller variance error in low-density regions. In the algorithm, no prior knowledge of $$d$$ or data density is needed. The theoretical results are supported by numerical experiments on simulated data and hand-written digit image data.  more » « less
Award ID(s):
2007040
PAR ID:
10335267
Author(s) / Creator(s):
;
Date Published:
Journal Name:
Information and Inference: A Journal of the IMA
ISSN:
2049-8772
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract We study the statistical estimation problem of orthogonal group synchronization and rotation group synchronization. The model is $$Y_{ij} = Z_i^* Z_j^{*T} + \sigma W_{ij}\in{\mathbb{R}}^{d\times d}$$ where $$W_{ij}$$ is a Gaussian random matrix and $$Z_i^*$$ is either an orthogonal matrix or a rotation matrix, and each $$Y_{ij}$$ is observed independently with probability $$p$$. We analyze an iterative polar decomposition algorithm for the estimation of $Z^*$ and show it has an error of $$(1+o(1))\frac{\sigma ^2 d(d-1)}{2np}$$ when initialized by spectral methods. A matching minimax lower bound is further established that leads to the optimality of the proposed algorithm as it achieves the exact minimax risk. 
    more » « less
  2. null (Ed.)
    ABSTRACT We report three-dimensional hydrodynamical simulations of shocks ($${\cal M_{\rm shock}}\ge 4$$) interacting with fractal multicloud layers. The evolution of shock–multicloud systems consists of four stages: a shock-splitting phase in which reflected and refracted shocks are generated, a compression phase in which the forward shock compresses cloud material, an expansion phase triggered by internal heating and shock re-acceleration, and a mixing phase in which shear instabilities generate turbulence. We compare multicloud layers with narrow ($$\sigma _{\rho }=1.9\bar{\rho }$$) and wide ($$\sigma _{\rho }=5.9\bar{\rho }$$) lognormal density distributions characteristic of Mach ≈ 5 supersonic turbulence driven by solenoidal and compressive modes. Our simulations show that outflowing cloud material contains imprints of the density structure of their native environments. The dynamics and disruption of multicloud systems depend on the porosity and the number of cloudlets in the layers. ‘Solenoidal’ layers mix less, generate less turbulence, accelerate faster, and form a more coherent mixed-gas shell than the more porous ‘compressive’ layers. Similarly, multicloud systems with more cloudlets quench mixing via a shielding effect and enhance momentum transfer. Mass loading of diffuse mixed gas is efficient in all models, but direct dense gas entrainment is highly inefficient. Dense gas only survives in compressive clouds, but has low speeds. If normalized with respect to the shock-passage time, the evolution shows invariance for shock Mach numbers ≥10 and different cloud-generating seeds, and slightly weaker scaling for lower Mach numbers and thinner cloud layers. Multicloud systems also have better convergence properties than single-cloud systems, with a resolution of eight cells per cloud radius being sufficient to capture their overall dynamics. 
    more » « less
  3. The third Painlevé equation in its generic form, often referred to as Painlevé-III($$D_6$$), is given by $$ \frac{{\rm d}^2u}{{\rm d}x^2} =\frac{1}{u}\left(\frac{{\rm d}u}{{\rm d}x} \right)^2-\frac{1}{x} \frac{{\rm d}u}{{\rm d}x} + \frac{\alpha u^2 + \beta}{x}+4u^3-\frac{4}{u}, \qquad \alpha,\beta \in \mathbb C. $$ Starting from a generic initial solution $$u_0(x)$$ corresponding to parameters $$\alpha$$, $$\beta$$, denoted as the triple $$(u_0(x),\alpha,\beta)$$, we apply an explicit Bäcklund transformation to generate a family of solutions $$(u_n(x),\alpha + 4n,\beta + 4n)$$ indexed by $$n \in \mathbb N$$. We study the large $$n$$ behavior of the solutions $$(u_n(x), \alpha + 4n, \beta + 4n)$$ under the scaling $x = z/n$ in two different ways: (a) analyzing the convergence properties of series solutions to the equation, and (b) using a Riemann-Hilbert representation of the solution $$u_n(z/n)$$. Our main result is a proof that the limit of solutions $$u_n(z/n)$$ exists and is given by a solution of the degenerate Painlevé-III equation, known as Painlevé-III($$D_8$$), $$ \frac{{\rm d}^2U}{{\rm d}z^2} =\frac{1}{U}\left(\frac{{\rm d}U}{{\rm d}z}\right)^2-\frac{1}{z} \frac{{\rm d}U}{{\rm d}z} + \frac{4U^2 + 4}{z}.$$ A notable application of our result is to rational solutions of Painlevé-III($$D_6$$), which are constructed using the seed solution $(1,4m,-4m)$ where $$m \in \mathbb C \setminus \big(\mathbb Z + \frac{1}{2}\big)$$ and can be written as a particular ratio of Umemura polynomials. We identify the limiting solution in terms of both its initial condition at $z = 0$ when it is well defined, and by its monodromy data in the general case. Furthermore, as a consequence of our analysis, we deduce the asymptotic behavior of generic solutions of Painlevé-III, both $$D_6$$ and $$D_8$$ at $z = 0$. We also deduce the large $$n$$ behavior of the Umemura polynomials in a neighborhood of $z = 0$. 
    more » « less
  4. Abstract In the (special) smoothing spline problem one considers a variational problem with a quadratic data fidelity penalty and Laplacian regularization. Higher order regularity can be obtained via replacing the Laplacian regulariser with a poly-Laplacian regulariser. The methodology is readily adapted to graphs and here we consider graph poly-Laplacian regularization in a fully supervised, non-parametric, noise corrupted, regression problem. In particular, given a dataset$$\{x_i\}_{i=1}^n$$ { x i } i = 1 n and a set of noisy labels$$\{y_i\}_{i=1}^n\subset \mathbb {R}$$ { y i } i = 1 n R we let$$u_n{:}\{x_i\}_{i=1}^n\rightarrow \mathbb {R}$$ u n : { x i } i = 1 n R be the minimizer of an energy which consists of a data fidelity term and an appropriately scaled graph poly-Laplacian term. When$$y_i = g(x_i)+\xi _i$$ y i = g ( x i ) + ξ i , for iid noise$$\xi _i$$ ξ i , and using the geometric random graph, we identify (with high probability) the rate of convergence of$$u_n$$ u n togin the large data limit$$n\rightarrow \infty $$ n . Furthermore, our rate is close to the known rate of convergence in the usual smoothing spline model. 
    more » « less
  5. Given only a finite collection of points sampled from a Riemannian manifold embedded in a Euclidean space, in this paper we propose a new method to numerically solve elliptic and parabolic partial differential equations (PDEs) supplemented with boundary conditions. Since the construction of triangulations on unknown manifolds can be both difficult and expensive, both in terms of computational and data requirements, our goal is to solve these problems without a triangulation. Instead, we rely only on using the sample points to define quadrature formulas on the unknown manifold. Our main tool is the diffusion maps algorithm. We re-analyze this well-known method in a variational sense for manifolds with boundary. Our main result is that the variational diffusion maps graph Laplacian is a consistent estimator of the Dirichlet energy on the manifold. This improves upon previous results and provides a rigorous justification of the well-known relationship between diffusion maps and the Neumann eigenvalue problem. Moreover, using semigeodesic coordinates we derive the first uniform asymptotic expansion of the diffusion maps kernel integral operator for manifolds with boundary. This expansion relies on a novel lemma which relates the extrinsic Euclidean distance to the coordinate norm in a normal collar of the boundary. We then use a recently developed method of estimating the distance to boundary function (notice that the boundary location is assumed to be unknown) to construct a consistent estimator for boundary integrals. Finally, by combining these various estimators, we illustrate how to impose Dirichlet and Neumann conditions for some common PDEs based on the Laplacian. Several numerical examples illustrate our theoretical findings. 
    more » « less