
Title: Adaptive nonparametric regression with the K-nearest neighbour fused lasso
Summary: The fused lasso, also known as total-variation denoising, is a locally adaptive function estimator over a regular grid of design points. In this article, we extend the fused lasso to settings in which the points do not occur on a regular grid, leading to a method for nonparametric regression. This approach, which we call the $K$-nearest-neighbours fused lasso, involves computing the $K$-nearest-neighbours graph of the design points and then performing the fused lasso over this graph. We show that this procedure has a number of theoretical advantages over competing methods: specifically, it inherits local adaptivity from its connection to the fused lasso, and it inherits manifold adaptivity from its connection to the $K$-nearest-neighbours approach. In a simulation study and an application to flu data, we show that excellent results are obtained. For completeness, we also study an estimator that makes use of an $\epsilon$-graph rather than a $K$-nearest-neighbours graph and contrast it with the $K$-nearest-neighbours fused lasso.
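The two steps named in the summary (build the $K$-nearest-neighbours graph, then run the fused lasso over it) can be sketched as follows. This is a minimal illustration, not the authors' implementation. The function names `knn_edges`, `incidence_matrix` and `graph_fused_lasso` are hypothetical. The rule that joins two points when either is among the other's $K$ nearest neighbours, the generic ADMM solver, and the values of `k`, `lam`, `rho` and `n_iter` are all assumptions made for this example.

```python
import numpy as np

def knn_edges(X, k):
    """Undirected K-NN edges (i, j), i < j. Assumed rule: join i and j
    when either point is among the other's k nearest neighbours."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)  # squared distances
    np.fill_diagonal(d2, np.inf)            # a point is not its own neighbour
    nbrs = np.argsort(d2, axis=1)[:, :k]    # k closest indices per row
    edges = {(min(i, int(j)), max(i, int(j))) for i in range(n) for j in nbrs[i]}
    return sorted(edges)

def incidence_matrix(edges, n):
    """Edge incidence matrix D: the row for edge (i, j) has +1 at i and -1 at j."""
    D = np.zeros((len(edges), n))
    for r, (i, j) in enumerate(edges):
        D[r, i], D[r, j] = 1.0, -1.0
    return D

def graph_fused_lasso(y, D, lam, rho=1.0, n_iter=2000):
    """Approximately minimise 0.5 * ||y - theta||^2 + lam * ||D theta||_1
    with a plain ADMM loop (a generic solver chosen for this sketch)."""
    n = y.shape[0]
    A_inv = np.linalg.inv(np.eye(n) + rho * D.T @ D)  # fixed system matrix
    z = np.zeros(D.shape[0])
    u = np.zeros(D.shape[0])
    theta = y.copy()
    for _ in range(n_iter):
        theta = A_inv @ (y + rho * D.T @ (z - u))                # quadratic step
        v = D @ theta + u
        z = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)  # soft-threshold
        u = v - z                                                # dual update
    return theta

# Usage on synthetic data: a noisy step function over scattered 2-D points.
rng = np.random.default_rng(0)
X = rng.uniform(size=(60, 2))
y = (X[:, 0] > 0.5).astype(float) + 0.1 * rng.standard_normal(60)
D = incidence_matrix(knn_edges(X, k=5), n=60)
theta_hat = graph_fused_lasso(y, D, lam=0.5)
```

The dense matrix inverse keeps the sketch short but scales poorly. For larger samples, a sparse incidence matrix and a sparse linear solve would be the more natural choice.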
Page Range or eLocation-ID: 293 to 310
Sponsoring Org: National Science Foundation
More Like this
  1. In sparse linear regression, the SLOPE estimator generalizes LASSO by assigning magnitude-dependent regularizations to different coordinates of the estimate. In this paper, we present an asymptotically exact characterization of the performance of SLOPE in the high-dimensional regime where the number of unknown parameters grows in proportion to the number of observations. Our asymptotic characterization enables us to derive optimal regularization sequences to either minimize the MSE or to maximize the power in variable selection under any given level of Type-I error. In both cases, we show that the optimal design can be recast as certain infinite-dimensional convex optimization problems, which have efficient and accurate finite-dimensional approximations. Numerical simulations verify our asymptotic predictions. They also demonstrate the superiority of our optimal design over LASSO and a regularization sequence previously proposed in the literature.
  2. We initiate the algorithmic study of retracting a graph into a cycle in the graph, which seeks a mapping of the graph vertices to the cycle vertices so as to minimize the maximum stretch of any edge, subject to the constraint that the restriction of the mapping to the cycle is the identity map. This problem has its roots in the rich theory of retraction of topological spaces, and has strong ties to well-studied metric embedding problems such as minimum bandwidth and 0-extension. Our first result is an O(min{k,√n})-approximation for retracting any graph on n nodes to a cycle with k nodes. We also show a surprising connection to Sperner’s Lemma that rules out the possibility of improving this result using certain natural convex relaxations of the problem. Nevertheless, if the problem is restricted to planar graphs, we show that we can overcome these integrality gaps by giving an optimal combinatorial algorithm, which is the technical centerpiece of the paper. Building on our planar graph algorithm, we also obtain a constant-factor approximation algorithm for retraction of points in the Euclidean plane to a uniform cycle.
  3. In this paper, we propose MetaMobi, a novel spatio-temporal multi-dots connectivity-aware modeling and Meta model update approach for crowd Mobility learning. MetaMobi analyzes real-world Wi-Fi association data collected from our campus wireless infrastructure, with the goal towards enabling a smart connected campus. Specifically, MetaMobi aims at addressing the following two major challenges with existing crowd mobility sensing system designs: (a) how to handle the spatially, temporally, and contextually varying features in large-scale human crowd mobility distributions; and (b) how to adapt to the impacts of such crowd mobility patterns as well as the dynamic changes in crowd sensing infrastructures. To handle the first challenge, we design a novel multi-dots connectivity-aware learning approach, which jointly learns the crowd flow time series of multiple buildings with fusion of spatial graph connectivities and temporal attention mechanisms. Furthermore, to overcome the adaptivity issues due to changes in the crowd sensing infrastructures (e.g., installation of new access points), we further design a novel meta model update approach with Bernoulli dropout, which mitigates the overfitting behaviors of the model given few-shot distributions of new crowd mobility datasets. Extensive experimental evaluations based on the real-world campus wireless dataset (including over 76 million Wi-Fi association and disassociation records) demonstrate the accuracy, effectiveness, and adaptivity of MetaMobi in forecasting the campus crowd flows, with 30% higher accuracy compared to the state-of-the-art approaches.
  4. Summary: We consider the supervised classification setting, in which the data consist of p features measured on n observations, each of which belongs to one of K classes. Linear discriminant analysis (LDA) is a classical method for this problem. However, in the high dimensional setting where p≫n, LDA is not appropriate for two reasons. First, the standard estimate for the within-class covariance matrix is singular, and so the usual discriminant rule cannot be applied. Second, when p is large, it is difficult to interpret the classification rule that is obtained from LDA, since it involves all p features. We propose penalized LDA, which is a general approach for penalizing the discriminant vectors in Fisher’s discriminant problem in a way that leads to greater interpretability. The discriminant problem is not convex, so we use a minorization–maximization approach to optimize it efficiently when convex penalties are applied to the discriminant vectors. In particular, we consider the use of L1 and fused lasso penalties. Our proposal is equivalent to recasting Fisher’s discriminant problem as a biconvex problem. We evaluate the performances of the resulting methods on a simulation study, and on three gene expression data sets. We also survey past methods for extending LDA to the high dimensional setting and explore their relationships with our proposal.
  5. Abstract: Kernelized Gram matrix $W$ constructed from data points $\{x_i\}_{i=1}^N$ as $W_{ij}= k_0( \frac{ \| x_i - x_j \|^2} {\sigma ^2} ) $ is widely used in graph-based geometric data analysis and unsupervised learning. An important question is how to choose the kernel bandwidth $\sigma $, and a common practice called self-tuned kernel adaptively sets a $\sigma _i$ at each point $x_i$ by the $k$-nearest neighbor (kNN) distance. When $x_i$s are sampled from a $d$-dimensional manifold embedded in a possibly high-dimensional space, unlike with fixed-bandwidth kernels, theoretical results of graph Laplacian convergence with self-tuned kernels have been incomplete. This paper proves the convergence of graph Laplacian operator $L_N$ to manifold (weighted-)Laplacian for a new family of kNN self-tuned kernels $W^{(\alpha )}_{ij} = k_0( \frac{ \| x_i - x_j \|^2}{ \epsilon \hat{\rho }(x_i) \hat{\rho }(x_j)})/\hat{\rho }(x_i)^\alpha \hat{\rho }(x_j)^\alpha $, where $\hat{\rho }$ is the estimated bandwidth function by kNN and the limiting operator is also parametrized by $\alpha $. When $\alpha = 1$, the limiting operator is the weighted manifold Laplacian $\varDelta _p$. Specifically, we prove the point-wise convergence of $L_N f $ and convergence of the graph Dirichlet form with rates. Our analysis is based on first establishing a $C^0$ consistency for $\hat{\rho }$ which bounds the relative estimation error $|\hat{\rho } - \bar{\rho }|/\bar{\rho }$ uniformly with high probability, where $\bar{\rho } = p^{-1/d}$ and $p$ is the data density function. Our theoretical results reveal the advantage of the self-tuned kernel over the fixed-bandwidth kernel via smaller variance error in low-density regions. In the algorithm, no prior knowledge of $d$ or data density is needed. The theoretical results are supported by numerical experiments on simulated data and hand-written digit image data.