skip to main content


Title: Adaptive nonparametric regression with the K-nearest neighbour fused lasso
Summary The fused lasso, also known as total-variation denoising, is a locally adaptive function estimator over a regular grid of design points. In this article, we extend the fused lasso to settings in which the points do not occur on a regular grid, leading to a method for nonparametric regression. This approach, which we call the $K$-nearest-neighbours fused lasso, involves computing the $K$-nearest-neighbours graph of the design points and then performing the fused lasso over this graph. We show that this procedure has a number of theoretical advantages over competing methods: specifically, it inherits local adaptivity from its connection to the fused lasso, and it inherits manifold adaptivity from its connection to the $K$-nearest-neighbours approach. In a simulation study and an application to flu data, we show that excellent results are obtained. For completeness, we also study an estimator that makes use of an $\epsilon$-graph rather than a $K$-nearest-neighbours graph and contrast it with the $K$-nearest-neighbours fused lasso.  more » « less
Award ID(s):
1712996
NSF-PAR ID:
10383980
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Biometrika
Volume:
107
Issue:
2
ISSN:
0006-3444
Page Range / eLocation ID:
293 to 310
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In sparse linear regression, the SLOPE estimator generalizes LASSO by assigning magnitude-dependent regular- izations to different coordinates of the estimate. In this paper, we present an asymptotically exact characterization of the performance of SLOPE in the high-dimensional regime where the number of unknown parameters grows in proportion to the number of observations. Our asymptotic characterization enables us to derive optimal regularization sequences to either minimize the MSE or to maximize the power in variable selection under any given level of Type-I error. In both cases, we show that the optimal design can be recast as certain infinite-dimensional convex optimization problems, which have efficient and accurate finite-dimensional approximations. Numerical simulations verify our asymptotic predictions. They also demonstrate the superi- ority of our optimal design over LASSO and a regularization sequence previously proposed in the literature. 
    more » « less
  2. In this paper, we propose MetaMobi, a novel spatio-temporal multi-dots connectivity-aware modeling and Meta model update approach for crowd Mobility learning. MetaMobi analyzes real-world Wi-Fi association data collected from our campus wireless infrastructure, with the goal towards enabling a smart connected campus. Specifically, MetaMobi aims at addressing the following two major challenges with existing crowd mobility sensing system designs: (a) how to handle the spatially, temporally, and contextually varying features in large-scale human crowd mobility distributions; and (b) how to adapt to the impacts of such crowd mobility patterns as well as the dynamic changes in crowd sensing infrastructures. To handle the first challenge, we design a novel multi-dots connectivity-aware learning approach, which jointly learns the crowd flow time series of multiple buildings with fusion of spatial graph connectivities and temporal attention mechanisms. Furthermore, to overcome the adaptivity issues due to changes in the crowd sensing infrastructures (e.g., installation of new ac- cess points), we further design a novel meta model update approach with Bernoulli dropout, which mitigates the over- fitting behaviors of the model given few-shot distributions of new crowd mobility datasets. Extensive experimental evaluations based on the real-world campus wireless dataset (including over 76 million Wi-Fi association and disassociation records) demonstrate the accuracy, effectiveness, and adaptivity of MetaMobi in forecasting the campus crowd flows, with 30% higher accuracy compared to the state-of-the-art approaches. 
    more » « less
  3. We initiate the algorithmic study of retracting a graph into a cycle in the graph, which seeks a mapping of the graph vertices to the cycle vertices so as to minimize the maximum stretch of any edge, subject to the constraint that the restriction of the mapping to the cycle is the identity map. This problem has its roots in the rich theory of retraction of topological spaces, and has strong ties to well-studied metric embedding problems such as minimum bandwidth and0-extension. Our first result is anO(min{k,√n})-approximation for retracting any graph on n nodes to a cycle with k nodes. We also show a surprising connection to Sperner’s Lemma that rules out the possibility of improving this result using certain natural convex relaxations of the problem. Nevertheless, if the problem is restricted to planar graphs, we show that we can overcome these integrality gaps by giving an optimal combinatorial algorithm, which is the technical centerpiece of the paper. Building on our planar graph algorithm, we also obtain a constant-factor approximation algorithm for retraction of points in the Euclidean plane to a uniform cycle. 
    more » « less
  4. Given a collection of vectors, the approximate K-nearest-neighbor graph (KGraph for short) connects every vector to its approximate K-nearest-neighbors (KNN for short). KGraph plays an important role in high dimensional data visualization, semantic search, manifold learning, and machine learning. The vectors are typically vector representations of real-world objects (e.g., images and documents), which often come with a few structured attributes, such as times-tamps and locations. In this paper, we study the all-range approximate K-nearest-neighbor graph (ARKGraph) problem. Specifically, given a collection of vectors, each associated with a numerical search key (e.g., a timestamp), we aim to build an index that takes a search key range as the query and returns the KGraph of vectors whose search keys are within the query range. ARKGraph can facilitate interactive high dimensional data visualization, data mining, etc. A key challenge of this problem is the huge index size. This is because, given n vectors, a brute-force index stores a KGraph for every search key range, which results in O (K n 3 ) index size as there are O ( n 2 ) search key ranges and each KGraph takes O (K n ) space. We observe that the KNN of a vector in nearby ranges are often the same, which can be grouped together to save space. Based on this observation, we propose a series of novel techniques that reduce the index size significantly to just O (K n log n ) in the average case. Furthermore, we develop an efficient indexing algorithm that constructs the optimized ARKGraph index directly without exhaustively calculating the distance between every pair of vectors. To process a query, for each vector in the query range, we only need O (log log n + K log K) to restore its KNN in the query range from the optimized ARKGraph index. We conducted extensive experiments on real-world datasets. Experimental results show that our optimized ARKGraph index achieved a small index size, low query latency, and good scalability. Specifically, our approach was 1000x faster than the baseline method that builds a KGraph for all the vectors in the query range on-the-fly. 
    more » « less
  5. Abstract

    We consider estimating average treatment effects (ATE) of a binary treatment in observational data when data‐driven variable selection is needed to select relevant covariates from a moderately large number of available covariates . To leverage covariates among predictive of the outcome for efficiency gain while using regularization to fit a parametric propensity score (PS) model, we consider a dimension reduction of based on fitting both working PS and outcome models using adaptive LASSO. A novel PS estimator, the Double‐index Propensity Score (DiPS), is proposed, in which the treatment status is smoothed over the linear predictors for from both the initial working models. The ATE is estimated by using the DiPS in a normalized inverse probability weighting estimator, which is found to maintain double robustness and also local semiparametric efficiency with a fixed number of covariatesp. Under misspecification of working models, the smoothing step leads to gains in efficiency and robustness over traditional doubly robust estimators. These results are extended to the case wherepdiverges with sample size and working models are sparse. Simulations show the benefits of the approach in finite samples. We illustrate the method by estimating the ATE of statins on colorectal cancer risk in an electronic medical record study and the effect of smoking on C‐reactive protein in the Framingham Offspring Study.

     
    more » « less