skip to main content

Search for: All records

Award ID contains: 1712996

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Banerjee, Arindam ; Fukumizu, Kenji (Ed.)
    We consider the contextual bandit problem, where a player sequentially makes decisions based on past observations to maximize the cumulative reward. Although many algorithms have been proposed for contextual bandit, most of them rely on finding the maximum likelihood estimator at each iteration, which requires 𝑂(𝑡) time at the 𝑡-th iteration and are memory inefficient. A natural way to resolve this problem is to apply online stochastic gradient descent (SGD) so that the per-step time and memory complexity can be reduced to constant with respect to 𝑡, but a contextual bandit policy based on online SGD updates that balances exploration and exploitation has remained elusive. In this work, we show that online SGD can be applied to the generalized linear bandit problem. The proposed SGD-TS algorithm, which uses a single-step SGD update to exploit past information and uses Thompson Sampling for exploration, achieves 𝑂̃ (𝑇‾‾√) regret with the total time complexity that scales linearly in 𝑇 and 𝑑, where 𝑇 is the total number of rounds and 𝑑 is the number of features. Experimental results show that SGD-TS consistently outperforms existing algorithms on both synthetic and real datasets.
  2. Summary The fused lasso, also known as total-variation denoising, is a locally adaptive function estimator over a regular grid of design points. In this article, we extend the fused lasso to settings in which the points do not occur on a regular grid, leading to a method for nonparametric regression. This approach, which we call the $K$-nearest-neighbours fused lasso, involves computing the $K$-nearest-neighbours graph of the design points and then performing the fused lasso over this graph. We show that this procedure has a number of theoretical advantages over competing methods: specifically, it inherits local adaptivity from its connection to the fused lasso, and it inherits manifold adaptivity from its connection to the $K$-nearest-neighbours approach. In a simulation study and an application to flu data, we show that excellent results are obtained. For completeness, we also study an estimator that makes use of an $\epsilon$-graph rather than a $K$-nearest-neighbours graph and contrast it with the $K$-nearest-neighbours fused lasso.
  3. This paper addresses detecting anomalous patterns in images, time-series, and tensor data when the location and scale of the pattern and the pattern itself is unknown a priori. The multiscale scan statistic convolves the proposed pattern with the image at various scales and returns the maximum of the resulting tensor. Scale corrected multiscale scan statistics apply different standardizations at each scale, and the limiting distribution under the null hypothesis---that the data is only noise---is known for smooth patterns. We consider the problem of simultaneously learning and detecting the anomalous pattern from a dictionary of smooth patterns and a database of many tensors. To this end, we show that the multiscale scan statistic is a subexponential random variable, and prove a chaining lemma for standardized suprema, which may be of independent interest. Then by averaging the statistics over the database of tensors we can learn the pattern and obtain Bernstein-type error bounds. We will also provide a construction of an epsilon-net of the location and scale parameters, providing a computationally tractable approximation with similar error bounds.
  4. In this paper, we propose a listwise approach for constructing user-specific rankings in recommendation systems in a collaborative fashion. We contrast the listwise approach to previous pointwise and pairwise approaches, which are based on treating either each rating or each pairwise comparison as an independent instance respectively. By extending the work of (Cao et al. 2007), we cast listwise collaborative ranking as maximum likelihood under a permutation model which applies probability mass to permutations based on a low rank latent score matrix. We present a novel algorithm called SQL-Rank, which can accommodate ties and missing data and can run in linear time. We develop a theoretical framework for analyzing listwise ranking methods based on a novel representation theory for the permutation model. Applying this framework to collaborative ranking, we derive asymptotic statistical rates as the number of users and items grow together. We conclude by demonstrating that our SQL-Rank method often outperforms current state-of-the-art algorithms for implicit feedback such as Weighted-MF and BPR and achieve favorable results when compared to explicit feedback algorithms such as matrix factorization and collaborative ranking.
  5. We consider the problem of estimating the values of a function over n nodes of a d-dimensional grid graph (having equal side lengths) from noisy observations. The function is assumed to be smooth, but is allowed to exhibit different amounts of smoothness at different regions in the grid. Such heterogeneity eludes classical measures of smoothness from nonparametric statistics, such as Holder smoothness. Meanwhile, total variation (TV) smoothness classes allow for heterogeneity, but are restrictive in another sense: only constant functions count as perfectly smooth (achieve zero TV). To move past this, we define two new higher-order TV classes, based on two ways of compiling the discrete derivatives of a parameter across the nodes. We relate these two new classes to Holder classes, and derive lower bounds on their minimax errors. We also analyze two naturally associated trend filtering methods; when d=2, each is seen to be rate optimal over the appropriate class.
  6. In the 1-dimensional multiple changepoint detection problem, we derive a new fast error rate for the fused lasso estimator, under the assumption that the mean vector has a sparse number of changepoints. This rate is seen to be suboptimal (compared to the minimax rate) by only a factor of loglogn. Our proof technique is centered around a novel construction that we call a lower interpolant. We extend our results to misspecified models and exponential family distributions. We also describe the implications of our error analysis for the approximate screening of changepoints.
  7. A point-to-point process describes a dynamic network where a set of edge events are observed, each of which is associated with a time of occurrence and two vertices lying in their state spaces. This study intends to investigate one application of such processes, using NYC Taxi and Limousine Commission dataset that reports taxi trips between two locations at a certain time. Here a point-to-point process is formed with edge events being taxi trips and the vertices adjacent to the edge events are pick-up and drop-off locations, described by latitude and longitude pairs. The intensity of an edge event can have a temporal dependence in addition to being dependent on a latent, spatially-coherent community structure for the vertices. To this end, we have developed a methodology that estimates a spatially smoothed community structure and localizes temporal change-points for point-to-point processes. By applying this to our dataset, we can explore the spatio-temporal dynamics of the demand of taxi trips. More specifically, with reasonable assumptions, the latent community structure is estimated by spectral partitioning based on a low-rank reconstruction of aggregated taxi-trip network; and the temporal change-point localization can be carried out by solving a matrix group fused LASSO program.