Sparse online learning has received extensive attention during
the past few years. Most existing algorithms that
utilize ℓ1-norm regularization or ℓ1-ball projection assume
that the feature space is fixed or changes under explicit
constraints. However, this assumption does not always
hold in many real applications. Motivated by this observation,
we propose a new online learning algorithm tailored for
data streams described by open feature spaces, where new
features can emerge and old features may vanish
over various time spans. Our algorithm, named RSOL, provides
a strategy to adapt quickly to such feature dynamics
by encouraging sparse model representation with an ℓ1- and
ℓ2-mixed regularizer. We leverage the proximal operator of
the ℓ1,2-mixed norm and show that our RSOL algorithm enjoys
a closed-form solution at each iteration. A sub-linear
regret bound for the proposed algorithm is established through
rigorous theoretical analysis. Empirical results benchmarked on
nine streaming datasets validate the effectiveness of the proposed
RSOL method over three state-of-the-art algorithms.
Keywords: online learning, sparse learning, streaming feature
selection, open feature spaces, ℓ1,2 mixed norm
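The abstract states that RSOL's ℓ1- and ℓ2-mixed regularizer admits a closed-form proximal update. As an illustration (not the paper's exact formulation), the proximal operator of a single-group penalty λ1‖w‖1 + λ2‖w‖2 has a well-known closed form: entrywise soft-thresholding followed by shrinkage of the ℓ2 norm of the result. A minimal sketch, with hypothetical parameter names:

```python
import numpy as np

def prox_l1_l2(w, lam1, lam2):
    """Proximal operator of lam1*||w||_1 + lam2*||w||_2 (illustrative).

    Closed form: entrywise soft-thresholding by lam1, then block
    soft-thresholding (l2-norm shrinkage) by lam2.
    """
    # Step 1: entrywise soft-threshold, which zeroes small coordinates.
    v = np.sign(w) * np.maximum(np.abs(w) - lam1, 0.0)
    # Step 2: shrink the l2 norm of the survivor vector.
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return v
    return np.maximum(1.0 - lam2 / norm, 0.0) * v
```

Because both steps are simple elementwise/scalar operations, each online iteration stays cheap, which is what makes a closed-form proximal update attractive in a streaming setting.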
Sketching Linear Classifiers over Data Streams
We introduce a new sub-linear space sketch, the "Weight-Median
Sketch," for learning compressed linear classifiers over data streams
while supporting the efficient recovery of large-magnitude weights
in the model. This enables memory-limited execution of several
statistical analyses over streams, including online feature selection,
streaming data explanation, relative deltoid detection, and
streaming estimation of pointwise mutual information. Unlike related
sketches that capture the most frequently-occurring features
(or items) in a data stream, the Weight-Median Sketch captures
the features that are most discriminative of one stream (or class)
compared to another. The Weight-Median Sketch adopts the core
data structure used in the Count-Sketch, but, instead of sketching
counts, it captures sketched gradient updates to the model parameters.
We provide a theoretical analysis that establishes recovery
guarantees for batch and online learning, and demonstrate empirical
improvements in memory-accuracy trade-offs over alternative
memory-budgeted methods, including count-based sketches and
feature hashing.
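To make the core idea concrete, here is a minimal, illustrative sketch (not the paper's implementation) of a Count-Sketch-style table that accumulates signed gradient updates per feature and recovers a weight estimate as the median across rows. The hash construction, table sizes, and update rule are simplifying assumptions:

```python
import numpy as np

class WeightMedianSketch:
    """Toy version of the idea: a Count-Sketch table that accumulates
    gradient updates to linear-model weights instead of item counts."""

    def __init__(self, rows=5, width=64, seed=0):
        rng = np.random.default_rng(seed)
        self.table = np.zeros((rows, width))
        self.rows, self.width = rows, width
        # Per-row salts drive the (illustrative) bucket and sign hashes.
        self.salts = [int(s) for s in rng.integers(1, 2**31, size=rows)]

    def _bucket(self, r, j):
        return hash((self.salts[r], j)) % self.width

    def _sign(self, r, j):
        return 1.0 if hash((j, self.salts[r])) % 2 == 0 else -1.0

    def update(self, j, g):
        """Fold gradient update g for feature j into every row."""
        for r in range(self.rows):
            self.table[r, self._bucket(r, j)] += self._sign(r, j) * g

    def estimate(self, j):
        """Recover weight j as the median of the signed row estimates;
        the median damps collision noise from other features."""
        vals = [self._sign(r, j) * self.table[r, self._bucket(r, j)]
                for r in range(self.rows)]
        return float(np.median(vals))
```

The memory cost is fixed at rows × width regardless of the feature dimension, which is what enables the memory-limited analyses the abstract describes.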
- PAR ID:
- 10061701
- Date Published:
- Journal Name:
- SIGMOD
- Page Range / eLocation ID:
- 757 to 772
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
The problem of recovering heavy components of a high-dimensional vector from compressed data is of great interest in broad applications, such as feature extraction under scarce computing memory and distributed learning under limited bandwidth. Recently, a compression algorithm called count sketch has gained wide popularity for recovering heavy components in various fields. In this paper, we carefully analyze count sketch and illustrate that its default recovery method, namely median filtering, has a distinct error pattern of reporting false positives. To counteract this error pattern, we propose a new scheme called zero checking, which adopts a two-step recovery approach to improve the probability of detecting false positives. Our proposed technique builds on rigorous error analysis, which enables us to optimize the selection of a key design parameter for maximum performance gain. The empirical results show that our scheme achieves better recovery accuracy than median filtering and requires fewer samples to accurately recover heavy components.
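The following toy code illustrates the two recovery stages discussed above: standard Count-Sketch median recovery, plus a simplified, hypothetical "zero checking" step that vetoes a candidate whenever one of its counters is (near) zero, since a truly heavy component contributes to every counter it hashes to. The paper's actual scheme and its optimized design parameter are not reproduced here:

```python
import numpy as np

def count_sketch(x, rows, width, seed=0):
    """Build a Count-Sketch of vector x: a rows-by-width table of signed sums.
    Bucket/sign hashes are simple salted hashes, for illustration only."""
    table = np.zeros((rows, width))
    for r in range(rows):
        for j, v in enumerate(x):
            b = hash((seed, r, j)) % width
            s = 1.0 if hash((r, j, seed)) % 2 == 0 else -1.0
            table[r, b] += s * v
    return table

def median_recover(table, j, width, seed=0):
    """Default recovery: median of the signed counters hashed to by j."""
    rows = table.shape[0]
    ests = [(1.0 if hash((r, j, seed)) % 2 == 0 else -1.0)
            * table[r, hash((seed, r, j)) % width] for r in range(rows)]
    return float(np.median(ests))

def zero_check(table, j, width, seed=0, tol=1e-9):
    """Two-step recovery (illustrative): if any counter that j hashes to
    is (near) zero, j cannot be heavy, so report 0 instead of the median."""
    rows = table.shape[0]
    if any(abs(table[r, hash((seed, r, j)) % width]) <= tol
           for r in range(rows)):
        return 0.0
    return median_recover(table, j, width, seed)
```

The check only ever turns estimates into zeros, so it can reduce false positives among zero components without disturbing the recovery of genuinely heavy ones.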
-
In this paper, we explore a novel online learning setting, where the online learners are presented with “doubly-streaming” data. Namely, the data instances constantly streaming in are described by feature spaces that evolve over time, with new features emerging and old features fading away. The main challenge of this problem lies in the fact that the newly emerging features are described by very few samples, resulting in weak learners that tend to make erroneous predictions. A seemingly plausible idea to overcome the challenge is to establish a relationship between the old and new feature spaces, so that an online learner can leverage the knowledge learned from the old features to improve the learning performance on the new features. Unfortunately, this idea does not scale up to high-dimensional feature spaces that entail very complex feature interplay. Specifically, a tradeoff between onlineness, which favors shallow learners, and expressiveness, which requires deep models, is inevitable. Motivated by this, we propose a novel paradigm, named Online Learning Deep models from Data of Double Streams (OLD3S), where a shared latent subspace is discovered to summarize information from the old and new feature spaces, building an intermediate feature mapping relationship. A key trait of OLD3S is to treat the model capacity as learnable semantics, aiming to yield the optimal model depth and parameters jointly, in accordance with the complexity and non-linearity of the input data streams, in an online fashion. To validate its efficacy and applicability, two variants of OLD3S are proposed: OLD-Linear, which learns the relationship by a linear function; and OLD-FD, which learns the relationship between two consecutive feature spaces, pre- and post-evolution, with a fixed deep-model depth.
Moreover, instead of restarting the entire learning process from scratch, OLD3S learns multiple newly emerging feature spaces in a lifelong manner, retaining the knowledge from the learned and vanished feature spaces to enjoy a jump-start of the new features’ learning process. Both theoretical analysis and empirical studies substantiate the viability and effectiveness of our proposed approach.
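A minimal sketch of the OLD-Linear idea above, under two simplifying assumptions: the old and new features are exactly linearly related, and a batch least-squares fit (rather than the paper's online procedure) is used during the period when both spaces are observed. All names and dimensions are hypothetical:

```python
import numpy as np

# Overlap period: both feature spaces are observed, so we can fit a
# linear map M from the new space to the old one (OLD-Linear, sketched
# with batch least squares instead of online updates).
rng = np.random.default_rng(0)
X_new = rng.normal(size=(100, 8))        # instances in the new feature space
M_true = rng.normal(size=(8, 5))         # hypothetical ground-truth relation
X_old = X_new @ M_true                   # old features, exactly linear here

M, *_ = np.linalg.lstsq(X_new, X_old, rcond=None)

# After the old features vanish: a model trained on the old space can
# still score an instance that carries only new features, via the map.
w_old = rng.normal(size=5)               # hypothetical old-space linear model
x = rng.normal(size=8)                   # instance with only new features
x_old_hat = x @ M                        # reconstructed old-space features
pred = x_old_hat @ w_old                 # old model applied through the map
```

This captures the "jump-start" intuition: knowledge encoded in the old-space model keeps contributing to predictions on new-space instances instead of being discarded.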
-
We initiate a systematic study of linear sketching over $\mathbb{F}_2$. For a given Boolean function treated as $f \colon \mathbb{F}_2^n \to \mathbb{F}_2$, a randomized $\mathbb{F}_2$-sketch is a distribution $\mathcal{M}$ over $d \times n$ matrices with elements over $\mathbb{F}_2$ such that $\mathcal{M}x$ suffices for computing $f(x)$ with high probability. Such sketches for $d \ll n$ can be used to design small-space distributed and streaming algorithms. Motivated by these applications, we study a connection between $\mathbb{F}_2$-sketching and a two-player one-way communication game for the corresponding XOR-function. We conjecture that $\mathbb{F}_2$-sketching is optimal for this communication game. Our results confirm this conjecture for multiple important classes of functions: 1) low-degree $\mathbb{F}_2$-polynomials, 2) functions with sparse Fourier spectrum, 3) most symmetric functions, 4) the recursive majority function. These results rely on a new structural theorem showing that $\mathbb{F}_2$-sketching is optimal (up to constant factors) for uniformly distributed inputs. Furthermore, we show that (non-uniform) streaming algorithms that have to process random updates over $\mathbb{F}_2$ can be constructed as $\mathbb{F}_2$-sketches for the uniform distribution. In contrast with the previous work of Li, Nguyen and Woodruff (STOC'14), who show an analogous result for linear sketches over the integers in the adversarial setting, our result does not require the stream length to be triply exponential in $n$ and holds for streams of length $\tilde O(n)$ constructed through uniformly random updates.
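As a concrete instance of the first function class above, a degree-1 $\mathbb{F}_2$-polynomial, i.e. a parity over a support set $S$, admits a deterministic $1 \times n$ sketch: take the matrix to be the indicator row of $S$, so that the single sketched bit equals $f(x)$. A toy check, with illustrative choices of $n$, $S$, and $x$:

```python
import numpy as np

# A parity f(x) = XOR of x_i over support S is a degree-1 F_2-polynomial;
# the indicator row of S is a deterministic 1-by-n F_2-sketch for it.
n = 8
S = [1, 3, 6]                          # support of the parity (illustrative)
M = np.zeros((1, n), dtype=np.uint8)
M[0, S] = 1                            # indicator row of S

def f(x):
    # Direct evaluation of the parity over the support.
    return int(np.bitwise_xor.reduce(x[S]))

x = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=np.uint8)
sketch_bit = int((M @ x)[0] % 2)       # matrix-vector product over F_2
```

Here $d = 1 \ll n$, so this is the extreme case of the space savings the abstract describes; richer function classes need larger (and randomized) sketch distributions.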