Practical Data-Dependent Metric Compression with Provable Guarantees

Indyk, Piotr; Razenshteyn, Ilya P.; Wagner, Tal

Citation Details

We introduce a new distance-preserving compact representation of multi-dimensional point-sets. Given n points in a d-dimensional space where each coordinate is represented using B bits (i.e., dB bits per point), it produces a representation of size O(d log(dB/epsilon) + log n) bits per point from which one can approximate the distances up to a factor of 1 + epsilon. Our algorithm almost matches the recent bound of (Indyk et al., 2017) while being much simpler. We compare our algorithm to Product Quantization (PQ) (Jegou et al., 2011), a state-of-the-art heuristic metric compression method. We evaluate both algorithms on several data sets: SIFT, MNIST, New York City taxi time series, and a synthetic one-dimensional data set embedded in a high-dimensional space. Our algorithm produces representations that are comparable to or better than those produced by PQ, while having provable guarantees on its performance.
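To make the size bound in the abstract concrete, the following sketch compares the uncompressed cost of dB bits per point with the expression d log(dB/epsilon) + log n. It is only an illustration under stated assumptions: the constant hidden by the O() notation is not given in the abstract and is set to 1 here, the logarithms are taken base 2, and the function names and the example parameters are hypothetical rather than taken from the paper.

```python
import math


def raw_bits_per_point(d: int, B: int) -> int:
    """Uncompressed size: d coordinates, B bits each."""
    return d * B


def bound_bits_per_point(d: int, B: int, eps: float, n: int, c: float = 1.0) -> float:
    """Evaluate c * (d * log2(d*B/eps) + log2(n)).

    ASSUMPTION: c stands in for the constant hidden by the O() notation;
    its true value is not stated in the abstract, so c = 1 is illustrative.
    ASSUMPTION: logarithms are base 2 (bits).
    """
    if d <= 0 or B <= 0 or n <= 0 or eps <= 0:
        raise ValueError("d, B, n and eps must all be positive")
    return c * (d * math.log2(d * B / eps) + math.log2(n))


if __name__ == "__main__":
    # Hypothetical parameters chosen only for illustration.
    d, B, n = 128, 64, 10**6
    for eps in (0.5, 0.1, 0.01):
        raw = raw_bits_per_point(d, B)
        bound = bound_bits_per_point(d, B, eps, n)
        print(f"eps={eps}: raw={raw} bits, bound expression={bound:.1f} bits")
```

With the constant fixed at 1, the expression grows only logarithmically in B and 1/epsilon, whereas the raw size grows linearly in B. For small B the expression can exceed dB, so the comparison is informative mainly when coordinates are stored at high precision.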

Award ID(s): 1740751; 1447476

PAR ID: 10065217

Author(s) / Creator(s): Indyk, Piotr; Razenshteyn, Ilya P.; Wagner, Tal

Date Published: 2017-01-01

Journal Name: Annual Conference on Neural Information Processing Systems

Page Range / eLocation ID: 2614-2623

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper:
The DOI is not currently available.
