Exploring Size-Speed Trade-Offs in Static Index Pruning

Rodriguez, Juan; Suel, Torsten

doi:10.1109/BigData.2018.8622177

Static index pruning techniques remove postings from inverted index structures in order to decrease index size and query processing cost while minimizing the resulting loss in result quality. A number of authors have proposed pruning techniques that use basic properties of postings, as well as the results of past queries, to decide which postings should be kept. However, many open questions remain, and our goal is to address some of them using a machine-learning-based approach that tries to predict the usefulness of a posting. In this paper, we explore the following questions: (1) How much does an approach that learns from a rich set of features outperform previous work that uses heuristics or only a few features? (2) What is the relationship between index size and query processing speed in static index pruning? We show that an approach that prunes postings using a rich set of features, including post-hits and doc-hits, can significantly outperform previous approaches, and that there is a very pronounced trade-off between index size and query processing speed in static index pruning that has not been previously explored.
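The general idea of feature-based static pruning can be illustrated with a small sketch. It assumes that doc-hits count how often a document appears in the top-k results of past queries, and that post-hits count how often a document appears in the top-k results of a past query containing the posting's term; these definitions follow common usage in query-log-based pruning and are an assumption here. The fixed weights in the hypothetical `score_posting` function stand in for a learned model; the paper's actual model and full feature set are not described in this record. Postings are ranked globally and only a target fraction is kept. The hypothetical `cost_proxy` function counts the postings touched by a set of queries, a crude stand-in for query processing cost.

```python
from collections import Counter
from typing import Dict, List, Tuple

Posting = Tuple[str, int]  # (term, doc_id)


def hit_counts(query_log: List[Tuple[List[str], List[int]]]) -> Tuple[Counter, Counter]:
    """Assumed definitions: doc-hits[d] counts top-k appearances of d;
    post-hits[(t, d)] counts top-k appearances of d for queries containing t."""
    doc_hits: Counter = Counter()
    post_hits: Counter = Counter()
    for terms, top_docs in query_log:
        for d in top_docs:
            doc_hits[d] += 1
            for t in set(terms):
                post_hits[(t, d)] += 1
    return doc_hits, post_hits


def score_posting(p: Posting, impact: Dict[Posting, float], doc_hits: Counter,
                  post_hits: Counter, weights=(1.0, 0.5, 2.0)) -> float:
    """Hypothetical linear scorer; a real system would learn this from data."""
    w_impact, w_doc, w_post = weights
    # Counter returns 0 for unseen keys, so postings never hit score on impact alone.
    return w_impact * impact.get(p, 0.0) + w_doc * doc_hits[p[1]] + w_post * post_hits[p]


def prune(index: Dict[str, List[int]], impact, doc_hits, post_hits,
          keep_fraction: float) -> Dict[str, List[int]]:
    """Keep the globally top-scoring fraction of postings; drop the rest."""
    postings = [(t, d) for t, docs in index.items() for d in docs]
    n_keep = int(len(postings) * keep_fraction)
    ranked = sorted(postings,
                    key=lambda p: score_posting(p, impact, doc_hits, post_hits),
                    reverse=True)
    kept = set(ranked[:n_keep])
    return {t: [d for d in docs if (t, d) in kept] for t, docs in index.items()}


def cost_proxy(index: Dict[str, List[int]], queries: List[List[str]]) -> int:
    """Total posting-list length scanned by the queries (speed stand-in)."""
    return sum(len(index.get(t, [])) for q in queries for t in q)


# Toy example (made-up data for illustration only).
index = {"apple": [1, 2, 3], "pie": [2, 3], "fruit": [1, 4]}
impact = {("apple", 1): 0.2, ("apple", 2): 0.9, ("apple", 3): 0.1,
          ("pie", 2): 0.8, ("pie", 3): 0.3, ("fruit", 1): 0.4, ("fruit", 4): 0.7}
query_log = [(["apple", "pie"], [2, 3]), (["fruit"], [4])]
queries = [terms for terms, _ in query_log]

doc_hits, post_hits = hit_counts(query_log)
pruned = prune(index, impact, doc_hits, post_hits, keep_fraction=0.5)
print(pruned)  # {'apple': [2], 'pie': [2], 'fruit': [4]}
print(cost_proxy(index, queries), cost_proxy(pruned, queries))  # 7 3
```

In this toy run, keeping about half of the postings cuts the cost proxy from 7 to 3. Real query speed also depends on which lists are shortened and on the query processing algorithm, which is where a size-speed trade-off can arise.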

Award ID(s): 1718680

PAR ID: 10171551

Date Published: 2018-12-01

Published in: 2018 IEEE International Conference on Big Data

Pages: 1093 to 1100

Sponsoring Org: National Science Foundation

Free publicly accessible full text (accepted manuscript, conference paper): https://doi.org/10.1109/BigData.2018.8622177