NSF PAR Search | NSF Public Access Repository

A Comparison of Top-k Threshold Estimation Techniques for Disjunctive Query Processing

https://doi.org/10.1145/3340531.3412080

Mallia, Antonio; Siedlaczek, Michal; Sun, Mengyang; Suel, Torsten (October 2020, ACM)

Forward Index Compression for Instance Retrieval in an Augmented Reality Application

https://doi.org/10.1109/BigData47090.2019.9006023

Wang, Qi; Siedlaczek, Michal; Chen, Yen-Yu; Gormish, Michael; Suel, Torsten (December 2019, 2019 IEEE International Conference on Big Data)

Instance retrieval systems are widely used in applications such as robot navigation, medical diagnosis, and augmented reality. Blippar is a company that creates augmented reality experiences and provides tools for building them. We focus on one of the company's augmented-reality applications, with which users point their phone cameras at objects to receive information about them in real time. We provide what we believe to be the first study of forward index compression techniques for such instance retrieval systems. First, we analyze real-world data from a large-scale commercial instance retrieval system run by Blippar and focused on augmented reality. Then we propose an entropy-based lossless compression strategy. Experiments show that our proposed Huffman-based approach outperforms a variety of other compression techniques, while also slightly increasing overall system efficiency.
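
As a rough illustration of what entropy-based lossless compression of a forward index entry can look like, here is a minimal Huffman coding sketch in Python. It is not the authors' implementation: the function names and the sample list of visual-word IDs are hypothetical, and codes are kept as bit strings for readability rather than packed into bytes.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a prefix code (symbol -> bit string) from symbol frequencies."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        # Degenerate case: a single distinct symbol still needs one bit.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: partial code}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merge the two lightest subtrees, prefixing their codes with 0 and 1.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

def encode(symbols, code):
    """Concatenate the code words of all symbols."""
    return "".join(code[s] for s in symbols)

def decode(bits, code):
    """Invert the prefix code by scanning the bit string left to right."""
    inverse = {c: s for s, c in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out

# Hypothetical forward-index entry: visual-word IDs observed in one image.
words = [7, 7, 7, 7, 3, 3, 9, 7, 3, 7]
code = huffman_code(words)
bits = encode(words, code)
```

Frequent symbols receive shorter code words, so a skewed distribution of IDs compresses below a fixed-width encoding, and decoding recovers the original sequence exactly.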
An Experimental Study of Index Compression and DAAT Query Processing Methods

https://doi.org/10.1007/978-3-030-15712-8_23

Mallia, Antonio; Siedlaczek, Michal; Suel, Torsten (January 2019, European Conference on Information Retrieval)

In the last two decades, the IR community has seen numerous advances in top-k query processing and inverted index compression techniques. While newly proposed methods are typically compared against several baselines, these evaluations are often very limited, and we feel that there is no clear overall picture on the best choices of algorithms and compression methods. In this paper, we attempt to address this issue by evaluating a number of state-of-the-art index compression methods and safe disjunctive DAAT query processing algorithms. Our goal is to understand how much index compression performance impacts overall query processing speed, how the choice of query processing algorithm depends on the compression method used, and how performance is impacted by document reordering techniques and the number of results returned, keeping in mind that current search engines typically use sets of hundreds or thousands of candidates for further reranking.
Exploiting Global Impact Ordering for Higher Throughput in Selective Search

https://doi.org/10.1007/978-3-030-15719-7_2

Siedlaczek, Michal; Rodriguez, Juan; Suel, Torsten (January 2019, European Conference on Information Retrieval)

We investigate potential benefits of exploiting a global impact ordering in a selective search architecture. We propose a generalized, ordering-aware version of the learning-to-rank-resources framework along with a modified selection strategy. By allowing partial shard processing we are able to achieve a better initial trade-off between query cost and precision than the current state of the art. Thus, our solution is suitable for increasing query throughput during periods of peak load or in low-resource systems.
PISA: Performant Indexes and Search for Academia

Mallia, Antonio; Siedlaczek, Michal; Mackenzie, Joel; Suel, Torsten (January 2019, Proceedings of the Open-Source IR Replicability Challenge)

Performant Indexes and Search for Academia (PISA) is an experimental search engine that focuses on efficient implementations of state-of-the-art representations and algorithms for text retrieval. In this work, we outline our effort in creating a replicable search run from PISA for the 2019 Open Source Information Retrieval Replicability Challenge, which encourages the information retrieval community to produce replicable systems through the use of a containerized, Docker-based infrastructure. We also discuss the origins, current functionality, and future direction and challenges for the PISA system.
Fast Bag-Of-Words Candidate Selection in Content-Based Instance Retrieval Systems

https://doi.org/10.1109/BigData.2018.8621935

Siedlaczek, Michal; Wang, Qi; Chen, Yen-Yu; Suel, Torsten (December 2018, 2018 IEEE International Conference on Big Data)

Many content-based image search and instance retrieval systems implement bag-of-visual-words strategies for candidate selection. Visual processing of an image results in hundreds of visual words that make up a document, and these words are used to build an inverted index. Query processing then consists of an initial candidate selection phase that queries the inverted index, followed by more complex reranking of the candidates using various image features. The initial phase typically uses disjunctive top-k query processing algorithms originally proposed for searching text collections. Our objective in this paper is to optimize the performance of disjunctive top-k computation for candidate selection in content-based instance retrieval systems. While there has been extensive previous work on optimizing this phase for textual search engines, we are unaware of any published work that studies this problem for instance retrieval, where both index and query data are quite different from the distributions commonly found and exploited in the textual case. Using data from a commercial large-scale instance retrieval system, we address this challenge in three steps. First, we analyze the quantitative properties of index structures and queries in the system, and discuss how they differ from the case of text retrieval. Second, we describe an optimized term-at-a-time retrieval strategy that significantly outperforms baseline term-at-a-time and document-at-a-time strategies, achieving up to 66% speed-up over the most efficient baseline. Finally, we show that due to the different properties of the data, several common safe and unsafe early termination techniques from the literature fail to provide any significant performance benefits.
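
The candidate selection phase described above can be sketched as a baseline term-at-a-time disjunctive top-k computation over an inverted index. This is only an illustrative baseline, not the optimized strategy from the paper: the function names, the sample documents, and the use of raw term frequency as the score are all assumptions made for the example.

```python
import heapq
from collections import defaultdict

def build_inverted_index(docs):
    """Map each (visual) word to a postings list of (doc_id, term_frequency)."""
    index = defaultdict(list)
    for doc_id, terms in enumerate(docs):
        counts = defaultdict(int)
        for t in terms:
            counts[t] += 1
        for t, tf in counts.items():
            index[t].append((doc_id, tf))
    return index

def taat_top_k(index, query, k):
    """Term-at-a-time: traverse one postings list fully before the next."""
    accumulators = defaultdict(float)
    for term in set(query):
        for doc_id, tf in index.get(term, []):
            # Placeholder score (assumption): sum of raw term frequencies.
            accumulators[doc_id] += tf
    # Disjunctive semantics: any document matching at least one term qualifies.
    # Ties are broken in favor of the smaller document ID.
    return heapq.nlargest(
        k, accumulators.items(), key=lambda kv: (kv[1], -kv[0])
    )

# Hypothetical documents, each a bag of visual-word IDs.
docs = [[1, 2, 2, 5], [2, 3], [5, 5, 5, 1], [4]]
index = build_inverted_index(docs)
result = taat_top_k(index, query=[2, 5], k=2)
```

A document-at-a-time strategy would instead advance all postings lists in parallel by document ID, which changes memory access patterns but yields the same top-k result for a safe algorithm.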