NSF PAR Search | NSF Public Access Repository


Faster Learned Sparse Retrieval with Guided Traversal

https://doi.org/10.1145/3477495.3531774

Mallia, Antonio; Mackenzie, Joel; Suel, Torsten; Tonellotto, Nicola (July 2022, ACM)

Using Conjunctions for Faster Disjunctive Top-k Queries

https://doi.org/10.1145/3488560.3498489

Siedlaczek, Michał; Mallia, Antonio; Suel, Torsten (February 2022, ACM)

Optimizing Iterative Algorithms for Social Network Sharding

https://doi.org/10.1109/BigData52589.2021.9671621

Deng, Zishi; Suel, Torsten (December 2021, IEEE)

Learning Passage Impacts for Inverted Indexes

https://doi.org/10.1145/3404835.3463030

Mallia, Antonio; Khattab, Omar; Suel, Torsten; Tonellotto, Nicola (July 2021, ACM)

Fast Disjunctive Candidate Generation Using Live Block Filtering

https://doi.org/10.1145/3437963.3441813

Mallia, Antonio; Siedlaczek, Michał; Suel, Torsten (March 2021, ACM)

Feature Extraction for Large-Scale Text Collections

https://doi.org/10.1145/3340531.3412773

Gallagher, Luke; Mallia, Antonio; Culpepper, J Shane; Suel, Torsten; Cambazoglu, B Barla (October 2020, ACM)

A Comparison of Top-k Threshold Estimation Techniques for Disjunctive Query Processing

https://doi.org/10.1145/3340531.3412080

Mallia, Antonio; Siedlaczek, Michał; Sun, Mengyang; Suel, Torsten (October 2020, ACM)

Supporting Interoperability Between Open-Source Search Engines with the Common Index File Format

https://doi.org/10.1145/3397271.3401404

Lin, Jimmy; Mackenzie, Joel; Kamphuis, Chris; Macdonald, Craig; Mallia, Antonio; Siedlaczek, Michał; Trotman, Andrew; de Vries, Arjen (July 2020, ACM)

There exists a natural tension between encouraging a diverse ecosystem of open-source search engines and supporting fair, replicable comparisons across those systems. To balance these two goals, we examine two approaches to providing interoperability between the inverted indexes of several systems. The first takes advantage of internal abstractions around index structures to build wrappers that allow one system to read another system's indexes directly. The second shares indexes across systems via a data exchange specification that we have developed, called the Common Index File Format (CIFF). We demonstrate the first approach with the Java systems Anserini and Terrier, and the second with Anserini, JASSv2, OldDog, PISA, and Terrier. Together, these systems cover a wide range of implementations, features, and research goals. Overall, we recommend CIFF as a low-effort way to support independent innovation while enabling the fair evaluations that are critical for driving the field forward.
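The second approach rests on a shared, system-neutral representation of postings that any engine can write and read. The sketch below is a hypothetical illustration of that idea only, assuming each postings list is exported as (docid, tf) pairs with docids stored as gaps; the names `PostingsRecord`, `export_postings`, and `import_postings` are invented for this example. CIFF itself is specified using Protocol Buffers, and this is not its schema.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PostingsRecord:
    """Hypothetical system-neutral postings record (illustrative; not the CIFF schema)."""
    term: str
    df: int                          # document frequency: number of postings
    cf: int                          # collection frequency: sum of term frequencies
    postings: List[Tuple[int, int]]  # (docid gap, tf) pairs


def export_postings(term: str, postings: List[Tuple[int, int]]) -> PostingsRecord:
    """Write (docid, tf) pairs with strictly increasing docids as a gap-encoded record."""
    gaps: List[Tuple[int, int]] = []
    prev = None
    for docid, tf in postings:
        if prev is not None and docid <= prev:
            raise ValueError("docids must be strictly increasing")
        # The first docid is stored as-is; later ones as the difference from the previous docid.
        gaps.append((docid if prev is None else docid - prev, tf))
        prev = docid
    cf = sum(tf for _, tf in postings)
    return PostingsRecord(term=term, df=len(postings), cf=cf, postings=gaps)


def import_postings(record: PostingsRecord) -> List[Tuple[int, int]]:
    """Rebuild absolute (docid, tf) pairs from a gap-encoded record."""
    result: List[Tuple[int, int]] = []
    docid = 0
    for gap, tf in record.postings:
        docid += gap
        result.append((docid, tf))
    return result


# Usage: one system exports, another imports, without sharing internal code.
record = export_postings("search", [(3, 1), (7, 2), (20, 1)])
print(record.postings)          # [(3, 1), (4, 2), (13, 1)]
print(import_postings(record))  # [(3, 1), (7, 2), (20, 1)]
```

The point of the design is that neither side needs the other's in-memory index classes; only the agreed record layout is shared, which is what makes an exchange format lower-effort than writing per-pair wrappers.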
Forward Index Compression for Instance Retrieval in an Augmented Reality Application

https://doi.org/10.1109/BigData47090.2019.9006023

Wang, Qi; Siedlaczek, Michał; Chen, Yen-Yu; Gormish, Michael; Suel, Torsten (December 2019, 2019 IEEE International Conference on Big Data)

Instance retrieval systems are widely used in applications such as robot navigation, medical diagnosis, and augmented reality. Blippar is a company that creates augmented reality experiences and also provides tools for customers to build their own. In this paper, we focus on one of the company's augmented reality applications, which lets users point their phone cameras at objects to receive information about them in real time. We provide what we believe to be the first study of forward index compression techniques for such instance retrieval systems. First, we analyze real-world data from Blippar's large-scale commercial instance retrieval system for augmented reality. We then propose an entropy-based lossless compression strategy. Experiments show that our proposed Huffman-based approach outperforms a variety of other compression techniques while also slightly increasing overall system efficiency.
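The abstract does not spell out the coding details. As a textbook illustration of the kind of entropy coding involved, the sketch below builds a standard Huffman code over a forward-index entry, assuming the entry can be treated as a sequence of integer symbols; `huffman_code`, `encode`, and `decode` are hypothetical names, and this is not the paper's exact scheme.

```python
import heapq
from collections import Counter
from itertools import count
from typing import Dict, Hashable, Iterable, List


def huffman_code(symbols: Iterable[Hashable]) -> Dict[Hashable, str]:
    """Build a prefix-free Huffman code (symbol -> bit string) from symbol frequencies."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a 1-bit code.
        (only,) = freq
        return {only: "0"}
    tiebreak = count()  # keeps heap comparisons away from the dicts when weights tie
    heap = [(w, next(tiebreak), {s: ""}) for s, w in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing their codes with 0 and 1.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


def encode(symbols: Iterable[Hashable], code: Dict[Hashable, str]) -> str:
    """Concatenate the codeword of every symbol."""
    return "".join(code[s] for s in symbols)


def decode(bits: str, code: Dict[Hashable, str]) -> List[Hashable]:
    """Read bits until they match a codeword; prefix-freeness makes this unambiguous."""
    inverse = {c: s for s, c in code.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    return out


entry = [5, 5, 5, 5, 9, 9, 2]  # hypothetical integer symbols from one entry
code = huffman_code(entry)
bits = encode(entry, code)
assert decode(bits, code) == entry
print(len(bits))  # 10
```

Frequent symbols receive shorter codewords, so skewed symbol distributions compress well; the exact bit patterns depend on tie-breaking, but the total encoded length is the same for every optimal Huffman code.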
GPU-Accelerated Decoding of Integer Lists

https://doi.org/10.1145/3357384.3358067

Mallia, Antonio; Siedlaczek, Michał; Suel, Torsten; Zahran, Mohamed (November 2019, Proceedings of the 28th ACM International Conference on Information and Knowledge Management)

An inverted index is the basic data structure used in most current large-scale information retrieval systems. It can be modeled as a collection of sorted sequences of integers. Many compression techniques for inverted indexes have been studied, and some reach very high decompression speeds by using the SIMD instructions available on modern CPUs. While there has been some work on query processing algorithms for Graphics Processing Units (GPUs), little of it has focused on how to efficiently access compressed index structures, and we see potential for significant improvements in decompression speed. In this paper, we describe and implement two encoding schemes for index decompression on GPU architectures. Their formats and decoding algorithms are adapted from existing CPU-based compression methods to exploit the execution model and memory hierarchy of GPUs. We show that our solutions, GPU-BP and GPU-VByte, achieve significant speedups over their already carefully optimized CPU counterparts.
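Because postings are sorted integer sequences, a common pattern is to store the differences between consecutive docids (d-gaps) and then compress those small numbers. The sketch below is a plain scalar CPU illustration of d-gaps plus Variable Byte (VByte) coding, using one common byte convention (7 data bits per byte, high bit meaning "more bytes follow"); the function names are hypothetical, and it is neither GPU-VByte nor the byte layout used in the paper.

```python
from typing import List


def to_gaps(docids: List[int]) -> List[int]:
    """Convert strictly increasing docids to d-gaps; the first docid is kept as-is."""
    gaps, prev = [], None
    for d in docids:
        if prev is not None and d <= prev:
            raise ValueError("docids must be strictly increasing")
        gaps.append(d if prev is None else d - prev)
        prev = d
    return gaps


def from_gaps(gaps: List[int]) -> List[int]:
    """Prefix-sum the d-gaps back into absolute docids."""
    docids, total = [], 0
    for g in gaps:
        total += g
        docids.append(total)
    return docids


def vbyte_encode(values: List[int]) -> bytes:
    """Encode non-negative ints, 7 data bits per byte; high bit set means 'more bytes follow'."""
    out = bytearray()
    for v in values:
        if v < 0:
            raise ValueError("VByte encodes non-negative integers only")
        while v >= 0x80:
            out.append((v & 0x7F) | 0x80)  # low 7 bits, continuation flag set
            v >>= 7
        out.append(v)  # final byte of this value: continuation flag clear
    return bytes(out)


def vbyte_decode(data: bytes) -> List[int]:
    """Inverse of vbyte_encode: accumulate 7-bit groups until a byte without the flag."""
    values, current, shift = [], 0, 0
    for byte in data:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            values.append(current)
            current, shift = 0, 0
    return values


docids = [3, 7, 200, 100000]
encoded = vbyte_encode(to_gaps(docids))
print(len(encoded))  # 7 bytes for gaps [3, 4, 193, 99800]
assert from_gaps(vbyte_decode(encoded)) == docids
```

The byte-at-a-time dependency in `vbyte_decode` (each value's length is only known after reading its bytes) is what makes naive VByte decoding hard to parallelize, which is why CPU and GPU variants reorganize the format.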
