NSF PAR Search | NSF Public Access Repository

Faster Learned Sparse Retrieval with Guided Traversal

https://doi.org/10.1145/3477495.3531774

Mallia, Antonio; Mackenzie, Joel; Suel, Torsten; Tonellotto, Nicola (July 2022, ACM)

Full Text Available
Using Conjunctions for Faster Disjunctive Top-k Queries

https://doi.org/10.1145/3488560.3498489

Siedlaczek, Michał; Mallia, Antonio; Suel, Torsten (February 2022, ACM)

Full Text Available
Learning Passage Impacts for Inverted Indexes

https://doi.org/10.1145/3404835.3463030

Mallia, Antonio; Khattab, Omar; Suel, Torsten; Tonellotto, Nicola (July 2021, ACM)

Full Text Available
Fast Disjunctive Candidate Generation Using Live Block Filtering

https://doi.org/10.1145/3437963.3441813

Mallia, Antonio; Siedlaczek, Michał; Suel, Torsten (March 2021, ACM)

Full Text Available
A Comparison of Top-k Threshold Estimation Techniques for Disjunctive Query Processing

https://doi.org/10.1145/3340531.3412080

Mallia, Antonio; Siedlaczek, Michał; Sun, Mengyang; Suel, Torsten (October 2020, ACM)

Full Text Available
Feature Extraction for Large-Scale Text Collections

https://doi.org/10.1145/3340531.3412773

Gallagher, Luke; Mallia, Antonio; Culpepper, J Shane; Suel, Torsten; Cambazoglu, B Barla (October 2020, ACM)

Full Text Available
GPU-Accelerated Decoding of Integer Lists

https://doi.org/10.1145/3357384.3358067

Mallia, Antonio; Siedlaczek, Michał; Suel, Torsten; Zahran, Mohamed (November 2019, Proceedings of the 28th ACM International Conference on Information and Knowledge Management)

An inverted index is the basic data structure used in most current large-scale information retrieval systems. It can be modeled as a collection of sorted sequences of integers. Many compression techniques for inverted indexes have been studied in the past, with some of them reaching tremendous decompression speeds through the use of SIMD instructions available on modern CPUs. While there has been some work on query processing algorithms for Graphics Processing Units (GPUs), little of it has focused on how to efficiently access compressed index structures, and we see some potential for significant improvements in decompression speed. In this paper, we describe and implement two encoding schemes for index decompression on GPU architectures. Their formats and decoding algorithms are adapted from existing CPU-based compression methods to exploit the execution model and memory hierarchy offered by GPUs. We show that our solutions, GPU-BP and GPU-VByte, achieve significant speedups over their already carefully optimized CPU counterparts.
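To make the "sorted sequences of integers" view concrete, the following is a minimal CPU-side sketch of variable-byte (VByte) coding applied to the gaps of a sorted posting list. It is not the GPU-VByte format from the paper; the function names and the choice of marking the final byte with the high bit are assumptions made for illustration, and other VByte variants use the opposite convention.

```python
from itertools import accumulate


def to_gaps(doc_ids):
    """Turn a sorted list of doc ids into gaps (first id, then differences)."""
    gaps, prev = [], 0
    for doc_id in doc_ids:
        gaps.append(doc_id - prev)
        prev = doc_id
    return gaps


def from_gaps(gaps):
    """Invert to_gaps with a prefix sum."""
    return list(accumulate(gaps))


def vbyte_encode(numbers):
    """Encode non-negative integers using 7 payload bits per byte.

    Assumed convention: the high bit is set on the LAST byte of each integer.
    """
    out = bytearray()
    for n in numbers:
        while n >= 128:
            out.append(n & 0x7F)  # low 7 bits, more bytes follow
            n >>= 7
        out.append(n | 0x80)      # final byte carries the terminator bit
    return bytes(out)


def vbyte_decode(data):
    """Decode a byte string produced by vbyte_encode."""
    numbers, value, shift = [], 0, 0
    for byte in data:
        if byte & 0x80:           # terminator bit: integer is complete
            numbers.append(value | ((byte & 0x7F) << shift))
            value, shift = 0, 0
        else:
            value |= byte << shift
            shift += 7
    return numbers


doc_ids = [3, 7, 130, 131, 20000]
encoded = vbyte_encode(to_gaps(doc_ids))
decoded = from_gaps(vbyte_decode(encoded))
```

Small gaps fit in a single byte, which is why gap coding followed by VByte tends to compress dense posting lists well. The decoder above is strictly sequential; parallel decoders have to work around that dependency between bytes.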
Full Text Available
Supporting Interoperability Between Open-Source Search Engines with the Common Index File Format

https://doi.org/10.1145/3397271.3401404

Lin, Jimmy; Mackenzie, Joel; Kamphuis, Chris; Macdonald, Craig; Mallia, Antonio; Siedlaczek, Michał; Trotman, Andrew; de Vries, Arjen (July 2020, ACM)

There exists a natural tension between encouraging a diverse ecosystem of open-source search engines and supporting fair, replicable comparisons across those systems. To balance these two goals, we examine two approaches to providing interoperability between the inverted indexes of several systems. The first takes advantage of internal abstractions around index structures and builds wrappers that allow one system to directly read the indexes of another. The second involves sharing indexes across systems via a data exchange specification that we have developed, called the Common Index File Format (CIFF). We demonstrate the first approach with the Java systems Anserini and Terrier, and the second approach with Anserini, JASSv2, OldDog, PISA, and Terrier. Together, these systems provide a wide range of implementations and features, with different research goals. Overall, we recommend CIFF as a low-effort approach to support independent innovation while enabling the types of fair evaluations that are critical for driving the field forward.
Full Text Available
Efficient top-k document retrieval

Mallia, Antonio (January 2019, Proceedings of the 9th PhD Symposium on Future Directions in Information Access)

Over the past few decades, the IR community has been making a continuous effort to improve the efficiency of search in large collections of documents. Query processing is still one of the main bottlenecks in large-scale search systems. The top-k document retrieval problem, which can be defined as reporting the k most relevant documents from a collection for a given query, can be extremely expensive, as it involves scoring large numbers of documents. In this work, we investigate the top-k document retrieval problem from several angles with the aim of improving the efficiency of this task in large-scale search systems. Finally, we briefly describe our initial findings and conclude by proposing future directions to follow.
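The top-k definition in the abstract above can be illustrated with a small exhaustive baseline: score every candidate and keep only the k best in a min-heap. This is a generic sketch, not a method from the work itself; `top_k` and the `(doc_id, score)` input shape are names assumed for this example.

```python
import heapq


def top_k(scored_docs, k):
    """Return the k highest-scoring (doc_id, score) pairs, best first.

    scored_docs: iterable of (doc_id, score) pairs (hypothetical input shape).
    """
    if k <= 0:
        return []
    heap = []  # min-heap of (score, doc_id); heap[0] is the current k-th best
    for doc_id, score in scored_docs:
        if len(heap) < k:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            # Beats the current threshold: evict the weakest entry.
            heapq.heapreplace(heap, (score, doc_id))
    return [(doc_id, score) for score, doc_id in sorted(heap, reverse=True)]


results = top_k([(0, 1.5), (1, 0.2), (2, 3.1), (3, 2.4)], k=2)
```

The heap keeps memory at O(k), but every document is still scored. That scoring cost is the expense the abstract refers to, and it is what dynamic pruning methods try to avoid.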
Full Text Available
Faster BlockMax WAND with Longer Skipping

https://doi.org/10.1007/978-3-030-15712-8_52

Mallia, Antonio; Porciani, Elia (January 2019, European Conference on Information Retrieval)

One of the major problems for modern search engines is to keep up with the tremendous growth in the size of the web and the number of queries submitted by users. The amount of data being generated today can only be processed and managed with specialized technologies. BlockMax WAND and the more recent Variable BlockMax WAND represent the most advanced query processing algorithms that make use of dynamic pruning techniques, which allow them to retrieve the top k most relevant documents for a given query without any degradation in ranking effectiveness. In this paper, we describe a new technique for the BlockMax WAND family of query processing algorithms, which improves block skipping in order to increase their efficiency. We show that our optimization is able to improve query processing speed on short queries by up to 37% with negligible additional space overhead.
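The idea behind block-level dynamic pruning can be sketched in a few lines. The example below is a deliberately simplified illustration, not BlockMax WAND and not the longer-skipping technique of the paper: it assumes blocks that cover fixed doc-id ranges, positive scores, and in-memory dictionaries, and all names are hypothetical. A block is skipped when the sum of its per-term maximum scores cannot exceed the current top-k threshold, so the result matches exhaustive scoring.

```python
import heapq


def build_block_max(postings, block_size):
    """Map block index -> largest score of any posting in that doc-id range."""
    block_max = {}
    for doc_id, score in postings.items():
        b = doc_id // block_size
        block_max[b] = max(block_max.get(b, 0.0), score)
    return block_max


def top_k_with_block_skipping(query_lists, k, block_size):
    """query_lists: one {doc_id: score} dict per query term (assumed layout).

    Returns (results, blocks_skipped), results as (doc_id, score), best first.
    """
    if k <= 0:
        return [], 0
    block_maxes = [build_block_max(p, block_size) for p in query_lists]
    blocks = sorted({b for bm in block_maxes for b in bm})
    heap, skipped = [], 0  # min-heap of (score, doc_id)
    for b in blocks:
        # Upper bound on the score of any document inside this block.
        upper_bound = sum(bm.get(b, 0.0) for bm in block_maxes)
        if len(heap) == k and upper_bound <= heap[0][0]:
            skipped += 1  # no document here can enter the top k
            continue
        for doc_id in range(b * block_size, (b + 1) * block_size):
            score = sum(p.get(doc_id, 0.0) for p in query_lists)
            if score <= 0.0:
                continue  # document matches no query term
            if len(heap) < k:
                heapq.heappush(heap, (score, doc_id))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, doc_id))
    results = [(d, s) for s, d in sorted(heap, reverse=True)]
    return results, skipped


query_lists = [{0: 1.0, 1: 3.0, 9: 0.5}, {1: 2.0, 5: 0.4, 9: 0.3}]
results, skipped = top_k_with_block_skipping(query_lists, k=1, block_size=4)
```

In this toy input, document 1 sets a threshold of 5.0 in the first block, and the two remaining blocks have upper bounds of 0.4 and 0.8, so both are skipped without scoring any of their documents.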
Full Text Available