There is a natural tension between encouraging a diverse ecosystem of open-source search engines and supporting fair, replicable comparisons across those systems. To balance these two goals, we examine two approaches to providing interoperability between the inverted indexes of several systems. The first takes advantage of internal abstractions around index structures and builds wrappers that allow one system to directly read the indexes of another. The second involves sharing indexes across systems via a data exchange specification that we have developed, called the Common Index File Format (CIFF). We demonstrate the first approach with the Java systems Anserini and Terrier, and the second approach with Anserini, JASSv2, OldDog, PISA, and Terrier. Together, these systems provide a wide range of implementations and features, with different research goals. Overall, we recommend CIFF as a low-effort approach to support independent innovation while enabling the types of fair evaluations that are critical for driving the field forward.
PISA: Performant Indexes and Search for Academia
Performant Indexes and Search for Academia (PISA) is an experimental search engine that focuses on efficient implementations of state-of-the-art representations and algorithms for text retrieval. In this work, we outline our effort in creating a replicable search run from PISA for the 2019 Open Source Information Retrieval Replicability Challenge, which encourages the information retrieval community to produce replicable systems through the use of a containerized, Docker-based infrastructure. We also discuss the origins, current functionality, and future direction and challenges for the PISA system.
- Award ID(s):
- 1718680
- PAR ID:
- 10171641
- Date Published:
- Journal Name:
- Proceedings of the Open-Source IR Replicability Challenge
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Developing a universal model that can efficiently and effectively respond to a wide range of information access requests, from retrieval to recommendation to question answering, has been a long-standing goal in the information retrieval community. This paper argues that the flexibility, efficiency, and effectiveness brought by recent developments in dense retrieval and approximate nearest neighbor search have smoothed the path towards achieving this goal. We develop a generic and extensible dense retrieval framework, called framework, that can handle a wide range of (personalized) information access requests, such as keyword search, query by example, and complementary item recommendation. Our proposed approach extends the capabilities of dense retrieval models for ad-hoc retrieval tasks by incorporating user-specific preferences through the development of a personalized attentive network. This allows for a more tailored and accurate personalized information access experience. Our experiments on real-world e-commerce data suggest the feasibility of developing universal information access models by demonstrating significant improvements even compared to competitive baselines specifically developed for each of these individual information access tasks. This work opens up a number of fundamental research directions for future exploration.
-
Existing learning-to-rank models for information retrieval are trained on explicit or implicit query-document relevance information. In this paper, we study the task of learning a retrieval model based on user-item interactions. Our model has potential applications to systems with rich user-item interaction data, such as browsing and recommendation, in which an accurate search engine is desired. This includes media streaming services and e-commerce websites, among others. Inspired by neural approaches to collaborative filtering and language modeling approaches to information retrieval, our model is jointly optimized to predict user-item interactions and reconstruct the item textual descriptions. In more detail, our model learns user and item representations such that they can accurately predict future user-item interactions, while generating an effective unigram language model for each item. Our experiments on four diverse datasets in the context of movie and product search and recommendation demonstrate that our model substantially outperforms competitive retrieval baselines, in addition to providing comparable performance to state-of-the-art hybrid recommendation models.
-
As children search the internet for materials, they often turn to search engines that, unfortunately, offer children little support as they formulate queries to initiate the search process or examine resources for relevance. While some solutions have been proposed to address this, inherent to this issue is the need to evaluate the effectiveness of these solutions. We posit that the evaluation of the diverse aspects involved in the search process – from query suggestion generation to resource retrieval – requires a complex, multi-faceted approach that draws on evaluation methods utilized in human-computer interaction, information retrieval, natural language processing, education, and psychology.
-
Researchers in interactive information retrieval (IIR) have studied and refined 2D presentations of search results for years. Recent advances are bringing augmented reality (AR) and virtual reality (VR) to real-world systems, though the IIR community has done relatively little work to explore and understand aspects of 3D presentations of search results, effects of immersive environments, and the impacts of spatial cognition and different spatial arrangements of results displays in 3D. In the research proposed here, I outline my plan to use immersive environments to investigate how users’ spatial cognition may influence the information retrieval process. Specifically, this work will observe how spatial arrangements of search results affect users’ ability to find information in the post-query, visual search phase of the IIR process across quantitative and qualitative measures.

