

Title: Answer Interaction in Non-factoid Question Answering Systems
Information retrieval systems are evolving from document retrieval to answer retrieval. Web search logs provide large amounts of data about how people interact with ranked lists of documents, but very little is known about interaction with answer texts. In this paper, we use Amazon Mechanical Turk to investigate three answer presentation and interaction approaches in a non-factoid question answering setting. We find that people perceive and react to good and bad answers very differently, and can identify good answers relatively quickly. Our results provide the basis for further investigation of effective answer interaction and feedback methods.
Award ID(s):
1715095
NSF-PAR ID:
10092532
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
Proceedings of the ACM SIGIR Conference on Human Interaction and Retrieval (CHIIR 19)
Page Range / eLocation ID:
249 to 253
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Relevance feedback techniques assume that users provide relevance judgments for the top k (usually 10) documents and then re-rank using a new query model based on those judgments. Even though this is effective, there has been little recent research on this topic because requiring users to provide substantial feedback on a result list is impractical in a typical web search scenario. In new environments such as voice-based search with smart home devices, however, feedback about result quality can potentially be obtained during users' interactions with the system. Since there are severe limitations on the length and number of results that can be presented in a single interaction in this environment, the focus should move from browsing result lists to iterative retrieval and from retrieving documents to retrieving answers. In this paper, we study iterative relevance feedback techniques with a focus on retrieving answer passages. We first show that iterative feedback can be at least as effective as the top-k approach on standard TREC collections, and more effective on answer passage collections. We then propose an iterative feedback model for answer passages based on semantic similarity at the passage level and show that it can produce significant improvements compared to both word-based iterative feedback models and those based on term-level semantic similarity.
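The iterative loop described above can be sketched in a few lines. This is a minimal, illustrative version, not the paper's model: embeddings are stand-in vectors, cosine similarity stands in for the proposed passage-level semantic similarity, and the query update is a simple Rocchio-style interpolation. One passage is presented per round, matching the single-result constraint of voice interfaces.

```python
import math

def cosine(a, b):
    """Cosine similarity between two plain-list vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb + 1e-9)

def iterative_feedback(query_vec, passage_vecs, judge, rounds=3, alpha=0.5):
    """Show one top-ranked unseen passage per round; fold judged-relevant
    passage vectors back into the query model (Rocchio-style update)."""
    seen = set()
    q = list(query_vec)
    for _ in range(rounds):
        candidates = [i for i in range(len(passage_vecs)) if i not in seen]
        if not candidates:
            break
        top = max(candidates, key=lambda i: cosine(q, passage_vecs[i]))
        seen.add(top)
        if judge(top):  # single-answer feedback, as in a voice interaction
            q = [(1 - alpha) * qi + alpha * pi
                 for qi, pi in zip(q, passage_vecs[top])]
    return q, seen
```

The `judge` callback is a placeholder for the user's per-answer relevance signal; in a deployed system it would come from the interaction itself.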
  2. The growth of the Web in recent years has resulted in the development of various online platforms that provide healthcare information services. These platforms contain an enormous amount of information, which could be beneficial for a large number of people. However, navigating through such knowledge bases to answer specific queries of healthcare consumers is a challenging task. A majority of such queries might be non-factoid in nature, and hence, traditional keyword-based retrieval models do not work well for such cases. Furthermore, in many scenarios, it might be desirable to get a short answer that sufficiently answers the query, instead of a long document with only a small amount of useful information. In this paper, we propose a neural network model for ranking documents for question answering in the healthcare domain. The proposed model uses a deep attention mechanism at the word, sentence, and document levels, enabling efficient retrieval for both factoid and non-factoid queries on documents of varied lengths. Specifically, the word-level cross-attention allows the model to identify words that might be most relevant for a query, and the hierarchical attention at the sentence and document levels allows it to do effective retrieval on both long and short documents. We also construct a new large-scale healthcare question-answering dataset, which we use to evaluate our model. Experimental evaluation results against several state-of-the-art baselines show that our model outperforms the existing retrieval techniques.
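The hierarchical attention idea can be illustrated with a toy sketch. This is not the paper's trained network: vectors are hand-made, and a plain dot-product softmax stands in for learned attention. The structure mirrors the description, though: query-aware attention pools word vectors into sentence vectors, then pools sentence vectors into a document representation that is scored against the query.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attend(query_vec, item_vecs):
    """Attention-weighted average of item vectors w.r.t. the query."""
    weights = softmax([sum(q * v for q, v in zip(query_vec, vec))
                       for vec in item_vecs])
    dim = len(query_vec)
    return [sum(weights[i] * item_vecs[i][d] for i in range(len(item_vecs)))
            for d in range(dim)]

def score_document(query_vec, sentences):
    """sentences: list of sentences, each a list of word vectors.
    Word-level attention -> sentence vectors -> document vector -> score."""
    sent_vecs = [attend(query_vec, words) for words in sentences]
    doc_vec = attend(query_vec, sent_vecs)
    return sum(q * d for q, d in zip(query_vec, doc_vec))
```

A document whose words align with the query receives a higher score than one whose words do not, regardless of document length, which is the property the hierarchical design is after.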
  3.
    Recent work on Question Answering (QA) and Conversational QA (ConvQA) emphasizes the role of retrieval: a system first retrieves evidence from a large collection and then extracts answers. This open-retrieval setting typically assumes that each question is answerable by a single span of text within a particular passage (a span answer). The supervision signal is thus derived from whether or not the system can recover an exact match of this ground-truth answer span from the retrieved passages. This method is referred to as span-match weak supervision. However, information-seeking conversations are challenging for this span-match method since long answers, especially freeform answers, are not necessarily strict spans of any passage. Therefore, we introduce a learned weak supervision approach that can identify a paraphrased span of the known answer in a passage. Our experiments on QuAC and CoQA datasets show that although a span-match weak supervisor can handle conversations with span answers, it is not sufficient for freeform answers generated by people. We further demonstrate that our method is more flexible since it can handle both span answers and freeform answers. In particular, our method outperforms the span-match method on conversations with freeform answers, and it can be more powerful when combined with the span-match method. We also conduct in-depth analyses to show more insights on open-retrieval ConvQA under a weak supervision setting. 
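The contrast between span-match and learned weak supervision can be sketched as follows. This is only an illustration of the idea, under a loud assumption: token-overlap (Jaccard) similarity stands in for the learned paraphrase model, and the span enumeration is brute force. The point is that the supervisor returns the most answer-like span rather than demanding an exact match.

```python
def best_paraphrase_span(answer, passage_tokens, max_len=10, threshold=0.5):
    """Find the passage span most similar to the known freeform answer.
    Returns (start, end) token indices, or None if nothing clears the
    threshold (i.e., no supervision signal for this passage)."""
    ans = set(answer.lower().split())
    best, best_sim = None, 0.0
    for i in range(len(passage_tokens)):
        for j in range(i + 1, min(i + 1 + max_len, len(passage_tokens) + 1)):
            span = set(t.lower() for t in passage_tokens[i:j])
            sim = len(ans & span) / len(ans | span)  # Jaccard similarity
            if sim > best_sim:
                best, best_sim = (i, j), sim
    return best if best_sim >= threshold else None
```

An exact-match supervisor would return nothing here unless the answer appeared verbatim; the relaxed version still produces a training signal for paraphrased answers.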
  4. Ethnoracial identity refers to the racial and ethnic categories that people use to classify themselves and others. How it is measured in surveys has implications for understanding inequalities. Yet how people self-identify may not conform to the categories standardized survey questions use to measure ethnicity and race, leading to potential measurement error. In interviewer-administered surveys, answers to survey questions are achieved through interviewer–respondent interaction. An analysis of interviewer–respondent interaction can illuminate whether, when, how, and why respondents experience problems with questions. In this study, we examine how indicators of interviewer–respondent interactional problems vary across ethnoracial groups when respondents answer questions about ethnicity and race. Further, we explore how interviewers respond in the presence of these interactional problems. Data are provided by the 2013–2014 Voices Heard Survey, a computer-assisted telephone survey designed to measure perceptions of participating in medical research among an ethnoracially diverse sample of respondents.

  5. As intelligent systems increasingly blend into our everyday life, artificial social intelligence becomes a prominent area of research. Intelligent systems must be socially intelligent in order to comprehend human intents and maintain a rich level of interaction with humans. Human language offers a unique unconstrained approach to probe through questions and reason through answers about social situations. This unconstrained approach extends previous attempts to model social intelligence through numeric supervision (e.g. sentiment and emotion labels). In this paper, we introduce Social-IQ, an unconstrained benchmark specifically designed to train and evaluate socially intelligent technologies. By providing a rich source of open-ended questions and answers, Social-IQ opens the door to explainable social intelligence. The dataset contains rigorously annotated and validated videos, questions and answers, as well as annotations for the complexity level of each question and answer. Social-IQ contains 1,250 natural in-the-wild social situations, 7,500 questions and 52,500 correct and incorrect answers. Although humans can reason about social situations with very high accuracy (95.08%), existing state-of-the-art computational models struggle on this task. As a result, Social-IQ brings novel challenges that will spark future research in social intelligence modeling, visual reasoning, and multimodal question answering (QA).