There is substantial evidence from behavioral economics and the decision sciences that, in decision-making under uncertainty, the carriers of value behind actions are gains and losses defined relative to a reference point (e.g., pre-action expectations) rather than absolute final outcomes. In addition, the ability to predict session-level search decisions and user experience early is essential for developing reactive and proactive search recommendations. To address these research gaps, our study aims to 1) develop reference-dependence features based on a series of simulated user expectations, or reference points, in the first query segments of sessions, and 2) examine the extent to which these features enhance the performance of early prediction of session behavior and user satisfaction. Experimental results on three datasets of varying types show that incorporating reference-dependent features developed in first query segments into prediction models outperforms using baseline cost-benefit features alone in early prediction of three key session metrics (user satisfaction score, session clicks, and session dwell time). Moreover, in simulations varying the search time expectation and the rate of user satisfaction decay, users tended to expect to complete their search within a minute and showed a rapid, logarithmic decay of satisfaction once past the estimated expectation point. By factoring in a user's search time expectation and measuring the behavioral response when that expectation is not met, we can further improve early prediction models and deepen our understanding of users' behavioral patterns.
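As a rough illustration of the reference-dependence idea described above: the one-minute expectation and the logarithmic decay come from the abstract's findings, but the functional form, parameter values, and function name below are assumptions for illustration, not the paper's actual feature definition.

```python
import math

def reference_dependent_feature(elapsed_s, expected_s=60.0, decay=1.0):
    """Illustrative reference-dependence feature: zero while the session
    is still within the user's expected search time (the reference point),
    then a logarithmically growing satisfaction penalty once elapsed time
    surpasses that expectation. The 60 s default mirrors the abstract's
    finding; the log1p form and decay rate are assumed, not the paper's."""
    if elapsed_s <= expected_s:
        return 0.0
    # log-shaped penalty growing with time past the expectation point
    return -decay * math.log1p(elapsed_s - expected_s)
```

A feature like this could be computed from the first query segment of a session and fed to a prediction model alongside baseline cost-benefit features.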
Characterizing and Early Predicting User Performance for Adaptive Search Path Recommendation
ABSTRACT User search performance is multidimensional in nature and may be better characterized by metrics that capture users' interactions with both relevant and irrelevant results. Despite previous research on one-dimensional measures, it remains unclear how to characterize the different dimensions of user performance and leverage that knowledge in developing proactive recommendations. To address this gap, we propose and empirically test a framework for search performance evaluation and build early performance prediction models to simulate proactive search path recommendations. Experimental results from four datasets of diverse types (1,482 sessions and 5,140 query segments from both controlled lab and natural settings) demonstrate that: 1) cluster patterns characterized by cost-gain-based multifaceted metrics can effectively differentiate high-performing users from other searchers, forming the empirical basis for proactive recommendations; 2) whole-session performance can be reliably predicted at early stages of sessions (e.g., the first and second queries); 3) recommendations built upon the search paths of system-identified high-performing searchers can significantly improve the search performance of struggling users. These results demonstrate the potential of our approach for leveraging the collective wisdom of automatically identified high-performing user groups in developing and evaluating proactive in-situ search recommendations.
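As a minimal sketch of the cost-gain-based grouping idea in the abstract above: the paper does not specify its feature definitions or clustering method, so the field names, feature formulas, and centroids below are hypothetical placeholders for illustration only.

```python
import math

def cost_gain_features(session):
    """Hypothetical multifaceted cost-gain features for one session:
    gain per query, gain per second of dwell time, and click precision.
    The session field names and feature definitions are illustrative,
    not the paper's actual metrics."""
    gain = session["relevant_clicks"]
    return (
        gain / max(session["queries"], 1),           # gain per query issued
        gain / max(session["dwell_time_s"], 1e-9),   # gain per second spent
        gain / max(session["clicks"], 1),            # fraction of useful clicks
    )

def nearest_centroid(x, centroids):
    """Assign a feature vector to its closest cluster centroid, e.g.
    a 'high-performing' vs. 'struggling' searcher group."""
    return min(centroids, key=lambda name: math.dist(x, centroids[name]))
```

Sessions assigned to the high-performing cluster could then supply the search paths used for recommendations to struggling users, as the abstract describes.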
        
    
- Award ID(s): 2106152
- PAR ID: 10543487
- Publisher / Repository: Wiley Library
- Date Published:
- Journal Name: Proceedings of the Association for Information Science and Technology
- Volume: 60
- Issue: 1
- ISSN: 2373-9231
- Page Range / eLocation ID: 408 to 420
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- 
Abstract Evaluation metrics such as precision, recall, and normalized discounted cumulative gain have been widely applied in ad hoc retrieval experiments and have facilitated the assessment of system performance on various topics over the past decade. However, the effectiveness of such metrics in capturing users' in-situ search experience, especially in complex search tasks that trigger interactive search sessions, is limited. To address this challenge, it is necessary to adaptively adjust the evaluation strategies of search systems to better respond to users' changing information needs and evaluation criteria. In this work, we adopt a taxonomy of search task states that a user goes through in different scenarios and moments of search sessions, and perform a meta-evaluation of existing metrics to better understand their effectiveness in measuring user satisfaction. We then build models for predicting the task states behind queries based on in-session signals, and construct and meta-evaluate new state-aware evaluation metrics. Our analysis and experimental evaluation are performed on two datasets collected from a field study and a laboratory study, respectively. Results demonstrate that the effectiveness of individual evaluation metrics varies across task states, and that task states can be detected from in-session signals. In certain states, our new state-aware evaluation metrics reflect in-situ user satisfaction better than an extensive list of widely used measures analyzed in this work. These findings can inspire the design and meta-evaluation of user-centered adaptive evaluation metrics and shed light on the development of state-aware interactive search systems.
- 
In interactive IR (IIR), users often seek to achieve different goals (e.g., exploring a new topic, finding a specific known item) at different search iterations and thus may evaluate system performance differently. Without a state-aware approach, it would be extremely difficult to simulate and achieve real-time adaptive search evaluation and recommendation. To address this gap, our work identifies users' task states from interactive search sessions and meta-evaluates a series of online and offline evaluation metrics under varying states, based on a user study dataset consisting of 1,548 unique query segments from 450 search sessions. Our results indicate that: 1) users' individual task states can be identified and predicted from search behaviors and implicit feedback; 2) the effectiveness of mainstream evaluation measures (measured by their respective correlations with user satisfaction) varies significantly across task states. This study demonstrates the implicit heterogeneity in user-oriented IR evaluation and connects research on complex search tasks with evaluation techniques. It also informs future research on the design of state-specific, adaptive user models and evaluation metrics.
- 
Previous research demonstrates that users' actions in search interaction are associated with relative gains and losses against reference points, known as the reference dependence effect. However, this widely confirmed effect is not represented in most of the user models underpinning existing search evaluation metrics. In this study, we propose a new evaluation metric framework, the Reference Dependent Metric (ReDeM), for assessing query-level search by incorporating the reference dependence effect into the modelling of user search behavior. To test the overall effectiveness of the proposed framework, (1) we evaluate the performance, in terms of correlation with user satisfaction, of ReDeMs built upon different reference points against that of widely used metrics on three search datasets; (2) we examine the performance of ReDeMs under different task states, such as task difficulty and task urgency; and (3) we analyze the statistical reliability of ReDeMs in terms of discriminative power. Experimental results indicate that: (1) ReDeMs integrated with a proper reference point achieve better correlations with user satisfaction than most existing metrics, such as Discounted Cumulative Gain (DCG) and Rank-Biased Precision (RBP), even when those metrics' parameters are well tuned; (2) ReDeMs perform relatively better than existing metrics when the task triggers a high-level cognitive load; (3) the discriminative power of ReDeMs is far stronger than that of Expected Reciprocal Rank (ERR), slightly stronger than that of Precision, and similar to that of DCG, RBP, and INST. To our knowledge, this study is the first to explicitly incorporate the reference dependence effect into the user browsing model and offline evaluation metrics. Our work illustrates a promising approach to leveraging insights about user biases from cognitive psychology to better evaluate user search experience and enhance user models.
- 
Advances in sign-language recognition technology have enabled researchers to investigate various methods that can assist users in searching for an unfamiliar sign in ASL using sign-recognition technology. Users can generate a query by submitting a video of themselves performing the sign they believe they encountered somewhere and obtain a list of possible matches. However, there is disagreement among developers of such technology on how to report the performance of their systems, and prior research has not examined the relationship between the performance of search technology and users' subjective judgements for this task. We conducted three studies using a Wizard-of-Oz prototype of a webcam-based ASL dictionary search system to investigate the relationship between the performance of such a system and user judgements. We found that, in addition to the position of the desired word in a list of results, the placement of the desired word above or below the fold and the similarity of the other words in the results list affected users' judgements of the system. We also found that metrics that incorporate the precision of the overall list correlated better with users' judgements than did metrics currently reported in prior ASL dictionary research.
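Several of the related abstracts above compare against DCG and RBP as baseline metrics. As background, the standard textbook forms of these two metrics (not the papers' ReDeM or state-aware variants) can be sketched in Python; the persistence parameter value below is illustrative.

```python
import math

def dcg(rels, k=None):
    """Discounted Cumulative Gain over a ranked list of graded relevance
    scores, using the standard log2(rank + 1) discount. `k` optionally
    truncates the ranking (DCG@k)."""
    rels = rels[:k] if k is not None else rels
    return sum(r / math.log2(i + 2) for i, r in enumerate(rels))

def rbp(rels, p=0.8):
    """Rank-Biased Precision with persistence parameter p: the expected
    rate of gain for a user who continues to the next result with
    probability p. Relevance is assumed binary or normalized to [0, 1]."""
    return (1 - p) * sum(r * p ** i for i, r in enumerate(rels))
```

The meta-evaluations described above correlate scores like these with session-level user satisfaction, which is how the papers compare metric effectiveness across task states.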