Title: The More You Ask, the Less You Get: When Additional Questions Hurt External Validity

Researchers and practitioners in marketing, economics, and public policy often use preference elicitation tasks to forecast real-world behaviors. These tasks typically ask a series of similarly structured questions. The authors posit that every time a respondent answers an additional elicitation question, two things happen: (1) the respondent provides information about some parameter(s) of interest, such as their time preference or the partworth for a product attribute, and (2) the respondent increasingly “adapts” to the task, that is, relies on task-specific decision processes that may or may not apply to other tasks. Importantly, adaptation comes at the cost of potential mismatch between the task-specific decision process and the real-world processes that generate the target behaviors, such that asking more questions can reduce external validity. The authors used mouse and eye tracking to trace decision processes in time preference measurement and conjoint choice tasks. Respondents increasingly relied on task-specific decision processes as more questions were asked, leading to reduced external validity for both related tasks and real-world behaviors. Notably, the external validity of measured preferences peaked after as few as seven questions in both types of tasks. When measuring preferences, less can be more.

 
NSF-PAR ID:
10370416
Publisher / Repository:
SAGE Publications
Journal Name:
Journal of Marketing Research
Volume:
59
Issue:
5
ISSN:
0022-2437
Page Range / eLocation ID:
p. 963-982
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Response time (RT) – the time elapsing from the beginning of question reading for a given question until the start of the next question – is a potentially important indicator of data quality that can be reliably measured for all questions in a computer-administered survey using a latent timer (i.e., triggered automatically by moving on to the next question). In interviewer-administered surveys, RTs index data quality by capturing the entire length of time spent on a question–answer sequence, including interviewer question-asking behaviors and respondent question-answering behaviors. Consequently, longer RTs may indicate longer processing or interaction on the part of the interviewer, respondent, or both. RTs are an indirect measure of data quality; they do not directly measure reliability or validity, and we do not directly observe what factors lengthen the administration time. In addition, either too long or too short RTs could signal a problem (Ehlen, Schober, and Conrad 2007). However, studies that link components of RTs (interviewers’ question reading and response latencies) to interviewer and respondent behaviors that index data quality strengthen the claim that RTs indicate data quality (Bergmann and Bristle 2019; Draisma and Dijkstra 2004; Olson, Smyth, and Kirchner 2019). In general, researchers tend to consider longer RTs as signaling processing problems for the interviewer, respondent, or both (Couper and Kreuter 2013; Olson and Smyth 2015; Yan and Olson 2013; Yan and Tourangeau 2008). Previous work demonstrates that RTs are associated with various characteristics of interviewers (where applicable), questions, and respondents in web, telephone, and face-to-face interviews (e.g., Couper and Kreuter 2013; Olson and Smyth 2015; Yan and Tourangeau 2008). We replicate and extend this research by examining how RTs are associated with various question characteristics and several established tools for evaluating questions. 
We also examine whether increased interviewer experience in the study shortens RTs for questions with characteristics that impact the complexity of the interviewer’s task (i.e., interviewer instructions and parenthetical phrases). We examine these relationships in the context of a sample of racially diverse respondents who answered questions about participation in medical research and their health. 
  2. Social interaction is inherently bidirectional, but research on autistic peer interactions often frames communication as unidirectional and in isolation from the peer context. This study investigated natural peer interactions among six autistic and six non-autistic adolescents in an inclusive school club over 5 months (14 45-min sessions in total) to examine the students’ peer preferences in real-world social interactions and how the preferences changed over time. We further examined whether social behavior characteristics differ between student and peer neurotype combinations. Findings showed that autistic students were more likely to interact with autistic peers than non-autistic peers. In both autistic and non-autistic students, the likelihood of interacting with a same-neurotype peer increased over time. Autistic and non-autistic students’ within-neurotype social interactions were more likely to reflect relational than functional purposes, be characterized as sharing thoughts and experiences rather than requesting help or objects, and be highly reciprocal, as compared with cross-neurotype interactions. These peer preferences and patterns of social interactions were not found among student-peer dyads with the same genders. These findings suggest that peer interaction is determined by more than just a student’s autism diagnosis, but by a combination of student and peer neurotypes. Lay abstract: Autistic students often experience challenges in peer interactions, especially young adolescents who are navigating the increased social expectations in secondary education. Previous research on the peer interactions of autistic adolescents mainly compared the social behaviors of autistic and non-autistic students and overlooked the peers in the social context. 
However, recent research has shown that the social challenges faced by autistic people may stem not solely from their social differences but from a mismatch in social communication styles between autistic and non-autistic people. As such, this study aimed to investigate the student-and-peer match in real-world peer interactions between six autistic and six non-autistic adolescents in an inclusive school club. We examined the odds of autistic and non-autistic students interacting with either an autistic peer, a non-autistic peer, or multiple peers, and the results showed that autistic students were more likely to interact with autistic peers than non-autistic peers. This preference for same-group peer interactions strengthened over the 5-month school club in both autistic and non-autistic students. We further found that same-group peer interactions, in both autistic and non-autistic students, were more likely than cross-group social behaviors to convey a social interest rather than a functional purpose or need, to involve sharing thoughts, experiences, or items rather than requesting help or objects, and to be highly reciprocal. Collectively, our findings support that peer interaction outcomes may be determined by the match between the group memberships of the student and their peers, either autistic or non-autistic, rather than the student’s autism diagnosis. 
  3. Science education frameworks in the United States have moved strongly in recent years to incorporate more dimensions of learning, including measuring student use of scientific practices employed during scientific inquiry. For instance, the Next Generation Science Standards and related multidimensional frameworks adopted or adapted recently by more than 30 U.S. states include numerous complex science performance skills required of students. This article considers whether valid and reliable evidence can be obtained in online performance tasks to yield an estimate of both student inquiry practices and of the ability of students to explain their understanding of scientific concepts. A data set from a Virtual Performance Assessment (VPA) task, There's a New Frog in Town, is examined. Delivered through an online system, the VPA task engages students in guided inquiry through problem solving, modeling, and exploration. The VPAs are designed to produce evidence on more than one latent trait in the respondent performance. Results of the case study reported here indicated that maps of student proficiency in scientific inquiry were possible to generate from the VPA data set, using measurement models. Addition of process data through a new hybrid measurement model, mIRT‐Bayes, improved reliability of results. Results indicated overall that virtual performance tasks may be helpful for science assessment, especially if assessment time is short and a goal is to increase the validity and quality of performance measures with authentic and engaging virtual activities.

     
  4. This study was performed to investigate the validity of a real-world version of the Trail Making Test (TMT) across age strata, compared to the current standard TMT, which is delivered using a pen-and-paper protocol. We developed a real-world version of the TMT, the Can-TMT, that involves the retrieval of food cans, with numeric or alphanumerical labels, from a shelf in ascending order. Eye tracking data were acquired during the Can-TMT to calculate task completion time, which was compared to that of the Paper-TMT. Results indicated a strong significant correlation between the real-world and paper tasks for both TMTA and TMTB versions of the tasks, indicative of the validity of the real-world task. Moreover, the two age groups exhibited significant differences on the TMTA and TMTB versions of both task modalities (paper and can), further supporting the validity of the real-world task. This work will have a significant impact on our ability to infer skill or impairment with visual search, spatial reasoning, working memory, and motor proficiency during complex real-world tasks. Thus, we hope to fill a critical need for an exam with the resolution capable of determining deficits which subjective or reductionist assessments may otherwise miss. 
  5. Inferring emotions from others’ non-verbal behavior is a pervasive and fundamental task in social interactions. Typically, real-life encounters imply the co-location of interactants, i.e., their embodiment within a shared spatial-temporal continuum in which the trajectories of the interaction partner’s Expressive Body Movement (EBM) create mutual social affordances. Shared Virtual Environments (SVEs) and Virtual Characters (VCs) are increasingly used to study social perception, making it possible to reconcile experimental stimulus control with ecological validity. However, it remains unclear whether display modalities that enable co-presence have an impact on observers’ responses to VCs’ expressive behaviors. Drawing upon ecological approaches to social perception, we reasoned that sharing the space with a VC should amplify affordances as compared to a screen display, and consequently alter observers’ perceptions of EBM in terms of judgment certainty, hit rates, perceived expressive qualities (arousal and valence), and resulting approach and avoidance tendencies. In a between-subject design, we compared the perception of 54 10-s animations of VCs performing three daily activities (painting, mopping, sanding) in three emotional states (angry, happy, sad), displayed either in 3D as a co-located VC moving in shared space or as a 2D replay on a screen that was also placed in the SVEs. Results confirm the effective experimental control of the variable of interest, showing that perceived co-presence was significantly affected by the display modality, while perceived realism and immersion showed no difference. Spatial presence and social presence showed marginal effects. Results suggest that the display modality had a minimal effect on emotion perception. A weak effect was found for the expression “happy,” for which unbiased hit rates were higher in the 3D condition. Importantly, low hit rates were observed for all three emotion categories. 
However, observers’ judgments significantly correlated for category assignment and across all rating dimensions, indicating universal decoding principles. Although category assignment was erroneous, ratings of valence and arousal were consistent with expectations derived from emotion theory. The study demonstrates the value of animated VCs in emotion perception studies and raises new questions regarding the validity of category-based emotion recognition measures.

     