<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Eliciting risk preferences: is a single item enough?</title></titleStmt>
			<publicationStmt>
				<publisher>Taylor &amp; Francis</publisher>
				<date>12/02/2023</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10608738</idno>
					<idno type="doi">10.1080/13669877.2023.2288016</idno>
					<title level='j'>Journal of Risk Research</title>
<idno>1366-9877</idno>
<biblScope unit="volume">26</biblScope>
<biblScope unit="issue">12</biblScope>					

					<author>Don C Zhang</author><author>Gino Howard</author><author>Russell A Matthews</author><author>Tyler Cowley</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Economists and psychologists frequently use single-item measures of risk preferences despite potential limitations in reliability and criterion validity compared to their multi-item counterparts. This can be particularly problematic when individual differences in risk preferences are used to predict real-world economic, health, and financial outcomes. In this paper, we compare a popular single-item measure of risk preference, the General Risk Question (GRQ), to multi-item measures of domain-general and -specific risk preference measures. In a two-wave survey study of 434 adults, we found that the GRQ had good psychometric reliability and converged with other multi-item measures of risk preferences. The GRQ also exhibited a similar pattern of associations with other personality and demographic variables as compared to multiitem measures. However, the predictive validity of the GRQ was lower than multi-item measures for most of the outcomes examined. The GRQ also explained less incremental variance for realworld outcomes over the Big Five personality traits than the multi-item counterparts. Although the GRQ is a construct-valid measure of risk preferences, researchers should nonetheless consider the trade-off between survey efficiency and predictive efficacy when deciding whether a single item is enough.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Social scientists have historically been skeptical of using self-report questionnaires <ref type="bibr">(Nisbett &amp; Wilson, 1977)</ref>. This is also true of risk researchers, where self-report measures of risk preferences (i.e., stated risk preferences) were assumed to be inferior to behavioral elicitations in the lab (i.e., revealed risk preferences) <ref type="bibr">(Harrison &amp; Rutstr&#246;m, 2008)</ref>. Recent research from psychology and economics, however, has highlighted two points: self-report measures are more than 'cheap talk' and reflect genuine individual differences in risk preferences <ref type="bibr">(Arslan et al., 2020;</ref><ref type="bibr">Dohmen et al., 2011;</ref><ref type="bibr">Steiner et al., 2021)</ref>; and self-report measures are often more useful than behavioral elicitations of risk preferences for predicting real-life outcomes <ref type="bibr">(Charness et al., 2020;</ref><ref type="bibr">Frey et al., 2017;</ref><ref type="bibr">Kaiser &amp; Oswald, 2022;</ref><ref type="bibr">Tasoff &amp; Zhang, 2021)</ref>.</p><p>Considering the benefits of self-report measures, several large-scale economic panel surveys include a self-report measure of risk preference (e.g., the German Socio-Economic Panel, SOEP; <ref type="bibr">Dohmen et al., 2011)</ref>. Given practical constraints such as survey length, these panel studies typically include only a one-item measure of risk preference. A popular item known as the General Risk Question (GRQ) asks respondents to indicate the degree to which they are risk-seeking (versus risk-averse). The simplicity and brevity of the GRQ also make it ideal for measuring individual differences in risk preferences in a wide range of experimental and non-experimental settings <ref type="bibr">(Bran &amp; Vaidis, 2019;</ref><ref type="bibr">Lonnqvist et al., 2015)</ref>.
It is not surprising that the GRQ has also become increasingly popular in both economic and psychological research. In fact, the GRQ is one of the few psychological traits measured annually in the SOEP.</p><p>Despite its practical advantages, a single-item measure of risk preference has several potential theoretical and methodological shortcomings <ref type="bibr">(Fisher et al., 2016;</ref><ref type="bibr">Menkhoff &amp; Sakha, 2017)</ref>. According to psychometric theory, single-item measures of psychological constructs such as risk preference suffer from low reliability and content coverage, which could undermine the construct validity and predictive efficacy of the measure <ref type="bibr">(Matthews et al., 2022)</ref>. Indeed, psychometricians have long cautioned against the use of single-item measures of individual differences <ref type="bibr">(Gardner et al., 1998;</ref><ref type="bibr">Wanous &amp; Hudy, 2001)</ref>. Nevertheless, single items may be comparable, or even superior, to their multi-item counterparts <ref type="bibr">(Allen et al., 2022;</ref><ref type="bibr">Bergkvist &amp; Rossiter, 2007)</ref> and are suitable for some situations. Indeed, single-item measures remain widely used in a variety of social science disciplines such as psychology, marketing, and organizational behavior <ref type="bibr">(Ang &amp; Eisend, 2018;</ref><ref type="bibr">Matthews et al., 2022)</ref>.</p><p>Although the GRQ is widely used in economic and psychological sciences as a concise measure of general risk preference, it is not clear if the GRQ meets the requirements for construct validity, especially compared to other multi-item measures of risk preference that have undergone more thorough construct validation efforts. Without establishing the validity of the GRQ, it may be difficult to accurately understand the impact of risk preferences on economic, health, and social outcomes <ref type="bibr">(Bagozzi et al., 1991)</ref>. 
Furthermore, the observed relationships between risk preference measured with the GRQ and real-world outcomes may be understated. Thus, whether the use of the GRQ in lieu of multi-item measures of risk preference is justified depends on the psychometric properties of the measure.</p><p>The purpose of this paper, therefore, is to examine the psychometric qualities of the single-item measure of risk preference (i.e., the GRQ) compared to other multi-item self-report measures of risk preferences. Specifically, we report comparisons of the psychometric reliability and predictive validity of single-item and multi-item measures of risk preference. Furthermore, we examine the convergent validity of the GRQ with multi-item measures of risk preferences as well as the discriminant validity of single- vs. multi-item measures of risk preference from other individual difference constructs (e.g., Big Five personality). Together, these analyses shed light on the construct validity of the GRQ and its predictive efficacy compared to other multi-item measures of risk preferences.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Self-report measures of risk preferences</head><p>Risk researchers have historically been skeptical of self-report measures of economic preferences such as risk preferences (i.e., stated risk preferences; <ref type="bibr">Charness et al., 2013)</ref>. Instead, it is assumed that an accurate assessment of risk preferences can only be obtained with incentive-compatible tasks where people must choose between risky vs. riskless options <ref type="bibr">(Holt &amp; Laury, 2002)</ref>. Unlike stated risk preference measures, these lab-based elicitations (i.e., revealed risk preferences) reveal people's risk preferences based on their behaviors, rather than self-reports.</p><p>Although a behavioral approach has been the zeitgeist for scholars interested in measuring risk preferences, emerging research has identified several empirical and theoretical shortcomings of this approach. First, behavior across different elicitations of risk-taking tends to diverge, as each elicitation captures unique variance associated with the task, rather than the respondent <ref type="bibr">(Pedroni et al., 2017;</ref><ref type="bibr">Zhou et al., 2021)</ref>. For example, <ref type="bibr">Pedroni et al. (2017)</ref> found minimal convergence in individual risk preferences across six different elicitation methods. Their findings call into question the validity of behavioral elicitations as measures of stable risk preferences (cf. <ref type="bibr">Holzmeister &amp; Stefan, 2021)</ref>.</p><p>Second, risk preferences are often confounded with task-specific characteristics that may require different risky behaviors to maximize incentives. Performance on incentive-compatible tasks may be influenced by the decision-makers' capacity to take calculated risks to maximize their winnings within the situational constraints of the tasks.
Thus, revealed risk preferences may reflect individual differences in quantitative ability (e.g., numeracy), in addition to risk preference <ref type="bibr">(Lilleholt, 2019;</ref><ref type="bibr">Millroth et al., 2020)</ref>. These limitations have led to diminishing confidence in laboratory-based behavioral measures <ref type="bibr">(Rouder &amp; Haaf, 2019)</ref> and a renewed appreciation for stated measures of risk preferences, such as the GRQ, in economic and psychological research <ref type="bibr">(Arslan et al., 2020;</ref><ref type="bibr">Dohmen et al., 2011;</ref><ref type="bibr">Zhang et al., 2023)</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Single vs. multi-item measures of risk preferences</head><p>Single-item measures are often used in longitudinal panel studies because they provide an efficient measure of economic, demographic, and health characteristics. Single-item measures can also be used to measure attitudes (e.g., life satisfaction), traits (e.g., core self-efficacy), and beliefs (e.g., political ideology). In the case of risk preference, the most popular single-item measure appears in the German Socio-Economic Panel (SOEP), where respondents are asked to respond to the item "Are you generally a person who is fully prepared to take risks, or do you try to avoid taking risks?" using a 10-point Likert scale ranging from 1: 'not at all willing to take risks' to 10: 'very willing to take risks' <ref type="bibr">(Dohmen et al., 2011)</ref>. This item, which has been labeled the General Risk Question <ref type="bibr">(Arslan et al., 2020)</ref>, is also used in other panel surveys such as the Household, Income and Labour Dynamics in Australia (HILDA) and the British Household Panel Survey (BHPS).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Benefits of the GRQ</head><p>There are several good reasons to use a single-item measure of risk preference. First, single-item measures can be completed very quickly and impose minimal burden on survey takers. This benefit is particularly important in survey research, where participant motivation can affect attentiveness to survey items. Relatedly, multi-item measures often contain similar items that give the appearance of redundancy, which may frustrate survey takers and negatively affect survey completion rates <ref type="bibr">(Fuchs &amp; Diamantopoulos, 2009)</ref>. Collectively, it has been suggested that the motivational and emotional toll of long and repetitive surveys may ultimately undermine the validity of the measures <ref type="bibr">(Rogelberg et al., 2002)</ref>. Applied social scientists (e.g., economists) are also limited by the amount of time that the target population has for their research. Therefore, it is critical to maximize the efficiency with which psychological constructs are measured. Second, a well-constructed single-item measure is better suited to measuring 'doubly concrete' constructs, where the object and attribute of the measurement are unambiguous for the respondent <ref type="bibr">(Drolet &amp; Morrison, 2001)</ref>. For example, the predictive validity of single-item measures for consumer attitudes about brands is comparable to that of their multi-item counterparts <ref type="bibr">(Bergkvist &amp; Rossiter, 2007)</ref>. Despite these possible benefits, as discussed next, psychometricians have noted several potential shortcomings of single-item measures.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Psychometric reliability of the GRQ</head><p>Increasingly, risk researchers have begun to recognize the importance of psychometric properties for measures of risk preferences such as reliability and validity <ref type="bibr">(Mata et al., 2018)</ref>.</p><p>Single-item measures are often criticized on the grounds of reliability because, unlike multi-item measures, the reliability of single-item measures is harder to compute in a typical empirical investigation; that is, internal consistency estimates of reliability, the most commonly used measure of reliability, cannot be calculated for single-item measures. For this reason, the reliability of single-item measures in empirical studies is (incorrectly) presumed to be unknown.</p><p>Even if computed based on established methods <ref type="bibr">(Wanous &amp; Hudy, 2001)</ref>, single-item measures tend to have lower pseudo-internal consistency reliability than multi-item measures of the same construct <ref type="bibr">(Allen et al., 2022)</ref>. As discussed by <ref type="bibr">Matthews et al. (2022)</ref> though, this requires scholars to consider other types of reliability. That is, several methods exist for empirically examining the psychometric reliability of single-item measures, which we describe below.</p><p>As noted, traditional approaches for assessing psychometric reliability rely on internal consistency, which is obtained by computing the average inter-item correlations <ref type="bibr">(Cho &amp; Kim, 2015)</ref>. This approach, however, does not allow for the estimation of reliability for single-item measures, such as the GRQ. Two methods exist for estimating the reliability of single-item measures. 
The first method estimates the psychometric reliability of a single-item measure using factor analysis, whereby the single-item measure is loaded onto a factor that is composed of multi-item measures of the same construct <ref type="bibr">(Wanous &amp; Hudy, 2001)</ref>. A key assumption of this approach is that the communality of a measure is a conservative estimate of its reliability: the total variance is equivalent to the sum of communality, specificity, and unreliability, so when the specificity is unknown or unspecified, the communality is taken to equal the reliability <ref type="bibr">(Wanous &amp; Hudy, 2001)</ref>.</p><p>The communality of a measure is calculated by squaring its factor loadings and summing them across the extracted components. In other words, the communality is the proportion of a measure's total variance that is explained by the factors. A key issue with this approach, though, is that if the multi-item measure demonstrates weaker psychometric characteristics, this will directly affect the interpretation of the single-item measure's reliability <ref type="bibr">(Matthews et al., 2022)</ref>. The second method is test-retest reliability. For single items, test-retest reliability can be estimated by administering the GRQ twice and computing Pearson's correlation between the two administrations. Alternatively, test-retest reliability can be ascertained using the intra-class correlation (ICC).</p></div>
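<div xmlns="http://www.tei-c.org/ns/1.0"><p>As an illustration only, the estimates described above (communality, Pearson test-retest correlation, and ICC) can be sketched in a few lines of Python. This is not the analysis code used in the study: the function names and the toy scores are hypothetical, and the ICC variant shown (two-way mixed, consistency, single measurement) is an assumption about which form of the ICC is intended.</p>
<p><![CDATA[
```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def communality(loadings):
    """Sum of squared factor loadings across the extracted factors."""
    return sum(value ** 2 for value in loadings)


def icc_consistency(x, y):
    """Two-way mixed, consistency, single-measurement ICC for two occasions."""
    n, k = len(x), 2
    grand = (sum(x) + sum(y)) / (n * k)
    row_means = [(a + b) / 2 for a, b in zip(x, y)]
    col_means = [sum(x) / n, sum(y) / n]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for v in list(x) + list(y))
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


# Hypothetical single-item scores for six respondents at two time points
time1 = [3, 5, 7, 4, 8, 6]
time2 = [4, 5, 6, 4, 9, 5]
print("test-retest r:", round(pearson_r(time1, time2), 2))
print("ICC:", round(icc_consistency(time1, time2), 2))
# Hypothetical loading of the single item on one general risk factor
print("communality:", round(communality([0.85]), 2))
```
]]></p>
<p>With a single extracted factor, the communality reduces to the squared loading of the single item on that factor, which is why the estimate depends on the quality of the multi-item measure that defines the factor.</p></div>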
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Criterion validity of the GRQ</head><p>Criterion validity reflects the degree to which a measure predicts meaningful outcomes of interest <ref type="bibr">(Cronbach &amp; Meehl, 1955)</ref>. The criterion validity of a predictor variable is limited by its reliability. Thus, conventional wisdom amongst psychometricians is that single-item measures tend to have weaker predictive validity than multi-item measures because single-item measures are assumed to have more random measurement error, which undermines their reliability <ref type="bibr">(Novick, 1966)</ref>. In theory, however, there is no reason why single-item measures should exhibit weaker predictive validity or reliability than their multi-item counterparts (also see <ref type="bibr">Allen et al., 2022)</ref>. Multi-item measures are also more prone to criterion contamination, whereby some of the items may reflect characteristics that are irrelevant to the focal construct of interest <ref type="bibr">(Drolet &amp; Morrison, 2001)</ref>. Said another way, multi-item measures of risk preference can include items that reflect other constructs such as recklessness or assertiveness, which may introduce measurement error.</p><p>Existing research on the criterion validity of single-item measures has revealed inconsistent findings. <ref type="bibr">Matthews et al.'s (2022)</ref> examination of 91 organizational constructs revealed that single-item measures, when properly developed, are as reliable and predictive as multi-item measures of the same constructs. On average, the authors observed a degradation of only r = .02 in criterion validity between a single-item and multi-item measure of the same construct.</p><p>Similarly, a meta-analytic investigation of 189 advertising studies found that single-item measures had almost identical criterion validity to multi-item measures for predicting consumer attitudes <ref type="bibr">(Ang &amp; Eisend, 2018)</ref>.
Thus, it remains an empirical question whether the GRQ, a single-item measure of risk preference, is more or less predictive than multi-item measures.</p></div>
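<div xmlns="http://www.tei-c.org/ns/1.0"><p>The statement that criterion validity is limited by reliability can be made concrete with the classical attenuation relationship from test theory, where the observed correlation equals the true-score correlation scaled by the square root of the product of the two reliabilities. The worked values below are assumed for illustration only (a predictor reliability of 0.70 and a perfectly reliable criterion) and are not results of this study.</p>
<p><![CDATA[
```latex
% Classical attenuation: observed validity is bounded by the reliabilities
% r_{xx}: reliability of the predictor, r_{yy}: reliability of the criterion
r_{xy}^{\mathrm{obs}} = \rho_{xy}\,\sqrt{r_{xx}\, r_{yy}}
\quad\Longrightarrow\quad
\left| r_{xy}^{\mathrm{obs}} \right| \le \sqrt{r_{xx}\, r_{yy}}

% Illustration with assumed values r_{xx} = 0.70 and r_{yy} = 1.00
\sqrt{0.70 \times 1.00} \approx 0.84
```
]]></p>
<p>Under these assumed values, even a predictor that correlates perfectly with the criterion at the true-score level could not show an observed correlation above roughly 0.84.</p></div>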
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Bandwidth vs. Fidelity</head><p>One factor that may influence the criterion validity of the GRQ is the conceptual correspondence between the predictor and outcome <ref type="bibr">(Hogan &amp; Roberts, 1996)</ref>. Research has shown that domain-general measures are superior for prediction when the criterion variable is broad and covers multiple domains of risk-taking. In their examination of the general risk factor, for example, <ref type="bibr">Highhouse et al. (2017)</ref> found that the general risk factor was more predictive than risk attitude in any single domain (e.g., health) for predicting conceptually broad outcomes such as workplace deviance, which entails multiple risky domains (e.g., social, ethical, financial).</p><p>Similarly, <ref type="bibr">Zhang et al. (2019)</ref> found that general risk propensity was more predictive of broad outcomes such as entrepreneurial intentions than the domain-specific measures of the DOSPERT. In contrast, domain-specific measures of risk preference (e.g., health risks) are more predictive of outcomes in corresponding domains (e.g., long-term health problems) (also see <ref type="bibr">Charness et al., 2020)</ref>. Because the GRQ is a domain-general measure of risk preference, we expect that it will be more predictive of broad outcomes (e.g., work deviance) than domain-specific risk preference measures, whereas domain-specific measures will be more predictive of domain-matched outcomes.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Discriminant and convergent validity of the GRQ</head><p>Convergent validity refers to the degree to which a measure converges (i.e., correlates) with measures of similar constructs. Convergent validity is often the first step in establishing the construct validity of new measures. Evidence of convergent validity of the GRQ can be acquired by examining the degree to which the GRQ correlates with other measures of general risk preference. Here, correlations greater than r = 0.70 are considered sufficient evidence of convergent validity <ref type="bibr">(Carlson &amp; Herdman, 2012)</ref>. Discriminant validity refers to the degree to which a construct differs from other theoretically distinct constructs <ref type="bibr">(R&#246;nkk&#246; &amp; Cho, 2021)</ref>.</p><p>Discriminant validity is particularly important in the social sciences, where new constructs and measures are often introduced to the literature without evidence for their uniqueness from existing ones (i.e., old wine, new bottle) <ref type="bibr">(Shaffer et al., 2016)</ref>. One common criterion for establishing the uniqueness of personality constructs is their distinctness from the Big Five <ref type="bibr">(Goldberg &amp; Saucier, 1998)</ref>, the most popular and comprehensive model of personality.</p><p>Considering recent meta-analytic findings showing the uniqueness of general risk propensity from the Big Five <ref type="bibr">(Highhouse et al., 2022)</ref>, we expect the GRQ to be distinct from the Big Five.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Method</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Sample and Procedure</head><p>We gathered data at two time points using Prolific.co, a crowd-sourcing platform used for social science research<ref type="foot">foot_0</ref>. Data were collected from a total of 608 participants at Time 1 and 520 participants at Time 2, four weeks later. The GRQ was administered at both time points. We also included four multi-item risk measures split between the two surveys such that each survey had one domain-general and one domain-specific measure. Other individual difference measures (e.g., personality) were administered at Time 1, and all outcome variables were measured at Time 2 to reduce common method variance. We included four attention check questions (e.g., "If you are paying attention, please select strongly disagree") across the two surveys. We removed participants who missed more than one of the four attention check questions. The final sample of attentive participants who completed both surveys was 434. The average age was 37 years (SD = 12.8); 50% were male, 86% were Caucasian, 69% were employed part-time or full-time, and 62% had at least an associate degree.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Measures</head><p>General Risk Question. The General Risk Question (GRQ) is a single-item measure of general risk preference. The item asks respondents: "How do you see yourself: are you generally a person who is fully prepared to take risks or do you try to avoid taking risks? Please tick a box on the scale, where 0 = 'not at all willing to take risks' and 10 = 'very willing to take risks'."</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Domain-General Risk Measures</head><p>GRiPS (Time 1). The General Risk Propensity Scale (GRiPS) is an eight-item, self-report scale <ref type="bibr">(Zhang et al., 2019)</ref>. The GRiPS questions are on a 5-point Likert scale (1 = strongly disagree to 5 = strongly agree). An example item from this scale is "Taking risks makes life more fun". The internal consistency of the GRiPS is 0.93.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>RPS (Time 2).</head><p>The Risk Propensity Scale (RPS) is a seven-item, self-report scale <ref type="bibr">(Meertens &amp; Lion, 2008)</ref>. The RPS questions are on a 9-point Likert scale (1 = totally disagree to 9 = totally agree). Responses with higher scores indicate greater risk propensity. An example item from this scale is "I take risks regularly". The internal consistency of the RPS is 0.83.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Domain-Specific Risk Measures</head><p>DOSPERT (Time 1). The Domain-Specific Risk-Taking Scale (DOSPERT) is a 30-item self-report measure of domain-specific risk-taking propensity <ref type="bibr">(Blais &amp; Weber, 2006)</ref>. It measures individual risk intentions/behavioral intentions in five different domains <ref type="bibr">(Blais &amp; Weber, 2006)</ref>. The DOSPERT questions are on a 7-point Likert scale (1 = extremely unlikely to 7 = extremely likely). An example item from this scale is "Having an affair with a married man/woman" (social). The internal consistency of the DOSPERT subscales ranges from 0.64 to 0.87. The overall internal consistency of the summated DOSPERT score is 0.88.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>RTI (Time 2).</head><p>The Risk-Taking Index (RTI) is a 12-item measure of an individual's engagement across 6 risk-taking domains <ref type="bibr">(Nicholson et al., 2005)</ref>. The RTI questions are on a 5-point Likert scale (1 = never to 5 = very often). Items are separated by risk-taking behaviors now versus in the past. An example item from this scale is "recreational risks". The internal consistency of the RTI ranges from 0.64 to 0.78 across the five domains. The overall internal consistency of the summated RTI score is 0.76.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Other Individual Differences</head><p>Big Five Personality. The IPIP-NEO-60 is a 60-item self-report measure of the Big Five personality traits <ref type="bibr">(Maples-Keller et al., 2019)</ref>. The IPIP-NEO-60 questions are on a 5-point Likert scale (1 = strongly disagree to 5 = strongly agree). An example item from this scale is "Lose my temper". The internal consistency ranges from 0.74 to 0.86.</p><p>Dark Triad. The Short Dark Triad Scale (SD3) is a 27-item, self-report scale <ref type="bibr">(Jones &amp; Paulhus, 2014)</ref>. The SD3 questions are on a 5-point Likert scale (1 = strongly disagree to 5 = strongly agree). Example items from this scale are "I have been compared to famous people" (Narcissism), "It's not wise to tell your secrets" (Machiavellianism), and "Payback needs to be quick and nasty" (Subclinical Psychopathy). The internal consistency ranges from 0.73 to 0.76.</p><p>Subjective Numeracy. The Subjective Numeracy Scale (SNS) is an eight-item, self-report scale <ref type="bibr">(Fagerlin et al., 2007)</ref>. The SNS has four questions that ask participants to assess their numerical ability across multiple contexts and four questions that ask participants to state their preferences for the presentation of numerical and probabilistic information. The SNS questions are on a 6-point Likert scale (1 = not at all good to 6 = extremely good). An example item from this scale is "How good are you at working with fractions?". The overall scale demonstrated good reliability (alpha = 0.91).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Broad Outcomes</head><p>Workplace Deviance. We measured workplace deviance using a nine-item, self-report scale developed by Robinson &amp; O'Leary-Kelly (1998). Participants responded to each statement on a 5-point Likert scale (1 = Never to 5 = Almost Always). An example item from this scale is "Deliberately bent or broke rules". The internal consistency of the scale is 0.81.</p><p>Entrepreneurial Intentions. The Entrepreneurial Intention Questionnaire (EIQ) is a five-item, self-report scale <ref type="bibr">(Linan et al., 2011)</ref>. The EIQ questions are on a 7-point Likert scale (1 = total disagreement to 5 = total agreement). An example item from this scale is "A career as an entrepreneur is attractive for me". The internal consistency of the scale is 0.93.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Narrow Outcomes</head><p>Narrow outcomes are characterized by specific behaviors that fall within a single domain of risk-taking, rather than outcomes that reflect risk-taking across multiple domains. We included eleven outcome variables (e.g., job change, car accidents, etc.) matched to a specific dimension of risk-taking. The full list of outcomes, items, and relevant risk dimensions is presented in Table <ref type="table">1</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Analytical Plan</head><p>We examined the validity of the GRQ in several ways. First, we computed and compared the reliability of the GRQ using three methods: 1) test-retest reliability, 2) communality, and 3) ICC <ref type="bibr">(Wanous &amp; Hudy, 2001)</ref>. Second, we examined the convergent and discriminant validity of the GRQ by examining its correlations with other multi-item domain-general measures such as the GRiPS and RPS as well as domain-specific measures such as the DOSPERT and RTI. Third, we examined the discriminant validity of the GRQ from the Big Five personality traits. We used multiple regression analyses to estimate the proportion of variance in the GRQ explained by the combined Big Five traits to demonstrate the uniqueness of the GRQ from the Big Five. We also included subjective numeracy as an additional individual difference variable for exploratory purposes. Finally, we examined the criterion validity of the GRQ by examining its correlations with a wide range of broad (e.g., workplace deviance) and narrow outcomes (e.g., number of broken bones).</p></div>
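<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the variance-explained logic of the regression analyses concrete, the following Python sketch computes R-squared and incremental R-squared from correlations. It is a simplified illustration, not the study's analysis: it assumes only two standardized predictors (the analyses here use the five Big Five traits), for which a closed-form expression exists, and the function names and correlation values are hypothetical.</p>
<p><![CDATA[
```python
def r_squared_two_predictors(r_y1, r_y2, r_12):
    """R^2 for an outcome regressed on two standardized predictors.

    r_y1, r_y2: correlations of each predictor with the outcome
    r_12: correlation between the two predictors (must not be +1 or -1)
    """
    numerator = r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12
    return numerator / (1 - r_12 ** 2)


def incremental_r_squared(r_y1, r_y2, r_12):
    """Variance explained by predictor 2 beyond predictor 1 (delta R^2)."""
    return r_squared_two_predictors(r_y1, r_y2, r_12) - r_y1 ** 2


# Hypothetical correlations, chosen only for illustration
total = r_squared_two_predictors(0.30, 0.40, 0.20)
added = incremental_r_squared(0.30, 0.40, 0.20)
print("R^2:", round(total, 3))
print("delta R^2:", round(added, 3))
```
]]></p>
<p>The same idea extends to more predictors by fitting the full regression model; a low R-squared when a risk measure is regressed on personality traits indicates that most of its variance is not shared with those traits.</p></div>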
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Results</head><p>Table <ref type="table">2</ref> contains the means, standard deviations, and internal consistencies of the study's risk-taking measures, as well as demographic characteristics. Consistent with past research, we found that both sex (rs = 0.14 to 0.27) and age (rs = -0.10 to -0.24) were significantly correlated with risk preference across different measures. We did not, however, find any significant correlations between risk preference and employment, education, or income level.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Psychometric reliability</head><p>Table <ref type="table">3</ref> contains the reliability of the GRQ for both survey administrations as well as the reliability of the multi-item measures of risk preferences. Consistent with recent research examining the psychometric characteristics of single-item measures <ref type="bibr">(Matthews et al., 2022)</ref>, test-retest reliability was assessed using the traditional Pearson's correlation coefficient and the intra-class correlation (ICC), which has been suggested as an alternative approach to calculating test-retest reliability. We found that the GRQ had acceptable test-retest reliability (r = 0.70, ICCmixed = 0.71 [95% C.I. = 0.66, 0.76]), albeit slightly lower than the test-retest reliability of multi-item measures (e.g., GRiPS, three-month r = 0.80; <ref type="bibr">Zhang et al., 2019)</ref>. In addition to the test-retest reliability, reliability for the single-item GRQ was also obtained using the factor analysis method described by <ref type="bibr">Wanous &amp; Hudy (2001)</ref>. To do so, we computed the communality of the GRQ with each of the two multi-item measures (GRiPS and RPS). Using this method, we found the GRQ exhibited good reliability (0.76 to 0.93).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Convergent validity</head><p>We first examined the convergent validity of single- vs. multi-item measures of general risk preference. The GRQ measured at both Time 1 and Time 2 was significantly correlated with both of the domain-general measures of risk preference (GRiPS: r = 0.69 to 0.79; RPS: r = 0.65 to 0.74) as well as the summated risk preference scores from the domain-specific measures (Summated DOSPERT: r = 0.52 to 0.57; Summated RTI: r = 0.41 to 0.49). The magnitude of the correlations suggests that the GRQ had better convergent validity with domain-general measures than with summated scores from domain-specific scales.</p><p>We next examined the convergence between the GRQ and the domain-specific risk scores obtained from the DOSPERT (Time 1) and the RTI (Time 2). The GRQ was moderately correlated with each of the five DOSPERT dimensions: social (r = 0.33), recreation (r = 0.52), finance (r = 0.46), health (r = 0.36), and ethics (r = 0.27). Likewise, the GRQ was moderately correlated with five RTI dimensions: recreation (r = 0.40), career (r = 0.24), financial (r = 0.43), safety (r = 0.33), and social (r = 0.32). Interestingly, the GRQ was not significantly correlated with the health dimension of the RTI (r = 0.07). These results suggest that the GRQ better captures one's general preference for risks, rather than risk preferences in any single domain.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Discriminant validity</head><p>We first examined the degree to which risk preferences measured using single- vs. multi-item measures are empirically distinct from the Big Five, the dominant model of personality.</p><p>Table <ref type="table">4</ref> contains the results of multiple regression analysis. Overall, the Big Five explained 25% and 34% of the variance in two multi-item measures of general risk propensity <ref type="bibr">(RPS and GRiPS)</ref>, respectively. The Big Five also explained 18% and 36% of the variance in the two domain-specific measures of risk preference (RTI and DOSPERT), respectively. As for the single item, we found that the Big Five explained 17% and 27% of the variance for the GRQ measured at Time 1 and Time 2, respectively. Collectively, these results suggest that the GRQ, like other measures of risk preference, is relatively distinct from the Big Five model of personality.</p><p>In addition to the discriminant validity from the Big Five, we also examined the degree to which risk preference measures are distinct from the Dark Triad model. Table <ref type="table">5</ref> contains the results. First, the Dark Triad traits explained 21.37% of the variance in the GRQ while explaining 17.68% and 33.18% of the variance in the RTI and DOSPERT, respectively. The Dark Triad traits explained between 24.06% and 34.24% of the variance for the domain-general risk preference measures (GRiPS, RPS).</p><p>For exploratory purposes, we also examined the association between risk preference and subjective numeracy. We found a weak relationship between subjective numeracy and risk preferences. Observed bivariate correlations ranged from r = 0.08 to r = 0.17.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Criterion and incremental validity</head><p>GRQ vs. general risk preference measures. We compared the criterion validity of the GRQ with multi-item measures of general risk preference for both broad and specific outcomes. Table <ref type="table">5</ref> contains the bivariate correlations between each risk measure and both broad and narrow outcomes. We present the results separately for cross-sectional predictions where the predictors and criterion were both measured at the same time (concurrent validity) versus time-lagged predictions where the predictors and criterion were measured at separate time points (predictive validity).</p><p>We found that compared to multi-item measures of risk preference, the GRQ was less predictive of workplace deviance. However, the GRQ had comparable or slightly stronger predictions for entrepreneurial intentions than other multi-item measures. As for narrow outcomes, we found the GRQ to be a weaker predictor across most outcome variables when compared with multi-item measures. The difference in the magnitude of prediction was most pronounced for speeding frequency, number of romantic relationships, frequency of cheating in relationships, and frequency of job change.</p><p>Although the difference was smaller for other outcomes, the GRQ did not 'out-predict' any other multi-item measures of general risk preference for any of the narrow outcomes or vice versa. The average criterion validity for the GRQ across 13 outcomes was rmean = .14 and rmean = .16 for concurrent and predictive validities, respectively, whereas the criterion validities for the two multi-item general measures were rmean = .18 (GRiPS) and rmean = .20 (RPS) and the criterion validities for the two multi-item domain-specific measures were rmean = .22 (DOSPERT) and rmean = .23 (RTI).</p><p>GRQ vs. domain-specific risk preference measures. 
We next compared the criterion validity of the GRQ with domain-specific measures (e.g., DOSPERT). Table <ref type="table">6</ref> contains the bivariate correlations between each risk measure and both broad and narrow outcomes. For broad outcomes, we found the GRQ to better predict deviance than some of the risk domains (e.g., recreation) but not others (e.g., ethical). Interestingly, we found that the GRQ better predicted entrepreneurial intent than any of the specific domains of risk from both the DOSPERT and RTI.</p><p>For narrow outcomes, a domain-relevant measure of risk preference always outperformed the GRQ in terms of predictive validity. Speeding was better predicted by recreation and safety risk attitudes; the number of romantic relationships was better predicted by financial and recreation risk attitudes; frequency of cheating in romantic relationships was better predicted by ethical risk attitudes; frequency of moving and job change were better predicted by social risk attitudes; and number of car accidents was better predicted by safety risks. Overall, the prediction of narrow outcomes was significantly better if domain-relevant measures of risk-taking were used.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Incremental validity of the GRQ vs. multi-item risk measures</head><p>We next examined the incremental predictive validity of the GRQ over the Big Five, as well as comparisons of incremental validity of the GRQ vs. multi-item risk preference measures (Table <ref type="table">7</ref>). After controlling for the Big Five, the GRQ (Time 1 and Time 2) added incremental prediction for entrepreneurial intent. The GRQ also added incremental prediction for deviance, but only when it was measured at the same time as the outcome variable. In contrast, the multi-item measures explained more incremental variance over the Big Five for workplace deviance.</p><p>Interestingly, the GRQ contributed more incremental prediction for entrepreneurial intent than all but one multi-item measure (summated RTI).</p><p>As for specific outcomes, the GRQ measured at Time 1 only explained incremental variance over the Big Five for three outcomes (speeding, broken bones, frequency of moving). In contrast, the eight-item GRiPS also measured at Time 1 explained incremental variance beyond the Big Five for six outcomes. Of the narrow outcomes where both GRQ and GRiPS explained incremental variance, the GRiPS explained more unique variance than the GRQ for speeding (4.4% vs. 1.0%) and number of broken bones (2.1% vs. 1.4%), but not frequency of moving (1.4% vs. 1.7%). The incremental prediction was even greater for summated scores using a domain-specific measure (e.g., DOSPERT).</p><p>Together, the predictive utility of the GRQ was generally inferior to that of multi-item measures of general risk preference. The eight-item GRiPS and the seven-item RPS both exhibited stronger overall predictive utility as well as incremental prediction over the Big Five.</p><p>However, the GRQ demonstrated a surprisingly strong prediction of entrepreneurial intent compared to multi-item measures. 
Overall, general risk preference obtained by summing across domains (e.g., DOSPERT and RTI) resulted in the best overall prediction, though at the cost of a much longer scale.</p></div>
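<div xmlns="http://www.tei-c.org/ns/1.0"><p>The incremental-validity logic used in this section (variance explained by the Big Five alone, then the additional variance explained once a risk measure is entered) can be sketched as a two-step hierarchical regression. The example below is a hedged illustration on simulated data, not the authors' procedure or results; the helper names, the simulated effect sizes, and the choice of ordinary least squares are all assumptions.</p>
<eg><![CDATA[
```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def r_squared(predictors, y):
    """R^2 from an OLS regression of y on the predictor columns plus an intercept."""
    n = len(y)
    cols = [[1.0] * n] + predictors
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    beta = solve(xtx, xty)
    fitted = [sum(b * col[i] for b, col in zip(beta, cols)) for i in range(n)]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Simulated (not real) data: five personality traits, a risk score that
# overlaps with one trait, and an outcome driven by a trait and by risk.
rng = random.Random(1)
n = 434
big_five = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(5)]
risk = [0.4 * big_five[0][i] + rng.gauss(0, 1) for i in range(n)]
outcome = [0.3 * big_five[1][i] + 0.3 * risk[i] + rng.gauss(0, 1) for i in range(n)]

r2_step1 = r_squared(big_five, outcome)           # Step 1: Big Five only
r2_step2 = r_squared(big_five + [risk], outcome)  # Step 2: add the risk measure
delta_r2 = r2_step2 - r2_step1                    # incremental variance explained
print(f"R2 step 1 = {r2_step1:.3f}, R2 step 2 = {r2_step2:.3f}, delta = {delta_r2:.3f}")
```
]]></eg>
<p>Running the same two steps with a single-item score and then with a multi-item score in place of the simulated risk variable would give the kind of side-by-side incremental variance comparison summarized in this section.</p></div>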
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Discussion</head><p>Single-item measures of risk preferences are frequently used in longitudinal survey studies and laboratory experiments due to their economic efficiency. This paper is the first to examine the psychometric qualities of the GRQ, a popular single-item measure of general risk preferences, relative to multi-item measures of risk preferences. In a two-wave survey study, we compared the reliability, convergent validity, discriminant validity, and criterion validity of the GRQ with those of four well-established measures of risk preferences.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Reliability</head><p>The most frequently mentioned limitation of single-item measures is psychometric reliability, an argument that recent research on the application of single-item measures has called into question <ref type="bibr">(Matthews et al., 2022)</ref>. Consistent with recommendations by <ref type="bibr">Matthews et al. (2022)</ref> that it is necessary to empirically examine the possible pros and cons of using single-item measures (relative to multi-item measures), results from the current program of research suggest that both the test-retest reliability and internal consistency of the GRQ are adequate and comparable to longer multi-item measures.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Convergent Validity</head><p>The GRQ converged with existing multi-item measures of domain-general risk preferences. This is not surprising, as the multi-item measures consist of items similar to the GRQ item (e.g., "I enjoy taking risks in most aspects of my life"; GRiPS). The correlation between the GRQ and summated scores from domain-specific measures (e.g., DOSPERT) was slightly lower than with domain-general measures (e.g., GRiPS). One explanation is that domain-specific measures such as the DOSPERT do not cover all possible risk domains that are sampled by individuals in their response to the GRQ. Put simply, domain-specific measures may not capture all types of risks that people think about when they answer the GRQ, which results in content deficiency of the DOSPERT and reduced convergence with the GRQ.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Discriminant Validity</head><p>The GRQ also inhabited a similar position within the nomological network as multi-item measures of risk preferences. Specifically, the GRQ and multi-item risk measures share a similar pattern of relationships with the Big Five and Dark Triad. Also consistent with past research, the GRQ appears to be distinct from both the Big Five and Dark Triad <ref type="bibr">(Highhouse et al., 2022;</ref><ref type="bibr">Joseph &amp; Zhang, 2021)</ref>. The Big Five only accounted for between 17% and 26% of the variance in the GRQ whereas the Dark Triad accounted for between 23% and 27%. Interestingly, the Big Five accounted for more variance in multi-item measures of risk preferences. This could be attributed to the measurement error associated with single-item measures. Alternatively, these findings may suggest greater contamination of multi-item measures. Nevertheless, our findings show that risk preference appears to be distinct from the Big Five, regardless of the method of measurement.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Criterion Validity</head><p>Our results suggest that the GRQ was inferior to multi-item measures of risk preferences for predicting real-world outcomes. It also explained less incremental variance over the Big Five, despite sharing less variance with the Big Five. Interestingly, the GRQ was equally or slightly more predictive of entrepreneurial intentions compared to multi-item measures of general risk preferences as well as domain-specific measures (specific domain scores and summated scores).</p><p>These results suggest that the GRQ may be sufficient in studies of entrepreneurial activities.</p><p>However, the GRQ was noticeably worse at predicting negative outcomes such as workplace deviance, excessive speeding, cheating in romantic relationships, number of broken bones, and number of car accidents. These results suggest that the GRQ may be better suited for predicting more industrious aspects of risk-taking than reckless and unethical risky behaviors. These results are similar to meta-analytic findings showing that general risk preference was a better predictor of adaptive (vs. maladaptive) outcomes <ref type="bibr">(Highhouse et al., 2022)</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Implications</head><p>Our findings suggest that the use of the GRQ in longitudinal panel studies such as the SOEP is likely justified due to the benefit of economic efficiency without significant sacrifices to reliability. We also find that the GRQ, for the most part, exhibits similar psychometric qualities as its multi-item counterparts, as shown by its similar pattern of correlations with other personality measures such as the Big Five. Thus, the position of the GRQ within the nomological network of personality traits is likely similar to that of multi-item measures of risk preferences. Overall, these findings suggest that the GRQ is a valid measure of general risk preferences.</p><p>Although we found that the predictive efficacy of the GRQ was weaker for some of the outcomes measured in this study, the observed reductions were modest. Therefore, studies with sufficient statistical power (e.g., large sample size) should be able to detect true relationships between risk preference and real-world outcomes as long as the researchers accept that the magnitude of that association may be attenuated. For this reason, the GRQ is not advised when expected effects are weak or when the study is potentially underpowered. In such cases, we advise researchers to consider longer measures of general risk preference (e.g., GRiPS) if space allows and domain-specific measures (e.g., DOSPERT) when the study context and outcomes of interest are well-aligned with a specific domain (e.g., health). Nevertheless, single-item measures of risk preference may be suitable if the researchers are not concerned with specific predictions. For example, the GRQ may be fine to use as a statistical control of general risk preferences.</p><p>Our results differ from what was observed by <ref type="bibr">Matthews et al. (2022)</ref>, whose single-item measures showed much stronger criterion validity than the GRQ. 
One explanation is that the GRQ was not developed to maximize construct validity. Thus, our results do not necessarily speak to the shortcomings of using single items as measures of all risk preferences. However, they do point to the possibility that the GRQ may need revision to be on equal psychometric footing with its multi-item counterparts. In sum, researchers should carefully consider the context of the research and weigh the trade-off between survey efficiency and predictive accuracy when deciding whether to use the GRQ.</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0"><p>This research was approved by the Louisiana State University Institutional Review Board (#11920).</p></note>
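<div xmlns="http://www.tei-c.org/ns/1.0"><p>The point made under Implications about statistical power and attenuated effects can be made concrete with a rough sample-size calculation. The sketch below uses the standard Fisher z approximation for a two-sided test of a correlation and plugs in three of the average criterion validities reported in the results; the function name, the alpha level of .05, and the target power of .80 are illustrative assumptions rather than values taken from the study.</p>
<eg><![CDATA[
```python
import math
from statistics import NormalDist


def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate N needed to detect a correlation r (two-sided, Fisher z)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(((z_alpha + z_beta) / math.atanh(r)) ** 2 + 3)


# Average criterion validities as reported in the results section.
for label, r in [("GRQ (concurrent)", 0.14), ("GRiPS", 0.18), ("DOSPERT", 0.22)]:
    print(f"{label}: r = {r:.2f}, approximate N = {n_for_correlation(r)}")
```
]]></eg>
<p>Because the required sample size grows roughly with the inverse square of the expected correlation, even a modest drop in criterion validity raises the sample needed to detect the same relationship.</p></div>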
		</body>
		</text>
</TEI>
