

Title: The Impact of Forecast Inconsistency and Probabilistic Forecasts on Users’ Trust and Decision-Making
Abstract

When forecasts for a major weather event begin days in advance, updates may be more accurate but inconsistent with the original forecast. Evidence suggests that the resulting inconsistency may reduce user trust. However, adding an uncertainty estimate to the forecast may attenuate any loss of trust due to forecast inconsistency, as has been shown with forecast inaccuracy. To evaluate this hypothesis, this experiment tested the impact on trust of adding probabilistic snow-accumulation forecasts to single-value forecasts in a series of original and revised forecast pairs (based on historical records) that varied in both consistency and accuracy. Participants rated their trust in the forecasts and used them to make school-closure decisions. One-half of the participants received single-value forecasts, and one-half also received the probability of 6 in. or more (the decision threshold in the assigned task). As with previous research, forecast inaccuracy was detrimental to trust, although probabilistic forecasts attenuated the effect. Moreover, the inclusion of probabilistic forecasts allowed participants to make economically better decisions. Surprisingly, in this study inconsistency increased rather than decreased trust, perhaps because it alerted participants to uncertainty and led them to make more cautious decisions. Furthermore, the positive effect of inconsistency on trust was enhanced by the inclusion of probabilistic forecasts. This work has important implications for practical settings, suggesting that both probabilistic forecasts and forecast inconsistency provide useful information to decision-makers. Therefore, members of the public may benefit from well-calibrated uncertainty estimates and newer, more reliable information.

Significance Statement

The purpose of this study was to clarify how explicit uncertainty information and forecast inconsistency impact trust and decision-making in the context of sequential forecasts from the same source. This is important because trust is critical for effective risk communication. In the absence of trust, people may not use available information and subsequently may put themselves and others at greater-than-necessary risk. Our results suggest that updating forecasts when newer, more reliable information is available and providing reliable uncertainty estimates can support user trust and decision-making.

 
NSF-PAR ID:
10438734
Author(s) / Creator(s):
 ;  
Publisher / Repository:
American Meteorological Society
Date Published:
Journal Name:
Weather, Climate, and Society
Volume:
15
Issue:
3
ISSN:
1948-8327
Page Range / eLocation ID:
p. 693-709
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract: 100 words Jurors are increasingly exposed to scientific information in the courtroom. To determine whether providing jurors with gist information would assist in their ability to make well-informed decisions, the present experiment utilized a Fuzzy Trace Theory-inspired intervention and tested it against traditional legal safeguards (i.e., judge instructions) by varying the scientific quality of the evidence. The results indicate that jurors who viewed high quality evidence rated the scientific evidence significantly higher than those who viewed low quality evidence, but were unable to moderate the credibility of the expert witness and apply damages appropriately resulting in poor calibration. Summary: <1000 words Jurors and juries are increasingly exposed to scientific information in the courtroom and it remains unclear when they will base their decisions on a reasonable understanding of the relevant scientific information. Without such knowledge, the ability of jurors and juries to make well-informed decisions may be at risk, increasing chances of unjust outcomes (e.g., false convictions in criminal cases). Therefore, there is a critical need to understand conditions that affect jurors’ and juries’ sensitivity to the qualities of scientific information and to identify safeguards that can assist with scientific calibration in the courtroom. The current project addresses these issues with an ecologically valid experimental paradigm, making it possible to assess causal effects of evidence quality and safeguards as well as the role of a host of individual difference variables that may affect perceptions of testimony by scientific experts as well as liability in a civil case. Our main goal was to develop a simple, theoretically grounded tool to enable triers of fact (individual jurors) with a range of scientific reasoning abilities to appropriately weigh scientific evidence in court. 
We did so by testing a Fuzzy Trace Theory-inspired intervention in court, and testing it against traditional legal safeguards. Appropriate use of scientific evidence reflects good calibration – which we define as being influenced more by strong scientific information than by weak scientific information. Inappropriate use reflects poor calibration – defined as relative insensitivity to the strength of scientific information. Fuzzy Trace Theory (Reyna & Brainerd, 1995) predicts that techniques for improving calibration can come from presentation of easy-to-interpret, bottom-line “gist” of the information. Our central hypothesis was that laypeople’s appropriate use of scientific information would be moderated both by external situational conditions (e.g., quality of the scientific information itself, a decision aid designed to convey clearly the “gist” of the information) and individual differences among people (e.g., scientific reasoning skills, cognitive reflection tendencies, numeracy, need for cognition, attitudes toward and trust in science). Identifying factors that promote jurors’ appropriate understanding of and reliance on scientific information will contribute to general theories of reasoning based on scientific evidence, while also providing an evidence-based framework for improving the courts’ use of scientific information. All hypotheses were preregistered on the Open Science Framework. Method Participants completed six questionnaires (counterbalanced): Need for Cognition Scale (NCS; 18 items), Cognitive Reflection Test (CRT; 7 items), Abbreviated Numeracy Scale (ABS; 6 items), Scientific Reasoning Scale (SRS; 11 items), Trust in Science (TIS; 29 items), and Attitudes towards Science (ATS; 7 items). Participants then viewed a video depicting a civil trial in which the defendant sought damages from the plaintiff for injuries caused by a fall. 
The defendant (bar patron) alleged that the plaintiff (bartender) pushed him, causing him to fall and hit his head on the hard floor. Participants were informed at the outset that the defendant was liable; therefore, their task was to determine if the plaintiff should be compensated. Participants were randomly assigned to 1 of 6 experimental conditions: 2 (quality of scientific evidence: high vs. low) x 3 (safeguard to improve calibration: gist information, no-gist information [control], jury instructions). An expert witness (neuroscientist) hired by the court testified regarding the scientific strength of fMRI data (high [90 to 10 signal-to-noise ratio] vs. low [50 to 50 signal-to-noise ratio]) and gist or no-gist information both verbally (i.e., fairly high/about average) and visually (i.e., a graph). After viewing the video, participants were asked if they would like to award damages. If they indicated yes, they were asked to enter a dollar amount. Participants then completed the Positive and Negative Affect Schedule-Modified Short Form (PANAS-MSF; 16 items), expert Witness Credibility Scale (WCS; 20 items), Witness Credibility and Influence on damages for each witness, manipulation check questions, Understanding Scientific Testimony (UST; 10 items), and 3 additional measures were collected, but are beyond the scope of the current investigation. Finally, participants completed demographic questions, including questions about their scientific background and experience. The study was completed via Qualtrics, with participation from students (online vs. in-lab), MTurkers, and non-student community members. After removing those who failed attention check questions, 469 participants remained (243 men, 224 women, 2 did not specify gender) from a variety of racial and ethnic backgrounds (70.2% White, non-Hispanic). Results and Discussion There were three primary outcomes: quality of the scientific evidence, expert credibility (WCS), and damages. 
During initial analyses, each dependent variable was submitted to a separate 3 Gist Safeguard (safeguard, no safeguard, judge instructions) x 2 Scientific Quality (high, low) Analysis of Variance (ANOVA). Consistent with hypotheses, there was a significant main effect of scientific quality on strength of evidence, F(1, 463)=5.099, p=.024; participants who viewed the high quality evidence rated the scientific evidence significantly higher (M=7.44) than those who viewed the low quality evidence (M=7.06). There were no significant main effects or interactions for witness credibility, indicating that the expert who provided scientific testimony was seen as equally credible regardless of scientific quality or gist safeguard. Finally, for damages, consistent with hypotheses, there was a marginally significant interaction between Gist Safeguard and Scientific Quality, F(2, 273)=2.916, p=.056. However, post hoc t-tests revealed significantly higher damages were awarded for low (M=11.50) versus high (M=10.51) scientific quality evidence, F(1, 273)=3.955, p=.048, in the no gist with judge instructions safeguard condition, which was contrary to hypotheses. The data suggest that the judge instructions alone are reversing the pattern: although the difference was nonsignificant, those who received the no gist without judge instructions safeguard awarded higher damages in the high (M=11.34) versus low (M=10.84) scientific quality evidence conditions, F(1, 273)=1.059, p=.30. Together, these provide promising initial results indicating that participants were able to effectively differentiate between high and low scientific quality of evidence, though they inappropriately utilized the scientific evidence through their inability to discern expert credibility and apply damages, resulting in poor calibration. 
These results will provide the basis for more sophisticated analyses including higher order interactions with individual differences (e.g., need for cognition) as well as tests of mediation using path analyses. [References omitted but available by request] Learning Objective: Participants will be able to determine whether providing jurors with gist information would assist in their ability to award damages in a civil trial. 
  2. Background

    Objective numeracy appears to support better medical decisions and health outcomes. The more numerate generally understand and use numbers more and make better medical decisions, including more informed medical choices. Numeric self-efficacy—an aspect of subjective numeracy that is also known as numeric confidence—also relates to decision making via emotional reactions to and inferences from experienced difficulty with numbers and via persistence linked with numeric comprehension and healthier behaviors over time. Furthermore, it moderates the effects of objective numeracy on medical outcomes.

    Purpose

    We briefly review the numeracy and decision-making literature and then summarize more recent literature on 3 separable effects of numeric self-efficacy. Although dual-process theories can account for the generally superior decision making of the highly numerate, they have neglected effects of numeric self-efficacy. We discuss implications for medical decision-making (MDM) research and practice. Finally, we propose a modification to dual-process theories, adding a “motivational mind” to integrate the effects of numeric self-efficacy on decision-making processes (i.e., inferences from experienced difficulty with numbers, greater persistence, and greater use of objective-numeracy skills) important to high-quality MDM.

    Conclusions

    The power of numeric self-efficacy (confidence) has been little considered in MDM, but many medical decisions and behaviors require persistence to be successful over time (e.g., comprehension, medical-recommendation adherence). Including numeric self-efficacy in research and theorizing will increase understanding of MDM and promote development of better decision interventions.

    Highlights

    Research demonstrates that objective numeracy supports better medical decisions and health outcomes. The power of numeric self-efficacy (aka numeric confidence) has been little considered but appears critical to emotional reactions and inferences that patients and others make when encountering numeric information (e.g., in decision aids) and to greater persistence in medical decision-making tasks involving numbers. The present article proposes a novel modification to dual-process theory to account for newer findings and to describe how numeracy mechanisms can be better understood. Because being able to adapt interventions to improve medical decisions depends in part on having a good theory, future research should incorporate numeric self-efficacy into medical decision-making theories and interventions.

     
  3. Summary

    Forecasts of future dangerousness are often used to inform the sentencing decisions of convicted offenders. For individuals who are sentenced to probation or paroled to community supervision, such forecasts affect the conditions under which they are to be supervised. The statistical criterion for these forecasts is commonly called recidivism, which is defined as a charge or conviction for any new offence, no matter how minor. Only rarely do such forecasts make distinctions on the basis of the seriousness of offences. Yet seriousness may be central to public concerns, and judges are increasingly required by law and sentencing guidelines to make assessments of seriousness. At the very least, information about seriousness is essential for allocating scarce resources for community supervision of convicted offenders. The paper focuses only on murderous conduct by individuals on probation or parole. Using data on a population of over 60000 cases from Philadelphia’s Adult Probation and Parole Department, we forecast whether each offender will be charged with a homicide or attempted homicide within 2 years of beginning community supervision. We use a statistical learning approach that makes no assumptions about how predictors are related to the outcome. We also build in the costs of false negative and false positive charges and use half of the data to build the forecasting model, and the other half of the data to evaluate the quality of the forecasts. Forecasts that are based on this approach offer the possibility of concentrating rehabilitation, treatment and surveillance resources on a small subset of convicted offenders who may be in greatest need, and who pose the greatest risk to society.

     
  4. Objective

    Patients have a poor understanding of outcomes related to total knee replacement (TKR) surgery, with most patients underestimating the potential benefits and overestimating the risk of complications. In this study, we sought to compare the impacts of descriptive information alone or in combination with an icon array, experience condition (images), or spinner on participants’ preference for TKR.

    Methods

    A total of 648 members of an online arthritis network were randomized to 1 of 4 outcome presentation formats: numeric only, numeric with an icon array, numeric with a set of 50 images, or numeric with a functional spinner. Preferences for TKR were measured before and immediately after viewing the outcome information using an 11‐point numeric rating scale. Knowledge was assessed by asking participants to report the frequency of each outcome.

    Results

    Participants randomized to the icon array, images, and spinner had stronger preferences for TKR (after controlling for baseline preferences) compared to those viewing the numeric only format (P < 0.05 for all mean differences). Knowledge scores were highest in participants randomized to the icon array; however, knowledge did not mediate the association between format and change in preference for TKR.

    Conclusion

    Decision support at the point‐of‐care is being increasingly recognized as a vital component of care. Our findings suggest that adding graphic information to descriptive statistics strengthens preferences for TKR. Although experience formats using images may be too complex to use in clinical practice, icon arrays and spinners may be a viable and easily adaptable decision aid to support communication of probabilistic information.

     
  5. Abstract

    We present BrainNet which, to our knowledge, is the first multi-person non-invasive direct brain-to-brain interface for collaborative problem solving. The interface combines electroencephalography (EEG) to record brain signals and transcranial magnetic stimulation (TMS) to deliver information noninvasively to the brain. The interface allows three human subjects to collaborate and solve a task using direct brain-to-brain communication. Two of the three subjects are designated as “Senders” whose brain signals are decoded using real-time EEG data analysis. The decoding process extracts each Sender’s decision about whether to rotate a block in a Tetris-like game before it is dropped to fill a line. The Senders’ decisions are transmitted via the Internet to the brain of a third subject, the “Receiver,” who cannot see the game screen. The Senders’ decisions are delivered to the Receiver’s brain via magnetic stimulation of the occipital cortex. The Receiver integrates the information received from the two Senders and uses an EEG interface to make a decision about either turning the block or keeping it in the same orientation. A second round of the game provides an additional chance for the Senders to evaluate the Receiver’s decision and send feedback to the Receiver’s brain, and for the Receiver to rectify a possible incorrect decision made in the first round. We evaluated the performance of BrainNet in terms of (1) Group-level performance during the game, (2) True/False positive rates of subjects’ decisions, and (3) Mutual information between subjects. Five groups, each with three human subjects, successfully used BrainNet to perform the collaborative task, with an average accuracy of 81.25%. Furthermore, by varying the information reliability of the Senders by artificially injecting noise into one Sender’s signal, we investigated how the Receiver learns to integrate noisy signals in order to make a correct decision. 
We found that, like conventional social networks, BrainNet allows Receivers to learn to trust the Sender who is more reliable, in this case based solely on the information transmitted directly to their brains. Our results point the way to future brain-to-brain interfaces that enable cooperative problem solving by humans using a “social network” of connected brains.

     