<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Data Prophecy: Exploring the Effects of Belief Elicitation in Visual Analytics</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>05/06/2021</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10320958</idno>
					<idno type="doi">10.1145/3411764.3445798</idno>
					<title level='j'>ACM Conference on Human Factors in Computing Systems</title>

					<author>Ratanond Koonchanok</author><author>Parul Baser</author><author>Abhinav Sikharam</author><author>Nirmal Kumar Raveendranath</author><author>Khairi Reda</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Interactive visualizations are widely used in exploratory data analysis, but existing systems provide limited support for confirmatory analysis. We introduce PredictMe, a tool for belief-driven visual analysis, enabling users to draw and test their beliefs against data, as an alternative to data-driven exploration. PredictMe combines belief elicitation with traditional visualization interactions to support mixed analysis styles. In a comparative study, we investigated how these affordances impact participants' cognition. Results show that PredictMe prompts participants to incorporate their working knowledge more frequently in queries. Participants were more likely to attend to discrepancies between their mental models and the data. However, those same participants were also less likely to engage in interactions associated with exploration, and ultimately inspected fewer visualizations and made fewer discoveries. The results suggest that belief elicitation may moderate exploratory behaviors, instead nudging users to be more deliberate in their analysis. We discuss the implications for visualization design.
]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">INTRODUCTION</head><p>Visualization tools have become vital instruments in data science. These interactive analysis systems enable users to explore sets of data and look for patterns that might indicate new insights. However, existing visualization tools typically come only with data-driven interactions, providing no explicit support for confirmatory analyses. In particular, current tools do not provide affordances for users to share their working hypotheses, and test the accuracy of those hypotheses before peeking at the data.</p><p>Statisticians have long recognized a need for both exploratory and confirmatory analyses <ref type="bibr">[48]</ref>, with the choice of method dependent on the question at hand and the status of one's working knowledge. Research in cognitive science has also emphasized the importance of belief-driven reasoning, wherein people attempt to proactively test the fit of their mental models against observable data. For instance, Dunbar showed that scientific discovery usually occurs through a process of conceptual mismatch, whereby an analyst observes a discrepancy between their expectations and the evidence <ref type="bibr">[7]</ref>. It is often by actively seeking to reconcile such mismatches that people begin to make new discoveries <ref type="bibr">[8]</ref>. Similarly, <ref type="bibr">Klein et al.</ref> observe that model-fit testing is key to sensemaking, arguing that most people seek to (dis)confirm and adapt their existing frames, as opposed to developing entirely new frames from scratch, even when faced with novel information <ref type="bibr">[22]</ref>. This research suggests that, to be maximally effective, visualizations must also support a confirmatory approach to analysis, in addition to their traditional role as data-driven sensemaking tools. 
Addressing this gap could also serve to reduce the incidence of spurious discovery in visualizations <ref type="bibr">[53]</ref>, by fostering a healthy level of skepticism and grounding insights in prior beliefs.</p><p>Researchers have started to acknowledge the need to incorporate one's mental model as an essential aspect of reasoning with visualizations. For example, researchers tested the effect of eliciting prior knowledge from participants, and visualizing it alongside data to encourage reflection. This body of work suggests that knowledge externalization improves data recall <ref type="bibr">[20]</ref>, promotes normative Bayesian reasoning <ref type="bibr">[21]</ref>, and increases the communicative impact of narrative visualizations, if not their persuasiveness <ref type="bibr">[15]</ref>. Yet, these studies were done under highly controlled experimental conditions, using sparse datasets of only a handful of data points. It is still unclear how belief elicitation can impact one's visual analysis in realistic, open-ended scenarios. Furthermore, research is needed on how to design functional tools that can scaffold confirmatory analyses, while still providing the traditional suite of visualization interactions people have come to expect.</p><p>Our goal in this work is two-fold. First, we investigate how users structure their visual analysis while interacting with a system that supports belief externalization, as a way of learning about and testing one's knowledge against data. Second, we contribute a perspective on how to redesign exploratory, multi-view visualizations to also support hypothesis-driven analyses. To that end, we present PredictMe, a tool that enables users to sketch their predictions in a variety of charts. These custom interactions are blended with traditional visualization functionalities, allowing for a mix of exploratory and confirmatory analyses in one platform. 
We report on an exploratory, between-subjects study of participants' cognition and interaction patterns. We compare our design against a control condition of the same tool that lacks the ability to draw expectations. Our results show that, given the opportunity, participants frequently chose to share their data expectations with the system, despite the overhead involved. Analysis of their think-aloud statements showed that they developed more hypotheses before peeking at the data, and were more attentive to flaws in their mental models. However, those same participants inspected fewer visualizations on average, and ultimately developed fewer observations about the data. The results suggest that belief elicitation may have a moderating effect on exploratory behaviors, instead nudging participants to be more deliberate in their queries. We discuss these findings, and address the potential benefits and complications of incorporating belief-driven interactions in visual analytics tools.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">BACKGROUND AND RELATED WORK 2.1 Exploratory versus Confirmatory Analyses</head><p>A hallmark of good science is the ability to attend to unexpected results. Indeed, some of the most prominent breakthroughs in the history of science, such as the discovery of Penicillin <ref type="bibr">[11]</ref>, occurred by chance when scientists saw surprising results, and were subsequently able to reinterpret those findings in new ways.</p><p>To maintain an open perspective, analysts typically prescribe Exploratory Data Analysis (EDA) as an integral step in the data analysis pipeline <ref type="bibr">[14,</ref><ref type="bibr">48]</ref>. EDA is a process of looking for interesting distributions, outliers, and relationships, which can then be used to formulate new hypotheses or devise additional experiments <ref type="bibr">[47]</ref>. Historically, EDA has relied heavily on visualization tools, which provide the sort of flexibility needed. Nevertheless, Tukey, who is largely credited with championing EDA, cautions against using it for "fishing expeditions" <ref type="bibr">[10]</ref>. He notes that accepting findings from EDA as conclusive insights is "destructively foolish" <ref type="bibr">[47]</ref>. This is because a hypothesis or a pattern suggested spontaneously by a dataset is unlikely to be refutable by that same data. Instead, findings from EDA should be considered preliminary, requiring confirmation with an independent data source.</p><p>By contrast, in confirmatory analyses, hypotheses are posited (and ideally preregistered <ref type="bibr">[29]</ref>) before the data is seen. When data is tested against a preconceived prediction or model, and found to conform, that model (and its underlying hypothesis) can be said to be confirmed. 
Confirmatory analysis is considered the standard inferential method in science; inferences made are generally reliable, as long as the quality of the data is controlled and the sample is reasonably representative of the underlying population. The key reliability indicator, however, is that hypotheses are posited prior to peeking at the data (i.e., before the outcome is known) <ref type="bibr">[19]</ref>. It is possible to view both exploratory and confirmatory analyses as instances of a model check: the analyst compares the visualized data to an imagined dataset sampled from an (implicit) reference model <ref type="bibr">[12,</ref><ref type="bibr">16]</ref>. This comparison could then prompt a Bayesian update to revise the reference model, or, alternatively, a classical hypothesis test wherein the difference between the imagined and visualized data is adjudicated using a (visual) test statistic. However, others still maintain that exploration and confirmation should be conceptually separated in order to ensure the robustness of discoveries <ref type="bibr">[6]</ref>.</p><p>The distinction between exploration and confirmation (or lack thereof) is of specific concern for visual analytics. Visualization users appear to frequently accept results generated through EDA as conclusive, leading to spurious findings that may not generalize beyond the sample data at hand. For example, in a startling result, Zgraggen et al. found the majority of discoveries uncovered through interactive visual analysis to be false <ref type="bibr">[53]</ref>. 
It has been suggested that the way visualization tools are currently designed serves to further blur the boundary between potentially robust confirmatory findings and preliminary, exploratory results <ref type="bibr">[35]</ref>: as users interactively filter, bin, and slice-and-dice their data, they make myriad inferences with just a few clicks, often without being aware of the effects of this multiplicity on the reliability of inferences <ref type="bibr">[13]</ref>. Zhao et al. devised an "&#120572;-investing" approach to account for multiple comparisons during interactive analysis <ref type="bibr">[54]</ref>. Jo et al. allow users to leave 'safeguard' annotations on uncertain visualizations, indicating that those visualizations need to be rechecked once the complete data is available <ref type="bibr">[18]</ref>. These interventions may reduce the incidence of spurious discovery in visual analytics. However, the lack of clear hypothesis- and model-testing capabilities in visualization tools can still leave people overconfident in their analysis strategy. Preliminary evidence suggests that users could indeed benefit from such affordances <ref type="bibr">[2,</ref><ref type="bibr">36,</ref><ref type="bibr">38]</ref>. Our work addresses this gap by proposing workflows and interactions that can be used for confirmatory and model-driven analyses in visualizations. We also study how the presence of these interactions affects user behavior and analysis patterns.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Sensemaking with Visualizations</head><p>Sensemaking refers to a "class of activities and tasks in which there is a native seeking and processing of information to achieve understanding about some state of affairs" <ref type="bibr">[23]</ref>. Theories of sensemaking have been a recurring theme in visual analytics, and have contributed heavily to the development of the field <ref type="bibr">[4]</ref>. Among the most commonly cited models is Pirolli and Card's <ref type="bibr">[33]</ref>, which comprises the following sensemaking activities: analysts iteratively filter their data, select and highlight relevant evidence, and reorganize that evidence in a 'schema'. A schema can then be used to induce hypotheses to explain the data or to make decisions. Visualization designers have taken inspiration from this model. For example, Jigsaw divides its interface into several components, each with interactions intended to support a specific sensemaking activity (e.g., 'evidence marshaling') <ref type="bibr">[44]</ref>. Shrinivasan and van Wijk provide a 'knowledge editor', enabling users to record their hypotheses and conclusions in the form of a concept graph <ref type="bibr">[43]</ref>. SchemaLine aids analysts in schematizing temporal events <ref type="bibr">[28]</ref>.</p><p>Although many visualization tools have been custom-designed to mirror empirical sensemaking models, these tools are primarily intended to facilitate 'bottom-up', data-driven sensemaking. By comparison, no tools exist to specifically support top-down, expectation-guided visual analyses (e.g., as espoused by Klein et al.'s data-frame theory <ref type="bibr">[22]</ref>). Some research has sought to develop systems that adapt to user models in real-time. For instance, semantic interaction can deduce conceptual relationships by observing how users manipulate spatial layouts <ref type="bibr">[9]</ref>. 
This information is then used to evolve the visualization to match analyst beliefs. Such techniques, however, are limited to inferring implicit, low-level features (e.g., pairwise multidimensional distance <ref type="bibr">[50]</ref>), and are primarily meant to influence computational processes running in the background. As such, these techniques do not provide explicit hypothesis-testing affordances that people could use outright to validate their mental models and beliefs. Our work ultimately aims to re-architect visual sensemaking tools to equally support both data- and belief-driven (i.e., confirmatory) analyses.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">Belief Elicitation in Visualization</head><p>Belief elicitation is the process of externalizing implicit knowledge (typically of experts) about some unknown quantity, and distilling that knowledge into a probability distribution <ref type="bibr">[30]</ref>. These distributions are often used as prior models, which are then updated with new (typically empirical) data using a Bayesian framework. Despite this long history, the visualization community has only recently begun to incorporate user beliefs in data graphics. Practitioners have started experimenting with interactions that invite audiences to externalize their beliefs by sketching in charts. For example, the New York Times featured a series of visualizations that invited the viewer to predict the impact of the Obama presidency on various socioeconomic indicators <ref type="bibr">[31]</ref>. The viewer sketches the expected trend line by drawing in an initially blank chart. The actual timeseries are then revealed, enabling the viewer to compare the accuracy of their sketch and, accordingly, update their beliefs. Kim et al. studied this kind of interaction in a controlled study, finding that it improved participants' data recall <ref type="bibr">[20]</ref>. They also proposed belief elicitation as an evaluation method by considering the degree to which visualizations promote normative Bayesian update among viewers <ref type="bibr">[21]</ref>. Heyer et al. studied how people adjust their attitudes towards a message experienced through a narrative visualization <ref type="bibr">[15]</ref>. They found that prior elicitation does not significantly impact attitudinal change, even though it is correlated with other knowledge acquisition metrics. Choi et al. conducted a Wizard-of-Oz study to explore whether natural language can be used to specify prior beliefs <ref type="bibr">[2]</ref>. 
They subsequently developed a tool that allows users to frame hypotheses in natural prose, and accordingly receive visualizations tailored to their beliefs <ref type="bibr">[3]</ref>. Sarma and Kay investigated how Bayesian statisticians set their priors <ref type="bibr">[42]</ref>. They documented varying strategies and philosophies practitioners seem to draw upon when distilling subjective beliefs into prior distributions.</p><p>Empirical work on visual belief elicitation has so far utilized highly controlled experiments, surveys, or interview methods. Though informative, the results may not necessarily translate to visual analytics, where analysts engage in fluid, open-ended sensemaking, and have a choice to either specify priors or proceed in an exploratory fashion. Our work contributes insights on how users might behave in such contexts.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">METHODOLOGY</head><p>Our goal is to broadly understand how users might interact with a visualization tool that supports belief-driven analysis. In this work, we specifically address two research questions:</p><p>&#8226; Given the opportunity to visually externalize their expectations, how often will people use this feature? &#8226; How do users react to seeing their expectations represented alongside data? And how will the ability to test one's predictions affect their visual analytic process?</p><p>To investigate these two questions, we conducted a comparative, exploratory study. We recruited participants who had prior data analysis experience, and tasked them with visually analyzing two data sets that we provided. Participants were randomly assigned to one of two conditions. A Prediction condition consisted of an interface that provides belief elicitation affordances, optionally enabling participants to sketch their predictions into charts, and compare these sketches to visualized data. A second Standard condition provided all of the interactions available in the former condition, but otherwise lacked the prediction functionality. We first describe the design of the visualization, then discuss the study procedures and analysis methodology.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Visualization</head><p>Since there are no established visualization tools that support knowledge externalization, we created a custom-designed tool for this study, which we dub PredictMe. The design of PredictMe was inspired by existing visualization systems, such as Vizdom <ref type="bibr">[5]</ref> and ExPates <ref type="bibr">[17]</ref>, and by results from formative studies on belief elicitation in visual analytics <ref type="bibr">[2]</ref>. The interface allows users to create data views on demand; users drag data attributes from a side panel and release them onto an initially empty canvas to create charts. Multiple charts can be created, resized, and positioned freely within the canvas. Additional attributes can also be added to an existing chart by dragging onto placeholders. The tool supports five visualization types: bar charts, histograms, scatterplots, line graphs, and parallel coordinates plots. Chart type is determined based on the number of attributes and their types. For example, a single qualitative attribute produces a bar chart, whereas a quantitative attribute results in a histogram. Two quantitative attributes are visualized as a scatterplot. Combining a quantitative with a temporal attribute results in a line graph. Lastly, a parallel coordinates plot can be generated by incorporating two or more attributes. In addition to creating charts, users can brush-and-link by selecting data points from one chart and seeing their distribution highlighted in other charts. Figure <ref type="figure">1</ref> shows an overview of the interface. A key difference with existing tools is the ability to sketch one's expectations prior to seeing data. PredictMe then displays those expectations alongside the data. The sketching feature generally works by first presenting the user with initially blank charts: when creating a new chart, users see labeled axes and data ranges, but without actual data points. 
The user can then optionally sketch into the chart to outline the pattern they expect to observe. The precise sketch interaction is dependent on the chart type: for histograms and bar charts, predictions are specified by adjusting the length of bars, which are initially set at a baseline height. In doing so, the user specifies the frequency of individual bins in a histogram, or the value associated with a qualitative attribute. Figure <ref type="figure">2</ref> illustrates this interaction sequence. For line charts, the user draws with a pencil tool to outline the expected trend for a timeseries. Scatterplots come with a paintbrush that can be used to predict the density of the point cloud. Lastly, in parallel coordinates, the user predicts by specifying intervals on the parallel axes, effectively creating ribbons to designate the expected multi-variate pattern. Figure <ref type="figure">3</ref> illustrates these different sketching styles. In designing these interactions, we took inspiration from Kim et al.'s taxonomy <ref type="bibr">[20]</ref>, as well as from examples developed by practitioners <ref type="bibr">[1,</ref><ref type="bibr">31]</ref>.</p><p>After entering their expectations, users click a 'See Data' button. This causes the actual data to be revealed in the chart and shown alongside the sketch (see Figures <ref type="figure">1-E</ref> &amp; <ref type="figure">2</ref>). For distinction, expectations are consistently color-coded in violet, whereas data marks are always shown in blue. Specifying expectations is optional: the user may choose to skip this step by immediately clicking 'See Data'. All charts are initially set to a 'Prediction' mode, giving users the opportunity to specify expectations.</p><p>The sketch feature was only available in the Prediction condition. However, to give participants in the Standard condition an equal opportunity to reflect on their prior knowledge, data display is also delayed, with newly created charts shown blank. 
Participants in the Standard condition similarly had to click 'See Data' to reveal chart contents, even though they could not draw a prediction. This extra step enabled us to capture verbal predictions participants may have uttered prior to being exposed to the data.</p></div>
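The chart-type rules described in Section 3.1 map each combination of dropped attributes to one of the five supported visualization types. The following is a minimal, hypothetical sketch of that dispatch logic; the function name, the attribute-kind strings, and the fallback behavior for unlisted combinations are our own assumptions, not PredictMe's actual implementation:

```python
def choose_chart(attrs):
    """Map a list of attribute kinds ('qualitative', 'quantitative',
    'temporal') to a chart type, per the rules described in Section 3.1."""
    kinds = sorted(attrs)
    if kinds == ['qualitative']:
        return 'bar chart'
    if kinds == ['quantitative']:
        return 'histogram'
    if kinds == ['quantitative', 'quantitative']:
        return 'scatterplot'
    if kinds == ['quantitative', 'temporal']:
        return 'line graph'
    if len(kinds) >= 2:
        # Assumption: any other multi-attribute combination falls back
        # to a parallel coordinates plot.
        return 'parallel coordinates plot'
    raise ValueError(f'unsupported attribute combination: {attrs}')
```

For example, dragging a quantitative attribute onto a chart that already holds a temporal one would yield `choose_chart(['temporal', 'quantitative'])`, i.e., a line graph.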
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Participants</head><p>We recruited 24 participants from a large, public university campus. All participants had prior data analysis experience (e.g., using Excel, R, Tableau, or SAP), and represented a range of analytic disciplines, including computer science, statistics, and data science. We compensated participants with a $20 gift card upon completing the study.</p><p>[Figure 3 caption: Line charts provide a pencil tool to draw the expected shape of a timeseries. In a scatterplot, the expected point cloud density can be specified using a paintbrush. Lastly, in parallel coordinates, the expected multi-variate pattern is designated by specifying intervals on the vertical axes. Expectations are color-coded in violet to distinguish them from data marks (blue). Data points that fall within the expectations are also visually differentiated from those that deviate.]</p><p>In addition to the 24 participants, we piloted the study with 3 participants whose data were excluded from the analysis. Participants were assigned randomly to one of the two conditions (Prediction or Standard), for a total of 12 participants in each. We refer to participants in the Prediction condition by &#119875;&#119894; and those in Standard by &#119878;&#119895;. Thirteen participants completed the study in-person; they interacted with the visualization through a standard desktop setup (i.e., mouse, keyboard, and a full-HD monitor). For the remaining 11 participants (7 in Standard and 4 in Prediction), the study was conducted remotely (a change prompted by the COVID-19 pandemic). Those latter participants were provided with a web link to the visualization. They completed the study using their own computers, sharing their screen content with the experimenter via Zoom. Notwithstanding the change in format, we maintained identical procedures across the in-person and remote sessions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Procedures</head><p>Our goal was to place participants in an open-ended visual analysis context. We therefore adopt the setup employed in insight-based evaluation methodologies <ref type="bibr">[34,</ref><ref type="bibr">41]</ref>. As such, we did not provide participants with specific tasks or questions to answer. Rather, participants were instructed to freely analyze the provided data, by developing their own hypotheses and lines of questioning. We told participants that they may share their beliefs (either verbally or through sketch) particularly if they had expectations of what the data might look like, but that they may also skip this step if they wish. Recall that in both conditions, charts are initially blank, which gave subjects in the Standard condition an opportunity to verbally externalize their beliefs.</p><p>Participants were first given a demonstration of the visualization tool using brief example scenarios. During this demonstration, the experimenter showed participants examples of how they might externalize their beliefs. This was done by sketching in the Prediction condition, or by verbalizing in Standard (e.g., "I predict X might increase with Y...") prior to clicking the 'See Data' button. To avoid biasing participants, the demonstration employed a different dataset from those participants were tasked with analyzing. Following the demonstration, participants conducted two separate analysis sessions using two different datasets. The datasets were acquired from kaggle.com, an open dataset repository. The first dataset contained statistics of student admissions to select US graduate programs, comprising attributes such as the student's GPA, test scores, and research experience, among others. The second dataset comprised statistics about top songs of the past decade, with attributes such as genre, danceability, loudness, and popularity. 
The datasets were chosen as they represent common knowledge to a university community (admission process) as well as data about popular culture (music), thus providing participants with attributes they are likely to have some prior knowledge about.</p><p>At the beginning of each analysis session, participants were given a data sheet containing a brief description of each set, including size, data types, and column definitions. Participants were given a few minutes to read the data sheet before starting their analysis. We allocated 30 minutes per dataset, although participants were at liberty to stop earlier if they ran out of ideas. Alternatively, they could extend the session if they felt they needed more time for their analysis. An experimenter was present throughout to answer participants' questions and proctor the study. The experiment was audio recorded and the contents of participants' screens were captured on video. We instructed participants to think aloud and verbalize their thoughts throughout.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4">Analysis, Segmentation, and Coding</head><p>We first transcribed participants' utterances and segmented them using standard verbal protocol analysis methods <ref type="bibr">[46]</ref>. Segments consisted of independent clauses that could be understood on their own. We then grouped related segments into 'queries'. A single query comprised a self-contained line of analysis with one or more associated visualizations, and with typically multiple verbal statements. The segmentation process resulted in a total of 2,728 segments, and 651 unique queries. The average number of queries per participant was 27.</p><p>To analyze participants' verbal utterances and reactions, we developed a coding scheme using a grounded theory approach <ref type="bibr">[45]</ref>. Two coders inductively coded the segmented data. The coders consulted the video recording to resolve any ambiguities in the process. Throughout, the emerging coding scheme was revised iteratively and discussed regularly with members of the research team. After finalizing the code book, the entire dataset was then re-coded using the final scheme. We subsequently measured coding reliability by having the two coders redundantly and independently code 60 segments (one entire analysis session from a randomly selected participant). Inter-coder agreement was measured at 92.64%, with a Cohen's kappa of 0.9144, indicating excellent agreement between the two coders <ref type="bibr">[26]</ref>.</p><p>The codes were divided into three orthogonal categories: Expectations, Assessments of Data-Expectation Fit, and Reactions. Expectations comprised three codes designating the point at which a participant supplied predictions: before or after inspecting the data, or whether they chose to not provide a prediction for a particular query. 
Data-Expectation Fit indicates the degree to which a participant's expectation was confirmed or contradicted by data, as self-assessed by the participant. Lastly, Reactions comprised verbal statements uttered either before or after inspecting visualizations. This latter category included insight-related codes, such as Observations and Hypotheses <ref type="bibr">[41]</ref>. We also distinguish between hypotheses verbalized before or after the relevant data is seen by a participant. Finally, we coded Reactions that are indicative of certain cognitive activities, including Goals, Reasoning, Surprises, and Belief Updates. The complete coding scheme is available in the supplementary materials. We also include the transcribed and coded data.</p></div>
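The reliability figures reported above (92.64% raw agreement, Cohen's kappa of 0.9144) correct observed agreement for agreement expected by chance. A minimal sketch of the kappa computation over two equal-length lists of code labels (the function name is ours, for illustration only):

```python
from collections import Counter

def cohens_kappa(codes_a, codes_b):
    """Chance-corrected inter-coder agreement between two coders'
    code labels for the same segments."""
    assert len(codes_a) == len(codes_b) and codes_a
    n = len(codes_a)
    # Observed agreement: fraction of segments coded identically.
    p_observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Chance agreement: product of each coder's marginal label frequencies.
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    p_chance = sum(freq_a[c] * freq_b[c]
                   for c in set(codes_a) | set(codes_b)) / n**2
    return (p_observed - p_chance) / (1 - p_chance)
```

Kappa discounts agreement that would arise from coders favoring the same labels by chance, which is why a raw agreement of ~93% can yield a kappa slightly below it.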
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">RESULTS</head><p>We first report on differences in analytic behaviors and insight acquisition across the two conditions. We then analyze variations in participants' interaction patterns. Given that our study is exploratory in nature, we refrain from making generalizable statistical inferences. Instead, we present our results (with confidence intervals) as exploratory findings requiring confirmation in future experiments. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Analytic Behaviors and Insights</head><p>We consider differences in the number of queries, predictions, hypotheses, and observations generated by participants. To mitigate the effects of inter-participant variation, we compare averaged, normalized rates where appropriate, by counting code occurrences per subject and dividing by the total number of coded segments for that subject.</p><p>In each query, a participant can decide to provide a prediction before seeing the data, state their prediction after seeing the data, or simply explore the data without supplying any prediction. Figure <ref type="figure">5</ref> depicts the average tendency for these three alternatives. On average, 93.6% of queries (CI: 88.8-98.5%) in the Prediction condition included a prediction prior to data revelation, compared to 66.4% (CI: 55.1-77.8%) in Standard. By contrast, participants in the Standard condition were approximately 6 times more likely to predict after inspecting data (14.7% of queries, CI: 7.4-22% versus 2.7%, CI: 0-5.5% in Prediction). Similarly, participants in the Standard condition were 5 times more likely to not specify predictions (18.6%, CI: 13.1-24.2% versus 3.7%, CI: 0-7.3% in Prediction).</p></div>
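The normalization described above computes each code's rate per subject (occurrences divided by that subject's total coded segments) before averaging across subjects, so every participant contributes equally regardless of how much they talked. A hypothetical sketch, with invented per-subject count dictionaries (the function names and data are ours):

```python
def normalized_rate(code_counts, code):
    """Per-subject rate of a code: occurrences of the code divided by
    the subject's total number of coded segments."""
    total = sum(code_counts.values())
    return code_counts.get(code, 0) / total if total else 0.0

def mean_rate(subjects, code):
    """Average the per-subject normalized rates, weighting each
    subject equally rather than pooling all segments."""
    rates = [normalized_rate(counts, code) for counts in subjects]
    return sum(rates) / len(rates)

# Illustrative (fabricated) counts for two subjects:
subjects = [
    {'observation': 2, 'hypothesis': 2},   # rate of 'observation' = 0.5
    {'observation': 1, 'hypothesis': 3},   # rate of 'observation' = 0.25
]
```

Here `mean_rate(subjects, 'observation')` is 0.375, whereas pooling all eight segments first would give 3/8 as well only by coincidence of equal segment totals; with unequal totals the two aggregations diverge, which is why the per-subject normalization matters.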
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.2">Assessing Data-Expectations Fit:</head><p>Participants making predictions (either by sketching or by verbalizing their expectations) usually followed with an assessment of how accurate their predictions were. We identified and coded three types of self-assessments: Accurate, Partial Match, and Mismatch. A fourth category (No Assessment) indicates no explicit assessment. Figure <ref type="figure">4</ref> depicts the average frequency of these codes across the two conditions. Participants in the Prediction condition declared a Mismatch between their expectations and the data more frequently (41.7% of queries, CI: 31.7-51.7%) compared to those in the Standard condition (27.2%, CI: 17.1-37.2%). As an example of a Mismatch, P10 used a histogram to test their knowledge of the TOEFL scores distribution. Upon inspecting the data, the participant observed that "actual scores [in the 110 range] are higher than what [they had] predicted." By contrast, participants in the Standard condition stated that their predictions were accurate 33% (CI: 22.3-43.7%) of the time, compared to only 23.5% (CI: 16-31%) in Prediction. For example, after inspecting the numbers for male and female singers, S2 stated: "as I have guessed, there are more male than female singers." Similarly, there were more Partial Matches in Standard (27.9%, CI: 18.6-37.1%) than in Prediction (21.5%, CI: 14.7-28.3%). A Partial Match indicates that at least some aspects of the prediction were realized. For example, participant S2 hypothesized that personal statement ratings have an identical effect on the chance of admission as do letters of recommendation. 
They later discovered that, while there were similarities, there were also differences in patterns that did not seem to align with their mental model: "the trend is similar, but with letter of recommendation ratings, the range is very wide compared to statement of purpose ratings. "</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.3">Hypotheses &amp; Post-Data Hypotheses:</head><p>Hypotheses occur when a participant verbalizes a clear conjecture before seeing the data. One key criterion for coding a statement as a hypothesis is the inclusion of an explanation, such as a justification for an expected correlation or a causal mechanism through which one attribute influences another. For instance, participant P8 stated while sketching their expectation in a scatter plot: "These two variables [positive mood and dance-ability] should be correlated. If you are happy, you'd want to dance to the music." On the other hand, a Post-Data Hypothesis occurs when the verbalized conjecture is stated after the participant had seen the relevant data, typically as an explanation for something that had not necessarily been expected. For instance, participant P3 stated: "I actually believed if the University rating is good then chances to get a higher CGPA are more. I did not expect this result." Figure <ref type="figure">6</ref>-top compares the rates of pre- and post-data hypotheses. On average, Hypotheses amounted to 19.7% of reactions in Prediction, compared to 14.9% (CI: 11.7-18%) in Standard. Post-Data Hypotheses also occurred more frequently in Prediction than in Standard, although the difference amounts to merely 1% of reactions (6.4%, CI: 1.9-10.8% in Prediction versus 5.2%, CI: 3.9-6.5% in Standard). The confidence interval for the former estimate is especially wide, suggesting considerable variation among participants.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.4">Goals.</head><p>At the onset of a query, participants sometimes chose to verbalize a specific goal they had in mind. Goals can be seen as 'questions' the participant sought to answer, but without the concrete expectations needed to be considered hypotheses. For example, participant S3 stated: "I want to see what genre is the most popular," before exploring the relationship between music genre and popularity. On average, 6.8% (CI: 2.8-10.9%) of reactions were coded as Goal in the Prediction condition compared to 8.4% (CI: 4.9-12%) in Standard (see Figure <ref type="figure">6</ref>-top-right).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.5">Observations.</head><p>Observations occur when a participant attempts to draw an insight while inspecting a visualization. As such, and by definition, observations occur solely after the data is revealed. As an example, participant S2 explored the number of singers by gender throughout the last decade, observing "an increase in the number of female singers from 2010 to 2017." Figure <ref type="figure">6</ref>-top shows the mean observation rate. Participants in the Standard condition made more frequent data-driven remarks, with 31.1% (CI: 27-35.2%) of their reactions coded as Observation, compared to 16.4% (CI: 13-19.8%) in Prediction.</p></div><div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.6">Reasoning, Surprises, and Belief Updates.</head><p>After inspecting the data, some participants provided rationale to substantiate or explain their conclusions. For instance, participant P9 discovered that students with research experience seem to have a higher chance of being admitted to a graduate school. The participant subsequently provided a reason for this observation, stating that "since it's a graduate school, top universities would probably expect some research experience from applicants prior to joining their program." Figure <ref type="figure">6</ref>-bottom illustrates the rates of statements coded as Reasoning. Participants in the Prediction condition were roughly twice as likely as those in Standard to provide rationale to support their discoveries (7.2%, CI: 3.5-11% versus 3.5%, CI: 2.1-5% of verbal reactions). In addition to providing rationale, participants could also update their beliefs to incorporate any new information they had uncovered. As an example, participant P6 previously expected songs with lower 'loudness' to be more popular. After observing data to the contrary, they stated that "low noise songs are not at all popular. That's the different thing which I learnt.
" We did not find a difference between the two conditions in terms of Belief Updates (4.8%, CI: 2.9-6.7% of verbal reactions in Prediction versus 4.2%, CI: 2-6.3% in Standard). On a few occasions, participants explicitly expressed surprise at the data. For instance, after finding out that songs by both male and female singers had roughly equal loudness, P9 said, "Surprisingly, they are the same." On average, 4.2% (CI: 2.1-6.4%) of reactions in the Prediction condition were coded as Surprise, compared to 2.5% (CI: 0.5-4.5%) in Standard.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Interaction Patterns</head><p>We report differences in how participants utilized the interface. We focused on indicators that can be used as proxies to gauge participants' stance (i.e., confirmatory versus exploratory). Specifically, we consider the following metrics: number of views created, frequency of brushing-and-linking, and the amount of time spent looking at or predicting the data. These events were identified and coded manually from the video recordings.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.1">Number of views:</head><p>We counted the number of charts participants created as an indicator of the breadth of their analysis. Recall that participants had the freedom to create as many charts as they needed during a particular line of analysis. Having two or more views affords an opportunity to look for multi-variate relationships, either through visual comparison alone or by brushing-and-linking. Figure <ref type="figure">7</ref>-left shows the average number of views created per query in the two conditions. On average, participants in the Prediction condition utilized 1.1 (CI: 1-1.2) views compared to 1.4 (CI: 1.2-1.5) in Standard. The latter group was thus more likely to look at multiple charts and, by extension, potentially consider a larger number of attributes in their queries.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.2">Brushing-and-linking:</head><p>A standard visualization feature that has come to be associated with exploratory analysis is brushing-and-linking <ref type="bibr">[39,</ref><ref type="bibr">52]</ref>. We measured the rate of brushing to understand how this feature might be used in a system that emphasizes belief-driven analysis. Figure <ref type="figure">7</ref>-right shows the percentage of queries in which brushing was activated at least once. Participants in Standard utilized this feature approximately five times more frequently than those in the Prediction condition (33.6%, CI: 24.6-42.6% versus 7.3%, CI: 1.1-13.4%). This may indicate a higher tendency to look for relationships across multiple views in the former. It may also reflect the fact that those in Standard were more likely to create multiple views, and hence activate the brush. Collectively, however, these two metrics (number of views and the rate of brushing) may indicate a higher propensity for data-driven exploration in the Standard condition.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.3">Analysis time:</head><p>Lastly, we measured the time spent by participants on each query. On average, participants in both conditions spent virtually equal amounts of time addressing a single query.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">DISCUSSION</head><p>The results suggest marked differences in behavior and interaction patterns across the two conditions. We discuss the emerging variations, highlighting implications for design where possible.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1">Exploration versus Confirmation</head><p>The distinction between exploratory and confirmatory analyses is important for proper inference <ref type="bibr">[10,</ref><ref type="bibr">27,</ref><ref type="bibr">47]</ref>. However, for users of interactive visualizations, it is often quite difficult to distinguish between the two styles of analysis <ref type="bibr">[35]</ref>. Eliciting prior beliefs can help both users and systems discriminate between exploratory and confirmatory activities. Extant work suggests that sketching data expectations into charts is intuitive for most users <ref type="bibr">[15,</ref><ref type="bibr">20]</ref>. However, these earlier studies were conducted under highly constrained settings and on very small datasets. A natural follow-up question is whether this kind of interaction might work in an open-ended, visual analytics context. Our study sheds light on this question. The results show that participants utilized the PredictMe feature in the majority of their queries. Specifically, 93.6% of queries in the Prediction condition came with concrete data expectations, which were externalized in the form of a graphical sketch. By comparison, in only 66.4% of queries in the Standard condition did participants verbalize their expectations prior to inspecting chart contents (recall that participants in Standard lacked the ability to sketch, but were otherwise prompted and given the opportunity to verbally state their beliefs, should they want to). Those same participants were also more likely to state their beliefs after seeing the data (14.7% of queries). By contrast, only 2.7% of expectations were verbalized post-data exposure in the Prediction condition. Design implication: Our results suggest that belief elicitation through sketching is viable in the context of visual analytics, and could perhaps become a standard feature of interactive visualization systems. 
While such interaction can be burdensome in multi-view environments, as users would need to repeatedly sketch their beliefs in multiple charts, our study suggests that analysts may still embrace this feature. The PredictMe feature could, in turn, provide a way to help people discriminate between confirmatory and exploratory queries; the latter are distinguished by charts that lack concrete expectations, or by expectations that are formed after the data is seen. It is important to note, however, that our study does not distinguish between expectations that reflected substantive beliefs and those that might represent a participant's 'best guess'. Several participants commented during the study that they were unsure about their predictions. Thus, in addition to capturing beliefs, designers may also prompt users to specify their confidence in their predictions. This information can help differentiate true hypotheses from guessing. The former could be labeled as confirmatory, with the potential to generate robust conclusions from a visualization, whereas the latter may be flagged as exploratory, requiring further confirmation by independent sources.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2">Hypothesizing versus HARKing</head><p>A related behavioral difference between the two conditions is the number of Pre- versus Post-Data Hypotheses. Recall that the former represent hypotheses a participant verbalizes before seeing the relevant data, whereas the latter reflect attempts to hypothesize after the results are known (sometimes referred to as HARKing <ref type="bibr">[19]</ref>). Participants in the Prediction condition exhibited higher rates of Pre-Data Hypotheses (19.7% of total reactions) than those in the Standard condition (14.9%). It appears that the prediction feature may have encouraged participants to frame their hypotheses prior to inspecting data, which could indicate more willingness to adopt a normative, confirmatory stance. That said, the rate of HARKing in the two conditions was quite similar, which suggests that both groups engaged in exploratory analyses, conceiving hypotheses after encountering patterns that seemed interesting. Although HARKing is often seen as problematic <ref type="bibr">[27]</ref>, it is a perfectly reasonable outcome of exploratory analysis. However, it is vital to distinguish hypotheses that are posited a priori from those that are formulated to fit observed data <ref type="bibr">[40]</ref>. To that end, giving people the opportunity to predict may serve to establish such a distinction in visual analytics.</p><p>There is also evidence that participants in the Standard condition engaged in more exploratory behavior. For instance, the rate of Observations, which correspond to post-hoc patterns interpreted while examining the data, is approximately twice as high in the Standard condition (31.1%) as in Prediction (16.4%). Similarly, there were more brushing-and-linking interactions in Standard (33.6% of queries) than in Prediction (7.3%). 
Since brushing is often classified as an exploratory activity <ref type="bibr">[37,</ref><ref type="bibr">52]</ref>, the difference may reflect a focus on EDA in Standard. We speculate that those exploratory tendencies were moderated by the PredictMe feature.</p><p>Design implication: While it is neither necessary nor desirable to restrict EDA in visual analytics, an opportunity exists to design more balanced systems that place equal emphasis on exploratory and confirmatory sensemaking. Belief elicitation could encourage participants to incorporate more confirmatory activities in their visual analysis. With proper distinction, balancing these two styles of analysis may allow for more normative visual inference. Externalizing prior beliefs could in turn reduce people's tendency to overinterpret the data. In effect, sketching one's predictions may serve a similar purpose to regularization in Bayesian inference and machine learning <ref type="bibr">[25,</ref><ref type="bibr">35]</ref>; by deliberately limiting learning to 'regular' data features that are within a well-informed prior distribution, one reduces the risk of overfitting and, potentially, the incidence of false discovery.</p><p>Interactive visualization systems can also be designed to actively facilitate proper inference. For example, systems could track user hypotheses, along with their history of data exposures. This analytic provenance can then be audited (either manually or by the system) to discriminate between hypotheses that were 'preregistered' before the results were known and those that were formed after. With this information, systems can provide feedback on the reliability of discoveries made in interactive analyses.</p></div>
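To make the auditing idea concrete, one possible sketch of such provenance tracking follows (hypothetical; the Event record, attribute-level matching, and labels are our own illustration, not a feature of PredictMe):

```python
from dataclasses import dataclass

@dataclass
class Event:
    kind: str       # "hypothesis" or "exposure"
    attribute: str  # data attribute the event concerns
    time: float     # session timestamp, in seconds

def classify_hypotheses(events):
    """Label each hypothesis 'preregistered' if it precedes the first data
    exposure for the same attribute, and 'post-hoc' (HARKing) otherwise."""
    # Record when each attribute was first revealed to the user
    first_exposure = {}
    for e in sorted(events, key=lambda e: e.time):
        if e.kind == "exposure" and e.attribute not in first_exposure:
            first_exposure[e.attribute] = e.time
    # Classify hypotheses in their original event order
    labels = []
    for e in events:
        if e.kind == "hypothesis":
            seen = first_exposure.get(e.attribute)
            labels.append("preregistered" if seen is None or e.time < seen
                          else "post-hoc")
    return labels

# Hypothetical session: the loudness hypothesis precedes its exposure,
# whereas the genre hypothesis comes after the genre data was shown
events = [Event("hypothesis", "loudness", 10.0),
          Event("exposure", "loudness", 25.0),
          Event("exposure", "genre", 30.0),
          Event("hypothesis", "genre", 40.0)]
labels = classify_hypotheses(events)
```

A real system would need a richer notion of 'relevant exposure' than exact attribute matching, but even a coarse log of this kind would suffice to separate a priori hypotheses from HARKed ones.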
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3">Reflections on Prior Beliefs</head><p>Externalizing beliefs and receiving visual feedback on the accuracy of those beliefs has been found to promote reflection in communicative visualizations <ref type="bibr">[20]</ref>. Our results suggest that those effects may generalize to visual analysis. Participants appeared to engage in this kind of reflection more frequently when given the opportunity to sketch their beliefs. Specifically, those in the Prediction condition declared that their beliefs did not match the data at a rate that is approximately 50% higher than in Standard. A possible explanation is that the former group, having created a concrete representation of their working knowledge, could more easily relate those beliefs to the data. By contrast, participants in the Standard condition were convinced that their beliefs were accurate 33% of the time, compared to only 23.5% in Prediction. Participants who predicted the data also expressed Surprise at roughly twice the rate. On the other hand, we found minimal differences in the rate of Belief Update between the two conditions. Such statements would reflect active attempts by participants to reformulate their knowledge or amend their beliefs in light of new or contradictory data.</p><p>Overall, belief elicitation may lead to more active processing of visualizations-an effect that appears to hold for communicative <ref type="bibr">[15,</ref><ref type="bibr">20]</ref> as well as analytical visualizations, as per this study. However, we saw no evidence that visualizing belief-data gaps would translate to outright conceptual change, as we had speculated based on early research in cognitive science <ref type="bibr">[7]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.4">Breadth of Analysis</head><p>While there appear to be cognitive benefits to externalizing one's beliefs in analytical visualizations, there are also potential side effects to be considered. Among those is a reduction in the number of unique queries; participants in the Prediction condition addressed 22.6 queries on average, whereas those in Standard managed 33.1. Those who externalized their priors also created fewer visualizations on average, with 1.1 charts per query in Prediction compared to 1.4 in Standard. There were also fewer brushing-and-linking events in Prediction. These interactions are often considered essential to exploratory visualization <ref type="bibr">[32,</ref><ref type="bibr">52]</ref>, with heightened exploration typically encouraged as a desirable benchmark <ref type="bibr">[24]</ref>. It seems, however, that prior elicitation may have a dampening effect on these behavioral and interaction markers. This effect could be attributable to the extra effort of drawing one's expectations in the Prediction condition or, alternatively, may reflect a deeper change to one's analysis behavior. For instance, a recent study suggests that being driven by a hypothesis may inadvertently reduce one's propensity to detect unexpected patterns in data <ref type="bibr">[51]</ref>.</p><p>On the other hand, belief externalization appears to encourage more thoughtful interaction with a visualization. For example, participants in the Prediction condition spent approximately equal amounts of time predicting (39.5 seconds on average) and looking at the data (46.3 seconds). Qualitatively, we observed participants carefully inspecting charts in the Prediction condition, paying close attention to outliers that deviated from their expectations. However, the increased focus on individual visualizations may impede wider exploration. This in turn could prevent people from noticing unexpected relationships or features. 
We find evidence of this phenomenon in the rate of Observations, which was approximately half as high in the Prediction condition as in Standard. Participants whose beliefs were elicited seemed more concerned with how their priors related to the data than with discovering new patterns they had not thought about.</p><p>Design implication: A challenge for data analysts is to maintain a degree of skepticism while being open to seeing new patterns. An exploratory stance can help surface unexpected insights but, at its extreme, may cause one to see spurious structures in random noise. A confirmatory approach, on the other hand, aids analysts in asking relevant questions and testing plausible hypotheses, but an emphasis on prior knowledge could also lead to confirmation bias. Designers of visual analytics tools have traditionally adopted a laissez-faire approach, providing analysts with maximum flexibility and leaving them free to adopt their own strategies. We suggest that designers should think about how to actively foster a balanced analytic experience through their interaction design. A potential research avenue is to create models that can infer analyst intents and accordingly provide feedback on their performance. Prior work, for instance, has proposed techniques for detecting certain cognitive biases in real-time <ref type="bibr">[49]</ref>. Similarly, it may be possible to utilize analysts' prior beliefs to classify their behaviors on an exploratory-confirmatory spectrum. With such classification, it may be possible to provide tailored feedback. For example, systems can nudge users to explore outside the purview of their existing knowledge if they seem to be following a purely confirmatory approach, and vice versa when they appear to adopt an overly aggressive exploratory strategy.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">LIMITATIONS AND FUTURE WORK</head><p>Our study provides a first look into the effects of belief elicitation in open-ended visual analysis. However, there are several limitations that should be taken into consideration when interpreting our findings. First, this study is exploratory in nature; we specifically utilized a grounded-theory approach to observe participants and quantify their emerging analytic activities. Our findings are thus primarily data-driven and, therefore, should be considered preliminary. The generalizability of these insights should be validated in future confirmatory studies. Second, although our findings suggest differences in analytic behaviors between the two experimental conditions, the effects on the discovery process are still unclear. In particular, we did not seek to evaluate the correctness of insights reported by participants. We speculate that belief elicitation, combined with appropriate feedback, can decrease the incidence of false discovery in visual analytics. However, this and other hypothesized effects with respect to inference should be evaluated in future studies. Third, our subjects were limited by the features available in PredictMe. For instance, the prototype did not allow participants to predict conditionally (e.g., by predicting for a subset of the data). Relatedly, our prototype did not enable participants to express their priors in the form of probability distributions, as is typical in normative Bayesian inference. These limitations may have affected the way participants externalized their beliefs or their willingness to use this feature. Future work is needed to improve our design and test the effects of such improvements.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7">CONCLUSION</head><p>Interactive visualization tools are almost exclusively designed for exploratory data analysis. This narrow focus on data-driven sensemaking has led to little support for hypothesis-driven (i.e., confirmatory) analyses. We introduced PredictMe, a fully functional visualization tool that incorporates belief elicitation in addition to supporting a range of traditional visualization features. We sought to understand how users behave in this kind of visual analytic environment. In an exploratory study, we compared this design to a Standard condition that mimics how existing visualizations work. Our results show noticeable differences in user behavior between the two conditions. Analysis of participants' cognitive and interaction patterns suggests that users adopt a distinct analytic style when given the opportunity to externalize and test the accuracy of their beliefs. This shift is marked by increased confirmatory behavior and decreased exploration. Our findings indicate benefits, but also suggest side effects and challenges to incorporating belief elicitation in general-purpose visual analytic tools. We discussed the implications for visualization design and proposed future research directions.</p></div>
		</text>
</TEI>
