
Daniel Reimann

Weighing the visual evidence – Experiments on the perception of data graphs

Dissertation

Psychology


Weighing the visual evidence – Experiments on the

perception of data graphs

Dissertation submitted in fulfillment of the requirements for the degree of Doctor of Natural Sciences (Dr. rer. nat.)

at the Faculty of Psychology, FernUniversität in Hagen

Submitted by Daniel Reimann

Hagen, June 9, 2020

First reviewer: Prof. Dr. Robert Gaschler
Second reviewer: Prof. Dr. Roman Liepelt
Defense: September 23, 2020


Acknowledgements

First and foremost, I would like to thank my supervisor Robert Gaschler. Thank you so much for your continued support throughout the present dissertation project. Further thanks go to my collaborators Christine Blech, Nilam Ram, and André Schulz for providing significant contributions and interesting discussions. Additionally, I thank all my colleagues for creating such a pleasant and productive working atmosphere. Finally, I am grateful to have had the support from family and friends.


Publications

All three manuscripts have been submitted to peer-reviewed journals and are currently under review.

1. Reimann, D., Blech, C., & Gaschler, R. (submitted). Visual model fit estimation in scatterplots and distribution of attention: Influence of slope and noise level

2. Reimann, D., Blech, C., Ram, N., & Gaschler, R. (submitted). Visual model fit estimation in scatterplots: Influence of amount and decentering of noise

3. Reimann, D., Schulz, A., & Gaschler, R. (submitted). Homophily at a glance – Visual homophily estimation in network graphs is robust under time constraints


Abstract

Data graphs are common in science and everyday life, providing a foundation for scientific conclusions and decision-making. Studies on the perception of graphs can improve the quality of both. Two important tasks, visual estimation of model-data fit in scatterplots and visual estimation of homophily in network graphs, have not been well studied. In Experiments 1-4, basic aspects of visual estimation of model-data fit were investigated. Experiment 1 addressed the influence of noise and slope on fit estimation and the deployment of attention. Noise had a stronger influence than slope, and most fixations fell on the center of the plot. Experiment 2 showed that the model-data fit pattern in the center of the plot carried more weight in fit estimation than the pattern at the left or right end of the plot. Experiment 3 revealed that the relationship between the statistical deviation of data points from the model line (noise) and subjective misfit followed a negatively accelerated curve. Experiment 4 indicated that, compared to noise, people's judgements are only slightly affected by a systematic deviation of the data points above or below the line. In Experiment 5, we investigated whether laypeople are able to visually assess the degree of homophily in network graphs and how time pressure affects their judgements. The results indicate that they are able to assess it and that most of the information an individual can extract is gathered within the first 5 seconds. This research on the perception of data graphs has implications for science and public life.

Keywords: perception of data graphs, model-data fit, scatterplots, homophily assessment, network graphs


Table of contents

Introduction
  Model-data fit in scatterplots
  Homophily in network graphs
Summary and Discussion
  Model-data fit in scatterplots
  Homophily in network graphs
General Discussion
  Stimulus type
  Graphs without vs with content
Conclusion
References
Manuscript 1
Manuscript 2
Manuscript 3


Introduction

Observation is considered the most pervasive and fundamental practice of all modern sciences (Daston & Lunbeck, 2011). While early scientists like Galileo observed the world directly, for example by using a telescope to study objects in the universe (Brewer, 2012), current researchers often perceive their object of study indirectly through data graphs. Evidence from science studies (Smith, Best, Stubbs, Archibald, & Roberson-Nay, 2002) suggests that the frequent use of graphs serves as a key indicator of scientificity. Accordingly, disciplines that are perceived as harder (i.e., more quantitatively and more natural-science oriented) devote more space in their journal articles to graphs and less space to tables of numerical information. The idea of replacing conventional tables of numbers with graphical methods dates back to William Playfair's invention of common graphs such as the bar graph, line graph, and pie chart (Tufte, 1983). Edward Tufte, one of the most influential writers in the field of data visualization, argued that well-designed graphs are the most powerful method for analyzing and communicating statistical information. According to Ware (2020), a major benefit of data visualization is that, provided the data are presented well, large amounts of data can be quickly interpreted by the viewer. A further advantage of graphs, mentioned by the same author, is the possibility of perceiving unexpected patterns in the data. This approach is closely related to the field of exploratory data analysis in the vein of John Tukey (1977).

The beneficial properties of graphs can be derived from insights of multimedia research on the processing of texts and pictures (e.g., Schnotz & Bannert, 2003; Zhao, Schnotz, Wagner, & Gaschler, 2020). Whereas texts are descriptive representations that consist of symbols describing an object (e.g., a definition of furniture), pictures are depictive representations that usually consist of iconic signs (e.g., a picture of a chair). These two forms of representation have different functions and strengths: Descriptions better support the construction of mental models and have higher representational power because they can be more abstract. Depictions, on the other hand, have higher inferential power because they allow information to be read off more easily. Schnotz and Lowe (2008) noted that in data graphs, which are also called "logical pictures", the direct similarity to the object has been abandoned in favor of extracting abstract information that cannot be observed directly. In a recent publication (Schnotz & Wagner, 2018), the definition of depictions has been extended to signs that are associated with their referent by similarity (e.g., realistic pictures) or by more abstract structural commonality (e.g., graphs). Based on these considerations, data graphs appear to utilize representational and inferential power simultaneously.

The study by Smith et al. (2002) suggests that the perception of graphs is crucial on the one hand for the researchers who analyze the data, and on the other hand for the consumers of the graphs, such as other researchers, science journalists, and policy makers. However, data graphs are not only used in scientific journals. Visual perception of data has long since become an integral part of everyday life, and the trend toward more visual information is accelerating (Ware, 2020). The greatest consumption of graphs might occur when empirical results are presented to the public on television, in newspapers, and on social media. Examples include the results of political elections, public reports on climate data, and epidemiological reports.

In line with the central assumption of Gibson's theory of affordances (1977) that visual perception of the environment leads to action, data graphs often form a foundation for decisions across diverse fields of application. In science, such decisions might relate to theoretical conclusions about a particular subject. In practical cases, the consequences of graphical inferences can be very severe, such as deciding whether to evacuate a town before a hurricane strikes (Padilla, Creem-Regehr, Hegarty, & Stefanucci, 2018). In this light, accurate and fast perception of data graphs is crucial. Studies on the perception of graphs are important for avoiding false conclusions and poor decisions, since judgements by the visual system can sometimes be inaccurate and systematically biased (Godau, Vogelgesang, & Gaschler, 2016). Research on the perception of graphs can inform about the ability of individuals to read graphs (graph literacy; Galesic & Garcia-Retamero, 2011) and can result in guidelines for graph design (e.g., Doherty & Anderson, 2009; Wilkinson, 2012).

An important distinction needs to be made between the type of graph (e.g., bar graph) and the task (e.g., comparing two values). A single graph can work well for one task but poorly for another. For example, Wilkinson (2012) pointed out that bar graphs are convenient for comparing two values (comparative judgements) but less useful for proportion-of-whole statements (absolute judgements). While a perceptual bias might make a specific graph format unsuitable for one task, it might be irrelevant for a different task (cf. Ali & Peebles, 2013; Godau et al., 2016). Hence, empirical research is needed to determine which data graph format suits which purpose. This endeavor is pursued not only in psychology, but also in other fields such as statistics and computer science (Wilkinson, 2012). One practical reason for this might be the question of the best choice of design properties and default settings for graphical output in statistical software. Importantly, considerations regarding the best design are not always based only on studies of data graphs in particular, but also on the broad literature on human perception in general. Ware (2020) noted that several hundred papers are published every month and that a high number of these articles have some relevance for data graphs.

A very useful contribution of the computer science community is the development of graph-specific task taxonomies (e.g., Lee, Plaisant, Parr, Fekete, & Henry, 2006, for network graphs; Sarikaya & Gleicher, 2018, for scatterplots). Such lists of frequently conducted tasks provide a good overview of how people interact with a graph. However, often little is known about how an individual performs a task. Questions of this kind are particularly well suited for perceptual psychology. Some studies in that field revealed systematic task-specific biases in data graphs. For instance, Godau et al. (2016) showed that individuals tend to underestimate the mean across depicted values in bar charts.

Despite the vast amount of research on the perception of data graphs across different fields, two tasks relating to different types of data graphs have been unjustifiably neglected in prior studies: (1) visual assessment of model-data fit in scatterplots and (2) visual assessment of homophily in network graphs. Figure 1 shows an example graph for each task. Fit estimation requires the viewer to evaluate how well the data points fit the specified model. Homophily, the tendency of individuals to preferentially connect with similar others (McPherson, Smith-Lovin, & Cook, 2001), can be detected by comparing the number of links between clusters to the number of links within clusters (Meulemans & Schulz, 2015). Neither task is explicitly listed in the abovementioned task taxonomies, but both can be, as we shall see, relevant for experts and laypeople.

Visual model-data fit estimation and homophily assessment form the core of the present work. In sum, five experiments are reported. Manuscript 1 and Manuscript 2 each include two experiments on fit estimation. Manuscript 3 addresses the task of homophily assessment. Table 1 provides a brief overview of all experiments. In the following sections, I will highlight the relevance of each task and the rationale for the research program. Afterwards, the main findings of the experiments will be discussed along with their theoretical implications. A general discussion will follow, including practical implications, the limitations of the conducted studies, and an outline for future research.


Figure 1. Example graphs for visual model-data fit estimation in scatterplots (left) and visual assessment of homophily in network graphs (right).

Table 1

Overview of the five experiments

Manuscript & Experiment     | Data graph        | Task                 | Independent variables                     | Dependent variables
Manuscript 1, Experiment 1  | Scatterplot       | Model-data fit       | Noise, slope of regression line           | Estimated fit, deployment of attention
Manuscript 1, Experiment 2  | Scatterplot       | Model-data fit       | Noise, location of highest deviation      | Estimated fit
Manuscript 2, Experiment 1  | Scatterplot       | Model-data fit       | Noise                                     | Estimated fit
Manuscript 2, Experiment 2  | Scatterplot       | Model-data fit       | Amount of noise, decentering of noise     | Estimated fit
Manuscript 3, Experiment 5  | Node-link diagram | Homophily assessment | Duration of stimulus presentation, layout | Accuracy of judgements

Note. The duration of stimulus presentation was varied between subjects. All other independent variables were within-subject factors.


Model-data fit in scatterplots

Scatterplots have been called the most useful invention in the history of data graphs (Friendly & Denis, 2005). The variety of possible tasks and designs was described by Sarikaya and Gleicher (2018). In their task taxonomy for scatterplots, the authors divide 12 tasks into three categories: (1) object-centric (for example, identifying an object), (2) browsing (for example, finding a cluster), and (3) aggregate-level (for example, determining the level of correlation). Besides observed data, scatterplots can also include predictions from a model (usually in the form of a line) and thus allow for visual comparison between data and model. Visual correlation estimation (Doherty & Anderson, 2009) and determining the trend line (Collyer, Stanley, & Bowater, 1990) in scatterplots are related tasks. However, in many cases it is relevant to determine how well an already specified model (for instance, one resulting from a quantitative theory) fits new data (Schunn & Wallach, 2005).

Visually assessing the fit between model and data, in addition to applying statistical procedures, is a task highly recommended in the statistical literature. Wickham, Cook, and Hoffman (2015) pointed out that visual model descriptions help individuals understand how a model summarizes data. They help to assess how good the model-data fit is and whether the fit is uniformly good or varies across specific areas. The need for graphical fit inspection has been communicated to diverse professional disciplines. For example, the Engineering Statistics Handbook by the American National Institute of Standards and Technology (Guthrie, 2020) states that graphical methods for model validation have an advantage over numerical methods because they readily show a broad range of complex aspects of the relationship between model and data. Numerical methods, in contrast, often tend to be narrowly focused on a particular aspect of that relationship. A further benefit of graphical presentations of fit is that they do not require much statistical knowledge and can therefore be easier for laypeople to understand.

While many sources show the relevance of the task, less is known about how individuals perform it. A few psychologists have addressed the issue of the perception of model-data fit. Brewer (2012) discussed the role of theory in perception on the basis of examples from the history of science. In the selected cases, researchers directly observed nature and were influenced by existing theories. For example, Galileo mistakenly saw the rings of Saturn as moons because of the assumption that planets have moons. Yet Brewer pointed out that these examples are not representative of how researchers look at nature nowadays, as this is mainly done through the lens of data graphs. The author concluded that, because current science mainly relies on data, we should now focus on the role of theory when perceiving and evaluating data. Adding the theoretical prediction to a plot is one way of taking a theory into account. More explicitly, Schunn and Wallach (2005) illustrated different visual goodness-of-fit displays in the context of cognitive simulation models that are used to explain human behavior. Overlaying model and data within one graph, as opposed to separate graphs, seems to enhance the accuracy of point predictions. However, little is mentioned about the specific process of visual assessment, and empirical studies are absent. This lack of knowledge stands in contrast to the broad knowledge on statistical procedures for modeling, which has been the subject of entire books (e.g., Burnham & Anderson, 2002; Hastie, Tibshirani, & Friedman, 2009; Kaplan, 2009). Even for a simple linear model, several numerical measures exist, such as R², representing the variance explained by the model, and RMSE, the root mean square error (Schunn & Wallach, 2005).
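The two indices can be computed directly from observed and predicted values. A minimal Python sketch (illustrative only, not code from the manuscripts):

```python
import numpy as np

def fit_indices(y_obs, y_pred):
    """Return (r_squared, rmse) for model predictions against observed data."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_obs - y_pred
    rmse = float(np.sqrt(np.mean(residuals ** 2)))       # root mean square error
    ss_res = float(np.sum(residuals ** 2))               # residual sum of squares
    ss_tot = float(np.sum((y_obs - y_obs.mean()) ** 2))  # total sum of squares
    r_squared = 1.0 - ss_res / ss_tot                    # proportion of variance explained
    return r_squared, rmse

# A perfect fit yields R² = 1 and RMSE = 0
r2, rmse = fit_indices([1, 2, 3, 4], [1, 2, 3, 4])
```

Note that R² depends on the total variance of the observed data, whereas RMSE depends only on the residuals; this difference matters for the slope manipulation in Experiment 1.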

If visual methods are seen as important additions to statistical methods (Wickham et al., 2015), we should know how well, and how, these tools work. Furthermore, the study by Smith et al. (2002) indicates that visual fit estimations are also important for the readers of scientific articles and thus for scientific practice and communication. In fact, scatterplots including a representation of a theory have been displayed in several articles (e.g., Evans, Brown, Mewhort, & Heathcote, 2018; Gallistel, Fairhurst, & Balsam, 2004; Lee & Anderson, 2001; Logan, 1988, 1992; Palmeri, 1997, 1999; Rickard, 1997, 2004). Therefore, it is likely that readers base their conclusions not only on the numerical information in the text, but also on the information displayed in the graphs.

With this in mind, the aim of the presented experiments was to gain knowledge about visual fit estimation. Since little is known, the research program approached the topic from the fundamentals, ranging from perception and attention to elements of psychophysics. Taking into account the recommended principle of simplicity in theory building (Occam's razor; e.g., Chater & Vitányi, 2003), we focused on a simple linear model. This type has served as a standard example in the literature on model-data fit (e.g., Roberts & Pashler, 2000; Schunn & Wallach, 2005). The scatterplot in Figure 1 (left side) illustrates such a typical graph with a model line, as used in our experiments. Noise was manipulated by varying the vertical distance of the data points to the model line. On each trial, participants saw one scatterplot and had to rate how well the data points fitted the line on a scale from 0 (very bad) to 100 (very good). Across the experiments, there were 30 to 42 stimuli per person.
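The stimulus construction described above can be sketched as follows; the function name and all parameter values are illustrative assumptions, not those of the actual experiments:

```python
import numpy as np

def make_scatterplot_stimulus(n_points=30, slope=1.0, intercept=0.0,
                              noise_sd=0.5, seed=None):
    """Generate x/y data whose vertical distance to the model line
    y = intercept + slope * x is controlled by noise_sd."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0, 10, n_points)
    y_model = intercept + slope * x
    # Vertical (not perpendicular) noise around the model line
    y = y_model + rng.normal(0, noise_sd, n_points)
    return x, y

x, y = make_scatterplot_stimulus(noise_sd=1.0, seed=1)
```

Setting noise_sd to zero places all points exactly on the line; increasing it raises the statistical misfit that participants rate.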


In Experiment 1, the focus was on two fundamental and apparent features in visual displays of model and data: We varied noise and slope and investigated how strongly people weigh each in fit estimation. Furthermore, eye tracking was used to capture the deployment of attention. The experiment can also provide information about the relationship of visual estimations to frequently used statistical indices. For example, if visual estimations are not affected by slope, the judgements would lean toward RMSE, because this index takes only the differences between observed and predicted values into account. A stronger influence of slope, on the other hand, would favor R².
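The divergence between the two indices can be made concrete: adding identical vertical noise to model lines of different slope leaves RMSE unchanged but yields a higher R² for the steeper line. A small illustrative sketch (parameter values are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
noise = rng.normal(0, 1, size=x.size)  # identical vertical noise for both stimuli

def indices(slope):
    y_pred = slope * x          # model line
    y_obs = y_pred + noise      # data = model prediction + vertical noise
    rmse = np.sqrt(np.mean((y_obs - y_pred) ** 2))
    r2 = 1 - np.sum((y_obs - y_pred) ** 2) / np.sum((y_obs - y_obs.mean()) ** 2)
    return rmse, r2

rmse_shallow, r2_shallow = indices(slope=0.2)
rmse_steep, r2_steep = indices(slope=2.0)
# rmse_shallow equals rmse_steep, but r2_steep exceeds r2_shallow
```

If participants' ratings track R², steeper plots should receive higher fit ratings at equal noise; if they track RMSE, slope should not matter.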

The stimuli in Experiment 1 had a homoscedastic fit pattern, i.e., the model-data fit at the beginning of the line was equally good as at its end. One main finding was that people tended to look at the center of the plot. In Experiment 2, we tested whether this center bias leads to a higher weighting of the pattern in the center when the patterns of different areas vary. We therefore compared fit estimations for scatterplots that had more noise on the left side, in the center, or on the right side of the plot. In general, the experiment takes into account the common situation that a model's fit can vary across specific areas (Wickham et al., 2015).

Experiment 3 addressed a basic topic in psychophysics, the relationship between stimulus intensity and its perception (e.g., Fechner, 1860), with respect to data graphs containing a graphical representation of model and data. We quantified the relationship between the deviation of data points from the model (noise) and the perceived misfit between data and model. While for various statistical fit indices the function by which stronger deviance is translated into stronger misfit is known, and in part a centrally debated characteristic (e.g., Burnham & Anderson, 2002; Hastie et al., 2009; Kaplan, 2009), the cost of misfit in visual fit estimation has not been investigated in depth. Psychophysics suggests that this cost function should be logarithmic. The experiment was set up to test this assumption on a qualitative level and to determine the specific parameters. The latter allows one to gauge how strongly the visual cost function differs from a linear relationship between (a) the distance between data and theory line within the graph and (b) estimated (mis)fit.
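Under the logarithmic assumption, perceived misfit grows as misfit = a + b·log(noise), and the parameters can be recovered with a simple least-squares fit in log(noise). An illustrative sketch with simulated ratings (all values hypothetical, not the experimental data):

```python
import numpy as np

# Hypothetical noise levels and misfit ratings that follow a logarithmic law
noise_levels = np.array([0.25, 0.5, 1.0, 2.0, 4.0])
a_true, b_true = 50.0, 20.0
ratings = a_true + b_true * np.log(noise_levels)

# A least-squares line in log(noise) recovers intercept a and slope b;
# b quantifies how quickly perceived misfit saturates as noise grows.
b_hat, a_hat = np.polyfit(np.log(noise_levels), ratings, deg=1)
```

Comparing such a logarithmic fit against a linear one on the same ratings is one way to quantify how far the visual cost function departs from linearity.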

Schunn and Wallach (2005) mentioned that visual displays of fit are also useful for diagnosing systematic biases in model predictions. In quantitative research, models with larger but symmetrical noise can be superior to models with lower but decentered noise. Systematic over- or underestimation by a model has been an important issue in diverse contexts. For example, regarding the question of the best function for the law of practice (Heathcote, Brown, & Mewhort, 2000), a systematic underestimation of the asymptote of power functions was found. Experiment 4 provides information on how people perceive plots in which the model line leads to systematic over- or underestimation. Besides the level of noise, we also manipulated its symmetry around the line (i.e., data points were shifted above or below the line).

While Experiments 1-4 dealt with how people compare graphically presented data with graphically presented theory in scatterplots, Experiment 5 introduces a different form of weighing the visual evidence. In this experiment, participants visually estimated the extent of evidence for a specific property in (graphically represented) network data: homophily.

Homophily in network graphs

In social network analysis, homophily refers to the tendency of individuals to connect with similar people (McPherson et al., 2001). Node-link diagrams can be used to visualize a network of people (nodes) and their connections (links). When nodes belong to different clusters (for example, teams), the degree of homophily can be determined by comparing the number of links between nodes of the same cluster (same-cluster links) with the number of links between nodes of different clusters (cross-cluster links; Meulemans & Schulz, 2015).
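The degree of homophily defined this way can be computed as the share of same-cluster links among all links; a minimal sketch (the edge-list representation is an assumption for illustration):

```python
def homophily(edges, cluster):
    """Proportion of same-cluster links among all links
    (0 = only cross-cluster links, 1 = only same-cluster links)."""
    same = sum(1 for a, b in edges if cluster[a] == cluster[b])
    return same / len(edges)

# Five nodes in two clusters: three same-cluster links, one cross-cluster link
edges = [(1, 2), (2, 3), (4, 5), (3, 4)]
cluster = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B"}
degree = homophily(edges, cluster)  # 0.75
```

This ratio corresponds to the 0% to 100% response scale used in Experiment 5.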

The task of homophily assessment can be derived from task taxonomies for network graphs. Lee et al. (2006) list four categories in their taxonomy: topology-based tasks, attribute-based tasks, browsing tasks, and overview tasks. Topology-based tasks include aspects of connectivity such as identifying clusters and connected components. The overview approach is suited for attaining estimated values quickly and can be used to perform some topology tasks as well. Consequently, the authors' task taxonomy not only names features relevant for homophily assessment (comparing degrees of connectivity), but also implies the possibility of different strategies for homophily assessment. While the topology-based tasks likely involve controlled information processing (e.g., Schneider & Shiffrin, 1977), the overview strategy might predominantly rely on automatic/heuristic processing.

Estimating homophily, or more generally the degree of connectivity in networks, is an important task in many fields. For example, in the context of disease transmission, such information can help to make decisions about community-specific interventions (isolation etc.). Practical examples like these raise two questions. First, are laypeople able to visually assess the degree of homophily in network visualizations? So far, previous research has only tested people with a background in mathematics or computer science (Meulemans & Schulz, 2015). Second, how do time constraints affect judgements of homophily? Often, decisions need to be made quickly, yet little is known about how observers extract the relevant information under time pressure. The aim of Experiment 5 was to answer these questions. Figure 1 illustrates an example node-link diagram from the experiment. On each trial, participants saw one diagram for either 5, 10, or 15 seconds. The task was to rate the degree of homophily on a scale from 0% (only cross-cluster links) to 100% (only same-cluster links). Each participant saw 40 stimuli.

* Unpublished eye-tracking data could not support this assumption: the angle of saccades was not affected by the slope of the line. Potentially, the line does not have to be fixated in order to be compared to the data points.

Summary and Discussion

In the following sections, the main findings of the experiments in relation to their theoretical context are described. The first section is about model-data fit (Manuscript 1 and Manuscript 2) and the second section includes the work on homophily (Manuscript 3).

Model-data fit in scatterplots

The four experiments on visual fit estimation in scatterplots revealed various insights. As expected, Experiment 1 showed that fit estimations between the model line and the data points were higher for plots with lower noise. Additionally, fit estimates were higher for plots with a steeper slope. An analysis at the individual level revealed that each participant rated the steepest slope higher than the shallowest slope. The influence of slope regardless of noise indicates that people's estimations do not simply follow the statistical RMSE. Instead, estimations tended toward the explained variance. It is possible that participants used the shortest (perpendicular) distance between line and data points*. Studies in which people had to determine the trend line by eye showed that the constructed lines minimized the perpendicular rather than the vertical distance (Collyer et al., 1990; Mosteller, Siegel, Trapido, & Youtz, 1981).
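The two distance criteria differ only by a slope-dependent factor: for a line y = a + bx, the perpendicular distance equals the vertical distance divided by √(1 + b²), so the criteria diverge as the slope grows. A brief illustrative sketch:

```python
import math

def vertical_distance(x, y, a, b):
    """Vertical residual |y - (a + b*x)|, the quantity entering RMSE."""
    return abs(y - (a + b * x))

def perpendicular_distance(x, y, a, b):
    """Shortest (orthogonal) distance from the point (x, y) to the line y = a + b*x."""
    return abs(y - (a + b * x)) / math.sqrt(1 + b * b)

# For a steep line (b = 2) the perpendicular distance is clearly smaller
v = vertical_distance(1.0, 5.0, 0.0, 2.0)       # 3.0
p = perpendicular_distance(1.0, 5.0, 0.0, 2.0)  # 3.0 / sqrt(5)
```

For a flat line (b = 0) the two distances coincide, which is consistent with slope mattering only when it is clearly non-zero.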

The eye-tracking data of Experiment 1 revealed that most fixations fell on the center of the scatterplot and that this pattern was stronger for plots with shallower slopes. Attention to the center (center bias) is a well-known effect in the literature on scene viewing (Bindemann, 2010; Tatler, 2007; Renswoude, Berg, Raijmakers, & Visser, 2019). A common explanation for this effect was that it is simply a consequence of salient image features being placed in the center of an image. This explanation was ruled out by Tatler (2007), who showed that the effect remains even when the locations of objects of interest are varied. Instead, the center bias is better explained as an optimal viewing position from which to effectively explore the environment (Tatler, 2007; Renswoude et al., 2019).


Experiment 2 suggested that the center bias leads to a higher weighting of the model fit pattern in the center. When the highest noise was placed in the center area, as opposed to the outer areas, estimations of fit were lower. Thus, different areas of a scatterplot seem to have a different impact on subjective fit estimations. This insight contributes to the claim by Tatler (2007) that future research should focus on the implications of the center bias. An interesting perspective on the effect of this bias on fit estimations can be derived from a view on perception by O'Regan (2011). In his conception, seeing is active "sampling" of and interaction with the environment. Accordingly, similar to touching parts of an object with the fingertip, a viewer only effectively sees the parts of a scene that are actively "manipulated" with the eyes. Applying this idea to Experiment 2, participants may have predominantly manipulated the central area of the plots. The fit pattern in this part, however, was not representative of the whole plot and could thus lead to a bias. By analogy to the statistical concept of a sampling bias, where the properties of the selected sample are not representative of the population, one could speak of a visual sampling bias. Further implications for other types of stimuli are described in the general discussion. Incorporating findings such as the center bias from scene perception (Tatler, 2007) and conceptions of seeing (O'Regan, 2011) into the study of scatterplot perception illustrates Ware's (2020) point that many articles on perception in general have some relevance for the perception of data graphs.

Experiment 3 showed that fit estimations in scatterplots follow known regularities of psychophysics (e.g., Fechner, 1860). We found a logarithmic relationship between the statistical deviation of the data points from the model line (noise) and the perceived misfit between data points and model line. At lower noise levels, the increase in subjective deviation of the data points from the model line was three times as steep as at higher noise levels. The results have a practical implication concerning model comparisons: If individuals are more sensitive to changes in fit at lower noise levels, decisions between models with bad fits should rely less on visual fit estimations and more on statistical coefficients. This is in line with the assumption by Schunn and Wallach (2005) that the human visual system is not particularly accurate at assessing small differences in model-data fit.

Experiment 4 took up the idea by Schunn and Wallach (2005) of visually assessing fit in order to diagnose systematic biases in model predictions. The results indicated only a small influence of decentering (i.e., when the data points are above or below the line, perceived fit is reduced) but a strong influence of noise. The impact of decentering was weaker than would be desirable. In quantitative theories, models with high but symmetrically distributed noise can be superior to models with lower but asymmetric noise. The weak effect of decentering in Experiment 4 might also reflect the strength of the manipulation used: Even at the highest level of decentering, a few points were still close to the line. Placing the points further from the line, to an extent that shifts all points away from it, could have produced a stronger effect.

Overall, the results of the presented research contribute to reducing the stated imbalance between the broad statistical knowledge on model-data fit on the one side (Burnham & Anderson, 2002; Hastie et al., 2009; Kaplan, 2009) and the lack of empirical evidence on visual fit estimation on the other (Brewer, 2012; Schunn & Wallach, 2005).

Homophily in network graphs

The results of Experiment 5 indicate that laypeople are able to visually assess the degree of homophily from node-link diagrams. Given the high social relevance of network data, this result is reassuring. Importantly, most of the information a viewer can extract from the graphs is gained within the first five seconds; additional time yields only a small increase in the accuracy of the judgements. Relating this finding to the task taxonomy for network data by Lee et al. (2006), there is evidence for the use of the overview strategy. Lee et al. particularly emphasized that individuals could use this strategy of roughly estimating aspects of connectivity in situations with little time. In addition, in line with the work by Meulemans and Schulz (2015), the bipartite layout seems to lead to judgements of higher accuracy compared to the polarized layout. It seems that the Gestalt laws of perceptual organization (Wertheimer, 1923) account for this advantage. The findings on time and layout have consequences for practical situations. Using the bipartite layout rather than polarized layouts for homophily judgements is recommended, since accurate perception forms the basis for reasonable practical decisions. Additionally, since we now know that laypeople can perceive homophily in network graphs, showing such graphs to the public is a sound decision. In doing so, one should bear in mind that most of the information can be extracted within the first seconds: If many graphs need to be shown in a presentation, it is not necessary to spend several minutes on each graph.


General Discussion

In sum, the presented work contributes to a deeper understanding of how people weigh visual evidence – specifically with respect to (a) fit estimations in scatterplots and (b) homophily assessment in network graphs. The two tasks have been neglected in prior studies on the perception of data graphs. The experiments on fit estimations provide information on the tasks discussed by Wickham et al. (2015), Brewer (2012), and Schunn and Wallach (2005). The experiment on homophily estimations extends the work of Meulemans and Schulz (2015). On account of the scientific and social relevance of the investigated tasks, one theoretical implication of the experiments could be the inclusion of fit estimations and homophily assessment in future task taxonomies for scatterplots and network graphs. In the following sections, limitations and an outlook for future research will be described.

Stimulus type

In order to ensure generalization across stimuli (cf. Wells & Windschitl, 1999) in the conducted experiments, several point patterns for the scatterplots were generated, and for the network graphs, different sizes, degrees of homophily, and layouts were used. However, given the diversity of graphs in general, the variation of the stimuli used was still restricted. In the case of scatterplots, only models with straight model lines were used. Many models in quantitative theories, for example, in the domains of learning and forgetting (Evans et al., 2018; Heathcote et al., 2000; Wixted & Ebbesen, 1997), have a nonlinear shape. Furthermore, the graphs contained only a limited number of data points. In many practical cases, people see more data points and more diverse distributions. Regarding the network graphs, one limitation is that only clusters of equal size were used. Constellations of clusters with similar numbers of nodes can be relevant in many situations with rather artificial group arrangements, such as depicting the relation between different teams. For other cases, however, unequal sizes need to be considered, such as the assessment of homophily between one cluster representing a social minority and another representing the majority. Future research should therefore focus on more complex theories, more data points and different patterns, as well as network graphs with clusters of different sizes.

Regarding the center bias in the context of data graphs, it would be important to know whether this tendency also occurs in other types of graphs and under which conditions it is problematic. One crucial aspect could also be the task of the viewer. As indicated in the introduction, some task taxonomies (e.g., Sarikaya & Gleicher, 2018) distinguish between


object-centric tasks and tasks on the aggregate level. It is likely that tasks of the latter type are more sensitive to negative influences of the center bias. When information that needs to be taken into account for a judgement is spread all over the plot, the tendency to look at the center might come with a different weighing of information in the center (as Experiment 2 suggested). Future studies should therefore focus on further combinations of graphs and tasks where the center bias can lead to biased conclusions. Possible examples are judging the overall degree of uncertainty in graphs with multiple confidence intervals, the overall variance in graphs with multiple boxplots, or the average mean value across multiple bars in a bar chart.

Graphs without vs. with content

In contrast to most applications of data graphs in realistic situations in science and science communication, where people are usually aware of what the data is about, the data graphs used in the present experiments had no semantic content. Neither the variables in the scatterplots nor the nodes and links in the network graphs had a thematic meaning. This approach of leaving out the content is common in many studies on the perception of data graphs (e.g., Doherty & Anderson, 2009) and was already used in early research on graphs by Cleveland and McGill (1984). An advantage of this method is that the gained insights are not restricted to a specific topic, allowing for a more data-driven perception (e.g., Freedman & Smith, 1996).

There are plausible reasons that, despite possible content-related experiences and expectations of viewers, insights from research without content can be applied to graphs with content. For example, work on Gestalt laws (Wertheimer, 1923) indicated that even years of experience with a topic do not immunize against the principles of perceptual grouping. Gestalt laws have a significant role for data graphs (e.g., Ware, 2020) and are discussed in Manuscript 3 to explain the better performance of the bipartite layout in network graphs. Furthermore, visual illusions such as the Müller-Lyer illusion (Müller-Lyer, 1889) do not disappear with knowledge of the phenomenon.

On the other hand, there is some evidence that expectations do play a role when people are looking at data graphs. For instance, it has been demonstrated that theory-relevant axis labels in scatterplots that trigger prior knowledge and expectations bias correlation estimations (cf. Doherty & Anderson, 2009; Freedman & Smith, 1996). Hence, viewers of graphs who are convinced that a specific variable could be well predicted by the model


could therefore come to better fit judgements. Similarly, viewers of network graphs could be biased in their judgements by presumptions about the connectivity of groups.

However, whether adding content to a graph really changes perception or just changes the drawn inferences is not always clear and relates to a recent debate (cf. Firestone & Scholl, 2016) about the existence of top-down effects of cognition on perception. Whereas a large amount of research (e.g., Balcetis & Dunning, 2010; Bruner & Goodman, 1947; Phelps, Ling, & Carrasco, 2006; Proffitt, 2006) claimed that our beliefs, motivations, emotions, and bodily states determine what we directly see, Firestone and Scholl (2016) argued that the common arguments within that research are subject to a variety of pitfalls, such as not taking into account the difference between perception and judgement.

Either way, a valuable approach for future research in the context of data graphs could be to analyze how sensitive specific types of graphs and tasks are to top-down effects. Brewer (2012) suggested that the influence of top-down effects increases with weak bottom-up information. It is plausible that some object-centric tasks, such as reading off the value of a single bar in a bar chart, could be less sensitive to prior beliefs than recognizing patterns, such as assessing the degree of cluster separation in scatterplots (cf. Valdez, Ziefle, & Sedlmair, 2018). As a result, the more sensitive graphs and tasks are to top-down effects, the less they should be used as a foundation for decision-making.

Conclusion

Overall, the presented research can be a first step toward filling the gap in knowledge on two important tasks in the perception of data graphs: visual model-data fit estimation in scatterplots and visual assessment of homophily in network graphs. Observation is considered a fundamental practice of all modern sciences (Daston & Lunbeck, 2011), and current researchers make observations indirectly through data graphs (Smith et al., 2002; Brewer, 2012). Hence, studies on the perception of data graphs like the present work have the potential to improve a fundamental practice of current science and its consumers: weighing the visual evidence.


References

Note: * indicates references that were cited only in the synopsis text

*Ali, N., & Peebles, D. (2013). The effect of gestalt laws of perceptual organization on the comprehension of three-variable bar and line graphs. Human Factors, 55(1), 183–203. https://doi.org/10.1177/0018720812452592

Anderson, N. D., & Gleddie, C. (2013). Comparing sensitivity to facial asymmetry and facial identity. I-Perception, 4(6). https://doi.org/10.1068/i0604

*Balcetis, E., & Dunning, D. (2010). Wishful seeing: More desired objects are seen as closer. Psychological Science, 21(1), 147–152. https://doi.org/10.1177/0956797609356283

Bergstrom, C. T., & West, J. D. (2018). Why scatter plots suggest causality, and what we can do about it. ArXiv, abs/1809.09328. https://arxiv.org/abs/1809.09328

*Bindemann, M. (2010). Scene and screen center bias early eye movements in scene viewing. Vision Research, 50(23), 2577–2587. https://doi.org/10.1016/j.visres.2010.08.016

Bobko, P., & Karren, R. (1979). The perception of Pearson product moment correlations from bivariate scatterplots. Personnel Psychology, 32(2), 313–325. https://doi.org/10.1111/j.1744-6570.1979.tb02137.x

Bogen, J., & Woodward, J. (1992). Observations, theories and the evolution of the human spirit. Philosophy of Science, 59, 590–611. https://doi.org/10.1086/289697

Brewer, W. F. (2012). The theory ladenness of the mental processes used in the scientific enterprise: Evidence from cognitive psychology and the history of science. In R. W. Proctor & E. J. Capaldi (Eds.), Psychology of science: Implicit and explicit processes (pp. 289–334). New York, NY: Oxford University Press. https://doi.org/10.1093/acprof:oso/9780199753628.003.0013

*Bruner, J. S., & Goodman, C. C. (1947). Value and need as organizing factors in perception.

The Journal of Abnormal and Social Psychology, 42(1), 33–44.

https://doi.org/10.1037/h0058484

*Burnham, K. P., & Anderson, D. R. (2002). Model selection and multimodel inference (2nd ed.). New York: Springer.


Centola, D. M. (2013). Homophily, networks, and critical mass: Solving the start-up problem in large group collective action. Rationality and Society, 25(1), 3–40.

https://doi.org/10.1177/1043463112473734

Cleveland, W. S., Diaconis, P., & McGill, R. (1982). Variables on scatterplots look more highly correlated when the scales are increased. Science, 216, 1138–1141.

https://doi.org/10.1126/science.216.4550.1138

Cleveland, W. (1984). Graphs in scientific publications. The American Statistician, 38(4), 261–269. https://doi.org/10.2307/2683400

*Cleveland, W., & McGill, R. (1984). Graphical perception: Theory, experimentation, and application to the development of graphical methods. Journal of the American Statistical Association, 79(387), 531–554. https://doi.org/10.2307/2288400

*Chater, N., & Vitányi, P. (2003). Simplicity: A unifying principle in cognitive science? Trends in Cognitive Sciences, 7(1), 19–22. https://doi.org/10.1016/S1364-6613(02)00005-0

Cohen, J. (1988). Statistical power analysis for the behavioral sciences. New York: Routledge. https://doi.org/10.4324/9780203771587

*Collyer, C. E., Stanley, K. A., & Bowater, C. (1990). Psychology of the scientist: LXIII. Perceiving scattergrams: Is visual line fitting related to estimation of the correlation coefficient? Perceptual and Motor Skills, 71(2), 371–378. https://doi.org/10.2466/PMS.71.5.371-378

*Daston, L., & Lunbeck, E. (2011). Histories of scientific observation. Chicago, IL: University of Chicago Press.

Doherty, M. E., Anderson, R. B., Angott, A. M., & Klopfer, D. S. (2007). The perception of scatterplots. Perception & Psychophysics, 69(7), 1261–1272.

https://doi.org/10.3758/BF03193961

Doherty, M. E., & Anderson, R. B. (2009). Variation in scatterplot displays. Behavior Research Methods, 41, 55–60. https://doi.org/10.3758/BRM.41.1.55

Duclos, R. (2015). The psychology of investment behavior: (De)biasing financial decision- making one graph at a time. Journal of Consumer Psychology, 25(2), 317–325.

https://doi.org/10.1016/j.jcps.2014.11.005


Evans, N. J., Brown, S. D., Mewhort, D. J. K., & Heathcote, A. (2018). Refining the law of practice. Psychological Review, 125(4), 592–605.

https://doi.org/10.1037/rev0000105

Fechner, G.T. (1860). Elemente der Psychophysik. Bd. I & II. Leipzig: Breitkopf & Härtel

*Firestone, C., & Scholl, B. (2016). Cognition does not affect perception: Evaluating the evidence for “top-down” effects. Behavioral and Brain Sciences, 39, 1–77. https://doi.org/10.1017/S0140525X15000965

*Freedman, E., & Smith, L. (1996). The role of data and theory in covariation assessment:

Implications for the theory-ladenness of observation. The Journal of Mind and Behavior, 17(4), 321–343.

Friendly, M., & Denis, D. (2005). The early origins and development of the scatterplot.

Journal of the History of the Behavioral Sciences, 41, 103–130.

https://doi.org/10.1002/jhbs.20078

*Galesic, M., & Garcia-Retamero, R. (2011). Graph literacy: A cross-cultural comparison.

Medical Decision Making, 31(3), 444–457.

https://doi.org/10.1177/0272989X10373805

Gallistel, C. R., Fairhurst, S., & Balsam, P. (2004). The learning curve: Implications of a quantitative analysis. Proceedings of the National Academy of Sciences of the United States of America, 101(36), 13124–13131.

https://doi.org/10.1073/pnas.0404965101

Gaschler, R., Marewski, J. N., & Frensch, P. A. (2015). Once and for all—How people change strategy to ignore irrelevant information in visual tasks. The Quarterly Journal of Experimental Psychology, 68(3), 543–567.

https://doi.org/10.1080/17470218.2014.961933

*Gibson, J. J. (1977). The theory of affordances. In R. Shaw & J. Bransford (Eds.),

Perceiving, acting, and knowing: Toward an ecological psychology (pp. 67–82).

Hillsdale, NJ: Erlbaum.

Gneezy, U., & Rustichini, A. (2000). Pay enough or don’t pay at all. Quarterly Journal of Economics, 115(3), 791–810.


Godau, C., Vogelgesang, T., & Gaschler, R. (2016). Perception of bar graphs – A biased impression? Computers in Human Behavior, 59, 67–73.

https://doi.org/10.1016/j.chb.2016.01.036

Green, L., & Mehr, D. R. (1997). What alters physicians’ decisions to admit to the coronary care unit? The Journal of Family Practice, 45(3), 219–226.

*Guthrie, W. F. (2020). Process modeling. In NIST/SEMATECH e-Handbook of Statistical Methods (chap. 4). Retrieved from https://www.itl.nist.gov/div898/handbook/pmd/pmd.htm

Haider, H., & Frensch, P. A. (1999). Information reduction during skill acquisition: The influence of task instruction. Journal of Experimental Psychology: Applied, 5(2), 129–151. https://doi.org/10.1037/1076-898X.5.2.129

Haider, H., Frensch, P. A., & Joram, D. (2005). Are strategy shifts caused by data-driven processes or by voluntary processes? Consciousness and Cognition: An

International Journal, 14(3), 495–519.

https://doi.org/10.1016/j.concog.2004.12.002

Hansen, S. M., Haider, H., Eichler, A., Godau, C., Frensch, P. A., & Gaschler, R. (2015).

Fostering formal commutativity knowledge with approximate arithmetic. PloS one, 10(11), e0142551. https://doi.org/10.1371/journal.pone.0142551

Harrison, L., Yang, F., Chang, R., & Franconeri, S. (2014). Ranking visualizations of correlation using Weber’s law. IEEE Transactions on Visualization & Computer Graphics, 20(12), 1943–1952. https://doi.org/10.1109/TVCG.2014.2346979

Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: Data mining, inference and prediction (2nd ed.). New York: Springer. https://doi.org/10.1007/978-0-387-84858-7

Heathcote, A., Brown, S., & Mewhort, D. J. K. (2000). The power law repealed: The case for an exponential law of practice. Psychonomic Bulletin & Review, 7, 185–207. https://doi.org/10.3758/BF03212979

Henry, N., Fekete, J.-D., & McGuffin, M. J. (2007). NodeTrix: A hybrid visualization of social networks. IEEE Transactions on Visualization & Computer Graphics, 13(6), 1302–1309. https://doi.org/10.1109/TVCG.2007.70582


Holten, D. (2006). Hierarchical edge bundles: Visualization of adjacency relations in hierarchical data. IEEE Transactions on Visualization & Computer Graphics, 12(5), 741–748. https://doi.org/10.1109/TVCG.2006.147

Holten, D., Isenberg, P., van Wijk, J. J., & Fekete, J. (2011). An extended evaluation of the readability of tapered, animated, and textured directed-edge representations in node-link graphs. 2011 IEEE Pacific Visualization Symposium, 195–202.

Huang, W., Eades, P., & Hong, S.-H. (2009). Measuring effectiveness of graph visualizations:

A cognitive load perspective. Information Visualization, 8(3), 139–152.

https://doi.org/10.1057/ivs.2009.10

*Kaplan, D. (2009). Statistical modeling: a fresh approach. Retrieved from http://project- mosaic-books.com/?page_id=13

Kobourov, S. G., McHedlidze, T., & Vonessen, L. (2015). Gestalt principles in graph drawing. Lecture Notes in Computer Science: Graph Drawing and Network Visualization, 558–560. https://doi.org/10.1007/978-3-319-27261-0_50

Kubovy, M., & van den Berg, M. (2008). The whole is equal to the sum of its parts: A probabilistic model of grouping by proximity and similarity in regular patterns. Psychological Review, 115(1), 131–154. https://doi.org/10.1037/0033-295X.115.1.131

Lauer, T. W., & Post, G. V. (1989). Density in scatterplots and the estimation of correlation.

Behaviour & Information Technology, 8, 235–244.

https://doi.org/10.1080/01449298908914554

*Lee, F. J., & Anderson, J. R. (2001). Does learning a complex task have to be complex?: A study in learning decomposition. Cognitive Psychology, 42(3), 267–316.

https://doi.org/10.1006/cogp.2000.0747

Lee, B., Plaisant, C., Parr, C.S., Fekete, J.-D., & Henry, N. (2006). Task taxonomy for graph visualization. BELIV, 1–56. https://doi.org/10.1145/1168149.1168168

*Logan, G. D. (1988). Toward an instance theory of automatization. Psychological Review, 95(4), 492–527. https://doi.org/10.1037/0033-295X.95.4.492

*Logan, G. D. (1992). Shapes of reaction-time distributions and shapes of learning curves: A test of the instance theory of automaticity. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18(5), 883–914. https://doi.org/10.1037/0278-7393.18.5.883

McPherson, M., Smith-Lovin, L., & Cook, J. M. (2001). Birds of a feather: Homophily in social networks. Annual Review of Sociology, 27, 415–444.

https://doi.org/10.1146/annurev.soc.27.1.415

Meulemans, W., & Schulz, A. (2015). A tale of two communities: Assessing homophily in node-link diagrams. In E. Di Giacomo & A. Lubiw (Eds.), Lecture Notes in Computer Science: Graph Drawing and Network Visualization, 9411, 489–501. Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319-27261-0_40

Meyer, J., & Shinar, D. (1992). Estimating correlations from scatterplots. Human Factors, 34, 335–349. https://doi.org/10.1177/001872089203400307

Meyer, J., Taieb, M., & Flascher, I. (1997). Correlation estimates as perceptual judgments. Journal of Experimental Psychology: Applied, 3(1), 3–20. https://doi.org/10.1037/1076-898X.3.1.3

*Mosteller, F., Siegel, A., Trapido, E., & Youtz, C. (1981). Eye fitting straight lines. The American Statistician, 35(3), 150–152. https://doi.org/10.2307/2683983

*Müller-Lyer, F. (1889). Optische Urteilstäuschungen. Archiv für Physiologie, Suppl., 263–270.

*O’Regan, J. K. (2011). Why red doesn’t sound like a bell: Understanding the feel of consciousness. Oxford: Oxford University Press.

Osborne, J. W., & Waters, E. (2002). Four assumptions of multiple regression that researchers should always test. Practical Assessment, Research & Evaluation, 8(2).

https://doi.org/10.7275/r222-hv23

Pachella, R. G. (1974). The interpretation of reaction time in information-processing research.

In B. H. Kantowitz (Ed.), Information processing: Tutorials in performance and cognition (pp. 41–82). Hillsdale, NJ: Erlbaum.

Padilla, L. M., Creem-Regehr, S. H., Hegarty, M., & Stefanucci, J. K. (2018). Decision making with visualizations: A cognitive framework across disciplines. Cognitive Research: Principles and Implications, 3, 29. https://doi.org/10.1186/s41235-018-0120-9


*Palmeri, T. J. (1997). Exemplar similarity and the development of automaticity. Journal of Experimental Psychology: Learning, Memory, and Cognition, 23(2), 324–354.

https://doi.org/10.1037/0278-7393.23.2.324

Palmeri, T. J. (1999). Theories of automaticity and the power law of practice. Journal of Experimental Psychology: Learning, Memory, and Cognition, 25(2), 543–551.

https://doi.org/10.1037/0278-7393.25.2.543

Pitt, M. A., Myung, I. J., & Zhang, S. (2002). Toward a method for selecting among computational models for cognition. Psychological Review, 109, 472–491.

https://doi.org/10.1037/0033-295x.109.3.472

*Phelps, E. A., Ling, S., & Carrasco, M. (2006). Emotion facilitates perception and potentiates the perceptual benefits of attention. Psychological Science, 17(4), 292–299. https://doi.org/10.1111/j.1467-9280.2006.01701.x

*Proffitt, D. R. (2006). Embodied perception and the economy of action. Perspectives on Psychological Science, 1(2), 110–122. https://doi.org/10.1111/j.1745-6916.2006.00008.x

Raab, M., & Gigerenzer, G. (2015). The power of simplicity: A fast-and-frugal heuristics approach to performance science. Frontiers in Psychology, 6, 1672. https://doi.org/10.3389/fpsyg.2015.01672

Rensink, R. A., & Baldridge, G. (2010). The perception of correlation in scatterplots. Computer Graphics Forum, 29(3), 1203–1210. https://doi.org/10.1111/j.1467-8659.2009.01694.x

Rensink, R. A. (2014). On the prospects for a science of visualization. In W. Huang (Ed.), Handbook of human centric visualization: Theories, methodologies, and case studies (pp. 147–175). New York: Springer.

Rensink, R. A. (2017). The nature of correlation perception in scatterplots. Psychonomic Bulletin & Review, 24, 776–797. https://doi.org/10.3758/s13423-016-1174-7

*Renswoude, D. R., Berg, L., Raijmakers, M. E. J., & Visser, I. (2019). Infants’ center bias in free viewing of real-world scenes. Vision Research, 154, 44–53.

https://doi.org/10.1016/j.visres.2018.10.003


*Rickard, T. C. (1997). Bending the power law: A CMPL theory of strategy shifts and the automatization of cognitive skills. Journal of Experimental Psychology: General, 126(3), 288–311. https://doi.org/10.1037/0096-3445.126.3.288

*Rickard, T. C. (2004). Strategy execution in cognitive skill learning: An item-level test of candidate models. Journal of Experimental Psychology: Learning, Memory, and Cognition, 30(1), 65–82. https://doi.org/10.1037/0278-7393.30.1.65

Roberts, S., & Pashler, H. (2000). How persuasive is a good fit? A comment on theory testing. Psychological Review, 107, 358–367. https://doi.org/10.1037/0033-295x.107.2.358

Rule, N. O., Ambady, N., & Hallett, K. C. (2009). Female sexual orientation is perceived accurately, rapidly, and automatically from the face and its features. Journal of Experimental Social Psychology, 45(6), 1245–1251. https://doi.org/10.1016/j.jesp.2009.07.010

Sarikaya, A., & Gleicher, M. (2018). Scatterplots: Tasks, data, and designs. IEEE Transactions on Visualization & Computer Graphics, 24(1), 402–412.

https://doi.org/10.1109/TVCG.2017.2744184

Schneider, W., & Shiffrin, R. M. (1977). Controlled and automatic human information processing: I. Detection, search, and attention. Psychological Review, 84(1), 1–66.

https://doi.org/10.1037/0033-295X.84.1.1

Schnotz, W., & Bannert, M. (2003). Construction and interference in learning from multiple representation. Learning and Instruction, 13, 141–156.

https://doi.org/10.1016/S0959-4752(02)00017-8

*Schnotz, W., & Lowe, R. (2008). A unified view of learning from animated and static graphics. In R. Lowe & W. Schnotz (Eds.), Learning with animation: Research implications for design. (pp. 304–356). Cambridge University Press.

*Schnotz, W., & Wagner, I. (2018). Construction and elaboration of mental models through strategic conjoint processing of text and pictures. Journal of Educational

Psychology, 110(6), 850–863. https://doi.org/10.1037/edu0000246

Schunn, C., & Wallach, D. (2005). Evaluating goodness-of-fit in comparison of models to data. In W. Tack (Ed.), Psychologie der Kognition: Reden und Vorträge anlässlich der Emeritierung von Werner Tack (pp. 115–154). Saarbrücken, Germany:

University of Saarland Press.


Smith, L. D., Best, L. A., Stubbs, D. A., Johnston, J., & Archibald, A. B. (2000). Scientific graphs and the hierarchy of the sciences: A Latourian survey of inscription practices. Social Studies of Science, 30, 73–94.

https://doi.org/10.1177/030631200030001003

Smith, L. D., Best, L. A., Stubbs, D. A., Archibald, A. B., & Roberson-Nay, R. (2002).

Constructing knowledge: The role of graphs and tables in hard and soft psychology.

American Psychologist, 57, 749–761. https://doi.org/10.1037/0003-066X.57.10.749

Tatler, B. W. (2007). The central fixation bias in scene viewing: Selecting an optimal viewing position independently of motor biases and image feature distributions. Journal of Vision, 7(14):4, 1–17. https://doi.org/10.1167/7.14.4

*Tufte, E. R. (1983). The visual display of quantitative information, Cheshire, Conn.:

Graphics Press.

*Tukey, J. W. (1977). Exploratory data analysis. Reading, MA: Addison-Wesley

*Valdez, A. C., Ziefle, M., & Sedlmair, M. (2018). Priming and anchoring effects in visualization. IEEE transactions on visualization and computer graphics, 24 (1), 584–594. https://doi.org/10.1109/TVCG.2017.2744138

Wagemans, J. (1997). Characteristics and models of human symmetry detection. Trends in Cognitive Sciences, 1(9), 346–352. https://doi.org/10.1016/S1364-6613(97)01105-4

Wagenmakers, E.-J. (2003). How many parameters does it take to fit an elephant? Journal of Mathematical Psychology, 47, 580–586. https://doi.org/10.1016/S0022-2496(03)00064-6

*Ware, C. (2020). Information visualization: Perception for design. Elsevier Science &

Technology.

*Wells, G. L., & Windschitl, P. D. (1999). Stimulus sampling and social psychological experimentation. Personality and Social Psychology Bulletin, 25(9), 1115–1125.

https://doi.org/10.1177/01461672992512005

Wertheimer, M. (1923). Untersuchungen zur Lehre von der Gestalt. II. Psychologische Forschung, 4, 301–350. https://doi.org/10.1007/BF00410640

Wickham, H., Cook, D., & Hofmann, H. (2015). Visualizing statistical models: removing the blindfold. Statistical Analysis and Data Mining, 8(4), 203–225.

https://doi.org/10.1002/sam.11271


*Wilkinson, L. (2012). Graphic displays of data. In H. Cooper, P. M. Camic, D. L. Long, A.

T. Panter, D. Rindskopf, & K. J. Sher (Eds.), APA handbook of research methods in psychology, Vol 3: Data analysis and research publication. (pp. 73–100). American Psychological Association. https://doi.org/10.1037/13621-004

Wixted, J. T., & Ebbesen, E. B. (1997). Genuine power curves in forgetting: A quantitative analysis of individual subject forgetting functions. Memory & Cognition, 25(5), 731–739. https://doi.org/10.3758/BF03211316

Yang, F., Harrison, L. T., Rensink, R. A., Franconeri, S. L., & Chang, R. (2019). Correlation judgment and visualization features: a comparative study. IEEE Transactions on Visualization & Computer Graphics, 25(3), 1474–1488.

https://doi.org/10.1109/TVCG.2018.2810918

*Zhao, F., Schnotz, W., Wagner, I., & Gaschler, R. (2020). Texts and pictures serve different functions in conjoint mental model construction and adaptation. Memory &

Cognition, 48(1), 69–82. https://doi.org/10.3758/s13421-019-00962-0


Manuscript 1

Reimann, D., Blech, C., & Gaschler, R. (submitted). Visual model fit estimation in scatterplots and distribution of attention: Influence of slope and noise level


Visual model fit estimation in scatterplots and distribution of attention:

Influence of slope and noise level

Daniel Reimann, Christine Blech, Robert Gaschler

Author Note

Daniel Reimann, Christine Blech, Robert Gaschler, FernUniversität in Hagen, Hagen, Germany.

Correspondence concerning this article should be addressed to Daniel Reimann,

Department of Psychology, FernUniversität in Hagen, Universitätsstraße 33, D-58097 Hagen, Germany. E-mail: daniel.reimann@fernuni-hagen.de


Abstract

Scatterplots are ubiquitous data graphs and can be used to depict how well data fit a quantitative theory. We investigated which information is used for such estimates. In Experiment 1 (N = 25), we tested the influence of slope and noise on perceived fit between a linear model and data points. Additionally, eye tracking was used to analyze the deployment of attention. Visual fit estimation might mimic one or the other statistical estimate: If participants were influenced by noise only, this would suggest that their subjective judgment was similar to RMSE. If slope were relevant, subjective estimation would mimic variance explained. While the influence of noise on estimated fit was stronger, we also found an influence of slope. As most of the fixations fell into the center of the scatterplot, in Experiment 2 (N = 51) we tested whether the location of noise affects judgment. Indeed, high noise influenced the judgment of fit more strongly if it was located in the middle of the scatterplot. Visual fit estimates seem to be driven by the center of the scatterplot and to mimic variance explained.

Keywords: fit estimations, perception, data graphs, scatterplots


Introduction

In many professional activities, such as quantitative research and engineering, people need to determine how well data points fit a theoretical prediction. For instance, in science studies (e.g., Brewer, 2012), it has been pointed out that researchers hardly observe their object of interest directly and hardly ever check their theoretical predictions against direct observations. Instead, they make observations by using data graphs for gaining an overview of relevant characteristics of the object of interest and for testing predictions. Bogen and Woodward (1992) suggested that it is data rather than perceptual beliefs that play a central evidential role in current science. Thus, while the early study of perception and psychophysics had been driven by the challenges early astronomers faced (e.g., Brewer, 2012), turning away from direct observation and engaging with processed data instead might suggest that perceptual limitations no longer limit science. However, Brewer and others argued that we now need to focus on better understanding how evidence in data graphs is perceived against the background of theories. Using graphs – as opposed to tables with numbers – allows us to harvest the computational power of the visual system to apprehend relations with little effort (e.g., Schnotz & Bannert, 2003). In some cases, however, the estimates the visual system provides are systematically biased (cf. Godau, Vogelgesang, & Gaschler, 2016).

Given the substantial share data graphs take in conducting and communicating research (cf. Smith, Best, Stubbs, Archibald, & Roberson-Nay, 2002; Smith, Best, Stubbs, Johnston, & Archibald, 2000), we need to better understand how people use data graphs to weigh scientific evidence and theories. The Smith et al. studies suggest that data graphs in part replace data being presented numerically in tables. Particularly in the natural sciences, readers of articles might to a large extent use data graphs to judge the fit between theory and data. Such a visual judgment of fit (in addition to indices) is warranted. For instance, in the literature on skill acquisition, different variants of chunking-based learning could be pinned down to the prediction of a power-law vs. a negatively accelerated exponential learning curve


(e.g., Evans, Brown, Mewhort, & Heathcote, 2018; Heathcote, Brown, & Mewhort, 2000). Exclusively considering fit indices could cause the observer to overlook important information that would otherwise be apparent in the graph: One theory might systematically overestimate or underestimate the asymptote (cf. Palmeri, 1999).

Visual model descriptions are recommended in the statistical literature to enable subjective fit estimations as adjuncts to numerical summaries (e.g., Wickham, Cook, & Hofmann, 2015). Many studies (e.g., Evans et al., 2018) use scatterplots to visually present a quantitative theory and its associated data. Scatterplots have been described as the most useful invention in the history of data graphs and have found their way into the public sphere through public media (Bergstrom & West, 2018; Friendly & Denis, 2005).

While there is a long tradition of research on fit indices quantifying how well data fit a theoretical prediction (e.g., Pitt, Myung, & Zhang, 2002; Roberts & Pashler, 2000; Wagenmakers, 2003), we know little about how fit is estimated by viewers when prediction and data are displayed visually in scatterplots. In the case of a simple linear model, first insights might be derived from studies on how individuals estimate correlations from scatterplots.

Some studies suggested that people use the shortest (perpendicular) distance between a data point and the regression line (90° angle) rather than the vertical distance (parallel to the y-axis) when estimating correlations (e.g., Meyer, Taieb, & Flascher, 1997; Yang, Harrison, Rensink, Franconeri, & Chang, 2019). This visual approach contrasts with most statistical procedures, which usually rely on the vertical distances. A range of studies (cf. Doherty & Anderson, 2009) identified properties of a scatterplot which are unrelated to the statistical correlation but influence its perceived strength. Accordingly, factors influencing the judgments include properties of the axes (scaling, theory-relevance of the labels), the point cloud (density, shape, size, and number of the points, presence of outliers), and the regression line (mere presence, slope). Several of these aspects can be manipulated through the many design choices in scatterplots (cf. Sarikaya & Gleicher, 2018).
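To make the two distance notions concrete, consider a point and a line y = m·x + b (a minimal illustrative sketch; the function names and example values are mine, not taken from the cited studies):

```python
import math

def vertical_distance(x, y, m, b):
    """Vertical distance (parallel to the y-axis) from (x, y) to y = m*x + b."""
    return abs(y - (m * x + b))

def perpendicular_distance(x, y, m, b):
    """Shortest (perpendicular) distance from (x, y) to y = m*x + b."""
    return abs(m * x - y + b) / math.sqrt(m * m + 1)

# Same point, increasingly steep lines through the origin: the vertical
# distance and the perpendicular distance diverge as the slope grows.
for m in (0.0, 1.0, 3.0):
    v = vertical_distance(2.0, 5.0, m, 0.0)
    p = perpendicular_distance(2.0, 5.0, m, 0.0)
    print(m, round(v, 3), round(p, 3))
```

For a line with slope m, the perpendicular distance equals the vertical distance scaled by 1/√(m² + 1), so the two measures coincide only for a horizontal line and drift apart as the slope steepens.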


The influence of slope on correlation estimation was investigated by Meyer and Shinar (1992) and Meyer et al. (1997). Scatterplots with shallower slopes consistently generated higher estimates. The authors explained this effect as a side effect of their method of manipulating the slope: In order to change the slope of the regression line, they changed the scales of the axes (without changing the data), which led to a lower density of the point cloud for steeper slopes. Consequently, in scatterplots with a steeper slope, the vertical distances of the data points to the line increased.
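The axis-rescaling manipulation can be made concrete in a few lines (an illustrative sketch with made-up data, not the stimuli of Meyer and Shinar): dividing all y-values by a constant flattens the drawn regression line, yet leaves Pearson's r unchanged.

```python
import statistics as st

def pearson_r(xs, ys):
    """Pearson correlation computed from population standard deviations."""
    mx, my = st.mean(xs), st.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (st.pstdev(xs) * st.pstdev(ys) * len(xs))

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.2, 1.9, 3.4, 3.8, 5.1]

r_original = pearson_r(xs, ys)
# Rescaling the y-axis amounts to plotting y/10: the drawn slope is
# 10 times shallower, but the statistical correlation is identical.
r_rescaled = pearson_r(xs, [y / 10 for y in ys])
print(round(r_original, 6) == round(r_rescaled, 6))  # prints True
```

Because the point cloud is compressed along with the line, the vertical distances shrink as well, which is the density confound the authors invoked to explain the slope effect.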

Another feature that can affect the perception of correlation is the mere presence of the regression line. Several studies (Meyer & Shinar, 1992; Meyer et al., 1997) demonstrated that its presence can lead to higher estimations of association. The regression line might serve as a perceptual center that increases perceived correlation. Due to the consistent finding that correlations are usually underestimated (Bobko & Karren, 1979; Cleveland, Diaconis, & McGill, 1982; Lauer & Post, 1989; Rensink, 2017), adding the regression line as a default setting in scatterplots is often recommended (e.g., Doherty & Anderson, 2009).

All of the mentioned studies on correlation provide valuable information about the perception of data with regard to an important numerical measure of goodness of fit (r). In many practical situations in which viewers inspect scatterplots with model lines, however, they are not instructed to view the graphs in relation to a particular fit coefficient, thus allowing for a rather intuitive grasp of fit.

This is also relevant for laypeople who may not be familiar with statistical coefficients. For example, instead of estimating the variance explained by the model relative to overall variance, one might only be interested in how much the data points deviate from the model line (noise). This would be in line with the root mean square error (RMSE) (e.g., Schunn & Wallach, 2005).
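In its standard form (the symbols are the conventional ones, not taken from Schunn and Wallach), the RMSE aggregates exactly these vertical deviations of the $n$ data points $y_i$ from the model predictions $\hat{y}_i$:

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}
```

Unlike $r$, this measure is expressed in the units of the dependent variable and does not relate the deviations to the overall variance of the data.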

Taken together, insights from correlation research may only have restricted validity for the perception of model-data fit in general. Yet, they may generate some ideas for research. The aim of the present study was to investigate if and how strongly slope and noise affect visual estimation of model-data fit in scatterplots.
