

UC Riverside
UC Riverside Electronic Theses and Dissertations

Title: Developmental Changes in Prioritization of Visual Attention to Features and Goal-Directed Manual Actions
Permalink: https://escholarship.org/uc/item/7pw2j191
Author: Kadooka, Kellan
Publication Date: 2022
Peer reviewed | Thesis/dissertation

UNIVERSITY OF CALIFORNIA RIVERSIDE

A Dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Psychology by Kellan Kadooka

September 2022

Dissertation Committee:
Dr. John M. Franchak, Chairperson
Dr. Chandra Reynolds
Dr. Rebekah Richert

Copyright by Kellan Kadooka 2022

The Dissertation of Kellan Kadooka is approved:
Committee Chairperson
University of California, Riverside

Acknowledgments

I would like to express my gratitude to my advisor, John Franchak. This would not be possible without your wisdom, support, patience, and pragmatism. Thank you.

Dedicated to my family and friends who supported me.
To my parents: Words cannot express my gratitude for your encouragement, unconditional love, and all the lessons you’ve taught me along the way.
To my friends: Thank you for pushing me to the finish line, especially Juan and Jaz.
To Koa: You are the best boy and we did it!

ABSTRACT OF THE DISSERTATION

Across three studies, this dissertation evaluates how different influences on visual attention in dynamic scenes develop over infancy and childhood.
Features of a visual scene are typically considered either bottom-up (visually salient factors) or top-down (meaningful semantic factors) influences. The first study (Chapter 2) tested whether influences on visual attention developmentally shift from bottom-up to top-down. Attention to visually salient locations and faces was measured across a wide age range and a wide set of video stimuli to operationalize bottom-up and top-down influences. Results indicate that attention does not shift from primarily bottom-up to top-down; attention to salient areas and faces was similar across ages for most stimulus videos. In considering the dynamic nature of attention, the second study (Chapter 3) measured developmental change in attention to hand and hand-object actions. By measuring attention in ways that are sensitive to movement of features in the scene over time, I found age-related increases in looking to hands and hand-object actions. Age differences suggest that attention develops as more mature observers increasingly prioritize meaningful information from moment to moment. The final study (Chapter 4) investigated the role of comprehension of actions while 4-year-olds viewed a video of a person performing a sequence of goal-directed movements. I manipulated children’s prior visual experience with a novel action to assess whether comprehension of manual action changes visual attention, hypothesizing that previous experience viewing the action sequence would bolster children’s comprehension. However, results showed no differences in attention to hand-object actions regardless of prior visual experience. Influences on visual attention are complex, but refining our perspective and methodological approach is important for characterizing the development of visual attention.

Contents

List of Figures
List of Tables

1 Introduction
  1.1 Visual attention in adults
  1.2 Visual attention in infants and children
    1.2.1 Developmental changes in attention to bottom-up and top-down features
    1.2.2 Comprehension influences attention
    1.2.3 Adult synchrony
  1.3 Development of attention towards meaning in hands

2 Developmental changes in infants’ and children’s attention to faces and salient regions vary across and within video stimuli
  2.1 Abstract
  2.2 Developmental changes in infants’ and children’s attention to faces and salient regions vary across and within video stimuli
    2.2.1 Faces and salient locations attract adults’ attention
    2.2.2 Evidence for and against a global developmental change in visual attention
  2.3 Current study
  2.4 Method
    2.4.1 Participants
    2.4.2 Stimuli
    2.4.3 Apparatus
    2.4.4 Procedure
    2.4.5 Data processing
  2.5 Results
    2.5.1 No consistent age differences in face looking or gaze saliency across stimuli
    2.5.2 Within-stimulus variability moderates age differences in visual attention
  2.6 Discussion
    2.6.1 Lack of global changes in attention
    2.6.2 Development of visual attention involves changes in prioritizing features
    2.6.3 Implications for attention development and media viewing
  2.7 Acknowledgements

3 Attention to hands during manual actions account for developmental increases in attentional synchrony
  3.1 Abstract
  3.2 Attention to hands during manual actions account for developmental increases in attentional synchrony
    3.2.1 Developmental changes in synchronization of attention
    3.2.2 Age-related changes in attention to hands and hand-object actions
    3.2.3 Current Study
  3.3 Methods
    3.3.1 Participants
    3.3.2 Stimuli
    3.3.3 Apparatus
    3.3.4 Procedure
    3.3.5 Data processing
  3.4 Results
    3.4.1 Attentional synchrony with adults increases with age
    3.4.2 Hand synchrony predicts adult synchrony
    3.4.3 Attentional hand-object synchrony predicts adult synchrony
  3.5 Discussion
    3.5.1 Limitations and future direction

4 Attentional synchrony when viewing manual actions
  4.1 Abstract
  4.2 Attentional synchrony when viewing manual actions
  4.3 Developmental increases in adult synchrony
  4.4 Comprehension drives prioritization of meaning
  4.5 Attending to and comprehending manual actions
  4.6 Manipulating comprehension and attention in manual actions through action experience
  4.7 Current Study
  4.8 Method
    4.8.1 Participants
    4.8.2 Design
    4.8.3 Procedure
    4.8.4 Data Processing and Measures
  4.9 Results
  4.10 Discussion
    4.10.1 Limitations and future direction

5 Conclusions

References

List of Figures

2.1 Changes in face looking as a function of age for all seven videos. Linear and logarithmic functions are plotted for each stimulus video.
2.2 Changes in gaze saliency as a function of age for all seven videos. Linear and logarithmic functions are plotted for each stimulus video.
2.3 Windowed analyses of age differences in face looking over the duration of all 7 videos. Age was analyzed as a continuous variable, but for illustration purposes age was averaged into three groups (infants: 6-24 months; children: 2-10 years; adults: 18-22 years). Colored vertical bars represent strength and direction of correlation between age and face looking for every window. Darker colors indicate stronger correlations. No data are plotted for the first 5 windows of Video 5 because no faces were present during that portion of the video. Insets depict examples of 3 individual windows to show a negative correlation, positive correlation, or no correlation between age and face looking with age represented as a continuous predictor.
2.4 (A) Observed distribution and (B) randomized null distribution of correlations between age and face looking for each window aggregated across videos. Vertical black lines mark the 95% range of expected correlations in the null distribution.
2.5 Windowed analyses of age differences in gaze saliency by video stimulus. Age was analyzed as a continuous variable, but for illustration purposes age was averaged into three groups (infants: 6-24 months; children: 2-10 years; adults: 18-22 years). Colored vertical bars represent strength and direction of correlation between age and gaze saliency for every window. Darker colors indicate stronger correlations.
2.6 (A) Observed distribution and (B) randomized null distribution of correlations between age and gaze saliency for each window aggregated across videos. Vertical black lines mark the 95% range of correlations in the null distribution.
3.1 Exemplar screenshots and brief description of the five stimuli videos
3.2 Exemplar visualization of times when hands and hand-object actions are visible for two agents in Video 3. Blue horizontal bars indicate when hands are visible for each hand. Green horizontal bars indicate when hand-object actions are visible for each hand.
3.3 Changes in adult synchrony as a function of age. For comprehensibility, logarithmic functions are plotted for each stimulus video
3.4 Relationship between hand synchrony and adult synchrony. Each circle represents a single participant’s observation for one video stimulus. The overall effect of hand synchrony on adult synchrony is plotted in black. For additional comprehensibility, correlations are plotted for each stimulus video
3.5 Relationship between hand-object synchrony and adult synchrony. Each circle represents a single participant’s observation of one video stimulus. The overall correlation is plotted in black.
4.1 Items Used in Target Action
4.2 Example frame from the Stimulus Video
4.3 Relationship between hand-object synchrony and adult synchrony when viewing target action video

List of Tables

2.1 Sample size (n) and smallest effect size (r) that could be detected with 80% power for each video
2.2 Regression parameters for linear and logarithmic age-related changes in face looking and gaze saliency for each stimulus video.
2.3 Generalized Estimating Equation Wald’s χ2 for effects of window, age, and age×window for each video stimulus.
3.1 Percentage of time hands and hand-object actions were visible for each video
3.2 Comparison of linear mixed-effect model predicting attentional synchrony with adults (adult synchrony) from log-transformed age and hand synchrony. Random effects of subject and video estimate standard deviation (SD) of parameters.
3.3 Linear mixed-effect model comparison predicting attentional synchrony with adults (adult synchrony) from three models. Log-transformed age, hand synchrony, and hand-object synchrony as fixed effects. Random effects of subject and video estimate standard deviation (SD) of parameters.

Chapter 1
Introduction

This dissertation examines the development of visual attention in the context of viewing dynamic visual scenes. Visual attention is a cognitive process involved in the selection of visual information from the environment (Oakes & Amso, 2018). I will begin by reviewing two broad categories of attentional influences that are defined by theories of visual attention: low-level bottom-up factors and semantically-related top-down factors. Some accounts of developmental change in visual attention suggest that there is an age-related shift from bottom-up to top-down factors, whereby young infants’ attention is primarily influenced by bottom-up factors but later in development becomes primarily influenced by top-down factors. However, alternative accounts suggest that instead of a general shift, what develops is refinement in understanding how different features change in importance from moment to moment. Studying how attention relates to meaning in scenes is a difficult methodological problem. Consequently, I discuss how the use of a particular measure, attentional synchrony, which is the convergence of eye gaze in time and space across multiple individuals, can reveal developmental changes in attention that are sensitive to the dynamic nature of meaning.
I conclude by proposing that hands and hand-object actions are meaningful features that potentially influence visual attention.

1.1 Visual attention in adults

The visual world is complex and dynamic. Yet, only a small fraction of the visual world can receive our attention at any moment. To allocate overt visual attention (that is, where our eyes are pointed), we must choose where to direct eye gaze from moment to moment to center the high-acuity region of vision (fovea) on information of interest. A central issue in theories of visual attention and this dissertation concerns how we guide our attention when viewing complex, dynamic scenes (scenes where the contents are continually changing) and the developmental progression of attention allocation within dynamic scenes. When there are multiple places to look at any moment, observers must choose to look at one location at the expense of visually attending to other locations, potentially missing important information conveyed at unattended locations. Thus, it is important to understand which features in a scene (whether bottom-up or top-down) influence where observers choose to look, because it underlies their ability to gather information from within a scene. How these influences change over development may potentially explain differences in what infants, children, and adults glean when watching dynamic scenes based on differences in visual attention.

Theories of visual attention describe two main categories of features that influence human gaze. Bottom-up features influence attention by standing out from the scene based on low-level visual properties of the stimulus, like color or motion (Borji & Itti, 2013; Itti & Koch, 2000). Top-down features receive attention due to their semantic meaning based on knowledge, experience, or goals (Yarbus, 1967; Tatler, Hayhoe, Land, & Ballard, 2011).
I will first describe these factors as they pertain to adult attention before moving on to a developmental account of attention to both types of influence.

Bottom-up features

Bottom-up features are typically defined as simple image properties contained within a visual stimulus. For instance, contrast (Parkhurst & Niebur, 2004), color (Jost, Ouerhani, Wartburg, Müri, & Hügli, 2005), and motion (Mital, Smith, Hill, & Henderson, 2011) are features that attract attention by “popping out” from the rest of the stimulus. Collectively, bottom-up features contribute to visual salience. Using biologically-inspired algorithms to compute these features from pixel-level image data, researchers can create predictive ‘maps’ that represent relative values of ‘pop out’ in the stimuli. Values of color, intensity (amount of luminance), and orientation (direction of lines and edges) allow for each pixel to be evaluated relative to the pixels in the rest of the image. Areas of the map with a greater salience value are predicted to receive more attention than areas with low salience based on bottom-up attention. Indeed, when comparing adult eye movements to salience maps of scene images, adults tend to look at locations that contain higher salience (Peters, Iyer, Itti, & Koch, 2005; Parkhurst & Niebur, 2004).

Salience maps are most predictive of attention during adults’ first few fixations when inspecting a photograph (Parkhurst & Niebur, 2003). However, dynamic scenes (videos) consist of ever-changing information, so observers must actively distribute eye gaze to attend to new information. By calculating frame-to-frame changes in the intensity and orientation within a video stimulus, salience maps can generate predictions that include values of flicker and motion, respectively. In studies using dynamic stimuli, attention is predicted by salience maps at better-than-chance levels (Franchak, Heeger, Hasson, & Adolph, 2016; T. J. Smith & Mital, 2013).
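To make the idea of a salience map concrete, the following is a minimal sketch of how intensity contrast and flicker could be combined into a single map for one video frame. It is an illustration under stated assumptions, not the algorithm of Itti and Koch (2000) or the procedure used in the studies reported here: it uses only a grayscale intensity channel, a single center-surround scale, and equal weighting of the two channels, and it omits color, orientation, and motion. The function names (`box_blur`, `normalize`, `toy_salience`) and the radius parameters are hypothetical.

```python
import numpy as np


def box_blur(img, radius):
    """Mean filter over a (2*radius+1)^2 window, using edge padding
    and a summed-area table (integral image)."""
    k = 2 * radius + 1
    h, w = img.shape
    padded = np.pad(img, radius, mode="edge")
    # Integral image with a leading row and column of zeros.
    ii = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    ii[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    # Window sum for every output pixel from four corner lookups.
    total = ii[k:k + h, k:k + w] - ii[:h, k:k + w] - ii[k:k + h, :w] + ii[:h, :w]
    return total / (k * k)


def normalize(m):
    """Rescale a map to [0, 1]; a constant map becomes all zeros."""
    rng = m.max() - m.min()
    return (m - m.min()) / rng if rng > 0 else np.zeros_like(m)


def toy_salience(prev_frame, frame, center_radius=1, surround_radius=4):
    """Toy salience map for `frame` (2-D grayscale arrays assumed).

    contrast: |fine-scale mean - coarse-scale mean| (center-surround)
    flicker:  |frame - prev_frame| (frame-to-frame intensity change)
    The two channels are normalized and averaged with equal weight,
    which is an assumption made for illustration only.
    """
    prev_frame = np.asarray(prev_frame, dtype=float)
    frame = np.asarray(frame, dtype=float)
    contrast = np.abs(box_blur(frame, center_radius) - box_blur(frame, surround_radius))
    flicker = np.abs(frame - prev_frame)
    return (normalize(contrast) + normalize(flicker)) / 2.0


# Example: a bright patch appears on an otherwise dark frame.
previous = np.zeros((32, 32))
current = previous.copy()
current[10:13, 20:23] = 1.0
salience = toy_salience(previous, current)
peak = np.unravel_index(np.argmax(salience), salience.shape)
print("most salient pixel (row, col):", peak)
```

In a sketch like this, an observer's gaze position on a given frame could be scored by reading the map value at the gaze coordinates and comparing it with values at other locations; how such scores are defined and aggregated in practice differs across studies.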
Bottom-up features are considered exogenous factors, which means they occur externally to the observer. In fact, one methodological benefit of using salience maps is that all features can be computed with no intervention from a human (Mital et al., 2011; Parkhurst & Niebur, 2004). Some researchers argue that bottom-up features are sufficient for modeling attention during free viewing (Itti & Koch, 2000; Zelinsky & Bisley, 2015). Free viewing is an experimental task in which participants are asked to simply view a stimulus with no specific instructions or objective. Modeling attention in free viewing is of particular interest as all three studies contained within this dissertation require participants to freely view stimulus videos.

However, bottom-up features may not sufficiently explain attention in all free viewing tasks because computational salience from pixel-level data cannot account for semantic information that might guide attention (Võ & Henderson, 2009). For instance, comparing semantically compatible and semantically incompatible scenes shows how prior knowledge guides looking. A semantically compatible scene could show a fork on a dinner table, while a semantically incompatible scene could show a toothbrush on a dinner table. An observer’s knowledge of the semantic mismatch of a toothbrush on a dinner table may violate expectations and draw more attention toward the incompatible object. In studies that record eye movements when viewing scenes that contain semantically compatible and incompatible information, salience is not able to account for increased attention to incompatible objects that violate observers’ expectations (Helo, van Ommen, Pannasch, Danteny-Dordoigne, & Rämä, 2017; Võ & Henderson, 2009). Such failures of a purely bottom-up approach have led others to theorize that top-down features are more influential on visual attention, which I will review in the next section.
Top-down features

Attention to top-down features is driven by semantic relevance based on the knowledge, goals, and experience of an observer. Because observers attribute meaning to these features, they are considered endogenous rather than exogenous factors. As described briefly in the prior section, semantic information plays a role in free viewing tasks (Võ & Henderson, 2009). Beyond semantic compatibility, attention during free viewing is often allocated to socially-relevant features like bodies, hands, and facial regions (Birmingham, Bischof, & Kingstone, 2008, 2009; Franchak et al., 2016; Frank, Vul, & Johnson, 2009; Foulsham, Walker, & Kingstone, 2011; Võ, Smith, Mital, & Henderson, 2012). Social features are considered top-down because they are meaningful due to the information that they convey regardless of any explicit task demands. Socially-relevant features provide information about a social agent’s intentions, actions, emotions, and communication, and may serve as a ‘default’ area of attention when free viewing (Birmingham et al., 2009).

An observer’s tasks and goals are another top-down influence on attention that attributes meaning to areas in a scene that are relevant to the observer. In eye tracking studies where participants are asked to complete screen-based tasks involving memorization or searching, visual behavior is deployed to serve the given task, for instance, looking longer at objects during memorization (Castelhano, Mack, & Henderson, 2009). In real-world tasks that require participants to perform more complex actions, attention is allocated spatially and temporally to the relevant locations of the task being completed (Ballard & Hayhoe, 2009; Land, 2009). For instance, when the task involves making a sandwich, eye movements cluster around task-relevant locations such as current and future objects that are manipulated by hands (Hayhoe, Shrivastava, Mruczek, & Pelz, 2003).
From this perspective, goals and task demands can influence visual attention towards areas that are meaningful for planning and execution of motor behaviors. But even when just watching others perform manual actions, observers tend to look at similar task-relevant areas, despite not engaging in the action, because observers recognize the goals of other agents (Flanagan & Johansson, 2003).

Collectively, features that signal semantic relevance within a scene are indicators of meaning. A relatively novel approach to predicting adult attention involves creating ‘meaning maps’ based on the semantic relevance of many smaller overlapping regions, or patches, of a static scene (Henderson & Hayes, 2018). Meaning maps are generated by crowd-sourced semantic ratings of image patches. When applied to the image, these ratings create a spatial distribution of what adults have rated as the most semantically meaningful. When those ratings are combined into a meaning map across the entire image, patches with higher meaning ratings are more likely to attract eye movements. More recently, Rehrig and colleagues (2020) have generated ‘grasping maps’ by crowd-sourcing ratings of ‘graspability’ for small patches of the scene instead of more generic ‘meaning’ ratings. These ‘grasping maps’ can accurately predict adults’ visual attention to scenes in which they have been asked to describe possible actions. Meaning and grasping maps support the notion that visual attention is guided by what observers deem to be meaningful. While meaning maps have not been directly applied to dynamic videos, it is likely that a driving force in visual attention is to select areas that convey the most meaning.

Combining bottom-up and top-down features

The prior sections indicate that both bottom-up and top-down features can predict adult visual attention. However, it is important to compare these influences and to understand how they may interact.
Firstly, that both bottom-up and top-down features can predict attention suggests that the two feature types overlap. For instance, while faces may typically be termed a top-down feature because they convey meaning, salience calculations indicate that face regions often contain high levels of visual salience (Henderson, Brockmole, Castelhano, & Mack, 2007; Torralba, Oliva, Castelhano, & Henderson, 2006; Wass & Smith, 2015). The overlap between bottom-up and top-down is exacerbated with the inclusion of dynamic channels in salience algorithms. Top-down features like objects and people are likely to move within a scene. When salience algorithms are applied to dynamic scenes, there is often overlap in what is considered salient and semantically meaningful (Mital et al., 2011). The overlap makes it difficult to delineate the individual contribution of each if bottom-up and top-down features are not mutually exclusive influences on attention.

Secondly, direct comparisons between salience maps and top-down models of adult attention typically indicate that top-down models outperform salience models. Adult attention is better predicted using models that are based on faces (Frank et al., 2009; Rider, Coutrot, Pellicano, Dakin, & Mareschal, 2018) and meaning (Henderson & Hayes, 2018). However, research by my collaborators and me beyond this dissertation indicates that salience is still important. For instance, when adults viewed a narratively incoherent vignette with scenes out of order, salience was a stronger predictor of attention compared to the attention of adults who viewed a narratively coherent video (Jing, Kadooka, Franchak, & Kirkorian, in press). Attention to salience may be a strategy to find meaning by reverting to low-level visual information when top-down features are less available. Moreover, redundancy of bottom-up and top-down factors may help viewers to allocate attention more effectively by allowing for cue combination.
Recently, we found that bottom-up salience and top-down biases towards the center of a screen, in combination, cue adults to look at faces (Franchak & Kadooka, 2022).

1.2 Visual attention in infants and children

1.2.1 Developmental changes in attention to bottom-up and top-down features

In the developmental attention literature, there is prevalent support for an age-related shift in the influences of attention from bottom-up factors to top-down factors, starting during infancy (Frank, Amso, & Johnson, 2014; Kwon, Setoodehnia, Baek, Luck, & Oakes, 2016; Frank et al., 2009; Gluckman & Johnson, 2013; Amso, Haas, & Markant, 2014; Rider et al., 2018). Theories of visual attention development suggest the shift involves improvements in voluntary control of gaze, which is the ability to intentionally orient attention (Colombo, 2001; Oakes & Amso, 2018). More specifically, young infants’ attention is viewed as obligatorily ‘stimulus-driven’, meaning that areas of high visual salience capture infants’ attention (Colombo, 2001; Stechler & Latz, 1966). Age-related improvements in voluntary control allow infants to increasingly inhibit attention to salient features, and instead voluntarily select where to look. If their attention is not captured by bottom-up features, infants can then choose to direct visual attention towards top-down features that are defined by infants’ own goals and prior knowledge (Oakes & Amso, 2018).

Importantly, if development changes whether attention is involuntarily captured by bottom-up features versus free to voluntarily attend to meaningful, top-down features, it suggests that differences in attention to features should be consistently observed regardless of the stimulus.
I refer to this idea as the ‘Global Shift Hypothesis’ in Chapter 2, where ‘global’ means that infants should consistently attend to bottom-up features regardless of the stimulus, whereas older participants should consistently attend more to top-down features regardless of the stimulus. Yet, this hypothesis has not been exhaustively tested. Although the prediction states that over development observers should increasingly attend to top-down features generally, past work testing developmental changes in top-down attention has focused on measuring attention to faces, specifically (Amso et al., 2014; Franchak et al., 2016; Frank et al., 2009; Kwon et al., 2016; Rider et al., 2018). By only considering faces as a top-down feature, development of attention to other semantically-meaningful top-down features is ignored. As discussed previously, there are also methodological issues with considering attention to faces as purely a top-down feature since faces are also visually salient (Wass & Smith, 2015). Consequently, evidence for the Global Shift Hypothesis is mixed.

Supporting the Global Shift Hypothesis, general increases in attention to faces are reported in the first year of life and beyond. Kwon and colleagues (2016) found that the attention of 4-month-olds was biased towards salient images in an image array; however, by 8 months, attention was preferentially allocated to faces even when more salient targets were available. In dynamic videos, a similar pattern has been observed in which a model based on salience is better able to predict the attention of 3-month-olds, but the attention of 6-month-olds, 9-month-olds, and adults was better predicted by a face-looking model (Frank et al., 2009). Both findings support the idea that younger infants’ attention is captured by salient areas, but older infants can inhibit looking at salient areas to voluntarily look towards more meaningful areas (e.g., faces).
Improvements in face-processing abilities in the first year of life may aid in discrimination and selection of faces (Farzin, Hou, & Norcia, 2012; Pascalis, de Haan, & Nelson, 2002). As described by Colombo (2001), infants younger than one year old undergo improvements in orienting and voluntary control of attention. Developmental changes in attention abilities may bias attention to faces by helping to select semantically-relevant features.

However, not all published studies have found consistent findings in support of a shift from bottom-up to top-down attention with age. In two separate studies, infants ranging from 3-30 months old and children from 6-8 years old exhibited age-related decreases in looking at faces for particular stimuli (Frank et al., 2009; Stoesz & Jakobson, 2014). The introduction of multiple agents in the scene moderated the influence of face looking for younger observers. Franchak et al. (2016) reported a similar effect of scene content, in which scenes with multiple agents (compared with a single agent) changed face looking behavior for adults, but not young infants. Taken together, age-related biases in attention to features do not appear consistent and may be idiosyncratic to the content of a scene. Wider testing of diverse stimuli is needed to determine the presence of developmental changes that are invariant to stimuli effects (i.e., ‘global’). I address the question of a global shift in Chapter 2.

The Global Shift Hypothesis is motivated by the theory that voluntary control of attention becomes increasingly endogenous (Colombo, 2001; Kwon et al., 2016; Oakes & Amso, 2018), but a global shift is not the only way that increasingly endogenous attention may manifest. Predicting a global shift in looking to different types of features may be an oversimplification of how attention is allocated. The results of the study in Chapter 2 suggest this is the case.
The studies in Chapters 3 and 4 explore an alternative hypothesis: Increasing voluntary control of attention over development may lead to increases in prioritizing meaningful areas. That is, meaningfulness in a scene is not solely defined by a specific type of feature at all times. Faces may convey meaning at one second, but a gesturing hand may convey more meaning in the next second. Thus, better voluntary control of attention should manifest as increases in attention to meaning, which may appear as many different types of features. An observer with mature visual attention should be able to flexibly attend to any of the features that convey meaning from moment to moment. This presents a challenge to measuring meaning, given that singular features are unable to serve as locations of expected meaning for the entire duration of a stimulus or across many different stimuli. In the following sections, I provide evidence in support of this alternative approach to developmental changes in attention and how meaning may be measured.

1.2.2 Comprehension influences attention

If meaningful features guide attention, then developmental differences in understanding semantic information should impact visual attention. For instance, an infant who does not comprehend the meaning or semantic relevance of a TV remote may view a cluttered living room differently than an adult. Here, I consider the role of comprehension as an influence on attention that might change over development.

Observers distribute their attention towards information that conveys semantic relevance or meaning. Attention to scenes that violate expectations of semantic relevance can reveal the semantic expectations of an observer. For instance, adults comprehend the patterns of regularities in our environment, which help direct their gaze (Helo et al., 2017). When presented with scenes containing irregularities (e.g.,
a bar of soap on a kitchen table), adults but not 2-year-olds were sensitive to the irregularity and looked longer at the irregular objects. Preschool-aged children watching incomprehensible TV show content exhibit shorter durations of looking and lower levels of overall attention compared to comprehensible scenes (Anderson, Lorch, Field, & Sanders, 1981). In this case, incomprehensible content is less meaningful and should receive less attention. However, the ability to modulate attention based on meaning requires, at a bare minimum, noticing differences in the first place. Infants less than a year old spend the same amount of time looking at comprehensible and incomprehensible scenes (Pempek et al., 2010). Along with increases in voluntary control of attention, developmental changes in the ability to comprehend information about a visual scene may drive observers to better attend to meaningful information.

1.2.3 Adult synchrony

In characterizing how visual attention develops, measurements of adult visual attention allow for comparisons along the expected age-related changes in attention to features that are expected to be meaningful. As an example, if a researcher expects developmental changes in attention to meaningful objects, the researcher must create, code, and calculate a measurement to index attention to objects and determine whether there are developmental changes. The difficulty of a single-feature approach is that comparisons require a different measurement for each feature and potentially new coding schemes for each stimulus. Further, concerns of validity are warranted for measurements that could be biased by a researcher’s judgements of when a feature is meaningful. Prior sections in this review describe effects of idiosyncrasies in stimulus content (Frank et al., 2009; Franchak et al., 2016) and comprehension (Helo et al., 2017; Pempek et al., 2010) on attention.
Taken together, adult visual attention is characterized by many different dynamic features, all in service of guiding attention to information that the observer determines is most meaningful from moment to moment.

In response to the methodological challenge of indexing meaning, I propose the use of adult synchrony. Adult synchrony is the spatiotemporal consistency between an observer and adults, based on a correlation of their eye movements when watching the same stimulus. By comparing an observer to adults, adult synchrony scores provide a metric of attention that captures how adult-like an observer’s eye gaze is. Correlations closer to one indicate more adult-like attention. Franchak et al. (2016) showed that age predicted adult synchrony. Younger infants (6 months) had lower adult synchrony scores, whereas toddlers (24 months) showed greater synchrony with adults. This suggests that adult synchrony can measure the progression towards more mature gaze by capturing the extent to which observers prioritize where to look in a way that is similar to adults.

Why is there an age-related increase towards greater adult synchrony? Adults may share similar concepts of what is important and meaningful in a dynamic scene. As a consequence, adult synchrony is sensitive to the spatiotemporal changes in meaning that occur in a video from moment to moment. A face may be meaningful to look at briefly, but the meaningfulness may shift toward a held object in the next moment. A model based purely on a single feature, like looking to faces, would not capture this change in meaning. A consistent finding in studies of adult eye movements is that adults tend to look towards similar locations in time when viewing dynamic stimuli (Dorr, Martinetz, Gegenfurtner, & Barth, 2010; Franchak et al., 2016; Hart et al., 2009; Mital et al., 2011; Shepherd, Steckenfinger, Hasson, & Ghazanfar, 2010; Wang, Freeman, Merriam, Hasson, & Heeger, 2012).
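To make the definition of adult synchrony concrete, the following is a minimal sketch rather than the procedure used in Franchak et al. (2016) or in this dissertation. It assumes that gaze is available as equally sampled (x, y) positions with no missing data, that the adult reference is the mean adult gaze position at each sample, and that the score is the average of the horizontal and vertical Pearson correlations. The function names `pearson` and `adult_synchrony` are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def adult_synchrony(observer_xy, adults_xy):
    """Correlate one observer's gaze with the mean adult gaze path.

    observer_xy: list of (x, y) gaze samples for the observer.
    adults_xy:   list of gaze sample lists, one per adult, each the same
                 length as observer_xy (an assumption of this sketch).
    Returns the mean of the horizontal and vertical correlations.
    """
    n_adults = len(adults_xy)
    n_samples = len(observer_xy)
    # Reference path: mean adult gaze position at each time sample.
    ref_x = [sum(adult[t][0] for adult in adults_xy) / n_adults
             for t in range(n_samples)]
    ref_y = [sum(adult[t][1] for adult in adults_xy) / n_adults
             for t in range(n_samples)]
    obs_x = [p[0] for p in observer_xy]
    obs_y = [p[1] for p in observer_xy]
    return (pearson(obs_x, ref_x) + pearson(obs_y, ref_y)) / 2


# Toy data: two adults and one observer who tracks the adult mean exactly.
adults = [[(0, 0), (1, 2), (2, 1), (3, 5)],
          [(0, 2), (3, 2), (2, 3), (5, 5)]]
observer = [(0, 1), (2, 2), (2, 2), (4, 5)]
score = adult_synchrony(observer, adults)
```

Under these assumptions, a score near one means the observer's gaze moves with the adults' gaze over time, and lower scores mean less adult-like looking. Real eye-tracking data would also need a rule for samples with missing gaze, which this sketch leaves out.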
Highly correlated adult eye movements are consistent even across wide ranges of stimuli involving live-action, animated, and professionally produced films (Dorr et al., 2010; Gannon & Grubb, 2022; Goldstein, Woods, & Peli, 2007; Shepherd et al., 2010; Franchak et al., 2016; Rider et al., 2018). Age-related increases towards more consistent adult gaze (less spatial variability) are supported by a few studies (Frank, Vul, & Saxe, 2012; Kirkorian, Anderson, & Keen, 2012), but, along with Franchak et al. (2016), we are among the first to specifically use adult synchrony as a general similarity index of attention and test factors that may predict developmental changes in attention.

Adult synchrony provides a measurement that is specific to a single stimulus and is agnostic to constructs that are imposed by the researchers. The role of multiple features can be tested against one developmental change (i.e., towards greater synchrony) to determine the individual contribution of each feature in accounting for age-related increases in adult-like attention. Adult synchrony, as a product of prioritization of meaning, also leads to interesting predictions about how it may change as a result of improvements in comprehension. As infants and children gain perceptual and motor experiences that improve comprehension, adult synchrony may also improve as they learn to prioritize meaningful information in ways that are similar to adults. In the next section, I apply this concept within the context of goal-directed manual actions.

1.3 Development of attention towards meaning in hands

In Chapters 3 and 4, I test two features related to hands as predictors of adult synchrony: hands and hand-object actions.
First, attention may be allocated to hands because hands can convey meaningful information related to social relevance (Fausey, Jayaraman, & Smith, 2016; Frank et al., 2012), nonverbal communication (Bertenthal & Boyer, 2012; Tomasello, Carpenter, & Liszkowski, 2007), joint attention (Yu & Smith, 2013; Deak, Krasno, Triesch, Lewis, & Sepeta, 2014), and manual action (Flanagan & Johansson, 2003). Second, there are developmental changes in children’s comprehension of hands over the first few years of life as they learn to comprehend pointing (Bertenthal & Boyer, 2012), hand cues in early word learning (de Villiers Rader & Zukow-Goldring, 2010), and, more specifically, information conveyed during hand-object action: object properties (P. Tseng, Bridgeman, & Juan, 2012), goals (Woodward, 2009), and affordances (Klatzky, Pellegrino, McCloskey, & Doherty, 1989). Improvements in comprehension could be reflected in more adult-like prioritization of hands and hand-object actions. However, sensitivity to the information conveyed by hands does not necessarily mean that infants and children will select hands as meaningful locations of attention, especially in dynamic scenes with various features competing for attention.

The selection of hands as a predictor is driven by prior studies indicating that increased comprehension of action should lead to more attention to hand locations (Bertenthal & Boyer, 2012; P. Tseng et al., 2012; Woodward, 2009). Although other aspects of action understanding might also drive observers’ attention, a practical benefit of using hands is that they are easy to spatially define in an image. For example, the attention of actors within a scene may cue areas for joint attention, but where actors are looking would be challenging to identify reliably for every frame of a video. Hands are physical entities and hand-object actions are defined by the time when hands are in physical contact with movable objects.
We predicted that observers who synchronized their attention to hands and hand-object actions, spatially and temporally, would also be the observers who synchronized their attention in an adult-like way.

I propose that infants’ and children’s developing action understanding should produce greater attentional synchrony with adults when viewing manual actions. Prior work with adults shows that when observers comprehend the action they are watching, their eye gaze moves to the same locations at the same time as the actor performing the action (Flanagan & Johansson, 2003; Kochukhova & Gredebäck, 2010). The relationship between where a person looks and what a person understands is reciprocal. In Chapter 4, I will test whether providing perceptual opportunities to improve comprehension leads to increases in children’s adult synchrony. Perceptual opportunities may increase knowledge about an action and lead to greater attention at the locations that are meaningful. In daily life, perceptual opportunities occur in the ways we explore and interact with our environment. In training studies, perceptual opportunities are understood to improve infants’ (and adults’) ability to discern regularities and goal information contained within actions (Sommerville, Woodward, & Needham, 2005; Monroy, Gerson, & Hunnius, 2017). However, it is not known whether improvements in action comprehension also change eye movements so that observers look at specific locations at specific times.

Chapter 2

Developmental changes in infants’ and children’s attention to faces and salient regions vary across and within video stimuli

2.1 Abstract

Visual attention in complex, dynamic scenes is attracted to locations that contain socially-relevant features, such as faces, and to areas that are visually salient. Previous work suggests that there is a global shift over development such that observers increasingly attend to faces with age.
However, no prior work has tested whether this shift is truly global, that is, consistent across and within stimuli despite variations in content. To test the global shift hypothesis, we recorded eye movements of 89 children (6 months to 10 years) and adults while they viewed seven video clips. We measured the extent to which each participant attended to faces and to salient areas for each video. There was no evidence of global age-related changes in attention: Neither feature showed consistent increases or decreases with age. Moreover, windowed analyses within each stimulus video revealed significant moment-to-moment variations in the relation between age and each visual feature (via a bootstrapping analysis). For some time windows, adults looked more often at both feature types compared to infants and children. However, for other time windows the pattern was reversed: younger participants looked more at faces and salient locations. Lack of consistent directional effects provides strong evidence against the global shift hypothesis. We suggest an alternative explanation: Over development, observers increasingly prioritize when and where to look by learning to track which features are relevant within a scene. Implications for the development of visual attention and children’s understanding of screen-based media are discussed.

2.2 Developmental changes in infants’ and children’s attention to faces and salient regions vary across and within video stimuli

The visual world is dynamic. Since we cannot “rewind” or “pause” events in real life, we must look at the right place at the right time to glean the most important information. Poor visual acuity in the peripheral areas of the visual field means that humans must make eye movements to direct the high-acuity fovea towards the informative areas in a scene from moment to moment (Land & Fernald, 1992; Westheimer, 1982). What influences where observers look, and how do those influences change over development?
Two influences that have been widely studied in the developmental literature are socially-relevant features (e.g., faces) and visually-salient features. Faces influence visual attention by drawing gaze towards socially-meaningful locations that convey information such as affect, attention, and speech (for a review, see Bruce, 1993). Visually-salient features attract gaze to locations whose appearance (e.g., color, motion) stands out from the surrounding scene (Borji & Itti, 2013; Itti & Koch, 2000; Itti, Koch, & Niebur, 1998; Itti & Baldi, 2005). Developmental changes in attention from visually salient features to meaningful areas in a scene, such as faces, could be indicative of a shift in attentional biases. We will refer to this as the global shift hypothesis, and review the evidence in greater detail below.

However, little scrutiny has been given to whether developmental changes in attention to faces and salient areas are truly global. A global developmental change in attention should be found consistently across and within stimuli that vary in content. Yet, prior developmental studies of free viewing often present only a small number of stimuli of short duration and/or aggregate looking measurements over an entire stimulus rather than test the consistency of attention patterns across and within stimuli. To address these limitations and test the consistency of age-related changes in attention, we measured infants’ and children’s (6 months to 10 years) and adults’ eye movements across and within a wide set of stimuli with diverse content. Using a sufficiently large data set, we measured attention to faces and visually-salient locations to examine whether developmental changes in attention to each type of feature were global, that is, consistent across and within stimuli.

Understanding whether and how visual features’ influence on attention changes over development has broad significance.
For instance, atypical patterns of looking to faces have been implicated in identifying infants and children who are at risk for Autism Spectrum Disorder (Klin & Jones, 2008; Klin, Jones, Schultz, Volkmar, & Cohen, 2002). Using a diverse stimulus set can inform whether there are developmentally-normative changes in face looking that are independent of stimulus variations. It is also important to understand attention development in the context of viewing screen-based media. Watching TV shows and videos/DVDs is pervasive: 35% of children aged 0-2 are exposed to screen media on a daily basis, and those who are exposed average 42 minutes of viewing per day (Rideout, 2017). Viewing becomes more common and more extensive with age: 67% of children aged 2-4 are exposed to screen media each day, with an average duration of 159 minutes. Despite the purported educational benefits of media intended for infants and children, there are well-documented limits on what children actually learn (Wartella, Richert, & Robb, 2010). Studying how visual features influence looking behavior has potential implications for understanding how media should be designed to improve children’s learning of educational content.

2.2.1 Faces and salient locations attract adults’ attention

Adults distribute their attention to socially-relevant locations, such as people’s bodies (Foulsham et al., 2011), eyes (Birmingham et al., 2009), and locations relevant to the goals of others’ motor actions (Ballard & Hayhoe, 2009; Land, 2009). Faces are a particularly strong feature that captures adults’ attention (Birmingham et al., 2009; Franchak et al., 2016; Frank et al., 2009; Shepherd et al., 2010).
Heightened attention to faces and facial features is found for both static (photo) and dynamic (video) stimuli (Birmingham et al., 2008, 2009; Võ et al., 2012; Yarbus, 1967), even when there is no specific viewing task (Birmingham et al., 2009), suggesting that socially-relevant features serve as a “default” location of interest.

Adults also look towards visually-salient features, which capture attention based on their appearance. Locations that are colorful (Jost et al., 2005), high contrast (Parkhurst & Niebur, 2004; Reinagel & Zador, 1999), and contain motion (Mital et al., 2011) attract eye gaze regardless of whether the location is meaningful because those areas “pop out” by having a different visual appearance from the surrounding scene. For example, a lone painting hung askew will catch the eye when placed on a wall of properly-leveled artwork because of its unique orientation (rather than the content depicted in the painting). To quantify the degree to which a location in a scene differs in appearance from its surroundings, biologically-inspired computational saliency models have been devised to calculate relative saliency of locations based on different feature channels (Borji & Itti, 2013; Itti & Koch, 2000; Itti et al., 1998; Itti & Baldi, 2005). Comparing model predictions to adult gaze patterns confirms that fixated locations tend to have higher visual saliency compared with non-fixated locations when viewing both static images and dynamic scenes (Parkhurst & Niebur, 2004; Peters et al., 2005; T. J. Smith & Mital, 2013).

2.2.2 Evidence for and against a global developmental change in visual attention

Consistent with the global shift hypothesis, several developmental studies have found age-related increases in looking at faces in static images (Amso et al., 2014; Gluckman & Johnson, 2013; Kwon et al., 2016).
Infants younger than 6 months spent more time looking towards salient images of objects, but older infants attended to faces despite the presence of non-face images with greater visual salience (Kwon et al., 2016). A similar trend has been found in studies using dynamic stimuli (Frank et al., 2014, 2009; Rider et al., 2018). Frank and colleagues (2009) showed that 3-month-old infants’ eye movements when watching an animated clip were better predicted by a low-level salience model, but eye movements of 6-month-olds, 9-month-olds, and adults were better predicted by a face-looking model. A study comparing children (6-14 years) and adults watching videos found that face models were better than or equal to salience models (depending on the stimulus) at all ages; however, face models were more predictive of adults’ attention compared with children’s attention (Rider et al., 2018). Increases in face-looking rates with age, particularly in the first year of life, parallel developmental improvements in infants’ visual search skill (Frank et al., 2014) and infants’ ability to discriminate and process faces (Farzin et al., 2012; Pascalis et al., 2002).

However, an increase in face looking does not necessarily entail a corresponding decrease in attention to areas with high saliency. Although several studies indicate decreasing influences of saliency on attention with age (Helo, Pannasch, Sirri, & Rämä, 2014; Kwon et al., 2016; Açik, Sarwary, Schultze-Kraft, Onat, & König, 2010), others find that saliency models are more predictive of adults’ gaze compared with infants and children (Rider et al., 2018; Franchak et al., 2016; Frank et al., 2009). One explanation is that faces tend to have higher saliency than irrelevant locations in a scene (Torralba et al., 2006; Wass & Smith, 2015).
Thus, a global developmental increase in looking at faces may or may not be accompanied by a change in looking to salient regions depending on the correspondence between salience and faces in a given stimulus. Variations in the salience of faces in different videos (such as when comparing videos intended for child and adult audiences; Wass & Smith, 2015) further motivate the need to test the consistency of age-related changes in attention across a wider set of stimuli.

Despite evidence in support of the global shift hypothesis, there are a few conflicting results. Frank, Vul, and Saxe (2012) found different age-related changes in 3- to 30-month-olds’ attention to faces depending on scene content: Age predicted an increase in looking at faces for scenes that contained close-ups of children but predicted a decrease in face looking for scenes that included wide shots with multiple agents. Similarly, Franchak and colleagues (2016) found variations between infant and adult eye movements depending on scene content. For scenes with one agent, adults’ gaze was predicted by both saliency and looking to the actor’s face. However, for scenes with multiple agents, adults suppressed looking to salient areas, looked at the main actor’s face, but rarely looked at the other actors. In contrast, young infants looked at moderately-salient locations and infrequently looked at the main actor’s face regardless of how many agents were in view. By 24 months, toddlers’ viewing patterns were adapted to scene content in a similar way to adults. Lastly, Stoesz and Jakobson (2014) found that the addition of more actors in a scene led to decreases in face looking that were more pronounced for children than for adults.

Given the moderating role of scene content, support for the global shift hypothesis requires testing whether age differences in looking to faces and salient locations are invariant to differences in stimuli.
However, prior studies have primarily tested only a single stimulus video or a small set of stimuli, which limits the ability to detect a global pattern across diverse content (Franchak et al., 2016; Kirkorian et al., 2012; Frank et al., 2009, 2014). Furthermore, the studies above suggest that different priority may be given to different features from scene to scene within a stimulus video. However, most studies of visual attention report average measures over the entire duration of a stimulus (Frank et al., 2009; Kirkorian et al., 2012; Rider et al., 2018) or compare a few select scenes or scene types (Franchak et al., 2016; Frank et al., 2012, 2014; Stoesz & Jakobson, 2014). Finer temporal granularity (that is, determining whether there are age differences on shorter time windows) could reveal whether age differences in attention are robust to variation in scene content within a stimulus.

2.3 Current study

The goal of the current study was to test the global shift hypothesis by assessing age differences in attention to faces and visual saliency across and within scenes with varying content. Because visual attention changes throughout infancy and childhood (Colombo, 2001; Oakes & Amso, 2018), we tested participants across a wide age range (6 months to 10 years and college-age adults). To our knowledge, no prior work has examined changes in looking to faces and salient locations in video stimuli across an age range that spans from infancy to adulthood, making this dataset unique. Participants watched seven 2-minute video clips from various child-friendly media while eye movements were recorded. We chose videos with diverse content to determine whether changes in visual attention to specific feature types are global, that is, invariant across stimuli. Exemplar screenshots and descriptions of the seven stimuli used in this study are available on Databrary (https://nyu.databrary.org/volume/1007).
We calculated the proportion of time spent looking at faces in each scene (face looking) and the visual saliency of areas attended by each participant (gaze saliency) based on calculations from a saliency model (Harel, Koch, & Perona, 2006; Itti & Baldi, 2005).

First, we tested whether there were consistent age-related changes in attention to faces and salient areas across the stimulus set. If there is a global shift towards looking more often at faces, we predict a consistent age-related increase in face looking for each of the seven videos. We made no specific prediction concerning global changes in gaze saliency given that saliency of faces may vary. An additional consideration is how best to represent the trajectory of age-related differences in visual attention. Previous work found rapid increases in face looking during infancy followed by a modest rate of change for older children and adults (Amso et al., 2014). For this reason, we calculated logarithmic in addition to linear functions to model age differences.

A second set of analyses tested how consistently age-related changes in attention to faces and salient locations held over changes within each video. Rather than defining ad hoc scenes of interest as in past work (Franchak et al., 2016), we objectively and exhaustively tested temporal changes in eye movements by using a sliding window analysis. For each 2-min stimulus video, we defined 10-s windows every 5 s, resulting in 23 windows. Face looking and gaze saliency were calculated within each window for each participant to capture differences in attention as the scene changes. Evidence of truly global age-related changes in attention would entail greater face looking with age that is invariant over time within a video. Alternatively, if changes in scene content alter the importance of different features over time, age differences in gaze saliency and face looking may vary across windows.
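As an illustration of the sliding window scheme, the sketch below generates overlapping windows and computes a per-window looking rate. It rests on assumptions that the text does not state: windows start at time zero and only windows that fit entirely within the clip are kept, so the number of windows depends on the exact clip length. The function names and the boolean per-sample coding are hypothetical.

```python
def sliding_windows(duration_s, window_s=10.0, step_s=5.0):
    """Return (start, end) times for every window that fits within the clip."""
    windows = []
    start = 0.0
    while start + window_s <= duration_s:
        windows.append((start, start + window_s))
        start += step_s
    return windows


def windowed_rate(sample_times, on_feature, window):
    """Proportion of valid samples inside `window` that fall on the feature.

    sample_times: time stamp (s) of each valid gaze sample.
    on_feature:   parallel list of booleans (True = gaze on the feature).
    Returns None when the participant has no valid samples in the window.
    """
    start, end = window
    hits = [flag for t, flag in zip(sample_times, on_feature)
            if start <= t < end]
    if not hits:
        return None
    return sum(hits) / len(hits)


# Toy example: a 30-s clip sampled once per second.
windows = sliding_windows(30.0)
times = [float(t) for t in range(30)]
flags = [t < 15 for t in range(30)]  # looking at the feature for the first 15 s
rates = [windowed_rate(times, flags, w) for w in windows]
```

Each participant then contributes one rate per window, and the relation between age and that rate can be estimated separately for every window.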
For example, in one window (or a few successive windows) adults may attend towards faces more so than children (a positive correlation between age and face looking). At a different time window faces might be less important to the scene, resulting in adults looking less often at faces compared to younger participants (a negative correlation between age and face looking). Such variation in the direction of age differences in face looking from moment to moment would provide evidence against the global shift hypothesis.

It is important to note that saliency models are developed and evaluated by using databases of adult eye behaviors, raising a potential concern that saliency models may not be equally valid when applied to infants’ and children’s data. However, a recent comparison of saliency model performance found that the model used in the current study is one of the best models for predicting both adult and infant gaze to static images across seven evaluative metrics (Mahdi, Su, Schlesinger, & Qin, 2017). Since no past work has evaluated different dynamic saliency models for modeling infant eye movements, we cannot rule out that a model tuned to infants and/or children would perform better. We also note that different models capture different visual features. The model we chose uses flicker and motion on a pixel level to measure dynamic change in a scene, but other models use visual entropy or a Bayesian representation of surprise (Itti & Baldi, 2009) and could provide different insights into developing attention.

2.4 Method

2.4.1 Participants

Our goal was to analyze continuous effects of age from 6 months to 10 years. To ensure that sufficient data were collected across that entire range, participants were recruited from 7 narrower age ranges: 6- to 11-month-olds; 12- to 17-month-olds; 18- to 23-month-olds; 2- to 4-year-olds; 4- to 6-year-olds; 8- to 10-year-olds; and college-aged adults.
Infant age ranges were spaced closer together compared with child age ranges because past work indicated rapid developmental change in infancy followed by more gradual changes during childhood (Amso et al., 2014). We defined an a priori stopping rule based on data quality: Run participants until each stimulus has data of sufficient quality from 10 participants within each age range. Eye movement data were considered insufficient and excluded on a stimulus-by-stimulus basis if either of two criteria was met: 1) the participant’s eye gaze data were missing (e.g., eye occlusion, looking away) for > 50% of the frames of a video, or 2) eye gaze data were missing for any single continuous period of > 30 s. Due to these exclusion criteria, participants in the final sample provided data for between 2 and 7 stimulus videos. For example, if an infant attended to the first three videos but then refused to watch the remainder, the infant contributed data to 3/7 stimuli. At minimum, 10 adults and 60 children (6 months to 10 years) would be required. However, it was necessary to run additional younger participants because they were less likely to stay engaged through the entire session and consequently failed to contribute data to all 7 stimuli. Beyond the minimum of 10 per age range, we were required to run an additional ten 6- to 11-month-olds, three 12- to 17-month-olds, three 18- to 23-month-olds, one 2- to 4-year-old, one 4- to 6-year-old, and one 8- to 10-year-old to ensure that each stimulus had sufficient data. Each video had data from between 75 and 87 participants. Table 2.1 displays the final sample size for each of the seven videos and shows the smallest age effect (r) that could be detected at 80% power. Based on these effect size calculations, the study was adequately powered to detect medium effects of age.
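For readers who want to see how a smallest detectable correlation relates to sample size, the following is a rough sketch and not the calculation used to produce Table 2.1. It assumes a two-tailed test at alpha = .05 and uses the Fisher z approximation; the text does not report the alpha level or the software used, and exact power routines can differ from this approximation in the second decimal place. The function name `min_detectable_r` is hypothetical.

```python
import math
from statistics import NormalDist


def min_detectable_r(n, power=0.80, alpha=0.05):
    """Approximate smallest |r| detectable with the given power.

    Uses the Fisher z approximation for a two-tailed test:
        atanh(r) = (z_{1 - alpha/2} + z_{power}) / sqrt(n - 3)
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.tanh((z_alpha + z_power) / math.sqrt(n - 3))


# Approximate detectable effects for sample sizes in the range of this study.
for n in (75, 85, 87):
    print(n, round(min_detectable_r(n), 2))
```

Under these assumptions the detectable effect for samples of roughly 75 to 87 participants is close to r = .30, which is conventionally described as a medium effect.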
Table 2.1: Sample size (n) and smallest effect size (r) that could be detected with 80% power for each video

     Video 1   Video 2   Video 3   Video 4   Video 5   Video 6   Video 7
n    85        87        83        75        83        87        82
r    .29       .29       .30       .31       .30       .29       .30

The final sample consisted of 79 children ranging in age from 6 months to 10 years (42 female) and 10 college-aged adults (5 female). All participants in the final sample had normal or corrected-to-normal vision with no color blindness or history of familial color blindness. Families were recruited from the Riverside County area. Participating children were identified by their caregivers as Black/African American (n = 1), American Indian/Alaskan Native (n = 5), non-Hispanic White (n = 21), Hispanic or Latino(a)/White (n = 31), and more than one race (n = 21). Adults were college undergraduates recruited from the departmental participant pool and received course credit for participation. Adult participants identified as American Indian/Alaskan Native (n = 1), non-Hispanic White (n = 2), Asian (n = 3), and Hispanic or Latino(a)/White (n = 4). Families received $10 and a small gift or book for participating. The study procedure conforms to the US Federal Policy for the Protection of Human Subjects and was approved by the Institutional Review Board of the University of California Riverside under protocol HS-16-126: “Development of visual exploration while watching videos”. Participants (or their caregivers) signed an informed consent document after hearing the details of the study. Children aged 2-4 years gave verbal assent and children aged 5-10 years provided written assent.
Eleven additional participants were tested (9 infants/children and 2 adults) but their data were excluded completely due to issues affecting the entire experimental session: failure to complete the experiment due to fussiness/inattention to all seven videos (9 children), falling asleep (1 adult), and distraction (checking a mobile phone instead of looking at the stimuli, 1 adult).

2.4.2 Stimuli

Seven child-friendly videos were selected to present stimuli with diverse content: three Sesame Street videos, three music videos, and one children’s science demonstration video. Each video was 2 min in duration with limited graphical elements and no cuts (i.e., each stimulus was presented as a continuous shot). Beyond these criteria, the selected videos varied in a number of ways: the number of agents on screen, the types of actions performed, the presence of non-human agents, and the presence of non-agentive movement. Each stimulus video, overlaid with data from infants and adults, is available to view on Databrary (https://nyu.databrary.org/volume/1007). To isolate the role of visual information on visual attention, audio cues that would inform gaze location (Coutrot & Guyader, 2014) were removed by replacing the original audio tracks with children’s instrumental music. Every participant received the same pairing of music and video.

2.4.3 Apparatus

Each stimulus video was presented at 30 Hz on a 43.2 cm (diagonal) wide-screen monitor at a viewing distance of 60 cm. Stimulus videos subtended a visual angle of 31°×19°. The monitor was affixed to an adjustable arm and equipped with an Eyelink 1000 Plus remote eye tracker (SR Research Ltd.). Eye movements (right eye only) were recorded with a temporal resolution of 500 Hz.

2.4.4 Procedure

Participants sat in a viewing room that was separated from the experimenter by a hanging curtain. A target sticker was placed on the forehead to facilitate the eye tracker detecting the observers’ eyes.
Infants sat in a high chair with a 5-point harness to reduce body movement. Infants’ caregivers sat behind infants and were instructed to refrain from interacting with infants, pointing at the screen, or speaking. Children and adults sat in a chair (with a booster seat for younger children).

At the beginning of the study, the experimenter adjusted the monitor and calibrated the eye tracker. For infants, an attention-getting video played while the experimenter adjusted the monitor. A 5-point calibration routine was used for participants of every age and was followed by a 5-point validation check. Validation data were used to calculate the average error in degrees of visual angle between the target location and the estimated point of gaze. The calibration process was repeated if validation indicated an average error greater than 1.5° of visual angle.

As described by Wass, Smith, and Johnson (2013), infant eye tracking data are often of lower quality than data from older participants, which affects both accuracy (disparity between reported and actual point of gaze) and precision (disparity between successive samples of reported point of gaze). Accuracy averaged M = 0.54° (SD = 0.26) across age. The correlation between age and average spatial error was marginally significant (r = -.204, p = .055), with older participants having higher accuracy. However, when comparing average spatial error by age group, there was only a difference of 0.2° of visual angle between 6- to 11-month-olds (M = 0.66°, SD = .29) and adults (M = 0.43°, SD = .15), suggesting that differences in accuracy would have a minimal effect on analyses. Precision for each participant was calculated following a published method (Wass, Forssman, & Leppänen, 2014) using data from each video for which the participant contributed data. Precision averaged M = 1.68° (SD = .32) and was not significantly correlated with age (r = -.044, p = .680).
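To make the two data-quality terms concrete, the following is a minimal sketch of how accuracy and precision could be computed from gaze coordinates. It is an illustration only: it assumes coordinates are already expressed in degrees of visual angle, the function names are hypothetical, and the precision measure shown is a generic root-mean-square of sample-to-sample distances, not the Wass et al. (2014) procedure used in the study.

```python
import math

def mean_accuracy_deg(targets, gazes):
    """Accuracy: mean Euclidean distance (deg) between each validation
    target and the gaze position estimated for that target."""
    errors = [math.hypot(gx - tx, gy - ty)
              for (tx, ty), (gx, gy) in zip(targets, gazes)]
    return sum(errors) / len(errors)

def rms_sample_to_sample_deg(samples):
    """Precision (illustrative definition, an assumption): root mean square
    of the distances between successive gaze samples (deg)."""
    steps = [math.hypot(x2 - x1, y2 - y1)
             for (x1, y1), (x2, y2) in zip(samples, samples[1:])]
    return math.sqrt(sum(d * d for d in steps) / len(steps))

# Hypothetical validation data: two targets and the gaze estimates for them.
targets = [(0.0, 0.0), (10.0, 0.0)]
gazes = [(0.3, 0.4), (10.0, 0.5)]
accuracy = mean_accuracy_deg(targets, gazes)

# Hypothetical run of three successive gaze samples.
precision = rms_sample_to_sample_deg([(0.0, 0.0), (3.0, 4.0), (3.0, 4.0)])
```

Smaller values indicate better data quality on both measures.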
After calibration and validation, participants were shown the 7 stimulus videos in a randomized order. Adults and children were instructed to simply watch the videos. Each stimulus video was preceded by a gaze-contingent target in the middle of the screen that required a fixation of > 250 ms to trigger the video to start.

2.4.5 Data processing

Because of concerns about the validity of fixation detection algorithms when applied to younger participants with less robust data (Wass et al., 2014), raw eye tracking data were used to measure gaze behaviors. Data were extracted as a time series of horizontal and vertical gaze coordinates for each observer for each of the 7 stimuli. Time points were excluded if gaze locations exceeded the screen boundaries or were otherwise missing (eyes closed, turned away from screen, and eye occlusions).

Face looking

The proportion of time spent looking at the faces of agents was obtained using dynamic area of interest (AOI) analyses. For each video frame, Dataviewer software (SR Research Ltd.) was used to draw elliptical AOIs around the heads of each humanoid agent (i.e., human actors and Muppet characters) as they moved in the scene. Face looking was defined as the gaze location falling within the boundary of a face AOI. To compare across stimuli that had varying amounts of time with faces present on screen, and between participants who had different amounts of missing data, face looking rates were calculated for each participant by dividing the number of samples looking at faces by the number of samples with faces present during which the participant had valid (non-missing) data.

Gaze saliency

Gaze saliency was calculated to determine the relative saliency of visually attended locations in comparison to the rest of the scene, as in past work (Franchak et al., 2016; Tatler et al., 2011; T. J. Smith & Mital, 2013). Video frames were converted to images at the rate of presentation (30 Hz).
Using the algorithm of Itti and Baldi (2005) as implemented in the GBVS toolbox (Harel et al., 2006), the relative salience of each pixel was calculated for each frame based on a combination of five feature maps (contrast, orientation, color, flicker, and motion). Dynamic features (flicker and motion) were calculated by comparing differences in successive video frames. For example, pixel changes that occur as a character moves from off-screen to on-screen would indicate greater flicker and motion relative to an otherwise still background. Image feature maps were weighted equally to create a composite saliency map, integrating static and dynamic features. Each pixel within the map was assigned a rank between 1 and 100 that reflected its saliency relative to the other pixels in the video frame; the most salient pixel ranked 100. An example of a heatmap showing the saliency of different regions on a video frame is available on Databrary (https://nyu.databrary.org/volume/1007). For every frame of each video, the average saliency rank of pixels was calculated within a 1.2° diameter circle around the point of gaze. Larger gaze saliency scores indicate that the participant looked at a relatively more salient location within the frame.

2.5 Results

The first set of analyses assessed the global shift hypothesis by measuring age-related changes in gaze towards faces and salient features across the seven stimulus videos. The second set of analyses tested the global shift hypothesis by measuring age-related differences in attention from moment to moment within each stimulus video.

2.5.1 No consistent age differences in face looking or gaze saliency across stimuli

To assess the global shift hypothesis, we tested across videos for consistent age-related increases in face looking and consistent changes (either increases or decreases) in gaze saliency. We used generalized estimating equations (GEE) to model the age-related changes.
Like a regression model, GEEs estimate change in criterion variables (i.e., gaze saliency and face looking) from predictors (i.e., age as a continuous variable and video as a categorical factor). The main advantage of using GEEs is that they can handle participants contributing varying amounts of data in a repeated measure, whereas an ANCOVA would require excluding participants who do not contribute to every level of the repeated measure (i.e., participants who did not watch all 7 videos, likely infant observers, would be excluded). We tested two GEE models for each visual feature: a model with age as a linear (continuous) predictor and a model with log-transformed age as a continuous predictor; both included video as a categorical predictor. Follow-up analyses examined each video separately with regression models testing for changes in looking according to age.

Face looking

The proportion of time spent looking at faces varied widely between videos (Figure 2.1), ranging from M = .017 (SE = .001) for Video 5 to M = .831 (SE = .009) for Video 1. The overall range in face-looking rates speaks to the diversity of the content. Face looking was high in Video 1, which depicted multiple agents playing a song together, but was low in Video 5, which focuses on a series of mechanical events (human agents play a peripheral role). However, contrary to the global shift hypothesis, there was no uniform age-related increase in face looking across these diverse videos (regardless of age model, linear or logarithmic). A GEE model with linear age showed a significant effect of video (Wald’s χ² = 5534.56, p < .001) and a significant linear age×video interaction (Wald’s χ² = 21.66, p = .001), but no main effect of linear age. In contrast, the logarithmic model did find a significant main effect of log age (Wald’s χ² = 6.16, p = .013) in addition to the significant effect of video (Wald’s χ² = 245.31, p < .001).
However, the significant logarithmic age parameter was negative, opposite to the global shift hypothesis prediction. Further, a significant age×video interaction (Wald’s χ² = 34.11, p < .001) moderated the main effect of age, casting doubt on a global age effect in face looking across videos.

To further explore the moderating effect of stimulus video on age-related changes for individual videos, regressions were fit using linear and logarithmic age to predict face looking separately for each video (Table 2.2). Only Video 4 demonstrated significant age-related change; linear (R² = .091, p = .009) and logarithmic (R² = .240, p < .001) age significantly predicted face looking. Surprisingly, and contrary to the global shift hypothesis, negative changes in Video 4 indicate that face looking decreased with age. Moreover, the lack of significant age effects across the remaining videos runs contrary to the prediction of a global increase in attention to faces.

Figure 2.1: Changes in face looking as a function of age for all seven videos. Linear and logarithmic functions are plotted for each stimulus video. (One panel per video; x-axis: age in years; y-axis: proportion of face looking.)

Lastly, past research identifying the global shift hypothesis has examined changes in face looking over infancy. An additional analysis ruled out that a global shift would be found when testing only participants younger than 18 months. The linear age GEE with the restricted age range indicated no significant effect of linear age, but did indicate a significant effect of video (Wald’s χ² = 441.52, p < .001) and a significant linear age×video interaction (Wald’s χ² = 43.41, p < .001). The analysis using logarithmic age revealed a similar result.
The model found no significant effect of log age but there was a significant effect of video (Wald’s χ² = 39.58, p < .001) and a significant log age×video interaction (Wald’s χ² = 37.17, p < .001).

Gaze saliency

Similar analyses were performed for orienting to salient locations. For each participant, a composite gaze saliency score for each video was calculated by: 1) averaging the saliency ranks of pixels within a 1.2° diameter circle around the participant’s point of gaze on every frame, and then 2) averaging across all frames in each video. Across age and stimuli, observers on average looked towards relatively more salient areas of the scene, with a grand mean gaze saliency rank (out of 100) of M = 81.53 (SD = 7.93). Although consistently high, gaze saliency differed between the seven videos, with mean ranks (collapsing across age) ranging from 66.01 to 85.32. However, as is evident from inspecting the graphs in Figure 2.2, there were no consistent linear or logarithmic age-related changes in gaze saliency across videos. The linear age GEE model confirmed a significant effect of video (Wald’s χ² = 628.28, p < .001), but did not find significant age or age×video effects. Similarly, the logarithmic age model showed a significant effect of video (Wald’s χ² = 50.51, p < .001) and failed to find a main effect of age. However, there was a significant age×video interaction in the logarithmic model (Wald’s χ² = 12.63, p = .049), suggesting that age differences in looking at salient regions depended on the stimulus.

To further explore the age-related changes in gaze saliency for individual videos, regressions were fit using linear and logarithmic age to predict gaze saliency separately for each video (Table 2.2). For five of the stimuli, neither linear nor logarithmic changes in gaze saliency were found as a function of age. Two stimuli indicated significant fit with a logarithmic function, Video 2 (R² = .055, p = .029) and Video 5 (R² = .126, p = .001).
Both videos revealed age-related increases in looking to salient areas, but effect sizes were modest. In summary, no main effect of either linear or log-transformed age was found, which reflects a lack of a global age-related change across videos.

Figure 2.2: Changes in gaze saliency as a function of age for all seven videos. Linear and logarithmic functions are plotted for each stimulus video. (One panel per video; x-axis: age in years; y-axis: gaze saliency.)

As with face looking, we ruled out that global changes would be found when restricting the analyses to participants younger than 18 months. The linear age GEE found no significant effect of age, but did find a significant effect of video (Wald’s χ² = 46.35, p < .001) and a significant linear age×video interaction (Wald’s χ² = 16.42, p < .012). The GEE model using logarithmic age showed no significant effects of log age, but there were significant effects of video (Wald’s χ² = 13.03, p = .042) and an age×video interaction (Wald’s χ² = 14.50, p = .024).

Table 2.2: Regression parameters for linear and logarithmic age-related changes in face looking and gaze saliency for each stimulus video.

        Face looking                      Gaze saliency
        Linear           Log              Linear           Log
Video   b        R²      b        R²      b        R²      b        R²
1       -0.046   .002    -0.131   .017    0.049    .002    0.008    <.001
2       0.136    .019    0.033    .001    0.166    .027    0.234    .055*
3       -0.061   .004    -0.035   .001    -0.204   .042    -0.058   .003
4       -0.301   .091*   -0.490   .240*   -0.177   .031    -0.147   .022
5       -0.016   <.001   -0.059   .003    0.146    .021    0.354    .126*
6       -0.155   .024    -0.073   .005    0.006    <.001   0.086    .007
7       -0.038   .001    -0.088   .008    0.059    .003    0.198    .039
*p < .05

2.5.2 Within-stimulus variability moderates age differences in visual attention

Next, we tested for consistency in age differences in looking towards faces and salient areas over time within each stimulus.
One possibility is that age-related increases in looking to faces occur during particular moments within the videos (and null effects of age at other moments), consistent with the global shift hypothesis. A second possibility is that the direction of age differences to each feature changes as a function of time (e.g., adults looking more at faces/salient areas compared to infants/children at one time and less at faces/salient areas at another time). Such inconsistent age differences would provide strong evidence against the global shift hypothesis.

To test these possibilities, we used a sliding window analysis to measure the differences in attention to visual features as a function of age at different points of time within each video. Each 2-min video was segmented into 10-s windows that were distributed evenly throughout the video. The first window started at the beginning of the video, and each subsequent window was placed 5 s after the start of the previous window, resulting in 22 overlapping windows. Face looking and gaze saliency were recalculated for each participant within every 10-s window. We calculated separate GEEs for each video to predict attention to each visual feature (face looking, gaze saliency) based on window (as a factor) and age (as a continuous predictor). Significant age×window interactions would suggest that attention to visual features differed by age over the course of the video. Such interactions would indicate features had a differential, age-dependent influence on visual attention at different points in a video as the scene content changes. Significant age×window interactions were followed up with separate correlations between age and visual features to determine the direction and strength of the age difference within each window. The consistency of the direction of age correlations was of key interest in differentiating between the possibilities above.
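The windowing scheme described above can be sketched as follows. This is a minimal illustration rather than the analysis code used in the study: the function names are hypothetical, the window count of 22 is taken from the text (how the final seconds of each video were handled is not stated), and missing samples are represented here as `None`.

```python
def window_bounds(n_windows=22, width_s=10.0, step_s=5.0):
    """Return (start, end) times in seconds for overlapping windows.
    Each window starts step_s after the previous one began."""
    return [(i * step_s, i * step_s + width_s) for i in range(n_windows)]

def windowed_means(timestamps, values, bounds):
    """Mean of the valid (non-None) values whose timestamp falls in each
    half-open window [start, end); None when a window has no valid data."""
    means = []
    for start, end in bounds:
        in_window = [v for t, v in zip(timestamps, values)
                     if start <= t < end and v is not None]
        means.append(sum(in_window) / len(in_window) if in_window else None)
    return means

bounds = window_bounds()

# Hypothetical per-sample scores (e.g., saliency ranks) with one missing sample.
timestamps = [0.0, 1.0, 6.0, 12.0]
scores = [1.0, None, 3.0, 5.0]
first_two = windowed_means(timestamps, scores, bounds[:2])
```

Because consecutive windows overlap by 5 s, each sample after the first 5 s contributes to two windows, which is why scores in neighboring windows are not independent.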
Because there was greater evidence in the prior section for logarithmic effects of age, we used log-transformed age in the models testing these possibilities and in follow-up correlation tests. We tested parallel models using linear age; however, we omitted those results for brevity because there were no substantive differences that would affect the interpretation of the findings.

Changes in scene content within a video moderated the direction of age effects on face looking

For all videos, the relationship between age and face looking varied significantly from window to window. Figure 2.3 shows the fluctuating relationship between age and face looking across windows of each video. To better illustrate the differential effects of age, separate lines are plotted to show face looking values for infants (6- to 24-months), children (2- to 10-years), and adults. However, in GEE models and correlations, logarithmic age was analyzed as a continuous variable: in Figure 2.3, the inset figures illustrate the continuous functions underlying three exemplar windows. The direction and strength of correlations between age and face looking are represented in Figure 2.3 based on color shading over each time window. As evidence of the changes from moment to moment, windows within each stimulus show age effects that vary both in strength and direction (red bars indicate less looking to faces with age whereas blue bars indicate more looking to faces with age).

Seven GEE models were calculated (one for each video) to test for the effects of age, window, and interactions between age and window on face looking. Table 2.3 shows that for all seven videos, there were significant age×window interactions, indicating varying age-related differences in face looking from moment to moment. We also found a significant effect of window for all 7 of the GEE models, indicating significant mean level fluctuations in face looking as the scene content changed within these videos, irrespective of age.
Lastly, only one main effect of log-transformed age was found (Video 4), as was observed in the stimulus-level analysis of face looking in the previous section.

Follow-up analyses explored age×window interactions by measuring the correlation between logarithmic age and face looking for each time window (e.g., the r values depicted in Figure 2.3). Figure 2.4A shows a frequency distribution of every correlation between age and face looking for each window aggregated across the seven videos. Two findings emerge from examining the distribution of correlations across videos. First, the presence of both positive and negative correlations indicates that there are both times in which adults look more to faces than infants but also other windows when infants attend more towards faces than adults. The second finding from these correlations is that a relative minority of windows show statistically significant correlations. This indicates that infants, children, and adults more often prioritized faces in a similar rather than a different way.

Figure 2.3: Windowed analyses of age differences in face looking over the duration of all 7 videos. Age was analyzed as a continuous variable, but for illustration purposes age was averaged into three groups (infants: 6-24 months; children: 2-10 years; adults: 18-22 years). Colored vertical bars represent strength and direction of correlation between age and face looking for every window. Darker colors indicate stronger correlations. No data are plotted for the first 5 windows of Video 5 because no faces were present during that portion of the video. Insets depict examples of 3 individual windows to show a negative correlation, positive correlation, or no correlation between age and face looking with age represented as a continuous predictor. (Inset panels: Video 6, Window 12, negative age difference; Video 6, Window 19, positive age difference; Video 7, Window 12, no age difference.)

However, given the total number of windows in which correlations were calculated, the probability of spurious correlations is high. To estimate the expected range of correlation values due to chance, we created a bootstrapped null distribution by randomly re-assigning the age labels to each eye movement time series (Figure 2.4B). Ages were randomly shuffled within videos but not for each window in order to preserve the temporal ordering between windows for any given participant. Age correlations with face looking were recalculated for the randomly shuffled data. This was repeated 1000 times to produce a null distribution of correlations between age and face looking. Figure 2.4B shows this distribution of randomized correlations with vertical lines indicating the range in which 95% of the correlations occurred. Next, we determined how many correlations in the observed data were more extreme compared to the 95% range from the null distribution (arrows on the x-axis of Figure 2.4A). Only 5% of the correlations should fall outside of this range by chance; however, in the observed data 26.8% fell outside of the range. Using the 'multicon' package in R (Sherman & Serfass, 2015), we conducted a randomization test that confirmed the number of significant correlations found was greater than chance. The test indicated that there was a significant difference between the expected number of significant correlations (M = 7.43, SE = 4.04) and the 40 observed statistically significant correlations, p < .001.
This indicates that the prevalence of significant correlations is not spurious and points to real age-related differences in orienting to faces. Moreover, of the 40 significant correlations, 28 were negative and 12 were positive, providing further evidence against the notion of a global increase in face looking with age.

Figure 2.4: (A) Observed distribution and (B) randomized null distribution of correlations between age and face looking for each window aggregated across videos. Vertical black lines mark the 95% range (2.5% to 97.5%) of expected correlations in the null distribution. (x-axis: r-value; y-axis: frequency.)

Age-related differences in gaze saliency from moment to moment

Similar to the patterns observed in face looking, for most stimulus videos, the relationship between age and gaze saliency varied significantly from window to window. Figure 2.5 shows the changing relation between age and gaze saliency across windows in each of the videos. Again, for illustrative purposes, separate lines are plotted to show gaze saliency means for infants (6-24 months), children (2-10 years), and adults; however, logarithmic age was analyzed as a continuous variable in GEE models. As with face looking, variation in the relation between age and gaze saliency over windows provides evidence of age-related changes from moment to moment.

Figure 2.5: Windowed analyses of age differences in gaze saliency by video stimulus. Age was analyzed as a continuous variable, but for illustration purposes age was averaged into three groups (infants: 6-24 months; children: 2-10 years; adults: 18-22 years). Colored vertical bars represent strength and direction of correlation between age and gaze saliency for every window. Darker colors indicate stronger correlations.

Similar to face looking, seven GEE models were calculated to test for effects of age, window, and age×window interactions on gaze saliency. As shown in Table 2.3, significant age×window interactions were found for all seven videos, indicating varying age-related differences in gaze saliency from moment to moment. In addition to the significant interactions, there were significant main effects of window for 6/7 videos, indicating mean level differences in gaze saliency over time irrespective of age. Finally, as seen in the previous analyses, Videos 2 and 5 showed main effects of log age.

Table 2.3: Generalized Estimating Equation Wald’s χ² for effects of window, age, and age×window for each video stimulus.

        Face looking                      Gaze saliency
Video   Window    Age      Window×Age     Window    Age      Window×Age
1       57.88*    1.20     56.47*         45.20*    .001     44.31*
2       310.42*   .20      179.47*        116.96*   7.88*    75.16*
3       35.48*    .22      36.68*         29.21     .00      41.74*
4       121.95*   21.82*   103.71*        75.95*    .15      77.49*
5       38.39*    .12      39.71*         116.04*   11.53*   120.89*
6       188.12*   .38      113.27*        123.38*   1.79     115.81*
7       122.98*   1.71     142.09*        188.89*   3.72     114.90*
*p < .05

We explored the age×window interactions by examining the distribution of all correlations between age and gaze saliency (Figure 2.6A). As with face looking, the distribution of correlations clustered around r = 0, indicating that participants across ages more often prioritized salient locations in a similar way. We created a null distribution of correlations based on 1000 iterations of reshuffling age labels and eye movement data. Figure 2.6B depicts this distribution, with vertical lines delineating the middle 95% of the data. The original observed data were compared to the 95% range in the null distribution to determine whether the number of significant windows could be due to chance.
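The shuffling procedure used to build the null distributions for both face looking and gaze saliency can be sketched as follows. This is a simplified illustration under stated assumptions, not the study's code or the 'multicon' implementation: function names are hypothetical, every participant is assumed to have a score in every window (no missing data), and the 95% range uses a simple nearest-rank rule.

```python
import random

def pearson_r(x, y):
    """Pearson correlation; returns 0.0 if either variable has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5

def null_correlations(log_ages, window_scores, n_iter=1000, seed=0):
    """window_scores[p][w] is participant p's score in window w.
    Age labels are shuffled across participants once per iteration, so each
    participant's windows stay together; r is then recomputed per window."""
    rng = random.Random(seed)
    n_windows = len(window_scores[0])
    null = []
    for _ in range(n_iter):
        shuffled = list(log_ages)
        rng.shuffle(shuffled)
        for w in range(n_windows):
            null.append(pearson_r(shuffled, [row[w] for row in window_scores]))
    return null

def middle_range(values, lower=0.025, upper=0.975):
    """Approximate bounds of the middle 95% (nearest-rank, an assumption)."""
    s = sorted(values)
    return s[int(lower * (len(s) - 1))], s[int(upper * (len(s) - 1))]

def proportion_outside(observed, lo, hi):
    """Share of observed correlations more extreme than the null range."""
    return sum(1 for r in observed if r < lo or r > hi) / len(observed)

# Hypothetical data: 4 participants, 3 windows each.
ages = [0.1, 0.5, 1.2, 2.9]
scores = [[0.2, 0.5, 0.9], [0.3, 0.4, 0.8], [0.6, 0.5, 0.1], [0.7, 0.3, 0.2]]
null = null_correlations(ages, scores, n_iter=50, seed=1)
lo, hi = middle_range(null)
```

Shuffling whole participants rather than individual windows keeps the dependence between a participant's overlapping windows intact, which matches the rationale given in the text for preserving temporal ordering.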
As seen in Figure 2.6A, 22.07% of the observed correlations exceeded the 95% range of the null distribution. Using the 'multicon' package in R (Sherman & Serfass, 2015), we conducted a randomization test that confirmed a significant difference between the average expected number of significant correlations due to chance (M = 7.4, SE = 4.09) and the 34 statistically significant correlations observed in the study, p < .001. Unlike the age-face looking correlations, significant age-saliency correlations tended to be positive (30/34) rather than negative (4/34).

Figure 2.6: (A) Observed distribution and (B) randomized null distribution of correlations between age and gaze saliency for each window aggregated across videos. Vertical black lines mark the 95% range (2.5% to 97.5%) of correlations in the null distribution. (x-axis: r-value; y-axis: frequency.)

2.6 Discussion

The current study measured the eye movements of infants, children, and adults across and within seven videos to test the global shift hypothesis. No global shift in attention was discerned at any level of analysis. We found no consistent age-related changes in looking to faces or to visually salient locations across videos. Of the seven videos, only two videos showed modest age-related increases in gaze saliency and only one video showed an age-related decrease in face looking. No video showed an age-related increase in face looking. However, there were moment-to-moment age differences in looking at both faces and salient locations within videos. These findings suggest that the global shift hypothesis does not appropriately capture the nuances of developmental change in visual attention.

Age differences in looking at both types of visual features only emerged at shorter time scales.
Sliding window analyses revealed that the relation between age and each visual feature was in constant flux: for some time windows, age was correlated with face looking and gaze saliency, but for other windows participants of all ages attended to features in a similar way. When age did predict differences in face looking, we found both positive and negative correlations, suggesting that age differences were not global but rather depended on different prioritization of faces according to age. Sometimes adults looked more often at faces, but other times infants looked more often at faces.

2.6.1 Lack of global changes in attention

The lack of overall age-related changes in visual attention to salient areas and faces differs from many prior studies that found such effects: increases in face looking (Franchak et al., 2016; Frank et al., 2009; Kwon et al., 2016; Amso et al., 2014), decreases in looking to salient areas (Helo et al., 2014; Kwon et al., 2016; Açik et al., 2010), or increases in looking to salient areas (Rider et al., 2018; Franchak et al., 2016; Frank et al., 2009). Many of these studies focused on developmental changes that occur during infancy, which could explain a discrepancy in findings. However, when restricting analyses to participants younger than 18 months, we still did not find global age-related change for either feature. There are several other differences between the current study and past work that may explain conflicting findings.

One potential explanation is the duration of the selected stimuli. Each video clip used in the current study was 2 min, but most past studies used either static images or video stimuli that were shorter in duration: one 60-sec video (Franchak et al., 2016); twelve 20-sec videos (Frank et al., 2012); twelve 4-sec videos from one television program (Stoesz & Jakobson, 2014); 24 4-sec clips from a single video (Frank et al., 2009).
Averaging looking behavior over short stimuli likely misses the heterogeneity present in longer videos that would yield evidence of moment-to-moment changes. Indeed, our windowed analyses indicate that there is over a 20% chance of randomly picking a 10-s window from our stimuli that would show an age-related change in face looking. Thus, studies that use only one stimulus or a few stimuli with short durations may be at risk for selection effects that could lead to incorrect generalizations.

Using images and short videos may also capture unique age-related differences in early scene inspection that are not characteristic of visual attention more broadly. Within the first few seconds of examining a new scene (e.g., following a cut), adults move from frequent, quick fixations to longer fixations associated with inspecting objects while infants persist slightly longer with rapid fixations (Helo, Rämä, Pannasch, & Meary, 2016). Other studies have found a bias in adults, but not young infants, towards looking at the center of the scene immediately following a cut or at the onset of a stimulus (Mital et al., 2011; Kirkorian et al., 2012; Wang et al., 2012). This has been attributed to an adult viewing strategy that expects screen-based media to center relevant information in the image frame (P.-H. Tseng, Carmi, Cameron, Munoz, & Itti, 2009). Therefore, studies that use short stimuli or stimuli with frequent scene cuts may see biases that result from age differences in early scene viewing.

However, other studies used longer stimuli, such as three 5-minute videos (Rider et al., 2018) and two 2-minute videos (Frank et al., 2014), and did find consistent age-related changes in the influences of saliency and/or faces on eye movements.
Because neither study systematically tested for changes in attention to saliency/faces on shorter timescales within each video, it is unclear whether the age differences at the video level are due to consistent effects over time or to local effects confined to particular times. Indeed, in the few videos that showed overall age differences in the current study, it was clear on closer inspection that those overall effects were in fact driven by differences in how adults or infants selected faces or salient locations for a few time windows as opposed to consistent effects across the entire video.

Could the lack of a global age-related difference in gaze saliency be the result of the saliency algorithm being tuned to adults? We would argue that the opposite is true. If the saliency algorithm were a better measure across the board for adults compared to younger participants, we would expect to see higher gaze saliency values in adults for every video. Instead, we found that for many videos there was no substantive difference in saliency across ages. This suggests that even though the saliency model is trained on adult data, it performed similarly when applied to infant and child data for the majority of the time.

The lack of global age-related changes in the current study may be a consequence of the particular video content we selected. Wass and Smith (2015) found that television programs designed for toddlers more often contain a speaking character whose face is salient compared with programs designed for adults. It is possible that past studies used more infant- or child-focused video content, which could bias looking towards both faces and salient areas. In the current study, we selected the seven videos to provide diverse content that would be engaging to participants across the ages we tested, which included media designed both for children and for adults.
Yet, delineations in our stimuli between videos designed for adults or children provide no insight into why particular videos showed age-related changes. For instance, age-related changes in saliency were observed for both child-directed (Video 2) and adult-directed (Video 5) stimuli. Diversity in video content provides the opportunity to investigate how other properties may explain the results we present. However, the challenge with this type of post hoc approach is identifying which of the countless properties that vary between videos or scenes can explain the findings. Possibly, diversity in the content we chose accounts for why there were no consistent age-related changes across videos. Past studies might have found more consistent effects because stimuli were homogeneous in content.

Finally, the use of dynamic versus static stimuli in the current study versus past investigations may account for differences in face looking. Many (but not all) of the studies that found consistent age-related trends in face looking used static images (Helo et al., 2014; Kwon et al., 2016; Açik et al., 2010; Amso et al., 2014), whereas studies that found inconsistent effects of face looking (Frank et al., 2009; Franchak et al., 2016) used dynamic videos. Recent work demonstrated that face-looking preferences are greater in static as opposed to dynamic stimuli (Libertus, Landa, & Haworth, 2017; Stoesz & Jakobson, 2014). Faces may be the most relevant place to look in a static image, but in videos that display complex actions involving hands and objects, faces may less often be the most important location. Beyond screens, real-life visual attention is not only used for passively viewing events but for actively controlling movements. Accordingly, infants infrequently look at caregivers’ faces and spend more time looking at objects (Yu & Smith, 2013; Franchak, Kretch, & Adolph, 2018), presumably to support object-related manual actions.
Thus, previously measured global changes in looking to faces may be a byproduct of using less ecologically relevant stimuli, such as static images, that do not convey as much information about action.

2.6.2 Development of visual attention involves changes in prioritizing features

How, then, does visual attention to faces and salient features develop? We argue that children become better able to prioritize which features to attend to (whether faces or salient locations) depending on the particular content in a scene. Prior work has shown that adult observers prioritize which visual features to attend to based on their importance within a scene or their relevance to a task (Franchak et al., 2016; Henderson & Hayes, 2018; Henderson, 2017; Ballard & Hayhoe, 2009; Rothkopf, Ballard, & Hayhoe, 2007; T. J. Smith & Mital, 2013). The current study provides evidence that infants and children often, but not always, prioritize visual features in a similar way as adults. At the overall video level, age differences in gaze saliency and face looking were marginal. At the window level, most time periods within videos showed no age differences. Since gaze saliency and face looking changed greatly from moment to moment, this suggests that even observers as young as 6 months are responding to changes in feature relevance in a similar way as older children and adults. The most striking example is the change in face prioritization over windows 2-10 of Video 2; Figure 2.3 shows that face looking jumps from 13% to 72% and then back to 13% in a short time for participants of every age. Related work found that increasing homogeneity in infants’ eye movement patterns within age groups could be explained by increasing similarity to adults’ eye movement patterns, suggesting a quantitative rather than qualitative change in how visual features attracted attention over development (Franchak et al., 2016).
Similarities between infants’ and adults’ prioritization are also consistent with prior work showing that many other aspects of visual attention are mature by 6 months of age (Oakes & Amso, 2018), with some visual processing abilities reaching adult-like levels: scanning and fixations to simple shapes (Bronson, 1994), configural face processing (Cashon & Cohen, 2004), and perception and discrimination of object features (Colombo, Mitchell, Coldren, & Atwater, 1990).

Despite similarities in how infants and adults prioritized faces and salient locations, age differences in attention to each feature could be detected for some time windows. Indeed, all seven videos showed age×window interactions for both visual features. Developmental differences in prioritization are consistent with past work showing that age moderates the degree to which infants’ face looking and gaze saliency varied across different types of scenes (Frank et al., 2012; Franchak et al., 2016). The current study extends these findings by showing that changes in prioritization are evident from infancy through childhood. Moreover, these differences emerged when scenes were defined in an objective way (evenly spaced time windows that are agnostic to video content) rather than an ad hoc way (defining scenes based on particular content features). Furthermore, the current study is unique in showing that infants’ prioritization differs from adults’ both in looking less often and more often at visual features depending on the time window. Thus, the developmental difference in prioritization cannot be explained by a global deficit in selecting (or inhibiting) a particular feature type.

What might account for developmental differences in prioritization? First, temporal and spatial changes in attention may account for age differences in prioritizing features. Although some aspects of attention are nearly adult-like in the youngest participants we tested, other aspects are not.
Infants’ temporal processing, or the rate at which infants are able to isolate individual changes in a stimulus, is much coarser than adults’ (Farzin, Rivera, & Whitney, 2011). In a dynamically changing scene, infants may be slower to change their prioritization of visual features to reflect what is important from moment to moment. Additionally, the development of endogenous attention (that is, the ability to exert voluntary control to select and inhibit where to attend) shows protracted improvements throughout infancy and early childhood (Colombo, 2001; Oakes & Amso, 2018). For example, children’s ability to sustain attention to a particular target while inhibiting distraction from other targets improves from 2.5 to 4.5 years (Ruff, Capozzoli, & Weissberg, 1998). Note that these same changes in attention motivate the global shift hypothesis: increasing endogenous control allows infants to inhibit looking to irrelevant, salient areas while actively selecting faces. However, the current results suggest something more subtle: Increasing endogenous control allows infants to better prioritize information by inhibiting competition from faces and/or salient regions while sustaining attention towards locations they deem informative, whatever those might be.

The second possibility is that differences in prioritization reflect developmental changes in how infants and children comprehend scene content and determine which locations are most informative. Deficits in infants’ understanding of media are especially notable, as children under 24 months fail to even notice when scenes in a video narrative are presented in a scrambled order (Pempek et al., 2010). Such deficits in scene comprehension are likely a key factor that accounts for differences in how infants and children distribute eye movements while watching videos (Franchak et al., 2016; Helo et al., 2017; Kirkorian et al., 2012; Kirkorian & Anderson, 2018).
It is important to note that in the current study we analyzed overall rates of face looking irrespective of which face observers fixated. Many scenes had multiple faces in view, so it is possible for observers of different ages to have similar face-looking rates while attending to different targets. Moreover, face-looking rates could be similar for two observers who looked at the same face for the same duration but at different times (even in the short, 10-s windows). Thus, it would be incorrect to interpret similar face-looking rates (and gaze saliency scores) between observers or between age groups to indicate similar comprehension of the scene. A more nuanced analysis of synchrony in looking at specific faces at specific times might bear on this issue; however, this was beyond the scope of the current investigation.

Finally, attention and comprehension likely interact in several ways which would lead to age-related differences in viewing behavior. First, prior research shows that children’s gross attention to media depends on their understanding (Anderson et al., 1981; Lorch & Castle, 1997): Children are more prone to distraction and visually attend less while watching content that is beyond their comprehension. Although we excluded participants who had large missing sections of gaze data, it is still possible that lower engagement in younger participants who did not understand what they were watching could have impacted their overall attention. Looking away from the video would prevent observers from monitoring key visual targets in the scene and disrupt following the narrative. Second, prior work shows age-related differences in how salient visual features interact with understanding of scene content in determining where observers look.
For example, when viewing static images altered to include inconsistent objects (e.g., a bar of soap on a kitchen table), adults spend long periods fixating inconsistent objects regardless of their saliency but 24-month-olds only do so when those objects are visually salient (Helo et al., 2017).

2.6.3 Implications for attention development and media viewing

In sum, the current study demonstrates that the developmental changes in eye movements while watching complex, dynamic stimuli reflect age differences in how observers prioritize different features as opposed to a global age-related shift in the selection of specific features. The results from this study add to a growing literature showing that singular feature-based approaches are insufficient to capture the complexity in gaze allocation (Tatler et al., 2011; Sailer, Flanagan, & Johansson, 2005; Henderson & Hayes, 2018; Land & McLeod, 2000; Pereira, Birmingham, & Ristic, 2019). What is meaningful in a scene changes dynamically and may not predictably map onto distinct visual features, making it challenging to determine why observers prioritize locations in a particular way. More work is needed to map out the degree to which changes in attention and/or comprehension account for developmental changes in prioritizing where to look. The current study makes an informative methodological contribution in showing that variability is the rule, not the exception. Improving our understanding of how visual exploration changes with development will depend on studying a wider array of complex stimuli (and real-world situations) and analyzing gaze behavior across different timescales.

Furthermore, as the first study to compare eye movements across a large sample, wide age range, and large, diverse set of video stimuli, our results have broad implications for understanding infant and child viewing of screen-based media.
Since media viewing is a common and frequent childhood occurrence, it is important to understand how changes in visual attention might contribute to children’s understanding of screen-based media. One implication is that the challenge children face in learning the ‘right’ features is more complex than previously thought: there is no ‘one size fits all’ solution because the relevance of different features is in constant flux. Still, our work raises potential avenues for designing media to improve comprehension. First, designers of children’s media could restrict how often particular features change in relevance over time to improve children’s comprehension. Second, children should benefit from scenes in which different types of features converge rather than compete (Wass & Smith, 2015; Amso et al., 2014) to reduce the pressure on prioritization. Future work should seek to test children’s learning from video clips that systematically vary the need to change prioritization of visual features over time to track key educational content.

2.7 Acknowledgements

I acknowledge that Chapter 2 of this dissertation was published in Developmental Psychology in November of 2020.

Chapter 3

Attention to hands during manual actions account for developmental increases in attentional synchrony

3.1 Abstract

Consistency of attention, both spatially and temporally, increases with age from infancy into childhood when viewing dynamic stimuli (Franchak et al., 2016; Kirkorian et al., 2012). We propose that developing increasingly synchronous visual attention involves improvements in prioritizing meaningful information at each moment. The current study is a secondary analysis of Kadooka and Franchak (2020) in which eye movements of 79 children (6 months to 10 years) and 20 adults were recorded as they viewed five dynamic videos.
We tested whether infants’ and children’s synchronous attention to two meaningful locations, hands and hand-object actions, could account for age-related increases in adult-like attention. Improvements in looking to these hand features may indicate a convergence towards adult-like prioritization of meaning. Findings show that the degree to which infants and children look to hands and hand-object actions can account for increases in similarities to adult attention, beyond age alone. In considering the spatiotemporal variability of meaning in a scene, we suggest that attention development involves changes in looking to semantically relevant information, broadly.

3.2 Attention to hands during manual actions account for developmental increases in attentional synchrony

The visual attention system involves a complex process of prioritizing different information from moment to moment. When passively watching a scene unfold in the real world, we are constantly moving our eyes to direct our attention, roughly three times per second (Schiller, 1998). Looking out a window, an observer may allocate their attention towards a busy walkway, leaves fluttering in the wind, the face of a familiar acquaintance, the iridescent flash of a bluebird, or any number of nearly limitless possibilities. Yet, with all these potential targets of attention, adults tend to show high eye movement synchrony in where they look when viewing dynamic scenes (Dorr et al., 2010; Franchak et al., 2016; Hart et al., 2009; Mital et al., 2011; Shepherd et al., 2010; Wang et al., 2012). This level of synchrony in attention is not present in infants. But over the first few years of life, infants increasingly develop greater synchrony with adults (Franchak et al., 2016; Kirkorian et al., 2012). How do infants and children achieve more adult-like gaze behaviors?
As we will discuss in subsequent sections, there are inconsistent findings about whether attention to certain types of visual features contributes to the development of adult-like gaze. In the current study, we test whether two understudied features, hands and hand-object actions, can account for age-related increases towards greater adult synchrony. Hands are a rich source of social, communicative, and semantic information. Furthermore, prior studies indicate that infants and children increasingly extract meaning from hands: social communication and word learning via gestures and pointing (Tomasello et al., 2007; de Villiers Rader & Zukow-Goldring, 2010), joint attention by following hands (Yu & Smith, 2013; Deak et al., 2014), the affordances of tools from grip positions (Barrett, Davis, & Needham, 2007), and the goals and intentions of actions (Woodward, 2009, 1998). It is unknown whether these abilities to understand the meaningfulness of hands translate to greater attentional synchrony to these features when watching dynamic scenes. If visual attention development involves better prioritization of meaningful information, then increased synchrony to these meaningful features could partially account for the age-related changes towards greater adult synchrony.

3.2.1 Developmental changes in synchronization of attention

Adults exhibit highly correlated gaze behaviors in looking to similar places at similar times (Dorr et al., 2010; Franchak et al., 2016; Hart et al., 2009; Mital et al., 2011; Shepherd et al., 2010; Wang et al., 2012). By recording the eye movements of adults when watching the same stimulus, researchers can calculate similarity in gaze location compared to other observers across the duration of the stimulus, which is known as the inter-subject correlation or ISC. Higher ISCs indicate greater spatiotemporal similarity in gaze. Adults, when compared to other adults, show high ISCs across a wide range of stimuli.
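To make the idea of an inter-subject correlation concrete, the following is a minimal sketch, not the procedure used in the cited studies. It assumes gaze is sampled once per frame as (x, y) coordinates, and it scores one observer by averaging the Pearson correlations of their horizontal and vertical gaze positions with the mean gaze path of a comparison group. The function names `pearson_r` and `isc_with_group` are hypothetical.

```python
import math


def pearson_r(a, b):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def isc_with_group(observer, group):
    """Illustrative ISC between one observer and a comparison group.

    observer: list of (x, y) gaze samples, one per frame.
    group:    list of observers, each a list of (x, y) samples of the
              same length as `observer`.
    Returns the mean of the x and y correlations with the group-average
    gaze path (an assumed definition, chosen for simplicity).
    """
    n_frames = len(observer)
    r_values = []
    for dim in (0, 1):  # 0 = horizontal, 1 = vertical
        group_mean = [
            sum(member[t][dim] for member in group) / len(group)
            for t in range(n_frames)
        ]
        own = [observer[t][dim] for t in range(n_frames)]
        r_values.append(pearson_r(own, group_mean))
    return sum(r_values) / len(r_values)
```

Under this sketch, an observer whose gaze path follows the group average receives a value near 1, while an observer who looks elsewhere at the same moments receives a lower value. Comparing an infant against an adult comparison group would then give a single number describing how adult-like the infant's gaze was.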
In commercially produced films, adult attention is similar likely because directors and editors have designed their films with the explicit goal of guiding attention to specific information (Dorr et al., 2010; Gannon & Grubb, 2022; Goldstein et al., 2007). But even in more naturalistic live-action stimuli (Shepherd et al., 2010; Franchak et al., 2016; Rider et al., 2018), animated videos (Rider et al., 2018), and virtual reality experiences (Farmer et al., 2021), adults show synchronous eye movements.

Past research reveals age-related increases towards more consistent eye movements when watching dynamic videos (Franchak et al., 2016; Frank et al., 2009; Kirkorian et al., 2012). One way to examine this is to measure developmental differences in the spatial distribution of gaze. Kirkorian and colleagues (2012) found that several metrics of spatial variability decrease with age (i.e., gaze became more similar) when comparing the gaze of 1-year-olds, 4-year-olds, and adults while watching a video from Sesame Street, a live-action children’s program. In a different study, Frank, Vul, and Johnson (2009) found a similar trend for 3-month-olds, 6-month-olds, 9-month-olds, and adults when watching Charlie Brown, an animated television series. Eye movements were defined by less spatial variability as age increased. However, less spatial variability does not necessarily mean greater synchrony with adults, as it is possible that children may increasingly look to a feature that adults do not look at. Franchak et al. (2016) directly addressed this by comparing ISCs between infants and adults as they watched a Sesame Street music video. By calculating the ISCs between an infant observer and a comparison group of adults, these correlations measured how adult-like their eye movements were. Synchrony with adults increased with age for infants ranging from 6 to 24 months old.
Age significantly predicts increases in adult synchrony (Franchak et al., 2016), but age alone holds limited explanatory value in determining what is changing. Age-related changes to several features have been proposed to partially account for developmental changes in attention. In studies with dynamic stimuli, the influence of visually salient locations (e.g., color, motion, contrast) increased with age (Franchak et al., 2016; Frank et al., 2009; Rider et al., 2018). Other studies have identified age-related changes in attention to socially relevant features and faces (Franchak et al., 2016; Frank et al., 2009, 2012, 2014; Stoesz & Jakobson, 2014). However, the content of the scene cannot be ignored. Scene factors like the number of agents (Franchak et al., 2016; Frank et al., 2012; Stoesz & Jakobson, 2014), centering of faces (Franchak et al., 2018), and dialogue (Frank et al., 2014) all influence the relationship between attention to faces and age. This suggests that scene content influences what is most meaningful. Neglecting to account for this shifting relevance leads to an incomplete characterization of how attention is developing.

Therefore, we posit that synchrony with adults is not a global developmental shift in attention to features, but rather a change in prioritization of features depending on their relevance. For instance, an attentional bias for looking at objects was observed for 4- to 6-year-old children, yet adults only looked to objects if they were relevant for the task (Darby, Deng, Walther, & Sloutsky, 2021; Spelke, 1990). Furthermore, adult gaze behavior when viewing a static image is reliably predicted by judgements of semantic meaning (Henderson et al., 2007; Henderson, 2017; Henderson & Hayes, 2018). For dynamic scenes, the meaningfulness of any given feature may change from moment to moment.
Indeed, examining smaller 10-second windows of time within video stimuli revealed that the direction of age-related differences in attention to both salient regions and faces was in constant flux (Kadooka & Franchak, 2020). In any given stimulus, sometimes adults may look more towards a feature, yet at other times, infants look more towards that feature. Developing adult synchrony occurs when infants and children become more adult-like in their concept of what features are relevant and when they are relevant. Importantly, age-related changes in attention to particular features are still helpful in accounting for developing adult synchrony; however, particular consideration must be given to measuring attention when those features are meaningful.

3.2.2 Age-related changes in attention to hands and hand-object actions

In the current study we selected two features that we predicted would account for age-related changes in synchrony with adults: hands and hand-object actions. Hand-object actions are times when hands actively interact with objects in moving, manipulating, or other goal-directed actions. These features were chosen for three reasons: a) hands and hand-object actions convey meaning, b) infants and children improve in their ability to detect meaning from these features, and c) hands and hand-object actions are spatiotemporally identifiable. These characteristics make hands and hand-object actions good candidates for investigating adult synchrony of attention. If attentional synchrony involves becoming more adult-like in prioritizing meaning, then changes in ability to extract meaning from hands should lead to greater attentional synchrony with adults.

Hands are meaningful as a primary way that our bodies act on the world. The wide range of prehensile dexterity in the human hands is a defining feature of our species that sets us apart from other primates (Napier, 1956). Hands serve social, communicative, and semantic functions.
When hands and objects interact, additional information is communicated about the objects (P. Tseng et al., 2012), affordances (Klatzky et al., 1989), and goals and intentions (Zacks & Tversky, 2001). An adult neuroimaging study has identified a hands-specific region in the extrastriate body area of the visual cortex that is sensitive to hands but not other non-hand body parts (Bracci, Ietswaart, Peelen, & Cavina-Pratesi, 2010), highlighting the importance of hands.

What evidence is there for age-related changes in attention to hands? A head camera study from Fausey et al. (2016) found that during everyday activities of infants between 1 and 24 months old, the distribution of hands in view shifts towards an increasing amount of hands, especially the hands of other people. One possibility is that this shift reflects normative motor development that occurs in the first two years of life, like reaching and independent sitting, which may structure daily experiences towards seeing more hands. It is also possible that this may be driven by an infant’s own active selection of hand stimuli. It is likely that both the ability to control attention and exposure to hand stimuli help to regulate attention to hands. Frank et al. (2012) found increases in looking to hands in dynamic stimuli for 3- to 30-month-olds, especially when the stimulus contained manual actions with objects, indicating an active selection for certain hand stimuli.

Hands may provide information and structure to visual scenes by guiding infants’ attention to locations that are meaningful, especially within a social context. For instance, pointing and gesturing facilitate early word learning by offering a synchronous cue that connects language to objects in the world (de Villiers Rader & Zukow-Goldring, 2010; Tomasello et al., 2007).
Evidence also suggests that looking to hands is important in the development of joint attention between caregivers and 1-year-olds, particularly during goal-directed actions with objects (Yu & Smith, 2013). When engaged in goal-directed actions, visual attention of the actor is tightly coupled with the action being performed. Therefore, when watching caregivers complete actions, infants can coordinate their attention spatially and temporally with the attention of their caregiver by looking to the hands. In fact, the gaze of the actor, hands, and objects may provide redundant spatiotemporal information about attention that allows scaffolding for more sophisticated social abilities in joint attention and gaze following (Shepherd, 2010). During the first two years of life, infants develop the ability to extract information about tool affordances based on hand grip positions (Barrett et al., 2007) and are sensitive to goals and intentions based on their own actions and the observation of others’ actions (Woodward, 1998, 2009).

It is clear that attention to hands increases with age and children are sensitive to the wide range of meaningful information that is conveyed by hands. However, it is not clear if these changes in the ability to extract information from hands lead to prioritizing hands as a meaningful region both spatially and temporally. Synchronizing attention to hands or hand-object actions may provide a developmental scaffold for prioritizing meaningful information in ways that are more adult-like. Certainly, hands and hand-object actions are not the only features that influence adult synchrony. But the spatial discernibility of physical hands and hand-object actions over time makes these features easily definable compared to other spatiotemporally sensitive features that convey meaning like goal intentions (e.g., the location that an agent plans to place an object), distal referents (e.g., the referent in a point or gesture), or locations of an agent’s attention.
For these reasons, we investigated whether synchrony in looking to hands (hand synchrony) and hand-object actions (hand-object synchrony) can account for age-related increases in adult synchrony beyond age.

3.2.3 Current Study

The primary aim was to test whether synchronous attention to hands and hand-object actions can predict changes in attentional synchrony with adults. We performed secondary analysis of data from Kadooka and Franchak (2020) in which eye movements were recorded during children’s (6 months to 10 years) and adults’ free viewing of complex, dynamic video stimuli. Past work characterizing visual attention highlights the need to capture the moment-to-moment prioritization of what is meaningful. Averaging attention to a feature across an entire stimulus may artificially suppress effects if those features were not meaningful over the entire duration of the stimulus. To address this, we selected two features (hands and hand-object actions) and determined the times that each feature occurred on screen. Past studies have also recognized that different scene content may lead to different attention patterns (Franchak et al., 2016; Frank et al., 2009). Explanations of visual attention development that are specific to the stimulus are limited in utility. Therefore, our use of the previously collected dataset takes advantage of the long duration (two-minute videos) and variability (five different live-action, child-friendly videos) of stimuli to ensure that detected effects are invariant to stimuli.

First, we replicate past work on the age-related changes in adult synchrony. As a robust trend (Frank et al., 2009; Kirkorian et al., 2012) and direct methodological replication (Franchak et al., 2016), we expect to find attentional synchrony with adults will increase with age. In other words, infants and children will increasingly look to the same locations at the same time as adults.
Past work in attentional development has found logarithmic patterns associated with age in which rapid changes in infancy are followed by slower changes into late childhood and beyond (Amso et al., 2014; Kadooka & Franchak, 2020). Therefore, a logarithmic model of age will be tested. If our expectations are met, adult synchrony will be used as a baseline to determine the unique contribution of other age-related factors above and beyond age.

Second, we will assess whether attention to hands can predict adult synchrony. Although past work shows hands attract attention as a meaningful and socially relevant location for communication, gestures, joint attention, and manual actions (Bertenthal & Boyer, 2012; Frank et al., 2012; Fausey et al., 2016; de Villiers Rader & Zukow-Goldring, 2010; Yu & Smith, 2013), we calculate a novel measure of attention to hands, hand synchrony, that describes the degree to which infants and children follow hands in the scene. We predict that hand synchrony will predict adult synchrony above and beyond age alone. In other words, attention to hands will increase with age, regardless of the stimulus, because hands are a meaningful feature of scenes.

Lastly, we will test whether attention to hands interacting with objects can predict adult synchrony. Hand-object synchrony will provide a measure of attention that is sensitive to the information cued when agents engage in manual actions with objects. Past research has identified that infants and children become increasingly capable of extracting information from hand-object actions like joint attention, affordances, and goals (Barrett et al., 2007; Shepherd, 2010; Woodward, 1998; Yu & Smith, 2013). However, it is unexplored whether the ability to extract information means more attention to hand-object actions in dynamic scenes. We predict that changes in hand-object synchrony will account for changes in adult synchrony above and beyond age alone.
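The logic of testing a predictor "above and beyond age" can be written schematically as a pair of nested models. This is only a sketch under the assumption of a simple linear form. The symbols are introduced here for illustration ($S_i$ for the adult synchrony of participant $i$, $H_i$ for their hand or hand-object synchrony), and the models actually fitted may include further terms, such as effects of video.

```latex
\begin{align}
\text{Baseline:}\quad S_i &= \beta_0 + \beta_1 \ln(\text{age}_i) + \varepsilon_i \\
\text{Extended:}\quad S_i &= \beta_0 + \beta_1 \ln(\text{age}_i) + \beta_2 H_i + \varepsilon_i
\end{align}
```

In this notation, the logarithmic term captures rapid change in infancy followed by slower change later, and the prediction that hand synchrony accounts for adult synchrony beyond age corresponds to $\beta_2 \neq 0$, that is, to the extended model fitting better than the baseline.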
3.3 Methods

3.3.1 Participants

The current study involves secondary analysis of prior research (Kadooka & Franchak, 2020) on the developmental changes in visual attention of children (6 months to 11 years) and adults when watching videos. The original sample consisted of 79 children and 10 college-aged adults. Data from an additional 10 adults were collected as part of the current study to serve as a comparison group. In total, 79 children and 20 adults were used as participants for this study. For detailed information about the distribution of participant ages and exclusion criteria, see the original paper (Kadooka & Franchak, 2020).

Participating families were recruited from the Riverside County area and received a book or small toy and $10. Adult participants were undergraduates from the University of California, Riverside who received credit towards completing a course requirement. Participating children were identified by their caregiver as Black/African American (n = 1), American Indian/Alaskan Native (n = 4), non-Hispanic White (n = 23), Hispanic or Latino(a)/White (n = 30), and more than one race (n = 21). Adult participants identified as Black/African American (n = 2), Asian (n = 8), non-Hispanic White (n = 1), Hispanic or Latino(a)/White (n = 8), and more than one race (n = 1). All participants had normal or corrected-to-normal vision. Written informed consent was provided by adult participants and parents or legal guardians of child participants. Approval for the study was given by the Institutional Review Board of the University of California, Riverside.

3.3.2 Stimuli

Eye tracking data from five of the seven original video stimuli were selected for secondary analysis. Each of the seven videos was chosen to present 2 minutes of diverse, child-friendly scene content with no cuts and limited graphical animations.
One video was excluded because it contained too few instances of hands in the scene, and the other was excluded because it contained over 30 hands from 18 agents which were too widely distributed on screen to differentiate attention to hands vs non-hand areas. The five remaining videos included two Sesame Street videos, two music videos, and a children’s science demonstration. Figure 3.1 provides an exemplar screenshot and brief description of the video stimuli. The original audio tracks were replaced with children’s instrumental music to isolate the role of visual information. Videos of the stimuli with overlaid gaze from participants are available on Databrary (https://nyu.databrary.org/volume/1007). Participants in the final sample provided data for a minimum of one video but could provide data for up to five videos. The average number of videos with sufficient data per participant was M = 4.65 (SD = 0.82).

3.3.3 Apparatus

Participant eye movements (right eye only) were recorded using an Eyelink 1000 Plus remote eye tracker (SR Research Ltd.) at a temporal resolution of 500 Hz. Videos were presented at 30 Hz on a 43.2 cm (diagonal) monitor which subtended a visual angle of 31° x 19°. The monitor and eye tracker were mounted on an adjustable arm to facilitate different viewing heights and positions.

Figure 3.1: Exemplar screenshots and brief description of the five stimuli videos
Video 1: One human actor and four Muppets sing and dance to a song about counting to four
Video 2: Four human actors perform a choreographed routine with trained dogs
Video 3: Two human actors participate in a science demonstration about the properties of frozen carbon dioxide
Video 4: Four human actors perform acrobatic stunts with objects in a reduced gravity aircraft
Video 5: Five human actors take turns singing about counting to five

3.3.4 Procedure

Participants sat in a room separated from the experimenter.
Infants sat in a highchair and were secured with a harness to reduce body movement. Caregivers sat behind the highchair and were instructed to not interact with the infants. Adults and children sat in a chair facing the monitor. A target sticker was placed on the forehead to facilitate eye position tracking and distance estimation by the eye tracker. The angle and distance of the eye tracker were adjusted so that it was 60 cm from the participant’s eyes. All participants completed a 5-point calibration and subsequent validation procedure to ensure that calibration accuracy was 1.5° of error or less. If necessary, the calibration and validation were repeated until sufficient calibration was met. The video stimuli were then presented in a randomized order. Prior to each video starting, participants were required to fixate a gaze-contingent trigger target located at the center of the screen. This ensured that participant gaze always started in the center of the stimulus.

The quality of infant eye tracking data is often lower than adults’ (Wass et al., 2013), which can impact accuracy (error between the true point of gaze and reported gaze) and precision (variation in reported gaze over successive samples). In our sample, average spatial errors across all ages were less than 1° (M = 0.53°, SD = 0.25) but were negatively correlated with age (r = -.27, p = .006). Yet, this age disparity only resulted in a difference of 0.22° between adults and the youngest infants, which suggests minimal impacts to data quality. Following the method from Wass, Forssman, and Leppänen (2014), average precision across age was calculated (M = 1.65°, SD = 0.33) and was also negatively correlated with age (r = -.20, p = .048). Similar to accuracy, the difference of 0.18° between adults and infants was minimal. However, concerns about fixation-detection algorithms for lower quality data in infants (Wass et al., 2014) led us to avoid fixation detection out of caution.
3.3.5 Data processing

For each participant, raw eye tracking data were extracted as a time series of horizontal and vertical gaze coordinates for the duration of each stimulus. Periods spent with the eye occluded, eye closed, or looking off screen were excluded from analyses.

Adult synchrony

Attentional synchrony is the degree to which there is spatiotemporal consistency in gaze behaviors. Synchrony with adults’ gaze provides a baseline for assessing predictive factors. To calculate attentional synchrony with adults, participant eye movements were compared to a group of ten comparison adults. Using a comparison group ensures that all participants were compared to the same independent group of adults. Following the metric used by Franchak and colleagues (2016), inter-subject correlations (ISCs) were calculated for each participant paired with every adult in the comparison group to determine adult synchrony for each video. For each pair of observers, ISCs were calculated by:
1) calculating the correlation coefficient for the vertical time series, only for samples at which both observers have valid data,
2) calculating the correlation coefficient for the horizontal time series, only for samples during which both observers have valid data, and
3) averaging the horizontal and vertical correlation coefficients.
An individual participant’s adult synchrony is the average ISC between the participant and each of the ten comparison adult observers. Because they are correlations, ISCs are bounded between -1 and +1; adult synchrony scores closer to 1 therefore indicate greater spatiotemporal similarity with adults.

Hand synchrony

To capture hand synchrony, we used dynamic area of interest (AOI) analyses. For each video, elliptical AOIs were drawn around the hands of each agent using Dataviewer Software (SR Research Ltd.). AOIs changed in size and location to accommodate character and camera movement across all frames in which the hands were in view.
Across the five videos, 44 hands belonging to 22 agents were coded in this way. On average, hands were visible 84% of the time. However, this varied from video to video, with visible hands ranging from 62% to 100% of the video. Table 3.1 contains a breakdown of time that hands were visible on screen for each stimulus. It is important to note that while hands were prevalent, the average percent of time that an individual hand was visible in a stimulus was 35% (min = 1%, max = 98%) and was often discontinuous. In Figure 3.2, blue bars show when the hands of two agents were visible during the 2-minute duration of Video 3. While hands were visible for the entire duration, this does not mean that every hand was visible at all times.

Table 3.1: Percentage of time hands and hand-object actions were visible for each video

                              Video 1   Video 2   Video 3   Video 4   Video 5
Hands visible                 68.94%    92.92%    100%      98.07%    61.65%
Hand-object actions visible   37.11%    61.67%    72.81%    42.88%    No hand-object actions

[Figure 3.2 timeline rows, for Agent 1 and for Agent 2: left hand, left hand with object, right hand, right hand with object; horizontal axis: duration of stimulus]
Figure 3.2: Exemplar visualization of times when hands and hand-object actions are visible for two agents in Video 3. Blue horizontal bars indicate when hands are visible for each hand. Green horizontal bars indicate when hand-object actions are visible for each hand.

For each hand AOI, a time series of Cartesian coordinates (X,Y) defined the ellipse of the AOI in units of pixels. To calculate hand synchrony, each participant’s gaze data were then correlated with the center of the nearest hand AOI. The nearest AOI was determined by the shortest Euclidean distance in 2-dimensional space between gaze and the centers of any present AOIs. Thus, this measure could account for simultaneous hand AOIs and capture dynamic changes in attention to different hands from moment to moment.
Similar to calculating synchrony with adults, correlation coefficients in the horizontal and vertical dimension were calculated for all times in which hand AOIs existed and participants had valid gaze on screen. Correlation coefficients in both dimensions were averaged to produce a single measure of hand synchrony for each participant in each video. Higher correlation values indicate greater hand synchrony.

Hand-object synchrony

Hand synchrony when interacting with objects provides a spatiotemporal cue about the manual actions of agents. We coded the times when each hand was interacting with an object. Interacting with an object involved being physically in contact with an item that could be carried or moved. Hands interacting with the agent’s own body/clothing, other agents, immovable furniture, or surfaces like walls and floors were not counted as manual actions with objects. One out of the five videos did not contain manual actions; therefore, analyses involving hand-object synchrony only included four videos. Across the four videos, manual actions occur on average 54% of the time but range from 37% to 73% depending on the video (see Table 3.1). Of the periods that hands were in view, 59% involve manual actions. Figure 3.2 shows, in green, the periods of time when hand-object actions were occurring. The process of calculating synchrony with hand AOIs was repeated but only for portions of the videos in which a participant provided valid gaze and hands were interacting with objects. As before, if there were simultaneous hand AOIs, gaze was correlated to the center of the nearest hand AOI. Correlation coefficients in the horizontal and vertical direction were averaged for each video to produce scores of attentional synchrony to hands when interacting with objects. Hand-object synchrony scores closer to 1 indicate greater attentional consistency towards hand-object actions.
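The three synchrony measures described in Section 3.3.5 share one computation: correlate two position time series separately in the horizontal and vertical dimensions over jointly valid samples, then average the two coefficients. The following is a minimal Python sketch of that logic, not the code used in the study. The function names, the representation of a track as a list of (x, y) tuples with None for invalid samples, and the handling of undefined correlations are all illustrative assumptions. For hand-object synchrony, the sketch assumes the caller passes only the centers of hands that are currently in contact with an object; the text leaves open whether the nearest hand is chosen among all visible hands or only among those.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (None if undefined)."""
    n = len(xs)
    if n < 2:
        return None
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # a constant series has no defined correlation
    return sxy / math.sqrt(sxx * syy)

def xy_correlation(track_a, track_b):
    """Average of the horizontal and vertical correlations between two tracks.

    Each track holds one entry per sample: an (x, y) tuple, or None when the
    sample is invalid. Only samples valid in both tracks are used.
    """
    pairs = [(a, b) for a, b in zip(track_a, track_b)
             if a is not None and b is not None]
    rx = pearson([a[0] for a, _ in pairs], [b[0] for _, b in pairs])
    ry = pearson([a[1] for a, _ in pairs], [b[1] for _, b in pairs])
    if rx is None or ry is None:
        return None
    return (rx + ry) / 2

def adult_synchrony(gaze, adult_gazes):
    """Mean inter-subject correlation with each comparison adult."""
    iscs = [xy_correlation(gaze, adult) for adult in adult_gazes]
    iscs = [r for r in iscs if r is not None]
    return sum(iscs) / len(iscs) if iscs else None

def nearest_hand_track(gaze, hand_centers):
    """Per sample, the center of the hand AOI closest to gaze.

    hand_centers[i] lists the (x, y) centers of the AOIs present at sample i;
    the list is empty when no hand is in view.
    """
    track = []
    for g, centers in zip(gaze, hand_centers):
        if g is None or not centers:
            track.append(None)
        else:
            track.append(min(centers, key=lambda c: math.dist(g, c)))
    return track

def hand_synchrony(gaze, hand_centers):
    """Correlation between gaze and the nearest hand AOI center."""
    return xy_correlation(gaze, nearest_hand_track(gaze, hand_centers))
```

Under these assumptions, calling hand_synchrony with centers restricted to hands in contact with objects gives a hand-object synchrony score, and samples with no such hand drop out of the correlation automatically.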
3.4 Results

Three sets of analyses were performed in order to assess what predicts attentional synchrony with adults (adult synchrony) across diverse video stimuli. First, to replicate past work (Franchak et al., 2016), we tested whether adult synchrony increased with age. Second, hand synchrony was added as a predictor to determine its explanatory power of adult synchrony above and beyond age alone. Third, hand-object synchrony was tested as a predictor of adult synchrony. Because only four videos contained hand-object actions, the first and second analyses were repeated with a subset of data including only those four videos so that the contribution of hand-object synchrony to variance reduction could be identified relative to age and hand synchrony.

Statistical analyses were conducted in R (R Core Team, 2017). Linear mixed-effects models (LMMs) were constructed using the ’lme4’ package (Bates, Mächler, Bolker, & Walker, 2015) to predict adult synchrony. The current data are structured such that participants viewed multiple videos, yet they may have only contributed data to a subset of the videos. LMMs are well suited for modelling this type of repeated measure structure to account for similarities within participants and within videos.

3.4.1 Attentional synchrony with adults increases with age

Prior work has demonstrated logarithmic changes in attention development (Amso et al., 2014; Kadooka & Franchak, 2020) which rapidly progress during infancy but slow as individuals enter late childhood. Therefore, log-transformed age was used to predict adult synchrony. Preliminary analyses confirm non-transformed age led to poorer fit compared to log-transformed age. An LMM with log-transformed age and random intercepts for participant and video confirmed prior findings (Franchak et al., 2016; Frank et al., 2009; Kirkorian et al., 2012) that age significantly predicts adult synchrony (beta = 0.03, p < .001).
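To make the logarithmic age effect concrete, the sketch below evaluates only the fixed-effects part of the age model, using the intercept (0.009) and log(Age) slope (0.035) reported for Model Age - 5 Video in Table 3.2. It assumes a natural logarithm and age measured in years, neither of which is stated explicitly in the text, and it ignores the random intercepts for participant and video, which would shift individual predictions up or down. The function names are illustrative.

```python
import math

# Fixed-effect estimates for Model Age - 5 Video as reported in Table 3.2.
INTERCEPT = 0.009
LOG_AGE_SLOPE = 0.035

def predicted_adult_synchrony(age_years):
    """Fixed-effects prediction (assumes natural log and age in years)."""
    return INTERCEPT + LOG_AGE_SLOPE * math.log(age_years)

def gain_between(age_from, age_to):
    """Predicted change in adult synchrony between two ages."""
    return predicted_adult_synchrony(age_to) - predicted_adult_synchrony(age_from)

# Under a log model every doubling of age adds the same increment,
# LOG_AGE_SLOPE * ln(2), so 6 to 12 months matches 4 to 8 years.
for start, end in [(0.5, 1.0), (1.0, 2.0), (4.0, 8.0)]:
    print(start, end, round(gain_between(start, end), 4))
```

Under these assumptions, the six months between 0.5 and 1 year carry the same predicted gain as the four years between 4 and 8, which is the sense in which change is rapid in infancy and slower in later childhood.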
Table 3.2 shows fixed and random effects for this model under Model Age - 5 Video. Eye movements become more adult-like with age following a logarithmic curve for all videos as seen in Figure 3.3.

[Figure 3.3 axes: age in years (horizontal) against adult synchrony (vertical, from less adult-like to more adult-like); one curve each for Videos 1 to 5]
Figure 3.3: Changes in adult synchrony as a function of age. For comprehensibility, logarithmic functions are plotted for each stimulus video.

3.4.2 Hand synchrony predicts adult synchrony

Overall, attention was fairly synchronous with hands (M = .49) but varied widely (SD = .21, min = -.16, max = .80). Hand synchrony was positively correlated with adult synchrony (r = .40, p < .001). However, this relationship was stronger for some videos than others (Figure 3.4). To determine whether attention to hands could account for the age-related increases in adult synchrony, hand synchrony was added as a fixed effect to the LMM constructed in the prior section. The fixed effect of log-transformed age and random intercepts for participant and video remained in the model. A model with random slopes for video failed to converge and was not included. Hand synchrony significantly predicted adult synchrony (beta = 0.17, p < .001), indicating a positive correlation with adult synchrony. Log-transformed age remained significant (beta = 0.03, p < .001). Table 3.2 shows fixed and random effects estimates for this model under Model Hands - 5 Video. Importantly, hand synchrony significantly improved model fit beyond age alone (χ²(1) = 22.69, p < .001).

Table 3.2: Comparison of linear mixed-effect model predicting attentional synchrony with adults (adult synchrony) from log-transformed age and hand synchrony. Random effects of subject and video estimate standard deviation (SD) of parameters.
                      Model Age - 5 Video           Model Hands - 5 Video
Fixed Effects
Predictor             B         SE      p           B         SE      p
Intercept             0.009     0.043   0.83        -0.056    0.041   0.196
log(Age)              0.035     0.003   < 0.001     0.032     0.003   < 0.001
Hand Synchrony        -         -       -           0.176     0.036   < 0.001
Random Effects (SD)
Subject               0.016     -       -           0.019     -       -
Video                 0.073     -       -           0.083     -       -
AIC                   -1165.5   -       -           -1186.2   -       -

[Figure 3.4 axes: hand synchrony (horizontal, from less to more synchrony) against adult synchrony (vertical, from less adult-like to more adult-like); points and fitted lines for Videos 1 to 5]
Figure 3.4: Relationship between hand synchrony and adult synchrony. Each circle represents a single participant’s observation for one video stimulus. The overall effect of hand synchrony on adult synchrony is plotted in black. For additional comprehensibility, correlations are plotted for each stimulus video.

3.4.3 Attentional hand-object synchrony predicts adult synchrony

Only 4 videos contained hand-object actions. Among those videos, hand-object synchrony averaged M = .41, but ranged from a minimum of -.25 to a maximum of .72 (SD = .18). Similar to hand synchrony, hand-object synchrony was correlated with adult synchrony (r = .61, p < .001). Although a positive correlation was consistent across all videos, Figure 3.5 shows variation in this relationship by video. To determine the contribution of hand-object synchrony as a fixed predictor relative to log-transformed age and attention to hands, three nested models were compared. In all models, age-related increases in adult synchrony were predicted by the fixed effect of log-transformed age and the random intercepts for participant and video. No additional parameters were added to Model Age. Model Hands added hand synchrony as a fixed effect to Model Age. Model Hands-Object added hand-object synchrony as a fixed effect to Model Age. Table 3.3 provides model estimates and model fit (AIC). The effect of hand-object synchrony was significant (beta = 0.153, p < .001) and indicated a similar positive correlation with adult synchrony.
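The model-comparison statistic reported in Section 3.4.2 can be cross-checked against the AIC values in Table 3.2. Since AIC = 2k - 2 ln L, the likelihood-ratio statistic for two nested models equals the AIC difference plus twice the number of added parameters. The sketch below assumes both AICs come from maximum-likelihood fits and that Model Hands adds exactly one parameter to Model Age; the function names are illustrative, and the p-value uses the closed form for a chi-square distribution with one degree of freedom.

```python
import math

def lrt_from_aic(aic_reduced, aic_full, extra_params=1):
    """Likelihood-ratio chi-square implied by the AICs of two nested models.

    Uses AIC = 2k - 2 ln L, so the statistic is the AIC difference plus
    2 * extra_params. Assumes both models were fit by maximum likelihood.
    """
    return (aic_reduced - aic_full) + 2 * extra_params

def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2))

# Model Age - 5 Video vs. Model Hands - 5 Video (AICs from Table 3.2).
stat = lrt_from_aic(-1165.5, -1186.2)
print(round(stat, 2), chi2_sf_df1(stat) < 0.001)
```

With the Table 3.2 values the statistic comes to about 22.7, in line with the reported χ²(1) = 22.69. Applying the same function to the AICs in Table 3.3 gives about 43.45 and 20.54 for Model Hands and Model Hands-Object against Model Age, which agree with the reported values to within rounding.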
Both hand synchrony and hand-object synchrony improved model fit when added to Model Age (χ²(1) = 43.46, p < .001, and χ²(1) = 20.53, p < .001, respectively). Although model fit was higher for Model Hands than for Model Hands-Object, neither was able to explain more variance than the other.

[Figure 3.5 axes: hand-object synchrony (horizontal, from less to more synchrony) against adult synchrony (vertical, from less adult-like to more adult-like); points for Videos 1 to 4]
Figure 3.5: Relationship between hand-object synchrony and adult synchrony. Each circle represents a single participant’s observation of one video stimulus. The overall correlation is plotted in black.

Table 3.3: Linear mixed-effect model comparison predicting attentional synchrony with adults (adult synchrony) from three models. Log-transformed age, hand synchrony, and hand-object synchrony as fixed effects. Random effects of subject and video estimate standard deviation (SD) of parameters.

                      Model Age                     Model Hands                   Model Hands-Object
Fixed Effects
Predictor             B         SE      p           B         SE      p           B         SE      p
Intercept             0.015     0.042   0.729       -0.121    0.051   0.0455      -0.015    0.036   0.656
log(Age)              0.038     0.003   < 0.001     0.032     0.003   < 0.001     0.034     0.003   < 0.001
Hand Synch            -         -       -           0.306     0.044   < 0.001     -         -       -
Hands-Obj Synch       -         -       -           -         -       -           0.153     0.033   < 0.001
Random Effects (SD)
Subject               0.016     -       -           0.008     -       -           0.014     -       -
Video                 0.070     -       -           0.085     -       -           0.054     -       -
AIC                   -902.30   -       -           -943.75   -       -           -920.84   -       -

3.5 Discussion

In summary, we measured age-related changes in attentional synchrony with adults and investigated whether attentional synchrony with two features, hands and hand-object actions, can account for age-related differences in synchrony with adults. Our results confirm past findings (Franchak et al., 2016) of age-related changes towards increasingly synchronous eye movements that were robust across five varied and dynamic stimuli. Furthermore, both hand synchrony and hand-object synchrony predicted adult synchrony above and beyond age alone.
The degree to which infants and children look to hands and hand-object actions reliably accounts for similarities in how those observers prioritize meaningful information. Here we will discuss the implications of these findings and a possible mechanism for the changes observed.

Our findings support the perspective that development of visual attention involves learning to better prioritize meaningful features in ways that are increasingly more adult-like. Hands and hand-object actions were not always present in the scenes, yet there is a strong relationship between attention to these features and adult synchrony. This suggests that looking to hands and hand-object actions may be one of many attentional behaviors that develop together towards more adult-like prioritization of meaningful features. In other words, it is likely that the degree to which an observer synchronizes attention to hand-object actions is also related to more general attentional biases that occur when hand-object actions are not occurring.

Becoming more adult-like in looking to hands may be helpful in developing key experiences and abilities that are developmentally appropriate. Extracting information from manual actions and grip positions during infancy coincides with infants being motorically ready to perform grasping actions of their own (Libertus & Needham, 2010; Needham, Barrett, & Peterman, 2002). Similarly, developing joint attention to objects by looking to hands occurs around when infants are learning names for objects (Yu & Smith, 2011). The development of looking to hands and hand-object actions may be developmentally gated to aid in the timing of these experiences. In other words, attention, comprehension, and the motor system may work together to structure the visual environment (L. B. Smith, Jayaraman, Clerkin, & Yu, 2018; Yoshida & Fausey, 2019) by increasing attention when the infant is ready to comprehend information, in this case manual actions.
Too soon, and the infant is provided visual experiences that add unnecessary complexity; too late, and the infant may miss out on critical information.

We identify two plausible and non-mutually exclusive mechanisms that would both be supported by our findings. The first mechanism involves maturation of general attention skills. On this account, development in prioritizing hands, hand-object actions, and adult-like prioritization of meaningful regions is the result of improvements in general attentional control. Attentional control refers to the ability to orient and select visual information while also inhibiting attention to other features (Colombo, 2001). Infant attention to static images is impacted by their attentional selection and orientation abilities (van Renswoude, Visser, Raijmakers, Tsang, & Johnson, 2019), and attentional control may be even more relevant in dynamic stimuli when orienting and selecting information is time sensitive. Looking to hands and hand-object actions could be the result of actively selecting these features that convey important information in the moment. Likewise, developing attentional control would involve suppressing attention to other features in favor of hands. Improvements in attentional control are observed throughout infancy and childhood into at least the early teenage years (Oakes & Amso, 2018; Ruff et al., 1998; Paus, Babenko, & Radil, 1990; Aring, Grönlund, Hellström, & Ygge, 2007). Further, this general ability develops rapidly during early infancy, which could account for the logarithmic pattern in developing adult synchrony.

The other plausible mechanism involves improvements in comprehending a scene’s meaning. Through social input, exploration, and experience, infants may learn to better comprehend the content of a scene. As previously described, interactions with adults lead to experiences in which attention is guided towards meaningful locations like hands and hand-object actions (Yu & Smith, 2013).
With repetitive and redundant encounters with cues that signal meaningful information, infants and children may modulate their prioritization as they increasingly comprehend the importance of social partners, hands, faces, eyes, affordances, and information that is relevant to their past experiences. This account is also plausible in describing the results of this study.

Most likely, both mechanisms are at play in the development of visual attention. Comprehension and attentional skills interact and recursively improve the ability to prioritize meaningful information. Both these mechanisms lead to broad cascading changes across development, which makes it difficult to establish temporal precedence. Individual differences in attentional control are associated with several achievements including word learning (Yu & Smith, 2011), walking ability (Mulder, Oudgenoeg-Paz, Verhagen, van der Ham, & Van der Stigchel, 2022), and executive function (Veer, Luyten, Mulder, van Tuijl, & Sleegers, 2017). It is entirely possible that any of these achievements could be a driving force behind developing adult synchrony by offering greater comprehension of the visual world, rather than attentional control alone. Similarly, improvements in comprehension come from gaining experience across a wide variety of knowledge: narrative structure (Kirkorian et al., 2012; Kirkorian & Anderson, 2018), goal-directed actions (Flanagan & Johansson, 2003), film techniques (Kirkorian & Anderson, 2017), language (Anderson et al., 1981), and regularities in scenes (Helo et al., 2017) all influence attention. But better attentional control may lead to voluntary detection of this information.

3.5.1 Limitations and future direction

Our findings indicate a strong influence of attention to hands that changes with age. While hands were selected as a semantically relevant feature, it is possible that hands were visually salient.
Other features that have been selected as ‘top-down’, like faces, tend to contain regions of high salience (Henderson et al., 2007; Torralba et al., 2006; Wass & Smith, 2015). However, this is unlikely in our case given that, in preliminary analyses, salience was unable to consistently account for changes in adult synchrony. Further, no age-related effect was detected in the original data analysis (Kadooka & Franchak, 2020). Incorporating low-level information, in addition to other features, would help to deepen understanding of visual attention and explain changes in adult synchrony. Additionally, knowing how attention changes for particular features means that future studies could manipulate infants’ prioritization of meaningful information with targeted manipulations of comprehension.

In this study we used a measurement, adult synchrony, that indexes the development of visual attention irrespective of features. Although this metric is spatiotemporally sensitive to changes in meaning, there are some limitations. Calculations of ISCs are likely confined to screen-based stimuli because they require that observers view the same stimulus. Unfortunately, this would limit the ability to test other potentially important changes in visual attention beyond the screen like motor ability or navigation. However, newer technology like virtual reality and head-mounted displays (Farmer et al., 2021) with built-in eye tracking function could present identical stimuli. Head-mounted eye tracking would require constructing reproducible visual scenes in real life, which is plausible but difficult.

Chapter 4

Attentional synchrony when viewing manual actions

4.1 Abstract

Developing adult-like visual attention involves changes in synchronizing gaze to meaningful information at the right time. How do children develop the ability to prioritize meaningful information from moment-to-moment?
We propose that perceptual experiences allow for increased comprehension of the information that conveys the most meaning. For manual actions, the meaningful information occurs when hands interact with objects. In the current study, we tested whether prior viewing of a novel manual action could change attention of 4-year-olds towards more adult-like prioritization of hand-object actions. Participants either viewed a live demonstration of a manual action or talked about the objects involved with the same action. Eye movements were recorded during a subsequent screen-based viewing of the action. Results show that prior opportunity to view the action did not change attention towards greater synchrony with adults or greater attention to hand-object actions. Comprehension and attention were similar between groups and were largely unaffected by a prior visual experience. However, synchrony with adults was significantly correlated with attention to hand-object actions. More effective manipulations of comprehension are discussed.

4.2 Attentional synchrony when viewing manual actions

The development of visual attention in infants and children involves a progression towards more synchronous adult-like eye movements when watching dynamic stimuli (Franchak et al., 2016; Kadooka & Franchak, 2020, in prep). In other words, where children choose to allocate their gaze becomes increasingly similar to adults as they get older. Our past work suggests that this attentional synchrony is the result of improvements in prioritizing the meaningful information that is conveyed in a scene from moment-to-moment (Kadooka & Franchak, 2020). For instance, looking towards an actor’s hands interacting with objects can account for changes in attentional synchrony with adults beyond age alone (Kadooka & Franchak, in prep).
We propose that one possible route to developing more adult-like visual attention involves gaining experiences that improve scene comprehension, leading to changes in how observers prioritize meaningful information. In the context of manual actions with objects, greater understanding of an action has been shown to change visual attention to hand-object interactions (Filippi & Woodward, 2016; Hard, Meyer, & Baldwin, 2019; Rotman, Troje, Johansson, & Flanagan, 2006). Experiences that allow for better understanding of an action result in a feedback loop in which greater comprehension supports subsequent attention to meaningful information, which in turn provides a deeper understanding of the action. However, it has not been tested whether this reciprocal relationship specifically leads children’s eye movements to better synchronize with adults’ eye movements. That is, can providing an opportunity to extract meaning from an action lead to increased adult synchrony? If so, are there changes in looking towards the locations where hands and objects interact?

4.3 Developmental increases in adult synchrony

Adult synchrony is the degree to which an observer’s eye movements correlate with the gaze behavior of adults (Franchak et al., 2016). This comparison is measured as an inter-subject correlation (ISC), which is the correlation of eye gaze location coordinates over time when viewing the same stimulus. The utility of measuring adult synchrony comes from the reliable overlap in where adults look, both spatially and temporally. A high level of synchrony persists across a wide range of dynamic stimuli including animated cartoons, naturalistic live-action videos, and Hollywood-produced films (Dorr et al., 2010; Franchak et al., 2016; Gannon & Grubb, 2022; Hart et al., 2009; Mital et al., 2011; Shepherd et al., 2010; Rider et al., 2018; Wang et al., 2012). This is to say, adult synchrony is a metric of attention that indexes how adult-like an observer’s eye movements are.
Importantly, adult synchrony is agnostic to expectations of where researchers think attention should be allocated. Of course, this does not mean that adults look to exactly the same place at every moment, but rather, there is considerable convergence between different adults’ attention.

How does age relate to adult synchrony? Eye movements of infants and children become increasingly adult-like with age (Franchak et al., 2016; Kadooka & Franchak, 2020). A large body of developmental literature has identified age-related changes in visual attention to scene features including saliency, faces, bodies and hands (see Kadooka and Franchak (2020) for a more thorough review). However, in our prior work (Kadooka & Franchak, 2020), we provide evidence that the development of adult synchrony is not based on global changes in attention to features but rather driven by improvements in prioritizing meaningful information from moment to moment. At one moment, adults may prioritize looking to a face of a central character, but within a few seconds adults’ attention may shift towards a gesture of an entirely different person. Measurements of global attention that are based on single features, like faces, may fail to capture the spatiotemporal changes in relevance to many different features. On the other hand, adult synchrony is an index of similarity between a given observer and adults’ actual spatiotemporal allocation of attention.

Attentional synchrony is achieved when there is agreement with adults on how to prioritize information within dynamic scenes. The dynamic nature of the visual world means that different features may be prioritized depending on what is meaningful in that moment. As a scene unfolds, the importance of a person’s face may wane after that person points towards a toy and grasps it. This emphasis on understanding the influence of ‘meaning’ on attention is supported by Henderson et al.
(2017) in which ‘meaning maps’ were computed to measure the semantic relevance of static images. Meaning maps were created by asking crowd-sourced raters to judge the meaningfulness of many overlapping image regions which were then used to create a ‘map’ that evaluates the spatial distribution of meaningfulness. In several variants of this methodology, they have shown that regions rated as more meaningful receive greater attention (Henderson, 2017; Henderson & Hayes, 2018; Rehrig et al., 2020).

4.4 Comprehension drives prioritization of meaning

How does comprehension relate to attention? There is likely a reciprocal relationship between what an observer understands and how they allocate their attention to meaningful information. When comprehension is high, attention should be allocated to semantically-relevant information that is important for understanding the scene. Because comprehension and attention should progress together, one approach to disentangling these constructs involves violating semantic expectations. Helo and colleagues (2017) compared adults and 2-year-olds as they viewed scenes with irregularities (e.g., a bar of soap on a kitchen table) and scenes without irregularities. Adults, but not 2-year-olds, were sensitive to regularities in the environment and looked longer to the irregular objects. Similarly, when adults view static scenes that contain ‘impossible’ floating objects (e.g., a saucepan hovering in a kitchen), greater attention is allocated to these impossible objects (Võ & Henderson, 2009). Observers use their comprehension to determine meaningful information that should receive more attention. However, selection of meaningful information also implies inhibiting attention to information that is not meaningful.
When preschool-aged children were presented with TV show content that was edited to be incomprehensible based on linguistic or narrative manipulations, look duration was shorter and gross attention decreased compared to comprehensible scenes (Anderson et al., 1981). Information that is less comprehensible and meaningful should not receive as much attention. However, modulating attention based on meaning requires the observer to be sensitive to the difference. Infants younger than one year old look to comprehensible and incomprehensible scenes for the same amount of time (Pempek et al., 2010). Therefore, developmental improvements in comprehension help children to determine and attend to meaningful information. Likewise, attention is allocated to the information that is most meaningful for comprehending the scene. The relatedness of comprehension and attention through the selection of meaningful information provides context for age-related differences in attention to hand-object actions previously observed (Kadooka & Franchak, in prep). When infants, children, and adults viewed live-action videos, greater spatiotemporal agreement between their gaze and the locations of hand-object actions accounted for attentional synchrony with adults. Attentional synchrony with adults occurs as part of a feedback loop in which comprehension refines our attention towards meaningful information and this refinement towards attending to meaningful information allows us to notice or grasp new comprehension of the visual world. In the following section, we will describe this feedback loop within the context of manual actions.

4.5 Attending to and comprehending manual actions

To understand the relationship between attention and comprehension in manual actions, we need to know the meaningful information that exists when hands and objects are interacting. The action literature provides several featural and structural avenues that convey meaning.
For instance, goal construal is signaled by the path of the hands (Sommerville et al., 2005), intentionality is signaled by hand trajectories (Cannon & Woodward, 2012), and object affordance is signaled by hand shapes (Barrett et al., 2007; Ambrosini et al., 2013).

A more global indicator of proficient attention and comprehension is the ability to make predictive eye movements to the goals of actions. For instance, when observing simple actions, like moving an object from one location to another, adults’ eye movements move along the path of the hands ending in a predictive look to where the object will be placed ∼150 ms before the hand arrives at that goal location (Flanagan & Johansson, 2003; Rotman et al., 2006). Differences in predictive looks between knowledgeable and naive observers provide insight on how changes in comprehension can change attention. Comparisons of adults’ eye movements between predictable and unpredictable actions show that looks to the goal are delayed when the observer is uncertain of the goal (Rotman et al., 2006). However, adults quickly recover by using other features like trajectory of the hand to reestablish a prediction. When faced with unpredictability, adults’ recruitment of other meaningful information likely reflects an accumulation of experience. Prediction of action goals helps identify the developmental changes in action comprehension and attention. In a study by Kochukhova and Gredebäck (2010), past familiarity with actions led to differences in predictive looks for 6-month-olds, 10-month-olds, and adults. When observing an actor use a spoon, an action that all age groups were familiar with, predictive looks to the mouth were observed for all ages. However, when the actor used a comb, only adults made predictive looks to the head, reflecting adults’ familiarity with this action. Infants made reactive looks to the head in which their gaze followed the hand.
In another study involving eating-utensil actions, Swedish and Chinese 8-month-olds made predictive looks to the mouth only for utensils that were familiar to them (Green, Li, Lockman, & Gredebäck, 2016). Chinese infants predicted the action only when chopsticks were used, but Swedish infants made predictive looks to the mouth only when a spoon was used. Importantly, infants in this study were at an age in which they had been fed with these utensils but were not yet able to use them on their own. While motor abilities may not be required to develop comprehension of actions, the importance of motor abilities is well established in the literature for adults and infants. Expert adult athletes, compared to novices, are better able to predict trajectories of thrown projectiles based on human movements (Moore & Müller, 2014; Williams, Ward, Knowles, & Smeeton, 2002). The ability to detect statistical regularities in the goals of actions was correlated with 8- to 11-month-olds’ ability to perform those actions (Monroy et al., 2017), and prediction of goals to grasping actions correlates with 4- to 10-month-olds’ ability to perform grasping actions (Kanakogi & Itakura, 2011).

4.6 Manipulating comprehension and attention in manual actions through action experience

The feedback loop involved in comprehension and attention is directly related to the experiences of the observer. Repeated action experiences (either viewing or doing an action) provide opportunities to attend to meaningful information and build on prior comprehension. In previously discussed work, differences in age or culture stood in for these experiences (Green et al., 2016; Kanakogi & Itakura, 2011; Kochukhova & Gredebäck, 2010; Monroy et al., 2017). However, direct manipulations of exposure to experiences are effective at influencing comprehension and attention. These training studies measure the impact of providing opportunities to experience actions.
In the “Sticky Mittens” paradigm, 3-month-old pre-grasping infants are trained to ‘grasp’ objects with the use of Velcro-lined mittens, prior to when infants typically learn to intentionally grasp objects. Studies using this paradigm have shown that training leads to several cascading effects in infants’ subsequent attention, comprehension and action, including greater visual and object exploration, better understanding of reaches as goal-directed, and higher sensitivity to changes in the goals of an actor’s reach (Libertus & Needham, 2010; Needham et al., 2002; Sommerville et al., 2005). Another study using a similar design has shown improvements in 13-month-olds’ ability to predict actions based on observed hand shapes and hand kinematics (Filippi & Woodward, 2016). By gaining specific experiences, infants improve comprehension of meaningful aspects of actions and increase sensitivity to meaningful information when attending to actions.

Combining past work, manual actions provide an avenue for understanding how experiences change comprehension and attention. The reviewed literature on manual actions has focused on changes in infants’ and children’s sensitivity to features of action based on coarse measurements of attention like looking time or dwell time. Therefore, it is unknown whether action experience changes attention towards greater adult synchrony, specifically. Providing opportunities for observers to learn about an action may improve their sensitivity and prioritization of meaning in ways that are more similar to adults.

4.7 Current Study

In the current study, we tested whether action observation increased children’s attentional synchrony with adults when viewing that action. The eye movements of 4-year-olds and a group of comparison adults were recorded while viewing a novel action after they had observed either a live demonstration of the action or an irrelevant experience.
Because attention and comprehension of action are tied to prior experiences and abilities (Kanakogi & Itakura, 2011; Needham et al., 2002; Rotman et al., 2006), we selected an action that was novel and unfamiliar to 4-year-olds. By providing an opportunity to observe the action, we expected that this may allow 4-year-olds to identify meaningful information involved in the action, particularly the ways in which hands interact with objects. As a result, we predicted that children who had a relevant action experience would have higher adult synchrony than those with an irrelevant experience. Additionally, we predicted synchronous attention to hand-object actions would mediate this effect, since attention to hand-object actions has been closely related to adult synchrony in our prior work (Kadooka & Franchak, in prep).

4.8 Method

4.8.1 Participants

The final sample consisted of 10 college-aged adults (4 female, M = 19.69, SD = 1.27) and 26 4-year-olds (15 female, M = 3.94, SD = .13). Due to the COVID-19 pandemic, child data collection was stopped prematurely and fell short of the 40 children originally planned for this study. Six additional 4-year-olds participated in the study, but their data were excluded due to issues with cooperating and following directions. All participants in the final sample had normal or corrected-to-normal vision with no color blindness or history of familial color blindness. Families were recruited from the Riverside County area. Adults were college undergraduates recruited from the departmental participant pool and received course credit for participation. Participating children were identified by their caregivers as Black/African American (n = 1), American Indian/Alaskan Native (n = 1), Asian (n = 1), non-Hispanic White (n = 6), Hispanic or Latino(a)/White (n = 10), and more than one race (n = 6). One parent did not report race/ethnicity. Families received $10 and a small gift or book for participating.
Adult participants identified as American Indian/Alaskan Native (n = 1), Asian (n = 4), non-Hispanic White (n = 1), Hispanic or Latino(a)/White (n = 2), and more than one race (n = 1). One adult participant did not report race/ethnicity. Participants or their caregivers provided informed consent after hearing the details of the study. Children gave verbal assent. The study procedure conforms to the US Federal Policy for the Protection of Human Subjects and was approved by the Institutional Review Board of the University of California, Riverside.

4.8.2 Design

All participants engaged in a two-part task: a live demonstration followed by a video observation task. Participants were randomly assigned to one of two conditions that varied the live demonstration to provide either a relevant or irrelevant perceptual experience. In the relevant action condition, participants watched the experimenter perform the target action. In the irrelevant action condition, participants talked about the items involved in the target action but did not receive a live demonstration of the target action. In the video observation task, all participants watched the same video recording of the target action while an eye tracker recorded their eye movements.

Stimuli

The target action consisted of moving colored juice from one container to another using a plastic syringe. This specific goal-directed action was chosen because pilot testing revealed that it was an unfamiliar action sequence for 4-year-olds but not adults. Materials for the live demonstration involved a plastic 35.56 cm x 45.72 cm tray, a 35 ml plastic syringe (plunger and barrel), a 440 ml wide-mouth mason glass jar with a straw hole lid, a 147.8 ml tilted glass jar with a lid, a disposable plastic straw, a 15.24 cm x 20 cm x 7.93 cm hinged wooden box, and powdered juice mix (lemonade and cherry Kool-Aid). Figure 4.1 presents the materials used during the target action.
A video was filmed at 30 fps that depicted an actor performing the target action with the same materials used in the live demonstration. The total time of this video was 45 seconds. This video was viewed during the video observation task. Figure 4.2 is a frame from the video depicting the target action.

Figure 4.1: Items Used in Target Action
Note. Materials used during the target action: a plastic tray, a 35 ml plastic syringe (plunger and barrel), a wide-mouth mason glass jar with a straw hole lid, a tilted glass jar with a lid, a disposable plastic straw, and a hinged wooden box.

Figure 4.2: Example frame from the Stimulus Video
Note. A frame from the video presented during the video observation task. This video depicted an actor using a large plastic syringe to move colored juice from one container to another.

Eye Tracking Apparatus

The video was presented on a 43.2 cm (diagonal) widescreen monitor at 30 Hz and subtended a visual angle of 31° x 19°. An Eyelink 1000 Plus remote eye tracker (SR Research Ltd.) was mounted below the monitor on an adjustable arm. Eye movements (right eye only) were recorded with a temporal resolution of 500 Hz.

4.8.3 Procedure

Prior to the arrival of participants, materials for the live demonstration were prepared. The straw, plunger, and barrel were placed in the wooden box. 35 ml of lemonade was placed in the small, tilted jar and 70 ml of Kool-Aid was placed in the large jar. Both jars were placed adjacent to each other, approximately 5 cm apart (see Figure 4.1). Once participants arrived and completed consent procedures, they were seated at a table across from the experimenter. Child participants were seated on a booster seat. A research assistant would enter the room carrying the prepared tray and wooden box. The experimenter would proceed with the live demonstration and video observation.

Live Demonstration

For the relevant action condition, the tray was placed in front of the experimenter.
The experimenter explained their goal was to mix the two flavors of juice using the items in the box in order to taste the combined flavors. Participants were instructed to watch the experimenter as they mixed the juice. Adults were given the following prompt: “I want to know how these two liquids taste when they are mixed together. Please observe the sequence of actions I make to move the liquid from the small jar to the large jar and then take a sip of it.” Children were given a similar but more child-friendly prompt: “I want to know what these two flavors of juice taste like when I mix them together. Watch me as I use the items in the box to move the red juice to the jar with the yellow juice and then take a sip of it.” Experimenters were trained to use a sequence of timed actions that matched the timing of the actions in the stimulus video. Participants in this condition viewed the entirety of the target action sequence.

For the irrelevant action condition, the tray was placed in front of the participant. The researcher would ask the participant to talk about each item in front of them including the items in the wooden box. This allowed participants the opportunity to familiarize themselves with the materials but did not allow for observation of the target action sequence. Adults were given the following instructions: “We want to know what the items on the table are. Please identify each item in front of you, as well as the ones in the box, and say the material the item is made out of.” Children were given the following instructions: “We want to know what all these things on the table are. Can you point to each of these objects, including the objects inside the box, and tell me what you think they are made out of?”

Video Observation

Once participants completed the live demonstration, all participants were moved to a separate area containing the eye tracker in order to complete the video observation task.
Participants sat in a viewing area that was separated from the experimenter by a hanging curtain. A target sticker was placed on the forehead to facilitate the eye tracker detecting the observers’ eyes. Children and adults sat in a chair (with a booster seat for children). The experimenter adjusted the monitor such that participants were at a viewing distance of 60 cm. Calibration involved a 5-point calibration routine followed by a 5-point validation check. Validation calculated the average error as the disparity between the target location and estimated point of gaze location. This calibration process was repeated until validation indicated 1.5° of average error or less. After calibration and validation, participants were shown the pre-recorded video of the actor performing the target action of moving the juice using the syringe with no audio. Adults and children were instructed to watch the video as their eye movements were recorded. Importantly, materials and placement in the video were identical to the live demonstration. Timing of actions in the video was comparable to the live demonstration for the relevant action group.

4.8.4 Data Processing and Measures

Raw eye gaze locations during the video observation were extracted as a time series of horizontal and vertical gaze coordinates for each observer. Time periods when gaze was off screen, eyes were closed, or eyes were otherwise occluded were excluded from analyses.

Adult synchrony

Adult synchrony measures the degree of spatiotemporal similarity between an observer and a comparison group of adults. For this study, adult data were collected to serve as the comparison group. Half of the adults were assigned to each condition and participated in the study to ensure minimal differences between children and adult attention as a result of the study procedure. Because the action was familiar to adults, their attention to the target action was synchronized.
Using a previously applied process (Franchak et al., 2016; Kadooka & Franchak, in prep), attentional synchrony was calculated by comparing each child participant to every adult in the adult comparison group. For each comparison made, the inter-subject correlation (ISC) was calculated as the correlation coefficient for the time series of eye coordinates. This was calculated separately for the horizontal and vertical direction, then averaged. An individual child’s adult synchrony score was the average ISC between that child and each of the adult observers. Adult synchrony scores closer to 1 indicate more adult-like gaze.

Hand-object Synchrony

Hand-object synchrony measured the spatiotemporal agreement between an observer’s gaze and the location of hands during hand-object actions in our video stimulus. Following the process described in Kadooka and Franchak (in prep), elliptical areas of interest (AOIs) were drawn around hands at all times that hands were visible using Dataviewer software (SR Research Ltd.). Next, hand-object actions were coded as the times a hand was physically interacting with an object that could be moved. For both hand AOIs, a time series of coordinates was created that described the center of the ellipse for the entire stimulus duration. To calculate hand-object synchrony, each participant’s eye movements were then correlated with the coordinates of the nearest hand AOI center. For times when both hands were visible, the nearest AOI was determined by the shortest Euclidean distance. Similar to ISCs, correlations were calculated in the horizontal and vertical dimensions separately then averaged. Hand-object synchrony scores closer to 1 indicated greater spatiotemporal attention to the locations of hands when interacting with objects.

4.9 Results

We predicted that the relationship between condition and adult synchrony would be mediated by hand-object synchrony. Therefore, we performed a mediation analysis.
To establish a mediation, the relationship between condition and adult synchrony was examined first. A linear regression using condition to predict adult synchrony indicated no total effect of condition (beta = 0.043, p = .419). Synchrony did not significantly differ between children who received relevant action experience (M = .609, SD = .086) and those who received irrelevant action experience (M = .566, SD = .161). Past guidance on mediation models would suggest that a significant total effect is necessary to move forward with the mediation model; however, recent recommendations suggest that continuing without a significant effect can still provide useful information (Hayes, 2018). Therefore, we proceeded to model the effect of condition on our mediating variable, hand-object synchrony. A linear regression predicting hand-object synchrony from condition found no effect of condition (beta = 0.05, p = .25). Children in the relevant condition (M = .65, SD = .08) looked to hand-object actions similarly to those who were in the irrelevant condition (M = .60, SD = .14). Without a significant effect of condition on hand-object synchrony, there is no mediation effect. Contrary to our expectations, children viewed the action in similar ways, regardless of whether they had previously seen a live demonstration of the target action.

In past work, adult synchrony was predicted by attention to hands (Kadooka & Franchak, in prep). Therefore, we examined our data to determine if this effect replicated. Since condition did not impact either variable, we combined data from both conditions. There was a strong linear correlation between hand-object synchrony and adult synchrony (r = .96, p < .001). This is evident in Figure 4.3 showing that hand-object synchrony is highly predictive of adult synchrony.
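
As a rough illustration of the two synchrony measures defined in Section 4.8.4, the following Python sketch computes adult synchrony (the mean inter-subject correlation between one observer and each comparison adult) and hand-object synchrony (the correlation between gaze and the nearest hand AOI center during coded hand-object actions). It is not the analysis code used in this study. The function names, the array layout, the use of Pearson correlation via NumPy, and the pairwise exclusion of missing samples (coded here as NaN) are all assumptions made for illustration.

```python
import numpy as np


def axis_correlation(a, b):
    """Pearson r between two 1-D series, skipping samples where either is NaN."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    valid = ~(np.isnan(a) | np.isnan(b))
    if valid.sum() < 3:
        return np.nan  # too few shared samples to estimate a correlation
    return np.corrcoef(a[valid], b[valid])[0, 1]


def gaze_correlation(series_1, series_2):
    """Correlate horizontal and vertical coordinates separately, then average.

    Both inputs are (n_samples, 2) arrays of x/y coordinates on a shared
    time base (an assumed layout).
    """
    r_x = axis_correlation(series_1[:, 0], series_2[:, 0])
    r_y = axis_correlation(series_1[:, 1], series_2[:, 1])
    return (r_x + r_y) / 2.0


def adult_synchrony(observer_gaze, adult_gazes):
    """Average ISC between one observer and every adult in the comparison group."""
    return float(np.mean([gaze_correlation(observer_gaze, adult)
                          for adult in adult_gazes]))


def nearest_hand_track(gaze, left_hand, right_hand):
    """Per sample, pick the hand AOI center with the shortest Euclidean
    distance to gaze. A hand that is not visible is coded as NaN and is
    never preferred over a visible hand."""
    d_left = np.linalg.norm(gaze - left_hand, axis=1)
    d_right = np.linalg.norm(gaze - right_hand, axis=1)
    d_left = np.where(np.isnan(d_left), np.inf, d_left)
    d_right = np.where(np.isnan(d_right), np.inf, d_right)
    use_left = d_left <= d_right
    return np.where(use_left[:, None], left_hand, right_hand)


def hand_object_synchrony(gaze, left_hand, right_hand, acting):
    """Correlate gaze with the nearest hand AOI center, restricted to samples
    coded as hand-object actions (`acting` is a boolean array)."""
    target = nearest_hand_track(gaze, left_hand, right_hand)
    target = np.where(acting[:, None], target, np.nan)
    return gaze_correlation(gaze, target)
```

Under these assumptions, a child's adult synchrony score would be obtained with `adult_synchrony(child_gaze, list_of_adult_gaze_arrays)`, and scores closer to 1 would correspond to more adult-like gaze, as described above.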
4.10 Discussion

In this study we attempted to influence adult synchrony and hand-object synchrony by providing 4-year-olds an opportunity to observe a live demonstration of a novel action prior to viewing a video of the action. Our results indicate that regardless of whether 4-year-olds observed the live demonstration, their comprehension of and attention to the novel action was largely unaffected. However, we did confirm past results indicating that attention to hand-object actions provides considerable explanatory value in how similar observers are to adults.

Figure 4.3: Relationship between hand-object synchrony and adult synchrony when viewing target action video (x-axis: hand-object synchrony, from less to more synchrony; y-axis: adult synchrony, from less to more adult-like)

Contrary to our expectation, opportunities to view a novel action did not influence attention. Perceptual experiences can provide opportunities to learn a wide range of information about what is meaningful during an action but we did not observe any effect of condition. One possibility is a ceiling effect in attentional synchrony with adults and hand-object actions for this particular action. This might occur if the action is too simplistic for this age and opportunities to learn about the action provide no additional knowledge. While the average correlation values in adult synchrony and hand-object synchrony were high, Figure 4.3 indicates that both adult synchrony and hand-object synchrony varied quite widely. Adult synchrony ranged from a minimum = .29 to a maximum = .76 (SD = .13) and hand-object synchrony had a similar range with a minimum = .36 and a maximum = .80 (SD = .11). Therefore, it is unlikely to be a ceiling effect, as there were improvements to be gained by at least some of the 4-year-olds.
As they are applied in this study, adult synchrony and hand-object synchrony are likely to be related to action comprehension in ways that limit our conclusions. In this action, greater adult synchrony was related to looking at the hands interacting with objects. This is exactly what would be expected for knowledgeable adults: attention should be synchronized to meaningful hand-object interactions. As a closely related metric, high comprehension would also lead attention to hand-object actions. Therefore, comprehension is expected to be related to greater hand-object synchrony and greater adult synchrony. But this relationship is not obligatory for these measures. Rather, this particular stimulus likely presents an action that leads to the collinearity that is observed in Figure 4.3. In a different hypothetical action stimulus, we can imagine that greater comprehension might lead attention to an area that is not the hands. For example, if the action involved manipulating a puzzle box to navigate out a rolling marble, then attention would likely be allocated to the marble as the most meaningful information. Simply looking like an adult (or looking to hand-object interactions) does not equate to understanding the meaningful information involved in the action.

Our attempt to influence comprehension of and attention to meaningful information was not effective. In past work that changed attention or comprehension, the most effective manipulations have involved providing experiences that allow for self-directed exploration of the action. For example, in studies involving “sticky mittens” (Needham et al., 2002; Sommerville et al., 2005), infants are given repeated opportunities to engage in the action themselves, with longer training involving 10 minutes a day for two weeks and shorter changes occurring only after 200 seconds of training.
Opportunities to discover the relationship between their own actions and ‘grasping’ the Velcro objects may provide a stronger or longer-lasting change to the infants’ comprehension of meaning in grasps and attention to meaningful aspects of grasp-like actions they subsequently observe. Some researchers point to activation of the infants’ own motor system as the reason why these manipulations are effective (Libertus & Hauf, 2017). While the role of the motor system is crucial, simply observing actions can still influence an observer’s comprehension of action. Repeated, self-paced viewing of actions leads to reorganization of observers’ understanding of an action’s hierarchical structure (Hard et al., 2019). Self-paced viewing is an opportunity for observers to discover meaningful information about action structure by exploring the stream of action at their own pace. Another common aspect of successful manipulations is the repeated nature of the opportunities provided. In line with theories of statistical learning (Saffran & Kirkham, 2018), repeated exposures help to reveal regularities in the type of information that is important to look at or that change comprehension. Therefore, we suggest that a sufficient manipulation in the current study may involve allowing children to explore the action by repeatedly attempting the action on their own.

4.10.1 Limitations and future direction

In this study, the stimulus was selected to provide an action that was just within the motor abilities of 4-year-olds but unfamiliar. However, there is a striking degree of similarity between adult synchrony and hand-object synchrony. How much of this similarity can be attributed to this specific action and how much can be attributed to all goal-directed manual actions? In this study, the action stimulus was designed to be fully in-view, with no face visible, take up a large area of a screen, and be goal-directed with no interleaved breaks or hesitations.
These factors likely contribute towards there being few, if any, times that adults look to a location that did not involve hands and objects. However, coordination between eye movements and hands is a known feature of manual action (Flanagan & Johansson, 2003; Rotman et al., 2006) with only a couple hundred milliseconds of difference between eyes and hands. In future studies, consideration towards tasks and the metrics used to measure them should be central, especially when exploring new measurements.

Within this stimulus, there is a variety of actions, ranging from simple grasps to more complex syringe use. While attentional synchrony with adults and hand-object synchrony are sensitive to spatiotemporal variability, investigating specific portions of the action may reveal differences in attention. For portions that show simple grasps, children’s synchrony with adults could be high, but this may not be the case for the more novel actions involving the syringe. A closer analysis that separates out different actions may show that our study conditions were effective at changing attention but only for actions that were entirely novel to 4-year-olds.

The action literature provides detailed descriptions of attention to simple actions. One consistent finding is that eye movements of experienced observers are predictive in looking to goal locations. Novice observers tend to be reactive and follow behind the hand. However, we did not examine this aspect of attention. Identifying times at which attention is most synchronous, predictive, and reactive, compared to adults, may show that children are predictive of familiar actions but reactive for novel actions. Time-lagged cross-correlations would be helpful in this endeavor.

Chapter 5

Conclusions

In this dissertation, I described the results of three studies that add to our understanding of visual attention development in dynamic scenes.
In Chapter 2, I reported the role of two visual features, faces and salience, to test whether there is a developmental shift from bottom-up to top-down attention. The results revealed that visual attention is a dynamic process, in which people prioritize which features are important from moment to moment. Building on the idea of prioritization, Chapter 3 measured developmental changes in attention to hands and hand-object actions. Using newly-developed measurements of synchrony, I found further evidence in support of a developmental account of visual attention that emphasizes prioritization. In Chapter 4, I applied this developmental account to visual attention when viewing goal-directed manual actions. To test whether action comprehension changes visual attention to actions, I experimentally manipulated 4-year-olds’ prior experience in viewing an action sequence. However, I found no difference in subsequent action attention between children who had previously viewed the action sequence and those who had not. In the investigation of developmental changes in visual attention, I revealed complex variability and applied nuanced perspectives to develop better conceptual approaches that take this complexity into consideration.

The main theoretical contribution from the present work is showing that feature accounts of development, whether bottom-up or top-down, fail to capture age-related changes in visual attention when meaning is not considered. Whereas prior work (Franchak et al., 2016; Frank et al., 2009; Kwon et al., 2016; Rider et al., 2018) suggested that attention develops via changes in looking at faces and salient locations, we provided evidence that development is unlikely to involve a global shift from bottom-up to top-down features. Change occurs quantitatively, rather than qualitatively, in the ability to prioritize meaningful information.
In Chapter 3, I directly tested whether a general measure (adult synchrony) that captures the spatiotemporal variability of attention is predicted by age. Indeed, unlike faces or salience in Chapter 2, there was a consistent effect of age on adult synchrony. This suggests that changes in how more mature participants prioritize what is meaningful to look at cannot be reduced to simply, “look less at salient areas and more at faces”. Although the increase in adult synchrony with age suggests that observers become increasingly better at prioritizing attention towards meaningful areas, an alternative explanation is that what infants think is meaningful is different compared to what adults think is meaningful. This cannot be ruled out from the present work; however, past work suggests this is not the case (Franchak et al., 2016): 6-month-olds and 9-month-olds showed highly idiosyncratic gaze patterns rather than settling on a single, “meaningful” area, whereas 24-month-olds showed greater consistency within their age group.

A secondary theoretical contribution from this work is identifying attention to hands and hand-object actions as meaningful features that show developmental increases in the organization of attention. The designs of both measures take into account the spatiotemporal variability of features within a stimulus and were directly informed by insights from Chapter 2. For naturalistic scenes on a screen, hands and manual actions are a developmentally informative feature for influencing how infants and children become more adult-like observers. Building on work from Chapter 4, future research could determine if hands and hand-object actions are important in developing adult-like attention beyond the screen. Prior work already points to the ability of 1-year-old infants to use the hands of caretakers during object interactions to engage in joint attention without looking to the faces of their caretaker (Yu & Smith, 2013).
Parents who leverage infants’ attention to hand-object actions to scaffold joint attention can also provide opportunities for infants to discover other types of information about gaze following, coordinated social attention, object properties, or motor abilities. This illustrates the complexity of developing systems and how development of visual attention is inherently connected to development beyond simply looking to hands. Understanding antecedents that promote visual attention development and developmental cascades that arise from visual attention development will require rigorous interdisciplinary work.

My studies indicated that measuring meaning is vital for characterizing the development of visual attention beyond measuring simple features. Faces are not always meaningful and can change in meaning within a scene. What does the emphasis on meaning suggest about visual attention theory as it relates to both bottom-up and top-down features? Firstly, given the overlap between salient locations and faces (Torralba et al., 2006; Wass & Smith, 2015), it is likely that locations in a scene that convey meaning are also visually salient. Conversely, bottom-up features can also be meaningful. Both bottom-up and top-down features are capable of cueing observers to areas of importance. Secondly, from a developmental perspective, the overlap of both feature types may provide redundant cues for infants and children to learn associations between features and areas of meaning. Redundancy may present statistical regularities that are detectable by infants (Saffran & Kirkham, 2018) and may structure the visual environment to promote scaffolding towards better comprehension of meaning or development of social abilities (Shepherd, 2010). Lastly, I suggest the separation of bottom-up and top-down features in a visual scene presents a false dichotomy.
While bottom-up and top-down features are categorically distinct, the human visual attention system may organize these features hierarchically in service of determining meaning in a scene. For instance, low-level features in a scene like contours, colors, or motion may support better conceptualization of meaningful features like objects and social agents, which in turn support even more complex meaning like features involved in the intentions of people or information relevant to tasks. Observers are expected to attend to features that convey meaning, but where would a person look if they were faced with a scene that they did not understand, such as an unfamiliar sport? In a hierarchically organized system of meaning, an observer may rely on lower-level features to find meaning, like looking to the ball or perhaps attending to salient features such as where the most movement is occurring. In recent work, my collaborators and I found support for this notion by showing that salience was a better predictor of attention for adults who viewed videos with shots edited to occur out of order compared to adults who viewed intact videos (Jing et al., in press). When there is not enough information to select a meaningful location, observers may rely on features that are lower in the hierarchy of meaning. This may also explain why salience better predicts attention during the first fixations of static images (Parkhurst & Niebur, 2003).

A hierarchical organization of meaning for visual attention would also apply to the development of attention. If visual attention is allocated to the highest level of meaning available to an observer, this could still account for the attention of young infants as ‘stimulus-driven’ (Oakes & Amso, 2018; Stechler & Latz, 1966). In this case, infants may have an undeveloped hierarchical system of what is meaningful and therefore use the features that are most meaningful to them, which would be less informative, low-level salient features.
Infants may choose to look to meaningful information that is appropriate for their level of comprehension. Past studies have found a similar self-selection in infants for visual information that is neither too complex nor too simple (Kidd, Piantadosi, & Aslin, 2012), which has been used to explain novelty/familiarity preferences in infants (Kidd et al., 2012; Kidd, Piantadosi, & Aslin, 2014). Kidd et al. (2012) suggest several explanations of this phenomenon, including difficulty of encoding, efficiency of computational resources, and selection of the optimal level of complexity for learning. Despite uncertainty about the mechanism, infants’ attention based on a hierarchical organization of meaning seems compatible with past work indicating that infants choose to attend to information that exists within a cognitive ‘Goldilocks’ zone.

If meaning is organized hierarchically based on the observer’s individual experiences, there are interesting implications for how this applies to cultural differences. For instance, culture-specific practices or tools may lead to different comprehension of meaning in a scene. One such difference is observed between Swedish and Chinese infants, whose predictive looks to the mouth during eating actions depend on whether spoons or chopsticks are used (Green et al., 2016). It could be argued that differences in infants’ attention to eating utensils come from a more general deficit in cognitive reasoning and inexperience with human actions rather than a cultural difference. However, adults also display cultural differences in understanding meaning when comparing naive and experienced observers. Comprehension of video clips was poorer for a group of adults from Turkey with no prior media experience compared to adults with either low or high media experience (Ildirar & Schwan, 2015).
Without a developed hierarchy of meaning for media conventions, comprehension of scenes that involved cuts led to failures in interpreting temporal and spatial relations between scenes (e.g., cuts to a different angle of an animal misinterpreted as the animal rotating, or cuts from establishing shots outside to inside not being connected as the same location or time). These findings suggest that experience acquired from everyday visual experiences beyond formal learning environments supports a more developed hierarchy of meaning. Day-to-day experiences allow for transmission of culture-specific concepts indicative of socio-cultural learning processes for visual attention development. While a hierarchical account of meaning of visual features seems to fit well with past and current findings, this perspective is speculative. Closer examination, perhaps cross-culturally, would help to test this account.

Beyond theories of attention, the studies in this dissertation also made methodological contributions to the study of visual attention. Chapter 2 demonstrated that variability in stimuli should be the norm. After finding no global effect of age on attention to faces or salient regions, further examination of stimuli revealed patterns within stimuli that emerged when 10-second windows were analyzed independently. Variations within and between stimuli were striking and revealed that attention was incredibly complex. Studies that use a single stimulus do not accurately represent the stimulus space in which the developmental phenomena occur. Furthermore, given the distribution of age correlations during 10-second windows that fell within the 95% confidence level in Figures 2.4 and 2.6, studies that continue to use single or short-duration stimuli are prone to find null effects even if there are developmental changes in prioritization of meaning.
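The windowed age correlations discussed above can be sketched in code. This is a minimal illustration under stated assumptions, not the analysis code used in the dissertation: the function name `windowed_age_correlations`, the participants-by-samples array layout, the default 30 Hz sampling rate, and the choice of a Pearson correlation are all hypothetical choices made for the example.

```python
import numpy as np


def windowed_age_correlations(face_looking, ages, sample_rate_hz=30.0, window_s=10.0):
    """Correlate age with the proportion of face looking in each time window.

    face_looking: 2-D array (participants x samples); 1 = gaze on a face,
                  0 = gaze elsewhere, NaN = missing sample (assumed layout).
    ages:         1-D array of participant ages in any consistent unit.
    Returns one Pearson r per complete window; NaN where a window has fewer
    than three usable participants or no variance.
    """
    face_looking = np.asarray(face_looking, dtype=float)
    ages = np.asarray(ages, dtype=float)
    samples_per_window = int(round(sample_rate_hz * window_s))
    n_windows = face_looking.shape[1] // samples_per_window

    correlations = np.full(n_windows, np.nan)
    for w in range(n_windows):
        start = w * samples_per_window
        chunk = face_looking[:, start:start + samples_per_window]

        # Per-participant proportion of valid samples that landed on a face.
        valid_counts = np.sum(~np.isnan(chunk), axis=1)
        on_face = np.nansum(chunk, axis=1)
        proportion = np.where(
            valid_counts > 0, on_face / np.maximum(valid_counts, 1), np.nan
        )

        usable = ~np.isnan(proportion)
        if usable.sum() < 3:
            continue
        x, y = ages[usable], proportion[usable]
        if np.std(x) == 0 or np.std(y) == 0:
            continue
        correlations[w] = np.corrcoef(x, y)[0, 1]
    return correlations
```

Computing one correlation per window, rather than one per stimulus, is what allows opposite-signed age effects in different parts of a video to be seen instead of cancelling in a global average.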
Chapters 3 and 4 provided insight into how measures of attentional synchrony can be used to measure attention in complex, dynamic stimuli. Adult synchrony captures spatiotemporal variability in attention while also indexing development of visual attention. Because it is based on the eye movements of adult participants, it avoids the selection biases that researchers introduce by choosing which features to measure (e.g., faces). Most importantly, adult synchrony is a way of capturing what is meaningful in a scene without defining what meaning is. The consistent age-related increase in adult synchrony provides a powerful way to simultaneously test how different features account for the changes in prioritization of meaningful information. A general trend in the visual attention literature towards understanding meaning by having adults inform researchers about what is meaningful draws a clear parallel between adult synchrony in the dissertation and the meaning maps described earlier. Although the stimuli are different for meaning maps and adult synchrony (images and videos, respectively), the contributions of both may indicate that less informative feature-based approaches will lose popularity in favor of understanding more complex influences, like meaning.

I also extended the synchrony measure approach to measure attention to hands and hand actions. Basic descriptive statistics, such as proportion of time looking at faces, cannot account for the spatiotemporal variability in face looking over a stimulus. However, the new hand synchrony and hand-object synchrony measures did predict changes in adult synchrony. In determining other features that may influence attention, I offer two considerations: (1) the meaning that the feature cues and (2) the times at which the feature is most meaningful.
Across the three studies, findings suggest that the development of visual attention in dynamic scenes involves improvements in moment-to-moment prioritization of attention to meaningful information. Evidence from Chapter 4 about the mechanism of change is inconclusive; however, I propose that perceptual experiences reveal regularities about actions that improve comprehension. While still technically compatible with theoretical approaches that categorize the influence of bottom-up and top-down features, I suggest that moving away from features as singular stable factors towards consideration of when and where these features convey meaning is a step towards a more comprehensive understanding of visual attention development.

References

Ambrosini, E., Reddy, V., De Looper, A., Costantini, M., Lopez, B., & Sinigaglia, C. (2013). Looking ahead: Anticipatory gaze and motor ability in infancy. PLoS ONE, 8(7), e67916. doi: 10.1371/journal.pone.0067916
Amso, D., Haas, S., & Markant, J. (2014). An eye tracking investigation of developmental change in bottom-up attention orienting to faces in cluttered natural scenes. PLoS ONE, 9, 1–7. doi: 10.1371/journal.pone.0085701
Anderson, D. R., Lorch, E. P., Field, D. E., & Sanders, J. (1981). The effects of TV program comprehensibility on preschool children’s visual attention to television. Child Development, 20, 151–157. doi: 10.2307/1129224
Aring, E., Grönlund, M. A., Hellström, A., & Ygge, J. (2007). Visual fixation development in children. Graefe’s Archive for Clinical and Experimental Ophthalmology, 245(11), 1659–1665.
Açik, A., Sarwary, A., Schultze-Kraft, R., Onat, S., & König, P. (2010). Developmental changes in natural viewing behavior: Bottom-up and top-down differences between children, young adults and older adults. Frontiers in Psychology, 1. doi: 10.3389/fpsyg.2010.00207
Ballard, D. H., & Hayhoe, M. M. (2009). Modelling the role of task in the control of gaze. Visual Cognition, 17, 1185–1204.
doi: 10.1080/13506280902978477
Barrett, T. M., Davis, E. F., & Needham, A. (2007). Learning about tools in infancy. Developmental Psychology, 43(2), 352.
Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48. doi: 10.18637/jss.v067.i01
Bertenthal, B., & Boyer, T. (2012). Developmental changes in infants’ visual attention to pointing. Journal of Vision, 12(9), 480–480.
Birmingham, E., Bischof, W. F., & Kingstone, A. (2008). Social attention and real-world scenes: The roles of action, competition and social content. Quarterly Journal of Experimental Psychology, 61, 986–998. doi: 10.1080/17470210701410375
Birmingham, E., Bischof, W. F., & Kingstone, A. (2009). Saliency does not account for fixations to eyes within social scenes. Vision Research, 49, 2992–3000. doi: 10.1016/j.visres.2009.09.014
Borji, A., & Itti, L. (2013). State-of-the-art in visual attention modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35, 185–207. doi: 10.1109/TPAMI.2012.89
Bracci, S., Ietswaart, M., Peelen, M. V., & Cavina-Pratesi, C. (2010). Dissociable neural responses to hands and non-hand body parts in human left extrastriate visual cortex. Journal of Neurophysiology, 103(6), 3389–3397.
Bronson, G. W. (1994). Infants’ transitions toward adult-like scanning. Child Development, 65, 1243–1261. doi: 10.2307/1131497
Bruce, V. (1993). What the human face tells the human mind: Some challenges for the robot-human interface. Advanced Robotics, 8, 341–355. doi: 10.1163/156855394X00149
Cannon, E. N., & Woodward, A. L. (2012). Infants generate goal-based action predictions. Developmental Science, 15(2), 292–298.
Cashon, C. H., & Cohen, L. B. (2004). Beyond U-shaped development in infants’ processing of faces: An information-processing account. Journal of Cognition and Development, 5, 59–80. doi: 10.1207/s15327647jcd0501_4
Castelhano, M. S., Mack, M. L., & Henderson, J.
M. (2009). Viewing task influences eye movement control during active scene perception. Journal of Vision, 9, 1–15. doi: 10.1167/9.3.6
Colombo, J. (2001). The development of visual attention in infancy. Annual Review of Psychology, 52, 337–367. doi: 10.1146/annurev.psych.52.1.337
Colombo, J., Mitchell, D. W., Coldren, J. T., & Atwater, J. D. (1990). Discrimination learning during the first year: Stimulus and positional cues. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16, 98–109. doi: 10.1037/0278-7393.16.1.98
Coutrot, A., & Guyader, N. (2014). How saliency, faces, and sound influence gaze in dynamic social scenes. Journal of Vision, 14, 1–17. doi: 10.1167/14.8.5
Darby, K. P., Deng, S. W., Walther, D. B., & Sloutsky, V. M. (2021). The development of attention to objects and scenes: From object-biased to unbiased. Child Development, 92(3), 1173–1186.
Deak, G. O., Krasno, A. M., Triesch, J., Lewis, J., & Sepeta, L. (2014). Watch the hands: Infants can learn to follow gaze by seeing adults manipulate objects. Developmental Science, 17(2), 270–281.
de Villiers Rader, N., & Zukow-Goldring, P. (2010). How the hands control attention during early word learning. Gesture, 10(2-3), 202–221.
Dorr, M., Martinetz, T., Gegenfurtner, K. R., & Barth, E. (2010). Variability of eye movements when viewing dynamic natural scenes. Journal of Vision, 10, 28. doi: 10.1167/10.10.28
Farmer, H., Bevan, C., Green, D., Rose, M., Cater, K., & Stanton Fraser, D. (2021). Did you see what I saw?: Comparing attentional synchrony during 360° video viewing in head mounted display and tablets. Journal of Experimental Psychology: Applied, 27(2), 324.
Farzin, F., Hou, C., & Norcia, A. M. (2012). Piecing it together: Infants’ neural responses to face and object structure. Journal of Vision, 12, 6–6. doi: 10.1167/12.13.6
Farzin, F., Rivera, S. M., & Whitney, D. (2011). Time crawls: The temporal resolution of infants’ visual attention.
Psychological Science, 22, 1004–1010. doi: 10.1177/0956797611413291
Fausey, C. M., Jayaraman, S., & Smith, L. B. (2016). From faces to hands: Changing visual input in the first two years. Cognition, 152, 101–107. doi: 10.1016/j.cognition.2016.03.005
Filippi, C. A., & Woodward, A. L. (2016). Action experience changes attention to kinematic cues. Frontiers in Psychology, 7, 19. doi: 10.3389/fpsyg.2016.00019
Flanagan, J. R., & Johansson, R. S. (2003). Action plans used in action observation. Nature, 424(6950), 769–771. doi: 10.1038/nature01861
Foulsham, T., Walker, E., & Kingstone, A. (2011). The where, what and when of gaze allocation in the lab and the natural environment. Vision Research, 51, 1920–1931. doi: 10.1016/j.visres.2011.07.002
Franchak, J. M., Heeger, D. J., Hasson, U., & Adolph, K. E. (2016). Free viewing gaze behavior in infants and adults. Infancy, 21, 262–287. doi: 10.1111/infa.12119
Franchak, J. M., & Kadooka, K. (2022). Age differences in orienting to faces in dynamic scenes depend on face centering, not visual saliency. Infancy.
Franchak, J. M., Kretch, K. S., & Adolph, K. E. (2018). See and be seen: Infant–caregiver social looking during locomotor free play. Developmental Science, 21, e12626. doi: 10.1111/desc.12626
Frank, M. C., Amso, D., & Johnson, S. P. (2014). Visual search and attention to faces during early infancy. Journal of Experimental Child Psychology, 118, 13–26. doi: 10.1016/j.jecp.2013.08.012
Frank, M. C., Vul, E., & Johnson, S. P. (2009). Development of infants’ attention to faces during the first year. Cognition, 110, 160–170. doi: 10.1016/j.cognition.2008.11.010
Frank, M. C., Vul, E., & Saxe, R. (2012). Measuring the development of social attention using free-viewing. Infancy, 17, 355–375. doi: 10.1111/j.1532-7078.2011.00086.x
Gannon, E. T., & Grubb, M. A. (2022). How filmmakers guide the eye: The effect of average shot length on intersubject attentional synchrony.
Psychology of Aesthetics, Creativity, and the Arts, 16(1), 125.
Gluckman, M., & Johnson, S. P. (2013). Attentional capture by social stimuli in young infants. Frontiers in Psychology, 4. doi: 10.3389/fpsyg.2013.00527
Goldstein, R. B., Woods, R. L., & Peli, E. (2007). Where people look when watching movies: Do all viewers look at the same place? Computers in Biology and Medicine, 37(7), 957–964.
Green, D., Li, Q., Lockman, J. J., & Gredebäck, G. (2016). Culture influences action understanding in infancy: Prediction of actions performed with chopsticks and spoons in Chinese and Swedish infants. Child Development, 87(3), 736–746. doi: 10.1111/cdev.12500
Hard, B. M., Meyer, M., & Baldwin, D. (2019). Attention reorganizes as structure is detected in dynamic action. Memory & Cognition, 47(1), 17–32. doi: 10.3758/s13421-018-0847-z
Harel, J., Koch, C., & Perona, P. (2006). Graph-based visual saliency. In Proceedings of the 19th International Conference on Neural Information Processing Systems (pp. 545–552). Cambridge, MA: MIT Press. doi: 10.7551/mitpress/7503.003.0073
Hart, B. M., Vockeroth, J., Schumann, F., Bartl, K., Schneider, E., König, P., . . . Einhäuser, W. (2009). Gaze allocation in natural stimuli: Comparing free exploration to head-fixed viewing conditions. Visual Cognition, 17, 1132–1158. doi: 10.1080/13506280902812304
Hayes, A. F. (2018). Partial, conditional, and moderated moderated mediation: Quantification, inference, and interpretation. Communication Monographs, 85(1), 4–40.
Hayhoe, M. M., Shrivastava, A., Mruczek, R., & Pelz, J. B. (2003). Visual memory and motor planning in a natural task. Journal of Vision, 3(1), 6–6.
Helo, A., Pannasch, S., Sirri, L., & Rämä, P. (2014). The maturation of eye movement behavior: Scene viewing characteristics in children and adults. Vision Research, 103, 83–91. doi: 10.1016/j.visres.2014.08.006
Helo, A., Rämä, P., Pannasch, S., & Meary, D. (2016).
Eye movement patterns and visual attention during scene viewing in 3- to 12-month-olds. Visual Neuroscience, 33, 1–7. doi: 10.1017/S0952523816000110
Helo, A., van Ommen, S., Pannasch, S., Danteny-Dordoigne, L., & Rämä, P. (2017). Influence of semantic consistency and perceptual features on visual attention during scene viewing in toddlers. Infant Behavior and Development, 49, 248–266. doi: 10.1016/j.infbeh.2017.09.008
Henderson, J. M. (2017). Gaze control as prediction. Trends in Cognitive Sciences, 21, 15–23. doi: 10.1016/j.tics.2016.11.003
Henderson, J. M., Brockmole, J. R., Castelhano, M. S., & Mack, M. (2007). Visual saliency does not account for eye movements during visual search in real-world scenes. In R. van Gompel, M. Fischer, W. Murray, & R. Hill (Eds.), Eye movements: A window on mind and brain (pp. 537–562). Oxford: Elsevier. doi: 10.1016/b978-008044980-7/50027-6
Henderson, J. M., & Hayes, T. R. (2018). Meaning guides attention in real-world scene images: Evidence from eye movements and meaning maps. Journal of Vision, 18, 1–18. doi: 10.1167/18.6.10
Ildirar, S., & Schwan, S. (2015). First-time viewers’ comprehension of films: Bridging shot transitions. British Journal of Psychology, 106(1), 133–151.
Itti, L., & Baldi, P. (2005). A principled approach to detecting surprising events in video. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (pp. 631–637). IEEE.
Itti, L., & Baldi, P. (2009). Bayesian surprise attracts human attention. Vision Research, 49, 1295–1306. doi: 10.1016/j.visres.2008.09.007
Itti, L., & Koch, C. (2000). A saliency-based search mechanism for overt and covert shifts of visual attention. Vision Research, 40, 1489–1506. doi: 10.1016/s0042-6989(99)00163-7
Itti, L., Koch, C., & Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20, 1254–1259.
doi: 10.1109/34.730558
Jing, M., Kadooka, K., Franchak, J., & Kirkorian, H. L. (in press). The effect of narrative coherence and visual salience on children’s and adults’ gaze while watching video. Journal of Experimental Child Psychology.
Jost, T., Ouerhani, N., Wartburg, R. V., Müri, R., & Hügli, H. (2005). Assessing the contribution of color in visual attention. Computer Vision and Image Understanding, 100, 107–123. doi: 10.1016/j.cviu.2004.10.009
Kadooka, K., & Franchak, J. M. (2020). Developmental changes in infants’ and children’s attention to faces and salient regions vary across and within video stimuli. Developmental Psychology, 56(11), 2065.
Kadooka, K., & Franchak, J. M. (in prep). Attention to hands during manual actions account for developmental increases in attentional synchrony.
Kanakogi, Y., & Itakura, S. (2011). Developmental correspondence between action prediction and motor ability in early infancy. Nature Communications, 2(1), 1–6.
Kidd, C., Piantadosi, S. T., & Aslin, R. N. (2012). The Goldilocks effect: Human infants allocate attention to visual sequences that are neither too simple nor too complex. PLoS ONE, 7(5), e36399.
Kidd, C., Piantadosi, S. T., & Aslin, R. N. (2014). The Goldilocks effect in infant auditory attention. Child Development, 85(5), 1795–1804.
Kirkorian, H. L., & Anderson, D. R. (2017). Anticipatory eye movements while watching continuous action across shots in video sequences: A developmental study. Child Development, 88(4), 1284–1301.
Kirkorian, H. L., & Anderson, D. R. (2018). Effect of sequential video shot comprehensibility on attentional synchrony: A comparison of children and adults. Proceedings of the National Academy of Sciences, 115, 9867–9874. doi: 10.1073/pnas.1611606114
Kirkorian, H. L., Anderson, D. R., & Keen, R. (2012). Age differences in online processing of video: An eye movement study. Child Development, 83, 497–507. doi: 10.1111/j.1467-8624.2011.01719.x
Klatzky, R. L., Pellegrino, J.
W., McCloskey, B. P., & Doherty, S. (1989). Can you squeeze a tomato? The role of motor representations in semantic sensibility judgments. Journal of Memory and Language, 28(1), 56–77.
Klin, A., & Jones, W. (2008). Altered face scanning and impaired recognition of biological motion in a 15-month-old infant with autism. Developmental Science, 11, 40–46. doi: 10.1111/j.1467-7687.2007.00608.x
Klin, A., Jones, W., Schultz, R., Volkmar, F., & Cohen, D. (2002). Visual fixation patterns during viewing of naturalistic social situations as predictors of social competence in individuals with autism. Archives of General Psychiatry, 59, 809–816. doi: 10.1001/archpsyc.59.9.809
Kochukhova, O., & Gredebäck, G. (2010). Preverbal infants anticipate that food will be brought to the mouth: An eye tracking study of manual feeding and flying spoons. Child Development, 81(6), 1729–1738.
Kwon, M.-K., Setoodehnia, M., Baek, J., Luck, S. J., & Oakes, L. M. (2016). The development of visual search in infancy: Attention to faces versus salience. Developmental Psychology, 52, 537–555. doi: 10.1037/dev0000080
Land, M. F. (2009). Vision, eye movements, and natural behavior. Visual Neuroscience, 26, 51–62. doi: 10.1017/S0952523808080899
Land, M. F., & Fernald, R. D. (1992). The evolution of eyes. Annual Review of Neuroscience, 15, 1–29. doi: 10.1146/annurev.ne.15.030192.000245
Land, M. F., & McLeod, P. (2000). From eye movements to actions: Batsmen hit the ball. Nature Neuroscience, 3, 1340–1345. doi: 10.1038/81887
Libertus, K., & Hauf, P. (2017). Motor skills and their foundational role for perceptual, social, and cognitive development (Vol. 8). Frontiers Media SA.
Libertus, K., Landa, R. J., & Haworth, J. L. (2017). Development of attention to faces during the first 3 years: Influences of stimulus type. Frontiers in Psychology, 8. doi: 10.3389/fpsyg.2017.01976
Libertus, K., & Needham, A. (2010). Teach to reach: The effects of active vs.
passive reaching experiences on action and perception. Vision Research, 50(24), 2750–2757. doi: 10.1017/S0952523808080899
Lorch, E. P., & Castle, V. J. (1997). Preschool children’s attention to television: Visual attention and probe response times. Journal of Experimental Child Psychology, 66, 111–127. doi: 10.1006/jecp.1997.2372
Mahdi, A., Su, M., Schlesinger, M., & Qin, J. (2017). A comparison study of saliency models for fixation prediction on infants and adults. IEEE Transactions on Cognitive and Developmental Systems, 10, 485–498. doi: 10.1109/tcds.2017.2696439
Mital, P. K., Smith, T. J., Hill, R. L., & Henderson, J. M. (2011). Clustering of gaze during dynamic scene viewing is predicted by motion. Cognitive Computation, 3, 5–24. doi: 10.1007/s12559-010-9074-z
Monroy, C., Gerson, S., & Hunnius, S. (2017). Infants’ motor proficiency and statistical learning for actions. Frontiers in Psychology, 8, 2174. doi: 10.3389/fpsyg.2017.02174
Moore, C. G., & Müller, S. (2014). Transfer of expert visual anticipation to a similar domain. Quarterly Journal of Experimental Psychology, 67(1), 186–196. doi: 10.1080/17470218.2013.798003
Mulder, H., Oudgenoeg-Paz, O., Verhagen, J., van der Ham, I. J., & Van der Stigchel, S. (2022). Infant walking experience is related to the development of selective attention. Journal of Experimental Child Psychology, 220, 105425.
Napier, J. R. (1956). The prehensile movements of the human hand. The Journal of Bone and Joint Surgery. British Volume, 38(4), 902–913.
Needham, A., Barrett, T., & Peterman, K. (2002). A pick-me-up for infants’ exploratory skills: Early simulated experiences reaching for objects using ‘sticky mittens’ enhances young infants’ object exploration skills. Infant Behavior and Development, 25(3), 279–295. doi: 10.1016/S0163-6383(02)00097-8
Oakes, L. M., & Amso, D. (2018). The development of visual attention. In J.
Wixted (Ed.), Stevens’ Handbook of Experimental Psychology and Cognitive Neuroscience (4th ed., Vol. 4, pp. 1–33). New York: Wiley. doi: 10.1002/9781119170174.epcn401
Parkhurst, D. J., & Niebur, E. (2003). Scene content selected by active vision. Spatial Vision, 16(2), 125–154.
Parkhurst, D. J., & Niebur, E. (2004). Texture contrast attracts overt visual attention in natural scenes. European Journal of Neuroscience, 19, 783–789. doi: 10.1111/j.0953-816x.2003.03183.x
Pascalis, O., de Haan, M., & Nelson, C. A. (2002). Is face processing species-specific during the first year of life? Science, 296, 1321–1323. doi: 10.1126/science.1070223
Paus, T., Babenko, V., & Radil, T. (1990). Development of an ability to maintain verbally instructed central gaze fixation studied in 8- to 10-year-old children. International Journal of Psychophysiology, 10(1), 53–61.
Pempek, T. A., Kirkorian, H. L., Richards, J. E., Anderson, D. R., Lund, A. F., & Stevens, M. (2010). Video comprehensibility and attention in very young children. Developmental Psychology, 46, 1283–1293. doi: 10.1037/a0020614
Pereira, E. J., Birmingham, E., & Ristic, J. (2019). The eyes do not have it after all? Attention is not automatically biased towards faces and eyes. Psychological Research, 1–17. doi: 10.1007/s00426-018-1130-4
Peters, R. J., Iyer, A., Itti, L., & Koch, C. (2005). Components of bottom-up gaze allocation in natural images. Vision Research, 45, 2397–2416. doi: 10.1016/j.visres.2005.03.019
R Core Team. (2017). R: A language and environment for statistical computing [Computer software manual]. Vienna, Austria. Retrieved from https://www.R-project.org/
Rehrig, G., Peacock, C. E., Hayes, T. R., Henderson, J. M., & Ferreira, F. (2020). Where the action could be: Speakers look at graspable objects and meaningful scene regions when describing potential actions. Journal of Experimental Psychology: Learning, Memory, and Cognition, 46(9), 1659.
Reinagel, P., & Zador, A. M. (1999).
Natural scene statistics at the centre of gaze. Network: Computation in Neural Systems, 10, 341–350. doi: 10.1088/0954-898x/10/4/304
Rideout, V. (2017). The common sense census: Media use by kids age zero to eight. San Francisco, CA: Common Sense Media.
Rider, A. T., Coutrot, A., Pellicano, E., Dakin, S. C., & Mareschal, I. (2018). Semantic content outweighs low-level saliency in determining children’s and adults’ fixation of movies. Journal of Experimental Child Psychology, 166, 293–309. doi: 10.1016/j.jecp.2017.09.002
Rothkopf, C. A., Ballard, D. H., & Hayhoe, M. M. (2007). Task and context determine where you look. Journal of Vision, 7, 1–20. doi: 10.1167/7.14.16
Rotman, G., Troje, N. F., Johansson, R. S., & Flanagan, J. R. (2006). Eye movements when observing predictable and unpredictable actions. Journal of Neurophysiology, 96(3), 1358–1369. doi: 10.1152/jn.00227.2006
Ruff, H. A., Capozzoli, M., & Weissberg, R. (1998). Age, individuality, and context as factors in sustained visual attention during the preschool years. Developmental Psychology, 34, 454–464. doi: 10.1037/0012-1649.34.3.454
Saffran, J. R., & Kirkham, N. Z. (2018). Infant statistical learning. Annual Review of Psychology, 69, 181.
Sailer, U., Flanagan, J. R., & Johansson, R. S. (2005). Eye–hand coordination during learning of a novel visuomotor task. Journal of Neuroscience, 25, 8833–8842. doi: 10.1523/jneurosci.2658-05.2005
Schiller, P. H. (1998). The neural control of visually guided eye movements. In Cognitive neuroscience of attention (pp. 13–60). Psychology Press.
Shepherd, S. V. (2010). Following gaze: Gaze-following behavior as a window into social cognition. Frontiers in Integrative Neuroscience, 4, 5.
Shepherd, S. V., Steckenfinger, S. A., Hasson, U., & Ghazanfar, A. A. (2010). Human-monkey gaze correlations reveal convergent and divergent patterns of movie viewing. Current Biology, 20, 649–656. doi: 10.1016/j.cub.2010.02.032
Sherman, R. A., & Serfass, D. G. (2015).
The comprehensive approach to analyzing multivariate constructs. Journal of Research in Personality, 54, 40–50. doi: 10.1016/j.jrp.2014.05.002
Smith, L. B., Jayaraman, S., Clerkin, E., & Yu, C. (2018). The developing infant creates a curriculum for statistical learning. Trends in Cognitive Sciences, 22(4), 325–336.
Smith, T. J., & Mital, P. K. (2013). Attentional synchrony and the influence of viewing task on gaze behavior in static and dynamic scenes. Journal of Vision, 13, 1–24. doi: 10.1167/13.8.16
Sommerville, J. A., Woodward, A. L., & Needham, A. (2005). Action experience alters 3-month-old infants’ perception of others’ actions. Cognition, 96(1), B1–B11. doi: 10.1016/j.cognition.2004.07.004
Spelke, E. S. (1990). Principles of object perception. Cognitive Science, 14(1), 29–56.
Stechler, G., & Latz, E. (1966). Some observations on attention and arousal in the human infant. Journal of the American Academy of Child Psychiatry.
Stoesz, B. M., & Jakobson, L. S. (2014). Developmental changes in attention to faces and bodies in static and dynamic scenes. Frontiers in Psychology, 5, 193. doi: 10.3389/fpsyg.2014.00193
Tatler, B. W., Hayhoe, M. M., Land, M. F., & Ballard, D. H. (2011). Eye guidance in natural vision: Reinterpreting salience. Journal of Vision, 11, 1–23. doi: 10.1167/11.5.5
Tomasello, M., Carpenter, M., & Liszkowski, U. (2007). A new look at infant pointing. Child Development, 78(3), 705–722. doi: 10.1111/j.1467-8624.2007.01025.x
Torralba, A., Oliva, A., Castelhano, M. S., & Henderson, J. M. (2006). Contextual guidance of eye movements and attention in real-world scenes: The role of global features in object search. Psychological Review, 113, 766–786. doi: 10.1037/0033-295X.113.4.766
Tseng, P., Bridgeman, B., & Juan, C.-H. (2012). Take the matter into your own hands: A brief review of the effect of nearby-hands on visual processing. Vision Research, 72, 74–77.
Tseng, P.-H., Carmi, R., Cameron, I. G., Munoz, D. P., & Itti, L. (2009).
Quantifying center bias of observers in free viewing of dynamic natural scenes. Journal of Vision, 9 , 1–16. doi: 10.1167/12.13.3 128 van Renswoude, D. R., Visser, I., Raijmakers, M. E., Tsang, T., & Johnson, S. P. (2019). Real-world scene perception in infants: What factors guide attention allocation? Infancy, 24 (5), 693–717. Veer, I. M., Luyten, H., Mulder, H., van Tuijl, C., & Sleegers, P. J. (2017). Selective attention relates to the development of executive functions in 2, 5-to 3-year-olds: A longitudinal study. Early childhood research quarterly, 41 , 84–94. Võ, M. L.-H., & Henderson, J. M. (2009). Does gravity matter? effects of semantic and syntactic inconsistencies on the allocation of attention during scene perception. Journal of Vision, 9 (3), 24–24. Võ, M. L.-H., Smith, T. J., Mital, P. K., & Henderson, J. M. (2012). Do the eyes really have it? Dynamic allocation of attention when viewing moving faces. Journal of Vision, 12 , 1-14. doi: 10.1167/14.8.5 Wang, H. X., Freeman, J., Merriam, E. P., Hasson, U., & Heeger, D. J. (2012). Temporal eye movement strategies during naturalistic viewing. Journal of Vision, 12 , 1–27. doi: 10.1167/12.1.16 Wartella, E., Richert, R. A., & Robb, M. B. (2010). Babies, television and videos: How did we get here? Developmental Review , 30 , 116–127. doi: 10.1016/j.dr.2010.03.008 Wass, S. V., Forssman, L., & Leppänen, J. (2014). Robustness and precision: How data quality may influence key dependent variables in infant eye-tracker analyses. Infancy, 19 , 427–460. doi: 10.1111/infa.12055 Wass, S. V., & Smith, T. J. (2015). Visual motherese? Signal-to-noise ratios in toddlerdirected television. Developmental Science, 18 , 24–37. doi: 10.1111/desc.12156 Wass, S. V., Smith, T. J., & Johnson, M. H. (2013). Parsing eye-tracking data of variable quality to provide accurate fixation duration estimates in infants and adults. Behavior Research Methods, 45 , 229–250. doi: 10.3758/s13428-012-0245-6 Westheimer, G. (1982). 
The spatial grain of the perifoveal visual field. Vision Research, 22 , 157–162. doi: 10.1016/0042-6989(82)90177-8 Williams, A. M., Ward, P., Knowles, J. M., & Smeeton, N. J. (2002). Anticipation skill in a real-world task: measurement, training, and transfer in tennis. Journal of Experimental Psychology: Applied , 8 (4), 259. doi: 10.1037//1076-898x.8.4.259 Woodward, A. L. (1998). Infants selectively encode the goal object of an actor’s reach. Cognition, 69 (1), 1–34. Woodward, A. L. (2009). Infants’ grasp of others’ intentions. Current directions in psychological science, 18 (1), 53–57. Yarbus, A. L. (1967). Eye movements and vision. New York: Plenum. Yoshida, H., & Fausey, C. M. (2019). Visual objects as they are encountered by young language learners. International Handbook of Language Acquisition, 115–127. 129 Yu, C., & Smith, L. B. (2011). What you learn is what you see: using eye movements to study infant cross-situational word learning. Developmental Science, 14 (2), 165–180. Yu, C., & Smith, L. B. (2013). Joint attention without gaze following: Human infants and their parents coordinate visual attention to objects through eye-hand coordination. PLoS ONE , 8 , e79659. doi: 10.1371/journal.pone.0079659 Zacks, J. M., & Tversky, B. (2001). Event structure in perception and conception. Psychological bulletin, 127 (1), 3. Zelinsky, G. J., & Bisley, J. W. (2015). The what, where, and why of priority maps and their interactions with visual working memory. Annals of the new York Academy of Sciences, 1339 (1), 154–164. 130