The Anticipatory and Task-Driven Nature of Visual Perception. Sebo Uithol, Katherine L. Bryant, Ivan Toni, Rogier B. Mars. Cerebral Cortex, bhab163, September 7, 2021. https://doi.org/10.1093/cercor/bhab163
Abstract: Humans have a remarkable capacity to arrange and rearrange perceptual input according to different categorizations. This raises the question of whether categorization is exclusively a higher visual or amodal process, or whether categorization processes influence early visual areas as well. To investigate this, we scanned healthy participants in a magnetic resonance imaging scanner during a conceptual decision task in which participants had to answer questions about upcoming images of animals. Early visual cortices (V1 and V2) contained information about the current visual input, about the granularity of the forthcoming categorical decision, and about perceptual expectations regarding the upcoming visual stimulus. The middle temporal gyrus, the anterior temporal lobe, and the inferior frontal gyrus were also involved in the categorization process, constituting an attention and control network that modulates perceptual processing. These findings provide further evidence that early visual processes are driven by conceptual expectations and task demands.
Keywords: visual categorization, MVPA, fMRI, conceptual knowledge
Discussion
We have shown that the nature of a stimulus (dogs or frogs) can be decoded from the fMRI data, primarily in left and right V1 and V2 and the right fusiform gyrus. Interestingly, the decoding accuracy was strongly dependent on the viewing task: decoding of image perception following superordinate-level questions was significantly weaker than following basic-level questions. This suggests that activation in early visual areas is driven not solely by perceptual input, but by a combination of the input and task properties, in line with “active vision” theories. The stronger decoding accuracy following basic-level questions may be partly driven by the more concrete predictions these questions elicit, but not entirely, since the cortical surface from which we can decode frogs and dogs is much larger than the cortical surface from which we can validate predictions.
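The decoding analyses referred to here follow standard MVPA logic: a linear classifier is trained on voxel patterns and evaluated with cross-validation, separately per region and per task condition. Below is a minimal sketch of that logic, assuming a linear support vector machine and synthetic data in place of the actual trial-wise fMRI estimates; all variable names and dimensions are illustrative, not taken from the study.

```python
# Minimal MVPA decoding sketch (illustrative only; not the authors' pipeline).
# Synthetic voxel patterns stand in for real trial-wise fMRI estimates.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials, n_voxels = 80, 200            # hypothetical dimensions
labels = rng.integers(0, 2, n_trials)   # 0 = dog image, 1 = frog image

# Simulated patterns: a small label-dependent signal plus noise.
patterns = rng.normal(size=(n_trials, n_voxels))
patterns[labels == 1, :20] += 0.5

# Cross-validated decoding within one region and one task condition;
# comparing mean accuracy across conditions (basic- vs. superordinate-level
# questions) is what reveals the task dependence described above.
scores = cross_val_score(LinearSVC(C=1.0), patterns, labels, cv=5)
print(f"decoding accuracy: {scores.mean():.2f} (chance = 0.50)")
```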
Previous work shows that task properties (i.e., physical vs. semantic judgments) have an impact on the processing of object stimuli at several cortical sites, including ventral temporal and prefrontal regions (Harel et al. 2014). The usability of a presented object (e.g., tool vs. nontool) has likewise been shown to modulate processing in the occipitotemporal cortex (Bracci et al. 2017). Similarly, Nastase et al. (2017) found differences in brain response for a taxonomic versus an ethological judgment task in multiple brain regions, including occipital areas. These studies all found task-dependent processing of visual information, but only outside the primary visual areas. We, however, did find task-dependent activation in primary visual areas. This may be because our study used fewer categories (only 2 categories in 2 different tasks) than previous studies, which drastically enhances statistical power.
We speculated that prefrontal areas, specifically the inferior frontal gyrus, would be involved in modulating the activity in both temporal and visual areas. Indeed, these areas all seem to contain information about the task and stimulus identity, as reflected in above-chance decoding accuracy in Analyses 1 and 3.
The primate is an inherently visual animal, which is reflected in its elaborate visual system, including the so-called dorsal (parietal) and ventral (temporal) streams. It has been argued that the ventral, temporal stream evolved to allow ever more abstract processing of the visual stimulus, which might provide the basis for our categorization behavior (Murray et al. 2019). In the ape and human lineages, this ability is more developed and possibly extended to multisensory information (Bryant et al. 2019). As such, we expected that a network of prefrontal, temporal, and visual areas would underlie our capacity to use conceptual knowledge to process visual input. The anterior temporal cortex and the middle temporal gyrus, both bimodal association areas, are known to be involved in categorical decisions (Patterson et al. 2007). Indeed, it was possible to decode the level of abstraction of the required processing of the stimulus in the ventral anterior temporal cortex and the middle temporal gyrus. The middle temporal gyrus result is particularly interesting, as it is close to the part of the temporal cortex that has most expanded and reorganized in the human brain compared with the macaque brain (Mars et al. 2018; Van Essen and Dierker 2007). The level of abstraction of the question itself could be decoded in a much larger set of cortical areas, including the inferior frontal cortex. Interestingly, these frontal and temporal areas are connected by specific sets of white matter fibers (i.e., the arcuate fasciculus and the inferior fronto-occipital fasciculus), some of which are particularly extended in the human lineage (Eichert et al. 2020). Our results suggest the involvement of these systems in tuning early visual processing for efficient task processing.
These results are in line with the framing of perception as a dynamic and task-driven process, tailored to the current needs of the cognitive system. Enactivist theories argue that cognition is not the representation of a pregiven world by a pregiven mind, but rather the enactment of a world and a mind on the basis of a history of the variety of actions that a being in the world performs. Within this view, perceptual capacities are embedded in a more encompassing biological, psychological, and cultural context (Varela et al. 1991). This active engagement with the environment is also suggested by more recent theoretical approaches to cognition (Hutto and Myin 2013; Myin and Degenaar 2014).
In line with this, we show that the early visual areas are tuned to those features in the environment that are relevant to the task at hand. The finding that the left inferior frontal cortex shows significantly different activation patterns for basic-level and superordinate-level judgment tasks suggests that the control this area exerts is not confined to behavioral control, but extends to perceptual processing as well (Higo et al. 2011). This could also explain the absence of a univariate effect in our comparison of basic-level and superordinate-level trials: when perception is not a neutral process but sense-making from the start, it would be equally task-driven in both conditions.
The finding of a behavioral difference suggests that the 2 decision processes (basic vs. superordinate) are not equally difficult. Superordinate categories are assumed to be less restricted in terms of visual input (e.g., the category “mammal” shows greater variance than the category “dog”). This increased difficulty is reflected in an increase in reaction time in the behavioral task. At the same time, the increased difficulty is reflected in a decrease of the cortical area from which the perceptual input could be decoded. Together with the fact that the increased difficulty is not reflected in gross brain activation (the univariate BOLD result) during the viewing epoch of the imaging task, this suggests that the cortical areas are involved equally strongly in both tasks, but differ in the nature of their involvement.
For efficient processing, it is likely that task-dependent tuning to perceptual features primes the visual system before the actual perception. Indeed, we found evidence for expectations of upcoming stimuli in V1 and V2: a classifier trained to distinguish dog from frog questions was able to distinguish dog from frog images as well. This anticipation goes beyond low-level features such as lines and orientations, as different images were used for each animal. This finding of modulation of V1 is in line with recent reports showing that processes in V1 are biased by semantic categories (Ester et al. 2019) as well as action intentions (Gallivan et al. 2011). The finding of stimulus anticipation in V1 is also in line with predictive coding accounts, which have recently gained attention (Clark 2013; Rao and Ballard 1999). The influence of the level of the question that we showed in V1 and V2 could partly be attributed to the presence of a concrete expectation of a dog or a frog in basic-level trials and the absence of such an expectation in superordinate trials, yet the cluster was far more extensive in the “levels” analysis than in the anticipation analysis.
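The cross-decoding logic behind this anticipation result — train a classifier on the question epoch, then test it on the image epoch — can be sketched as follows. The data are again synthetic and the epoch names hypothetical; above-chance transfer in such a scheme indicates that the question epoch already carries stimulus-specific information.

```python
# Cross-decoding sketch (illustrative): train on question-epoch patterns,
# test on image-epoch patterns. Synthetic data; not the authors' pipeline.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)
n_trials, n_voxels = 80, 200
labels = rng.integers(0, 2, n_trials)   # 0 = dog, 1 = frog

def simulate_epoch(signal):
    """Synthetic voxel patterns sharing a label-dependent component."""
    x = rng.normal(size=(n_trials, n_voxels))
    x[labels == 1, :20] += signal
    return x

question_patterns = simulate_epoch(0.5)  # anticipation epoch
image_patterns = simulate_epoch(0.5)     # perception epoch

clf = LinearSVC(C=1.0).fit(question_patterns, labels)
print(f"question-to-image transfer: {clf.score(image_patterns, labels):.2f}")
```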
One could argue that the decreased decoding accuracy in superordinate trials is a consequence of differences in viewing behavior. Since participants were allowed to explore the presented image freely, viewing behavior in the superordinate condition may have been more variable. We did not collect eye-tracking data, so we cannot quantify this potential difference, but the absence of a univariate result and the fact that the average difference in reaction time between the 2 conditions in the behavioral experiment was only 50 ms (note that the average saccade lasts 150–200 ms; Palmer 1999) suggest that the contribution of differences in viewing behavior to the decoding effect is likely to be limited. Additionally, if viewing behavior did play a role, one would expect this difference to be largest in the retinotopically organized occipital areas (e.g., V1). On the contrary, in our results, above-chance decoding is preserved in V1 and V2, and absent in more complex visual areas.
To probe the nature of the anticipation present in early visual areas, we cross-decoded between questions and images, and between questions and the gray screen presented between questions and images. The fact that we could not cross-validate questions and gray screens, but could cross-validate questions and images, suggests that the anticipation is a more complex phenomenon than mere sustained activity, and points toward more dynamical explanations (see Wolff et al. 2017 for an example of such a model for working memory).
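Schematically, this control analysis amounts to running the same transfer test over different train/test epoch pairs: if anticipation were mere sustained activity, a classifier trained on questions should transfer to the gray screen as well as to the images. A minimal sketch under the same synthetic assumptions as the examples above:

```python
# Epoch-pair transfer sketch (synthetic, illustrative). Only the image epoch
# shares the label-specific code with the questions here, mimicking the
# observed pattern: question-to-image transfer succeeds, question-to-gray fails.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(2)
n_trials, n_voxels = 80, 200
labels = rng.integers(0, 2, n_trials)

def simulate_epoch(signal):
    """Synthetic patterns; `signal` scales the shared label-specific code."""
    x = rng.normal(size=(n_trials, n_voxels))
    x[labels == 1, :20] += signal
    return x

epochs = {
    "question": simulate_epoch(0.5),
    "image": simulate_epoch(0.5),        # shares the code with the questions
    "gray_screen": simulate_epoch(0.0),  # no shared code: stays at chance
}

clf = LinearSVC(C=1.0).fit(epochs["question"], labels)
for name in ("image", "gray_screen"):
    print(f"question -> {name}: {clf.score(epochs[name], labels):.2f}")
```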
In all, these findings suggest that early visual areas do not process visual input in a neutral or passive way. Rather, their activation seems to be the result of anticipatory, task-driven processes, constituting an active engagement with the environment. These findings could have profound consequences for our understanding of how concepts are processed by the brain: apparently, a frog-as-a-frog is processed differently from a frog-as-an-amphibian. Even the activity in the left temporal pole, which has been suggested to accommodate task-independent concept representations (Patterson et al. 2007), shows task-dependent modulation in our study. Our findings are thus more in line with classical pragmatist (Sellars 1963) and more recent enactivist (Hutto and Myin 2013) theories, which suggest that the identity of a concept is (partly) grounded in the way the concept is used. This could provide a highly speculative, but interesting, new explanation for the reported dependence of conceptual knowledge on perceptual systems (Barsalou et al. 2003): concepts can be seen as perceptual capacities, driven by parietal and prefrontal control processes, rather than as internal representations. If concepts are indeed this use-based, the question moves from how concepts are represented (Patterson et al. 2007) to how they acquire the stable character that they have in their (communicative) use. Part of this stability may depend on invariant structures outside of the brain, for instance in social practices or other behavioral patterns.