Abstract
The visual system is required to compute objects from partial image structure so that figures can be segmented from their backgrounds. Although early clinical, behavioral, and modeling data suggested that such computations are performed pre-attentively, recent neurophysiological evidence suggests that surface filling-in is influenced by attention. In the present study we developed a variant of the classical Kanizsa illusory triangle to investigate whether voluntary attention modulates perceptual filling-in. Our figure consists of “pacmen” positioned at the tips of an illusory 6-point star and alternating in polarity such that two illusory triangles are implied to compete with one another within the figure. On each trial, observers were cued to attend to only one triangle, and then compared its lightness with a matching texture-defined triangle. We found that perceived lightness of the illusory shape depended on the polarity of pacmen framing the attended triangle, although the magnitude of this effect was weaker than when all inducers were of the same polarity. Our findings thus reveal that voluntary attention can influence lightness filling-in, and provide important data linking neurophysiological effects to phenomenology.
Introduction
The natural environment is cluttered with mutual occlusions among objects. Our ability to group common parts of goal-relevant objects while segmenting them from distracting objects is a critical visual function. Visual illusions provide powerful tools to probe the neural computations involved in such perceptual organisation. Since being popularised several decades ago 1, psychologists, neuroscientists, and philosophers have used Kanizsa figures to debate the mechanisms underlying the perception of occlusions, lightness and form. These figures give rise to a vivid percept of a shape emerging from sparse information, and thus demonstrate the visual system’s ability to interpolate structure from fragmented information, to perceive edges in the absence of luminance discontinuities, and to fill-in a shape’s surface properties 1.
Visual attention – focusing on some parts of an image – is known to modulate the perception of figure-ground relationships. Driver and Baylis 2 found that figure-ground organization of ambiguous figures was influenced by the region to which observers attended. Attended regions were interpreted as figures, whereas unattended regions were interpreted as ground. In addition to manipulating depth order, Tse 3 found that voluntary attention can influence surface filling-in. He developed a visual illusion that demonstrates that voluntary attention modulates the lightness of overlapping transparent surfaces. In one version of this illusion, three discs are positioned around a fixation point such that their borders overlap slightly. By covertly shifting attention to each disc in succession, the viewer can perceive the attended disc to be darker than the others, thus revealing that visual attention influences filling-in processes. In these cases, as well as others including Rubin’s classic face-vase figure or the Necker cube, visual attention modulates the appearance of surfaces that are defined physically. It remains unclear, however, whether visual attention modulates the visual system’s representation of illusory structure – i.e. structures that are generated by visual processing itself.
Psychophysical findings that visual attention influences figure-ground segmentation are consistent with the notion that visual attention acts on structure computed by border ownership cells. Qiu et al 4 investigated whether the responses of border-ownership cells in macaque area V2 are influenced by visual attention. Border-ownership cells signal which side is an object versus background for a given border. After determining the side preference of a sample of border-ownership cells, they displayed overlapping figures such that a border shared between two objects was positioned within the cells’ receptive fields. They found that some border-ownership cells signal figure-ground relationships in the absence of visual attention, but that the activity of other cells is modulated by visual attention. Qiu et al suggested that visual attention can influence figure-ground perception by modulating the gain of border-ownership cells, biasing perception more toward one percept over a competing one. Poort et al 5 further investigated the modulatory effects of attention on the responses of V1 and V4 neurons to object borders and surfaces. They found that, whereas the response of neurons to an object’s borders were relatively unaffected by attentional allocation, filling-in of an object’s surface was modulated by attention.
In contrast to the above studies suggesting an influence of attention over perceptual organization, other investigations suggested perception of illusory Kanizsa figures is pre-attentive. Mattingley et al 6 investigated whether Kanizsa figures were perceived by a stroke patient who experiences “extinction”, a phenomenal loss of awareness of contralesional stimuli when presented concurrently with ipsilesional stimuli. Despite gross lapses in attentional allocation to the contralesional space, this patient was nonetheless able to perceive Kanizsa figures whose illusory borders extended across the visual meridian. This finding suggests that filling-in is a pre-attentive process. Davis and Driver 7 drew a similar conclusion after having healthy participants perform a visual search task involving illusory figures. They found that the time taken to find an illusory figure in a display did not increase with additional search items, a hallmark of pre-attentive processing (but see reference 8). The role of visual attention in computing illusory objects thus remains somewhat contentious.
In the present study, we designed a multi-stable illusory figure to investigate whether observers’ visual attention can determine perceptual filling-in of illusory figures. Although an object’s boundary is typically defined by luminance, color, or texture contrast, borders can be perceived where no physical difference exists when spatially discontiguous visual features are interpolated to create illusory contours. The illusory edges of the sort produced by Kanizsa figures composed of isolated ‘pacman’ inducers elicit contour responses in V2 neurons 9 and produce a vivid percept of a shape. Of particular importance to the present study, the surface of Kanizsa figures are filled-in, resulting in an apparent surface lightness of greater contrast than its immediate surrounds 1,10. Kanizsa figures thus offer important insights into the mechanisms underlying perceptual organization by providing minimal conditions under which multiple phenomena arise.
Our new figure is composed of two spatially overlapping Kanizsa triangles (Fig. 1; Gaussian blurred to strengthen the effect, see also Fig. 2). On first inspection, the reader may perceive a grey star occluding six black and white discs. However, the black and white inducers are arranged so as to imply competing geometric forms. Whereas the black inducers imply an inverted triangle, the white inducers imply an upright triangle. These cues are equally physically salient so that, in the absence of attention, perceptual organization may be a random draw of any possible interpretation (e.g. a star, an upright triangle on top, or an inverted triangle on top). Thus, with a single image, we can manipulate observers’ attention and test whether there is a corresponding systematic change in filling-in.
We designed the stimulus in Figure 1 to control for the known lightness effect observed in the classic Kanizsa figure. In Figure 2A we show our novel stimulus alongside two other variants of the classic figure in which the inducers are all of the same polarity. When inducers are homogenous in their luminance, any influence of attention on the illusory surface could minimise or amplify the baseline illusion, and such an effect could be explained simply by an interaction of attention with the low-level stimulus properties. In contrast, we designed the mixed-inducer condition such that the perceptual outcome cannot be predicted by the low-level stimulus properties alone -- depth order and filling-in must be determined either stochastically, or according to an observer’s selective attention. It was also critical for our stimulus to imply two spatially overlapping figures to test the hypothesis that depth order is determined prior to the stage at which visual attention operates e.g. 6,11. Had we presented two spatially non-overlapping illusory figures, one defined by black inducers and the other defined by white inducers, any modulatory effect of attention on filling-in could be attributed to pre-computed structure (e.g. compare the apparent surface lightness of the homogenous-inducer stimuli of Fig. 2A). Therefore, the spatially overlapping implied triangles of our mixed-inducer condition allows us to assess if selective attention can modulate depth order of illusory surface properties inferred by the visual system.
Results
We investigated whether filling-in of an illusory surface can be modulated by voluntary endogenous attention. A typical trial sequence of our psychophysical task is shown in Figure 2C. Observers (n = 15) reported the apparent lightness of only the upward oriented illusory triangle or only the downward oriented illusory triangle by comparing it with a luminance-defined matching triangle presented on a background of white noise. To draw observers’ attention to one of the two illusory triangles, they were instructed to judge only the figure that matched the orientation of the luminance-defined triangle. We thus directed observers to attend to an illusory triangle defined by white (or black) inducers endogenously without making reference to the colour of the inducers. We refer to this condition as the “mixed inducer” condition. For comparison, we also included conditions in which all inducers were black or white (“homogenous inducer” condition; Fig. 2A). Whereas the critical mixed inducer condition allows us to investigate clearly the influence of voluntary attention on perceptual filling-in, the homogenous inducer condition provides a baseline in which conflict between competing structural cues is reduced. Finally, we further tested whether filling-in was affected by certainty of edge location by applying varying levels of blur to the Kanzisa figure and matching triangle background (Fig. 2B). The inducer condition, orientation of the illusory figure, matching triangle, level of blur, and display side of the matching triangle were all counter-balanced across trials (see Methods for all experimental details).
We defined perceived lightness as a point of subjective equality (PSE), the mid-point of the psychometric function fit to the proportion of “lighter” responses as a function of the contrast of the matching triangle. Psychometric functions fit to data collapsed across participants and blur condition are shown in Figure 3. These fits were calculated without the most blurred condition in which the results were different than the other conditions (see Fig. 4A).
PSEs for all conditions, averaged across participants, are shown in the right panel of Figure 4A. For all but the most blurred condition of the mixed inducer condition, illusory triangles defined by black inducers were perceived as lighter than illusory triangles defined by white inducers. This effect appears to be weakest for the two least blurred conditions. We verified these observations with Bayesian paired samples t-tests, comparing PSEs across inducer polarity within each blur level. There was strong evidence of a difference for the three most blurred conditions, but equivocal evidence of a difference of PSEs within the two least blurred conditions (BF10 in order of least to most blurred = 0.821, 1.627, 66.64, 10.585, and 80.393). Therefore, despite being shown the same image (within each blur condition), observers’ perceptual reports depend on their attentional goals.
We included a homogenous inducer condition in which all inducers were of the same polarity in each trial. This condition provides little ambiguity over the interpretation of the figure, but we nonetheless instructed observers to attend to only the shape cued by the texture-defined triangle. The homogenous inducer condition thus gives an indication of the upper-bound of the difference in PSEs between black vs white inducer conditions. Data from this condition are shown in the right panel of Figure 4A, and reveal a stronger effect of inducer polarity than the mixed inducer condition, with equal strength across all but the most blurred condition. Bayesian paired samples t-tests revealed strong evidence of a difference in PSE between the inducer polarities for the four least blurred conditions, but evidence against a difference for the most blurred condition (BF10 in order of least to most blurred = 28.919, 427.841, 297.566, 105.704, and 0.304).
The strength of the effect of attention on PSEs is further summarized in Figure 4B. In this figure we show the difference between attending to the figure in black inducers vs attending to the figure in white inducers for all conditions. To formally test for whether there was a difference between mixed vs homogenous inducer conditions, we submitted these difference scores to a 5 x 2 repeated measures Bayesian ANOVA, with factors blur level (5 levels of blur), and inducer type (2 levels: mixed vs homogenous). The best model included the two main effects (blur level + inducer type: BF10 = 5.264e+17). There was strong evidence against an interaction between factors (BF10 = 0.116).
Finally, we found weak to moderate evidence against a difference in the slope parameter of the psychometric functions for all black vs white inducer comparisons in Figure 4A (all BF10 < 0.5; min = 0.262; max = 0.444).
Discussion
We asked observers to allocate their attention to varying elements of a novel illusory figure and report the perceived lightness of the surface of the center of the figure. By presenting the same figure on each trial and only manipulating observers’ goals, we controlled physical aspects of the stimulus and found that observers’ reports of perceived lightness were contingent on to which illusory figure they attended. We do not think observers selectively reported the apparent lightness of only some parts of a pre-attentively perceived image in the mixed inducer condition for three reasons. First, the influence of selective attention over depth order and apparent surface lightness can be experienced first-hand by the reader by inspecting Figure 1. Second, we had observers verbalise their reports during an initial practice session (see Methods), and they frequently reported that the attended triangle appeared as the top-most surface. Third, Harrison and Rideaux have reported that selectively attending to only some inducers of the mixed-inducer stimulus results in the perception of illusory edges between those inducers 12, providing converging evidence that the depth ordering of the stimulus depends on observers’ attention control. We thus infer that apparent surface lightness of our novel illusory figure can be modulated by attention.
We found that the strength of filling-in was weaker in the mixed-inducer condition than the homogenous inducer condition (Fig. 4B). There could be several reasons for this. First, there is a larger signal implied by the homogenous inducer condition, since all inducers share the same polarity. Second, lapses in attention on any given trial will have more impact on the mixed-inducer condition in which the opposite polarity inducers imply competing surface arrangements. Third, the modulatory effects of attention on filling-in in our mixed-inducer condition may simply be weaker than in cases in which attentional selection plays no role, which may have been the case in the homogenous inducer condition if observers perceived a star. Addressing this issue requires further investigation. Nonetheless, our finding of a difference in perceived lightness when black versus white inducers were attended in the same image (mixed inducer condition) reveals that illusory lightness is contingent on observers’ attentional goals. For all but the most blurred conditions, we found filling-in was not substantially moderated by the amount of spatial blur added to the illusory figures induced by homogenous pacmen. This finding is consistent with the idea that filling-in involves broadly tuned spatial filters that are insensitive to high spatial frequencies. We did, however, find a reversal of the direction of filling-in for the most extreme blur (rightmost point in left panel of Fig. 4A). We speculate that the explanation for this is straightforward: because the inducers were so blurred that the implied tips of the triangles, and thus the illusory figure, were abolished. In this case, filling-in would be consistent with the contrast of the inducers, rather than opposite to them. It is also worth noting that observers have a large bias to report the inner area of the figure as lighter in their responses for the most blurred homogenous inducer condition. This observation is not predicted by current accounts of filling in but we speculate that it could be related to the blackshot mechanism 13 that amplifies the relative salience of the dark inducers.
It has been shown that visual attention can affect perception over short timescales in multiple ways, such as by increasing apparent contrast or spatial frequency 14. Any such effect in the present study requires an initial grouping stage, such that changes in contrast were selectively applied to only some features of the image. Selectively attending to only the inducers surrounding the cued target shape could cause illusory spreading of the apparent surface lightness. Whereas previous studies have shown that visual attention can influence depth order 2 or even surface lightness of semi-transparent discs 3, the case presented here is unique in that it shows observers’ task instructions alter the visual system’s inference, creating one of multiple competing illusory surface appearances and arrangements.
Recent evidence suggests that cells in V2 play an important role in this process 4,11,15,16. V2 cells are selective for an object’s borders and the relative position of the object. A network of such “figure-ownership” cells may be supported by grouping cells, thus encoding figure-ground structure essential for visual cognition. Indeed, recent work has shown that when visual attention is allocated to a region of space from which a grouping cell is currently receiving structured input, figure-ground segmentation is enhanced by feedback from the grouping cell to its connected figure-ownership cells 4,11,15. This model may account for our results. Attending to the upright (or inverted) triangle would result in the grouping cell enhancing the structured input to only the corresponding shape, biasing the processing of form and systematically altering perceptual organization. Our results are also consistent with recent neuro-imaging research in humans using electroencephalogram, in which top-down attention modulates early neural activity associated with illusory contour formation 17. Our study provides important data that supports the notion that such top-down signals modulate perceptual phenomenology.
Our study thus helps to bridge a gap between psychophysical and neurophysiological investigations of figure-ground segmentation. Consistent with the activity of V4 neurons found by Poort et al 5, we found that perceptual filling-in of an object’s surface is modulated by attention. However, unlike the conclusion of Poort et al that attention does not modulate the responses to the object’s borders, our preliminary data using a similar figure to the present study suggests border computations also depend on voluntary attention 12.
Author contributions
W.J.H. and P.J.B. designed the Kanizsa stimuli and psychophysical procedure. A.J.A. ran the experiment, and W.J.H. analyzed the data. W.J.H. and P.J.B. wrote the manuscript. All authors discussed the results and commented on the manuscript.
Data availability
The data that support the findings of this study are available from the corresponding author upon request.
The authors declare no competing financial interests.
Methods
Participants
15 observers completed the psychophysical experiment, including the authors. All were naïve to the purposes of the experiment, except for authors W.J.H. and P.J.B. This experiment was approved by the Northeastern University Institutional Review Board, and methods were carried out in accordance with the relevant guidelines. All participants were fully informed as to the requirements of participation prior to providing their consent.
Stimuli and procedure
Stimuli were programmed with the Psychophysics Toolbox 18,19 in MATLAB (MathWorks), and were displayed on a CRT (40 x 30 cm, 1024 x 768 pixels; 85 Hz refresh). Six inducer “pacmen” (diameter = 3° visual angle; jaw angle = 60°) were arranged at the corners of an imaginary hexagram with a uniform gray background (luminance = 20 cd/m2). From trial to trial, the inducers could be all white or all black (homogenous condition), or mixed (see Fig. 2). Maximum luminance of the white inducers was 40 cd/m2, minimum luminance of the black inducers was 0 cd/m2. Blur conditions were achieved by applying a Gaussian spatial filter with standard deviation of 0.1, 0.5, 1.9, 7.5, or 30.1° (Fig. 2). The matching triangle was an equilateral triangle (edge length = 6°) whose luminance varied according to an observer’s responses (see below). The matching triangle was presented on a background of white noise (mean luminance = 20 cd/m2), to which the same spatial filter was applied as for the Kanizsa figure. The apex of the matching triangle pointed upward or downward, depending on the condition (see below).
Observers sat in a dimly lit room approximately 58 cm from the display with a keyboard in their lap. Prior to each trial, visual noise masked the entire display for 500 ms. The Kanizsa figure was then presented on one half of the monitor and the matching triangle was presented on the other half (Fig. 2C). The side of the Kanizsa figure (and therefore the matching triangle) was selected randomly on each trial. Observers were instructed to press the left arrow key if the matching triangle was darker than the corresponding illusory triangle, or the right arrow key if it was lighter. The stimulus remained on the screen until a response was made. Responses were not time-limited and reaction times were not recorded: observers were instructed to not respond until they attended to the cued triangle.
To estimate PSEs from all 20 conditions in a single testing session, we used a 2-down 2-up staircase procedure that varied the luminance of the matching triangle according to an observer’s response series for each condition. Although this staircase converges on an observer’s point of subjective equality (PSE) – the luminance at which the matching triangle appeared equal to the attended illusory triangle – we nonetheless fit psychometric functions to constrain the PSE estimate by all observations (see Analyses section below). The staircase started randomly between ±20% contrast to ensure fitted psychometric functions were constrained. The initial step size was 10%, and halved after each reversal. A separate staircase was run for each of the 20 conditions, with all conditions interleaved. A staircase for a given condition ended after seven reversals, after which the condition was removed from the trial sequence.
For each of the five blur conditions (Fig. 2B), the inducers and matching triangle were arranged into four conditions: homogenous black inducers, homogenous white inducers, mixed inducers with the matching triangle cueing the white inducers, mixed inducers with the matching triangle cueing the black inducers. In all conditions, the orientation of the matching triangle was randomized to match the appropriate target illusory figure. Thus, with the five blur conditions and four inducer conditions, each observer was tested on a total of 20 conditions. Conditions were presented in random order until at least seven button press reversals were registered for every condition.
Prior to testing, observers were introduced to the classic Kanizsa figure, and then our modified mixed-inducer figure (Fig. 1). The experimenter then explained the task using static images that exemplified each of the inducer and blur conditions. When describing that the matching triangle cued the observer to which of the two possible illusory triangles to respond, the experimenter never referred to the inducer color, but only to the upward or downward pointing illusory triangle. The participants completed about 20 trials of practice and verbalized their responses in order for the experimenter to check they understood the task. Observers were encouraged to actively look at the matching triangle and illusory triangle, and to take their time with each decision. Training and all conditions were tested in a single session lasting approximately 30 minutes.
Analyses
Cumulative Gaussian functions were fit to the proportion of “lighter” responses at each luminance level of the matching triangle for each of the twenty conditions. The PSE was the luminance that produced 50% lighter responses, converted to Weber contrast: where Lm is the luminance of the matching triangle producing 50% lighter responses for a given condition, and Lb is the mean luminance of the background. This value was converted to a percent by multiplying by 100. Bayesian analyses were performed in JASP 20. In Figure 4B, we subtracted the mean PSE when the black inducers were cued from the mean PSE when the white inducers were cued for each blur condition.
Acknowledgments
The authors thank Dr Emily Wiecek for her constant critical feedback, without which we would not have developed our methods fully. We also thank Guido Maiello for providing feedback on earlier versions of the manuscript. This work was supported by NIH grant R01EY021553 (PJB) and the National Health and Medical Research Council of Australia APP1091257 (WJH).