Studies in Visual Perception, II
Motion perception in various settings.

by Jack Schwartz

Courant Institute of Mathematical Sciences, New York University

Nevertheless, it moves... -- attributed to Galileo

The aim of the extensive set of visual examples developed in this paper and its sequels is to illuminate and define a view of how the initial stages of the human visual system (and doubtless also the primate and, with some modifications, many other mammalian visual systems) are organized. In broad terms, this is conceived of as a bottom-up process, carried out by multiple families of cells, all laid out in an essentially retinotopic correspondence.

The architecture which the images and animations given below seem to suggest is as follows:

'Early' processes [pre-edge perception, 'dots' only]
These are high-speed, timing-sensitive, and polarity-sensitive.

'Late' processes [edge dependent]
These are slower, less timing-sensitive, and relatively insensitive to polarity.

Our first animated figure shows a pair of periodically reversing Kanizsa figures which suggest motion in 3-dimensional space, and even the 3-D shapes of the illusory moving objects.

Figure 1. Flipping Kanizsa figures suggesting motion in 3 dimensions.

The next figure shows a smoother animation, which is even more strongly suggestive of an illusory object moving in 3 dimensions.

Figure 2. A Kanizsa animation suggesting motion in 3 dimensions.

It is well known that motion can suggest 3-dimensional shape very strongly. Our next figure, an anaglyph showing one side of a steadily revolving conical 'Christmas tree', shows this: the 3-dimensional shape of the object emerges very plainly. But which side of the cone are we seeing? If it is the front, then the tree, seen from the top, must be revolving clockwise; but if we are viewing the other side from inside the cone of the tree (or if the tree surface is simply a hollow 'facade'), then the tree, seen from the top, must be revolving counterclockwise. The eye prefers the first interpretation, since we normally see the front, rather than the reverse, surface of the objects we view (except for the occasional mask or mold). Hence the tree is ordinarily seen to be revolving clockwise, although the other percept is not impossible. If we view the figure anaglyphically with the blue filter over the right eye, stereo confirms this percept, which becomes inescapable. But if we then reverse the anaglyphic glasses, we immediately see that we are viewing the tree surface from the other side, and the perceived direction of rotation instantly reverses to counterclockwise. If either eye is then closed, we immediately revert to perceiving the tree's rotation as clockwise. These phenomena strongly suggest that the formation of 3-D motion percepts follows binocular depth perception in the flow of visual processing.
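The ambiguity can be made concrete. The sketch below assumes orthographic projection and a cone with made-up dimensions (the paper does not say how its figures were generated); `cone_dots`, `project` and `same_screen_image` are hypothetical helpers. It shows that a cone turning one way and its front-to-back mirror image turning the other way put every dot at the same place on the screen, so a single eye has nothing to decide between them with.

```python
import math
import random


def cone_dots(n, height=1.0, base_radius=0.5, seed=0):
    """Random dots on the side of a cone whose apex is at y = height.

    Each dot is stored as (radius, y, angle about the vertical axis).
    """
    rng = random.Random(seed)
    dots = []
    for _ in range(n):
        y = rng.uniform(0.0, height)
        r = base_radius * (1.0 - y / height)  # radius shrinks toward the apex
        dots.append((r, y, rng.uniform(0.0, 2.0 * math.pi)))
    return dots


def project(dots, t, omega):
    """Turn the dots about the vertical axis by omega * t (orthographic view).

    Returns (x, y, z) triples: (x, y) is the screen position and z the depth,
    taken here as positive toward the viewer. For this parameterization
    dx/dt = -omega * z, so near-side dots (z > 0) and far-side dots (z < 0)
    cross the screen in opposite directions at every instant.
    """
    frame = []
    for r, y, theta in dots:
        a = theta + omega * t
        frame.append((r * math.cos(a), y, r * math.sin(a)))
    return frame


def same_screen_image(f1, f2, tol=1e-9):
    """True if two frames place every dot at the same (x, y) screen position."""
    return all(abs(x1 - x2) < tol and abs(y1 - y2) < tol
               for (x1, y1, _), (x2, y2, _) in zip(f1, f2))


dots = cone_dots(300)
# The front-to-back mirror image of the cone (z -> -z): negate each dot's angle.
mirrored = [(r, y, -theta) for r, y, theta in dots]

# Cone turning one way vs. its mirror image turning the other way, at t = 0.5.
seen = project(dots, 0.5, omega=1.0)
alternative = project(mirrored, 0.5, omega=-1.0)
print(same_screen_image(seen, alternative))  # True: one eye cannot tell them apart
```

Only depth distinguishes the two readings; this is exactly the information an anaglyph adds and that closing one eye removes.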

Figure 3. A 'revolving' anaglyph which suggests shape even when viewed monocularly.

Our next figure is a variant of Figure 3 in which the dots on both sides of the 'revolving tree' are shown. This doubling stands out quite plainly in anaglyphic view, even if the anaglyphic glasses are reversed. But here another phenomenon manifests itself. At any moment half the dots are moving to the right and half are moving to the left. This can be seen plainly enough even in the ordinary monocular view. But which of these two groups of dots, equal in number and in general statistics of motion, constitutes the front, and which the reverse, of the tree? Depending on how this question is answered, the tree will be revolving either clockwise or counterclockwise, and we can expect either the leftward-moving or the rightward-moving group of dots, constituting the front, to be perceptually salient. As one might suspect, both of these quite symmetrical perceptions are possible, and the eye switches bistably between them if one gazes at the figure long enough. (This does not seem to be controllable by an act of will, so one must be patient.)

If this same figure is viewed anaglyphically, the ambiguity in its direction of motion is resolved, since one knows which dots lie in the foreground: clockwise (resp. counterclockwise) motion is always seen when the red (resp. blue) filter is over the right eye. However, even in this case the opposite perception makes its presence felt in brief, evanescent flashes. If either eye is closed, reversibility returns in full force.
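Why reversing the glasses reverses the perceived rotation follows from how an anaglyph encodes depth. The sketch below assumes a simple linear disparity rule, assumes that the eye behind the red filter sees only the red half-image, and uses a hypothetical `gain`; the actual figures may use a different scale or sign convention. Swapping which eye sees the red half-image negates every disparity, exchanging near and far dots.

```python
def anaglyph_images(points, gain=8.0):
    """Split 3-D dots (x, y, z) into the red and blue half-images of an anaglyph.

    `gain` is a hypothetical disparity in pixels per unit of depth. With this
    sign convention, z > 0 reads as 'near' when the red filter is over the
    right eye.
    """
    red = [(x - 0.5 * gain * z, y) for x, y, z in points]
    blue = [(x + 0.5 * gain * z, y) for x, y, z in points]
    return red, blue


def perceived_depth(red, blue, red_filter_eye="right", gain=8.0):
    """Depth implied by horizontal disparity for a given way of wearing the glasses.

    Crossed disparity (right-eye image displaced to the left of the left-eye
    image) is read as 'near', i.e. positive depth.
    """
    if red_filter_eye == "right":
        right_img, left_img = red, blue
    else:
        right_img, left_img = blue, red
    return [-(xr - xl) / gain for (xr, _), (xl, _) in zip(right_img, left_img)]


points = [(0.0, 0.0, 1.0), (5.0, 2.0, -1.0)]  # one near dot, one far dot
red, blue = anaglyph_images(points)
print(perceived_depth(red, blue, "right"))  # [1.0, -1.0]
print(perceived_depth(red, blue, "left"))   # [-1.0, 1.0]
```

For a revolving two-sided object, the screen motion of each dot is unchanged by the swap, so reading the near and far groups the other way round implies the opposite sense of rotation.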

Figure 4. A 'revolving' anaglyph with a bistably reversing perceived direction of motion.

There is of course no reason why the depth perceived from dot motion and from dot stereo must agree, as our next figure shows. Viewed non-anaglyphically, this figure generates much the same 3-D shape percept as Figure 4, since its pattern of dot motions does not differ discernibly from that of Figure 4. However, the anaglyph has been adjusted so that, when viewed anaglyphically, the same figure shows a distinct 3-D 'crease' not corresponding to anything in the non-anaglyphic view. Viewed with the blue filter over the right eye (which brings the anaglyphic surface forward), the figure seems to consist of two smaller 'half-trees' turning in the same direction, with the moving dots flowing over the surface of the right-hand 'half-tree', then down into the 'crease', and finally over the surface of the left-hand half-tree. With the glasses reversed one can see the expected 'back surface' and get a percept of counterclockwise motion symmetrical to that just described. But here perception sometimes seems to be dominated by the eye's apparent preference for interpreting a moving surface whose most rapid motion lies in its interior as a convex surface bulging toward, rather than away from, the viewer. At any rate, with the red filter over the right eye the perceived direction of revolution reverses from time to time, now being seen as clockwise, now as counterclockwise. When clockwise motion is perceived, the sensation of depth weakens.

Figure 5. A 'revolving' anaglyph with an anaglyphic depth different from its motion-inferred depth.

Next we set two double-sided trees of the kind seen in Figure 4 side by side (but turned 60 degrees from one another to disguise the fact that they are identical). This tests whether the perceptual reversal noted in Figure 4 can affect them separately, or whether they always reverse simultaneously. It will be seen that they reverse simultaneously more often than not. But they can sometimes be seen revolving in opposite directions, indicating that the domain of coherence for the direction-of-turn percept is the single tree rather than the whole figure, but that there is also some longer-range influence.

Figure 6. A pair of double-sided revolving trees.

As a basis for a more extended discussion of the interesting point raised in the previous paragraph, we take the double-sided revolving sphere anaglyphs seen in the next two figures. Like the 'Christmas tree' image of Figure 4, these generate two competing coherent-motion gestalts, one of clockwise and the other of counterclockwise rotation about an axis, which is the y axis in Figure 7 and the x axis in Figure 8. Perception switches unpredictably between these two possibilities, occasionally for periods short enough that the sphere seems to reverse direction, turn no more than 180 degrees, and then reverse direction again.
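A minimal sketch of a two-sided rotating sphere, under stated assumptions: dots spread uniformly over a unit sphere, orthographic projection, and every dot drawn whether it lies on the near or the far side. The names and parameters are hypothetical; the paper does not describe its generator. Because the projection discards depth, each frame is equally consistent with either sense of rotation, and only the choice of axis distinguishes the horizontal 'rotating' motion of Figure 7 from the vertical 'rolling' motion of Figure 8.

```python
import math
import random


def sphere_dots(n, seed=0):
    """n dots spread uniformly over the unit sphere (normalized Gaussian vectors)."""
    rng = random.Random(seed)
    dots = []
    while len(dots) < n:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-12:
            dots.append(tuple(c / norm for c in v))
    return dots


def rotate(dots, angle, axis):
    """Rotate (x, y, z) dots about the screen's y axis or x axis."""
    c, s = math.cos(angle), math.sin(angle)
    if axis == "y":  # dots move horizontally on the screen
        return [(c * x + s * z, y, -s * x + c * z) for x, y, z in dots]
    if axis == "x":  # dots move vertically on the screen
        return [(x, c * y - s * z, s * y + c * z) for x, y, z in dots]
    raise ValueError("axis must be 'x' or 'y'")


def frames(dots, n_frames, axis):
    """Orthographic screen positions of all dots, near and far alike, per frame."""
    return [[(x, y) for x, y, _ in rotate(dots, 2 * math.pi * k / n_frames, axis)]
            for k in range(n_frames)]


dots = sphere_dots(400)
figure7_like = frames(dots, 36, "y")  # 'rotating': horizontal dot motion
figure8_like = frames(dots, 36, "x")  # 'rolling': vertical dot motion
```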

Figure 7 can also be used to demonstrate an effect that may be due to fatigue in some relevant population of cells, and that may reflect a mechanism underlying the reversals ordinarily perceived. If the figure is viewed anaglyphically, its foreground and background surfaces are unambiguously distinguished, pinning down the direction of rotation quite firmly (but not absolutely!). If this view through anaglyphic glasses is maintained for a few seconds, say for 10 rotations of the sphere, and the glasses are then removed suddenly, it will often (but not invariably) be found that the perceived motion reverses immediately, or within a single rotation. The viewer may want to try this first with the red filter over the right eye, then with the blue filter over the right eye.

Figure 7. A two-sided anaglyphic sphere rotating horizontally.

When, as in our next figure, the rotation is about the x axis rather than the y axis (so that we have a 'rolling' rather than a 'rotating' motion), the perception is surprisingly different. The perception of coherent motion is considerably more likely to break up into a perception of incoherent motion, or into patches sharing a common motion, separated by incoherently moving zones. Sometimes the perception is even that of the left half of the sphere turning in one direction while its right half turns in the opposite direction. This difference would seem to reveal a systematic horizontal-to-vertical difference in the ordinary shapes of the receptive fields of some relevant population of motion-sensitive neurons.

Anaglyphic glasses can be used to demonstrate the same fatigue effect noted in connection with Figure 7.

Figure 8. A two-sided anaglyphic sphere rotating about the x-axis.

The fact that when viewing images like those seen in Figures 7 and 8 we deal with two incipient gestalts of precisely equal weight, between which perception switches sharply, and the fact, seen in Figure 6, that the domain of coherence for the direction-of-turn percept is a coherently moving zone of dots rather than the whole field viewed, suggest that images of this class may be excellent tools for investigating the rules that apply to the early stages of gestalt formation. Perhaps one can even get some useful sense of the mechanisms involved.

Our next figure begins to explore this idea by cutting the rotating sphere of Figure 7 into its top and bottom halves, between which an empty zone 20 pixels wide is placed. It will be seen that the two halves can often be perceived to move in opposite directions.
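One plausible way to realize such a cut, assumed here since the text says only that an empty zone is placed between the halves, is to push the projected halves apart rather than delete dots. `separate_halves` and the stand-in dot positions below are hypothetical.

```python
import random


def separate_halves(screen_dots, gap_px, orientation="horizontal"):
    """Push the two halves of a projected sphere apart, leaving an empty band.

    screen_dots are (x, y) pixel positions centred on the sphere's centre.
    "horizontal" moves the top and bottom halves apart vertically;
    "vertical" moves the left and right halves apart horizontally.
    """
    half = gap_px / 2.0
    moved = []
    for x, y in screen_dots:
        if orientation == "horizontal":
            moved.append((x, y + half if y >= 0 else y - half))
        elif orientation == "vertical":
            moved.append((x + half if x >= 0 else x - half, y))
        else:
            raise ValueError("orientation must be 'horizontal' or 'vertical'")
    return moved


# Stand-in projected dots inside a disc of radius 150 px centred at the origin.
rng = random.Random(1)
dots = []
while len(dots) < 500:
    x, y = rng.uniform(-150, 150), rng.uniform(-150, 150)
    if x * x + y * y <= 150 * 150:
        dots.append((x, y))

wide_cut = separate_halves(dots, 20)    # 20-pixel empty zone, as in Figure 9
narrow_cut = separate_halves(dots, 10)  # 10-pixel empty zone, as in Figure 10
```

In this orthographic picture, rotation about the y axis moves every dot horizontally, so under a horizontal cut no dot ever crosses from one half to the other; each half keeps its own dots throughout the animation.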

Figure 9. The two-sided anaglyphic sphere of Figure 7, cut horizontally.

If the distance separating the top and bottom of the rotating sphere is reduced to 10 pixels, it becomes substantially less likely for them to be perceived as rotating in opposite directions. The next figure demonstrates this.

Figure 10. The two-sided anaglyphic sphere of Figure 7, cut horizontally with a thinner cut.

However, as the following figure shows, if the top and bottom of the sphere are separated by a 10-pixel-wide line instead of an empty zone, contrary rotations are seen more frequently.

Figure 11. The two-sided anaglyphic sphere of Figure 7, cut horizontally by a line.

But a vertical separating line does not break the sphere into zones allowing contrary motion, even if it is 30 pixels wide. This is shown by our next animation.

Figure 12. A 30-pixel-wide vertical separating line does not break the sphere into zones allowing contrary motion.

Our next two figures show that if, instead of separating the two halves of the rotating sphere by an occluding stripe, we separate them by making one half of the sphere lighter than the background and the other darker, contrary motion of the top and bottom can easily occur. This suggests that reversal of intensity polarity separates regions more strongly than does the presence of an occluding line 10 pixels wide. There is also much more of a tendency for the perceived motion in each half to break up into patches moving separately, sometimes in opposite directions. With left and right halves separated in this same way, the effect is similar: opposed motion of the two halves appears occasionally, and there is also an interesting tendency for motion reversals to occur frequently.
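Here 'polarity' is the sign of a dot's intensity relative to the background. A tiny sketch of the assignment, with hypothetical grey levels (background 0.5 and a dot contrast of 0.3 on a 0 to 1 scale); only the sign flip across the dividing line matters, since the magnitude of the contrast is the same on both sides.

```python
def dot_grey_level(x, y, split="top_bottom", background=0.5, contrast=0.3):
    """Grey level (0 = black, 1 = white) for a dot at screen position (x, y).

    Dots on one side of the dividing line are lighter than the grey background
    (positive polarity) and dots on the other side darker (negative polarity);
    the size of the difference is the same on both sides.
    """
    coord = y if split == "top_bottom" else x
    return background + contrast if coord >= 0 else background - contrast


top, bottom = dot_grey_level(0, 40), dot_grey_level(0, -40)
left, right = dot_grey_level(-40, 0, "left_right"), dot_grey_level(40, 0, "left_right")
```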

Figure 13. A two-sided sphere against a grey background, with top and bottom halves of opposite polarity.

Figure 14. A two-sided sphere against a grey background, with left and right halves of opposite polarity.

Now we give two figures which indicate the (unsurprising) fact that brighter dots generate stronger motion signals. The first of these (Figure 15) shows a two-sided rotating sphere in which all the dots on one half of the sphere are kept green and all those on the other half are kept red. Since green appears brighter than red against the dark background of the figure, the motion perceived is always that of the green dots, and there is little or no tendency to reverse. In the second figure (Figure 16) the green dots are darkened to a greenish blue which roughly matches the intensity of the red when seen against the same dark background. This causes occasional reversals of the perceived direction of motion to reappear. It is interesting that one is little aware of the fact that, as the dots move coherently through the perceived foreground, they are always blue in the foreground and red in the background when the rotation is perceived as counterclockwise, and the opposite when the rotation is perceived as clockwise. The change of colors is not very salient, and unless watched closely the main group of dots perceived seems to retain a constant color, generally blue. This image also tends to fall apart into patches perceived as moving in opposite directions.

When this last image is viewed anaglyphically, the motion perception generated is quite different. There is of course no sensation of depth, since the red and blue dots are not anaglyphically paired. Nevertheless they are separated, one group being seen by each eye. The percept that emerges is somewhat less that of the alternating rotations seen in the binocular view and somewhat more that of two simultaneously moving families of dots with opposite motions and different colors. This suggests that each of the two single-eye inputs has its own motion detection system, or at least that these two inputs influence populations of coherent-motion detectors that are partly distinct, though possibly also overlapping.
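How closely two colors are matched in intensity can be estimated, very roughly, from their luminance. The sketch below assumes linear-light RGB values and the Rec. 709 (sRGB) luminance weights, and ignores display gamma and the screen actually used; the paper says only that the greenish blue of Figure 16 'roughly matches' the red, and the particular blue-green chosen here is hypothetical.

```python
# Rec. 709 / sRGB luminance weights for linear-light RGB components in [0, 1].
WEIGHTS = (0.2126, 0.7152, 0.0722)


def luminance(rgb):
    """Relative luminance of a linear-light RGB colour."""
    return sum(w * c for w, c in zip(WEIGHTS, rgb))


def match_luminance(rgb, target):
    """Scale a colour (keeping its hue) so that its luminance equals target."""
    y = luminance(rgb)
    if y <= 0:
        raise ValueError("colour has zero luminance")
    s = target / y
    if s * max(rgb) > 1.0:
        raise ValueError("target luminance not reachable without clipping")
    return tuple(s * c for c in rgb)


red = (1.0, 0.0, 0.0)
green = (0.0, 1.0, 0.0)
print(luminance(green) / luminance(red))  # about 3.4: pure green is far brighter
blue_green = match_luminance((0.0, 0.6, 1.0), luminance(red))
```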

Figure 15. A two-sided sphere against a black background, with green dots always in the foreground and red in the background.

Figure 16. A two-sided sphere against a black background, with blue-green dots always in the foreground and red in the background.

The motion-reversal perceptions which appear in the last group of figures can be seen in simpler settings, such as the linearly moving family of dots shown in our next figure. This is contrived as follows. Each of the rows in the figure contains a sequence of dots at the same fixed separation. Four frames are shown in an endless sequence. As we go from one frame to the next, the bottom (resp. top) row moves 1/4 of the inter-dot distance to the right (resp. left). Thus the expected perception of the bottom (resp. top) row is of dots moving endlessly to the right (resp. left). The middle row of dots does not move at all, but simply flashes on (on cycles 1 and 3) and off (on cycles 2 and 4). As one might expect, it is seen as moving anyhow. One can try to explain this in the following way. The coherent motions detected in the upper and lower rows may create corresponding low-resolution 'motion images' within the visual system. Because of their low resolution, these 'images', which indicate the presence of zones of coherent motion in some particular direction, will tend to infect nearby areas containing incoherent motion with the same perceived coherence; hence these areas will be seen as moving in the same direction. Since in Figure 17 the context is evenly balanced between a lower zone inducing motion to the right and an upper zone inducing motion to the left, the middle row of incoherently moving red dots will be perceived as moving either to the right or to the left, these two percepts alternating intermittently.
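The stimulus can be written down exactly. The sketch assumes 12 dots per row, a 40-pixel spacing, rows that wrap around, and a middle row on the same grid; none of these values is given in the text, and the function names are hypothetical. Because four quarter-step shifts add up to one full spacing, the fourth frame leads back seamlessly to the first, which is what lets the outer rows appear to drift endlessly.

```python
def row_positions(n_dots, spacing, shift):
    """x positions of a wrapped-around row of evenly spaced dots, shifted by `shift`."""
    length = n_dots * spacing
    return sorted((i * spacing + shift) % length for i in range(n_dots))


def figure17_frame(frame, n_dots=12, spacing=40.0):
    """Dot positions for frame 1, 2, 3 or 4 of the cycle described for Figure 17."""
    step = spacing / 4.0  # each frame moves a row by a quarter of the spacing
    k = frame - 1         # quarter-steps taken since frame 1
    return {
        "bottom": row_positions(n_dots, spacing, k * step),  # drifts right
        "top": row_positions(n_dots, spacing, -k * step),    # drifts left
        # The middle row never moves; it is simply shown on frames 1 and 3 only.
        "middle": row_positions(n_dots, spacing, 0.0) if frame in (1, 3) else [],
    }


# A fifth quarter-step would reproduce frame 1, so the four-frame loop is seamless.
assert row_positions(12, 40.0, 40.0) == figure17_frame(1)["bottom"]
```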

Figure 17. Induced coherent motion in a row of dots. (See discussion above).

The frenzied shop window of animated dots seen next, into which six demonstrations labeled (A) through (F) have been crammed, explores some of the questions raised by the interpretation we have offered for the perceptions arising in Figure 17. (If all the motion in the figure is too much of a distraction from the individual demos, simply mask off all but one with a piece of paper.)

Figure 18. Six variant cases of induced coherent motion in a row of dots. (See discussion above).

Edge-dependent motion perception and edge perceptions arising from pure motion. Since almost everything one can say about vision is apt to prove hopelessly naive, it is now time to examine a class of motion percepts which indicate the presence of mechanisms going beyond those discussed above. The following figure can serve to open this discussion. It shows two alternating circular patches of incoherent random motion in a random-dot background. The motion seen at any moment is entirely free of any intensity or coherency cues. Nevertheless, phi motion between these two patches of motion is seen. The fact that motion between states of pure motion can be sensed suggests that at least two levels of motion-sensing mechanisms, the second receiving inputs arising from the processing of signals generated by the first, may be present in the visual system.
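A minimal sketch of a stimulus of this kind follows. The grid size, disk radius, timing and helper names are hypothetical, and binary pixels stand in for dots. The background is one frozen random-dot pattern; inside whichever disk is currently active, every pixel is redrawn at random on every frame at the same 50% density as the background, so the active disk differs from its surround only in that it keeps changing.

```python
import random


def in_disk(r, c, center, radius):
    """True if pixel (r, c) lies inside the disk of the given centre and radius."""
    return (r - center[0]) ** 2 + (c - center[1]) ** 2 <= radius ** 2


def alternating_motion_patches(height=40, width=80, radius=8,
                               frames_per_phase=6, n_phases=4, seed=0):
    """Frames in which two disks take turns being filled with random flicker."""
    rng = random.Random(seed)
    background = [[rng.randint(0, 1) for _ in range(width)] for _ in range(height)]
    centers = [(height // 2, width // 4), (height // 2, 3 * width // 4)]
    frames = []
    for phase in range(n_phases):
        active = centers[phase % 2]  # the disks alternate from phase to phase
        for _ in range(frames_per_phase):
            frame = [row[:] for row in background]
            for r in range(height):
                for c in range(width):
                    if in_disk(r, c, active, radius):
                        frame[r][c] = rng.randint(0, 1)  # incoherent motion
            frames.append(frame)
    return frames, background, centers


frames, background, centers = alternating_motion_patches()
```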

One can also ask whether the transfer of coherent motion between zones can, like the transfer of incoherent motion, generate the phi-motion percept. The answer, as shown in Figure 19b, seems to be no. Instead, one gets a 'call and answer' perception. The reason for this bears consideration.

Figure 19. Phi motion between two patches of incoherent motion.

Figure 19b. Phi motion is not generated between two patches of alternating coherent motion.

The motion of patches of pure incoherent motion can even be continuous, as our next figure shows.

Figure 20. Continuous motion of patches of incoherent motion.

Figure 21 shows something even more surprising. It contains a rectangle filled with pure incoherent motion, within which we can readily distinguish a subarea also filled with pure incoherent motion but having a different 'motion texture'. The two areas of motion differ only in the following way: in the surround, the image alternates between two uncorrelated random patterns, while in the distinguished area, which spells out the word 'Hi', the image alternates between a random pattern and its reverse. We can explain the fact that the eye distinguishes between these two kinds of incoherent motion as follows. In the distinguished area, every pixel reverses on each cycle. In the surround, since the two patterns are uncorrelated, about half the pixels are identical in the two alternating images, so only about half the pixels reverse on each cycle. Thus the inner area is 'more intense' when considered as a pure motion image: it is, so to speak, 'motion-bright' within its surround of 'motion grey'. This understanding will be exploited later in the present paper.
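The pixel-counting argument is easy to check numerically. The sketch below is an assumption about how such a figure could be built: a plain rectangle stands in for the letters 'Hi', and `figure21_pair` and `change_fraction` are hypothetical names. Both regions have the same 50% dot density in every frame (the inverse of a random pattern is still random), so they differ only in how much changes between frames.

```python
import random


def figure21_pair(height=100, width=160, region=(30, 70, 40, 120), seed=0):
    """Two alternating frames: uncorrelated random patterns outside `region`,
    a pattern and its exact inverse inside it.

    `region` = (row0, row1, col0, col1) is a hypothetical stand-in for 'Hi'.
    """
    rng = random.Random(seed)
    a = [[rng.randint(0, 1) for _ in range(width)] for _ in range(height)]
    b = [[rng.randint(0, 1) for _ in range(width)] for _ in range(height)]
    r0, r1, c0, c1 = region
    for r in range(r0, r1):
        for c in range(c0, c1):
            b[r][c] = 1 - a[r][c]  # inverse of the first pattern
    return a, b


def change_fraction(a, b, cells):
    """Fraction of the given (row, col) cells that flip between frames a and b."""
    cells = list(cells)
    return sum(a[r][c] != b[r][c] for r, c in cells) / len(cells)


a, b = figure21_pair()
inside = [(r, c) for r in range(30, 70) for c in range(40, 120)]
inside_set = set(inside)
outside = [(r, c) for r in range(100) for c in range(160) if (r, c) not in inside_set]
print(change_fraction(a, b, inside))   # 1.0: every pixel reverses ('motion-bright')
print(change_fraction(a, b, outside))  # close to 0.5 ('motion grey')
```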

Figure 21. Discernibly different 'textures' within incoherent motion.

It turns out that even the motion of the edges of patches of distinct motion texture within a field of otherwise identical chaotic motion can be sensed. This is shown in Figure 22, which also shows that the 'jump back' occurring at the end of the small rightward motion of the letters 'Hi' generates a clear phi-motion percept.

Figure 22. Smooth and phi motion of textured areas within incoherent motion.

A related, particularly revealing instance of phi motion is seen at the end of each cycle of the next figure, which shows a line of incoherent motion moving coherently from left to right in a random-dot field. (The random-dot pattern changes behind the line, even though the eye does not detect this.)
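The construction described in parentheses can be sketched as follows; parameters and names are hypothetical, and binary pixels stand in for dots. Two independent random patterns A and B are used: the line itself is fresh noise on every frame, the strip it has already swept shows B, and everything else shows A.

```python
import random


def line_sweep_frames(height=30, width=100, x_start=20, x_end=80,
                      line_width=3, seed=0):
    """One cycle of a line of incoherent motion sweeping left to right."""
    rng = random.Random(seed)
    a = [[rng.randint(0, 1) for _ in range(width)] for _ in range(height)]
    b = [[rng.randint(0, 1) for _ in range(width)] for _ in range(height)]
    frames = []
    for p in range(x_start, x_end + 1):  # p = left edge of the moving line
        frame = []
        for r in range(height):
            row = []
            for c in range(width):
                if p <= c < p + line_width:
                    row.append(rng.randint(0, 1))  # the line: new random dots each frame
                elif x_start <= c < p:
                    row.append(b[r][c])            # already swept: second pattern
                else:
                    row.append(a[r][c])            # not yet swept, or outside the sweep
            frame.append(row)
        frames.append(frame)
    return frames, a, b


frames, a, b = line_sweep_frames()
```

Going from the last frame back to the first switches the whole swept strip from B back to A at once; this is the flash of chaotic change whose apparent direction is examined in what follows.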

Figure 23. A line of motion passing through a stationary random-dot field.

Note that any homogeneous texture, not merely the random-dot texture, would give this same effect, as shown by the following figure.

Figure 24. A line of motion passing through a stationary texture.

Here again the moving line of change is quite plain. The texture changes unnoticed behind it, and the flash of phi motion seen at the end of the animation cycle seems to move from left to right. We now ask, following Nakayama (see Tse, P., Cavanagh, P., & Nakayama, K. (1998). The role of parsing in high level motion processing. In T. Watanabe (Ed.), High-level motion processing: computational, neurobiological, and psychophysical perspectives (pp. 249-267). Cambridge, MA: MIT Press), the unexpectedly significant question: why from left to right, and not from right to left? At the moment when the line's left-to-right motion has just ended and is about to begin again, the actual input to vision is as follows: the strips to the left of the line's initial position and to the right of its final position are stationary (as they are throughout the animation); a flash of chaotic motion, coming from the sudden change of the second version of the texture back to its first version, covers the whole region between these two line positions; and coherent line motion has just ended at the line's final position and restarted at its initial position.

Questions of this kind are explored in a sequence of interesting examples given in the cited paper of Nakayama et al., which our next few figures reproduce and elaborate. The following figure is basic to the examples given. In the figure on the left, the central bar which appears is most strongly perceived as emanating from the left-hand rectangle by a phi motion proceeding left to right; in the figure on the right, the same bar is most strongly perceived as emanating from the right-hand rectangle by a phi motion proceeding right to left. The only difference between the two figures is the position of the small stationary lines, which must therefore be determining the apparent direction of the phi motion.

Figure 25. Figural modulation of perceived phi motion.

Figure 26 shows this same effect in a cruder setting, in which the modulating lines of Figure 25 are replaced by sharp corners.

Figure 26. Figural modulation of perceived phi motion, version 2.

If the sharp corners in Figure 26 are rounded off by insertion of 'fillets', and the bar which appears and disappears is separated from the small rectangle by the addition of 'cuts', the perceived direction of phi motion reverses, and it now seems to emanate from the large rather than the small rectangle. This is seen in Figure 27. It is interesting that all the elements responsible for the change of perception appear together with the horizontal bar rather than before it, showing that the direction of the phi motion is determined more by the appearance of the image after the bar appears than by the image before it.

Figure 27. Reversal of the perceived phi motion of Figure 26 by addition of 'cuts' and 'fillets'. (See discussion above.)

Our next figure probes the question raised by Figure 27 a bit more closely. It consists of two sections, each composed of three frames which show the figure in three successive states: rectangles only; middle bar shown; fillets and cuts shown with the middle bar. In the upper version of the figure, the appearance of the fillets and cuts is delayed 1/15 of a second after the appearance of the bar; in the lower version this delay is extended to half a second. (If the two figures together are too confusing visually, mask one of them off by resizing or scrolling your browser window.) It will be seen that in the upper figure the dominant perception of phi-motion direction is from the large rectangle to the small rectangle, whereas in the lower figure it reverses, and goes from the small rectangle to the large rectangle. This shows that the visual process which assigns the direction of phi motion continues to operate for some time after the appearance of the bar: longer than 1/15 of a second but less than half a second, perhaps about 250 milliseconds.

Figure 28. Effect on the perceived phi motion of Figure 26 when appearance of 'cuts' and 'fillets' is delayed. (See discussion above.)

A possible explanation of the effects involved, related to, but not identical with, that offered by Nakayama, is as follows. The appearance of the horizontal bar causes a 'flash of change' in the region that it occupies. A visual process operating after the elementary detection of this change determines the boundary of the region in which the change is perceived and, if some section of this boundary can be joined smoothly with that of a pre-existing image boundary, deems that boundary section to be the 'source' of the new object and assigns the direction of phi motion accordingly.
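The proposed rule can be caricatured in a few lines. This is a deliberately crude toy under stated assumptions, not Nakayama's model and not a claim about how the visual system computes: shapes are axis-aligned rectangles, a 'cut' is modelled as a gap of at least one pixel, fillets are not modelled at all, and 'joining smoothly' is reduced to the bar's top and bottom edges lining up with those of the rectangle it touches. `Rect`, `join_score` and `phi_source` are hypothetical names, and the layout is made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle on a pixel grid; x1 and y1 are exclusive."""
    x0: int
    x1: int
    y0: int
    y1: int


def join_score(bar, obj):
    """How smoothly a horizontal bar continues a pre-existing rectangle.

    0 if neither end of the bar abuts obj (a cut leaves a gap); otherwise
    1 for contact plus 1 for each of the bar's long edges (top, bottom) that
    lines up with obj's own top or bottom edge, i.e. continues it with no corner.
    """
    abuts = obj.x1 == bar.x0 or obj.x0 == bar.x1
    covers = obj.y0 <= bar.y0 and bar.y1 <= obj.y1
    if not (abuts and covers):
        return 0
    return 1 + (obj.y0 == bar.y0) + (obj.y1 == bar.y1)


def phi_source(bar, objects):
    """Name of the object the new bar is judged to emanate from, or None."""
    scores = {name: join_score(bar, obj) for name, obj in objects.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


small = Rect(0, 10, 20, 30)  # small rectangle on the left, as tall as the bar
large = Rect(30, 60, 0, 50)  # large rectangle on the right
objects = {"small": small, "large": large}

print(phi_source(Rect(10, 30, 20, 30), objects))  # small: its edges continue smoothly
print(phi_source(Rect(12, 30, 20, 30), objects))  # large: a 2-pixel cut isolates the small one
```

In the first case the bar touches both rectangles but continues only the small one without corners; adding a cut removes that candidate, so the bar is attributed to the large rectangle, qualitatively like the reversal between Figures 26 and 27 (where fillets also play a part).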

The following figure tests this idea by comparing two cases of phi motion, one without 'cuts', the other showing their influence. In both, a cross-shaped figure based on four rectangles flashes on and off. Where there are no cuts, the phi motion perceived runs from all four squares toward their common center. On the lower right, cuts have been used to separate the cross shape that appears from three of the four squares, and so the phi motion perceived runs upward from the bottom square toward the three others.

Figure 29. A cross-shaped figure showing the influence of 'cuts' and 'fillets'.

Figure 30 below is a variant of Figure 29 in which the cuts have been moved to cause two of the small rectangles to show short emanating bars, and a bent bar to emanate from the rectangle on the right. Note that no phi motion is perceived as emanating from the topmost rectangle, even though it is of the same color as the bar which appears, while the right-hand rectangle, from which phi motion does appear to emanate, must change color. The percept changes only slightly even if the right-hand rectangle remains black instead of changing to green. Evidently the presence of cuts exerts a stronger perceptual influence than the change of color.

Figure 30. A cross-shaped figure showing the influence of color, 'cuts' and 'fillets'.

It is well known that Kanizsa's illusory edges can generate phi-motion sensations that are just as strong as, and sometimes even stronger than, those generated by visible edges. Our next figure confirms this, and shows that the phi motion of illusory figures is subject to Nakayama's rules in the same way as that of visible figures.

Figure 31. Phi motion of illusory shapes.

Phi-motion sensations can also be generated by regions defined by texture boundaries only, as shown by the following Figure 32. Here the sensation of motion proceeding from the left is weaker than in our preceding examples, but motion from the right is clearly absent.

Figure 32. Phi motion of shapes defined by texture boundaries.

Phi motion does not seem to have the same ability as normal coherent motion to 'infect' adjacent regions of chaotic motion. This may point to a revealing difference in the mechanisms involved.

It is hard to understand how all of the percepts seen in the last few figures could be generated unless edges detected by an initial analysis of moving (perhaps chaotically moving) micro-elements are being passed as inputs to higher-level motion detectors.

Modes of visual perception can be considered to operate in parallel if percepts of all these kinds can be handled simultaneously without excessive interference. The following figure shows that this is the case for simultaneous perception of pure chaotic motion, texture, color, and coherent motion. The only arguable source of the perception of coherent motion is the set of edges of the red rectangle: every other part of the figure is filled either with chaotic motion or with stationary random-dot patterns. It is also interesting that the coherent motion percept derived from the edges of the red bar infects the nearby regions of chaotic motion, making the whole moving area appear to move slightly up and down in a coherent way.

Figure 33. Simultaneous perception of chaotic motion, texture, color, and coherent motion.

Perception of chaotic motion is also compatible with stereo perception, as the following anaglyph demonstrates. Texture, color, edge-determined stereo, and coherent motion could also be worked in since all are compatible. This suggests that the visual system extracts images in at least these five modalities.
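A stereo field made of chaotic motion can be produced by generating a new random-dot stereogram, in the classic Julesz manner, on every frame. The sketch below assumes this construction (the paper does not state how Figure 34 was made); sizes, disparity and names are hypothetical. For an anaglyph, one eye's image would go to the red channel and the other eye's image to the blue (or cyan) channel.

```python
import random


def rds_pair(height, width, rect, disparity, rng):
    """One random-dot stereogram frame pair.

    The right-eye image copies the left-eye image, except that the dots inside
    rect = (row0, row1, col0, col1) are shifted `disparity` pixels to the left
    (crossed disparity, so the rectangle should appear in front); the strip
    this uncovers is filled with fresh random dots.
    """
    left = [[rng.randint(0, 1) for _ in range(width)] for _ in range(height)]
    right = [row[:] for row in left]
    r0, r1, c0, c1 = rect
    for r in range(r0, r1):
        for c in range(c0, c1):
            right[r][c - disparity] = left[r][c]
        for c in range(c1 - disparity, c1):
            right[r][c] = rng.randint(0, 1)
    return left, right


def dynamic_rds(n_frames, height=40, width=100, rect=(10, 30, 30, 70),
                disparity=4, seed=0):
    """A brand-new stereogram on every frame: the dots themselves flicker
    chaotically, while the rectangle's disparity stays the same."""
    rng = random.Random(seed)
    return [rds_pair(height, width, rect, disparity, rng) for _ in range(n_frames)]


pairs = dynamic_rds(8)
```

No dot survives from one frame to the next, so every location flickers chaotically, yet within each frame the two eyes' images still match with a constant offset inside the rectangle.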

Figure 34. Stereo perception within a field of chaotic motion.

Our next two figures show that stereo can be detected from pure motion edges, implying that motion edges are among the inputs to stereo, so that at least first-stage motion must be detected earlier. The second figure in this group shows that motion edges paired binocularly with intensity edges of either polarity can also generate stereo, confirming that the binocular system receives inputs of both kinds, held in the same retinotopic register. In both cases one sees (view with the red filter over the right eye) a pair of transparent rectangles raised above the random-dot background, the rectangle on the left being raised more, that on the right only slightly. Close inspection of Figure 38 shows that binocular fusion between intensity and motion edges is poor: double vision persists even though the depth sensation is clear. This dissociation of two effects that are normally closely coupled suggests that a single binocular disparity detection system may be receiving inputs from a variety of earlier visual processing stages, and feeding back signals that choke off the perception of one or more of its inputs when depth is detected, but that this repression is incomplete when several distinct edge modalities are combined.
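One way to make edges that are defined by motion alone is sketched below under stated assumptions; all parameters and names are hypothetical, and the figures may have been built differently. Both eyes see the same static random-dot background, which therefore has zero disparity. A rectangle is defined only by flicker, and its position is shifted between the eyes; because the flickering dots are drawn independently for each eye, no individual dot inside it can be matched binocularly, so only the boundary of the flickering region carries the disparity.

```python
import random


def motion_edge_stereo_frames(height=40, width=100, rect=(10, 30, 30, 70),
                              disparity=4, n_frames=8, seed=0):
    """Left- and right-eye frames in which a rectangle is defined only by flicker.

    The flickering region occupies columns col0..col1 in the left eye and is
    shifted `disparity` pixels to the left in the right eye (crossed disparity,
    i.e. 'nearer' than the background).
    """
    rng = random.Random(seed)
    background = [[rng.randint(0, 1) for _ in range(width)] for _ in range(height)]
    r0, r1, c0, c1 = rect
    left_frames, right_frames = [], []
    for _ in range(n_frames):
        left = [row[:] for row in background]
        right = [row[:] for row in background]
        for r in range(r0, r1):
            for c in range(c0, c1):
                left[r][c] = rng.randint(0, 1)               # left-eye flicker
                right[r][c - disparity] = rng.randint(0, 1)  # independent right-eye flicker
        left_frames.append(left)
        right_frames.append(right)
    return left_frames, right_frames, background


left_frames, right_frames, background = motion_edge_stereo_frames()
```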

Figure 37. Stereo perception can arise from pure motion edges. (View with red filter over right eye.)

Figure 38. Stereo perception can arise from binocular proximity of intensity edges with pure motion edges. (View with red filter over right eye.)

The dissociation we have noted of binocular depth perception from perceived edge fusion is confirmed by the following anaglyph, which shows two pairs of rectangles side by side. The upper pair contains a right-hand rectangle visible to the red-filtered eye and a left-hand rectangle visible to the blue-filtered eye. The lower pair reverses these. When viewed with the red filter over the right eye, the right-hand rectangle is clearly raised, particularly at its left edge. However, fusion between the two middle edges is fitful at best, even though the perception of binocular depth is steady. Much less depth is seen in the lower pair of rectangles, which ought on abstract grounds to be seen as recessed behind the random-dot plane. This may be explained by the general difficulty, noted elsewhere in this series of papers, of perceiving the depth of background figures, perhaps because foreground depth perceptions exert a systematic repressive effect on those behind. If the glasses are reversed the perception reverses upper-to-lower, confirming the depth perception which this anaglyph generates.

Figure 39. Rectangles demonstrating depth perception without edge fusion. (View with red filter alternately over each eye.)

Fine binocular percepts like those discussed just above are sometimes seen more clearly in a direct mirror view of a stereo image than in an anaglyph, which mutes the intensity of the two images seen, particularly the red-filtered image. The following figure, which is a variant of Figure 9 set up for mirror viewing, shows the difference. The way in which it should be viewed using a mirror is explained at http://www.multimedialibrary.com/education/illusions/StereoTutorial.html

Figure 40. A variant of Figure 9 set up for mirror viewing.

Discussion. How might it be possible to account for the perceptual phenomena observed in the preceding and other related figures? In the bistable perceptions seen, we observe two equal and competing populations of moving dots, each capable of generating a corresponding coherent-motion gestalt, and from this perhaps also a perception of three-dimensional shape and motion. When perception switches between the two global gestalts, the whole group of dots supporting the gestalt can be affected simultaneously, for example by becoming more salient, being perceived as lying in front of rather than behind other dots not belonging to the gestalt, and as forming part of a clearly perceived 'object' rather than being mere 'background'. The phenomena observed seem to imply that feedback influences from the gestalt itself, however it may be encoded, can selectively influence the perception of just those dots deemed to be part of it. The effect can be compared in this regard to that seen in binocular perception, or in color-constancy illusions.

Figure 41. A Kanisza anaglyph seen as a bent 3-dimensional surface. (View with red filter over right eye.)

Figure 42. An anaglyph seen as the boundary of a bent clamshell or belt-buckle surface. (View with blue filter over right eye.)

Figure 43. An anaglyph bounding a bent belt-buckle surface seen against a random-dot background. (View with blue filter over right eye.)

Next we give a variety of images which show that edges defined by distinct modes of perception can be seen as continuations of each other.

Figure 44. A rectangle bounded by contrast and chaotic motion edges.

Figure 45. A rectangle bounded by contrast edges of opposite characters.

Figure 46. A rectangle bounded by texture-difference and chaotic motion edges.

A surmise concerning 'early' and 'late' processes in vision. The visual phenomena explored in this paper and its predecessor confirm and sharpen a theoretical view of the 'pre-gestalt' processes of the visual system. This view holds that the processes in question can be separated into 'early' and 'late' families. Some perceptual modalities, for example motion and binocularity, may be handled by both 'early' and 'late' mechanisms. We conjecture that the dividing line between these two families of visual modules comes with the perception of visual edges. On this view the 'early' mechanisms could also be characterized as 'edge independent' and the 'late' mechanisms as 'edge dependent'. The 'edge independent' mechanisms would seem to share three characteristics: high speed, a consequent sensitivity to disruption by small temporal offsets, and segregation of image micro-elements by polarity (i.e. by the sign of their intensity relative to the background intensity). We provisionally assign random-dot stereo perception, perception of coherent random-dot motion, and Glass-pattern detection to 'early' perception. We assign the following to 'late' perception: edge-determined stereo, edge-determined coherent motion of regions, phi motion, region color ascription, color constancy management, generation of Kanisza's illusory edges, binocular fusion (seen as the repression of one (or both!) of the image elements entering into a binocular perception), and boundary determination for visible regions.

Our next series of figures will attempt to buttress this theoretical surmise. We begin with an observation concerning the intensity of chaotic motion. It was noted above that reversing all the pixels in a black-and-white random-dot image generates a 'motion texture' distinct from that generated by reversing a random half of them. Presumably this is because the first-level sensation of motion is more intense in the first case. One can therefore generate chaotic motions of any desired intensity by using reversing masks that range from solid black to random-dot patterns of low density. The following figure does this to achieve three levels of chaotic motion intensity, two of them appearing in surrounds of more intense motion. The reversing masks used to generate the three moving areas (by reversing every other frame) are shown below them.

Figure 50. Three levels of chaotic motion intensity.
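The reversing-mask construction can be sketched as follows. The sketch assumes binary images held as lists of 0/1 rows (1 = black) and takes the mask's black pixels as the ones that reverse on alternate frames. The function names are hypothetical. The fraction of pixels that change between frames is only an assumed proxy for the 'intensity' of the resulting chaotic motion; it is not a measure taken from the paper.

```python
import random

def random_dot_image(width, height, density=0.5, seed=0):
    """Binary image (lists of 0/1 rows): each pixel is black (1) with
    probability `density`."""
    rng = random.Random(seed)
    return [[1 if rng.random() < density else 0 for _ in range(width)]
            for _ in range(height)]

def reversing_frames(base, mask, n_frames):
    """Even frames show `base`; odd frames show `base` with every pixel
    under a black (1) mask pixel reversed."""
    reversed_img = [[b ^ m for b, m in zip(brow, mrow)]
                    for brow, mrow in zip(base, mask)]
    return [[row[:] for row in (base if t % 2 == 0 else reversed_img)]
            for t in range(n_frames)]

def changed_fraction(a, b):
    """Fraction of pixels that differ between two frames, used here as a
    crude stand-in for chaotic-motion intensity."""
    total = sum(len(row) for row in a)
    diff = sum(x != y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return diff / total

# Masks ranging from solid black (all pixels reverse) to sparse dots.
base = random_dot_image(64, 64, seed=1)
for density in (1.0, 0.5, 0.1):
    mask = random_dot_image(64, 64, density=density, seed=2)
    f0, f1 = reversing_frames(base, mask, 2)
    print(f"mask density {density}: changed fraction {changed_fraction(f0, f1):.2f}")
```

A solid mask changes every pixel on every frame transition. Sparser masks change proportionally fewer pixels, which is the knob the figure uses to set motion intensity.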

Inside the left and right zones of this figure there appear discernible subrectangles within which the chaotic motion is somehow reduced, as if a translucent calming 'skin' had been placed over them. (The effect is best seen by loading the image into a specialized viewing tool like "GifBuilder", which keeps up with very rapidly changing image sequences better than an ordinary web browser. Browsers work best if the browser window is resized to fit just around the moving image to be viewed, since this minimizes the image refresh demand on the browser.) But motion which appears relatively stationary within a more intensely moving surround is seen to be quite active in two other settings: when it forms the surround of a less intensely moving rectangle, and when it appears within a stationary surround. This is presumably the analog of well-known grey-level surround effects, and may have the same neural basis, but within a conjectured 'motion image' rather than an ordinary intensity image.

If the remarks made in the preceding paragraph hold water, they give us a way of directly manipulating the intensity of one of the hidden intermediate images formed within the visual system, namely a 'motion intensity' image.

It is worth noting that the acuity of pure motion vision is low. Consequently, the motion perception generated by a reversing mask set against a random-dot field is largely independent of the mask texture, and reflects only the mask's average density of black (reversing) pixels. This is illustrated by our next figure, which shows three different chaotically reversing areas, with the masks used to produce them.

Figure 51. Chaotic motion generated by three different masking textures.
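One way to make the 'average density' point concrete is to model low motion acuity by averaging the per-pixel reversal signal over coarse blocks; this modeling choice is an assumption. Under a reversing mask, the pixels that flip are exactly the mask's black pixels, so pooling the mask itself shows what such a low-acuity stage would receive. The masks, the block size and the function name are hypothetical.

```python
import random

def block_average(signal, block):
    """Average a 2-D 0/1 signal over non-overlapping block x block cells
    (dimensions assumed divisible by `block`): a crude low-acuity model."""
    h, w = len(signal), len(signal[0])
    return [[sum(signal[y][x] for y in range(by, by + block)
                 for x in range(bx, bx + block)) / block ** 2
             for bx in range(0, w, block)] for by in range(0, h, block)]

size = 32
rng = random.Random(0)
masks = {   # three textures, each with (about) half its pixels black
    "random dots": [[1 if rng.random() < 0.5 else 0 for _ in range(size)]
                    for _ in range(size)],
    "vertical stripes": [[x % 2 for x in range(size)] for _ in range(size)],
    "checkerboard": [[(x + y) % 2 for x in range(size)] for y in range(size)],
}
# Under a reversing mask, the pixels that flip between frames are exactly
# the mask's black pixels, so the mask is itself the per-pixel flicker signal.
for name, mask in masks.items():
    pooled = [v for row in block_average(mask, 8) for v in row]
    print(f"{name:16s} mean={sum(pooled) / len(pooled):.2f} "
          f"min={min(pooled):.2f} max={max(pooled):.2f}")
```

All three masks give about the same pooled mean. The regular masks give exactly 0.5 in every block, while the random mask fluctuates around 0.5; coarser pooling would shrink that fluctuation further.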

The following anaglyph shows that under favorable circumstances the edges detected at the boundary between chaotically moving regions of distinct motion intensities, in the sense explained above,

The theoretical view set forth above would hold two things. (a) The presence of chaotic motion in a region is detected by an 'early' process. Boundaries between regions moving differently are then detected and passed along to 'late' mechanisms, which work with edges in a manner largely independent of their origin. (b) Consequently, motion of these edges can be sensed, and the edges can give rise to binocular, coherent motion, phi motion, and color constancy effects. Our next few images test these predictions. Coherent and phi motion for images of this class have already been demonstrated in Figures 19-22, so we can concentrate on the remaining visual modalities listed above.
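The two-stage account in (a) and (b) can be caricatured in code. The model is an assumption, not one proposed in the paper. Its 'early' stage records only whether each pixel changes between successive frames, ignoring polarity, and pools that signal into a coarse 'motion intensity' image. Its 'late' stage then marks edges wherever that pooled image jumps. All names, block sizes and thresholds are hypothetical.

```python
import random

def flicker_map(frames):
    """'Early' stage (illustrative): fraction of frame-to-frame transitions
    in which each pixel changes. Only change counts, not its polarity."""
    h, w = len(frames[0]), len(frames[0][0])
    n = len(frames) - 1
    return [[sum(frames[t][y][x] != frames[t + 1][y][x] for t in range(n)) / n
             for x in range(w)] for y in range(h)]

def pool(img, block):
    """Block-average a map (dimensions assumed divisible by `block`)."""
    h, w = len(img), len(img[0])
    return [[sum(img[y][x] for y in range(by, by + block)
                 for x in range(bx, bx + block)) / block ** 2
             for bx in range(0, w, block)] for by in range(0, h, block)]

def horizontal_edges(img, threshold):
    """'Late' stage input (illustrative): True between horizontally
    adjacent cells whose pooled motion intensity differs by > threshold."""
    return [[abs(row[x + 1] - row[x]) > threshold for x in range(len(row) - 1)]
            for row in img]

# Stimulus: the left half reverses every pixel, the right half reverses a
# checkerboard (half its pixels), so the halves differ in motion intensity.
size = 32
rng = random.Random(0)
base = [[rng.randint(0, 1) for _ in range(size)] for _ in range(size)]
mask = [[1 if x < size // 2 else (x + y) % 2 for x in range(size)]
        for y in range(size)]
flipped = [[b ^ m for b, m in zip(br, mr)] for br, mr in zip(base, mask)]
frames = [base if t % 2 == 0 else flipped for t in range(6)]

motion = pool(flicker_map(frames), 4)          # 8 x 8 'motion intensity' image
edge_map = horizontal_edges(motion, threshold=0.25)
print([x for x, is_edge in enumerate(edge_map[0]) if is_edge])  # → [3]
```

The single edge lies between pooled cells 3 and 4, i.e. at pixel column 16, where the two motion intensities meet. No single frame contains this edge, which is the sense in which it is produced downstream of an 'early' motion stage in this sketch.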

Suppose first that we take a random-dot stereogram, like that shown anaglyphically in the next figure, which contains a raised random-dot rectangle along with a line rectangle in front of it.

Figure 52. An anaglyph containing a raised random-dot rectangle with a line rectangle in front of it. (View with red filter over right eye.)
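Stimuli of this kind can be built as sketched below. The sketch produces a random-dot stereo pair with a horizontally shifted rectangle and an optional line rectangle drawn at its own, larger disparity. It also supports an optional black-to-white inversion of the whole second plane, the manipulation used for Figure 53. The planes are 0/1 lists. Which plane goes to which color channel, and hence which sign of disparity reads as 'raised', depends on the glasses used and is not fixed here. Parameter names are hypothetical.

```python
import random

def stereo_pair(width, height, rect, disparity, line_rect=None,
                line_disparity=0, invert_second=False, seed=0):
    """Return (first_plane, second_plane) of a random-dot stereogram.

    rect = (x0, y0, x1, y1) is shifted horizontally by `disparity` pixels
    in the second plane. line_rect, if given, is drawn as a 1-pixel black
    outline in both planes, offset by `line_disparity` in the second.
    All coordinates are assumed to stay inside the image."""
    rng = random.Random(seed)
    first = [[rng.randint(0, 1) for _ in range(width)] for _ in range(height)]
    second = [row[:] for row in first]
    x0, y0, x1, y1 = rect
    for y in range(y0, y1):
        for x in range(x0, x1):                 # shifted copy of the rectangle
            second[y][x + disparity] = first[y][x]
        # Refill the strip uncovered by the shift with fresh, uncorrelated dots.
        strip = range(x0, x0 + disparity) if disparity > 0 else range(x1 + disparity, x1)
        for x in strip:
            second[y][x] = rng.randint(0, 1)
    if line_rect is not None:
        lx0, ly0, lx1, ly1 = line_rect
        for plane, dx in ((first, 0), (second, line_disparity)):
            for x in range(lx0 + dx, lx1 + dx):
                plane[ly0][x] = plane[ly1 - 1][x] = 1
            for y in range(ly0, ly1):
                plane[y][lx0 + dx] = plane[y][lx1 - 1 + dx] = 1
    if invert_second:                           # black-to-white reversal
        second = [[1 - v for v in row] for row in second]
    return first, second

# Random-dot rectangle plus a line rectangle at a larger disparity.
left, right = stereo_pair(120, 90, rect=(30, 25, 90, 65), disparity=3,
                          line_rect=(20, 15, 100, 75), line_disparity=5)
# The same stimulus with its second plane inverted.
left_inv, right_inv = stereo_pair(120, 90, rect=(30, 25, 90, 65), disparity=3,
                                  line_rect=(20, 15, 100, 75), line_disparity=5,
                                  invert_second=True)
```

The outline is drawn before the inversion step. In the inverted version it therefore appears black in one eye and white in the other, while every dot correlation in the random-dot rectangle is reversed in polarity.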

A feature of this image worth noting is the characteristic 'transparency' which appears inside the raised rectangle, which makes it look something like a perfectly clear pane of glass. This percept always appears when an edge-derived perception of the depth inside a rectangle finds nothing in the appropriate region to which a suitable depth can be ascribed, and when the ascribed surface fit to a boundary like the rectangle cannot be continued amodally behind something which seems to hide a portion of it.

It is well known that if one of the two images in a random-dot stereogram is inverted black-to-white, stereo perception of it is entirely destroyed and breaks up into a chaotic stereo 'fuzz'. Our theoretical surmise therefore predicts what happens if the red plane in the foregoing anaglyph is inverted black-to-white. The raised random-dot rectangle seen in it should disappear completely, but the raised line rectangle, perceived by a late line-dependent mechanism, should persist. Here is the test:

Figure 53. The anaglyph of Figure 52 after black-to-white reversal of its red plane. (View with red filter over right eye.)

It will be seen that the perception of depth attaching to the rectangle is still quite clear, even though clean binocular fusion of the white and black rectangles seen cannot be achieved. This may be because fusion is an early process which characteristically treats image elements of opposite polarity separately. We can also note that the part of the random-dot fuzz appearing within the raised rectangle tends to rise into the plane of the rectangle. This can be regarded as an instance of a Treisman-like tendency, already noted in various motion contexts, for perceptions of coherency, perhaps edge-derived, to infect nearby percepts lacking coherency. If the glasses are reversed, the rectangle can be seen as recessed behind the fuzz. This perception is less distinct, in accordance with the general tendency for the depth of objects seen behind others to be muted.

Next we demonstrate that the effects noted above also apply to perceptions of coherent motion. The following figure shows a random-dot rectangle moving slowly and coherently over a moving random-dot background, in company with a line rectangle. Inverting successive frames tends to reduce the salience of coherent random-dot motions considerably. Our theoretical surmise therefore predicts what inverting alternate frames of this animation should do. The perception of the small random-dot rectangle should be significantly weakened or disappear completely. Perception of the line rectangle should be little affected, even though it will be seen to flash alternately white and black. Figure 55 shows the extent to which this expectation is borne out.

Figure 54. Coherent motion of a random-dot and a line rectangle over a moving random-dot background.

Figure 55. Effect of inverting alternate frames in Figure 54.
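The manipulation behind Figures 54 and 55 can be sketched with 0/1 frames. A random-dot texture is translated by a fixed number of pixels per frame, with wraparound. An optional line outline is drawn onto each frame, and every odd-numbered frame can optionally be inverted. Speeds, sizes and names are illustrative assumptions. Because the outline is drawn before the inversion step, it alternates between black and white, as in the figure.

```python
import random

def translate(img, dx):
    """Shift every row right by dx pixels, wrapping around at the edge."""
    w = len(img[0])
    return [[row[(x - dx) % w] for x in range(w)] for row in img]

def draw_outline(frame, x0, y0, x1, y1, value=1):
    """Draw a 1-pixel rectangle outline in place (coordinates in bounds)."""
    for x in range(x0, x1):
        frame[y0][x] = frame[y1 - 1][x] = value
    for y in range(y0, y1):
        frame[y][x0] = frame[y][x1 - 1] = value

def coherent_motion_frames(texture, speed, n_frames, outline=None,
                           invert_alternate=False):
    """outline = (x0, y0, x1, y1, line_speed) or None."""
    frames = []
    for t in range(n_frames):
        frame = translate(texture, speed * t)   # coherent random-dot motion
        if outline is not None:
            x0, y0, x1, y1, line_speed = outline
            dx = line_speed * t
            draw_outline(frame, x0 + dx, y0, x1 + dx, y1)
        if invert_alternate and t % 2 == 1:     # reverse polarity of odd frames
            frame = [[1 - v for v in row] for row in frame]
        frames.append(frame)
    return frames

rng = random.Random(0)
texture = [[rng.randint(0, 1) for _ in range(80)] for _ in range(60)]
plain = coherent_motion_frames(texture, 2, 8, outline=(10, 20, 30, 40, 1))
flashing = coherent_motion_frames(texture, 2, 8, outline=(10, 20, 30, 40, 1),
                                  invert_alternate=True)
```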

Differences in time-dependencies of early and late visual processes. The theory proposed above holds that early visual processes, which operate very rapidly (typical timings being 50ms, corresponding to a frame rate of 20 frames/sec.) should be much more sensitive than late processes to 50ms delays. For example, delaying frame receipt by blanking alternate frames in a scene should have a much more destructive effect on perception of coherent motion of a random-dot texture than on perception of a like line-figure's motion. Our next animation shows that such is often the case. Note however that if the successive frames are projected at a rate significantly higher than the 15 frames/sec. seen in the figure, the fact that the random-dot background (which is moving at about twice the linear speed of the line rectangle seen) is moving coherently becomes clear again.

Figure 56. Coherent motion of a random-dot pattern and of a line rectangle with and without one-frame delay of successive frames.
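The timing arithmetic behind this comparison can be stated as code. A frame rate of f frames/sec gives a frame period of 1000/f ms, so 20 frames/sec gives 50 ms. Interposing one blank frame after each content frame doubles the time between successive content onsets. The sketch assumes a constant frame period and that the delay is produced by blank frames; the function names are hypothetical.

```python
def frame_period_ms(fps):
    """Duration of one frame at a constant frame rate."""
    return 1000.0 / fps

def with_blank_frames(frames, blank):
    """Follow every content frame with a blank frame (one-frame delay)."""
    out = []
    for frame in frames:
        out.extend([frame, blank])
    return out

def onset_interval_ms(fps, blanks_between=0):
    """Time from one content frame's onset to the next content frame's onset."""
    return (1 + blanks_between) * frame_period_ms(fps)

for fps in (20, 15):
    print(f"{fps} frames/sec: period {frame_period_ms(fps):.1f} ms, "
          f"onset interval with one blank frame {onset_interval_ms(fps, 1):.1f} ms")
```

At the 15 frames/sec used in the figure, each frame lasts about 66.7 ms. One interposed blank frame therefore separates successive content onsets by about 133.3 ms.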

The theory proposed would predict that the perception of illusory Kanisza edges will be little affected by the presence of random-dot backgrounds, or even chaotic motion. These edges are generated by a late process, carried out after whatever information can be gleaned from these patterns or motions has been extracted. The following figure bears out this expectation.

Figure 60. An illusory rectangle moving in front of a field of chaotic motion.

Note that the illusory edges show up even more strongly when seen against a chaotically moving background than they do against a featureless background (also, they show up more vividly against a white than against a grey background.) A possible explanation for this is as follows.

Next we turn to color constancy effects, which we have assigned to the second ('late') stage of visual processing. This assignment would lead us to expect purely edge-dependent effects, insensitive to the presence of a random-dot texture. The following image shows that the situation is not that simple.

Figure 61. Effect of a random-dot texture on region color fill-in.

Texture. Where, in the crude taxonomy suggested by the theory proposed above, are we to put the perception of texture, color's sophisticated but confusing cousin? One way of attempting to decide this question is to ask whether sudden texture changes can, like intensity, color, or motion-intensity changes (all of which are early-stage percepts), produce phi motion. Our next figure shows that this is possible.

Figure 65. Phi motion produced by a change of texture.
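The two kinds of stimulus at issue here, the texture-change jump of Figure 65 and the same-texture detail change of Figure 65b, can be generated as sketched below with a random-dot background and 0/1 frames. In the first, a patch of a different texture appears at one place and then at another; vertical stripes are chosen purely for illustration. In the second, a single patch is refilled with fresh random dots, which changes its detail but not its texture. Patch sizes, positions, seeds and names are hypothetical.

```python
import random

def textured_frame(width, height, patch, fill, background_seed=1, patch_seed=0):
    """Random-dot background (fixed by background_seed) with a square patch
    (x0, y0, size) whose pixels come from fill(x, y, rng)."""
    bg = random.Random(background_seed)
    frame = [[bg.randint(0, 1) for _ in range(width)] for _ in range(height)]
    rng = random.Random(patch_seed)
    x0, y0, size = patch
    for y in range(y0, y0 + size):
        for x in range(x0, x0 + size):
            frame[y][x] = fill(x, y, rng)
    return frame

def stripes(x, y, rng):
    """A texture distinct from the background: 2-pixel vertical stripes."""
    return 1 if x % 4 < 2 else 0

def fresh_dots(x, y, rng):
    """Same texture as the background, but new detail."""
    return rng.randint(0, 1)

# Texture change: a striped patch jumps across an unchanging background.
texture_change = [textured_frame(64, 32, (8, 8, 16), stripes),
                  textured_frame(64, 32, (40, 8, 16), stripes)]
# Detail change only: one patch is refilled with independent random dots.
detail_change = [textured_frame(64, 32, (24, 8, 16), fresh_dots, patch_seed=2),
                 textured_frame(64, 32, (24, 8, 16), fresh_dots, patch_seed=3)]
```

The patch seeds differ in the detail-change pair so that the new dots are uncorrelated with the old ones. Reusing the same dots at a new position would instead create a coherent random-dot displacement, a different stimulus altogether.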

Note that the perception appearing in Figure 65 is quite different from that produced by the momentary flash of incoherent motion seen when a region in a textured area changes but retains the same texture. This latter case is shown in the following figure.

Figure 65b. Change of detail in an area, without change of texture, produces no phi motion.

We test our conclusion using a few other sample textures:

Figure 66. Phi motion produced by a second change of texture.

Figure 67. Change of detail in an area, without change of texture, produces no phi motion.

Not every change of texture that could be detected in a static presentation will produce phi motion. Our next figure illustrates this point by putting a patch of left-to-right reversed semicircles into a random-semicircles texture. The patch differs enough from the background semicircles texture to be reasonably salient in a static presentation. Even so, the two textures are too close to each other to produce a phi-motion percept.

Figure 68. Insufficiently salient change of texture produces no phi motion.

However, if the contrast between the two textures grows larger, phi motion reappears. We show this in the next figure by simply doubling the number of semicircles in the reversed patch.

Figure 68b. Making a change of texture sufficiently salient restores phi motion.

Even some texture changes drastic enough to be immediately visible in a static setting, for example a blending operation which 'greys out' some of the details defining the texture of a blob, may not suffice to generate a consistently stable phi-motion percept. Our next figure, in which the greyed-out areas sometimes undergo normal phi motion but sometimes seem to disappear by an 'expanding' motion of the background areas near them, shows this.

Figure 68c. Unstable phi motion percept produced by 'greying out' textural details of a phi region.

Our next figure shows that much the same sensation of phi motion as in Figure 65 is produced even if three (or more) textures are involved, so that 'objects' undergoing phi motion seem to change their texture (or color, or size) simultaneously.

Figure 68d. A phi motion involving a change of texture in the moving 'object'.

If only one of the two areas involved in a 'phi jump' changes its texture enough for phi motion to appear, that area seems to 'flash on' as a new 'object', but no sensation of phi motion is generated.

Figure 69. Insufficiently salient change of texture in one of the two ends of a 'phi jump' produces 'flashing', but no phi motion.

We can consider two textures to be 'very similar' if a change from one to the other produces no apparent phi motion. This introduces a somewhat broader notion of relatedness than texture identity, which we define as the inability to detect patches copied from one texture and inserted into another. Our next few figures explore this notion as it applies to a parametric family of random-dot 'Markov' textures. These are derived by choosing the probability p that a pixel will be reversed from the previous pixel in its line. As usual, p can take any value between zero (which would produce a surface all in one color) and 1 (which would produce a perfect checkerboard). At p = 1/2 we have the standard uncorrelated random-dot texture. As p moves toward 0, the random black and white 'streaks' composing the texture become longer. Near p = 1 we have increasingly regular, checkerboard-like patterns, made up of longer and longer 'streaks of regularity'.
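The Markov texture family can be sketched under one stated assumption: the pixels are generated in a single raster scan, so the first pixel of each row continues from the last pixel of the row above. With that choice, p = 0 gives a surface all in one color, as the text says. p = 1 gives a perfect checkerboard when the image width is odd; with an even width it would give vertical stripes. The function names are hypothetical.

```python
import random

def markov_texture(width, height, p, seed=0):
    """Raster-scan Markov texture: each pixel is the reverse of the
    previous pixel in the scan with probability p, else a copy of it."""
    rng = random.Random(seed)
    current = rng.randint(0, 1)
    pixels = []
    for i in range(width * height):
        if i > 0 and rng.random() < p:
            current = 1 - current
        pixels.append(current)
    return [pixels[y * width:(y + 1) * width] for y in range(height)]

def reversal_fraction(rows):
    """Empirical estimate of p: share of adjacent scan pairs that differ."""
    flat = [v for row in rows for v in row]
    return sum(a != b for a, b in zip(flat, flat[1:])) / (len(flat) - 1)

# Mean streak (run) length is roughly 1/p: long streaks near p = 0,
# checkerboard-like regularity near p = 1.
for p in (1 / 10, 1 / 3, 1 / 2, 2 / 3, 3 / 4):
    tex = markov_texture(101, 100, p, seed=7)
    print(f"p = {p:.2f}: measured reversal fraction {reversal_fraction(tex):.2f}")
```

Each step ends the current streak with probability p, so streak lengths are geometrically distributed with mean 1/p. This is why streaks lengthen as p approaches 0.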

As we have seen above, a large enough texture difference (in the following example, the difference between the p = 1/2 and p = 1/10 textures) produces a clear phi-motion percept.

Figure 71. Phi motion produced by a change between two 'Markov' textures. (See comments above)

The next figure shows that a phi-motion percept is still produced between the p = 1/3 and p = 1/2 textures.

Figure 72. Phi motion produced by a change between two other 'Markov' textures. (See comments above)

But between the p = 2/3 and p = 3/4 textures phi-motion disappears, as the following figure shows. Note that these textures are marginally distinguishable in a static presentation.

Figure 73. Phi motion is not produced by a change from the p = 2/3 to the p = 3/4 texture. (See comments above)

The preceding observations suggest that texture perception occurs as an 'early' visual process. Texture would seem to represent as much of object surface detail as can be extracted by the purely 'early' mechanisms of the visual system.

Perception of Glass Swirl. Should the perception of Glass swirl be regarded as an 'early' or 'late' visual process? To test this, we can ask whether swirl perception is easily disrupted by insertion of a small delay between the presentation of its first and second essential elements (as is typical for 'early' visual processes), or resistant to such disruption. (This means testing Glass perception in what we can regard as its 'dynamic form', namely perception of coherent rotating motions.) Without the insertion of a 160ms delay between presentation of the two groups of dots comprising it, the following figure would generate the perception shown in Figure 75b. A small box has been included at the upper right of Figure 75 to demonstrate that the inserted delay does not disrupt phi-motion perception nearly as badly as Glass swirl perception.

Figure 75. Insertion of a 160ms delay between presentation of the two groups of dots comprising a Glass figure disrupts the coherent rotational motion of the figure, but not perception of ordinary phi motion.

Figure 75b. Without a delay between presentation of its two groups of dots, Figure 75 would show a clear rotating-motion percept.
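The dynamic Glass stimulus can be sketched as two dot groups presented one after the other, with an optional blank interval between them. The second group is the first rotated by a small angle about the center, the pairing that yields a concentric swirl. The dot count, radius, rotation angle, frame duration and names below are illustrative assumptions; only the 160 ms delay comes from the text.

```python
import math
import random

def glass_groups(n_dots, radius, angle_deg, seed=0):
    """Group A: random dots in a disc. Group B: A rotated by angle_deg
    about the center, so each A dot has a partner on a concentric circle."""
    rng = random.Random(seed)
    group_a = []
    while len(group_a) < n_dots:
        x, y = rng.uniform(-radius, radius), rng.uniform(-radius, radius)
        if x * x + y * y <= radius * radius:
            group_a.append((x, y))
    c, s = math.cos(math.radians(angle_deg)), math.sin(math.radians(angle_deg))
    group_b = [(c * x - s * y, s * x + c * y) for x, y in group_a]
    return group_a, group_b

def schedule(group_a, group_b, frame_ms, delay_ms=0):
    """(dots, duration_ms) steps: A, an optional blank interval, then B."""
    steps = [(group_a, frame_ms)]
    if delay_ms > 0:
        steps.append(([], delay_ms))
    steps.append((group_b, frame_ms))
    return steps

a, b = glass_groups(300, 100.0, 3.0, seed=1)
undelayed = schedule(a, b, frame_ms=100)
delayed = schedule(a, b, frame_ms=100, delay_ms=160)
```

Showing both groups at once would give the static Glass swirl. Showing them in succession gives the 'dynamic form' discussed above, and the `delay_ms` interval is the manipulation that Figure 75 applies.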

We present one more figure to confirm the fact that insertion of a roughly 120ms delay does not disrupt the perception of phi motion, which belongs to our family of 'late' processes, nearly as much as it does Glass swirl perception.

Figure 75c. Phi motion of a texture, seen with a 120 ms delay between frames.

We conclude from the above that Glass swirl perception occurs as an early visual process.

Concerning Kanisza's illusory edges. The general view of the visual system we have been exploring suggests that the marginally visible edges appearing in Kanisza's illusory figures are actually marginal perceptions of image traces formed internally to the vision system.

Figure 75b. Illusory edges affect color perception in areas they surround.

A comment on color. As the next figure shows, acuity for color is much lower than acuity for intensity differences. Unless accompanied by intensity differences, pure color changes cannot be used to convey fine detail. The second figure below shows that relative motion of areas of differing intensity is also much more salient than the same motion of isoluminant areas. (By viewing the following figures through anaglyphic glasses, you will be able to see how pronounced the spectral differences within them really are.) All of the gestalt-forming and motion-related effects surveyed in this paper and on this website are intensity dependent rather than color dependent. This suggests that color perception is more a grace note to human vision than a central feature of it.

Figure 77. Text approximately isoluminant with its background.

Figure 77b. Motion with and without a luminance difference.
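An approximately isoluminant pair can be constructed roughly as sketched below. The sketch assumes sRGB display values and the standard Rec. 709 / sRGB relative-luminance weights (0.2126, 0.7152, 0.0722 for linear R, G, B). True isoluminance varies between observers and displays and is usually set empirically, for example by flicker photometry, so the result is only a starting point. The function names are hypothetical.

```python
def srgb_to_linear(c8):
    """Convert an 8-bit sRGB component to linear light in [0, 1]."""
    c = c8 / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def linear_to_srgb(v):
    """Convert linear light in [0, 1] back to an 8-bit sRGB component."""
    c = 12.92 * v if v <= 0.0031308 else 1.055 * v ** (1 / 2.4) - 0.055
    return max(0, min(255, round(c * 255)))

def relative_luminance(rgb):
    """Rec. 709 / sRGB relative luminance of an 8-bit (R, G, B) triple."""
    r, g, b = (srgb_to_linear(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def isoluminant_green(red8):
    """Green level whose luminance matches the pure red (red8, 0, 0)."""
    target = relative_luminance((red8, 0, 0))
    return linear_to_srgb(target / 0.7152)

g = isoluminant_green(255)
print(g, relative_luminance((255, 0, 0)), relative_luminance((0, g, 0)))
```

For full-intensity red this gives a green level of about 148. The two colors then differ almost entirely in hue rather than in computed luminance.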
