Script – Vision and the Neocortex

 
This is Curt Doolittle for the Propertarian Institute.
 
[ INTRO ]
 
 
Welcome to Part 3 of the Nervous System.  
Our topic is vision and the neocortex.
 
Hopefully, in part two, we succeeded in developing an understanding of the common cortical algorithm – meaning scales of networks competing for successful prediction – and that the cortex does about the same thing everywhere, that each region’s responsibilities are determined by what it’s connected to, and that all of these regions are connected to one another.
 
Now that we understand our powers of prediction, with an emphasis on touch, let’s look at errors in prediction – using the most developed and sensitive of our senses: vision.
 
 
The light that hits our eyes is the same light that hits the sand on the beach, the side of your house, or the surface of a leaf. It’s just very high frequency vibrations striking your retina. But, unlike the sand, the house, or the leaf, we must disambiguate it in order to act by using it.
 
 
Over the past two centuries, we’ve seen photography demonstrate that what the human eye sees and what the camera sees are nearly identical – except for our rather exceptional ability to compensate for differences in luminosity.
 
While we have color vision, most of the cells in our eyes sense luminosity – black and white – and we have quite a few neurons and groups of neurons that compensate for differences in light. The camera doesn’t have this feature, although we can use current digital technology to mimic our vision’s abilities today.
 
That said, the camera doesn’t need to develop a three dimensional model of the world so that it can act within it. If we were as ‘dumb’ as the camera, so to speak, we would only see darkness and white noise – or rather, white noise including colors. We wouldn’t see objects, spaces, and scenes.
 
Somehow we have to disambiguate that noise into scenes, objects within those scenes, and the spaces between those objects, and hopefully recognize what objects and spaces can be acted upon within the scene.
 
The term we use for ‘importance’ is ‘salience’. So we have to disambiguate the important and salient from the noise.
 
In this sense we don’t see an illusion of the world; instead we see what the camera sees – but by ‘see’ we mean that we can only focus our attention on, and gather more information about, those elements of the world that we are both able to disambiguate and able to determine are salient.
 
And so we filter out ‘noise’ or ‘clutter’ by NOT disambiguating it from the background – or ‘scene’ – perceiving scene elements more like variations in texture than like individual objects.
 
There is no use disambiguating every leaf on every tree until we need to act upon the leaf, branch, or tree, rather than simply the background consisting of a wall of green trees.
 
Our eyes and cortex do the work of disambiguating our field of vision into a scene – into background, objects, and spaces – using our talent for prediction, and this is accomplished by the incremental means of edges, shapes, textures, contrast, color, and time.
 
What we have learned over the past decades is that we don’t see scenes and objects first and then determine spaces; rather, we determine scenes, backgrounds, objects, and entities along with their textures, volumes, and spaces. Because, unlike the camera, we are not looking for pictures – we are discovering opportunities to act.
 
So let’s look at some examples of each of these DIMENSIONS by which we can disambiguate the visual world into meaning and noise.
 
[ REVIEW TERMS ]
 
And let’s review our hierarchy for a moment…
1. Point, or pixel, or cone in the eye
2. Edge or vertex
3. Fragment 
4. Texture or surface or face
5. Feature 
6. Object – meaning a volume that occupies space.
7. Model (category say – cats)
8. Entity (existential reference – cat we know)
9. Path 
10. Space 
11. Background 
12. Scene
 
Here we go.
 
—[ OPTICAL ILLUSIONS ] —
 
[ WHITE NOISE ]
 
This is a video of white noise. You’ll either disambiguate it into a space, like a window; into a surface, like a texture; or retain emphasis on the background and see just a rectangle of noise – depending upon whatever habitual bias you’ve developed.
 
 
[DEPTH OF FIELD]
This is a stereoscopic image of zebras. It illustrates disambiguating by depth of field. It takes some concentration, but most of us can alter depth of field (meaning cross our eyes) and disambiguate the zebras from the noise of the background.
 
[ ZEBRAS ]
 
This image of zebras is easier to disambiguate – partly because of their manes. If it were a video rather than a still image it would be harder. But do you notice how separating any zebra from the background requires finding its head, then toggling back to the ‘scene’ and searching for another head to disambiguate?
 
Zebra manes, stripes, and eyes are pretty hard to disambiguate, because the stripes look like edges and overload our edge-finding ability – they constitute a texture rather than an edge.
 
A lion faces the challenge of disambiguating the scene into surfaces, the surfaces into models of individual zebras, and even finding a space between them in which to act.
 
[ DALMATIAN – TEXTURE ]
 
In this image we see another difficult disambiguation of texture into image – this time, instead of being overloaded with edges, we lack edges and are overloaded by texture.
 
We have to predict (infer) the edges by the pattern of the texture – in this case – splotches and dots.
 
Some of us will succeed in predicting the Dalmatian from the texture rather quickly, some will take longer, and some will give up.
 
[ DALMATIAN – TEXTURE SHOWN ]
 
And here we add a little contrast to show those edges that we predicted.
 
[ DALMATIAN – SECOND ]
 
In this image we learn a little more about using textures to predict edges, and with those edges the figure inside. Is this example any easier?
 
So, to explain: our receptors, neurons, and fibers look for the relations between nearby pixels (meaning sensors) and try to find a higher-level pattern in them – one that we can disambiguate into scene, background, space, and object, using edge, texture, luminosity, and color.
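To make ‘relations between nearby pixels’ concrete, here is a minimal sketch – my own illustration in Python, not a model of the retina or cortex – of how simple differences between neighboring values can be read as an edge, a texture, or plain background:

# Minimal sketch (illustration only): turning relations between neighboring
# "pixels" into an edge-versus-texture-versus-background judgment.
import numpy as np

def edge_strength(gray):
    """Absolute difference between each pixel and its right/lower neighbors."""
    dx = np.abs(np.diff(gray, axis=1))[:-1, :]   # horizontal relations
    dy = np.abs(np.diff(gray, axis=0))[:, :-1]   # vertical relations
    return dx + dy                               # strong = likely edge

def classify_patch(gray, edge_thresh=0.2, clutter_thresh=0.4):
    """Crude disambiguation of a patch into edge, texture, or background."""
    e = edge_strength(gray)
    busy = np.mean(e > edge_thresh)              # how much of the patch is 'busy'
    if busy > clutter_thresh:
        return "texture"                         # many differences everywhere
    if e.max() > edge_thresh:
        return "edge"                            # a few strong differences
    return "background"                          # nothing worth disambiguating

# A patch that is dark on the left and bright on the right reads as an edge.
patch = np.concatenate([np.zeros((8, 4)), np.ones((8, 4))], axis=1)
print(classify_patch(patch))                     # -> "edge"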
 
[ CAT ]
 
Even with ambiguous edges we can still disambiguate a common model in motion, because of the relative consistency of related pixels over time.
 
Although, in a little bit, we’ll show the limits of that ability.
 
 
[ BARS ]
 
This is the most common optical illusion. It shows how we disambiguate the feature at one end of the bars, the feature at the other end, and the texture between them, and predict them into an object that isn’t internally coherent.
 
[ ELEPHANT ]
 
Again, we see that we correctly disambiguate the elephant from the background using major edges – the upper outline, the head, the trunk, and the legs – as separate features. We probably sense that something is incoherent (or incommensurable), and when we put our focus on the legs we see that it is.
 
So again, I use this example to show you how you can disambiguate the features of the body, head, feet, and predict a coherent model despite internal incoherence of the features of the model.
 
[ TEXT ]
 
One of the best illustrations of our predictive ability is this popular example from the early days of the internet: as long as the first and last letters of a word are in position, the rest don’t need to be – any more than the elephant’s legs or the bars need to be – for us to disambiguate the word from other words and understand it.
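If you want to play with that demo yourself, here is a toy sketch – my own Python, not the original internet example’s code – that keeps each word’s first and last letters in place and shuffles the interior:

# Toy version of the "scrambled words" demo: first and last letters fixed,
# interior letters shuffled. Punctuation handling is deliberately ignored.
import random

def scramble_word(word, rng):
    if len(word) <= 3:
        return word
    middle = list(word[1:-1])
    rng.shuffle(middle)
    return word[0] + "".join(middle) + word[-1]

def scramble_text(text, seed=0):
    rng = random.Random(seed)
    return " ".join(scramble_word(w, rng) for w in text.split())

# The interior letters move, yet the sentence usually remains readable.
print(scramble_text("we predict whole words from fragmentary information"))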
 
But we can also identify some textures, edges, fragments and objects, and still correctly predict the entity even if the objects are not correctly constructed.
 
So hopefully we are succeeding in helping you grasp how pixel-to-pixel relations over time accumulate into edges or textures, then into sufficient fragments, then into objects, then into entities, as the scope of our vision expands.
 
And while these predictions can err in one way or another, we find that their ability to succeed in predicting something actionable, despite errors or missing information, is an advantage to us.
 
We don’t think of this imprecision as an advantage, but it is. We can always focus our attention further and get more information if we want, but at the same time we can rapidly disambiguate the world by prediction from sparse information.
 
 
[ CHESS ]
 
Now in this image we will note that the squares labeled A and B are actually the same color; but in order to compensate for differences in light and dark, we evolved the ability to adjust in ways that the camera does not – at least not without digital manipulation.
 
If you place the two squares next to each other the similarity will appear obvious, but it’s important to note that our eyes and brains predict fragments, objects, and entities within a scene differently when it comes to light and dark versus color. This is one of the advantages of the structure of our eyes, which detect light and dark separately from color.
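A minimal sketch of that compensation idea – my own illustration with made-up numbers, not a retinal model – is to treat ‘perceived’ brightness as the raw value relative to its local surround; two squares with identical raw values then land on opposite sides of neutral:

# Illustration only: identical gray values "look" different once brightness
# is taken relative to the local surround, as the checker-shadow image exploits.
import numpy as np

def perceived(gray, i, j, radius=2):
    """Pixel value minus the mean of its local neighborhood (crude local contrast)."""
    lo_i, hi_i = max(0, i - radius), i + radius + 1
    lo_j, hi_j = max(0, j - radius), j + radius + 1
    return gray[i, j] - gray[lo_i:hi_i, lo_j:hi_j].mean()

img = np.full((5, 10), 0.5)
img[:, :5] += 0.3                 # bright surround on the left
img[:, 5:] -= 0.3                 # dark (shadowed) surround on the right
img[2, 2] = img[2, 7] = 0.5       # two squares with the SAME raw value

print(perceived(img, 2, 2))       # negative: reads as dark against a bright surround
print(perceived(img, 2, 7))       # positive: reads as light against a dark surround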
 
 
[ BALLS ]
 
The same applies for color.
 
All of the balls in this picture are the same color: brown. 
 
Your vision predicts the general color of the balls from fragmentary information. The bars are more consistent than the light and dark of the balls, which, being ‘brown’, are a muddy combination of many colors.
 
 
 
[ BALLS WITH MASKS ]
 
Personally, I didn’t believe it until I saw THIS IMAGE with masks that limited us to the background color. Even now that I understand it I cannot visualize it without the masks.
 
 
[ TIME VIDEO ]
 
Now we come to the problem of time. Time between exposures so to speak.
 
The next few images are separated by darkness. See if you can identify what is changing between the two images. 
 
(I couldn’t do it.)
 
 
So did you see it?  Ok, so look at the bottom of the wing of the plane just to the right of the stairs.  
 
Now do you see it?  
 
The engine was not there, then there, then not. But you can’t notice the change because of the time delay.  
 
Why? Because it is very hard to identify a change in a background feature amidst the other noisy objects and entities.
 
So our visual system fails to disambiguate some contents from the scene, and leaves them as noise. 
 
This is important for us. We see what we disambiguate; what we fail to disambiguate might just be noise.
 
This sounds like a bad thing, but it’s just efficiency: if we didn’t minimize the stuff that competes for our attention, we would have to further disambiguate the scene into meaningful objects and spaces.
 
In other words, the engine wasn’t sufficiently disambiguated to compete for our attention, given all the stuff in the scene (particularly people) that must be disambiguated.  
 
This is why Where’s Waldo pictures are so successful as puzzles: most of what we think we see is prediction.
 
[ BACKGROUND ]
 
This is a fairly well known case, with examples dating back to the nineties, seventies, and sixties. 
 
The AI software in this recent photo-classification exercise was incorrectly identifying this husky as a wolf.  
 
When they looked into what parts of the image the software was using to make the decision, it turned out to be the snow.
 
The sample data had included wolves in snow. So it sorted the picture by the background rather than the foreground.
 
I included this because it’s an example of how people, as well as AIs, can be confused by context and background.
 
[ SALIENCE ]
 
This image shows eye tracking as the viewer sequentially focuses his five degrees of detailed vision on features of the image.
 
This image shows how men and women valuate the female body differently – the shoes always make me laugh. As does the reflection in the mirror.
 
Here is another one.
 
This image shows how men and women valuate each other’s bodies differently – which is about what you’d expect: the triangular build of men, and the breast, hip, and waist ratio of women, with both sexes using our most accurate measure of genetic health and condition in life – our facial construction and expression.
 
So salience varies. For example, on average, little girls are a lot more afraid of spiders and snakes than little boys are. Movement is more important than stasis. Predators and prey are more important than background. Humans are the most important of all.
 
[ TEXT REVIEW ]
 
So far we have covered these dimensions of prediction: 
1. Pixels, points, cells, or cones
2. Edges
3. Textures
4. Motion
5. Features, and the coherence of incommensurable features
6. Scrambled fragments in text
7. Gray-scale and luminosity
8. Color
9. Time
 
 
Now let’s look at the underlying biology, and tie it back to our model using geometry.
 
[ PRIMARY VISUAL PATHWAYS ]
 
Vision alone uses half of our nerves and almost as much of our processing power. Since ‘everything rolls forward’ in the cortex, vision is eventually associated with everything.
 
There are about thirty regions in the brain that participate somehow in vision. We are not going to cover all of them (I don’t know how), and so we’ll limit ourselves to understanding how vision enters the common cortical algorithm.
 
So, in this image of the primary visual pathways: light enters the eye and strikes the photoreceptors on the retina – meaning pixels or points. The retina does some processing and transmits the information to the thalamus, which regulates what passes through to the visual cortex at the back of the brain, and the visual cortex divides the information into vast parallel processes and quickly assembles entire scenes from it.
 
It appears that when the thalamus is regulating information it is controlling the ratio between observation and visualization – or external and internal vision. This allows us to pay greater attention either to what we observe or to what we visualize, imagine, or ‘think’.
 
Again, I’m going to reiterate that some people have no inner visualization ability while others have near-perfect visualization, and some have no inner voice (they don’t talk to themselves) while others have a near-perfect one.
 
Rough numbers – that I put little trust in – are ten percent with no inner vision, and twenty percent with no inner voice. (Something those of us with both find terrifying.)
 
As someone who spent a lot of time drawing everything possible, who clearly understands that you must visualize on the page what you wish to render there, but who has some difficulty visualizing decor (colors and patterns), I can sort of appreciate this while being horrified by it.
 
[ EYE ]
 
The eye is a complex organ. The muscles point the eye. The lens controls depth of field. The retina takes in light and returns information.
 
So our starting dimensions are 
1. Position of the eye (or direction)
2. Depth of field
3. Sensitivity to light.
 
[ RETINA ]
 
The eye contains layers of cells, and strangely, they’re layered backwards from what we’d assume: the transport nerves are on top, then the nerve cells – which are transparent – and the photoreceptor cells are at the back of the eye, against the pigmentation layer.
 
There are two categories of photo-receptors in the retina: cones and rods. 
 
There are 100 million rods and cones on the retina of each eye. Rods are sensitive to dim light but insensitive to color. (On a moonlit night the light is too dim to see color.) 
 
Cones on the other hand operate in bright light and discriminate between colors. 
 
Humans have three different cone cell types: one most sensitive to red light, one to green, and one to blue.
 
The rods outnumber the cones 10 to 1 except in the center of our view, near the macula where the cones predominate. (This is why you have to look to the side of a dim star to see it.)
 
After light is detected by the cones, neurotransmitters carry the signal to the next layer in the retina. Here nerve cells called “opponent” cells compare the activity of the red versus the green cones. Then this combined “yellow” signal is compared with the blue cone by a second set of “opponent” cells. The result of these color differences is then sent to the brain. At the present time there is a lot of controversy over color processing after the retina. I can’t opine on it – I don’t know enough to do so.
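To make the structure of that comparison concrete, here is a minimal sketch – my own simplification with made-up cone values, not a physiological model – of the red-versus-green and yellow-versus-blue opponent channels described above:

# Illustration only: the opponent-channel comparisons, without the weighting
# and nonlinearities a real model would need.
def opponent_channels(red, green, blue):
    red_green = red - green                  # +: reddish, -: greenish
    yellow = (red + green) / 2               # combined "yellow" signal
    yellow_blue = yellow - blue              # +: yellowish, -: bluish
    luminance = (red + green + blue) / 3     # light/dark, carried separately
    return red_green, yellow_blue, luminance

# A patch that excites the red and green cones strongly but the blue cone weakly:
print(opponent_channels(0.9, 0.8, 0.1))      # near-neutral red-green, strongly "yellow"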
 
7% of men and 0.4% of women see red and green differently from the rest of the population: they are red-green color blind. The recipe for the red and green cones resides in genes on the X chromosome; women have two of these while men have only one. The DNA sequences for the red and green receptors differ by only 2% – evidence that they diverged recently.
 
Peripheral vision is sensitive to scenes, movement, and low light, while central vision is concerned with spaces, objects, textures, and colors.
 
My understanding is that, in addition to transporting information, the retinal ganglion cells detect micro-motion, just as the prefrontal cortex’s eye fields function to track object motion.
 
Retinal Bipolar cells are the only neurons that connect the outer retina to the inner retina. They implement an ‘extra’ layer of processing that is not typically found in other sensory organs.
 
There are at least 13 distinct types of bipolar cells that systematically transform the photoreceptor input and generate specific channels encoding properties such as:
– on or off (polarity)
– contrast
– continuous or transient response
– light, dark, and color composition
As a result, bipolar cell output signals represent elementary ‘building blocks’ or ‘fragments’ that the inner retina disambiguates into features.
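As a rough sketch of what those parallel channels mean – again my own toy illustration, not retinal physiology – imagine one photoreceptor value being split into ON, OFF, contrast, and transient signals:

# Illustration only: splitting a single photoreceptor reading into the kinds
# of channels described above (polarity, contrast, change over time).
def bipolar_channels(current, previous, surround_mean):
    on        = max(current - surround_mean, 0.0)   # brighter than the surround
    off       = max(surround_mean - current, 0.0)   # darker than the surround
    contrast  = abs(current - surround_mean)        # how different from the surround
    transient = current - previous                  # change since the last sample
    return {"on": on, "off": off, "contrast": contrast, "transient": transient}

# A point that just became brighter than its surround:
print(bipolar_channels(current=0.8, previous=0.4, surround_mean=0.5))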
 
So, in our geometric language, we see the eye working along at least the following starting dimensions:
1. Position of the eye (or direction)
2. Depth of field
3. Sensitivity to light.
Where sensitivity to light dimensions are:
1. Change (on, off, sufficient, insufficient)
2. Proximity between cells (constant relations)
3. Light and Dark 
4. Color 
5. Motion
6. Distance (focal length)
 
The first order neurons terminate in the second order neurons, which reach the optic nerve, and the third order neurons terminate in the primary visual cortex. So information is organized, consolidated, and relayed to the visual cortex.
 
So, much like the cortical columns process information between, into, and out of layers, so does the retina.
 
Our primary concern here is that even in the eye there are cells performing calculations and preprocessing information in a division of labor; that this division of labor consists of taking fragmentary information and disambiguating it – again, by prediction; and that there are both goods and bads that result from prediction using fragmentary information – goods and bads that can only be resolved by using our attention to produce direct observation of a limited field of vision, with denser and more color-sensitive neurons, using continuous recursive sampling over time.
 
(This is about the depth of my knowledge of the eye and optic nerve. I am not skilled enough to cover the internal operations of the eye or the optic nerve in any greater detail, and it is not necessary for the work we are doing. That said, of the senses – and of the brain in general – this is one of the best understood areas of the nervous system, and, like speech recognition, one we have been most successful in implementing using artificial intelligence. You will easily find videos and papers by the top people in the anatomical, medical, cognitive, and computer science fields.)
 
[ VISUAL PATHWAY ]
 
Continuing with the visual pathway, we move through the thalamus (the section of the thalamus called the lateral geniculate nuclei – ‘Gen-IH-cue-late’), through the optic radiations (meaning nerves), and into the primary visual cortex in the occipital lobes.
 
[ VISUAL COLUMNS ]
 
Unlike other cortical columns, columns in the visual cortex are specialized to identify angular relations between pixels (cells, fragments) from the fields of both eyes, and are organized into dense cortical modules.
 
The work accumulates into ever-larger receptive fields, meaning that features are combined into objects, objects into the spaces across our field of vision, spaces into scenes, and finally into simulations (whether in our field of vision or not).
 
Note that color is separately processed, so like our 3d modeling analogy, faces, objects, and surfaces are ‘painted’ separately. 
 
And as we construct objects we seek to unify them and make them commensurable by detecting motion (as we saw with our cat and stripes). And we attribute greater salience to whatever is in motion.
 
In this sense you can think of the visual cortex as reflecting the organization of the eye, with ever-greater fields of vision radiating outward in a sort of target shape, trying to construct something coherent out of pixels, fragments, features, objects, models, entities, and scenes.
 
Each iteration passes its puzzle pieces forward to the next, competing for attention by successful prediction over time.
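A schematic sketch of those ever-larger receptive fields – my own toy Python, not anatomy – in which each stage pools the previous stage’s outputs over a wider window and passes the result forward:

# Illustration only: each stage combines the previous stage's units over a
# larger window, so its "receptive field" covers more of the original input.
import numpy as np

def pool(signal, window):
    """Merge neighboring units into one unit with a larger receptive field."""
    n = len(signal) // window
    return np.array([signal[i * window:(i + 1) * window].max() for i in range(n)])

pixels   = np.random.rand(64)     # stand-in for retinal activity
edges    = pool(pixels, 2)        # small receptive fields
features = pool(edges, 4)         # larger fields built from edges
objects  = pool(features, 4)      # still larger fields built from features
print(len(pixels), len(edges), len(features), len(objects))   # 64 32 8 2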
 
 
[ DORSAL PATHWAY VS VENTRAL 1 ]
 
Our brains divide the work along the dorsal and ventral pathways.
 
……..SALIENCE GOES HERE…….
 
 
[ DORSAL PATHWAY VS VENTRAL  2 ]
 
The objects that we successfully disambiguate as candidates for action are processed laterally on the ventral (WHAT) pathway.
 
The spatial locations we cannot disambiguate into objects remain in the scene and are processed vertically on the dorsal (WHERE) pathway, so that we can navigate through space.
 
So, again, we see a division of labor between environment, spaces, and objects of interest.
 
(I think in terms of routes(WHERE) vs prey(WHAT)).
 
What I’d like to get across to you is that spatial information is something, like peripheral vision, that we more ‘feel’ than ‘see’. In other words, peripheral, boundary, background, scene, and spatial information is always at lower cognitive resolution than objects, and objects in the periphery of our attention are at lower cognitive resolution than whatever we focus our attention upon. So we rely more on prediction the farther we get from the narrow field of our attention – even more so when we are splitting our attention between what we are doing and what we are imagining, and more still when we are doing nothing and almost entirely imagining.
 
So it’s these three axes I’d like you to internalize:
1. Paths, spaces, periphery, and boundary.
2. The object we’re manipulating, the object of our attention, and the objects competing for our attention.
3. The distribution of our attention from external observation to internal imagination.
 
[ FEED FORWARD BRAIN CIRCLES ]
 
And we feed this information forward from neurons, to columns, to collections of columns, to regions, to networks, and then progressively onward into greater and greater associations until we have constructed a model of possible actions.
 
[ FRAGMENTARY MAN ]
 
And so let’s return to our artwork example, which is pretty much what our brains do to compose an entity that we act in response to.
 
 
[ COHERENT ]
 
So coherent patterns – whether in focus, at the border of our attention, in the periphery, or out of range of our attention, like the legs of the elephant or the engine on the plane – either resolve, or they fail to resolve and are ignored as error.
 
[ SPHERE OF COHERENCE ]
 
In the engine-on-the-plane example, the difference between states is interpreted by your brain as irrelevant error, since it is only background and did not survive in short-term memory.
 
[ WORLD VIEW ]
So our world is constructed from predictions that are determined by their actionability, our attention, and the distribution of our attention between observation and imagination.
 
[ REVIEW ]
 
Our goal has been to provide you with a basic understanding of how information moves through the visual system of the brain, the dimensions of causality at the eye and at the visual cortex, how our attention, via the thalamus, affects resolution, and lastly how, whether in the eye or in the visual cortex, the common cortical algorithm is the same: disambiguation of stimuli, using recursion, into relations between our senses, body, objects, and environment.
 
[ ARTICULATE IDEAS ] 
 
If we’ve been successful you will be able to articulate experiences people describe in operational terms using those dimensions of causality we’ve covered.
 
GEOMETRY
1. Point, or pixel, or cone in the eye
2. Edge or vertex
3. Fragment – (part of the “4”, a spot, “more than an edge”)
4. Texture or surface or face – (stripes on a zebra, splotches on a dog)
5. Feature – (tail of a cat, edge of the bars, trunk of elephant)
6. Object – (a volume that occupies space. part of something, a rock, a bush)
7. Model – (category say – cats)
8. Entity – (existential reference – cat we know)
9. Path – Navigable space between occupied space.
10. Space – (a space without need for a path)
11. Background – (wall of trees, buildings, limits to movement)
12. Scene – Everything in a field of view.
 
EYE
1. Change (on, off, sufficient, insufficient)
2. Proximity between cells (constant relations)
3. Light and Dark 
4. Color 
5. Motion
6. Distance (focal length)
7. Position or direction
 
THALAMUS
1. Attention (importance, salience)
2. Distribution of Attention to observation or imagination.
 
OCCIPITAL LOBE (VISUAL CORTEX)
1. Edges > Fragments > Features > THEN
…… WHERE: Backgrounds > Spaces > Paths > Location
…… WHAT: Objects > Models > Entities > Position
…… THEN 
…… …… Manipulatability?
2. Textures 
3. Colors 
 
FIELDS OF VISION, PREDICTION, AND SIMULATION
1. Our narrow field of vision
2. Our eye’s peripheral field of vision
3. Our predicted scene assembled from that vision
4. Our predicted simulation assembled from those scenes
 
TIME
1. By ever increasing field of vision
2. Over continuous recursive resupply of information
3. By movements of the body, head, eye, and subject matter.
4. Creating a continuously modified simulation of the world.
5. Where we discover continuously changing opportunities to act, to look, manipulate, or move.
 
(information density vs prediction =…)
 
PREDICTION ERRORS 
But we are victim to errors in prediction by:
1. Depth of field
2. Inability to disambiguate edges (zebras)
3. Inability to disambiguate textures (dogs)
4. Inability to disambiguate features of objects (bars and elephants)
5. Inability to disambiguate features, thereby overlooking details of them (scrambled words)
6. Inability to disambiguate shades because of compensation for light and dark (chessboard)
7. Inability to disambiguate colors (balls)
8. Inability to disambiguate features over time (engine)
9. Inability to disambiguate context because of background (wolf/husky)
 
[ UP NEXT ]
 
We probably have a good handle by now on how we disambiguate objects (the WHAT). Next we’ll take a look at the dorsal pathway (the WHERE) and the other major discovery of the past few years – position, path, and location.
 
[ CLOSING ]
 
This is Curt Doolittle for the Propertarian Institute.