The computational magic of the ventral stream

Abstract

I argue that the sample complexity of (biological, feedforward) object recognition is mostly due to geometric image transformations, and I conjecture that a main goal of the ventral stream – V1, V2, V4 and IT – is to learn and discount image transformations.

In the first part of the paper I describe a class of simple and biologically plausible memory-based modules that learn transformations from unsupervised visual experience. The main theorems show that these modules provide, for every object, a signature that is invariant to local affine transformations and approximately invariant to other transformations. I also prove that, in a broad class of hierarchical architectures, signatures remain invariant from layer to layer. Identifying these memory-based modules with complex (and simple) cells in visual areas leads to a theory of invariant recognition for the ventral stream.

In the second part, I outline a theory of hierarchical architectures that can learn invariance to transformations. I show that the memory complexity of learning affine transformations is drastically reduced in a hierarchical architecture that factorizes transformations in terms of the subgroup of translations and the subgroups of rotations and scalings. I then show how translations are automatically selected as the only learnable transformations during development, by enforcing small apertures – e.g. small receptive fields – in the first layer.

In the third part I show that the transformations represented in each area can be optimized for storage and robustness, and that this optimization determines the tuning of the neurons in that area, largely independently (under normal conditions) of the statistics of natural images. I describe a learning model that can be proved to have this property, linking in an elegant way the spectral properties of the signatures with the tuning of receptive fields in different areas.
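The memory-based modules of the first part can be illustrated with a minimal sketch: store the orbit of a template under a transformation group, take dot products of the input with every stored version, and pool the results into a few statistics. For cyclic translations the pooled statistics are exactly invariant, since translating the input only permutes the set of dot products. The function names and the choice of moments below are illustrative, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def signature(x, templates, shifts):
    # Memory-based module: for each stored template, compute dot products
    # of the input with all stored (cyclically translated) versions of it,
    # then pool over the orbit. Pooling (mean, second moment, max) is
    # insensitive to permutations of the dot products, hence invariant.
    sig = []
    for t in templates:
        dots = np.array([x @ np.roll(t, s) for s in shifts])
        sig.extend([dots.mean(), (dots ** 2).mean(), dots.max()])
    return np.array(sig)

n = 16
shifts = range(n)                                   # the full translation orbit
templates = [rng.standard_normal(n) for _ in range(3)]
x = rng.standard_normal(n)

s1 = signature(x, templates, shifts)
s2 = signature(np.roll(x, 5), templates, shifts)    # translated copy of x
print(np.allclose(s1, s2))  # → True: identical signatures
```

The same pooled signature generally differs across distinct objects, so it is discriminative as well as invariant; for transformations that only form a group locally, the invariance becomes approximate, as in the theorems the abstract refers to.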
A surprising implication of these theoretical results is that the computational goals and some of the tuning properties of cells in the ventral stream may follow from symmetry properties (in the sense of physics) of the visual world, through a process of unsupervised correlational learning based on Hebbian synapses. In particular, simple and complex cells do not directly care about oriented bars: their tuning is a side effect of their role in translation invariance. Across the whole ventral stream, the preferred features reported for neurons in different areas are only a symptom of the invariances computed and represented.

The results of each of the three parts stand on their own, independently of each other. Together, this theory-in-fieri makes several broad predictions, some of which are:

- invariance to small transformations in early areas (e.g. translations in V1) may underlie the stability of visual perception (suggested by Stu Geman);
- each cell's tuning properties are shaped by visual experience of image transformations during developmental and adult plasticity;
- simple cells are likely to be the same population as complex cells, arising from different convergence of the Hebbian learning rule.
The inputs to "complex" complex cells are dendritic branches with simple-cell properties;
- class-specific transformations are learned and represented at the top of the ventral stream hierarchy; thus class-specific modules – such as face, place and possibly body areas – should exist in IT;
- the type of transformations learned from visual experience depends on the size of the receptive fields, and thus on the area (layer in the models) – assuming that size increases with layers;
- the mix of transformations learned in each area shapes the tuning properties of its cells: oriented bars in V1 and V2, radial and spiral patterns in V4, up to class-specific tuning in AIT (e.g. face-tuned cells);
- features must be discriminative and invariant: invariance to transformations, rather than the statistics of natural images, is the primary determinant of the tuning of cortical neurons.

The theory is broadly consistent with the current version of HMAX. It explains HMAX and extends it in terms of unsupervised learning, a broader class of transformation invariances and higher-level modules. The goal of this paper is to sketch a comprehensive theory with little regard for mathematical niceties. If the theory turns out to be useful, there will be scope for deep mathematics, ranging from group representation tools to wavelet theory to the dynamics of learning.
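The claim that tuning follows from transformation experience via Hebbian learning can be sketched in miniature. Below, the "visual experience" is the set of all cyclic translations of a single random pattern, whose covariance is circulant and therefore has oscillatory (Fourier-like) principal components; an online Hebbian rule (Oja's rule, used here as a stand-in for the paper's learning model) then drives a single unit's weights into the top principal subspace of that experience. The setup and constants are illustrative assumptions, not the paper's model:

```python
import numpy as np

rng = np.random.default_rng(1)

# "Visual experience": every cyclic translation of one random template.
# The resulting covariance matrix is circulant, so its eigenvectors are
# Fourier-like: translation experience alone yields oscillatory tuning.
n = 32
t = rng.standard_normal(n)
X = np.stack([np.roll(t, s) for s in range(n)])
X -= X.mean(axis=0)
C = X.T @ X / n

# Oja's Hebbian rule: dw = eta * y * (x - y * w). It keeps ||w|| near 1
# and converges (up to sign / rotation within degenerate eigenspaces)
# to the principal subspace of the input covariance.
w = rng.standard_normal(n)
w /= np.linalg.norm(w)
eta = 0.005
for _ in range(10_000):
    x = X[rng.integers(n)]
    y = x @ w
    w += eta * y * (x - y * w)

# Rayleigh quotient of the learned weights, relative to the largest
# eigenvalue of C: a ratio near 1 means the Hebbian unit's tuning is
# set by the spectrum of the transformation orbit.
print(w @ C @ w / np.linalg.eigvalsh(C)[-1])
```

Because the top eigenvalue of a circulant covariance is typically twofold degenerate (a sine/cosine pair at the dominant frequency), the learned weight vector settles somewhere in that two-dimensional subspace rather than on one fixed eigenvector, which is why the check uses the Rayleigh quotient rather than alignment with a single eigenvector.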
Springer Science and Business Media LLC