A Deep ConvNet Trained for Object Recognition Recapitulates the Hierarchy of Visual Representations in the Human Brain

Presenter Pulkit Agrawal
Context NIPS 2015 Workshop on Statistical Methods for Understanding Neural Systems
Date 12/11/15

In an image of street musicians wearing animal masks, it is easy for a human to infer that the performers are not animals. Recently, given enough data and computational power, we have been able to train models that replicate this kind of behavior. One such model is AlexNet, which won the ImageNet challenge in 2012. Computer performance on ImageNet is now close to human performance. A natural question is whether the behavior of these models is similar to that of the human brain.

In the brain, light enters through the retina and then passes through "visual areas" that range from low- to mid- to high-level. We do not fully understand what computations the mid-level areas perform. At a high level, both the human brain and convolutional networks are good at visual recognition. The first layer of a convolutional network performs edge-detection-like computations, much like the lowest-level visual areas in the brain. Similarly, the higher levels in both systems support the final inference of which objects are in the image. If the mid-level representations in the two systems turn out to be similar, that would make it easier to study the mid-level areas of the brain and to address questions like "what makes a cat look like a cat to the human brain?"
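The edge-detection-like computation of a first convolutional layer can be illustrated with a minimal sketch: correlating a tiny image with a hand-picked Sobel-style vertical-edge filter (the filter and image here are illustrative; learned conv1 filters in AlexNet are only qualitatively similar).

```python
import numpy as np

# A tiny image with a dark-to-light vertical boundary down the middle.
image = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

# Sobel-style vertical-edge filter (hand-picked for illustration).
kernel = np.array([
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
], dtype=float)

def conv2d_valid(img, k):
    """'Valid' 2D cross-correlation, the basic operation of a conv layer."""
    kh, kw = k.shape
    out = np.empty((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (img[i:i + kh, j:j + kw] * k).sum()
    return out

response = conv2d_valid(image, kernel)
print(response)  # strong responses wherever the dark-to-light boundary sits
```

Every window in this toy image straddles the boundary, so every output entry responds strongly; on a natural image the response map lights up only along edges.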

Human subjects had their brain activity measured using fMRI while looking at different images. In a typical fMRI scan, the brain is segmented into small "voxels", so that activity in different portions of the brain can be measured separately. An AlexNet was then trained, and linear transformations were fit to map its intermediate representations to voxel activities. For each voxel, the correlation between the actual activity and the activity linearly predicted from each network layer was computed, and the best-predicting layer was identified. The brain can be artificially flattened, and the resulting map can be segmented into regions whose location roughly encodes their function and how high- or low-level they are. The AlexNet layers were able to predict activity in each of these areas, and there was a strong correspondence between the depth of the AlexNet layer that best predicted a voxel and that voxel's location in the flat map. This result was consistent across subjects, held whether or not movies were shown to subjects, and held up under quantitative analysis as well.
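The layer-to-voxel mapping described above can be sketched as follows. All names, shapes, and the random data are illustrative assumptions, not the talk's actual pipeline: for each layer, a least-squares linear map from activations to voxel responses is fit on some stimuli, and held-out correlation then picks the best-predicting layer per voxel.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: per-layer CNN activations and fMRI voxel responses
# to the same stimuli (layer names and sizes are made up for illustration).
n_stimuli, n_voxels = 200, 50
layer_features = {
    "conv1": rng.standard_normal((n_stimuli, 64)),
    "conv3": rng.standard_normal((n_stimuli, 96)),
    "fc7":   rng.standard_normal((n_stimuli, 128)),
}
voxel_activity = rng.standard_normal((n_stimuli, n_voxels))

train = slice(0, 150)    # fit the linear map on these stimuli
test = slice(150, 200)   # score predictions on held-out stimuli

def layer_voxel_correlation(X, Y):
    """Fit a least-squares linear map X -> Y on the training stimuli and
    return, per voxel, the correlation between predicted and actual
    activity on the held-out stimuli."""
    W, *_ = np.linalg.lstsq(X[train], Y[train], rcond=None)
    pred_c = X[test] @ W - (X[test] @ W).mean(axis=0)
    act_c = Y[test] - Y[test].mean(axis=0)
    denom = np.linalg.norm(pred_c, axis=0) * np.linalg.norm(act_c, axis=0)
    return (pred_c * act_c).sum(axis=0) / denom

# For each voxel, keep the layer whose linear prediction correlates best.
corrs = np.stack([layer_voxel_correlation(X, voxel_activity)
                  for X in layer_features.values()])
layer_names = list(layer_features)
best_layer = [layer_names[i] for i in corrs.argmax(axis=0)]
print(best_layer[:5])
```

With real data, plotting `best_layer` on the flattened cortical map is what reveals the low-to-high-level gradient the talk describes; with this random data the assignment is of course arbitrary.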

The resulting AlexNet-plus-linear-transformation model predicts mid-level activity more accurately than existing models. This suggests that a given voxel in a human is sensitive to particular mid-level characteristics, and allows the conclusion that certain voxels respond selectively to faces, textures, places, large or small circular shapes, and so on. Deep neural networks can thus serve as a high-throughput model for studying visual representations in the human brain.

There are many differences between the way AlexNet was trained and the way the human brain learns. It is natural, then, to ask how important the AlexNet architecture and its training task are. Different convolutional network architectures yielded similar agreement between their activations and brain activations, suggesting that the similarity of representations is not an artifact of the architecture. Interestingly, training an autoencoder instead of an image classifier resulted in a model that was much worse at predicting brain activity. This is natural, since the representations a model learns depend mainly on the task it was trained for. The broader conclusion is that biological neural systems can be understood by constructing artificial neural systems that perform the same tasks.
