Convolutional Neural Network – Inspired by the Brain
Convolutional Neural Network – The Start
In the late 1980s, Yann LeCun drew inspiration from the workings of the visual cortex of the brain to create the first versions of convolutional neural networks. Years later, Alex Krizhevsky and Ilya Sutskever, PhD students in Geoffrey Hinton’s lab, improved the convolutional neural network algorithm to make it more efficient. In 2012 they used convolutional neural nets to win the ImageNet competition, dropping the classification error from about 26% to about 16%, and this kickstarted the popularity of neural nets. Since 2012, companies have increasingly been using neural nets at the core of their services: for example, Google for photo search, Amazon for product recommendations, and Facebook for its automatic tagging algorithms.
Influence of the Brain
One of the first things humans do after birth is start recognizing and identifying things. The visual cortex of the brain plays an important role in this identification, and this mechanism helped inspire the workings of convolutional neural networks. The visual cortex contains a complex arrangement of cells. Each cell is sensitive to a small region of the visual field, called its receptive field. These cells act as local filters over the input space and exploit the strong spatially local correlation present in natural images. Essentially, they trigger some neurons as a response to visual stimuli.
Hubel and Wiesel found that neurons in the visual cortex are organized in a columnar architecture and together produce visual perception. The idea is that each area performs a specific task. For example, when a cricket player watches an incoming ball, many things happen in the visual cortex. The brain has areas known as V1, V2, V3, V4 and V5: one area detects the speed, another the color, another the shape, and so on. The information from these different areas is then combined in the visual cortex, after which a human is able to detect the object. This organization of the visual cortex is an inspiration for convolutional neural networks.
The Architecture of Convolutional Neural Network
The first stages of the network are convolutional layers and pooling layers. Units in a convolutional layer are organized in feature maps, within which each unit is connected to local patches of the previous layer through a set of weights called a filter bank. The result of this locally weighted sum is then passed through a non-linear function such as a ReLU, sigmoid, or hyperbolic tangent. All units in a feature map share the same filter bank, but different feature maps in a layer use different filter banks.
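The locally weighted sum followed by a non-linearity can be sketched in plain Python. This is a minimal illustration, not a production implementation: the 5×5 "image" and the 3×3 vertical-edge filter values below are made up for demonstration, and every unit in the resulting feature map shares the same filter, as described above.

```python
def relu(x):
    # ReLU non-linearity: pass positive sums through, clamp negatives to 0
    return max(0.0, x)

def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding) and apply ReLU to
    each locally weighted sum, producing one feature map."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(ih - kh + 1):
        row = []
        for j in range(iw - kw + 1):
            s = sum(image[i + di][j + dj] * kernel[di][dj]
                    for di in range(kh) for dj in range(kw))
            row.append(relu(s))
        out.append(row)
    return out

# Illustrative 5x5 binary "image"
image = [[0, 0, 1, 1, 0],
         [0, 1, 1, 1, 0],
         [0, 0, 1, 0, 0],
         [0, 1, 1, 1, 0],
         [0, 0, 1, 1, 0]]

# One shared 3x3 filter bank (a simple vertical-edge detector)
kernel = [[1, 0, -1],
          [1, 0, -1],
          [1, 0, -1]]

feature_map = conv2d_valid(image, kernel)
print(feature_map)  # a 3x3 feature map of non-negative responses
```

Note that a real convolutional layer would compute many such feature maps in parallel, one per filter bank.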
The architecture of a Convolutional Neural Network (CNN) is designed to take advantage of the 2D structure of an input image. This is achieved with local connections and tied weights, followed by some form of pooling, which results in translation-invariant features. Another benefit of CNNs is that they are easier to train and have many fewer parameters than fully connected networks with the same number of hidden units. CNNs are basically just several layers of convolutions with nonlinear activation functions applied to the results.
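The "many fewer parameters" claim is easy to check with back-of-the-envelope arithmetic. The sizes below (a 32×32 grayscale input, 100 hidden units or filters, 5×5 filters) are hypothetical and chosen only for illustration:

```python
input_units = 32 * 32            # 1024 input pixels

# Fully connected: every hidden unit gets its own weight per input pixel.
hidden_units = 100
fc_params = input_units * hidden_units + hidden_units   # weights + biases

# Convolutional: each of the 100 feature maps is defined entirely by
# one shared 5x5 filter, regardless of the input size.
num_filters, k = 100, 5
conv_params = num_filters * (k * k) + num_filters        # weights + biases

print(fc_params)    # 102500
print(conv_params)  # 2600
```

Because the filter weights are tied across all spatial positions, the convolutional layer here uses roughly 40 times fewer parameters than the fully connected one, and the gap grows with input size.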
In a traditional feedforward neural network, we connect each input neuron to each output neuron in the next layer. In CNNs we don’t do that. Instead, we use convolutions over the input layer to compute the output. This results in local connections, where each region of the input is connected to a neuron in the output. Each layer applies different filters, typically hundreds or thousands, and combines their results.
The architecture of a typical convolutional neural network is a series of well-defined stages, as shown in the figure below.
The Layers in Convolutional Neural Network
A convolutional neural network consists of several layers. These layers can be of three types:
1. Convolutional: Convolutional layers consist of a rectangular grid of neurons, and they require that the previous layer also be a rectangular grid of neurons. Each neuron takes inputs from a rectangular section of the previous layer. The weights for this rectangular section are the same for every neuron in the convolutional layer. Thus, the convolutional layer is just an image convolution of the previous layer, where the weights specify the convolution filter.
2. Max-Pooling: After each convolutional layer, there may be a pooling layer. The pooling layer takes small rectangular blocks from the convolutional layer and subsamples each block to produce a single output. There are several ways to do this pooling, such as taking the average or the maximum, or a learned linear combination of the neurons in the block. Our pooling layers will always be max-pooling layers; that is, they take the maximum of the block they are pooling.
3. Fully-Connected: Finally, after several convolutional and max-pooling layers, the high-level reasoning in the neural network is done via fully connected layers. A fully connected layer takes all neurons in the previous layer and connects them to every single neuron it has. Fully connected layers are not spatially located anymore (you can visualize them as one-dimensional), so there can be no convolutional layers after a fully connected layer.
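The max-pooling step described above can be sketched in a few lines of plain Python. This assumes the common 2×2 blocks with stride 2 and uses a made-up 4×4 feature map as input:

```python
def max_pool_2x2(fmap):
    """Subsample a feature map by taking the maximum of each
    non-overlapping 2x2 block (stride 2)."""
    out = []
    for i in range(0, len(fmap) - 1, 2):
        row = []
        for j in range(0, len(fmap[0]) - 1, 2):
            block = [fmap[i][j], fmap[i][j + 1],
                     fmap[i + 1][j], fmap[i + 1][j + 1]]
            row.append(max(block))
        out.append(row)
    return out

# Illustrative 4x4 feature map
fmap = [[1, 3, 2, 0],
        [4, 2, 1, 1],
        [0, 0, 5, 2],
        [1, 2, 3, 4]]

pooled = max_pool_2x2(fmap)
print(pooled)  # [[4, 2], [2, 5]]
```

Each 2×2 block collapses to its largest value, halving the width and height while keeping the strongest responses.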
Convolutional neural nets (CNNs) are trained to automatically learn the values of their filters based on the task assigned. For example, in image classification, a CNN may learn to detect edges from raw pixels in the first layer, then use the edges to detect simple shapes in the second layer, and then use these shapes to detect higher-level features, such as facial shapes, in higher layers. The last layer is then a classifier that uses these high-level features.
This was a brief introduction to CNNs. In my upcoming posts, I will explain their workings in more detail with specific examples.