ML Stories-1: Convolution in Layman’s Terms.


Convolutional Neural Networks (CNNs) are usually the first thing that comes to mind when dealing with image data. CNNs are used for image classification, object detection and segmentation tasks. The main idea they revolve around is convolution, hence the name. CNNs are well known for their ability to extract features, such as edges and textures, from an image.

At their roots, Machine Learning and Deep Learning are just math, and images are just matrices of pixel values. If you are familiar with terms like ANN (Artificial Neural Network) or MLP (Multilayer Perceptron), you will agree that they work well with low-dimensional data. In a fully connected network every input is connected to every neuron. As the dimensionality of the data grows, the network gets denser, and the number of weights and biases grows with it, and so does the computation.
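To see how quickly a fully connected network grows with image size, the short Python sketch below counts parameters for one layer. The numbers are illustrative assumptions and do not come from any particular model: a 224 x 224 RGB image, a hidden layer of 1000 neurons, and, for comparison, a convolutional layer with 64 filters of size 3 x 3.

```python
# Illustrative parameter counts; all sizes below are assumptions chosen for the example.

def dense_params(num_inputs: int, num_neurons: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return num_inputs * num_neurons + num_neurons


def conv_params(filter_size: int, in_channels: int, num_filters: int) -> int:
    """Weights plus biases of one convolutional layer (one bias per filter)."""
    return filter_size * filter_size * in_channels * num_filters + num_filters


height, width, channels = 224, 224, 3
flat_inputs = height * width * channels  # 150,528 pixel values

print(f"Dense layer: {dense_params(flat_inputs, 1000):,} parameters")  # 150,529,000
print(f"Conv layer:  {conv_params(3, channels, 64):,} parameters")     # 1,792
```

The convolutional layer stays small because each filter reuses the same few weights at every position in the image. The dense layer, by contrast, needs a separate weight for every pixel-neuron pair.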

Basically, CNNs are built on a few ideas:

1. Convolution -

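Convolution is where we extract features from an image. To do so, we use small matrices called filters (or kernels). We slide a filter over the image one section at a time. At each position we take the dot product between the filter and the section of the image it covers: we multiply the overlapping values element by element and add them up. The output is highest where the pixel values best match the pattern in the filter. Together, these outputs make up the feature map. Its size depends on both the image size and the filter size. Sliding an f x f filter one pixel at a time over an n x n image without padding gives an (n - f + 1) x (n - f + 1) feature map.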
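The sketch below shows the sliding dot product in plain Python. It assumes a stride of 1 and no padding. The 5 x 5 image and the 3 x 3 vertical-edge filter are made-up examples. Like most deep learning libraries, the code does not flip the kernel, so strictly speaking it computes a cross-correlation. A 5 x 5 image and a 3 x 3 filter give a 3 x 3 feature map, because 5 - 3 + 1 = 3.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map."""
    n_rows, n_cols = len(image), len(image[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows, out_cols = n_rows - k_rows + 1, n_cols - k_cols + 1

    feature_map = []
    for i in range(out_rows):
        row = []
        for j in range(out_cols):
            # Dot product between the kernel and the image patch it currently covers.
            total = 0
            for di in range(k_rows):
                for dj in range(k_cols):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Hypothetical 5 x 5 image: dark (0) on the left, bright (1) on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# Hypothetical 3 x 3 filter that responds to dark-to-bright vertical edges.
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = convolve2d(image, kernel)
print(feature_map)  # [[3, 3, 0], [3, 3, 0], [3, 3, 0]]
```

The feature map peaks (value 3) at the positions where the filter straddles the dark-to-bright edge. It drops to 0 where the patch is uniformly bright.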

2. Pooling -

The main intuition behind pooling is reducing dimensions. We slide a small window over the feature map and keep a single value per window, using a technique such as max pooling, min pooling or average pooling. Max pooling is used in most cases. With 2x2 pooling and a step of 2, the width and height are each halved, so the feature map holds four times fewer values. Max pooling keeps the highest value in each window, i.e., the strongest feature response, as we shrink the image.
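Here is a minimal Python sketch of 2x2 pooling over non-overlapping windows, meaning the window moves 2 pixels at a time. The 4 x 4 feature map is a hypothetical example. The `mode` parameter is an illustrative choice for switching between max, min and average pooling. Sixteen values shrink to four.

```python
def pool2d(feature_map, size=2, mode="max"):
    """Pool non-overlapping size x size windows (the window moves `size` pixels at a time).

    Rows or columns that do not fill a complete window are dropped.
    """
    combiners = {
        "max": max,
        "min": min,
        "average": lambda values: sum(values) / len(values),
    }
    combine = combiners[mode]
    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            window = [feature_map[i + di][j + dj] for di in range(size) for dj in range(size)]
            row.append(combine(window))
        pooled.append(row)
    return pooled


# Hypothetical 4 x 4 feature map.
feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 5, 7],
    [1, 1, 3, 2],
]

print(pool2d(feature_map, mode="max"))      # [[6, 2], [2, 7]]
print(pool2d(feature_map, mode="average"))  # [[3.5, 1.0], [1.0, 4.25]]
```

Max pooling keeps the strongest response in each window (6, 2, 2 and 7). Average pooling blends all four values in the window instead.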

3. Padding -

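Sometimes we add a few extra rows and columns of pixels, usually zeros, around the actual image to preserve the features present at its edges. Without padding, the kernel covers the pixels at the edges fewer times than the central ones, so edge information contributes less to the feature map. With enough padding (one pixel on each side for a 3x3 filter, for example), the feature map can also keep the same size as the input.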
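The sketch below tests the edge-pixel claim by counting how many kernel positions cover each pixel. It assumes a 5 x 5 image, a 3 x 3 kernel, a stride of 1 and zero padding of one pixel. The helper names `zero_pad` and `coverage` are hypothetical and exist only for this illustration.

```python
def zero_pad(image, pad=1):
    """Surround `image` with `pad` rows/columns of zeros on every side."""
    width = len(image[0]) + 2 * pad
    padded = [[0] * width for _ in range(pad)]
    for row in image:
        padded.append([0] * pad + list(row) + [0] * pad)
    padded.extend([0] * width for _ in range(pad))
    return padded


def coverage(n, f, pad):
    """Count how many f x f kernel positions (stride 1) touch each pixel of an n x n image."""
    counts = [[0] * n for _ in range(n)]
    size = n + 2 * pad  # side length of the padded image
    for i in range(size - f + 1):
        for j in range(size - f + 1):
            for di in range(f):
                for dj in range(f):
                    # Map the padded coordinate back to the original image.
                    r, c = i + di - pad, j + dj - pad
                    if 0 <= r < n and 0 <= c < n:
                        counts[r][c] += 1
    return counts


print(zero_pad([[1, 2], [3, 4]]))  # [[0, 0, 0, 0], [0, 1, 2, 0], [0, 3, 4, 0], [0, 0, 0, 0]]

no_pad, padded = coverage(5, 3, pad=0), coverage(5, 3, pad=1)
print("corner:", no_pad[0][0], "->", padded[0][0])  # corner: 1 -> 4
print("center:", no_pad[2][2], "->", padded[2][2])  # center: 9 -> 9
```

Without padding, a corner pixel is seen by the kernel only once, while the center pixel is seen nine times. One pixel of zero padding raises the corner to four and also keeps the output at 5 x 5.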

4. Fully connected neural network -

Once the convolution and pooling layers have reduced the dimensions, we flatten the feature maps into a 1-D array. We then pass it through a fully connected neural network to get the output, for example a score for each class.
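A minimal sketch of the flatten-then-dense step in plain Python follows. The two pooled feature maps and the weights for two output classes are made-up values. No activation or softmax is applied, so the scores are raw weighted sums.

```python
def flatten(feature_maps):
    """Turn a list of 2-D feature maps into one 1-D list."""
    return [value for fmap in feature_maps for row in fmap for value in row]


def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum plus bias per output neuron."""
    return [
        sum(w * x for w, x in zip(neuron_weights, inputs)) + b
        for neuron_weights, b in zip(weights, biases)
    ]


# Two hypothetical pooled 2 x 2 feature maps.
pooled = [
    [[6, 2], [2, 7]],
    [[1, 0], [0, 2]],
]
features = flatten(pooled)
print(features)  # [6, 2, 2, 7, 1, 0, 0, 2]

# Hypothetical weights: one row of 8 weights per output class.
weights = [
    [0.5, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.5],
]
biases = [0.0, 0.0]

scores = dense(features, weights, biases)
print(scores)                       # [6.5, 1.5]
print(scores.index(max(scores)))    # 0, the predicted class
```

In a real network these weights are learned during training. The layer itself, however, is just this weighted sum applied to every flattened value.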

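So to wrap it all up, CNNs save us a lot of effort and computation when dealing with image data. Convolution extracts features with small filters that are reused across the whole image, pooling shrinks the data, and padding keeps the information at the edges. Together, these ideas make this possible.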

Note: all the illustrations provided above are made using Excalidraw.