The term ‘deep learning’ has become headline news, dominating the technology press, venture capital meetings and software product launches across the United States. It is the technology behind self-driving cars on US roads, face-recognition unlock on smartphones, and programs that generate poetry or artwork. Despite the futuristic-sounding name, deep learning is neither magic nor a mystical, conscious digital brain. Strip away the corporate buzzwords and the computer-science jargon, and deep learning turns out to be a sophisticated and elegant method of statistical analysis. This article explains the core ideas of deep learning for a total beginner.
1. The Russian Nesting Dolls of Modern Computing
To understand deep learning, you first need to see where it fits within computer science as a whole. People often use Artificial Intelligence (AI), Machine Learning (ML) and Deep Learning (DL) interchangeably. In truth, these concepts fit inside one another like traditional Russian nesting dolls: DL is nested inside ML, which, in turn, is nested inside AI.
The biggest doll is Artificial Intelligence. AI is the broad effort to build systems with human-like capabilities: performing cognitive tasks such as logical reasoning, abstract problem solving or recognizing visual objects. Such systems can either run on rigid, manually coded logic rules or learn and adapt from large amounts of data.
The middle doll is Machine Learning. It is one particular way of pursuing the AI goal: instead of engineers hand-crafting thousands of logical rules, the system relies on large amounts of data. An ML system analyzes historical data and learns to make predictions or decisions from statistical patterns rather than from manually written if-else rules.
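The contrast between a hand-written rule and a learned one can be sketched in a few lines. This is only an illustration under stated assumptions: the spam scenario, the function names and the tiny dataset are all made up, and the "learning" here is the simplest possible kind (picking a threshold halfway between two cleanly separated groups).

```python
# Hand-written rule: an engineer decides the threshold.
def rule_based_is_spam(link_count):
    return link_count > 3  # the number 3 was chosen by a person


# Learned rule: the threshold is derived from labeled historical data.
def learn_threshold(examples):
    """examples: list of (link_count, label) pairs, label 1 = spam, 0 = not spam.

    Assumes the two groups do not overlap (every spam email has more
    links than every normal email), which real data rarely guarantees.
    """
    highest_normal = max(value for value, label in examples if label == 0)
    lowest_spam = min(value for value, label in examples if label == 1)
    return (highest_normal + lowest_spam) / 2


# Hypothetical history: (number of links in the email, was it spam?)
history = [(0, 0), (1, 0), (2, 0), (6, 1), (8, 1), (9, 1)]
threshold = learn_threshold(history)


def learned_is_spam(link_count):
    return link_count > threshold


print("learned threshold:", threshold)
print("email with 5 links flagged as spam:", learned_is_spam(5))
```

If the historical data changes, the learned threshold changes with it, while the hand-written rule stays the same until an engineer edits it.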
The smallest doll is Deep Learning. Deep learning is a specialized branch of machine learning that uses multi-layered artificial neural networks as its backbone.
2. What Makes It ‘Deep’? – The Architecture
The human brain consists of billions of neurons that pass electrical impulses to one another to solve tasks and perceive the environment. Deep learning is loosely inspired by this architecture: it builds a network out of software components that learn to solve tasks without hand-written rules. In software, a single neuron is a small mathematical unit that accepts input values, applies an operation to them (typically a weighted sum followed by a simple nonlinear function) and passes the result on to other neurons.
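A single software neuron can be sketched as follows. The weighted sum followed by a sigmoid function is one common choice, assumed here for illustration; the input values, weights and bias are arbitrary numbers, not taken from any real model.

```python
import math


def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of the inputs, then a sigmoid."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    # The sigmoid squashes any number into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-weighted_sum))


# Arbitrary example values, chosen only for illustration.
output = neuron(inputs=[0.5, 0.8], weights=[0.4, -0.2], bias=0.1)
print(output)
```

The weights decide how much each input matters, and they are exactly the numbers that training later adjusts.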
Stacking many of these mathematical neurons in layers gives you an artificial neural network. A deep learning architecture is composed of three distinct regions:
The Input Layer: The input layer is the entry point of the network and receives the raw data, such as photographs. For example, when you train a neural net to detect objects in images, the photograph is broken down into its pixels (often millions of them), and the color values of those pixels are fed to the input layer.
The Hidden Layers: This is the central core, where all the magic happens. A network with more than one hidden layer between the input and output layers is generally called deep, no matter how many more layers are stacked on top.
The Output Layer: The output layer is the final column of the net and delivers the result. In an object-recognition network, it produces a confidence measure, for example: “there is a 98% probability that this photograph shows a golden retriever”.
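The three regions described above can be sketched as a tiny network with an input layer, two hidden layers and an output layer. Everything here is an assumption made for illustration: the three "pixel" values, the hand-picked weights, the ReLU and softmax functions, and the two labels. An untrained network like this one produces arbitrary confidences; only the flow of data through the layers is the point.

```python
import math


def dense(inputs, weights, biases):
    """One fully connected layer: each row of weights feeds one neuron."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    """Common hidden-layer function: negative values become zero."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Turn raw output scores into probabilities that sum to 1."""
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Input layer: three made-up pixel brightness values between 0 and 1.
pixels = [0.9, 0.1, 0.4]

# Hand-picked, untrained weights (hypothetical values).
W1, b1 = [[0.5, -0.3, 0.8], [-0.6, 0.9, 0.1]], [0.0, 0.1]   # hidden layer 1
W2, b2 = [[0.7, -0.5], [0.2, 0.4]], [0.1, -0.1]             # hidden layer 2
W3, b3 = [[1.2, -0.7], [-0.9, 1.1]], [0.0, 0.0]             # output layer

LABELS = ["golden retriever", "something else"]

hidden1 = relu(dense(pixels, W1, b1))
hidden2 = relu(dense(hidden1, W2, b2))
probabilities = softmax(dense(hidden2, W3, b3))

for label, p in zip(LABELS, probabilities):
    print(f"{label}: {p:.1%}")
```

Because there are two hidden layers between input and output, this toy network already counts as "deep" in the sense used above.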
3. How Does It Work? – The Story of the Pizza Critic
To understand how the network uses its hidden layers to make decisions automatically, consider the story of the Pizza Critic. Imagine you need a system that tells whether a particular pizza from a restaurant will become a culinary sensation or a total flop. With classic machine learning, a human expert would be responsible for defining the features, such as crispness, size, crust thickness or even sauce acidity.
Deep learning skips this tedious stage. You just provide raw data, such as thousands of pizza photos or descriptions, each labeled with the outcome. The neural layers then figure out on their own what matters in each pizza.
4. The Training Loop: Learning from Mistakes
When first created, a deep learning net is a blank slate: its weights start out as random values. So, if you pass it a photo of a pizza, it may confidently declare that it is a picture of a sailboat. The magic of deep learning comes into play when the network starts correcting its mistakes through a constant feedback loop.
The training process consists of three main steps:
Forward Propagation: The input data (a pizza photo) is fed into the input layer and passes through the hidden layers, each of which applies a series of mathematical operations controlled by adjustable numbers called weights. The output layer then gives its prediction.
Loss Evaluation: A loss function measures how wrong that prediction was. It computes a single error value based on how far the predicted label is from the true label.
Backward Propagation: This is the reverse procedure, in which the error signal propagates back through the hidden layers, from the output side to the input side. Along the way, every layer's weights are nudged in the direction that reduces the error.
After many thousands or even millions of such cycles, the system learns to minimize its error, and its guesses can become accurate enough for commercial use.
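The training loop can be sketched end to end on a toy problem. This is a minimal illustration under stated assumptions, not how production systems are built: the dataset is four made-up points, the network has a single hidden layer of three sigmoid neurons, the loss is the squared error, and the names init_params, forward and train are hypothetical. Real projects usually rely on a framework that computes the backward step automatically.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Made-up toy data: two inputs, label 1.0 if either input is "on".
DATA = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
        ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]


def init_params(n_hidden=3, seed=0):
    """Blank slate: small random weights, zero biases."""
    rng = random.Random(seed)
    return {
        "w1": [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def forward(params, x):
    """Forward propagation: input -> hidden layer -> output."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(params["w1"], params["b1"])]
    output = sigmoid(sum(w * h for w, h in zip(params["w2"], hidden))
                     + params["b2"])
    return hidden, output


def mean_loss(params):
    """Average squared error over the whole toy dataset."""
    return sum((forward(params, x)[1] - y) ** 2 for x, y in DATA) / len(DATA)


def train(params, epochs=2000, learning_rate=0.5):
    for _ in range(epochs):
        for x, target in DATA:
            # 1. Forward propagation.
            hidden, output = forward(params, x)
            # 2. Loss evaluation: loss = (output - target)^2.
            #    d_out is its gradient at the output neuron, before the sigmoid.
            d_out = 2 * (output - target) * output * (1 - output)
            # 3. Backward propagation: pass the error back to the hidden layer.
            d_hidden = [d_out * w * h * (1 - h)
                        for w, h in zip(params["w2"], hidden)]
            # Nudge every weight against its gradient.
            for j, h in enumerate(hidden):
                params["w2"][j] -= learning_rate * d_out * h
                params["b1"][j] -= learning_rate * d_hidden[j]
                for i, xi in enumerate(x):
                    params["w1"][j][i] -= learning_rate * d_hidden[j] * xi
            params["b2"] -= learning_rate * d_out
    return params


params = init_params()
loss_before = mean_loss(params)
train(params)
loss_after = mean_loss(params)
print(f"loss before training: {loss_before:.4f}")
print(f"loss after training:  {loss_after:.4f}")
```

The learning rate controls how large each nudge is; the value 0.5 is an arbitrary choice that happens to suit a problem this small.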
5. Why Is Deep Learning Exploding Today?
Although the ideas behind artificial neural networks date back to the mid-20th century, the technique took off only recently. Two developments made the difference: the huge amounts of data gathered by social media, smartphones and corporate digital records, and the powerful computing capacity provided by Graphics Processing Units (GPUs).
Because deep neural networks contain many millions of adjustable weights, they are very hungry for large datasets. Traditional machine learning algorithms tend to reach an accuracy plateau beyond which extra data helps little, whereas deep learning generally keeps improving as the dataset grows. More data usually means more accurate predictions.
FAQ
What is the difference between machine learning and deep learning?
The key difference lies in feature engineering. In classic machine learning, humans typically organize the data and choose the features by hand. Deep learning, in turn, feeds the neural net raw data (for example, raw photographs or text) and lets it discover the useful features on its own, although the data still has to be collected and labeled.
Is it necessary to have an expensive supercomputer to learn deep learning?
No. You do not need expensive hardware to start with deep learning and write basic scripts. Training commercial-grade models does require data-center infrastructure, but as a beginner you can use cloud-based environments such as Google Colab or Kaggle Notebooks, which offer free, usage-limited access to GPUs right in the browser.
Why does deep learning need so much data to work?
Deep learning networks contain a large number of nodes distributed across many layers, so millions of weights must be tuned before the system makes reasonable predictions. The less training data there is, the weaker the statistics available for learning: the net will overfit, that is, learn to reproduce the training samples perfectly but fail on new data.
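Overfitting can be illustrated with an extreme stand-in. The sketch below is not a neural network: it uses a plain lookup table as a hypothetical "model" with enough capacity to memorize every training example, trained on a made-up dataset of only four points. It shows the symptom described above, a perfect score on the training data and a poor one on unseen data.

```python
# Made-up rule behind the data: the label is 1 when the number is above 5.
train_set = [(1, 0), (2, 0), (8, 1), (9, 1)]
test_set = [(3, 0), (4, 0), (6, 1), (7, 1)]   # numbers never seen in training


def fit_memorizer(examples):
    """An extreme overfitter: store every training example verbatim."""
    return dict(examples)


def predict(memory, x, default=0):
    """Return the memorized label, or a fixed guess for unseen inputs."""
    return memory.get(x, default)


def accuracy(memory, examples):
    correct = sum(1 for x, label in examples if predict(memory, x) == label)
    return correct / len(examples)


memory = fit_memorizer(train_set)
train_accuracy = accuracy(memory, train_set)
test_accuracy = accuracy(memory, test_set)
print("accuracy on training data:", train_accuracy)
print("accuracy on new data:", test_accuracy)
```

Checking a model on data it has never seen, as the test set does here, is the standard way to notice overfitting.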
What is the ‘black box’ problem with deep learning?
‘Black box’ refers to the difficulty humans have in understanding how a deep learning model works internally. With millions of weights spread across hidden layers, it is extremely hard to follow the logic behind a neural net's decision. The lack of clear explanations makes the issue urgent in critical domains such as medicine or law.
What kinds of deep learning models exist?
Today, among the best-known models are Convolutional Neural Networks (CNNs) for spatial data and Recurrent Neural Networks (RNNs) and their descendants for sequential data. CNNs specialize in image analysis tasks (image recognition, segmentation, etc.). RNNs process sequences such as text one element at a time and can also generate text.
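The core operation of a CNN, sliding a small grid of weights (a kernel) over an image, can be sketched as follows. The tiny image and the hand-written edge-detecting kernel are assumptions for illustration; in a real CNN the kernel values are learned during training. Strictly speaking, this sliding sum is a cross-correlation, which is what deep learning libraries conventionally call convolution.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1).

    At each position, multiply overlapping values and sum them up.
    """
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    return [
        [
            sum(image[row + i][col + j] * kernel[i][j]
                for i in range(kernel_h) for j in range(kernel_w))
            for col in range(out_w)
        ]
        for row in range(out_h)
    ]


# Made-up 3x4 image: dark (0) on the left, bright (1) on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-written kernel that responds to a dark-to-bright vertical edge.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, vertical_edge)
for row in feature_map:
    print(row)
```

The resulting feature map is nonzero only where the kernel sits on the boundary between the dark and bright halves, which is how a convolutional layer picks out local patterns regardless of where they appear in the image.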


