Understanding Deep Learning: A Comprehensive Overview
Authors: Saniya Parveez, Roberto Iriondo
This tutorial's code is accessible on GitHub and fully implemented on Google Colab.
Join Us!
Towards AI is a community focusing on discussions about artificial intelligence, data science, visualization, and deep learning. Visit us at: towardsai.net
Introduction
Over the past decade, deep learning has become a pivotal component in various real-world applications, including advancements in image recognition, transfer learning, computer vision, recommendation systems, natural language processing, healthcare, and machine intelligence.
For instance, users of Facebook may have noticed the platform's ability to recognize individuals in photographs, even when the images are not clear. This high accuracy in auto-tagging is a prime example of deep learning in action.
Yet, deep learning's applications extend well beyond image recognition. This tutorial aims to provide a clear and basic understanding of deep learning, including its workings, applications, and implementations.
Let’s explore further!
What is Deep Learning?
Deep Learning (DL) is a specialized area within Machine Learning (ML) that employs Artificial Neural Networks (ANNs), which are loosely inspired by the neuronal structure of the human brain.
The term deep signifies the numerous layers found in an artificial neural network. Today, deep learning has proven effective, primarily due to the abundance of training data and the affordability of GPUs that facilitate efficient numerical computations.
Essentially, deep learning seeks to learn representations of the world from experience; it functions much like a statistical black box, but one that is exceptionally good at identifying patterns.
In summary, deep learning can be viewed as a series of layers situated between input and output, progressively identifying and processing features in a manner akin to human cognition.
Machine Learning vs. Deep Learning
Figure 2 illustrates the conceptual differences in the workflows of machine learning and deep learning:
A Brief History of Deep Learning
The origins of deep learning can be traced back to 1943, when Walter Pitts and Warren McCulloch developed a computational model inspired by the neural circuits of the human brain. Their work integrated algorithms to emulate brain functions, which later laid the groundwork for artificial neural networks, further advanced by Alexey Grigoryevich Ivakhnenko.
In 1979, Fukushima introduced neural networks incorporating multiple pooling and convolutional layers, enabling computers to learn and recognize objects more effectively. The first practical demonstration of backpropagation was provided by Yann LeCun in 1989, utilizing convolutional neural networks to interpret handwritten digits.
Key Terminology in Deep Learning
Here are some essential terms associated with deep learning:
- Model: A specific representation derived from data through an algorithm; often termed a hypothesis.
- Feature: A measurable attribute of data, which can be summarized in a feature vector. Deep learning models utilize these vectors as inputs.
- Target (Label): The output value that the model is designed to predict.
- Training: The process of providing a set of inputs (features) along with expected outputs (labels) so that the model can learn to map new data to trained categories.
- Prediction: Once a model is trained, it can accept inputs and generate predicted outputs (labels).
- Epoch: One complete pass through the entire training dataset; the number of epochs is a hyperparameter, and the dataset is often divided into batches for processing.
- Neuron: A mathematical function in deep learning that simulates biological neuron functions, typically calculating a weighted average of its inputs.
- Axon: The connection along which a neuron's output signal travels to other neurons.
- Layer: A fundamental building block in deep learning, which transforms input through various functions before passing it to the next layer.
- Dense Layer: A layer that represents a standard, fully connected neural network layer.
Neurons
Neurons in deep learning are modeled after biological neurons, which consist of:
- A cell body (soma)
- One or more dendrites: These receive signals from other neurons.
- An axon: This transmits signals from one neuron to others.
The transition between states in a neuron is triggered by external signals received by the dendrites, which can have either excitatory or inhibitory effects. The neuron remains idle, accumulating incoming signals, and fires once their combined effect crosses a threshold.
Artificial Neurons
Similar to biological neurons, artificial neurons include:
- Input connections that carry signals from other neurons.
- Each connection has an associated weight that influences the signal's significance.
- Output connections that transmit signals to other neurons.
- An activation function that determines the output based on incoming signals.
The equation governing an artificial neuron can be summarized as follows:
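The original article presents this equation as a figure; in the notation used throughout this tutorial (inputs x, weights w, bias b, activation function f), it is commonly written as:

y = f(x1*w1 + x2*w2 + ... + xn*wn + b)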
In conclusion, a neuron serves as the basic unit within a neural network. It processes inputs, performs calculations, and generates a single output.
Figure 7 illustrates a two-input neuron:
The following computations occur within the neuron depicted in Figure 7:
- Each input is multiplied by a weight:
- All weighted inputs are summed with a bias `b`:
- Finally, the sum is processed through an activation function:
The activation function transforms an unrestricted input into a predictable output form.
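As a quick worked example (using the same values as the code later in this tutorial, with a sigmoid activation): for weights w = [0, 1], bias b = 4, and inputs x = [2, 3], the weighted sum is 0*2 + 1*3 + 4 = 7, and sigmoid(7) ≈ 0.999, a bounded, predictable output.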
Combining Neurons into a Neural Network
A neural network is simply a collection of interconnected neurons. Figure 11 shows a basic structure of an artificial neural network:
Perceptron
A perceptron is a foundational single-layer neural network characterized by a simple activation function that produces binary outputs. Developed by Frank Rosenblatt in the 1950s and 1960s, it accepts multiple binary inputs and generates a single binary output.
Components of a perceptron include:
- Logit
- Step activation function
Logit
The logit mirrors the equation of a straight line: it is the weighted sum of the inputs plus a bias term, where w represents the weight applied to each input and b denotes the bias.
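Written out (the original article shows this as a figure), the logit takes the standard form:

logit = x1*w1 + x2*w2 + ... + xn*wn + b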
Step Activation Function
The step activation function determines whether a neuron should activate based on the logit value. Figure 16 illustrates this function:
A neuron activates only if the logit value is greater than or equal to zero.
Decision Boundary Cases
- The decision boundary is a linear line for a single-input perceptron.
- In the case of multiple inputs, the decision boundary becomes a hyperplane, which has one less dimension than the space it occupies.
In summary:
> Perceptron = Logit + Step Function
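To make this concrete, here is a minimal NumPy sketch of a perceptron, assuming the logit and step function described above (the class name and example values are illustrative and not part of the original tutorial):

import numpy as np

class Perceptron:
    def __init__(self, weights, bias):
        self.weights = weights
        self.bias = bias

    def predict(self, inputs):
        # Logit: weighted sum of the inputs plus the bias.
        logit = np.dot(self.weights, inputs) + self.bias
        # Step activation: the neuron fires (outputs 1) only if the logit is >= 0.
        return 1 if logit >= 0 else 0

# Example: this perceptron fires only when x1 + x2 >= 2.
p = Perceptron(weights=np.array([1, 1]), bias=-2)
print(p.predict(np.array([1, 1])))  # 1 (logit = 0)
print(p.predict(np.array([0, 1])))  # 0 (logit = -1)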
Layer
A layer in deep learning refers to a set of nodes working collectively at a specific depth within a neural network. Each layer aims to learn distinct features of the data by minimizing an error or cost function.
For example, in image recognition, the first layer might identify edges, the second layer could detect eyes, while the third focuses on identifying noses.
Common Layers
- Input layer
- Hidden layer
- Output layer
Input Layer (Input Cells)
This layer consists of raw data and serves as the starting point for further processing by subsequent layers of artificial neurons.
Output Layer (Output Cells)
The output layer produces the final result. For classification tasks it often has a single node for binary problems, or one node per class for multiclass problems; even when it contains multiple nodes, it is regarded as a single layer within the neural network.
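As a rough sketch of how a fully connected (dense) layer computes its outputs, assuming a sigmoid activation and illustrative shapes and values (none of these numbers come from the original article):

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def dense_layer(inputs, weights, biases):
    # Each row of `weights` holds the incoming weights of one node in the layer,
    # so every node sees every input (fully connected).
    return sigmoid(np.dot(weights, inputs) + biases)

x = np.array([1.0, 2.0, 3.0])   # three input cells
W = np.full((2, 3), 0.1)        # a dense layer with two nodes
b = np.zeros(2)
print(dense_layer(x, W, b))     # two outputs, one per node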
Weight
Weight signifies the strength of the connection between neurons. It influences how much an input affects the output. Essentially, weight determines the impact of inputs on outputs.
Example
Consider three tasks: 1) Play, 2) Work, 3) Sleep. If "play" is prioritized, it will have a higher weight than the others.
Thus, the equation can be represented as:
> Y = F(x1, x2, x3) = w1*x1 + w2*x2 + w3*x3
In this equation, w1 is larger than w2 and w3, so x1 (play) has more influence on the output Y than x2 (work) and x3 (sleep).
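For instance, with illustrative weights w1 = 0.6, w2 = 0.3, w3 = 0.1 (values chosen here only to make the point), Y = 0.6*x1 + 0.3*x2 + 0.1*x3, so a change in x1 (play) moves the output twice as much as the same change in x2 (work) and six times as much as in x3 (sleep).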
Learning Rate
The learning rate is a parameter defining how much each update step influences the current weight values. It serves as a tuning parameter in optimization algorithms, determining the size of each step towards minimizing the loss function.
In summary:
> The learning rate dictates how fast or slow a neural network model acquires knowledge about a problem.
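Concretely, the standard gradient descent update that uses the learning rate looks like this (a generic rule, not specific to this article):

new_weight = old_weight - learning_rate * gradient_of_loss

A small learning rate takes small, cautious steps toward the minimum; a large one learns faster but risks overshooting it.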
How Do Artificial Neural Networks (ANNs) Learn?
The learning process in an ANN is iterative, focusing on optimizing its weights, and is typically supervised.
Weight Decay
Weight decay is a regularization technique aimed at preventing overfitting by reducing model complexity. Overfitting can also be limited by initializing weights and biases to small random values and adjusting them only gradually during learning.
Steps of Weight Decay
- Monitor performance on a validation set and implement early stopping if necessary.
- Adjust the update rule to discourage excessively large weights (a typical form of this rule is shown after this list).
- Use cross-validation to set the value of lambda.
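The adjusted update rule referred to above is typically standard L2 weight decay (the exact formula in the original appears as a figure; this is the common form):

new_weight = old_weight - learning_rate * (gradient_of_loss + lambda * old_weight)

Here lambda controls how strongly large weights are penalized, which is why cross-validation is used to choose its value.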
Implementation of a Neuron
The following Python code snippet illustrates the creation of an artificial neuron.
Import NumPy:
import numpy as np
Create a Sigmoid function:
def sigmoid(x):
    # Squashes any real-valued input into the range (0, 1).
    return 1 / (1 + np.exp(-x))
Define the Neuron class:
class Neuron:
    def __init__(self, weights, bias):
        self.weights = weights
        self.bias = bias

    def feedforward(self, inputs):
        # Weight the inputs, add the bias, then apply the activation function.
        total = np.dot(self.weights, inputs) + self.bias
        return sigmoid(total)
Input and execution of the neuron:
weights = np.array([0, 1])   # w1 = 0, w2 = 1
bias = 4                     # b = 4
neuron = Neuron(weights, bias)
x = np.array([2, 3])         # inputs x1 = 2, x2 = 3
forward = neuron.feedforward(x)
print(forward)               # prints roughly 0.999, since sigmoid(7) ≈ 0.999
Backpropagation
In deep learning, backpropagation is a supervised learning algorithm utilized for training multilayer perceptrons.
Why is Backpropagation Necessary?
When constructing a neural network, weights are initially assigned random values. However, these values might not optimally fit the model, resulting in significant discrepancies between actual and predicted outputs.
How do we reduce the error?
Data scientists must adjust the model's parameters (weights) to minimize this error.
Steps to Train a Model
- Compute the error.
- Aim for minimal error.
- Update parameters — If the error is substantial, modify weights and biases, then reassess the error. Repeat until the error is minimized.
- The model is prepared for predictions.
Consequently, backpropagation serves as a method to train the model, searching for the minimum value of the error function using techniques like gradient descent.
Steps of Backpropagation
- Initialize weights with random values and propagate forward.
- Identify the error, and propagate it backward to refine weight values.
- If the error increases, reduce the weight value.
- This process continues until the error is minimized.
Ultimately, the goal is to reach the Global Loss Minimum.
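Below is a minimal sketch of these steps for a single sigmoid neuron like the one implemented earlier, assuming a squared-error loss and gradient descent; the data, learning rate, and number of epochs are illustrative and not from the original tutorial:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# One training example (illustrative values) and its target label.
x = np.array([2.0, 3.0])
target = 1.0

# Step 1: initialize the weights and bias with random values.
rng = np.random.default_rng(0)
weights = rng.normal(size=2)
bias = 0.0
learning_rate = 0.1

for epoch in range(100):
    # Forward pass: propagate the input through the neuron.
    z = np.dot(weights, x) + bias
    prediction = sigmoid(z)

    # Compute the error (squared-error loss).
    error = (prediction - target) ** 2

    # Backward pass: gradient of the loss with respect to z,
    # using the fact that d(sigmoid)/dz = sigmoid(z) * (1 - sigmoid(z)).
    dloss_dz = 2 * (prediction - target) * prediction * (1 - prediction)

    # Update step: move the weights and bias in the direction that reduces the error.
    weights -= learning_rate * dloss_dz * x
    bias -= learning_rate * dloss_dz

print(error)  # the loss shrinks toward its minimum as training proceeds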
Conclusion
Deep learning, as a subset of machine learning, focuses on artificial neural networks. It empowers computers to execute specific tasks akin to human capabilities.
Deep learning models are increasingly recognized for their high accuracy, sometimes exceeding human-level performance on specific tasks. These models can classify images, text, sounds, and more.
Currently, deep learning is applied across various industries for a multitude of tasks. Neural networks offer a flexible approach to modeling input/output functions, proving resilient against noisy data. Overfitting can be mitigated through weight decay or early stopping techniques.
DISCLAIMER: The opinions expressed in this article belong to the authors and do not reflect the views of Carnegie Mellon University or any associated entities. These writings are intended as a representation of current thinking and a basis for discussion and improvement.
All images are credited to the authors unless otherwise specified.
Published via Towards AI
Resources
Tutorial’s Companion
- GitHub Repository.
- Google Colab Implementation.
Recommended Reading
For those genuinely interested in deep learning, the following book is an excellent resource and is freely accessible:
- Dive Into Deep Learning by Aston Zhang, Zachary C. Lipton, Mu Li, and Alexander J. Smola.
References
- Candela, J. (2017). Building scalable systems to understand content — Facebook Engineering. Retrieved January 15, 2021, from [Facebook Engineering](https://engineering.fb.com/2017/02/02/ml-applications/building-scalable-systems-to-understand-content/).
- A Survey on Deep Learning: Algorithms, Techniques, and Applications, Pouyanfar et al., (2021). Retrieved January 15, 2021, from [Duke University](https://www2.cs.duke.edu/courses/spring20/compsci527/papers/Pouyanfar.pdf).
- Dive into Deep Learning — Dive into Deep Learning 0.16.0 documentation, Aston Zhang et al. (2021). Retrieved January 15, 2021, from [Dive into Deep Learning](https://d2l.ai/).
- Zaccone, G. et al. Deep Learning with TensorFlow: Take Your Machine Learning Knowledge to the Next Level with the Power of TensorFlow. Packt Publishing, 2017.
- Victor Zhou. “Machine Learning for Beginners: An Introduction to Neural Networks.” [Victor Zhou](https://victorzhou.com/blog/intro-to-neural-networks/).
- “Learning Rate.” Wikipedia, Wikimedia Foundation, January 4, 2021, [Wikipedia Learning Rate](https://en.wikipedia.org/wiki/Learning_rate).
- Alexey Grigorev, (2021). Retrieved January 11, 2021, from [GitHub](https://github.com/alexeygrigorev/data-science-interviews/blob/master/theory.md).
- Main Types of Neural Networks and its Applications — Tutorial. (2021). Retrieved January 15, 2021, from [Towards AI](https://towardsai.net/p/machine-learning/main-types-of-neural-networks-and-its-applications-tutorial-734480d7ec8e).
- Deep Learning Part 1, Ruslan Salakhutdinov, (2021). Retrieved January 15, 2021, from [YouTube](https://www.youtube.com/watch?v=TFlV57P8JKo).
- Intro to Deep Learning, Quan Geng, Columbia University, (2021). Retrieved January 15, 2021, from [Columbia University](https://dreaven.github.io/papers/deep_learning_lecture_at_columbia.pdf).
- Instance Segmentation of Point Clouds using Deep Learning, Gerardo Francisco Perez Layedra, UPC, (2021). Retrieved January 15, 2021, from [UPC](https://upcommons.upc.edu/bitstream/handle/2117/117737/131440.pdf?sequence=1&isAllowed=y).
- Victor Zhou, MIT License, GitHub, [Victor Zhou GitHub](https://github.com/vzhou842/victorzhou.com/blob/master/NOTICE).
- Hyperparameters and Validation Sets, Sargur N. Srihari, (2021). Retrieved January 15, 2021, from [University of Buffalo](https://cedar.buffalo.edu/~srihari/CSE676/5.3%20MLBasics-Hyperparam.pdf).
- An Introduction to Deep Learning, David Wolf Corne, Open Courseware, (2021). Retrieved January 15, 2021, from [Brandeis University](https://www.cs.brandeis.edu/~cs136a/CS136a_Slides/DeepLearning_Corne.pdf).
- LeCun et al.: Handwritten Digit Recognition: Applications of Neural Net Chips and Automatic Learning, in Fogelman, F. et al. (Eds), Neurocomputing, Algorithms, Architectures and Applications, Springer, Les Arcs, France, 1989.
- Frank Rosenblatt’s Mark I Perceptron at the Cornell Aeronautical Laboratory. Buffalo, New York, 1960. [Instagram](https://www.instagram.com/p/Bn_s3bjBA7n/).
- Backpropagation | Wikipedia | [Wikipedia Backpropagation](https://en.wikipedia.org/wiki/Backpropagation).