Neural Networks
The computational foundation of modern AI
Overview
A neural network is a layered system of mathematical functions loosely inspired by biological neurons. Each layer transforms its input through weighted connections and non-linear activation functions, learning representations of increasing abstraction. Deep neural networks—those with many layers—power virtually every modern AI system from image recognition to language models.
Key Concepts
- Forward pass: input data flows through layers, each computing a weighted sum then applying an activation function
- Activation functions (ReLU, sigmoid, softmax): introduce non-linearity that allows the network to learn complex patterns
- Backpropagation: computes gradients of the loss function with respect to each weight using the chain rule
- Gradient descent: iteratively adjusts weights in the direction that reduces the loss
- Convolutional layers (CNNs): detect spatial patterns in images; recurrent layers (RNNs/LSTMs): model sequential dependencies
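The activation functions named above can be written in a few lines of NumPy. This is a standalone illustration, not code from any particular framework:

```python
import numpy as np

def relu(z):
    # Zero out negatives, pass positives through unchanged.
    return np.maximum(0.0, z)

def sigmoid(z):
    # Squash any real number into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Subtract the max before exponentiating for numerical stability,
    # then normalise so the outputs form a probability distribution.
    e = np.exp(z - np.max(z, axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

z = np.array([-2.0, 0.0, 3.0])
relu(z)       # array([0., 0., 3.])
sigmoid(0.0)  # 0.5
softmax(z)    # probabilities summing to 1, largest on the 3.0 entry
```

ReLU is the usual default for hidden layers; sigmoid suits binary outputs, and softmax turns a vector of scores into class probabilities.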
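The forward pass, backpropagation, and gradient descent combine into a short from-scratch training loop. The sketch below trains a tiny two-layer ReLU network; the architecture, learning rate, and toy regression target are all assumptions chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny network: 2 inputs -> 3 hidden units (ReLU) -> 1 linear output.
W1 = rng.normal(scale=0.5, size=(2, 3)); b1 = np.zeros(3)
W2 = rng.normal(scale=0.5, size=(3, 1)); b2 = np.zeros(1)

# Toy regression target (an assumption for this example): y = x0 + x1.
X = rng.normal(size=(64, 2))
y = X.sum(axis=1, keepdims=True)

lr = 0.1
losses = []
for step in range(500):
    # Forward pass: each layer computes a weighted sum, then an activation.
    z1 = X @ W1 + b1              # (64, 3) pre-activations
    h1 = np.maximum(0.0, z1)      # ReLU
    y_hat = h1 @ W2 + b2          # (64, 1) linear output for regression
    losses.append(np.mean((y_hat - y) ** 2))

    # Backpropagation: apply the chain rule from the loss back to each weight.
    d_yhat = 2 * (y_hat - y) / len(X)   # dL/d(y_hat) for mean squared error
    dW2 = h1.T @ d_yhat
    db2 = d_yhat.sum(axis=0)
    d_h1 = d_yhat @ W2.T
    d_z1 = d_h1 * (z1 > 0)              # ReLU gradient: 1 where z1 > 0
    dW1 = X.T @ d_z1
    db1 = d_z1.sum(axis=0)

    # Gradient descent: step each parameter against its gradient.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
```

After training, the loss should be far below its starting value, showing all four concepts working together on one problem.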
Key Facts
- The universal approximation theorem states that a network with a single hidden layer and enough units can approximate any continuous function on a compact domain, but deep networks typically reach the same accuracy with far fewer parameters
- AlexNet (2012) sparked the deep learning revolution by winning the ImageNet challenge with a top-5 error roughly ten points lower than the best classical methods
- Modern large language models contain billions of learned parameters—GPT-3 has 175 billion
- Dropout regularisation randomly zeroes neurons during training to prevent the network from memorising rather than generalising
- Batch normalisation, introduced in 2015, stabilises training and allows much higher learning rates
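The dropout fact above can be made concrete with a small sketch of "inverted" dropout, a common convention (not specified in this document) in which survivors are rescaled during training so that no change is needed at inference time:

```python
import numpy as np

rng = np.random.default_rng(1)

def dropout(h, p_drop, training=True):
    # Inverted dropout: during training, zero each activation with
    # probability p_drop and scale survivors by 1/(1 - p_drop) so the
    # expected activation is unchanged. At inference, do nothing.
    if not training:
        return h
    mask = (rng.random(h.shape) >= p_drop) / (1.0 - p_drop)
    return h * mask

h = np.ones((4, 8))             # pretend hidden activations
out = dropout(h, p_drop=0.5)    # roughly half zeroed, survivors become 2.0
```

Because each training step sees a different random subnetwork, no single neuron can be relied on, which discourages memorisation.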