Neural Networks
The computational foundation of modern AI
Overview
A neural network is a layered system of mathematical functions loosely inspired by biological neurons. Each layer transforms its input through weighted connections and non-linear activation functions, learning representations of increasing abstraction. Deep neural networks—those with many layers—power virtually every modern AI system from image recognition to language models.
Key Concepts
- Forward pass: input data flows through layers, each computing a weighted sum then applying an activation function
- Activation functions (ReLU, sigmoid, softmax): introduce non-linearity that allows the network to learn complex patterns
- Backpropagation: computes gradients of the loss function with respect to each weight using the chain rule
- Gradient descent: iteratively adjusts weights in the direction that reduces the loss
- Convolutional layers (CNNs): detect spatial patterns in images; recurrent layers (RNNs/LSTMs): model sequential dependencies
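The activation functions named above can be written in a few lines of NumPy. This is a standalone illustration, not code from any particular framework:

```python
import numpy as np

def relu(z):
    # Zero out negatives, pass positives through unchanged.
    return np.maximum(0.0, z)

def sigmoid(z):
    # Squash any real number into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Subtract the max before exponentiating for numerical stability,
    # then normalise so the outputs form a probability distribution.
    e = np.exp(z - np.max(z, axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

z = np.array([-2.0, 0.0, 3.0])
relu(z)       # array([0., 0., 3.])
sigmoid(0.0)  # 0.5
softmax(z)    # probabilities summing to 1, largest on the 3.0 entry
```

ReLU is the usual default for hidden layers; sigmoid suits binary outputs, and softmax turns a vector of scores into class probabilities.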
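The forward pass, backpropagation, and gradient descent combine into a short from-scratch training loop. The sketch below trains a tiny two-layer ReLU network; the architecture, learning rate, and toy regression target are all assumptions chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny network: 2 inputs -> 3 hidden units (ReLU) -> 1 linear output.
W1 = rng.normal(scale=0.5, size=(2, 3)); b1 = np.zeros(3)
W2 = rng.normal(scale=0.5, size=(3, 1)); b2 = np.zeros(1)

# Toy regression target (an assumption for this example): y = x0 + x1.
X = rng.normal(size=(64, 2))
y = X.sum(axis=1, keepdims=True)

lr = 0.1
losses = []
for step in range(500):
    # Forward pass: each layer computes a weighted sum, then an activation.
    z1 = X @ W1 + b1              # (64, 3) pre-activations
    h1 = np.maximum(0.0, z1)      # ReLU
    y_hat = h1 @ W2 + b2          # (64, 1) linear output for regression
    losses.append(np.mean((y_hat - y) ** 2))

    # Backpropagation: apply the chain rule from the loss back to each weight.
    d_yhat = 2 * (y_hat - y) / len(X)   # dL/d(y_hat) for mean squared error
    dW2 = h1.T @ d_yhat
    db2 = d_yhat.sum(axis=0)
    d_h1 = d_yhat @ W2.T
    d_z1 = d_h1 * (z1 > 0)              # ReLU gradient: 1 where z1 > 0
    dW1 = X.T @ d_z1
    db1 = d_z1.sum(axis=0)

    # Gradient descent: step each parameter against its gradient.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
```

After training, the loss should be far below its starting value, showing all four concepts working together on one problem.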
Key Facts
- The universal approximation theorem states that a network with a single hidden layer and enough units can approximate any continuous function on a compact domain, but deep networks typically reach the same accuracy with far fewer parameters
- AlexNet (2012) sparked the deep learning revolution by winning the ImageNet challenge with a top-5 error roughly ten points lower than the best classical methods
- Modern large language models contain billions of learned parameters—GPT-3 has 175 billion
- Dropout regularisation randomly zeroes neurons during training to prevent the network from memorising rather than generalising
- Batch normalisation, introduced in 2015, stabilises training and allows much higher learning rates
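The dropout fact above can be made concrete with a small sketch of "inverted" dropout, a common convention (not specified in this document) in which survivors are rescaled during training so that no change is needed at inference time:

```python
import numpy as np

rng = np.random.default_rng(1)

def dropout(h, p_drop, training=True):
    # Inverted dropout: during training, zero each activation with
    # probability p_drop and scale survivors by 1/(1 - p_drop) so the
    # expected activation is unchanged. At inference, do nothing.
    if not training:
        return h
    mask = (rng.random(h.shape) >= p_drop) / (1.0 - p_drop)
    return h * mask

h = np.ones((4, 8))             # pretend hidden activations
out = dropout(h, p_drop=0.5)    # roughly half zeroed, survivors become 2.0
```

Because each training step sees a different random subnetwork, no single neuron can be relied on, which discourages memorisation.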