Neural Network Foundations

Basic Neural Network From Scratch

A build path for a tiny MLP: linear layers, activations, losses, manual backprop, train/test behavior, and simple regularization.

Time: ~10 hours

Core loop: forward pass → loss → backward pass → update

Topics: 10 ordered topics

End state: You can build a tiny MLP, explain every tensor shape, and debug whether training is failing because of data, loss, gradients, or updates.

Checkpoint 1

Linear Layer and Activations

Build the smallest useful neural block and track the shape of every tensor.

Step 1

Linear Layer: Shapes and Memory

Implement Y = XW + b and explain every shape in the forward and backward pass.

Open Linear Layer: Shapes and Memory
Practice: Write the shape ledger for X, W, b, Y, dY, dW, db, and dX.
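
A minimal NumPy sketch of that shape ledger; the sizes N, D, and M below are illustrative, not taken from the page:

```python
import numpy as np

N, D, M = 4, 3, 2          # batch size, input width, output width (illustrative)
X = np.random.randn(N, D)  # input:   (N, D)
W = np.random.randn(D, M)  # weights: (D, M)
b = np.zeros(M)            # bias:    (M,), broadcast over the batch

# Forward pass
Y = X @ W + b              # output:  (N, M)

# Backward pass, given an upstream gradient dY with the same shape as Y
dY = np.ones_like(Y)       # (N, M)
dW = X.T @ dY              # (D, M), same shape as W
db = dY.sum(axis=0)        # (M,),   same shape as b
dX = dY @ W.T              # (N, D), same shape as X

for name, t in [("X", X), ("W", W), ("b", b), ("Y", Y),
                ("dY", dY), ("dW", dW), ("db", db), ("dX", dX)]:
    print(name, t.shape)
```
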
Step 2

Perceptron

Understand the simplest linear classifier before adding hidden layers.

Open Perceptron
Practice: Explain when a perceptron can and cannot separate two classes.
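
A rough sketch of the classic perceptron update rule; the toy points and learning rate are made up for illustration:

```python
import numpy as np

# Toy 2D points with labels in {-1, +1}, linearly separable by design
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])

w = np.zeros(2)
b = 0.0
lr = 0.1

for epoch in range(20):
    errors = 0
    for xi, yi in zip(X, y):
        if yi * (xi @ w + b) <= 0:   # misclassified (or exactly on the boundary)
            w += lr * yi * xi        # nudge the decision boundary toward xi
            b += lr * yi
            errors += 1
    if errors == 0:                  # converges only if the classes are separable
        break

print("weights:", w, "bias:", b)
```
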
Step 3

Activation Functions

Know why nonlinearities make a network more than one big linear map.

Open Activation Functions
Practice: Compare ReLU, sigmoid, and GELU failure modes.
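
A small sketch of the three activations and where each one struggles; the GELU here uses the common tanh approximation:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)        # "dead" for x < 0: gradient is exactly zero there

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))  # saturates for large |x|: gradient shrinks toward zero

def gelu(x):
    # Common tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

x = np.array([-6.0, -1.0, 0.0, 1.0, 6.0])
print("relu   ", relu(x))
print("sigmoid", sigmoid(x))         # near 0 or 1 at the extremes
print("gelu   ", gelu(x))            # smooth, slightly negative for small negative inputs
```
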
Checkpoint 2

Loss and Update Rule

Turn predictions into a scalar loss and use gradient descent to change parameters.

Step 1

Loss Functions

Pick a scalar objective that matches regression or classification.

Open Loss Functions
Practice: Decide whether MSE or cross-entropy fits a given target.
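
A minimal sketch of both objectives, assuming probabilities and one-hot labels for the classification case; the example values are illustrative:

```python
import numpy as np

def mse(pred, target):
    # Regression: mean squared difference over all elements
    return np.mean((pred - target) ** 2)

def cross_entropy(probs, one_hot, eps=1e-12):
    # Classification: average negative log-probability of the correct class
    return -np.mean(np.sum(one_hot * np.log(probs + eps), axis=1))

# Regression target
print(mse(np.array([2.5, 0.0]), np.array([3.0, -0.5])))

# Classification target: 2 examples, 3 classes
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
labels = np.array([[1, 0, 0],
                   [0, 1, 0]])
print(cross_entropy(probs, labels))
```
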
Step 2

Gradient Descent Variants

Update parameters using gradients, learning rates, momentum, and Adam-style variants.

Open Gradient Descent Variants
Practice: Explain what changes when the learning rate is too large.
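
A rough sketch of the three update rules, plus a toy demonstration of an oversized learning rate on f(w) = w²; the hyperparameter values are illustrative, not recommendations:

```python
import numpy as np

def sgd_step(w, grad, lr):
    return w - lr * grad

def momentum_step(w, grad, velocity, lr, beta=0.9):
    velocity = beta * velocity + grad                 # running average of past gradients
    return w - lr * velocity, velocity

def adam_step(w, grad, m, v, t, lr, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad                      # first-moment estimate
    v = b2 * v + (1 - b2) * grad ** 2                 # second-moment estimate
    m_hat, v_hat = m / (1 - b1 ** t), v / (1 - b2 ** t)   # bias correction
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Minimize f(w) = w^2, so grad = 2w. Plain SGD with a safe vs. too-large step:
for lr in (0.1, 1.1):
    w = 5.0
    for _ in range(30):
        w = sgd_step(w, 2 * w, lr)
    print(f"lr={lr}: w={w:.3g}")   # 0.1 shrinks toward 0; 1.1 overshoots and blows up
```
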
Checkpoint 3

Backprop and Gradient Checks

Verify gradients numerically before trusting a training loop.

Step 1

Feedforward Networks and Backpropagation

Compose layers, cache forward values, and send gradients backward.

Open Feedforward Networks and Backpropagation
Practice: Identify which tensors must be saved for a two-layer MLP backward pass.
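
A sketch of a two-layer MLP forward and backward pass; the tensors cached in the forward pass (X, Z1, H) are the ones the practice prompt asks about, and all sizes here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, H_DIM, C = 5, 4, 8, 3                    # batch, input, hidden, output widths
X = rng.normal(size=(N, D))

W1, b1 = rng.normal(size=(D, H_DIM)) * 0.1, np.zeros(H_DIM)
W2, b2 = rng.normal(size=(H_DIM, C)) * 0.1, np.zeros(C)

# Forward pass: cache everything the backward pass will reuse
Z1 = X @ W1 + b1                               # pre-activation, needed for the ReLU gradient
H = np.maximum(0.0, Z1)                        # hidden activation, needed for dW2
scores = H @ W2 + b2

# Backward pass, given an upstream gradient dscores (here: pretend dL/dscores = ones / N)
dscores = np.ones_like(scores) / N
dW2 = H.T @ dscores                            # uses cached H
db2 = dscores.sum(axis=0)
dH = dscores @ W2.T
dZ1 = dH * (Z1 > 0)                            # uses cached Z1 through the ReLU mask
dW1 = X.T @ dZ1                                # uses cached X
db1 = dZ1.sum(axis=0)

print(dW1.shape, db1.shape, dW2.shape, db2.shape)
```
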
Step 2

Softmax and Numerical Stability

Turn logits into probabilities without overflow or underflow.

Open Softmax and Numerical Stability
Practice: Explain why subtracting the maximum logit does not change softmax probabilities.
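
A minimal sketch of the max-subtraction trick; it shows that the stable version returns the same probabilities wherever the naive one still works, and stays finite where it does not:

```python
import numpy as np

def softmax_naive(logits):
    e = np.exp(logits)                    # overflows for large logits
    return e / e.sum(axis=-1, keepdims=True)

def softmax_stable(logits):
    shifted = logits - logits.max(axis=-1, keepdims=True)   # largest entry becomes 0
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

small = np.array([1.0, 2.0, 3.0])
print(np.allclose(softmax_naive(small), softmax_stable(small)))   # True: same probabilities

big = np.array([1000.0, 1001.0, 1002.0])
with np.errstate(over="ignore", invalid="ignore"):
    print(softmax_naive(big))    # nan: exp overflows to inf, and inf / inf is undefined
print(softmax_stable(big))       # well-defined, identical to softmax of [0, 1, 2]
```
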
Step 3

Cross-Entropy Loss

Connect logits, labels, likelihood, and gradients for classification.

Open Cross-Entropy Loss
Practice: State what the loss penalizes when the correct class logit is too low.
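
A sketch of softmax cross-entropy with the standard gradient with respect to the logits (softmax probabilities minus the one-hot labels); the logits and labels are made up for illustration:

```python
import numpy as np

def softmax(logits):
    shifted = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, labels):
    probs = softmax(logits)
    N = logits.shape[0]
    # Loss: average negative log-probability assigned to the correct class
    loss = -np.log(probs[np.arange(N), labels]).mean()
    # Gradient w.r.t. logits: softmax probabilities minus the one-hot labels
    grad = probs.copy()
    grad[np.arange(N), labels] -= 1.0
    return loss, grad / N

logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])
labels = np.array([0, 2])
loss, grad = cross_entropy(logits, labels)
print(loss)   # small here, since each correct class already has the largest logit
print(grad)   # pushes the correct logit up and the others down
```
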
Checkpoint 4

Generalization Basics

Separate fitting the toy data from learning a rule that survives held-out examples.

Step 1

Train/Test Split and Data Leakage

Check whether a tiny network learned a rule or just memorized examples.

Open Train/Test Split and Data Leakage
Practice: Name one preprocessing step that must be fit only on training data.
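
A minimal sketch of leakage-free normalization with a plain NumPy split; the data is synthetic and the 80/20 split is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(100, 3))

# Split first, then fit preprocessing on the training portion only
split = 80
X_train, X_test = X[:split], X[split:]

mean = X_train.mean(axis=0)           # statistics come from training data only
std = X_train.std(axis=0)

X_train_norm = (X_train - mean) / std
X_test_norm = (X_test - mean) / std   # test data reuses the training statistics

# Leaky version (do NOT do this): statistics computed on all rows,
# so information about the test set sneaks into preprocessing.
mean_leaky = X.mean(axis=0)
print(np.allclose(mean, mean_leaky))  # usually False: the two pipelines differ
```
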
Step 2

Regularization in Practice

Use weight decay, dropout, and early stopping as concrete controls.

Open Regularization in Practice
Practice: Explain what validation loss tells you that training loss cannot.
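
Rough sketches of the three controls as standalone helpers; the hyperparameters (lam, p, patience) are illustrative defaults, not recommendations:

```python
import numpy as np

def l2_penalty(W, lam=1e-4):
    # Weight decay: add lam * ||W||^2 to the loss and 2 * lam * W to the gradient
    return lam * np.sum(W ** 2), 2 * lam * W

def dropout(H, p=0.5, training=True):
    # Inverted dropout: zero units at train time, rescale so expectations match at test time
    if not training:
        return H
    mask = (np.random.rand(*H.shape) >= p) / (1.0 - p)
    return H * mask

def should_stop(val_losses, patience=3):
    # Early stopping: stop once validation loss has not improved for `patience` epochs
    if len(val_losses) <= patience:
        return False
    best = min(val_losses[:-patience])
    return min(val_losses[-patience:]) >= best

print(should_stop([1.0, 0.8, 0.7, 0.71, 0.72, 0.73]))   # True: no improvement for 3 epochs
```
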

How to use this path

Do not only read the pages. For each step, write the shape ledger, answer the practice prompt, and then run a small quiz or diagnostic. The goal is operational fluency: you should be able to predict what will change before the code or the algebra confirms it.

Back to reading paths →