A single path from a single neuron to a full ChatGPT — and onward to the agent systems that wrap it. Every idea built from scratch, drawn, and made interactive. No black boxes. No hand-waving.
A tree that branches into smaller copies of itself. A language model has the same shape — the one weighted-sum-and-bend you learn in lesson one, stacked and repeated, billions of times. Master the leaf and you can read the whole tree.
Four groups, twenty-six lessons. Each one is a self-contained, interactive page that prepares you for the next. You climb from the smallest idea to the largest system. The first rung is open; the rest are being authored from the lab's research wiki.
The single weight, the chain rule, the loop. The one mechanism that scales — unchanged — all the way up.
A neuron, a layer, a loss, backprop — and a real classifier you train in the page.
Vectors, matrices, gradients — only the parts you actually use.
A whole tiny GPT in ~200 lines: same autograd, plus attention.
A 10M-param model trained on a laptop, end to end.
The pieces that turn the primitive into a language model — and the frontier variants pushing past it.
Query, key, value — the operation that changed everything.
Bytes → tokens. BPE, and why the vocabulary matters.
Turning a token into a vector of meaning.
Teaching attention where each token sits.
Route each token to a few specialist sub-networks.
MLA + MoE: an efficient frontier model, dissected.
Sequence modelling without attention.
The IO-aware kernel that made long contexts cheap.
From a raw next-token predictor to an aligned model that reasons — and how to do it on a budget.
Reward models, PPO, DPO — shaping behaviour from preferences.
Fine-tune billions of params by training only a few.
Run a big model in small precision, without losing it.
The math that predicts performance from compute.
Think longer at inference: the o1/R1 idea.
Many models, memory, retrieval, routing — the systems that turn a model into something that acts.
Stack models in layers so they refine each other.
Many models deliberate, then merge into one answer.
Retrieval over a knowledge graph, not flat chunks.
How an agent remembers across turns and sessions.
A structured, navigable long-term memory.
Send each query to the cheapest model that can answer it.
Turn a codebase into a queryable call-graph.
A living knowledge base an LLM keeps fresh.
Most material tells you what a transformer is. We make you build one — the smallest working version of every idea, drawn and interactive, with the real code on the page.
Every concept is an interactive figure — drag the weights, step the forward pass, watch the loss fall. Intuition before notation.
We build the autograd, the attention, the training loop ourselves — in plain code you can read top to bottom. The library comes after you understand the engine.
No "and then it just works." When something is hard we slow down and derive it, one local slope at a time, until it's obvious.
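To give a flavour of "one local slope at a time", here is a minimal sketch of a scalar autograd, not the course's actual code. The class name `Value` and its fields are hypothetical, chosen for illustration. Each operation records its local slope with respect to its inputs, and `backward` multiplies those slopes along every path back to the inputs, which is the chain rule.

```python
import math


class Value:
    """A scalar that remembers how it was computed (hypothetical, illustrative)."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0                  # d(output)/d(this), filled in by backward()
        self._parents = parents          # the Values this one was computed from
        self._local_grads = local_grads  # d(this)/d(parent), one per parent

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        # Local slopes of a + b are 1 with respect to both inputs.
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        # Local slopes of a * b are b with respect to a, and a with respect to b.
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def tanh(self):
        t = math.tanh(self.data)
        # Local slope of tanh(x) is 1 - tanh(x)^2.
        return Value(t, (self,), (1.0 - t * t,))

    def backward(self):
        # Order the graph so each node is processed after everything that uses it.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in zip(node._parents, node._local_grads):
                # Chain rule: local slope times the slope that arrived from above.
                parent.grad += local * node.grad


# One neuron: weighted sum, then bend.
w, x, b = Value(0.5), Value(2.0), Value(-1.0)
y = (w * x + b).tanh()
y.backward()
print(w.grad, x.grad, b.grad)
```

Gradients accumulate with `+=` so that a value used in more than one place receives the sum of the slopes from every path.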
You don't write the rules — you show it examples and it tunes itself until it's right. One page from here, you'll understand backpropagation, the engine under every model on this path.
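As a small illustration of "show it examples and it tunes itself", the sketch below trains a single linear neuron by gradient descent. It is an assumption-laden toy rather than the lesson's code: the function name `train`, the learning rate, the step count, and the example data (points on the line y = 2x + 1) are all made up for this example, and the neuron has no activation to keep the arithmetic short.

```python
def train(examples, lr=0.05, steps=500):
    """Fit y ≈ w * x + b to (x, y) pairs with plain gradient descent (illustrative)."""
    w, b = 0.0, 0.0  # start knowing nothing
    n = len(examples)
    for _ in range(steps):
        # Slopes of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in examples) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in examples) / n
        # Nudge each parameter a little way downhill.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# The rule y = 2x + 1 never appears in train(); only examples of it do.
examples = [(x, 2 * x + 1) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]
w, b = train(examples)
print(round(w, 3), round(b, 3))
```

The slopes here are written out by hand; backpropagation is the procedure that computes them automatically once the model has more than a couple of parameters.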