OpenAlice Academy — Become an LLM engineer, from a single neuron up.

↳ THE PATH

Small → big · leaf → tree

The curriculum, as a climbing ladder.

Four groups, twenty-five lessons. Each one is a self-contained, interactive page that builds toward the next. You climb from the smallest idea to the largest system. The first rung is open; the rest are being authored from the lab's research wiki.

00 Foundations · the universal primitive

The single weight, the chain rule, the loop. The one mechanism that scales — unchanged — all the way up.

✦ Live

00 · 01

Neural Network from Scratch

A neuron, a layer, a loss, backprop — and a real classifier you train in the page.

Open lesson →

Live

00 · 02

Math for ML

Vectors, matrices, gradients — only the parts you actually use.

Live

00 · 03

microGPT

A whole tiny GPT in ~200 lines — same autograd, plus attention.

Live

00 · 04

LLM from Scratch

A 10M-param model trained on a laptop, end to end.
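The Foundations idea of a single weight, the chain rule, and a loop can be sketched in a few lines. This is only an illustrative sketch, not the lesson's actual code: the toy data, the learning rate, and the step count are assumptions chosen so the example converges.

```python
# Toy data generated from y = 2 * x (an assumption for this sketch).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x for x in xs]

w = 0.0               # the single weight
learning_rate = 0.01  # assumed value, small enough to converge here

for step in range(200):
    grad = 0.0
    for x, y in zip(xs, ys):
        pred = w * x
        # loss = (pred - y)^2
        # chain rule: dloss/dw = dloss/dpred * dpred/dw = 2 * (pred - y) * x
        grad += 2.0 * (pred - y) * x
    # Step against the mean gradient.
    w -= learning_rate * grad / len(xs)

print(round(w, 3))  # approaches the true slope, 2.0
```

The same three pieces (a parameter, a local slope, an update loop) are what the later lessons scale up.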

01 Architecture · how a transformer is built

The pieces that turn the primitive into a language model — and the frontier variants pushing past it.

Live

01 · 01

Attention & Transformers

Query, key, value — the operation that changed everything.

Live

01 · 02

Tokenization

Bytes → tokens. BPE, and why the vocabulary matters.

Live

01 · 03

Embeddings

Turning a token into a vector of meaning.

Live

01 · 04

Positional Encoding (RoPE)

Teaching attention where each token sits.

Live

01 · 05

Mixture-of-Experts

Route each token to a few specialist sub-networks.

Live

01 · 06

DeepSeek Architecture

MLA + MoE — an efficient frontier model, dissected.

Live

01 · 07

State-Space Models (Mamba)

Sequence modelling without attention.

Live

01 · 08

FlashAttention

The IO-aware kernel that made long contexts cheap.
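The query, key, value operation named in lesson 01 · 01 can be sketched as single-head scaled dot-product attention. This is a minimal pure-Python sketch with no masking and no learned projections; the function names and the toy inputs are made up for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Single-head scaled dot-product attention, no masking."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # The output is the weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


# Toy inputs (made up): two key/value pairs, one query, dimension 2.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
queries = [[5.0, 0.0]]
result = attention(queries, keys, values)
print(result)
```

Because the query points along the first key, most of the weight lands on the first value vector.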

02 Training & Reasoning · teaching it to be useful

From a raw next-token predictor to an aligned model that reasons — and how to do it on a budget.

Live

02 · 01

RLHF & Alignment

Reward models, PPO, DPO — shaping behaviour from preferences.

Live

02 · 02

LoRA & PEFT

Fine-tune billions of params by training only a few.

Live

02 · 03

Quantization

Run a big model in small precision, without losing it.

Live

02 · 04

Scaling Laws

The math that predicts performance from compute.

Live

02 · 05

Test-time Compute & Reasoning

Think longer at inference — the o1/R1 idea.
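To make the Quantization lesson's one-liner concrete, here is a hedged sketch of symmetric absmax quantization to int8, one common scheme among several. The function names and the weight values are assumptions for this example, not taken from the lesson.

```python
def quantize_int8(weights):
    """Symmetric absmax quantization to the int8 range [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    if scale == 0.0:
        scale = 1.0  # all-zero input: any scale works
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map int8 codes back to approximate float weights."""
    return [qi * scale for qi in q]


weights = [0.12, -0.5, 0.33, 0.9, -0.07]  # made-up values
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, max_error)
```

With this scheme the round-trip error per weight is bounded by half the scale, which is why a single large outlier weight hurts precision for every other weight in the group.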

03 Agents & Systems · the tree, in production

Many models, memory, retrieval, routing — the systems that turn a model into something that acts.

Live

03 · 01

Mixture-of-Agents

Stack models in layers so they refine each other.

Live

03 · 02

LLM Councils & Fusion

Many models deliberate, then merge into one answer.

Live

03 · 03

GraphRAG

Retrieval over a knowledge graph, not flat chunks.

Live

03 · 04

Agent Memory

How an agent remembers across turns and sessions.

Live

03 · 05

MemPalace

A structured, navigable long-term memory.

Live

03 · 06

Model Routing

Send each query to the cheapest model that can answer it.

Live

03 · 07

Graphify

Turn a codebase into a queryable call-graph.

Live

03 · 08

The LLM-maintained Wiki

A living knowledge base an LLM keeps fresh.
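The Model Routing idea, sending each query to the cheapest model that can answer it, can be sketched as follows. Everything here is hypothetical: the `Model` fields, the capability scores, the costs, and the assumption that a query's difficulty is already known (a real router would have to estimate it).

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    cost_per_call: float  # hypothetical cost units
    capability: float     # hypothetical score in [0, 1]


def route(difficulty, models):
    """Return the cheapest model whose capability covers the difficulty."""
    able = [m for m in models if m.capability >= difficulty]
    if not able:
        # Nothing is judged capable: fall back to the strongest model.
        return max(models, key=lambda m: m.capability)
    return min(able, key=lambda m: m.cost_per_call)


models = [
    Model("small", 1.0, 0.4),
    Model("medium", 5.0, 0.7),
    Model("large", 20.0, 0.95),
]
print(route(0.3, models).name)   # small
print(route(0.6, models).name)   # medium
print(route(0.99, models).name)  # large (fallback)
```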

↳ ETHOS

How the Academy teaches

Built, not described.

Most material tells you what a transformer is. We make you build one — the smallest working version of every idea, drawn and interactive, with the real code on the page.

01 — Visual

You see it move.

Every concept is an interactive figure — drag the weights, step the forward pass, watch the loss fall. Intuition before notation.

●━━┓
   ┣━▶ ● ━━▶ ŷ
●━━┛

02 — From scratch

No imported magic.

We build the autograd, the attention, the training loop ourselves — in plain code you can read top to bottom. The library comes after you understand the engine.

grad += local · upstream # the chain rule, literally
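As a hedged illustration of that one line, here is a minimal scalar autograd sketch. The `Value` class and its attribute names are assumptions made for this example, not the Academy's actual code, and only addition and multiplication are supported.

```python
class Value:
    """A scalar that remembers how it was computed (hypothetical name)."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # inputs this value was built from
        self._local_grads = local_grads  # d(self)/d(parent) for each input

    def __add__(self, other):
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other),
                     (other.data, self.data))

    def backward(self):
        # Post-order walk; reversed, every node comes before its inputs.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in zip(node._parents, node._local_grads):
                parent.grad += local * node.grad  # grad += local * upstream


w, x, b = Value(2.0), Value(3.0), Value(1.0)
y = w * x + b
y.backward()
print(w.grad, x.grad, b.grad)  # 3.0 2.0 1.0
```

Using `+=` rather than `=` matters: a value that feeds several downstream nodes receives one contribution from each path.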

03 — No hand-waving

Every step is shown.

No "and then it just works." When something is hard we slow down and derive it, one local slope at a time, until it's obvious.

  ◆
 ┌┴┐
 ◆ ◆
┌┴┐┌┴┐
◆ ◆◆ ◆
self-similar

Become an
LLM engineer.
From one neuron up.

Start with a
single neuron.
