# Notebooks
The `examples/notebooks/` directory holds six interactive Jupyter notebooks for
single-GPU exploration. All use tiny configs (1–5M parameters) sized for
interactive use: each runs end-to-end in well under a minute, except
notebook 5 (optimizer comparison, ~2 min).
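For a sense of scale, a rough GPT-style parameter estimate shows how small these configs are. This is a sketch under assumptions, not the notebooks' actual configs: it assumes a decoder-only transformer with a 4× MLP expansion and tied input/output embeddings, and it ignores biases, norm weights, and position embeddings. The vocabulary size and widths below are illustrative, and `approx_param_count` is a hypothetical helper.

```python
def approx_param_count(vocab_size: int, d_model: int, n_layers: int,
                       tie_embeddings: bool = True) -> int:
    """Rough decoder-only estimate (assumption: GPT-style blocks, 4x MLP).

    Ignores biases, LayerNorm/RMSNorm weights, and position embeddings.
    """
    # Attention: Q, K, V, and output projections, each d_model x d_model.
    attn = 4 * d_model * d_model
    # MLP with 4x expansion: d -> 4d -> d.
    mlp = 8 * d_model * d_model
    embeddings = vocab_size * d_model
    # An untied LM head adds a second vocab_size x d_model matrix.
    head = 0 if tie_embeddings else vocab_size * d_model
    return n_layers * (attn + mlp) + embeddings + head


# Illustrative tiny config (hypothetical numbers, not taken from the notebooks).
print(approx_param_count(vocab_size=8192, d_model=128, n_layers=4))
```

With `d_model=128` and 4 layers, the embedding table (about 1.05M parameters) already outweighs the transformer blocks (about 0.79M), which is typical at this scale.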
Every notebook opens with the same header:

- **Objectives**: what you’ll learn
- **Requirements**: hardware, data, prerequisites
- **Runtime**: approximate wall time for Run All
## Running
From the repo root:
```bash
uv run jupyter lab examples/notebooks/
```
Or execute a single notebook non-interactively:
```bash
uv run jupyter nbconvert --to notebook --execute \
    examples/notebooks/01_inspect_model.ipynb
```
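To execute every notebook in order (for example as a smoke test), the single-notebook command can be looped. A minimal sketch, assuming only the `uv run jupyter nbconvert --to notebook --execute` invocation shown above and that notebook filenames sort in catalogue order (as the `01_` prefix of `01_inspect_model.ipynb` suggests). `nbconvert_commands` and `run_all` are hypothetical helpers, not part of the repo.

```python
import subprocess
from pathlib import Path


def nbconvert_commands(notebook_dir: str) -> list[list[str]]:
    """Build one nbconvert command per notebook, in filename order (01_, 02_, ...)."""
    notebooks = sorted(Path(notebook_dir).glob("*.ipynb"))
    # Same flags as the single-notebook command; nothing else is assumed.
    return [
        ["uv", "run", "jupyter", "nbconvert", "--to", "notebook", "--execute", str(nb)]
        for nb in notebooks
    ]


def run_all(notebook_dir: str = "examples/notebooks") -> int:
    """Execute each notebook in turn; return the first non-zero exit code, else 0."""
    for cmd in nbconvert_commands(notebook_dir):
        print("running:", " ".join(cmd))
        result = subprocess.run(cmd)
        if result.returncode != 0:
            # Stop early so the failing notebook is the last thing printed.
            return result.returncode
    return 0
```

Call `run_all()` from the repo root. Stopping at the first failure keeps a broken notebook from being buried under the output of later ones.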
## Catalogue
| # | Notebook | What it shows |
|---|---|---|
| 1 | `01_inspect_model.ipynb` | Build a model from … |
| 2 | | Capture attention weights per layer and head, plot heatmaps |
| 3 | | Extract intermediate activations via … |
| 4 | | Train a tiny model, save a checkpoint, load it back, generate text |
| 5 | | Train the same model with AdamW / Muon / Lion / Schedule-Free AdamW, plot loss curves |
| 6 | | Build an MoE model, visualize per-layer expert utilization |
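Each attention heatmap in notebook 2 plots one (query × key) weight matrix for a single layer and head. As a self-contained illustration of what that matrix contains, here is a NumPy sketch of causal scaled dot-product attention weights for one layer. It assumes standard multi-head attention with random projections and a causal (decoder-only) mask; it is not the repo's capture API, and `attention_weights` is a hypothetical name.

```python
import numpy as np


def attention_weights(x: np.ndarray, w_q: np.ndarray, w_k: np.ndarray,
                      n_heads: int) -> np.ndarray:
    """Causal softmax(Q K^T / sqrt(d_head)) per head; returns (n_heads, seq, seq)."""
    seq, d_model = x.shape
    d_head = d_model // n_heads
    # Project, then split the model dimension into heads: (n_heads, seq, d_head).
    q = (x @ w_q).reshape(seq, n_heads, d_head).transpose(1, 0, 2)
    k = (x @ w_k).reshape(seq, n_heads, d_head).transpose(1, 0, 2)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    # Causal mask (assumption): query i may only attend to keys j <= i.
    mask = np.triu(np.ones((seq, seq), dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)
    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)


rng = np.random.default_rng(0)
seq, d_model, n_heads = 6, 16, 4
x = rng.standard_normal((seq, d_model))
w_q = rng.standard_normal((d_model, d_model))
w_k = rng.standard_normal((d_model, d_model))
attn = attention_weights(x, w_q, w_k, n_heads)
print(attn.shape)  # one (seq, seq) heatmap per head
```

Each row of `attn[h]` sums to 1, and every entry above the diagonal is exactly 0 because of the causal mask, which is why causal attention heatmaps look lower-triangular.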
## When to open which
- **Debugging a config**: start with notebook 1, which builds the model from your config and prints every layer’s shape.
- **Interpretability setup**: notebooks 2 and 3 cover the attention-capture and activation-extraction APIs you’ll use in a larger probing pipeline.
- **Checkpoint round-trips**: notebook 4 is the minimal reproduction of “train → save → load → generate” that you can adapt for evaluating any checkpoint.
- **Optimizer ablations**: notebook 5 is the reference pattern for a controlled comparison with per-optimizer LR sweeps.
- **MoE diagnostics**: notebook 6 shows how to read `get_expert_counts()` output and spot dead or hot experts.
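The return type of `get_expert_counts()` is not described on this page, so the sketch below assumes it can be converted into one list of per-expert token counts per MoE layer. Under that assumption, “dead” and “hot” can be made concrete: a dead expert received no tokens, and a hot expert received more than `hot_factor` times its uniform share. `flag_experts` and the default `hot_factor=2.0` are illustrative choices, not the notebook’s definitions.

```python
def flag_experts(counts_per_layer, hot_factor=2.0):
    """Flag dead and hot experts from per-layer, per-expert token counts.

    Assumed input shape: counts_per_layer[layer][expert] -> tokens routed.
    dead: experts that received zero tokens.
    hot: experts above hot_factor x the uniform share (total / n_experts).
    """
    report = {}
    for layer, counts in enumerate(counts_per_layer):
        total = sum(counts)
        uniform_share = total / len(counts) if counts else 0.0
        dead = [e for e, c in enumerate(counts) if c == 0]
        hot = [e for e, c in enumerate(counts)
               if total > 0 and c > hot_factor * uniform_share]
        report[layer] = {"dead": dead, "hot": hot}
    return report


# Made-up counts for 2 MoE layers with 4 experts each.
counts = [
    [250, 260, 240, 250],  # roughly balanced routing
    [0, 900, 50, 50],      # expert 0 starved, expert 1 overloaded
]
print(flag_experts(counts))
```

For the second layer the uniform share is 250 tokens, so expert 1 (900 tokens) crosses the 500-token hot threshold and expert 0 is reported as dead.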
## Requirements
- 1 GPU (falls back to CPU where possible, but attention and training are slow on CPU).
- Project dev dependencies installed via `uv sync` from the repo root.

Notebook outputs are stripped on commit (via the `nbstripout` pre-commit
hook) to keep diffs clean, so you’ll always see empty outputs until you
run them.
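Concretely, stripping outputs is an edit to the notebook JSON. A minimal sketch of the idea, assuming nbformat-4 notebooks: for every code cell, empty `outputs` and reset `execution_count`. This is not nbstripout’s implementation (the real tool also handles metadata and other details), and `strip_outputs` is a hypothetical helper.

```python
import json


def strip_outputs(nb: dict) -> dict:
    """Clear outputs and execution counts from code cells of an nbformat-4 dict."""
    for cell in nb.get("cells", []):
        # Markdown and raw cells have no outputs, so only code cells change.
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return nb


nb = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Objectives"]},
        {"cell_type": "code", "metadata": {}, "execution_count": 3,
         "source": ["print(1)"],
         "outputs": [{"output_type": "stream", "name": "stdout", "text": ["1\n"]}]},
    ],
}
print(json.dumps(strip_outputs(nb)["cells"][1]))
```

Because the stripped file keeps the sources but drops everything produced by execution, two commits of the same notebook differ only where the code or prose changed.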
> **Note:** These notebooks are meant to be run, not just read. The Sphinx site does not execute them or embed their outputs; open them in JupyterLab.