Notebooks¶

Six interactive Jupyter notebooks under examples/notebooks/ for single-GPU exploration. All use tiny 1–5M-param configs sized for interactive use — each runs end-to-end in well under a minute, except notebook 5 (optimizer comparison, ~2 min).

Every notebook opens with the same header:

Objectives — what you’ll learn
Requirements — hardware, data, prerequisites
Runtime — approximate wall time for Run All

Running¶

From the repo root:

uv run jupyter lab examples/notebooks/

Or execute a single notebook non-interactively:

uv run jupyter nbconvert --to notebook --execute \
  examples/notebooks/01_inspect_model.ipynb

Catalogue¶

#	Notebook	What it shows
1	`01_inspect_model.ipynb`	Build a model from `ModelConfig`, inspect layer shapes, run a forward pass
2	`02_attention_visualization.ipynb`	Capture attention weights per layer and head, plot heatmaps
3	`03_activation_extraction.ipynb`	Extract intermediate activations via `ActivationStore` and `extract_representations()`, save to `.npz`
4	`04_checkpoint_analysis.ipynb`	Train a tiny model, save a checkpoint, load it back, generate text
5	`05_optimizer_comparison.ipynb`	Train the same model with AdamW / Muon / Lion / Schedule-Free AdamW, plot loss curves
6	`06_moe_routing.ipynb`	Build an MoE model, visualize per-layer expert utilization

When to open which¶

Debugging a config: start with notebook 1 — it builds the model from your config and prints every layer’s shape.
Interpretability setup: notebooks 2 and 3 cover the attention-capture and activation-extraction APIs you’ll use in a larger probing pipeline.
Checkpoint round-trips: notebook 4 is the minimal reproduction of “train → save → load → generate” that you can adapt for evaluating any checkpoint.
Optimizer ablations: notebook 5 is the reference pattern for a controlled comparison with per-optimizer LR sweeps.
MoE diagnostics: notebook 6 shows how to read get_expert_counts() output and spot dead or hot experts.

Requirements¶

1 GPU (falls back to CPU where possible, but attention and training are slow).
Project dev dependencies installed via uv sync from the repo root.

Notebook outputs are stripped on commit (via the nbstripout pre-commit hook) to keep diffs clean, so you’ll always see empty outputs until you run them.

Note

These notebooks are meant to be run, not just read. The Sphinx site does not execute them or embed their outputs — open them in JupyterLab.