Reference
Curated tables and exhaustive lists that don’t fit a narrative but are useful: every config preset, every proven parallelism combination at each GPU count, every env var the framework reads.
Available configs: the full configs/train/*.toml and configs/model/*.toml tables, with “what this config exists to prove” per row.
Parallelism recipes: (model, GPU count, parallelism) combinations that we’ve actually run end-to-end, indexed by model rather than by filename.
Benchmarks: summaries and reproduction commands for benchmarks/mfu_scaling/ (dense 7B/13B/70B MFU scaling) and benchmarks/moe_expert_parallel/ (MoE Expert Parallelism with per-sub-module FSDP wrapping).
Environment variables: every env var the framework reads, grouped by source (torchrun / SLURM / NCCL / logging), with who sets what.
See also
README § Training Configurations — the current configs table at the repo root.
README § Benchmarks — the MFU table at the repo root.