Install
KempnerForge uses uv for dependency management.
There is no pip install kempnerforge — you work from a clone.
Prerequisites
- Python 3.12+ — pyproject.toml pins the target interpreter.
- CUDA 12.8 toolkit — PyTorch wheels are pulled from the CUDA 12.8 index.
- NVIDIA GPU for any training or distributed test (H100, H200, or A100). Unit tests and the inspection notebooks run on CPU.
- uv — install once per machine:
curl -LsSf https://astral.sh/uv/install.sh | sh
Clone and sync
git clone git@github.com:KempnerInstitute/KempnerForge.git
cd KempnerForge
# Creates .venv, installs runtime + dev dependencies, resolves torch from the
# CUDA 12.8 index pinned in pyproject.toml.
uv sync
The .venv is project-local. Every command in the docs runs through
uv run, so the venv is activated automatically — you do not need to
source .venv/bin/activate.
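As a quick sanity check that uv run resolves to the project venv (a minimal example; any import from the environment works):

# Should print the interpreter inside .venv and the installed torch version.
uv run python -c "import sys, torch; print(sys.executable, torch.__version__)"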
Verify
Run the same checks CI runs on every PR:
uv run ruff check kempnerforge/ tests/ scripts/
uv run ruff format --check kempnerforge/ tests/ scripts/
uv run pyright kempnerforge/
uv run pytest tests/unit/ -v --timeout=60
If all four pass, your environment is ready.
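If you prefer a single command, the same four checks can be chained so the run stops at the first failure:

uv run ruff check kempnerforge/ tests/ scripts/ && \
    uv run ruff format --check kempnerforge/ tests/ scripts/ && \
    uv run pyright kempnerforge/ && \
    uv run pytest tests/unit/ -v --timeout=60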
Optional: docs dependencies
If you plan to preview docs locally, add the docs dependency group:
uv sync --group docs
uv run make -C docs live # live-reload server on http://127.0.0.1:8000
See the “Writing Docs” section of CONTRIBUTING.md for the full editor loop.
SLURM clusters (Kempner and similar)
Unit tests run on CPU login nodes. Anything that needs a GPU (integration, distributed, end-to-end tests, and actual training) has to run inside a SLURM allocation.
Interactive allocation for development
# 1 node, 4 GPUs — enough for integration + distributed tests
salloc --partition=<partition-name> --account=<account-name> \
--nodes=1 --gpus-per-node=4 --cpus-per-task=16 --mem=256G --time=2:00:00
Replace <partition-name> and <account-name> with values your PI gave
you. Once the prompt returns, you are on a compute node and can run
uv run torchrun ... directly.
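For example, launching across all four allocated GPUs on the node could look like this (the script path is a placeholder; substitute the actual test or training entry point):

# --standalone: single-node rendezvous, no master address to configure.
# --nproc_per_node=4: one worker process per allocated GPU.
uv run torchrun --standalone --nproc_per_node=4 path/to/your_script.py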
Modules and environment
On Kempner clusters, the provided scripts/slurm/*.sh wrappers detect the
InfiniBand interface and export NCCL_SOCKET_IFNAME / GLOO_SOCKET_IFNAME
before invoking torchrun or srun. They do not module load anything —
uv sync brings in the CUDA-enabled PyTorch wheel, and the CUDA driver is
already present on compute nodes.
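In outline, the interface detection amounts to something like this (a simplified sketch, not the wrappers' exact code; the sysfs path and the empty-result fallback are assumptions):

# Find the first network interface that sits on an InfiniBand device.
IB_IF=$(basename "$(ls -d /sys/class/infiniband/*/device/net/* 2>/dev/null | head -n1)")
if [ -n "$IB_IF" ]; then
    # Point NCCL and Gloo at the InfiniBand interface for rendezvous traffic.
    export NCCL_SOCKET_IFNAME=$IB_IF
    export GLOO_SOCKET_IFNAME=$IB_IF
fi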
If you launch bare torchrun from an interactive shell, the venv from
uv sync is all you need.
Next steps
- Quickstart — run a tiny model end-to-end in under a minute.
- Your First Training Run — understand what that run actually did.
- CONTRIBUTING.md — the contributor workflow (branch naming, PR template, pre-push checklist).