KempnerPulse
A terminal dashboard for NVIDIA DCGM hardware-counter metrics, with SLURM/CUDA GPU-visibility awareness.
KempnerPulse reads DCGM profiling counters — SM Active, Tensor Active, DRAM Active, GR Engine Active, the precision pipes, PCIe/NVLink throughput, power, thermals, and clocks — and renders them live in the terminal. It synthesizes a weighted Real Utilization score and a 12-category workload classification so you can tell idle GPUs from real compute, memory pressure, transfer/copy pressure, and hardware-health issues at a glance.
Where nvidia-smi reads NVML and reports a single high-level GPU-Util
time-fraction (“was a kernel running?”), KempnerPulse reads DCGM and exposes the
composition of active GPU time (“which functional units are busy, and how
hard?”). The two are complementary; KempnerPulse focuses on the fine-grained
hardware-counter view.
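To make the idea of a weighted composite concrete, here is a minimal Python sketch of how several DCGM activity ratios could be folded into a single 0-100 score. The weights, the counter names, and the `weighted_utilization` helper are assumptions made up for illustration; they are not KempnerPulse's actual presets or API. The real formula and thresholds are described on the Workload Classification & Health States page.

```python
# Hypothetical weights, for illustration only. They are NOT the presets
# shipped with KempnerPulse.
ILLUSTRATIVE_WEIGHTS = {
    "sm_active": 0.4,
    "tensor_active": 0.3,
    "dram_active": 0.2,
    "gr_engine_active": 0.1,
}


def weighted_utilization(counters, weights=ILLUSTRATIVE_WEIGHTS):
    """Combine activity ratios in [0, 1] into a single 0-100 score.

    `counters` maps a counter name to its ratio; missing counters count as 0.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    score = 0.0
    for name, weight in weights.items():
        # Clamp each ratio so an out-of-range sample cannot skew the score.
        ratio = min(max(counters.get(name, 0.0), 0.0), 1.0)
        score += weight * ratio

    # Normalize by the total weight so the result always lies in 0-100.
    return 100.0 * score / total_weight


if __name__ == "__main__":
    sample = {"sm_active": 0.9, "tensor_active": 0.0, "dram_active": 0.5}
    print(f"score: {weighted_utilization(sample):.1f}")
```

Under these assumed weights, a GPU whose SMs are busy while its tensor pipes sit idle scores well below one where both are busy. That is the kind of distinction a single GPU-Util time-fraction cannot express.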
New here?
Installation — install with uv or pip, prerequisites (the DCGM host engine for the dcgm backend), and SLURM notes.
Quickstart — launch the live dashboard, take a one-shot snapshot, export CSV, switch backends, and pick a weight preset.
How it works
Architecture — the four-layer pipeline (Read → Translate → Compute → Present) and the cross-cutting tier.
Workload Classification & Health States — the Real Utilization composite and the 12-category workload taxonomy with their thresholds.
Canonical record schema — the internal, vendor-neutral vocabulary that every layer above Read depends on.
Using it
Command-line reference — the full command-line surface (kempnerpulse / kp), every flag, and the interactive key commands.
Backends — dcgm (direct dcgmi dmon), prometheus (dcgm-exporter), and replay (a saved capture, no GPU needed).
Running on a SLURM compute node — launch the dashboard on an allocated compute node from the login node.
CSV Export Reference — the CSV export schema and column reference.
Reference
DCGM Metrics Reference — every DCGM field KempnerPulse consumes, with units and practical peaks.
API reference — auto-generated from the package source.