KempnerPulse¶

A terminal dashboard for NVIDIA DCGM hardware-counter metrics, with SLURM/CUDA GPU-visibility awareness.

KempnerPulse reads DCGM profiling counters — SM Active, Tensor Active, DRAM Active, GR Engine Active, the precision pipes, PCIe/NVLink throughput, power, thermals, and clocks — and renders them live in the terminal. It synthesizes a weighted Real Utilization score and a 12-category workload classification so you can tell idle GPUs from real compute, memory pressure, transfer/copy pressure, and hardware-health issues at a glance.

Where nvidia-smi reads NVML and reports a single high-level GPU-Util time-fraction (“was a kernel running?”), KempnerPulse reads DCGM and exposes the composition of active GPU time (“which functional units are busy, and how hard?”). The two are complementary; KempnerPulse focuses on the fine-grained hardware-counter view.

New here?¶

Installation — install with uv or pip, prerequisites (the DCGM host engine for the dcgm backend), and SLURM notes.
Quickstart — launch the live dashboard, take a one-shot snapshot, export CSV, switch backends, and pick a weight preset.

How it works¶

Architecture — the four-layer pipeline (Read → Translate → Compute → Present) and the cross-cutting tier.
Workload Classification & Health States — the Real Utilization composite and the 12-category workload taxonomy with their thresholds.
Canonical record schema — the canonical record: the internal, vendor-neutral vocabulary every layer above Read depends on.

Using it¶

Command-line reference — the full command-line surface (kempnerpulse / kp), every flag, and the interactive key commands.
Backends — dcgm (direct dcgmi dmon), prometheus (dcgm-exporter), and replay (a saved capture, no GPU needed).
Running on a SLURM compute node — launch the dashboard on an allocated compute node from the login node.
CSV Export Reference — the CSV export schema and column reference.

Reference¶

DCGM Metrics Reference — every DCGM field KempnerPulse consumes, with units and practical peaks.
API reference — API reference, auto-generated from the package source.