Architecture
KempnerPulse is organized as a four-layer data-flow pipeline over a cross-cutting tier. Data flows strictly top to bottom; each layer depends only on the ones above it, and each has a single responsibility.
                   source
                      │
                      ▼
┌─────────────────────────────────────────────┐
│ Read       reader/     → RawRecord          │
│ Translate  translate/  → CanonicalRecord    │
│ Compute    compute/    → ComputedRecord     │
│ Present    present/    → terminal UI / CSV  │
└─────────────────────────────────────────────┘
cross-cutting: config · identification · selection · system_queries · lifecycle
Layer 1: Read (kempnerpulse.reader)
Acquires raw data from one source and emits a stream of opaque RawRecords
keyed by the source’s own field names. This layer:
- never coerces an N/A reading to 0.0; it uses None;
- never interprets field meanings (naming, units, and missing-value policy are Layer 2’s job);
- has no source-vocabulary leakage above it.
Backends implement a common Backend protocol: dcgmi (a persistent
dcgmi dmon subprocess), prometheus (a dcgm-exporter scrape), and replay
(a saved capture). See Backends.
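The contract above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the shape of RawRecord, the `records()` method name, the `parse_reading` helper, and the toy `ReplayBackend` are all assumptions made for the example. The point it shows is that a missing reading stays None rather than becoming 0.0.

```python
from dataclasses import dataclass
from typing import Iterator, Optional, Protocol


@dataclass(frozen=True)
class RawRecord:
    """Opaque sample keyed by the source's own field names (hypothetical shape)."""
    gpu_id: str
    fields: dict[str, Optional[float]]


class Backend(Protocol):
    """Hypothetical protocol: each backend yields RawRecords from one source."""
    def records(self) -> Iterator[RawRecord]: ...


def parse_reading(text: str) -> Optional[float]:
    """Return None for a missing reading instead of coercing it to 0.0."""
    text = text.strip()
    if text in ("", "N/A"):
        return None
    return float(text)


class ReplayBackend:
    """Toy replay backend over in-memory rows of (gpu_id, {field: text})."""
    def __init__(self, rows: list[tuple[str, dict[str, str]]]) -> None:
        self._rows = rows

    def records(self) -> Iterator[RawRecord]:
        for gpu_id, raw in self._rows:
            # Field names pass through untouched; only the text is parsed.
            yield RawRecord(gpu_id, {k: parse_reading(v) for k, v in raw.items()})
```

Because the toy backend satisfies the protocol structurally, anything written against `Backend` would accept it without inheritance, which is one way a new source could stay a drop-in.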
Layer 2: Translate (kempnerpulse.translate)
Maps each RawRecord to a CanonicalRecord — a single, vendor-neutral internal
vocabulary. It owns everything backend-, version-, and unit-specific: source
field names, unit normalization (percentages → fractions in [0, 1],
MB/s → bytes/second, millijoules → joules), and missing-value policy. No layer
downstream of this one ever sees a DCGM field identifier. See Canonical record
schema for the canonical record contract.
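As a rough sketch of the unit normalization described above: the canonical field names, the source keys, and the `translate` signature below are placeholders invented for illustration, and the MB/s conversion assumes decimal megabytes. The real schema lives in the canonical record contract.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class CanonicalRecord:
    """Hypothetical canonical fields; the real schema is defined elsewhere."""
    gpu_util: Optional[float]             # fraction in [0, 1]
    pcie_rx_bytes_per_s: Optional[float]  # bytes/second
    energy_joules: Optional[float]        # joules


def _scale(value: Optional[float], factor: float) -> Optional[float]:
    """Scale a reading, passing missing values (None) through untouched."""
    return None if value is None else value * factor


def translate(raw: Mapping[str, Optional[float]]) -> CanonicalRecord:
    # Source keys below are placeholders, not real DCGM field names.
    return CanonicalRecord(
        gpu_util=_scale(raw.get("util_percent"), 1 / 100),
        # Assumes decimal megabytes (1 MB = 10**6 bytes).
        pcie_rx_bytes_per_s=_scale(raw.get("pcie_rx_mb_per_s"), 1e6),
        energy_joules=_scale(raw.get("energy_mj"), 1 / 1000),
    )
```

Keeping every source key inside this one function is what lets a renamed source field be absorbed here without touching later layers.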
Layer 3: Compute (kempnerpulse.compute)
Pure-functional domain logic over canonical records, producing a
ComputedRecord: the weighted Real Utilization score, the 12-category
workload classification, and health. No I/O, no source vocabulary, no
UI. This is the layer that’s fully testable without a GPU. See
Workload Classification & Health States for the composite formula and the taxonomy.
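To illustrate why a pure-functional layer is testable without hardware, here is a minimal sketch of a weighted score over canonical fractions. The signal names, the weights, and the rule of returning None when an input is missing are all assumptions made for this example; the project's actual composite formula is the one documented in Workload Classification & Health States.

```python
from typing import Mapping, Optional

# Illustrative weights only; not the project's actual composite formula.
EXAMPLE_WEIGHTS: Mapping[str, float] = {
    "sm_active": 0.5,
    "dram_active": 0.3,
    "tensor_active": 0.2,
}


def weighted_score(
    signals: Mapping[str, Optional[float]],
    weights: Mapping[str, float] = EXAMPLE_WEIGHTS,
) -> Optional[float]:
    """Weighted mean of canonical fractions in [0, 1]; None if any input is missing."""
    total = 0.0
    for name, weight in weights.items():
        value = signals.get(name)
        if value is None:
            # Assumed policy: a missing input makes the score undefined.
            return None
        total += weight * value
    return total / sum(weights.values())
```

The function takes plain mappings and returns a plain value, so a unit test needs only literal inputs: no GPU, subprocess, or network.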
Layer 4: Present (kempnerpulse.present)
Consumes ComputedRecords and renders them: the Rich terminal UI (fleet, focus,
plot, and job views) and the CSV writer. It converts canonical fractions/SI to
display units (percent, GB/s, …) at render time and never reaches back into
source vocabulary.
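The render-time conversion might look something like the sketch below. The function names and the "N/A" placeholder for missing values are hypothetical, and GB/s is assumed to mean decimal gigabytes per second.

```python
from typing import Optional


def format_percent(fraction: Optional[float]) -> str:
    """Render a canonical fraction in [0, 1] as a percentage, or N/A if missing."""
    return "N/A" if fraction is None else f"{fraction * 100:.1f}%"


def format_bandwidth(bytes_per_s: Optional[float]) -> str:
    """Render bytes/second as decimal GB/s (assumes 1 GB = 10**9 bytes)."""
    return "N/A" if bytes_per_s is None else f"{bytes_per_s / 1e9:.2f} GB/s"
```

Converting only at the edge means stored and computed values stay in one unit system, and a display change (say, GiB/s instead of GB/s) touches a formatter rather than the pipeline.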
Cross-cutting tier
- config: parses the command line into an immutable Config.
- identification: resolves device identity and capabilities at startup (GPU model/UUID, power and bandwidth limits, GPU-id resolution, SLURM metadata) via nvidia-smi/dcgmi discovery.
- selection: resolves which GPUs to show, honoring --gpus, --show-all, and the CUDA_VISIBLE_DEVICES/SLURM_*_GPUS environment variables.
- system_queries: per-sample host stats (CPU, RAM, GPU processes).
- lifecycle: the run loop (live TUI, one-shot, CSV export), a threaded tick reader, and centralized signal handling / teardown.
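As one illustration of the selection step, the sketch below resolves GPU indices from an explicit request, a show-all flag, and CUDA_VISIBLE_DEVICES. The function name, its signature, and especially the precedence order are assumptions for the example, not documented behavior. SLURM variables are left out, and only integer indices in CUDA_VISIBLE_DEVICES are handled.

```python
from typing import Mapping, Optional, Sequence


def select_gpus(
    all_gpus: Sequence[int],
    requested: Optional[Sequence[int]],
    show_all: bool,
    env: Mapping[str, str],
) -> list[int]:
    """Pick GPU indices to display. The precedence order here is an assumption."""
    if requested:
        # An explicit request (as from --gpus) wins; unknown indices are dropped.
        return [g for g in requested if g in all_gpus]
    if show_all:
        # Show-all (as from --show-all) ignores the environment.
        return list(all_gpus)
    visible = env.get("CUDA_VISIBLE_DEVICES", "").strip()
    if visible:
        # Only plain integer indices are handled; UUID entries are skipped.
        wanted = [int(tok) for tok in visible.split(",") if tok.strip().isdigit()]
        return [g for g in wanted if g in all_gpus]
    return list(all_gpus)
```

Passing the environment in as a mapping, rather than reading `os.environ` inside the function, keeps the sketch deterministic under test.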
Why the strict boundaries
A change to a DCGM field name stops at Layer 2; a change to the classification cascade stops at Layer 3; a vendor or backend addition is a drop-in at Layer 1. Layer 3 stays testable without hardware, and Layer 4 stays stable across backend or driver upgrades.