Canonical record schema¶
CanonicalRecord (in kempnerpulse.translate.schema) is KempnerPulse’s
internal, vendor-neutral vocabulary for one GPU reading at one instant. The
Translate layer converts each backend’s raw output (DCGM field IDs, Prometheus
metric names, …) into this single shape; everything downstream — the Real
Utilization composite, the workload classification, the terminal UI, and the
CSV export — reads canonical fields and never sees a vendor identifier.
Current SCHEMA_VERSION: 1.
Naming convention¶
Every field is snake_case and follows <scope>_<subsystem>_<aspect>_<unit>:
- scope: record_ (record metadata), entity_ (GPU / MIG identity), gpu_ (per-GPU hardware reading).
- ratios end in _fraction and lie in [0.0, 1.0], never _pct. A consumer that wants a 0–100 percentage multiplies by 100 itself.
- throughputs carry _bytes_per_second; cumulative counters use the bare unit (_joules); event counts use _count.
- units are spelled out: _celsius, _megahertz, _mebibytes, _watts, _microseconds.
None always means the source did not provide this reading. It is never
silently coerced to 0; a real zero stays 0.
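The _fraction convention and the None rule can be illustrated from the consumer side. The following is a minimal sketch, not KempnerPulse code: the helper name fraction_to_percent_text and the "n/a" placeholder are assumptions made for illustration.

```python
from typing import Optional


def fraction_to_percent_text(value: Optional[float], missing: str = "n/a") -> str:
    """Render a canonical *_fraction reading as a 0-100 percentage string.

    Hypothetical consumer-side helper (not part of KempnerPulse).
    None means the source did not provide the reading, so it is kept
    distinct from a real 0.0 instead of being coerced to zero.
    """
    if value is None:
        return missing
    return f"{value * 100:.1f}%"


print(fraction_to_percent_text(0.0))   # a real zero stays a zero
print(fraction_to_percent_text(0.5))   # the consumer does the x100 itself
print(fraction_to_percent_text(None))  # absent reading, not zero
```

Keeping the multiplication in the consumer means the stored value never has to be checked for which scale it uses.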
Enums¶
    from enum import Enum


    class AggregationMode(Enum):
        POINT = "point"    # one ~100 ms snapshot (e.g. dcgmi at --poll 0.1)
        WINDOW = "window"  # time-average over record_window_microseconds (e.g. prometheus)


    class Provenance(Enum):
        DCGMI = "dcgmi"
        PROMETHEUS = "prometheus"
        NVML_FALLBACK = "nvml_fallback"  # a counter was substituted from NVML
        REPLAY = "replay"
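Because each member carries a plain string value, the enums can be round-tripped through text. The sketch below redefines the two enums locally so that it runs on its own. The idea that serialized records store the .value string is an assumption, and "rocm_smi" is just an arbitrary string that is not a member.

```python
from enum import Enum


class AggregationMode(Enum):
    POINT = "point"
    WINDOW = "window"


class Provenance(Enum):
    DCGMI = "dcgmi"
    PROMETHEUS = "prometheus"
    NVML_FALLBACK = "nvml_fallback"
    REPLAY = "replay"


# Lookup by value turns a serialized string back into the enum member.
mode = AggregationMode("window")
source = Provenance("nvml_fallback")

# .value yields the plain string again, e.g. for writing a text cell.
print(mode, mode.value)
print(source, source.value)

# A string that is not a member raises ValueError instead of passing silently.
try:
    Provenance("rocm_smi")
except ValueError as exc:
    print("rejected:", exc)
```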
Record metadata (required)¶
| Field | Type | Unit / range |
|---|---|---|
| record_schema_version | | ≥ 1 |
| | | seconds since reader start |
| | | unix seconds |
| | | |
| record_window_microseconds | | integration window (µs) |
| record_freshness_microseconds | | staleness at delivery (µs) |
| | | source of the record |
| | | always populated |
Entity identity¶
| Field | Type | Notes |
|---|---|---|
| | | logical index as the reader sees it (required) |
| | | hardware-stable identifier (required) |
| | | |
| | | reserved (per-process attribution) |
| | | reserved |
Cluster / Slurm / MPI metadata (optional)¶
record_slurm_job_id, record_slurm_step_id, record_slurm_array_job_id,
record_slurm_array_task_id, record_slurm_restart_count,
record_node_index_in_job, record_mpi_rank, and
record_capture_clock_offset_microseconds are all Optional. Their absence is
tolerated so that local / non-Slurm runs still produce valid records.
Reserved user-annotation metadata (optional)¶
record_user_annotation_iteration_index, _phase_label, _step_count,
_request_id, and _token_count (each sharing the record_user_annotation
prefix) are defined for forward compatibility and are emitted as None in v0.5.0.
GPU hardware readings (all Optional)¶
| Field | Unit / range |
|---|---|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | bytes/s (differenced) |
| | bytes/s (differenced) |
| | count (cumulative) |
| | bytes/s (differenced) |
| | watts |
| gpu_board_total_energy_joules | joules (cumulative) |
| | watts |
| | watts |
| | °C |
| | °C |
| | MHz |
| | MHz |
| | MiB |
| | MiB |
| | MiB |
| | MiB (derived: used + free + reserved) |
| | |
| | count (cumulative) |
| | count |
| | count |
| | bool |
Per-link NVLink rates are defined as a reserved naming pattern but are not collected in the default schema.
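The "bytes/s (differenced)" readings in the table are rates obtained from cumulative totals. As a rough illustration, the sketch below turns two successive totals into a rate. The helper differenced_rate is hypothetical, and returning None when the counter goes backwards or no time has elapsed is an assumption, since the real differencer's handling of those cases is not described here.

```python
from typing import Optional


def differenced_rate(
    prev_total: Optional[float],
    curr_total: Optional[float],
    elapsed_seconds: float,
) -> Optional[float]:
    """Convert two cumulative byte totals into a bytes-per-second rate.

    Hypothetical sketch. Returns None (reading not available) when either
    sample is missing, when no time has elapsed, or when the counter moved
    backwards; that choice is an assumption, not documented behaviour.
    """
    if prev_total is None or curr_total is None:
        return None
    if elapsed_seconds <= 0:
        return None
    delta = curr_total - prev_total
    if delta < 0:
        return None
    return delta / elapsed_seconds


print(differenced_rate(1_000, 3_000, 2.0))  # 2000 bytes over 2 s
print(differenced_rate(None, 3_000, 2.0))   # first sample has no predecessor
print(differenced_rate(3_000, 1_000, 2.0))  # counter went backwards
```

Returning None rather than 0 keeps the rule that a missing reading is never reported as a real zero.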
Invariants¶
CanonicalRecord.validate() raises TranslateError if any single-record
invariant fails:
- record_schema_version >= 1.
- Every _fraction field, if not None, lies in [0.0, 1.0].
- Every count and physical magnitude (throughput, power, energy, clock, framebuffer), if not None, is >= 0.
- record_window_microseconds and record_freshness_microseconds are >= 0.
- POINT records have record_window_microseconds <= 200_000; WINDOW records have record_window_microseconds > 200_000.
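The single-record invariants can be sketched as a standalone check over a plain mapping. This is not the real validate(): check_record, RecordInvariantError (a stand-in for TranslateError), the suffix-based field matching, passing the aggregation mode as a separate string, and the field name gpu_example_fraction are all assumptions made for illustration.

```python
from typing import Any, Mapping

POINT_MAX_WINDOW_US = 200_000
# Suffixes treated as counts or physical magnitudes in this sketch.
NON_NEGATIVE_SUFFIXES = (
    "_count", "_bytes_per_second", "_watts",
    "_joules", "_megahertz", "_mebibytes",
)


class RecordInvariantError(ValueError):
    """Stand-in for TranslateError in this sketch."""


def check_record(fields: Mapping[str, Any], mode: str) -> None:
    """Raise RecordInvariantError if a single-record invariant fails."""
    if fields["record_schema_version"] < 1:
        raise RecordInvariantError("record_schema_version must be >= 1")
    for name in ("record_window_microseconds", "record_freshness_microseconds"):
        if fields[name] < 0:
            raise RecordInvariantError(f"{name} must be >= 0")
    window = fields["record_window_microseconds"]
    if mode == "point" and window > POINT_MAX_WINDOW_US:
        raise RecordInvariantError("POINT window exceeds 200_000 us")
    if mode == "window" and window <= POINT_MAX_WINDOW_US:
        raise RecordInvariantError("WINDOW window must exceed 200_000 us")
    for name, value in fields.items():
        if value is None:
            continue  # None means "not provided" and is always acceptable
        if name.endswith("_fraction") and not 0.0 <= value <= 1.0:
            raise RecordInvariantError(f"{name} outside [0.0, 1.0]")
        if name.endswith(NON_NEGATIVE_SUFFIXES) and value < 0:
            raise RecordInvariantError(f"{name} must be >= 0")


sample = {
    "record_schema_version": 1,
    "record_window_microseconds": 100_000,
    "record_freshness_microseconds": 0,
    "gpu_example_fraction": 0.42,           # hypothetical field name
    "gpu_board_total_energy_joules": None,  # absent reading is allowed
}
check_record(sample, mode="point")  # passes silently
```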
One invariant is cross-record and therefore enforced by the Translate
differencer (which sees the sequence), not by validate():
gpu_board_total_energy_joules is monotonically non-decreasing per entity.
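A cross-record check of this kind needs state that spans records, which is why it cannot live in a per-record method. The sketch below keeps the last energy total per entity. The class EnergyMonotonicityChecker and the entity keys are hypothetical, and reporting a violation as a boolean is an assumption, since how the Translate differencer reacts to one is not stated here.

```python
from typing import Dict, Hashable, Optional


class EnergyMonotonicityChecker:
    """Track gpu_board_total_energy_joules per entity across records.

    Hypothetical sketch; it only reports whether each new reading keeps
    the cumulative counter non-decreasing for its entity.
    """

    def __init__(self) -> None:
        self._last: Dict[Hashable, float] = {}

    def observe(self, entity_key: Hashable, energy_joules: Optional[float]) -> bool:
        """Return True if the reading is consistent with earlier ones."""
        if energy_joules is None:
            return True  # absent reading: nothing to compare against
        previous = self._last.get(entity_key)
        if previous is not None and energy_joules < previous:
            return False  # counter decreased; the stored total is left unchanged
        self._last[entity_key] = energy_joules
        return True


checker = EnergyMonotonicityChecker()
print(checker.observe("gpu-a", 10.0))  # first reading for this entity
print(checker.observe("gpu-a", 10.0))  # equal is allowed (non-decreasing)
print(checker.observe("gpu-a", 9.0))   # decrease is flagged
print(checker.observe("gpu-b", 1.0))   # other entities are tracked separately
```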
Versioning¶
Adding a field is a minor SCHEMA_VERSION bump (N → N+1); readers on an
older version tolerate unknown extra fields. Removing or renaming a field is a
major bump. The classification labels and the Real Utilization composite are
Compute-layer outputs, not part of CanonicalRecord.
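Tolerating unknown extra fields can be sketched as a loader that keeps only the names it knows. MiniRecord is a cut-down stand-in for CanonicalRecord, load_tolerant is a hypothetical helper, and gpu_future_reading_count is an invented name standing in for a field added by a newer schema version.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping, Optional


@dataclass
class MiniRecord:
    """Cut-down stand-in for CanonicalRecord (three fields only)."""
    record_schema_version: int
    record_window_microseconds: int
    gpu_board_total_energy_joules: Optional[float] = None


def load_tolerant(raw: Mapping[str, Any]) -> MiniRecord:
    """Build a MiniRecord, silently dropping keys this reader does not know."""
    known = {f.name for f in fields(MiniRecord)}
    return MiniRecord(**{k: v for k, v in raw.items() if k in known})


newer = {
    "record_schema_version": 2,
    "record_window_microseconds": 100_000,
    "gpu_board_total_energy_joules": 12.5,
    "gpu_future_reading_count": 3,  # invented field from a newer schema
}
print(load_tolerant(newer))
```

A removed or renamed required field would instead fail at construction time, which matches the distinction between the two kinds of bump.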