kempnerpulse.translate.schema
The canonical schema — the inter-layer contract.
CanonicalRecord is the single internal vocabulary that Layers 3 (Compute)
and 4 (Present) depend on. Layer 2 (Translate) is the only layer that knows
about source vocabularies (DCGM field IDs, Prometheus names), units, and
backend quirks; it emits CanonicalRecord objects and nothing above it ever sees a
vendor identifier again.
Field-naming convention (every field follows <scope>_<subsystem>_<aspect>_<unit>):
- record_*: record-level metadata; entity_*: GPU / MIG identity; gpu_*: per-GPU hardware readings.
- Ratios are ..._fraction in [0.0, 1.0], never _pct. A presenter that wants 0–100 multiplies by 100 itself.
- Throughputs carry _bytes_per_second; cumulative counters use the bare unit (_joules); event counts use _count.
- None means "the source did not provide this reading"; it is never coerced to 0.
The names are long and explicit on purpose: a reader of
gpu_streaming_multiprocessor_active_cycle_fraction needs no glossary.
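The fraction and None rules above can be illustrated from the presenter's side. This is a minimal sketch, not code from the package: `fraction_to_percent` and `format_percent` are hypothetical helper names, and the "n/a" placeholder is an assumption about how a presenter might render a missing reading.

```python
from typing import Optional


def fraction_to_percent(fraction: Optional[float]) -> Optional[float]:
    """Scale a canonical [0.0, 1.0] fraction to 0-100, keeping None as None."""
    if fraction is None:
        # The source did not provide this reading; do not coerce it to 0.
        return None
    return fraction * 100.0


def format_percent(fraction: Optional[float], missing: str = "n/a") -> str:
    """Render a fraction for display; a missing reading stays visibly missing."""
    percent = fraction_to_percent(fraction)
    return missing if percent is None else f"{percent:.1f}%"
```

Keeping the multiplication in the presenter means a true reading of zero and an absent reading never collapse into the same displayed value.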
Functions
Every |
Classes
AggregationMode | How a record's metric values are integrated over time.
CanonicalRecord | One fully-translated reading for one entity, in canonical vocabulary.
Provenance | Where a record came from.
Exceptions
TranslateError | A canonical record violated a schema invariant (see validate).
- exception kempnerpulse.translate.schema.TranslateError
Bases: ValueError
A canonical record violated a schema invariant (see validate).
- class kempnerpulse.translate.schema.AggregationMode
Bases: Enum
How a record's metric values are integrated over time.
- POINT = 'point'
- WINDOW = 'window'
- class kempnerpulse.translate.schema.Provenance
Bases: Enum
Where a record came from.
- DCGMI = 'dcgmi'
- PROMETHEUS = 'prometheus'
- NVML_FALLBACK = 'nvml_fallback'
- REPLAY = 'replay'
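Because both enums carry plain string values, a stored value can be mapped back to a member by value lookup. The sketch below redeclares the two enums as local stand-ins so that it runs without the package; `parse_provenance` is a hypothetical helper and not part of the documented API.

```python
from enum import Enum


class AggregationMode(Enum):
    """Local stand-in mirroring the documented members."""
    POINT = "point"
    WINDOW = "window"


class Provenance(Enum):
    """Local stand-in mirroring the documented members."""
    DCGMI = "dcgmi"
    PROMETHEUS = "prometheus"
    NVML_FALLBACK = "nvml_fallback"
    REPLAY = "replay"


def parse_provenance(raw: str) -> Provenance:
    """Map a stored string back to a member; unknown values raise ValueError."""
    return Provenance(raw.strip().lower())
```

Lookup by value (`Provenance("replay")`) rejects unknown strings with a `ValueError` instead of passing them through silently.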
- class kempnerpulse.translate.schema.CanonicalRecord
Bases: object
One fully-translated reading for one entity, in canonical vocabulary.
All Optional[float] metric fields are None unless the source provided them. The required block (no defaults) is the metadata every record must carry; the optional block holds the per-subsystem readings plus cluster and reserved metadata, all of which tolerate absence.
- record_aggregation_mode: AggregationMode
- record_provenance: Provenance
- validate()
Raise TranslateError if any single-record invariant is violated.
Single-record invariants only. The cross-record invariant (energy is monotonically non-decreasing per entity) is enforced upstream by the Translate differencer, which is the only component that sees the sequence; it cannot be checked from one record in isolation.
- Return type:
None
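The cross-record invariant described above needs state that spans records, which is why a single-record validate() cannot see it. The following is a rough sketch of such a check, not the Translate differencer itself: `check_energy_monotonic`, the `EntityKey` tuple, and the local `TranslateError` stand-in are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Optional, Tuple


class TranslateError(ValueError):
    """Local stand-in for kempnerpulse.translate.schema.TranslateError."""


# Assumed entity key: (entity_gpu_uuid, entity_mig_instance_index).
EntityKey = Tuple[str, Optional[int]]


def check_energy_monotonic(
    readings: Iterable[Tuple[EntityKey, Optional[float]]],
) -> None:
    """Raise if gpu_board_total_energy_joules ever decreases for an entity."""
    last_seen: Dict[EntityKey, float] = {}
    for entity, joules in readings:
        if joules is None:
            # Reading not provided by the source: nothing to compare.
            continue
        previous = last_seen.get(entity)
        if previous is not None and joules < previous:
            raise TranslateError(
                f"energy decreased for {entity}: {previous} -> {joules}"
            )
        last_seen[entity] = joules
```

Equal consecutive values pass, since the invariant is non-decreasing rather than strictly increasing, and each entity is tracked independently.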
- __init__(record_schema_version, record_timestamp_monotonic_seconds, record_timestamp_wallclock_unix_seconds, record_aggregation_mode, record_window_microseconds, record_freshness_microseconds, record_provenance, record_hostname, entity_gpu_index, entity_gpu_uuid, entity_mig_instance_index=None, entity_process_id=None, entity_process_command_line_truncated=None, record_slurm_job_id=None, record_slurm_step_id=None, record_slurm_array_job_id=None, record_slurm_array_task_id=None, record_slurm_restart_count=None, record_node_index_in_job=None, record_mpi_rank=None, record_capture_clock_offset_microseconds=None, record_user_annotation_iteration_index=None, record_user_annotation_phase_label=None, record_user_annotation_step_count=None, record_user_annotation_request_id=None, record_user_annotation_token_count=None, gpu_streaming_multiprocessor_active_cycle_fraction=None, gpu_streaming_multiprocessor_warp_occupancy_fraction=None, gpu_tensor_core_pipe_active_cycle_fraction=None, gpu_tensor_core_half_precision_mma_active_cycle_fraction=None, gpu_tensor_core_integer_mma_active_cycle_fraction=None, gpu_tensor_core_double_precision_fma_active_cycle_fraction=None, gpu_tensor_core_double_mma_active_cycle_fraction=None, gpu_tensor_core_quarter_mma_active_cycle_fraction=None, gpu_cuda_core_floating_point_64bit_pipe_active_cycle_fraction=None, gpu_cuda_core_floating_point_32bit_pipe_active_cycle_fraction=None, gpu_cuda_core_floating_point_16bit_pipe_active_cycle_fraction=None, gpu_graphics_compute_engine_active_cycle_fraction=None, gpu_dram_controller_active_cycle_fraction=None, gpu_memory_copy_engine_busy_time_fraction=None, gpu_pcie_transmit_throughput_bytes_per_second=None, gpu_pcie_receive_throughput_bytes_per_second=None, gpu_pcie_replay_count=None, gpu_nvlink_aggregate_throughput_bytes_per_second=None, gpu_board_power_draw_watts=None, gpu_board_total_energy_joules=None, gpu_board_enforced_power_limit_watts=None, gpu_board_default_power_limit_watts=None, 
gpu_die_temperature_celsius=None, gpu_memory_die_temperature_celsius=None, gpu_streaming_multiprocessor_clock_frequency_megahertz=None, gpu_memory_clock_frequency_megahertz=None, gpu_framebuffer_used_mebibytes=None, gpu_framebuffer_free_mebibytes=None, gpu_framebuffer_reserved_mebibytes=None, gpu_framebuffer_total_mebibytes=None, gpu_nvml_busy_time_fraction=None, gpu_xid_error_count=None, gpu_uncorrectable_remapped_row_count=None, gpu_correctable_remapped_row_count=None, gpu_row_remap_failure_flag=None)¶
- Parameters:
record_schema_version (int)
record_timestamp_monotonic_seconds (float)
record_timestamp_wallclock_unix_seconds (float)
record_aggregation_mode (AggregationMode)
record_window_microseconds (int)
record_freshness_microseconds (int)
record_provenance (Provenance)
record_hostname (str)
entity_gpu_index (int)
entity_gpu_uuid (str)
entity_mig_instance_index (int | None)
entity_process_id (int | None)
entity_process_command_line_truncated (str | None)
record_slurm_job_id (str | None)
record_slurm_step_id (str | None)
record_slurm_array_job_id (str | None)
record_slurm_array_task_id (str | None)
record_slurm_restart_count (int | None)
record_node_index_in_job (int | None)
record_mpi_rank (int | None)
record_capture_clock_offset_microseconds (int | None)
record_user_annotation_iteration_index (int | None)
record_user_annotation_phase_label (str | None)
record_user_annotation_step_count (int | None)
record_user_annotation_request_id (str | None)
record_user_annotation_token_count (int | None)
gpu_streaming_multiprocessor_active_cycle_fraction (float | None)
gpu_streaming_multiprocessor_warp_occupancy_fraction (float | None)
gpu_tensor_core_pipe_active_cycle_fraction (float | None)
gpu_tensor_core_half_precision_mma_active_cycle_fraction (float | None)
gpu_tensor_core_integer_mma_active_cycle_fraction (float | None)
gpu_tensor_core_double_precision_fma_active_cycle_fraction (float | None)
gpu_tensor_core_double_mma_active_cycle_fraction (float | None)
gpu_tensor_core_quarter_mma_active_cycle_fraction (float | None)
gpu_cuda_core_floating_point_64bit_pipe_active_cycle_fraction (float | None)
gpu_cuda_core_floating_point_32bit_pipe_active_cycle_fraction (float | None)
gpu_cuda_core_floating_point_16bit_pipe_active_cycle_fraction (float | None)
gpu_graphics_compute_engine_active_cycle_fraction (float | None)
gpu_dram_controller_active_cycle_fraction (float | None)
gpu_memory_copy_engine_busy_time_fraction (float | None)
gpu_pcie_transmit_throughput_bytes_per_second (float | None)
gpu_pcie_receive_throughput_bytes_per_second (float | None)
gpu_pcie_replay_count (int | None)
gpu_nvlink_aggregate_throughput_bytes_per_second (float | None)
gpu_board_power_draw_watts (float | None)
gpu_board_total_energy_joules (float | None)
gpu_board_enforced_power_limit_watts (float | None)
gpu_board_default_power_limit_watts (float | None)
gpu_die_temperature_celsius (float | None)
gpu_memory_die_temperature_celsius (float | None)
gpu_streaming_multiprocessor_clock_frequency_megahertz (float | None)
gpu_memory_clock_frequency_megahertz (float | None)
gpu_framebuffer_used_mebibytes (float | None)
gpu_framebuffer_free_mebibytes (float | None)
gpu_framebuffer_reserved_mebibytes (float | None)
gpu_framebuffer_total_mebibytes (float | None)
gpu_nvml_busy_time_fraction (float | None)
gpu_xid_error_count (int | None)
gpu_uncorrectable_remapped_row_count (int | None)
gpu_correctable_remapped_row_count (int | None)
gpu_row_remap_failure_flag (bool | None)
- Return type:
None
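The split between a required block (no defaults) and an optional block (defaulting to None) can be sketched with a heavily trimmed stand-in. `MiniRecord` is hypothetical and keeps only five of the documented field names; its validate() checks only the [0.0, 1.0] range implied by the naming convention, which is an assumption, since this page does not list the actual invariants.

```python
from dataclasses import dataclass, fields
from typing import Optional


class TranslateError(ValueError):
    """Local stand-in for kempnerpulse.translate.schema.TranslateError."""


@dataclass(frozen=True)
class MiniRecord:
    # Required block: no defaults, every record must carry these.
    record_hostname: str
    entity_gpu_index: int
    entity_gpu_uuid: str
    # Optional block: None means the source did not provide the reading.
    gpu_dram_controller_active_cycle_fraction: Optional[float] = None
    gpu_board_power_draw_watts: Optional[float] = None

    def validate(self) -> None:
        """Assumed single-record invariant: *_fraction lies in [0.0, 1.0]."""
        for field in fields(self):
            if not field.name.endswith("_fraction"):
                continue
            value = getattr(self, field.name)
            if value is None:
                continue  # Absent readings are allowed.
            if not 0.0 <= value <= 1.0:
                raise TranslateError(
                    f"{field.name}={value!r} is outside [0.0, 1.0]"
                )
```

Omitting a required field fails at construction with a `TypeError`, while an out-of-range fraction is only caught when validate() is called.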