CSV Export Reference

KempnerPulse can export GPU metrics as CSV for offline analysis or terminal monitoring. Rows are emitted for every GPU in the visibility set (CUDA_VISIBLE_DEVICES / SLURM_JOB_GPUS / --gpus / --show-all), regardless of whether a compute process is currently running. This lets you start the recorder before a job launches so the trace covers job startup.

Usage

# Default columns — pipe to file or watch on terminal
kempnerpulse --export > metrics.csv

# All 34 columns
kempnerpulse --export all > metrics.csv

# Custom column selection
kempnerpulse --export timestamp,gpu_id,real_util_pct,tensor_active_pct > metrics.csv

# Single snapshot
kempnerpulse --export --once

# Combine with other flags
kempnerpulse --export all --poll 5 --gpus 0,1 > metrics.csv

# High-resolution sampling via the dcgm backend (down to 100ms)
kempnerpulse --backend dcgm --export all --poll 0.1 > metrics.csv

Sampling Rate (--poll)

--poll semantics depend on the backend:

Backend: dcgm (recommended for export)
Effective range: 0.1s and up
Notes: Drives a persistent dcgmi dmon stream at the requested interval. Values below 100ms are clamped to 100ms with a notice: DCGM’s profiling counters (DCGM_FI_PROF_*, such as SM Active, Tensor Active, and DRAM Active) refresh at ~10Hz via the shared hardware-counter multiplexer, so smaller intervals would only produce blank profiling rows. One CSV row-set is emitted per dcgmi tick, with no subprocess spawned per cycle and no skew.

Backend: prometheus (default)
Effective range: >= 1.0s
Notes: dcgm-exporter collects profiling fields at ~30s intervals, so sub-second --poll values would only produce duplicate rows with no new data. Sub-second values are therefore rejected with a warning.

For high-resolution profiling traces (e.g., capturing tensor activity at 100ms resolution to plot offline), use --backend dcgm --poll 0.1. Only the profiling columns are bounded by the 10Hz internal refresh. Device columns (clocks, temperatures, power, framebuffer) are sampled on every tick and could update faster if the floor were lowered, but the floor stays at 100ms because Real Util and the workload classification depend on the profiling counters.
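The --poll rules above can be summarized in a few lines of Python. This is only an illustrative sketch of the documented behavior, not KempnerPulse's actual code: the function name effective_poll and the use of ValueError to stand in for "rejected with a warning" are assumptions made for the example.

```python
DCGM_FLOOR_S = 0.1        # documented floor for the dcgm backend (100ms)
PROMETHEUS_MIN_S = 1.0    # documented minimum for the prometheus backend


def effective_poll(backend: str, poll_s: float) -> float:
    """Return the interval a --poll value would result in (hypothetical helper)."""
    if backend == "dcgm":
        # Values below 100ms are clamped to the floor.
        return max(poll_s, DCGM_FLOOR_S)
    if backend == "prometheus":
        # Sub-second values are rejected; modeled here as an exception.
        if poll_s < PROMETHEUS_MIN_S:
            raise ValueError("prometheus backend requires --poll >= 1.0")
        return poll_s
    raise ValueError(f"unknown backend: {backend!r}")


print(effective_poll("dcgm", 0.05))       # clamped to the 100ms floor
print(effective_poll("prometheus", 5.0))  # accepted unchanged
```

A helper like this can be handy in a job script that picks the polling interval programmatically and should fail early rather than record a trace at an unexpected rate.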

Default Columns

When using --export without arguments, the following 9 columns are exported:

timestamp, gpu_id, model, gpu_util_pct, mem_used_mib, real_util_pct, sm_active_pct, tensor_active_pct, dram_active_pct

All Available Columns

Use --export all to include every column, or --export col1,col2,... to pick a custom set.

Column                Description
timestamp             Unix epoch seconds
gpu_id                GPU index
model                 GPU model (e.g. H100, A100)
real_util_pct         Weighted Real Utilization %
status                Workload classification
health                Health state (OK/WARN/HOT/CRIT)
sm_active_pct         SM Active %
tensor_active_pct     Tensor pipe active %
dram_active_pct       DRAM active %
gr_engine_active_pct  GR Engine active %
gpu_util_pct          GPU Utilization % (nvidia-smi)
mem_used_mib          Framebuffer used (MiB)
mem_total_mib         Framebuffer total (MiB)
mem_used_pct          Framebuffer used %
power_w               Power draw (W)
gpu_temp_c            GPU temperature (°C)
mem_temp_c            Memory temperature (°C)
sm_occupancy_pct      SM Occupancy %
fp16_pipe_pct         FP16 pipe active %
fp32_pipe_pct         FP32 pipe active %
fp64_pipe_pct         FP64 pipe active %
memcpy_util_pct       Memory copy utilization %
pcie_rx_bytes_s       PCIe receive (bytes/s)
pcie_tx_bytes_s       PCIe transmit (bytes/s)
nvlink_gbps           NVLink throughput (GB/s)
sm_clock_mhz          SM clock (MHz)
mem_clock_mhz         Memory clock (MHz)
pcie_replay_rate_s    PCIe replay rate (/s)
energy_j              Cumulative energy (J)
tc_hmma_pct           TC FP16/BF16 HMMA %
tc_imma_pct           TC INT8 IMMA %
tc_dfma_pct           TC FP64 DFMA %
tc_dmma_pct           TC TF32/FP32 DMMA %
tc_qmma_pct           TC FP8 QMMA %
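When scripting a long recording, it can help to check a custom column list against the names above before launching, so that a typo does not surface hours later. The sketch below is a user-side helper, not part of KempnerPulse: resolve_columns and its error handling are hypothetical, and how the tool itself reacts to an unknown column name is not covered here.

```python
# Column names copied from this reference.
ALL_COLUMNS = [
    "timestamp", "gpu_id", "model", "real_util_pct", "status", "health",
    "sm_active_pct", "tensor_active_pct", "dram_active_pct",
    "gr_engine_active_pct", "gpu_util_pct", "mem_used_mib", "mem_total_mib",
    "mem_used_pct", "power_w", "gpu_temp_c", "mem_temp_c", "sm_occupancy_pct",
    "fp16_pipe_pct", "fp32_pipe_pct", "fp64_pipe_pct", "memcpy_util_pct",
    "pcie_rx_bytes_s", "pcie_tx_bytes_s", "nvlink_gbps", "sm_clock_mhz",
    "mem_clock_mhz", "pcie_replay_rate_s", "energy_j", "tc_hmma_pct",
    "tc_imma_pct", "tc_dfma_pct", "tc_dmma_pct", "tc_qmma_pct",
]
DEFAULT_COLUMNS = [
    "timestamp", "gpu_id", "model", "gpu_util_pct", "mem_used_mib",
    "real_util_pct", "sm_active_pct", "tensor_active_pct", "dram_active_pct",
]


def resolve_columns(spec=None):
    """Map an --export argument to a column list (hypothetical helper)."""
    if spec is None:          # plain --export
        return list(DEFAULT_COLUMNS)
    if spec == "all":         # --export all
        return list(ALL_COLUMNS)
    names = [name.strip() for name in spec.split(",") if name.strip()]
    unknown = [name for name in names if name not in ALL_COLUMNS]
    if unknown:
        raise ValueError(f"unknown column(s): {', '.join(unknown)}")
    return names


columns = resolve_columns("timestamp,gpu_id,real_util_pct,tensor_active_pct")
command = ["kempnerpulse", "--export", ",".join(columns)]
print(" ".join(command))
```

The resulting command list can be passed to subprocess.run or printed into a job script.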

Notes

  • Timestamp: Unix epoch seconds with centisecond precision (e.g. 1743782400.12). Convert with pd.to_datetime(df.timestamp, unit='s').

  • GPU filtering: Rows are emitted for every GPU in the visibility set (CUDA_VISIBLE_DEVICES / SLURM_JOB_GPUS / --gpus / --show-all), whether or not a compute process is currently running on it.

  • Rate fields: pcie_replay_rate_s requires two samples to compute a rate, so it will be empty on the first row.

  • Missing values: Exported as empty strings in the CSV.

  • Pipe-friendly: Output is flushed after each poll interval, and BrokenPipeError is handled gracefully (e.g. kempnerpulse --export | head -20).
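The timestamp and missing-value notes above matter most when loading a trace. As a minimal sketch using only the Python standard library, the snippet below turns empty strings into None and epoch seconds into timezone-aware datetimes. The sample rows are made up for illustration, parse_row is a hypothetical helper, and treating every column other than gpu_id, model, status, and health as numeric is an assumption based on the column descriptions.

```python
import csv
import io
from datetime import datetime, timezone

# Made-up sample: pcie_replay_rate_s is empty on the first row,
# as described in the rate-fields note.
SAMPLE = """timestamp,gpu_id,model,real_util_pct,pcie_replay_rate_s
1743782400.12,0,H100,38.0,
1743782401.12,0,H100,41.5,0.0
"""

# Columns kept as text (assumption: everything else is numeric).
TEXT_COLUMNS = {"gpu_id", "model", "status", "health"}


def parse_row(row):
    """Convert one csv.DictReader row into typed Python values."""
    parsed = {}
    for name, value in row.items():
        if value == "":
            parsed[name] = None  # missing values are exported as empty strings
        elif name == "timestamp":
            parsed[name] = datetime.fromtimestamp(float(value), tz=timezone.utc)
        elif name in TEXT_COLUMNS:
            parsed[name] = value
        else:
            parsed[name] = float(value)
    return parsed


rows = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
for r in rows:
    print(r["timestamp"].isoformat(), r["real_util_pct"], r["pcie_replay_rate_s"])
```

For a real trace, replace io.StringIO(SAMPLE) with an open file handle on metrics.csv. With pandas, pd.read_csv already reads empty fields as NaN, so only the timestamp conversion shown in the first note is needed.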