CSV Export Reference
KempnerPulse can export GPU metrics as CSV for offline analysis or terminal
monitoring. Rows are emitted for every GPU in the visibility set
(CUDA_VISIBLE_DEVICES / SLURM_JOB_GPUS / --gpus / --show-all),
regardless of whether a compute process is currently running. This lets you
start the recorder before a job launches so the trace covers job startup.
Usage
# Default columns — pipe to file or watch on terminal
kempnerpulse --export > metrics.csv
# All 34 columns
kempnerpulse --export all > metrics.csv
# Custom column selection
kempnerpulse --export timestamp,gpu_id,real_util_pct,tensor_active_pct > metrics.csv
# Single snapshot
kempnerpulse --export --once
# Combine with other flags
kempnerpulse --export all --poll 5 --gpus 0,1 > metrics.csv
# High-resolution sampling via the dcgm backend (down to 100ms)
kempnerpulse --backend dcgm --export all --poll 0.1 > metrics.csv
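When the recorder is started from a job script rather than typed by hand, it can help to assemble the command line in one place. The sketch below is only an illustration: the helper names `build_export_command` and `start_recorder` are hypothetical, and it uses only the flags shown in the examples above (`--backend`, `--export`, `--once`, `--poll`, `--gpus`).

```python
import subprocess


def build_export_command(columns=None, poll=None, gpus=None, backend=None, once=False):
    """Assemble a kempnerpulse argv list from the flags shown in the usage examples."""
    cmd = ["kempnerpulse"]
    if backend is not None:
        cmd += ["--backend", backend]
    cmd.append("--export")
    if columns:  # None or empty means "use the default columns"
        cmd.append(columns if isinstance(columns, str) else ",".join(columns))
    if once:
        cmd.append("--once")
    if poll is not None:
        cmd += ["--poll", str(poll)]
    if gpus is not None:
        cmd += ["--gpus", ",".join(str(g) for g in gpus)]
    return cmd


def start_recorder(path, **kwargs):
    """Start the exporter in the background and redirect its CSV output to `path`.

    The caller is responsible for terminating the process and closing the file.
    """
    out = open(path, "w")
    proc = subprocess.Popen(build_export_command(**kwargs), stdout=out)
    return proc, out


print(build_export_command(columns="all", poll=5, gpus=[0, 1]))
```

Calling `start_recorder("metrics.csv", columns="all", poll=5)` before launching a job would mirror the shell redirection used above; whether that fits a given workflow depends on how the job is scheduled.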
Sampling Rate (--poll)
--poll semantics depend on the backend:
| Backend | Effective range | Notes |
|---|---|---|
| | | Drives a persistent |
| | | dcgm-exporter scrapes profiling fields at ~30s, so sub-second |
For high-resolution profiling traces (e.g., capturing tensor activity at
100ms resolution to plot offline), use --backend dcgm --poll 0.1. Note
that only the profiling columns are bounded by the 10Hz internal
refresh; device columns (clocks, temps, power, framebuffer) are sampled
every tick and would update faster if the floor were lowered — but we
keep the floor at 100ms because Real Util and the workload
classification depend on the profiling counters.
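A 100ms trace is often denser than a plot needs. As a rough sketch of one way to reduce it offline, the function below averages a profiling column into one-second buckets per GPU. It assumes the CSV has a header row with the column names used in this reference; the function name and the sample values are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def downsample_per_second(csv_text, column="tensor_active_pct"):
    """Average `column` into one-second buckets per GPU, skipping empty cells."""
    sums = defaultdict(lambda: [0.0, 0])  # (gpu_id, second) -> [total, count]
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = row[column]
        if value == "":  # missing values are exported as empty strings
            continue
        key = (row["gpu_id"], int(float(row["timestamp"])))
        sums[key][0] += float(value)
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sorted(sums.items())}


# Hypothetical 100 ms samples for one GPU; the numbers are invented.
SAMPLE = """timestamp,gpu_id,tensor_active_pct
1743782400.00,0,10.0
1743782400.10,0,30.0
1743782400.20,0,
1743782401.00,0,50.0
"""

print(downsample_per_second(SAMPLE))
```

Averaging is only one choice; taking the per-second maximum instead would preserve short bursts of tensor activity that a mean smooths out.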
Default Columns
When using --export without arguments, the following 9 columns are exported:
timestamp, gpu_id, model, gpu_util_pct, mem_used_mib, real_util_pct, sm_active_pct, tensor_active_pct, dram_active_pct
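A downstream script may want to confirm that a file really contains the default column set before processing it. The following is a minimal sketch that assumes the header row lists the columns in the order given above; `read_default_export` is a hypothetical helper and the sample rows are invented.

```python
import csv
import io

DEFAULT_COLUMNS = [
    "timestamp", "gpu_id", "model", "gpu_util_pct", "mem_used_mib",
    "real_util_pct", "sm_active_pct", "tensor_active_pct", "dram_active_pct",
]


def read_default_export(csv_text):
    """Parse a default-column export, rejecting files with a different header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != DEFAULT_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    return list(reader)


# Invented sample data: one busy GPU and one idle GPU at the same timestamp.
SAMPLE = """timestamp,gpu_id,model,gpu_util_pct,mem_used_mib,real_util_pct,sm_active_pct,tensor_active_pct,dram_active_pct
1743782400.12,0,H100,98,40960,41.5,62.0,35.0,20.0
1743782400.12,1,H100,0,0,0.0,0.0,0.0,0.0
"""

rows = read_default_export(SAMPLE)
print(len(rows), rows[0]["model"])
```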
All Available Columns
Use --export all to include every column, or --export col1,col2,... to
pick a custom set.
| Column | Description |
|---|---|
| timestamp | Unix epoch seconds |
| gpu_id | GPU index |
| model | GPU model (e.g. H100, A100) |
| real_util_pct | Weighted Real Utilization % |
| | Workload classification |
| | Health state (OK/WARN/HOT/CRIT) |
| sm_active_pct | SM Active % |
| tensor_active_pct | Tensor pipe active % |
| dram_active_pct | DRAM active % |
| | GR Engine active % |
| gpu_util_pct | GPU Utilization % (nvidia-smi) |
| mem_used_mib | Framebuffer used (MiB) |
| | Framebuffer total (MiB) |
| | Framebuffer used % |
| | Power draw (W) |
| | GPU temperature (°C) |
| | Memory temperature (°C) |
| | SM Occupancy % |
| | FP16 pipe active % |
| | FP32 pipe active % |
| | FP64 pipe active % |
| | Memory copy utilization % |
| | PCIe receive (bytes/s) |
| | PCIe transmit (bytes/s) |
| | NVLink throughput (GB/s) |
| | SM clock (MHz) |
| | Memory clock (MHz) |
| pcie_replay_rate_s | PCIe replay rate (/s) |
| | Cumulative energy (J) |
| | TC FP16/BF16 HMMA % |
| | TC INT8 IMMA % |
| | TC FP64 DFMA % |
| | TC TF32/FP32 DMMA % |
| | TC FP8 QMMA % |
Notes
Timestamp: Unix epoch seconds with centisecond precision (e.g.
1743782400.12). Convert with pd.to_datetime(df.timestamp, unit='s').
GPU filtering: Only GPUs where the current user has at least one running compute process are included. If no processes are found, only the header is output and a diagnostic message is printed to stderr.
Rate fields: pcie_replay_rate_s requires two samples to compute a rate, so it will be empty on the first row.
Missing values: Exported as empty strings in the CSV.
Pipe-friendly: Output is flushed after each poll interval. Handles
BrokenPipeError gracefully (e.g. kempnerpulse --export | head -20).
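The timestamp and missing-value notes above can be handled without pandas as well. The sketch below uses only the standard library and assumes a header row with the column names from this reference; `parse_row` is a hypothetical helper and the sample rows are invented. It turns empty cells into `None` and the epoch timestamp into a UTC `datetime`, leaving other values as strings.

```python
import csv
import io
from datetime import datetime, timezone


def parse_row(row):
    """Convert one exported row: timestamp -> UTC datetime, '' -> None."""
    parsed = {}
    for name, value in row.items():
        if value == "":
            parsed[name] = None  # missing values arrive as empty strings
        elif name == "timestamp":
            parsed[name] = datetime.fromtimestamp(float(value), tz=timezone.utc)
        else:
            parsed[name] = value  # left as text; cast per column as needed
    return parsed


# Invented sample: the rate column is empty on the first row, as described above.
SAMPLE = """timestamp,gpu_id,pcie_replay_rate_s
1743782400.12,0,
1743782405.12,0,0.0
"""

rows = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
print(rows[0]["timestamp"].isoformat(), rows[0]["pcie_replay_rate_s"])
```

Keeping `None` distinct from `0.0` matters for rate fields: an empty first row means "not yet computable", not "zero replays".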