kempnerpulse.system_queries

Cross-cutting tier — per-sample host/process queries (best-effort).

These queries run on every sampling tick to enrich the display with host CPU and RAM load and the list of GPU compute processes. Unlike the reader layer, which raises typed errors so the lifecycle can surface remediation, everything here is best-effort: a missing command, permission error, timeout, or non-zero exit degrades to an empty or None result and is never raised. A monitoring tool must keep rendering GPU metrics even when host introspection is unavailable.
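The best-effort contract described above can be illustrated with a small wrapper around subprocess.run. This is a minimal sketch and not the module's actual code: the helper name run_best_effort and its default timeout are assumptions made for illustration.

```python
import subprocess
from typing import List, Optional


def run_best_effort(cmd: List[str], timeout: float = 5.0) -> Optional[str]:
    """Return the command's stdout, or None on any failure (never raises).

    Hypothetical helper: the name and the 5 s default are illustrative only.
    """
    try:
        result = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            timeout=timeout,
            check=False,
        )
    except (OSError, subprocess.SubprocessError):
        # OSError covers a missing binary (FileNotFoundError) and
        # PermissionError; SubprocessError covers TimeoutExpired.
        return None
    if result.returncode != 0:
        # A non-zero exit is treated the same as "no data".
        return None
    return result.stdout
```

A caller checks for None and skips the corresponding part of the display, so a failed host query never interrupts rendering.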

Runtime dependencies are the standard library only.

Functions

query_gpu_processes(bus_id_to_index)

List GPU compute processes via nvidia-smi, keyed by GPU index (best-effort).

query_system_ram()

Return (used_gb, total_gb) from /proc/meminfo (best-effort).

Classes

CpuSampler

Stateful sampler for host CPU load from /proc/stat.

GpuProcess

A single compute process running on a GPU.

class kempnerpulse.system_queries.GpuProcess

Bases: object

A single compute process running on a GPU.

pid: int
user: str
gid: str
gpu_id: str
gpu_mem_mib: float | None
command: str
__init__(pid, user, gid, gpu_id, gpu_mem_mib, command)
Parameters:

pid (int)
user (str)
gid (str)
gpu_id (str)
gpu_mem_mib (float | None)
command (str)

Return type:

None
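To make the record shape concrete, the sketch below mirrors the documented fields in a stand-in class. It assumes GpuProcess behaves like a plain dataclass, which the generated __init__ signature suggests but this page does not confirm. GpuProcessSketch and format_row are hypothetical names that are not part of kempnerpulse.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GpuProcessSketch:
    """Hypothetical stand-in mirroring the documented GpuProcess fields."""

    pid: int
    user: str
    gid: str
    gpu_id: str
    gpu_mem_mib: Optional[float]  # None when the memory figure is unavailable
    command: str


def format_row(proc: GpuProcessSketch) -> str:
    """Render one process as a display row (illustrative formatting only)."""
    mem = "n/a" if proc.gpu_mem_mib is None else f"{proc.gpu_mem_mib:.0f} MiB"
    return f"GPU {proc.gpu_id}  pid {proc.pid}  {proc.user}  {mem}  {proc.command}"
```

Any consumer of gpu_mem_mib has to handle None, as format_row does here.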

class kempnerpulse.system_queries.CpuSampler

Bases: object

Stateful sampler for host CPU load from /proc/stat.

Each sample() reads /proc/stat and diffs against the previous snapshot to compute utilization, so the first call (no prior snapshot) returns None for the percentage and busy-core count. The logical-CPU count and the (Slurm-aware) physical core count are cached on the instance; no module- or function-level state is used.

__init__()
Return type:

None

sample()

Return (num_threads, num_cores, cpu_percent, busy_cores).

num_threads is os.cpu_count() (logical CPUs); num_cores is the nproc --all total; cpu_percent is overall utilization over the interval since the previous call; busy_cores counts cores above the busy threshold. cpu_percent and busy_cores are None on the first call and whenever /proc/stat is unreadable.

Return type:

Tuple[int | None, int | None, float | None, int | None]
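The snapshot-and-diff approach can be sketched as follows. This is an illustration under stated assumptions, not the sampler's real code: the function names are hypothetical, idle time is taken as idle + iowait, and the busy threshold is a required argument because this page does not state its value.

```python
from typing import Dict, Optional, Tuple

Snapshot = Dict[str, Tuple[int, int]]  # row name -> (busy_jiffies, total_jiffies)


def parse_proc_stat(text: str) -> Snapshot:
    """Parse the 'cpu' and 'cpuN' rows of /proc/stat content."""
    snapshot: Snapshot = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts or not parts[0].startswith("cpu"):
            continue
        try:
            # user nice system idle iowait irq softirq steal; the guest
            # columns are already counted in user/nice, so they are skipped.
            values = [int(v) for v in parts[1:9]]
        except ValueError:
            continue
        if len(values) < 4:
            continue
        total = sum(values)
        idle = values[3] + (values[4] if len(values) > 4 else 0)
        snapshot[parts[0]] = (total - idle, total)
    return snapshot


def utilization(prev: Snapshot, curr: Snapshot, name: str = "cpu") -> Optional[float]:
    """Percent busy between two snapshots, or None if it cannot be computed."""
    if name not in prev or name not in curr:
        return None  # e.g. first call: there is no previous snapshot yet
    d_busy = curr[name][0] - prev[name][0]
    d_total = curr[name][1] - prev[name][1]
    if d_total <= 0:
        return None
    return 100.0 * d_busy / d_total


def count_busy(prev: Snapshot, curr: Snapshot, threshold: float) -> int:
    """Count per-CPU rows whose utilization exceeds `threshold` percent."""
    busy = 0
    for name in curr:
        if name == "cpu":  # skip the aggregate row
            continue
        pct = utilization(prev, curr, name)
        if pct is not None and pct > threshold:
            busy += 1
    return busy
```

Passing an empty dict as the previous snapshot reproduces the first-call behaviour: utilization returns None because there is nothing to diff against.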

kempnerpulse.system_queries.query_system_ram()

Return (used_gb, total_gb) from /proc/meminfo (best-effort).

“Used” is MemTotal - MemAvailable (falling back to MemFree when MemAvailable is absent). Returns (None, None) if /proc/meminfo cannot be read or parsed.

Return type:

Tuple[float | None, float | None]
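The "used" calculation can be sketched on raw /proc/meminfo text as shown below. The function name parse_meminfo is hypothetical, and the conversion assumes the reported figures are in GiB (kB values divided by 1024 * 1024); this page does not say which unit convention the real function uses.

```python
from typing import Dict, Optional, Tuple


def parse_meminfo(text: str) -> Tuple[Optional[float], Optional[float]]:
    """Return (used, total) computed from /proc/meminfo content, or (None, None)."""
    fields: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])  # value in kB

    total = fields.get("MemTotal")
    # Prefer MemAvailable; fall back to MemFree when it is absent.
    available = fields.get("MemAvailable", fields.get("MemFree"))
    if total is None or available is None:
        return (None, None)

    kb_per_unit = 1024 * 1024  # assumption: report in GiB
    return ((total - available) / kb_per_unit, total / kb_per_unit)


def read_system_ram(path: str = "/proc/meminfo") -> Tuple[Optional[float], Optional[float]]:
    """Best-effort file read around parse_meminfo (never raises on I/O errors)."""
    try:
        with open(path, "r", encoding="utf-8") as handle:
            return parse_meminfo(handle.read())
    except OSError:
        return (None, None)
```

Separating parsing from file access keeps the arithmetic testable on systems that have no /proc filesystem.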

kempnerpulse.system_queries.query_gpu_processes(bus_id_to_index)

List GPU compute processes via nvidia-smi, keyed by GPU index (best-effort).

Uses --query-compute-apps (instant, no sampling delay) and requires a bus_id_to_index mapping (uppercased PCI bus id -> GPU index) to attribute each process to a GPU. For each PID, the owning user/group are resolved from /proc/<pid> ownership and the full command line from /proc/<pid>/cmdline. Returns {gpu_index: [GpuProcess, ...]}; an empty dict if the mapping is empty or nvidia-smi is unavailable.

Parameters:

bus_id_to_index (Dict[str, str])

Return type:

Dict[str, List[GpuProcess]]
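The attribution step can be sketched by parsing nvidia-smi CSV output and grouping rows by GPU index. This is a hedged illustration: the helper names are hypothetical, the exact query fields used by the real function are not shown on this page (gpu_bus_id, pid, and used_memory are assumed here), and the result holds plain tuples rather than GpuProcess objects.

```python
from typing import Dict, List, Optional, Tuple

# Assumed invocation; all three fields are listed by
# `nvidia-smi --help-query-compute-apps`.
NVIDIA_SMI_CMD = [
    "nvidia-smi",
    "--query-compute-apps=gpu_bus_id,pid,used_memory",
    "--format=csv,noheader,nounits",
]


def group_by_gpu(
    csv_text: str, bus_id_to_index: Dict[str, str]
) -> Dict[str, List[Tuple[int, Optional[float]]]]:
    """Group (pid, mem_mib) rows by GPU index; rows on unknown buses are dropped."""
    grouped: Dict[str, List[Tuple[int, Optional[float]]]] = {}
    if not bus_id_to_index:
        return grouped
    for line in csv_text.splitlines():
        cols = [c.strip() for c in line.split(",")]
        if len(cols) != 3:
            continue
        bus_id, pid_text, mem_text = cols
        gpu_index = bus_id_to_index.get(bus_id.upper())  # keys are uppercased
        if gpu_index is None or not pid_text.isdigit():
            continue
        try:
            mem: Optional[float] = float(mem_text)
        except ValueError:
            mem = None  # e.g. "[N/A]"
        grouped.setdefault(gpu_index, []).append((int(pid_text), mem))
    return grouped


def read_cmdline(pid: int) -> str:
    """Best-effort full command line from /proc/<pid>/cmdline ('' on failure)."""
    try:
        with open(f"/proc/{pid}/cmdline", "rb") as handle:
            raw = handle.read()
    except OSError:
        return ""
    # Arguments are NUL-separated; join them with spaces for display.
    return " ".join(p.decode("utf-8", "replace") for p in raw.split(b"\0") if p)
```

Owner lookup would follow the same pattern: stat /proc/<pid>, read the uid and gid, and fall back to a placeholder when the process has already exited.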