kempnerpulse.system_queries

Cross-cutting tier — per-sample host/process queries (best-effort).

These run on every sampling tick to enrich the display with host CPU/RAM load and the GPU compute processes. Unlike the reader layer (which raises typed errors so the lifecycle can surface remediation), everything here is best-effort: any missing command, permission error, timeout, or non-zero exit degrades to an empty or None result and is never raised. A monitoring tool must keep rendering GPU metrics even when host introspection is unavailable.
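The best-effort contract described above can be illustrated with a small sketch. The helper name `run_best_effort` and its timeout default are hypothetical and are not part of this module; the sketch only shows one way a command failure might be collapsed into `None` using the standard library.

```python
import subprocess
from typing import List, Optional


def run_best_effort(argv: List[str], timeout_s: float = 2.0) -> Optional[str]:
    """Return the command's stdout, or None on any failure (never raises).

    Hypothetical helper for illustration; not the module's actual code.
    """
    try:
        result = subprocess.run(
            argv,
            capture_output=True,
            text=True,
            timeout=timeout_s,
            check=False,  # inspect the exit code ourselves instead of raising
        )
    except (OSError, subprocess.SubprocessError):
        # OSError covers a missing binary or a permission error;
        # SubprocessError covers TimeoutExpired.
        return None
    if result.returncode != 0:
        # A non-zero exit also degrades to "no data".
        return None
    return result.stdout
```

A caller would then treat `None` as "host information unavailable" and continue rendering whatever else it has.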

Runtime dependencies are the standard library only.

Functions

`query_gpu_processes`(bus_id_to_index)	List GPU compute processes via nvidia-smi, keyed by GPU index (best-effort).
`query_system_ram`()	Return `(used_gb, total_gb)` from `/proc/meminfo` (best-effort).

Classes

`CpuSampler`	Stateful sampler for host CPU load from `/proc/stat`.
`GpuProcess`	A single compute process running on a GPU.

class kempnerpulse.system_queries.GpuProcess

Bases: object

A single compute process running on a GPU.

pid: int

user: str

gid: str

gpu_id: str

gpu_mem_mib: float | None

command: str

__init__(pid, user, gid, gpu_id, gpu_mem_mib, command)

Parameters:

pid (int)
user (str)
gid (str)
gpu_id (str)
gpu_mem_mib (float | None)
command (str)

Return type:

None

class kempnerpulse.system_queries.CpuSampler

Bases: object

Stateful sampler for host CPU load from /proc/stat.

Each sample() reads /proc/stat and diffs against the previous snapshot to compute utilization, so the first call (no prior snapshot) returns None for the percentage and busy-core count. The logical-CPU count and the (Slurm-aware) physical core count are cached on the instance; no module- or function-level state is used.

__init__()

Return type: None

sample()

Return (num_threads, num_cores, cpu_percent, busy_cores).

num_threads is os.cpu_count() (logical CPUs); num_cores is the nproc --all total; cpu_percent is overall utilization over the interval since the previous call; busy_cores counts cores above the busy threshold. cpu_percent and busy_cores are None on the first call and whenever /proc/stat is unreadable.

Return type: Tuple[int | None, int | None, float | None, int | None]
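The snapshot-diff idea behind sample() can be sketched as follows. This is a minimal illustration and not the class's implementation: the function names are hypothetical, counting iowait as idle time is an assumption, and the per-core busy_cores count is left out because the busy threshold is not stated here.

```python
from typing import Optional, Tuple

Snapshot = Tuple[int, int]  # (idle_ticks, total_ticks)


def parse_cpu_line(stat_text: str) -> Optional[Snapshot]:
    """Extract (idle, total) ticks from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            try:
                ticks = [int(value) for value in fields[1:]]
            except ValueError:
                return None
            if len(ticks) < 4:
                return None
            # Field order: user nice system idle iowait irq softirq steal ...
            # Assumption: iowait is counted as idle time.
            idle = ticks[3] + (ticks[4] if len(ticks) > 4 else 0)
            # guest and guest_nice are already included in user and nice,
            # so only the first eight fields are summed.
            total = sum(ticks[:8])
            return idle, total
    return None


def cpu_percent(prev: Optional[Snapshot], curr: Optional[Snapshot]) -> Optional[float]:
    """Utilization between two snapshots; None when either one is missing."""
    if prev is None or curr is None:
        return None  # first call, or /proc/stat was unreadable
    delta_total = curr[1] - prev[1]
    delta_idle = curr[0] - prev[0]
    if delta_total <= 0:
        return None
    return 100.0 * (delta_total - delta_idle) / delta_total
```

With no previous snapshot the percentage is `None`, which matches the first-call behavior described above.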

kempnerpulse.system_queries.query_system_ram()

Return (used_gb, total_gb) from /proc/meminfo (best-effort).

“Used” is MemTotal - MemAvailable (falling back to MemFree when MemAvailable is absent). Returns (None, None) if /proc/meminfo cannot be read or parsed.

Return type: Tuple[float | None, float | None]
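The "used" calculation can be sketched on the text of /proc/meminfo. The function name below is hypothetical, and the unit conversion (kB values divided by 1024 * 1024) is an assumption; the actual function may scale the values differently.

```python
from typing import Dict, Optional, Tuple

KIB_PER_GIB = 1024 * 1024  # assumption: binary scaling of the kB values


def used_and_total_gb(meminfo_text: str) -> Tuple[Optional[float], Optional[float]]:
    """Compute (used, total) from /proc/meminfo text; (None, None) on failure."""
    values: Dict[str, int] = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])  # reported in kB
    total = values.get("MemTotal")
    # Prefer MemAvailable; fall back to MemFree when it is absent.
    available = values.get("MemAvailable", values.get("MemFree"))
    if total is None or available is None:
        return None, None
    return (total - available) / KIB_PER_GIB, total / KIB_PER_GIB
```

Passing the contents of `/proc/meminfo` (read inside a `try`/`except OSError`) would give the same best-effort shape as the documented return value.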

kempnerpulse.system_queries.query_gpu_processes(bus_id_to_index)

List GPU compute processes via nvidia-smi, keyed by GPU index (best-effort).

Uses --query-compute-apps (instant, no sampling delay) and requires a bus_id_to_index mapping (uppercased PCI bus id -> GPU index) to attribute each process to a GPU. For each PID, the owning user/group are resolved from /proc/<pid> ownership and the full command line from /proc/<pid>/cmdline. Returns {gpu_index: [GpuProcess, ...]}; an empty dict if the mapping is empty or nvidia-smi is unavailable.

Parameters: bus_id_to_index (Dict[str, str])
Return type: Dict[str, List[GpuProcess]]
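The attribution step (bus id to GPU index) can be sketched on already-captured CSV text. This is an illustration under stated assumptions: the function name is hypothetical, the three-column layout (bus id, PID, used memory in MiB) is assumed to come from something like `nvidia-smi --query-compute-apps=gpu_bus_id,pid,used_memory --format=csv,noheader,nounits`, and the /proc lookups for user, group, and command line are omitted.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

ProcRow = Tuple[int, Optional[float]]  # (pid, gpu_mem_mib)


def group_by_gpu(csv_text: str, bus_id_to_index: Dict[str, str]) -> Dict[str, List[ProcRow]]:
    """Group 'bus_id, pid, mem' rows by GPU index, skipping unusable rows."""
    grouped: Dict[str, List[ProcRow]] = defaultdict(list)
    for line in csv_text.splitlines():
        parts = [part.strip() for part in line.split(",")]
        if len(parts) != 3:
            continue  # malformed row: skip it instead of raising
        bus_id, pid_text, mem_text = parts
        # The mapping is keyed by uppercased PCI bus id.
        gpu_index = bus_id_to_index.get(bus_id.upper())
        if gpu_index is None or not pid_text.isdigit():
            continue
        try:
            mem_mib: Optional[float] = float(mem_text)
        except ValueError:
            mem_mib = None  # e.g. "[N/A]" when memory usage is not reported
        grouped[gpu_index].append((int(pid_text), mem_mib))
    return dict(grouped)
```

An empty mapping or empty command output yields an empty dict, mirroring the documented fallback.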