kempnerforge.profiling.cuda_timer
CUDA event-based timing utilities.
Provides lightweight, GPU-accurate timers for profiling specific regions of the training loop (forward pass, backward pass, communication, etc.). Uses CUDA events to avoid CPU synchronization overhead during measurement.
Usage:

```python
timer = CUDATimer()
timer.start()
# ... GPU work ...
timer.stop()
elapsed_ms = timer.elapsed_ms()

# Or use the multi-region tracker:
timers = CUDATimerCollection(regions=["forward", "backward", "comm"])
timers.start("forward")
# ... forward pass ...
timers.stop("forward")
report = timers.elapsed_all()  # {"forward": 12.3, "backward": 0.0, ...}
```
Classes

| Class | Description |
| --- | --- |
| `CUDATimer` | CUDA event-based timer for accurate GPU timing. |
| `CUDATimerCollection` | Collection of named CUDA timers for profiling multiple regions. |
- class kempnerforge.profiling.cuda_timer.CUDATimer

  Bases: `object`

  CUDA event-based timer for accurate GPU timing.

  Uses CUDA events to measure elapsed time without CPU synchronization overhead (synchronizes only when reading the result).
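The timer's contract can be illustrated with a CPU-only analogue. This is a sketch, not the library's implementation; `SimpleTimer` and its internals are hypothetical. A real CUDA version would record a `torch.cuda.Event(enable_timing=True)` pair in `start()`/`stop()` and synchronize only inside `elapsed_ms()`:

```python
import time


class SimpleTimer:
    """CPU analogue of the CUDATimer start/stop/elapsed_ms interface.

    A CUDA-event version would record torch.cuda.Event(enable_timing=True)
    pairs here and call synchronize() only inside elapsed_ms(), so the
    hot path never blocks the CPU.
    """

    def __init__(self):
        self._start = None
        self._elapsed_ms = 0.0

    def start(self):
        # Analogue of start_event.record() on the current CUDA stream.
        self._start = time.perf_counter()

    def stop(self):
        # Analogue of stop_event.record(); no synchronization happens here.
        self._elapsed_ms = (time.perf_counter() - self._start) * 1000.0

    def elapsed_ms(self):
        # The CUDA version would synchronize and return
        # start_event.elapsed_time(stop_event) at this point.
        return self._elapsed_ms


timer = SimpleTimer()
timer.start()
time.sleep(0.01)  # stand-in for GPU work
timer.stop()
elapsed = timer.elapsed_ms()
```

Deferring synchronization to the read is what keeps the measurement itself off the critical path.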
- class kempnerforge.profiling.cuda_timer.CUDATimerCollection

  Bases: `object`

  Collection of named CUDA timers for profiling multiple regions.

  Manages timers for distinct training phases (forward, backward, comm, etc.) and reports all elapsed times as a dictionary.

  When `enabled=False`, all operations are no-ops with zero overhead: start/stop calls return immediately without recording CUDA events.

  Parameters:
  - regions – List of region names to track.
  - enabled – Whether timing is active. When False, all calls are no-ops.
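The collection's semantics, including the `enabled=False` no-op path, can be sketched with a CPU analogue. `TimerCollection` below is hypothetical and uses `time.perf_counter` in place of CUDA events; only the interface mirrors the class documented above:

```python
import time


class TimerCollection:
    """CPU sketch of the CUDATimerCollection interface."""

    def __init__(self, regions, enabled=True):
        self.enabled = enabled
        self._starts = {}
        # Regions that are never timed report 0.0, matching the
        # usage example's {"forward": 12.3, "backward": 0.0, ...}.
        self._elapsed = {region: 0.0 for region in regions}

    def start(self, region):
        if not self.enabled:
            return  # no-op: nothing recorded when disabled
        self._starts[region] = time.perf_counter()

    def stop(self, region):
        if not self.enabled:
            return  # no-op: zero overhead when disabled
        self._elapsed[region] = (time.perf_counter() - self._starts[region]) * 1000.0

    def elapsed_all(self):
        return dict(self._elapsed)


timers = TimerCollection(regions=["forward", "backward", "comm"])
timers.start("forward")
time.sleep(0.005)  # stand-in for the forward pass
timers.stop("forward")
report = timers.elapsed_all()  # "backward" and "comm" stay at 0.0
```

Guarding each call on `self.enabled` is what makes a disabled collection safe to leave in production code paths.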