kempnerforge.profiling.cuda_timer

CUDA event-based timing utilities.

Provides lightweight, GPU-accurate timers for profiling specific regions of the training loop (forward pass, backward pass, communication, etc.). Uses CUDA events to avoid CPU synchronization overhead during measurement.

Usage:

    timer = CUDATimer()
    timer.start()
    # ... GPU work ...
    timer.stop()
    elapsed_ms = timer.elapsed_ms()

    # Or use the multi-region tracker:
    timers = CUDATimerCollection(regions=["forward", "backward", "comm"])
    timers.start("forward")
    # ... forward pass ...
    timers.stop("forward")
    report = timers.elapsed_all()  # {"forward": 12.3, "backward": 0.0, ...}

Classes

CUDATimer – CUDA event-based timer for accurate GPU timing.

CUDATimerCollection – Collection of named CUDA timers for profiling multiple regions.

class kempnerforge.profiling.cuda_timer.CUDATimer[source]

Bases: object

CUDA event-based timer for accurate GPU timing.

Uses CUDA events to measure elapsed time without CPU synchronization overhead (synchronizes only when reading the result).

__init__()[source]

Return type:

None

start()[source]

Record the start event on the current CUDA stream.

Return type:

None

stop()[source]

Record the end event on the current CUDA stream.

Return type:

None

elapsed_ms()[source]

Get elapsed time in milliseconds (synchronizes CUDA).

Return type:

float
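
The record-then-synchronize pattern described for CUDATimer can be sketched as follows. This is an illustrative stand-in, not the kempnerforge source: the names `SketchCUDATimer`, `FakeEvent`, and the `event_factory` argument are hypothetical. It assumes the timer is built on PyTorch's `torch.cuda.Event(enable_timing=True)`, whose `record()`, `synchronize()`, and `elapsed_time()` methods are the only calls used. `FakeEvent` is a CPU-only substitute so the sketch can be tried without a GPU.

```python
import time


class SketchCUDATimer:
    """Illustrative timer; assumes an event API like torch.cuda.Event."""

    def __init__(self, event_factory=None):
        if event_factory is None:
            # Imported lazily so the sketch loads even without PyTorch.
            import torch

            def event_factory():
                return torch.cuda.Event(enable_timing=True)

        self._start = event_factory()
        self._end = event_factory()

    def start(self):
        # Enqueue a marker on the current stream; the CPU does not wait.
        self._start.record()

    def stop(self):
        # Enqueue the closing marker; still no CPU-side wait.
        self._end.record()

    def elapsed_ms(self):
        # The only blocking step: wait until the end event has completed.
        self._end.synchronize()
        return self._start.elapsed_time(self._end)


class FakeEvent:
    """Hypothetical CPU-only event exposing the same three methods."""

    def __init__(self):
        self._t = None

    def record(self):
        self._t = time.perf_counter()

    def synchronize(self):
        pass  # nothing asynchronous to wait for on the CPU

    def elapsed_time(self, other):
        # Milliseconds between this event and `other`.
        return (other._t - self._t) * 1000.0
```

Passing `event_factory=FakeEvent` exercises the same call sequence on the CPU. With the default factory, a CUDA-capable PyTorch install is required.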

class kempnerforge.profiling.cuda_timer.CUDATimerCollection[source]

Bases: object

Collection of named CUDA timers for profiling multiple regions.

Manages timers for distinct training phases (forward, backward, comm, etc.) and reports all elapsed times as a dictionary.

When enabled=False, all operations are no-ops: start/stop calls return immediately without recording CUDA events, so a disabled collection adds essentially no overhead.

Parameters:
  • regions – List of region names to track.

  • enabled – Whether timing is active. When False, all calls are no-ops.
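
The enabled/disabled behaviour described above could be implemented along these lines. This is a minimal sketch under stated assumptions, not the library's code: `SketchTimerCollection` and `_WallClockTimer` are hypothetical names, the per-region timer uses `time.perf_counter` instead of CUDA events so it runs anywhere, and returning 0.0 for every region when disabled is an assumption the source does not confirm.

```python
import time


class _WallClockTimer:
    """Hypothetical CPU stand-in; a real region timer would use CUDA events."""

    def __init__(self):
        self._t0 = None
        self._t1 = None

    def start(self):
        self._t0 = time.perf_counter()

    def stop(self):
        self._t1 = time.perf_counter()

    def elapsed_ms(self):
        if self._t0 is None or self._t1 is None:
            return 0.0  # region was never started/stopped
        return (self._t1 - self._t0) * 1000.0


class SketchTimerCollection:
    def __init__(self, regions, enabled=True):
        self._enabled = enabled
        self._regions = list(regions)
        # When disabled, no timers are allocated at all.
        self._timers = (
            {name: _WallClockTimer() for name in self._regions} if enabled else {}
        )

    @property
    def enabled(self):
        return self._enabled

    def start(self, region):
        if not self._enabled:
            return  # early exit: nothing is recorded
        self._timers[region].start()

    def stop(self, region):
        if not self._enabled:
            return
        self._timers[region].stop()

    def elapsed_all(self):
        if not self._enabled:
            # Assumption: a disabled collection reports 0.0 everywhere.
            return {name: 0.0 for name in self._regions}
        return {name: t.elapsed_ms() for name, t in self._timers.items()}
```

Keeping the `enabled` check as the first statement of each method means call sites need no `if profiling:` guards of their own.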

__init__(regions, enabled=True)[source]

Parameters:

regions, enabled (both described in the class Parameters list)

Return type:

None

property enabled: bool

Whether timing is active.

start(region)[source]

Start timing a named region.

Parameters:

region (str)

Return type:

None

stop(region)[source]

Stop timing a named region.

Parameters:

region (str)

Return type:

None

elapsed_ms(region)[source]

Get elapsed time for a specific region in milliseconds.

Parameters:

region (str)

Return type:

float

elapsed_all()[source]

Get elapsed times for all regions in milliseconds.

Returns a dict mapping region name → elapsed_ms. Regions that were never started/stopped return 0.0.

Return type:

dict[str, float]
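
One way to consume the dictionary returned by elapsed_all() is to turn it into a short per-region summary. The helper below is a hypothetical example (`format_report` is not part of the module), and the sample values are made up for illustration. Shares are computed against the sum of the region times, which equals wall-clock step time only if the regions do not overlap.

```python
def format_report(report):
    """Render {region: elapsed_ms} as one line per region with its share."""
    total = sum(report.values())
    lines = []
    for name, ms in report.items():
        # Guard against division by zero when nothing was timed.
        share = (ms / total * 100.0) if total > 0 else 0.0
        lines.append(f"{name}: {ms:.1f} ms ({share:.0f}%)")
    return "\n".join(lines)


# Illustrative values only; "comm" stands for a region that never ran.
report = {"forward": 12.0, "backward": 36.0, "comm": 0.0}
print(format_report(report))
# forward: 12.0 ms (25%)
# backward: 36.0 ms (75%)
# comm: 0.0 ms (0%)
```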