kempnerforge.profiling.cuda_timer
CUDA event-based timing utilities.
Provides lightweight, GPU-accurate timers for profiling specific regions of the training loop (forward pass, backward pass, communication, etc.). Uses CUDA events to avoid CPU synchronization overhead during measurement.
Usage:

```python
timer = CUDATimer()
timer.start()
# ... GPU work ...
timer.stop()
elapsed_ms = timer.elapsed_ms()

# Or use the multi-region tracker:
timers = CUDATimerCollection(regions=["forward", "backward", "comm"])
timers.start("forward")
# ... forward pass ...
timers.stop("forward")
report = timers.elapsed_all()  # {"forward": 12.3, "backward": 0.0, ...}
```
Classes

| Class | Description |
| --- | --- |
| `CUDATimer` | CUDA event-based timer for accurate GPU timing. |
| `CUDATimerCollection` | Collection of named CUDA timers for profiling multiple regions. |
- class kempnerforge.profiling.cuda_timer.CUDATimer

  Bases: `object`

  CUDA event-based timer for accurate GPU timing.

  Uses CUDA events to measure elapsed time without CPU synchronization overhead (synchronizes only when reading the result).
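The timer's contract can be illustrated with a CPU-only analogue. This is a sketch, not the library's implementation; `SimpleTimer` and its internals are hypothetical. A real CUDA version would record a `torch.cuda.Event(enable_timing=True)` pair in `start()`/`stop()` and synchronize only inside `elapsed_ms()`:

```python
import time


class SimpleTimer:
    """CPU analogue of the CUDATimer start/stop/elapsed_ms interface.

    A CUDA-event version would record torch.cuda.Event(enable_timing=True)
    pairs here and call synchronize() only inside elapsed_ms(), so the
    hot path never blocks the CPU.
    """

    def __init__(self):
        self._start = None
        self._elapsed_ms = 0.0

    def start(self):
        # Analogue of start_event.record() on the current CUDA stream.
        self._start = time.perf_counter()

    def stop(self):
        # Analogue of stop_event.record(); no synchronization happens here.
        self._elapsed_ms = (time.perf_counter() - self._start) * 1000.0

    def elapsed_ms(self):
        # The CUDA version would synchronize and return
        # start_event.elapsed_time(stop_event) at this point.
        return self._elapsed_ms


timer = SimpleTimer()
timer.start()
time.sleep(0.01)  # stand-in for GPU work
timer.stop()
elapsed = timer.elapsed_ms()
```

Deferring synchronization to the read is what keeps the measurement itself off the critical path.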
- class kempnerforge.profiling.cuda_timer.CUDATimerCollection

  Bases: `object`

  Collection of named CUDA timers for profiling multiple regions.

  Manages timers for distinct training phases (forward, backward, comm, etc.) and reports all elapsed times as a dictionary.

  When `enabled=False`, all operations are no-ops with zero overhead: start/stop calls return immediately without recording CUDA events.

  Parameters:
  - regions – List of region names to track.
  - enabled – Whether timing is active. When False, all calls are no-ops.
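The collection's semantics, including the `enabled=False` no-op path, can be sketched with a CPU analogue. `TimerCollection` below is hypothetical and uses `time.perf_counter` in place of CUDA events; only the interface mirrors the class documented above:

```python
import time


class TimerCollection:
    """CPU sketch of the CUDATimerCollection interface."""

    def __init__(self, regions, enabled=True):
        self.enabled = enabled
        self._starts = {}
        # Regions that are never timed report 0.0, matching the
        # usage example's {"forward": 12.3, "backward": 0.0, ...}.
        self._elapsed = {region: 0.0 for region in regions}

    def start(self, region):
        if not self.enabled:
            return  # no-op: nothing recorded when disabled
        self._starts[region] = time.perf_counter()

    def stop(self, region):
        if not self.enabled:
            return  # no-op: zero overhead when disabled
        self._elapsed[region] = (time.perf_counter() - self._starts[region]) * 1000.0

    def elapsed_all(self):
        return dict(self._elapsed)


timers = TimerCollection(regions=["forward", "backward", "comm"])
timers.start("forward")
time.sleep(0.005)  # stand-in for the forward pass
timers.stop("forward")
report = timers.elapsed_all()  # "backward" and "comm" stay at 0.0
```

Guarding each call on `self.enabled` is what makes a disabled collection safe to leave in production code paths.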