kempnerforge.profiling.profiler

torch.profiler integration for KempnerForge.

Provides a step-aware profiler wrapper that activates only within a configured step range, exports Chrome traces, and integrates with the training loop via a simple .step() interface.
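As a rough illustration of the step-range idea (not KempnerForge's actual implementation), a range of training steps can be mapped onto the arguments of `torch.profiler.schedule`. The names `schedule_kwargs` and `profile_range`, the exclusive `end_step`, and the one-step warmup are assumptions made for this sketch; only the `torch.profiler` calls are real PyTorch APIs.

```python
def schedule_kwargs(start_step: int, end_step: int, warmup: int = 1) -> dict:
    """Map the step range [start_step, end_step) onto torch.profiler.schedule arguments.

    Hypothetical helper: the profiler idles for `wait` steps, warms up for
    `warmup` steps, then records for `active` steps, so recording begins
    at step index wait + warmup == start_step.
    """
    if start_step < 0 or end_step <= start_step:
        raise ValueError("expected 0 <= start_step < end_step")
    warmup = min(warmup, start_step)  # cannot warm up before step 0
    return {
        "wait": start_step - warmup,
        "warmup": warmup,
        "active": end_step - start_step,
        "repeat": 1,  # profile the range once, then stay idle
    }


def profile_range(train_step, num_steps, start_step, end_step, trace_dir):
    """Run `train_step(step)` for `num_steps` steps, tracing only the given range."""
    # Imported lazily so the pure-Python helper above works without PyTorch.
    import torch
    from torch.profiler import (
        ProfilerActivity,
        profile,
        schedule,
        tensorboard_trace_handler,
    )

    activities = [ProfilerActivity.CPU]
    if torch.cuda.is_available():
        activities.append(ProfilerActivity.CUDA)

    with profile(
        activities=activities,
        schedule=schedule(**schedule_kwargs(start_step, end_step)),
        # Writes a Chrome-compatible trace file once the active window ends.
        on_trace_ready=tensorboard_trace_handler(trace_dir),
    ) as prof:
        for step in range(num_steps):
            train_step(step)
            prof.step()  # tell the profiler that one training step finished
    return prof
```

Keeping the step arithmetic in a separate pure function makes the range logic easy to check without a GPU or a PyTorch install.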

Functions

build_profiler(config[, rank])

Build a torch.profiler instance from config.

print_profiler_summary(prof[, trace_dir])

Print kernel-level GPU profiling summary and optionally save to file.

kempnerforge.profiling.profiler.build_profiler(config, rank=0)

Build a torch.profiler instance from config.

Returns None if profiling is disabled.

Parameters:
  • config (ProfilingConfig) – Profiling configuration.

  • rank (int) – Current rank (for output directory naming).

Returns:

A torch.profiler.profile context manager, or None.

Return type:

torch.profiler.profile | None
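Because the return value may be None, calling code has to handle both cases. The following is a minimal sketch of one way to do that, assuming `prof` is whatever `build_profiler(config, rank)` returned; the helper name `run_steps` and the `train_step` callback are hypothetical and not part of KempnerForge.

```python
from contextlib import nullcontext


def run_steps(prof, train_step, num_steps):
    """Run a training loop with an optional profiler.

    `prof` is either a profiler-like context manager exposing .step()
    (for example a torch.profiler.profile instance) or None when
    profiling is disabled.
    """
    # nullcontext() stands in for the profiler when profiling is off,
    # so the loop body does not need to be duplicated.
    ctx = prof if prof is not None else nullcontext()
    with ctx:
        for step in range(num_steps):
            train_step(step)
            if prof is not None:
                prof.step()  # advance the profiler's step counter
```

A call site might then read `run_steps(build_profiler(config, rank), train_step, num_steps)`, with the same loop serving both profiled and unprofiled runs.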

kempnerforge.profiling.profiler.print_profiler_summary(prof, trace_dir=None)

Print kernel-level GPU profiling summary and optionally save to file.

Prints top CUDA kernels by time and FLOPS, an aggregate GPU time breakdown (matmul, communication, memory, other), and achieved TFLOPS vs hardware peak.

If trace_dir is provided, writes a summary.md file alongside the traces.

Parameters:
  • prof (torch.profiler.profile) – A completed torch.profiler.profile instance.

  • trace_dir (str | None) – Optional directory in which to save the summary.md report.

Return type:

None
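To make the aggregate breakdown concrete, here is a hedged sketch of how kernel times could be bucketed into the four categories named above and how achieved TFLOPS could be computed. The keyword lists, function names, and input format are assumptions chosen for illustration; the actual classification rules used by print_profiler_summary are not documented here.

```python
from collections import defaultdict

# Hypothetical keyword buckets; the real function may classify kernels differently.
CATEGORY_KEYWORDS = {
    "matmul": ("gemm", "cutlass", "matmul"),
    "communication": ("nccl",),
    "memory": ("memcpy", "memset"),
}


def categorize(kernel_name: str) -> str:
    """Assign a kernel to a category by case-insensitive substring match."""
    lowered = kernel_name.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return "other"


def gpu_time_breakdown(kernels):
    """Aggregate (kernel_name, time_us) pairs into {category: (time_us, percent)}."""
    totals = defaultdict(float)
    for name, time_us in kernels:
        totals[categorize(name)] += time_us
    grand_total = sum(totals.values())
    return {
        category: (time_us, 100.0 * time_us / grand_total if grand_total else 0.0)
        for category, time_us in totals.items()
    }


def achieved_tflops(total_flops: float, elapsed_s: float) -> float:
    """Convert a FLOP count and an elapsed time in seconds into TFLOPS."""
    return total_flops / elapsed_s / 1e12
```

In practice the (name, time) pairs might be read from `prof.key_averages()`, whose entries expose the kernel name as `.key`; the attribute that holds device time differs across PyTorch versions, so it is deliberately left out of this sketch.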