kempnerforge.profiling.profiler
torch.profiler integration for KempnerForge.
Provides a step-aware profiler wrapper that activates only within a configured step range, exports Chrome traces, and integrates with the training loop via a simple .step() interface.
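The step-range gating described above can be sketched in plain Python. This is a minimal illustration, not KempnerForge's implementation: the class name `StepRangeGate`, its `start_step` and `end_step` parameters, and the `FakeProfiler` stand-in are all hypothetical. In practice the wrapped object would be a `torch.profiler.profile`, which exposes `start()`, `stop()`, and `step()`.

```python
class FakeProfiler:
    """Hypothetical stand-in that records calls instead of profiling."""

    def __init__(self):
        self.events = []

    def start(self):
        self.events.append("start")

    def stop(self):
        self.events.append("stop")

    def step(self):
        self.events.append("step")


class StepRangeGate:
    """Keep a profiler running only for steps in [start_step, end_step)."""

    def __init__(self, profiler, start_step, end_step):
        self.profiler = profiler
        self.start_step = start_step
        self.end_step = end_step
        self.current_step = 0  # number of completed training steps
        self.active = False
        self._sync()  # handles start_step == 0

    def _sync(self):
        # Start or stop the profiler so it matches the configured range.
        should_be_active = self.start_step <= self.current_step < self.end_step
        if should_be_active and not self.active:
            self.profiler.start()
        elif self.active and not should_be_active:
            self.profiler.stop()
        self.active = should_be_active

    def step(self):
        """Call once after every training step."""
        if self.active:
            self.profiler.step()
        self.current_step += 1
        self._sync()


fake = FakeProfiler()
gate = StepRangeGate(fake, start_step=2, end_step=4)
for _ in range(6):
    gate.step()  # the training loop only ever calls .step()
```

The training loop stays unaware of the profiling window: it calls `.step()` unconditionally, and the wrapper decides whether that call reaches the underlying profiler.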
Functions
build_profiler(config, rank=0) | Build a torch.profiler instance from config.
print_profiler_summary(prof, trace_dir=None) | Print kernel-level GPU profiling summary and optionally save to file.
- kempnerforge.profiling.profiler.build_profiler(config, rank=0)
Build a torch.profiler instance from config.
Returns None if profiling is disabled.
- Parameters:
config (ProfilingConfig) – Profiling configuration.
rank (int) – Current rank (for output directory naming).
- Returns:
A torch.profiler.profile context manager, or None.
- Return type:
torch.profiler.profile | None
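Because `build_profiler` returns `None` when profiling is disabled, callers need to handle both cases. One possible pattern, sketched below, falls back to `contextlib.nullcontext` and guards the per-step call. The `fake_build_profiler` function and `FakeProfiler` class are hypothetical stand-ins so the snippet runs without KempnerForge or PyTorch; the fields of `ProfilingConfig` are not documented on this page and are not assumed here.

```python
from contextlib import nullcontext


class FakeProfiler:
    """Hypothetical stand-in for torch.profiler.profile."""

    def __init__(self):
        self.steps = 0

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        return False  # do not suppress exceptions

    def step(self):
        self.steps += 1


def fake_build_profiler(enabled):
    """Hypothetical stand-in for build_profiler: returns None when disabled."""
    return FakeProfiler() if enabled else None


def run_training(prof, num_steps):
    """Run a dummy loop, profiling only if prof is not None."""
    completed = 0
    # nullcontext() is a no-op context manager for the disabled case.
    ctx = prof if prof is not None else nullcontext()
    with ctx:
        for _ in range(num_steps):
            completed += 1  # placeholder for the real training step
            if prof is not None:
                prof.step()
    return completed


enabled_prof = fake_build_profiler(enabled=True)
completed_enabled = run_training(enabled_prof, num_steps=3)
completed_disabled = run_training(fake_build_profiler(enabled=False), num_steps=3)
```

With this shape the loop body is identical whether or not profiling is enabled, which keeps the disabled path free of profiler overhead.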
- kempnerforge.profiling.profiler.print_profiler_summary(prof, trace_dir=None)
Print kernel-level GPU profiling summary and optionally save to file.
Prints top CUDA kernels by time and FLOPS, an aggregate GPU time breakdown (matmul, communication, memory, other), and achieved TFLOPS vs hardware peak.
If trace_dir is provided, writes a summary.md file alongside the traces.
- Parameters:
prof (torch.profiler.profile) – A completed torch.profiler.profile instance.
trace_dir (str | None) – Optional directory to save summary.md report.
- Return type:
None
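The aggregate breakdown and the TFLOPS comparison come down to bucketing kernel times and simple arithmetic. The sketch below is an illustration only: the keyword lists, the `classify_kernel`, `gpu_time_breakdown`, and `achieved_tflops` helpers, the sample kernel names and timings, and the peak value are all assumptions, not what `print_profiler_summary` actually uses.

```python
# Hypothetical keyword buckets, checked in order; the first match wins.
CATEGORY_KEYWORDS = [
    ("communication", ("nccl", "allreduce", "all_gather")),
    ("matmul", ("gemm", "matmul")),
    ("memory", ("memcpy", "memset")),
]


def classify_kernel(name):
    """Map a kernel name to matmul, communication, memory, or other."""
    lowered = name.lower()
    for category, keywords in CATEGORY_KEYWORDS:
        if any(keyword in lowered for keyword in keywords):
            return category
    return "other"


def gpu_time_breakdown(kernels):
    """Sum GPU time in microseconds per category from (name, time_us) pairs."""
    totals = {"matmul": 0.0, "communication": 0.0, "memory": 0.0, "other": 0.0}
    for name, time_us in kernels:
        totals[classify_kernel(name)] += time_us
    return totals


def achieved_tflops(total_flops, elapsed_seconds):
    """Convert a FLOP count and wall time into TFLOPS."""
    return total_flops / elapsed_seconds / 1e12


# Made-up sample data for illustration.
sample = [
    ("ampere_sgemm_128x64_tn", 600.0),
    ("ncclKernel_AllReduce_RING", 250.0),
    ("Memcpy DtoD (Device -> Device)", 100.0),
    ("vectorized_elementwise_kernel", 50.0),
]
breakdown = gpu_time_breakdown(sample)
tflops = achieved_tflops(total_flops=1.0e14, elapsed_seconds=0.5)
utilization = tflops / 400.0  # assumed hardware peak of 400 TFLOPS
```

Substring matching on kernel names is a heuristic: kernel naming varies across libraries and GPU generations, so any real keyword list would need to be checked against actual traces.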