kempnerforge.metrics.memory¶
GPU memory tracking and reporting.
Provides utilities for monitoring GPU memory usage during training:
- Current, peak, and reserved memory
- Memory utilization as a percentage of total
- Human-readable formatting
Functions
- format_memory_stats: Format memory stats as a human-readable string.
- get_memory_stats: Get current GPU memory statistics in GB.
- get_memory_utilization: Get peak memory utilization as a fraction of total GPU memory.
- reset_peak_memory: Reset peak memory tracking counter.
Classes
- DeviceMemoryMonitor: Tracks GPU memory usage across training steps.
- kempnerforge.metrics.memory.get_memory_stats(device=0)[source]¶
Get current GPU memory statistics in GB.
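The exact return shape is not documented here; a minimal pure-Python sketch of the bytes-to-GB conversion such a function performs (the dictionary keys are illustrative assumptions, not the real API):

```python
GB = 1024 ** 3  # bytes per gigabyte (GiB convention, as PyTorch reports)

def bytes_to_gb(n_bytes):
    # Convert a raw byte count into gigabytes for reporting.
    return n_bytes / GB

# Hypothetical stats dict in GB, as get_memory_stats might return:
stats = {
    "allocated_gb": bytes_to_gb(8 * GB),   # 8.0
    "reserved_gb": bytes_to_gb(10 * GB),   # 10.0
}
```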
- kempnerforge.metrics.memory.get_memory_utilization(device=0)[source]¶
Get peak memory utilization as a fraction of total GPU memory.
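The fraction itself is simple arithmetic; a sketch with illustrative values (the function name here is hypothetical):

```python
def peak_utilization(peak_bytes, total_bytes):
    # Peak allocated bytes as a fraction of the device's total memory,
    # mirroring what get_memory_utilization reports.
    return peak_bytes / total_bytes

# e.g. a 32 GB peak on an 80 GB device:
frac = peak_utilization(32 * 1024**3, 80 * 1024**3)  # 0.4
```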
- kempnerforge.metrics.memory.format_memory_stats(device=0)[source]¶
Format memory stats as a human-readable string.
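A pure-Python sketch of the kind of human-readable formatting this performs (key names and layout are assumptions, not the library's actual output):

```python
def format_stats(stats):
    # Render a dict of GB values, e.g. {"allocated_gb": 8.0}, as one line.
    return " | ".join(f"{key}: {val:.2f} GB" for key, val in stats.items())

line = format_stats({"allocated_gb": 8.0, "peak_gb": 12.5})
# "allocated_gb: 8.00 GB | peak_gb: 12.50 GB"
```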
- kempnerforge.metrics.memory.reset_peak_memory(device=0)[source]¶
Reset peak memory tracking counter.
- Parameters:
device (int)
- Return type:
None
- class kempnerforge.metrics.memory.DeviceMemoryMonitor[source]¶
Bases: object
Tracks GPU memory usage across training steps.
Resets peak memory stats at each reporting interval so that the peak reflects per-interval usage rather than all-time peak.
Supports capturing a memory snapshot at a configurable step, for debugging out-of-memory (OOM) errors and memory fragmentation with pytorch.org/memory_viz.
- Parameters:
device – CUDA device index.
snapshot_step – Step at which to capture a memory snapshot. None to disable.
snapshot_dir – Directory to save snapshots.
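The per-interval peak behavior described above can be sketched in pure Python (class and method names here are illustrative, not the real DeviceMemoryMonitor API):

```python
class IntervalPeakTracker:
    """Track the peak of a value, resetting at each reporting interval."""

    def __init__(self):
        self.peak = 0.0

    def update(self, current):
        # Called every step with the current memory usage.
        self.peak = max(self.peak, current)

    def report(self):
        # Return this interval's peak, then reset -- analogous to calling
        # torch.cuda.reset_peak_memory_stats() at each reporting interval.
        peak, self.peak = self.peak, 0.0
        return peak

tracker = IntervalPeakTracker()
tracker.update(5.0)
tracker.update(9.0)
first = tracker.report()   # 9.0: the peak within the first interval
tracker.update(3.0)
second = tracker.report()  # 3.0, not 9.0: the peak is per-interval
```

This is why the reported peak reflects per-interval usage rather than the all-time peak.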
- capture_snapshot(step)[source]¶
Capture a CUDA memory snapshot and save as pickle.
The snapshot can be visualized at https://pytorch.org/memory_viz
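A sketch of how such a capture typically works with PyTorch's memory-history API; this is an assumption about the implementation, not the actual method, and it is guarded so it degrades gracefully on CPU-only hosts:

```python
import pickle

import torch

def capture_snapshot(path):
    # Dump a CUDA memory snapshot as a pickle for pytorch.org/memory_viz.
    # torch.cuda.memory._record_memory_history() must be enabled beforehand
    # for the snapshot to contain allocation traces.
    if not torch.cuda.is_available():
        return False  # nothing to capture without a GPU
    snapshot = torch.cuda.memory._snapshot()
    with open(path, "wb") as f:
        pickle.dump(snapshot, f)
    return True
```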