kempnerforge.config.checkpoint

Checkpoint configuration.

Classes

AsyncCheckpointMode

CheckpointConfig

Checkpointing settings.

DynamicCheckpointWindow

A bounded step range with a registered checkpoint strategy.

class kempnerforge.config.checkpoint.AsyncCheckpointMode

Bases: StrEnum

disabled = 'disabled'
async_ = 'async'
async_pinned = 'async_with_pinned_mem'
__new__(value)
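AsyncCheckpointMode is a StrEnum, so each member compares equal to its plain string value, and a mode can be looked up from a string such as 'async_with_pinned_mem'. The sketch below re-declares the enum locally as a mirror rather than importing kempnerforge. The fallback class for Python versions before 3.11 is an assumption added only so the snippet runs on older interpreters.

```python
from enum import Enum

try:
    from enum import StrEnum  # available in Python 3.11+
except ImportError:  # assumption: minimal stand-in for older interpreters
    class StrEnum(str, Enum):
        pass


# Local mirror of kempnerforge.config.checkpoint.AsyncCheckpointMode (not imported).
class AsyncCheckpointMode(StrEnum):
    disabled = "disabled"
    async_ = "async"  # `async` is a reserved keyword, so the member needs another name
    async_pinned = "async_with_pinned_mem"


# Look up a member by its value, e.g. a string read from a config file.
mode = AsyncCheckpointMode("async_with_pinned_mem")
print(mode is AsyncCheckpointMode.async_pinned)  # True

# Members compare equal to their plain string values.
print(AsyncCheckpointMode.disabled == "disabled")  # True
```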
class kempnerforge.config.checkpoint.DynamicCheckpointWindow

Bases: object

A bounded step range with a registered checkpoint strategy.

Inside [start, stop] the strategy decides which steps to save, and every such step is exempt from CheckpointConfig.keep_last_n retention. Outside the window the regular CheckpointConfig.interval cadence applies.

"power2" (default) saves at start and at every start + 2^k that is <= stop. Checkpoints are therefore dense near the start of the window, and the gap between them doubles thereafter. New strategies register via @registry.register_dyn_ckpt_strategy(name) and become selectable by setting strategy to that name.
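The power2 rule and the registration pattern can be sketched as follows. This is an illustration under stated assumptions, not the library code. It assumes k runs over 0, 1, 2, ..., so the first milestones are start, start + 1, start + 2, start + 4, and so on. The dict-based registry, the `register` decorator, the `every_64` strategy and the `(step, start, stop)` call signature are all hypothetical stand-ins for `registry.register_dyn_ckpt_strategy`.

```python
from typing import Callable

# Hypothetical stand-in for kempnerforge's strategy registry; the real
# registry.register_dyn_ckpt_strategy may store and call strategies differently.
StrategyFn = Callable[[int, int, int], bool]  # (step, start, stop) -> save?
_STRATEGIES: dict[str, StrategyFn] = {}


def register(name: str) -> Callable[[StrategyFn], StrategyFn]:
    """Record a strategy under `name` so it can be selected by string."""
    def decorator(fn: StrategyFn) -> StrategyFn:
        _STRATEGIES[name] = fn
        return fn
    return decorator


@register("power2")
def power2(step: int, start: int, stop: int) -> bool:
    """Fire at start and at start + 2**k (k >= 0 assumed) while <= stop."""
    if step < start or step > stop:
        return False
    offset = step - start
    # offset == 0 is `start` itself; otherwise offset must be a power of two.
    return offset == 0 or (offset & (offset - 1)) == 0


@register("every_64")
def every_64(step: int, start: int, stop: int) -> bool:
    """Hypothetical example strategy: every 64 steps inside the window."""
    return start <= step <= stop and (step - start) % 64 == 0


def milestones(strategy: str, start: int, stop: int) -> list[int]:
    """List every step in [start, stop] at which `strategy` fires."""
    fn = _STRATEGIES[strategy]
    return [s for s in range(start, stop + 1) if fn(s, start, stop)]


print(milestones("power2", 0, 512))
# [0, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
```

With the default window (start=0, stop=512) this yields 11 milestones, most of them in the first few dozen steps.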

start: int = 0
stop: int = 512
strategy: str = 'power2'
is_milestone(step)

True iff the configured strategy fires at step.

Parameters:

step (int)

Return type:

bool

__init__(start=0, stop=512, strategy='power2')

Parameters:

start (int)
stop (int)
strategy (str)

Return type:

None

class kempnerforge.config.checkpoint.CheckpointConfig

Bases: object

Checkpointing settings.

dir: str = 'checkpoints'
interval: int = 1000
dyn_ckpt_window: DynamicCheckpointWindow | None = None
async_mode: AsyncCheckpointMode = 'disabled'
keep_last_n: int = 3
load_path: str | None = None
export_dtype: Literal['float32', 'bfloat16'] = 'bfloat16'
exclude_from_loading: list[str]
ignore_freeze_mismatch: bool = False
should_save(step)

Whether to write a checkpoint at step.

Inside dyn_ckpt_window: the registered strategy decides (default "power2" saves at start and each start + 2^k while <= stop). Outside the window: every interval steps. Dynamic milestones are exempt from keep_last_n (see CheckpointManager._cleanup).
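The decision made by should_save can be approximated as below. This is a standalone sketch, not the library method. It assumes that "every interval steps" means step % interval == 0, passes the window as a plain (start, stop) pair, and inlines the default power2 test. How the real method treats step 0 or a non-positive interval is not specified here.

```python
from typing import Optional


def power2(step: int, start: int, stop: int) -> bool:
    # Default window strategy: fire at start, then at start + 2**k while <= stop.
    offset = step - start
    return start <= step <= stop and (offset == 0 or (offset & (offset - 1)) == 0)


def should_save(step: int, interval: int, window: Optional[tuple[int, int]]) -> bool:
    """Hypothetical approximation of CheckpointConfig.should_save."""
    if window is not None:
        start, stop = window
        if start <= step <= stop:
            # Inside [start, stop] only the strategy decides.
            return power2(step, start, stop)
    # Outside the window, or with no window: assumed step % interval == 0.
    return step % interval == 0


saved = [s for s in range(0, 3001) if should_save(s, interval=1000, window=(0, 512))]
print(saved)
# [0, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1000, 2000, 3000]
```

Note that inside the window the interval cadence is not consulted, so a step such as 500 is skipped even though it would otherwise be checked against interval.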

Parameters:

step (int)

Return type:

bool

is_dynamic_milestone(step)

True if step is a milestone of the configured dyn_ckpt_window.

CheckpointManager._cleanup excludes these from keep_last_n, so the dense early-window checkpoints are not pruned by the finite keep_last_n retention.
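A minimal sketch of milestone-aware retention, assuming checkpoints are identified by their step numbers and that keep_last_n counts only non-milestone checkpoints. The helper names steps_to_delete and in_power2_window are hypothetical. CheckpointManager._cleanup may order, count, or delete checkpoints differently, and the real behavior for keep_last_n <= 0 is not covered here.

```python
from typing import Callable


def steps_to_delete(
    saved_steps: list[int],
    keep_last_n: int,
    is_dynamic_milestone: Callable[[int], bool],
) -> list[int]:
    """Hypothetical sketch of milestone-aware retention (not the library code)."""
    # Dynamic milestones are never candidates for deletion.
    regular = sorted(s for s in saved_steps if not is_dynamic_milestone(s))
    # Keep the newest keep_last_n regular checkpoints; everything older goes.
    return regular[: max(len(regular) - keep_last_n, 0)]


def in_power2_window(step: int, start: int = 0, stop: int = 512) -> bool:
    # Default power2 window: start, then start + 2**k while <= stop.
    offset = step - start
    return start <= step <= stop and (offset == 0 or (offset & (offset - 1)) == 0)


saved = [0, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1000, 2000, 3000, 4000, 5000]
print(steps_to_delete(saved, keep_last_n=3, is_dynamic_milestone=in_power2_window))
# [1000, 2000]
```

All eleven window milestones survive, while only the three newest regular checkpoints (3000, 4000, 5000) are retained.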

Parameters:

step (int)

Return type:

bool

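__init__(dir='checkpoints', interval=1000, dyn_ckpt_window=None, async_mode=AsyncCheckpointMode.disabled, keep_last_n=3, load_path=None, export_dtype='bfloat16', exclude_from_loading=<factory>, ignore_freeze_mismatch=False)

Parameters:

dir (str)
interval (int)
dyn_ckpt_window (DynamicCheckpointWindow | None)
async_mode (AsyncCheckpointMode)
keep_last_n (int)
load_path (str | None)
export_dtype (Literal['float32', 'bfloat16'])
exclude_from_loading (list[str])
ignore_freeze_mismatch (bool)

Return type:

None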