kempnerforge.data.dataloader

Distributed, stateful DataLoader for KempnerForge.

Wraps PyTorch DataLoader with:
  • Distributed-aware setup (correct worker count, pinned memory)

  • Stateful iteration tracking for checkpoint/resume

  • Integration with DistributedSampler for rank-partitioned data

Classes

  • StatefulDataLoader – Stateful wrapper around PyTorch DataLoader.

class kempnerforge.data.dataloader.StatefulDataLoader[source]

Bases: object

Stateful wrapper around PyTorch DataLoader.

Tracks iteration progress so training can resume from the exact position after a checkpoint load.

Parameters:
  • dataset – Dataset to load from.

  • batch_size – Per-device micro-batch size.

  • sampler – Distributed sampler (created automatically if None).

  • config – Data pipeline configuration.

__init__(dataset, batch_size, sampler=None, config=None)[source]

Parameters:
  • dataset – Dataset to load from.

  • batch_size – Per-device micro-batch size.

  • sampler – Distributed sampler (created automatically if None).

  • config – Data pipeline configuration.

Return type:

None
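The bookkeeping the wrapper performs during iteration can be sketched in plain Python. The class below is a hypothetical, simplified stand-in for StatefulDataLoader (no PyTorch DataLoader, no distributed sampler, batching reduced to list slicing) that shows how the epoch and batch counters advance; it is not the kempnerforge implementation.

```python
# Minimal illustrative sketch of stateful iteration tracking -- NOT the
# real kempnerforge class. Batching is simplified to slicing a list so
# the example stays self-contained.
class MiniStatefulLoader:
    def __init__(self, dataset, batch_size):
        self.dataset = dataset
        self.batch_size = batch_size
        self.epoch = 0              # completed passes over the dataset
        self.batches_yielded = 0    # batches yielded in the current epoch

    def __iter__(self):
        # Resume mid-epoch: skip batches already yielded before a checkpoint.
        start = self.batches_yielded * self.batch_size
        for i in range(start, len(self.dataset), self.batch_size):
            self.batches_yielded += 1
            yield self.dataset[i:i + self.batch_size]
        # A full pass finished: advance the epoch and reset the counter.
        self.epoch += 1
        self.batches_yielded = 0


loader = MiniStatefulLoader(list(range(10)), batch_size=4)
batches = list(loader)   # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

After one full pass, epoch has advanced to 1 and batches_yielded is back to 0, so a subsequent iteration starts a fresh epoch from the beginning.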

state_dict()[source]

Return checkpoint state. Keys: epoch, batches_yielded, sampler.

Return type:

dict
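The returned mapping carries exactly the three documented keys. A hypothetical snapshot might look like the following; the concrete sampler payload is an assumption here, since it is delegated to whatever sampler the loader wraps.

```python
# Hypothetical shape of the dict returned by state_dict(). The values
# are illustrative; the "sampler" entry holds the wrapped sampler's own
# state and its layout depends on the sampler implementation.
state = {
    "epoch": 3,               # completed epochs
    "batches_yielded": 128,   # batches already yielded this epoch
    "sampler": {"epoch": 3},  # delegated sampler state (illustrative)
}

assert set(state) == {"epoch", "batches_yielded", "sampler"}
```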

load_state_dict(state)[source]

Restore state from a checkpoint: restores the sampler state and skips iteration forward to the saved batch position.

Parameters:

state (dict)

Return type:

None
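A checkpoint/resume round trip with this API follows the usual save-then-restore pattern. The sketch below uses a toy stand-in class (not the real kempnerforge loader; sampler state is omitted) to show load_state_dict restoring the counters so that a fresh loader resumes from the saved batch position.

```python
class ToyStatefulLoader:
    """Toy stand-in illustrating load_state_dict semantics; not kempnerforge."""

    def __init__(self, dataset, batch_size):
        self.dataset = dataset
        self.batch_size = batch_size
        self.epoch = 0
        self.batches_yielded = 0

    def state_dict(self):
        # Mirrors the documented keys: epoch, batches_yielded, sampler.
        return {
            "epoch": self.epoch,
            "batches_yielded": self.batches_yielded,
            "sampler": None,  # no sampler in this toy example
        }

    def load_state_dict(self, state):
        # Restore counters; iteration then skips past already-seen batches.
        self.epoch = state["epoch"]
        self.batches_yielded = state["batches_yielded"]

    def __iter__(self):
        start = self.batches_yielded * self.batch_size
        for i in range(start, len(self.dataset), self.batch_size):
            self.batches_yielded += 1
            yield self.dataset[i:i + self.batch_size]


# Yield one batch, checkpoint, then resume in a brand-new loader.
loader = ToyStatefulLoader(list(range(8)), batch_size=2)
it = iter(loader)
first = next(it)             # [0, 1]
ckpt = loader.state_dict()   # batches_yielded == 1

resumed = ToyStatefulLoader(list(range(8)), batch_size=2)
resumed.load_state_dict(ckpt)
remaining = list(resumed)    # [[2, 3], [4, 5], [6, 7]]
```

The resumed loader never re-yields the first batch, which is the property the real wrapper guarantees across checkpoint loads.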