kempnerforge.data.dataloader

Distributed, stateful DataLoader for KempnerForge.

Wraps PyTorch DataLoader with:
  • Distributed-aware setup (correct worker count, pinned memory)

  • Stateful iteration tracking for checkpoint/resume

  • Integration with DistributedSampler for rank-partitioned data
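
A minimal usage sketch of the behaviors above (a torch.distributed process group is assumed to be initialized already; the toy dataset is illustrative):

    import torch
    from torch.utils.data import TensorDataset

    from kempnerforge.data.dataloader import StatefulDataLoader

    # Toy stand-in for a real dataset; any map-style dataset works.
    dataset = TensorDataset(torch.arange(1024).unsqueeze(1))

    # sampler=None: a DistributedSampler is created automatically, so
    # each rank iterates over a disjoint shard of the data.
    loader = StatefulDataLoader(dataset, batch_size=8)

    for batch in loader:
        ...  # training step; the loader counts every batch it yields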

Classes

StatefulDataLoader – Stateful wrapper around PyTorch DataLoader.

class kempnerforge.data.dataloader.StatefulDataLoader[source]

Bases: object

Stateful wrapper around PyTorch DataLoader.

Tracks iteration progress so training can resume from the exact position after a checkpoint load.

Parameters:
  • dataset – Dataset to load from.

  • batch_size – Per-device micro-batch size.

  • sampler – Distributed sampler (created automatically if None).

  • config – Data pipeline configuration.

  • collate_fn – Optional custom batch collator. When None, uses PyTorch’s default collation. VLM training passes VLMCollator so that fixed-length padding and the image_positions field reach the batch (see the sketch after this list).
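
As a sketch of the collate_fn hook, a hypothetical pad-to-longest collator (pad_collate and the sample layout are illustrative assumptions, not the VLMCollator implementation):

    import torch

    from kempnerforge.data.dataloader import StatefulDataLoader

    def pad_collate(samples):
        # Hypothetical collator: right-pads each sample's 1-D
        # input_ids tensor to the longest sequence in the batch.
        max_len = max(s["input_ids"].size(0) for s in samples)
        input_ids = torch.zeros(len(samples), max_len, dtype=torch.long)
        for i, s in enumerate(samples):
            seq = s["input_ids"]
            input_ids[i, : seq.size(0)] = seq
        return {"input_ids": input_ids}

    # Tiny stand-in dataset: a plain list of dicts works as a
    # map-style dataset.
    samples = [{"input_ids": torch.arange(n)} for n in (3, 5, 2, 7)]
    loader = StatefulDataLoader(samples, batch_size=2, collate_fn=pad_collate)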

__init__(dataset, batch_size, sampler=None, config=None, collate_fn=None)[source]

Return type:

None

state_dict()[source]

Return checkpoint state. Keys: epoch, batches_yielded, sampler.

Return type:

dict
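
A sketch of folding this state into a training checkpoint (model and optimizer are assumed to come from the surrounding training loop; loader is a StatefulDataLoader):

    import torch

    checkpoint = {
        "model": model.state_dict(),          # from the training loop
        "optimizer": optimizer.state_dict(),  # from the training loop
        # epoch, batches_yielded, and sampler state travel together
        "dataloader": loader.state_dict(),
    }
    torch.save(checkpoint, "checkpoint.pt")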

load_state_dict(state)[source]

Restore from a checkpoint. __iter__ re-applies the sampler skip recorded in _batches_yielded, so resuming twice within the same epoch stays aligned.

Parameters:

state (dict)

Return type:

None
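
Restoring mirrors the save path; a sketch assuming the checkpoint layout from the state_dict() example above:

    import torch

    checkpoint = torch.load("checkpoint.pt")
    loader.load_state_dict(checkpoint["dataloader"])

    # On the next __iter__ the sampler skip is re-applied: the first
    # batches_yielded batches of the saved epoch are skipped, so
    # training resumes mid-epoch exactly where it left off.
    for batch in loader:
        ...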