kempnerforge.training.eval¶
Evaluation utilities for KempnerForge.
Provides run_eval for computing eval loss and perplexity on a held-out dataset. Works with any parallel model (FSDP, TP, PP) — same model reference, no unwrapping needed.
Functions
- kempnerforge.training.eval.should_build_eval_dataloader(eval_enabled, is_vlm)[source]¶
Decide whether to build an eval dataloader and whether to warn.
The training loop calls run_eval(model, eval_dataloader, ...), which invokes model(input_ids). This does not match VLMWrapper.forward(pixel_values, input_ids, labels), so VLM configs with eval.enabled=true would crash on the first eval interval. This helper gates the eval setup: for VLM configs it suppresses eval and flags that a warning should be logged, so users see that their eval setting was ignored. VLM eval support is a tracked follow-up.
- Returns:
(should_build, should_warn_vlm_skip).
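The gating logic described above can be sketched as follows. This is a hypothetical re-implementation based on the docstring, not the library's actual source:

```python
def should_build_eval_dataloader(eval_enabled: bool, is_vlm: bool) -> tuple[bool, bool]:
    """Sketch of the eval-dataloader gate: (should_build, should_warn_vlm_skip)."""
    if not eval_enabled:
        # Eval disabled in the config: nothing to build, nothing to warn about.
        return (False, False)
    if is_vlm:
        # VLM forward signature is incompatible with run_eval's model(input_ids)
        # call, so suppress eval and flag that a warning should be logged.
        return (False, True)
    # Standard LM config with eval enabled: build the eval dataloader.
    return (True, False)
```

The two booleans are kept separate so the caller decides where and how the warning is emitted (e.g. only on rank 0).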
- kempnerforge.training.eval.run_eval(model, eval_dataloader, loss_fn, device, eval_steps, *, pp_schedule=None, pp_rank=None, pp_size=None, pp_group=None)¶
Run evaluation and return metrics.
- Parameters:
model (torch.nn.Module) – The model (FSDP-wrapped, TP-sharded, or plain).
eval_dataloader (torch.utils.data.DataLoader) – DataLoader yielding {"input_ids", "labels"} batches.
loss_fn (callable) – Loss function (logits, labels) -> scalar tensor.
device (torch.device) – Device to move batches to.
eval_steps (int) – Number of eval batches to process.
pp_schedule – Pipeline parallel schedule (None for non-PP).
pp_rank (int | None) – This rank’s PP stage index.
pp_size (int | None) – Total number of PP stages.
pp_group – Process group for PP loss broadcast.
- Returns:
Dict with "eval/loss" and "eval/perplexity".
- Return type:
dict
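For the non-pipeline-parallel case, the eval loop behind run_eval can be sketched roughly as below. This is a simplified illustration of the documented behavior (move batches to the device, call model(input_ids), average the loss over eval_steps batches, exponentiate for perplexity); the actual implementation also handles the PP arguments:

```python
import math

import torch


@torch.no_grad()
def run_eval_sketch(model, eval_dataloader, loss_fn, device, eval_steps):
    """Hypothetical non-PP sketch of run_eval: mean eval loss and perplexity."""
    model.eval()
    total_loss, n_batches = 0.0, 0
    for step, batch in enumerate(eval_dataloader):
        if step >= eval_steps:
            break
        input_ids = batch["input_ids"].to(device)
        labels = batch["labels"].to(device)
        logits = model(input_ids)          # works for FSDP/TP-wrapped or plain modules
        total_loss += loss_fn(logits, labels).item()
        n_batches += 1
    model.train()
    mean_loss = total_loss / max(n_batches, 1)
    return {"eval/loss": mean_loss, "eval/perplexity": math.exp(mean_loss)}
```

Usage mirrors the documented signature: pass the same (possibly wrapped) model used for training, a held-out dataloader, the training loss function, the device, and the number of eval batches; the returned dict can be logged directly under the "eval/" metric namespace.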