kempnerforge.config¶
Configuration system for KempnerForge.
- class kempnerforge.config.CheckpointConfig[source]¶
Bases: object
Checkpointing settings.
- async_mode: AsyncCheckpointMode = 'disabled'¶
- __init__(dir='checkpoints', interval=1000, async_mode=AsyncCheckpointMode.disabled, keep_last_n=3, load_path=None, export_dtype='bfloat16', exclude_from_loading=<factory>)¶
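All fields are plain keyword arguments, so a config can be built directly in Python. A minimal sketch (the import path follows this page; values are illustrative):

    from kempnerforge.config import CheckpointConfig

    # Write a checkpoint every 500 steps and keep only the three newest.
    ckpt = CheckpointConfig(dir="ckpts/run1", interval=500, keep_last_n=3)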
- class kempnerforge.config.DataConfig[source]¶
Bases: object
Data pipeline settings.
- datasets: list[DatasetSource]¶
- phases: list[TrainingPhase]¶
- __init__(dataset_path='', file_pattern='*.npy', tokenizer_path='', num_workers=4, pin_memory=True, prefetch_factor=2, hf_dataset_name=None, hf_dataset_config=None, hf_dataset_split='train', hf_dataset_text_field='text', hf_streaming=False, pack_sequences=False, datasets=<factory>, mix_temperature=1.0, phases=<factory>, anneal_start_step=0, anneal_weights=<factory>)¶
- Parameters:
dataset_path (str)
file_pattern (str)
tokenizer_path (str)
num_workers (int)
pin_memory (bool)
prefetch_factor (int)
hf_dataset_name (str | None)
hf_dataset_config (str | None)
hf_dataset_split (str)
hf_dataset_text_field (str)
hf_streaming (bool)
pack_sequences (bool)
datasets (list[DatasetSource])
mix_temperature (float)
phases (list[TrainingPhase])
anneal_start_step (int)
- Return type:
None
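The hf_* fields switch the pipeline from local .npy shards to a Hugging Face dataset. A sketch (the dataset name is purely illustrative):

    from kempnerforge.config import DataConfig

    # Stream a Hugging Face dataset instead of reading local .npy shards.
    data = DataConfig(
        hf_dataset_name="allenai/c4",   # illustrative choice
        hf_dataset_config="en",
        hf_dataset_split="train",
        hf_dataset_text_field="text",
        hf_streaming=True,
        num_workers=4,
    )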
- class kempnerforge.config.DistributedConfig[source]¶
Bases: object
Parallelism dimensions and distributed settings.
- pp_schedule: PipelineSchedule = '1f1b'¶
- validate_world_size(world_size)[source]¶
Validate that parallelism dimensions match world size.
- Parameters:
world_size (int)
- Return type:
None
- resolve(world_size)[source]¶
Return a copy with dp_shard resolved to a concrete value.
- Parameters:
world_size (int)
- Return type:
DistributedConfig
- __init__(dp_shard=-1, dp_replicate=1, tp=1, pp=1, pp_schedule=PipelineSchedule.schedule_1f1b, cp=1, ep=1, nccl_timeout_sec=1800, backend='cpu:gloo,cuda:nccl')¶
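A typical flow defers dp_shard and fixes it once the world size is known. A sketch, assuming the product of the parallelism dimensions must match the world size (the method names suggest this, but the page does not spell it out):

    from kempnerforge.config import DistributedConfig

    # dp_shard=-1 defers the shard dimension; resolve() makes it concrete
    # for the actual world size, and validate_world_size() checks the fit.
    dist = DistributedConfig(dp_shard=-1, dp_replicate=1, tp=2, pp=1)
    dist = dist.resolve(world_size=16)
    dist.validate_world_size(16)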
- class kempnerforge.config.EvalConfig[source]¶
Bases: object
Evaluation pipeline settings (disabled by default).
- __init__(enabled=False, interval=1000, steps=50, dataset_path='', file_pattern='*.npy', hf_dataset_name=None, hf_dataset_config=None, hf_dataset_split='validation')¶
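Since evaluation is opt-in, a run that wants it only needs to flip the flag. A minimal sketch:

    from kempnerforge.config import EvalConfig

    # Run 50 eval steps every 1000 training steps on the validation split.
    eval_cfg = EvalConfig(enabled=True, interval=1000, steps=50,
                          hf_dataset_split="validation")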
- class kempnerforge.config.JobConfig[source]¶
Bases: object
Top-level configuration aggregating all sub-configs.
- model: ModelConfig¶
- train: TrainConfig¶
- optimizer: OptimizerConfig¶
- scheduler: SchedulerConfig¶
- data: DataConfig¶
- eval: EvalConfig¶
- distributed: DistributedConfig¶
- checkpoint: CheckpointConfig¶
- metrics: MetricsConfig¶
- profiling: ProfilingConfig¶
- validate(world_size=1)[source]¶
Run cross-config validations.
- Parameters:
world_size (int)
- Return type:
None
- __init__(model=<factory>, train=<factory>, optimizer=<factory>, scheduler=<factory>, data=<factory>, eval=<factory>, distributed=<factory>, checkpoint=<factory>, metrics=<factory>, profiling=<factory>)¶
- Parameters:
model (ModelConfig)
train (TrainConfig)
optimizer (OptimizerConfig)
scheduler (SchedulerConfig)
data (DataConfig)
eval (EvalConfig)
distributed (DistributedConfig)
checkpoint (CheckpointConfig)
metrics (MetricsConfig)
profiling (ProfilingConfig)
- Return type:
None
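Every sub-config has a factory default, so only the fields that differ from the defaults need to be specified. A sketch:

    from kempnerforge.config import JobConfig, ModelConfig, TrainConfig

    # Unspecified sub-configs fall back to their factory defaults.
    job = JobConfig(
        model=ModelConfig(dim=1024, n_layers=12, n_heads=16),
        train=TrainConfig(batch_size=4, max_steps=10_000),
    )
    job.validate(world_size=8)  # cross-config checks, e.g. parallelism vs world size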
- class kempnerforge.config.MetricsConfig[source]¶
Bases: object
Logging and metrics settings.
- __init__(log_interval=10, enable_wandb=False, enable_tensorboard=False, wandb_project='kempnerforge', wandb_run_name=None, wandb_run_id='', tensorboard_dir='tb_logs')¶
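A sketch enabling Weights & Biases logging (the run name is illustrative):

    from kempnerforge.config import MetricsConfig

    # Log every 10 steps and mirror metrics to Weights & Biases.
    metrics = MetricsConfig(log_interval=10, enable_wandb=True,
                            wandb_project="kempnerforge",
                            wandb_run_name="baseline-1b")  # illustrative name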
- class kempnerforge.config.ModelConfig[source]¶
Bases: object
Architecture hyperparameters for a transformer model.
- activation: Activation = 'silu'¶
- ffn_hidden_dim: int | None = None¶
FFN hidden dimension, rounded to the nearest multiple of 256 for hardware efficiency.
- property num_params_estimate: int¶
Rough total parameter count estimate (excluding embedding if tied).
For MoE models, counts all expert parameters (total, not active).
- __init__(dim=4096, n_layers=32, n_heads=32, n_kv_heads=None, vocab_size=32000, ffn_dim_multiplier=1.0, ffn_hidden_dim=None, norm_type=NormType.rmsnorm, norm_eps=1e-05, activation=Activation.silu, max_seq_len=2048, rope_theta=10000.0, tie_embeddings=False, qk_norm=False, init_std=0.02, model_type='transformer', sdpa_backend='auto', num_experts=0, moe_top_k=2, moe_frequency=1, moe_router='softmax_topk', moe_shared_experts=0, moe_aux_loss_weight=0.01, moe_capacity_factor=0.0, moe_sequence_aux_loss_weight=0.0, moe_gradient_scale=False, moe_bias_schedule='constant', moe_packed_experts=False)¶
- Parameters:
dim (int)
n_layers (int)
n_heads (int)
n_kv_heads (int | None)
vocab_size (int)
ffn_dim_multiplier (float)
ffn_hidden_dim (int | None)
norm_type (NormType)
norm_eps (float)
activation (Activation)
max_seq_len (int)
rope_theta (float)
tie_embeddings (bool)
qk_norm (bool)
init_std (float)
model_type (str)
sdpa_backend (str)
num_experts (int)
moe_top_k (int)
moe_frequency (int)
moe_router (str)
moe_shared_experts (int)
moe_aux_loss_weight (float)
moe_capacity_factor (float)
moe_sequence_aux_loss_weight (float)
moe_gradient_scale (bool)
moe_bias_schedule (str)
moe_packed_experts (bool)
- Return type:
None
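The moe_* fields turn a dense model into a mixture-of-experts one. A sketch (reading moe_frequency=2 as "an MoE layer every other block" is an assumed interpretation):

    from kempnerforge.config import ModelConfig

    # A small MoE variant: 8 experts with top-2 routing.
    model = ModelConfig(dim=2048, n_layers=24, n_heads=16,
                        num_experts=8, moe_top_k=2, moe_frequency=2)
    print(model.num_params_estimate)  # counts all expert params, not just active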
- class kempnerforge.config.OptimizerConfig[source]¶
Bases: object
Optimizer settings.
- __init__(name='adamw', lr=0.0003, weight_decay=0.1, betas=(0.9, 0.95), eps=1e-08, fused=True, muon_momentum=0.95, muon_ns_steps=5, muon_adam_lr=None, schedule_free_warmup_steps=0)¶
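A sketch for a non-default optimizer; that "muon" is a valid name value is inferred from the muon_* fields, not stated on this page:

    from kempnerforge.config import OptimizerConfig

    # AdamW is the default; the muon_* knobs only apply to Muon.
    opt = OptimizerConfig(name="muon", lr=3e-4,
                          muon_momentum=0.95, muon_ns_steps=5)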
- class kempnerforge.config.SchedulerConfig[source]¶
Bases: object
Learning rate schedule settings.
- name: SchedulerType = 'cosine'¶
- __init__(name=SchedulerType.cosine, warmup_steps=2000, decay_steps=None, min_lr_ratio=0.1, stable_steps=None, wsd_decay_type='cosine', rex_alpha=1.0)¶
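A sketch of the default cosine schedule; the attribute display (name: SchedulerType = 'cosine') suggests plain strings coerce to SchedulerType, but that coercion is an assumption:

    from kempnerforge.config import SchedulerConfig

    # Cosine decay to 10% of peak LR after a 2000-step warmup.
    sched = SchedulerConfig(name="cosine", warmup_steps=2000, min_lr_ratio=0.1)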
- class kempnerforge.config.TrainConfig[source]¶
Bases: object
Training hyperparameters.
- activation_checkpointing: ActivationCheckpointing = 'none'¶
- property param_dtype: torch.dtype¶
Resolve mixed_precision to the master weight dtype.
FP8 uses bf16 master weights, since FP8 is a compute mode rather than a storage dtype.
- __init__(batch_size=8, seq_len=2048, max_steps=100000, grad_accum_steps=1, grad_clip_norm=1.0, seed=42, compile_model=True, mixed_precision='bf16', activation_checkpointing=ActivationCheckpointing.none, loss_fn='cross_entropy', z_loss_weight=0.0, ce_chunk_size=0, shutdown_timeout_sec=600.0, nccl_health_check_interval=0)¶
- Parameters:
batch_size (int)
seq_len (int)
max_steps (int)
grad_accum_steps (int)
grad_clip_norm (float)
seed (int)
compile_model (bool)
mixed_precision (Literal['bf16', 'fp16', 'fp32', 'fp8'])
activation_checkpointing (ActivationCheckpointing)
loss_fn (str)
z_loss_weight (float)
ce_chunk_size (int)
shutdown_timeout_sec (float)
nccl_health_check_interval (int)
- Return type:
None
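The param_dtype property makes the FP8 behavior concrete. A sketch (mixed_precision accepts the string literals listed above):

    import torch
    from kempnerforge.config import TrainConfig

    train = TrainConfig(batch_size=8, seq_len=2048, mixed_precision="fp8")
    # FP8 is a compute mode: master weights stay in bf16.
    assert train.param_dtype == torch.bfloat16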
- kempnerforge.config.load_config(config_path=None, cli_args=None)[source]¶
Load a JobConfig from an optional TOML file plus CLI overrides.
The returned config has all sub-config __post_init__ validations applied. Cross-config validation (e.g., parallelism vs world_size) requires calling config.validate(world_size=…) separately at distributed setup time.
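A typical flow writes a TOML file and loads it at startup. A sketch, assuming the TOML table names mirror the JobConfig field names (this page does not document the exact layout or CLI override syntax):

    # config.toml (assumed layout):
    #   [model]
    #   dim = 1024
    #   [train]
    #   max_steps = 10000

    from kempnerforge.config import load_config

    cfg = load_config(config_path="config.toml")
    cfg.validate(world_size=8)  # cross-config checks run separately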
Modules
- Checkpoint configuration.
- Data pipeline configuration.
- Distributed parallelism configuration.
- Evaluation configuration.
- Top-level job configuration aggregating all sub-configs.
- Config loading: TOML files → dataclass configs with CLI overrides.
- Metrics configuration.
- Model architecture configuration.
- Optimizer configuration.
- Profiling configuration.
- Central registry for named components.
- LR scheduler configuration.
- Backward-compatible re-exports.
- Training configuration.