kempnerforge.model

Model architectures for KempnerForge.

class kempnerforge.model.Transformer

Bases: Module

Full transformer model built from ModelConfig.

Embedding → TransformerBlocks → Norm → Output Head

__init__(config, *, vlm_config=None, num_image_tokens=0)

Parameters:

config (ModelConfig)
vlm_config (VLMConfig | None)
num_image_tokens (int)

Return type:

None

init_weights_and_freqs()

Initialize weights and RoPE frequencies after meta-device materialization.

Call this after model.to_empty(device=...) to fill in parameter values and compute the RoPE frequency table. It is safe to call on an already-initialized model: the call is skipped if the frequencies have already been computed.

Return type:

None
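The "compute once, skip if already computed" behaviour described above can be illustrated with a small stand-alone sketch. This is not the library's code: the class name RopeTable, the builds counter, the base theta = 10000, and the half-dimension layout of the table are all assumptions borrowed from the common RoPE formulation, and plain Python lists stand in for tensors.

```python
import math


class RopeTable:
    """Hypothetical stand-in for a model's cached RoPE cos/sin tables."""

    def __init__(self, head_dim, max_seq_len, theta=10000.0):
        self.head_dim = head_dim
        self.max_seq_len = max_seq_len
        self.theta = theta  # assumed base; the real value comes from the config
        self.cos = None     # left empty until init_freqs() runs
        self.sin = None
        self.builds = 0     # counts real computations, for illustration only

    def init_freqs(self):
        # Idempotent: do nothing if the table already exists.
        if self.cos is not None:
            return
        half = self.head_dim // 2
        # One inverse frequency per rotated pair of channels.
        inv_freq = [self.theta ** (-2.0 * i / self.head_dim) for i in range(half)]
        # Row p holds cos/sin of (position p * each inverse frequency).
        self.cos = [[math.cos(p * f) for f in inv_freq] for p in range(self.max_seq_len)]
        self.sin = [[math.sin(p * f) for f in inv_freq] for p in range(self.max_seq_len)]
        self.builds += 1


table = RopeTable(head_dim=8, max_seq_len=16)
table.init_freqs()
table.init_freqs()  # second call is skipped because the table exists
```

After both calls the table has been built exactly once, with one row per position and head_dim // 2 columns.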

set_moe_step(step, max_steps)

Set training step on all MoE routers for adaptive bias scheduling.

Parameters:

step (int)
max_steps (int)

Return type:

None

get_moe_aux_loss()

Collect auxiliary losses from all MoE layers. Returns 0 if dense.

Return type:

torch.Tensor

get_expert_counts()

Collect per-layer expert utilization. Returns {} if dense.

Return type:

dict[int, torch.Tensor]
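The collection pattern shared by get_moe_aux_loss and get_expert_counts can be pictured with a toy model. Everything named here (ToyMoELayer, ToyDenseLayer, collect_aux_loss, collect_expert_counts) is hypothetical, floats and lists stand in for tensors, and summing the per-layer losses is an assumption: the documentation does not say whether the library sums, averages, or weights them.

```python
from dataclasses import dataclass


@dataclass
class ToyMoELayer:
    aux_loss: float       # stand-in for a scalar loss tensor
    expert_counts: list   # tokens routed to each expert in this layer


@dataclass
class ToyDenseLayer:
    pass                  # dense layers carry no router state


def collect_aux_loss(layers):
    # Sum the auxiliary loss over MoE layers; a dense stack yields 0.
    return sum((l.aux_loss for l in layers if isinstance(l, ToyMoELayer)), 0.0)


def collect_expert_counts(layers):
    # Key each MoE layer's counts by its layer index; dense stacks yield {}.
    return {
        i: l.expert_counts
        for i, l in enumerate(layers)
        if isinstance(l, ToyMoELayer)
    }


mixed = [ToyDenseLayer(), ToyMoELayer(0.25, [3, 1]), ToyDenseLayer(), ToyMoELayer(0.5, [2, 2])]
dense = [ToyDenseLayer(), ToyDenseLayer()]

mixed_loss = collect_aux_loss(mixed)
mixed_counts = collect_expert_counts(mixed)
dense_loss = collect_aux_loss(dense)
dense_counts = collect_expert_counts(dense)
```

In this sketch only layers 1 and 3 contribute, and the all-dense stack falls back to the empty results that the method descriptions mention.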

forward(tokens=None, *, modality=None, kv_caches=None, doc_ids=None)

Forward pass.

Exactly one of tokens or modality.inputs_embeds must be provided. The modality-injection routes (prefix_embeds, output_slice, image_features, image_mask, modality_ids) are grouped on the optional ModalityContext argument; see kempnerforge/model/modality.py for the full intra-context invariant table.

Parameters:

tokens (torch.Tensor | None) – Integer token ids, shape (batch, seq_len).
modality (ModalityContext | None) – Optional ModalityContext bundling pre-embedded inputs, prefix embeds, output slicing, image features, and modality routing tags for VLM arches. Pass None for a pure text-only forward pass.
kv_caches (list[KVCache] | None) – Optional list of KVCache (one per layer) for generation. When provided, RoPE positions are offset by the current cache fill level. Cross-arg invariant: kv_caches forbids modality.prefix_embeds, modality.output_slice, modality.image_features, and modality.modality_ids (all training-only).
doc_ids (torch.Tensor | None) – Optional per-token document IDs for packed sequences, shape (batch, seq_len). Enables block-diagonal causal attention that isolates documents within packed sequences.
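Two of the parameter descriptions above lend themselves to a small illustration: the block-diagonal causal mask implied by doc_ids, and the RoPE position offset applied when a KV cache is partly filled. The sketch below is a plain-Python picture of both ideas for a single sequence (no batch dimension). The helper names block_causal_mask and rope_positions are hypothetical, and the library may build its mask and positions differently.

```python
def block_causal_mask(doc_ids):
    """Return mask[q][k] == True where query q may attend to key k.

    A key is visible only if it is not in the future (k <= q) and
    belongs to the same packed document as the query.
    """
    n = len(doc_ids)
    return [
        [(k <= q) and (doc_ids[k] == doc_ids[q]) for k in range(n)]
        for q in range(n)
    ]


def rope_positions(seq_len, cache_fill):
    """Positions for new tokens, shifted by how many entries the cache holds."""
    return list(range(cache_fill, cache_fill + seq_len))


# Two documents packed into one sequence of five tokens.
doc_ids = [0, 0, 0, 1, 1]
mask = block_causal_mask(doc_ids)

# Decoding three new tokens when the cache already holds five.
positions = rope_positions(seq_len=3, cache_fill=5)
```

Here the first token of the second document (index 3) sees only itself, so nothing leaks across the document boundary, and the new tokens are assigned positions that continue from the cached prefix.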

Returns:

Logits tensor of shape (batch, out_seq_len, vocab_size), where out_seq_len equals seq_len by default, or the sliced length when modality.output_slice is set.

Return type:

torch.Tensor
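The argument invariants stated for forward (exactly one input source, and no training-only modality fields alongside kv_caches) can be written down as a small checker. This is a sketch under assumptions, not the library's validation code: ToyModalityContext and check_forward_args are hypothetical, only the field names are taken from the text above, and raising ValueError is a guess at the error type.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ToyModalityContext:
    """Hypothetical mirror of the fields the documentation lists."""
    inputs_embeds: Optional[Any] = None
    prefix_embeds: Optional[Any] = None
    output_slice: Optional[Any] = None
    image_features: Optional[Any] = None
    image_mask: Optional[Any] = None
    modality_ids: Optional[Any] = None


# Fields the documentation marks as training-only (forbidden with kv_caches).
TRAINING_ONLY = ("prefix_embeds", "output_slice", "image_features", "modality_ids")


def check_forward_args(tokens=None, *, modality=None, kv_caches=None):
    has_tokens = tokens is not None
    has_embeds = modality is not None and modality.inputs_embeds is not None
    # Both set or both missing violates the "exactly one" rule.
    if has_tokens == has_embeds:
        raise ValueError("provide exactly one of tokens or modality.inputs_embeds")
    if kv_caches is not None and modality is not None:
        bad = [name for name in TRAINING_ONLY if getattr(modality, name) is not None]
        if bad:
            raise ValueError(f"kv_caches cannot be combined with: {', '.join(bad)}")


def raises(fn):
    """Small helper: True if calling fn() raises ValueError."""
    try:
        fn()
    except ValueError:
        return True
    return False


ok_text = not raises(lambda: check_forward_args([1, 2, 3]))
ok_embeds = not raises(
    lambda: check_forward_args(modality=ToyModalityContext(inputs_embeds=[[0.1]]))
)
bad_none = raises(lambda: check_forward_args())
bad_prefix_with_cache = raises(
    lambda: check_forward_args(
        [1], modality=ToyModalityContext(prefix_embeds=[[0.1]]), kv_caches=[object()]
    )
)
```

Note that image_mask is deliberately absent from TRAINING_ONLY, since the kv_caches description does not list it among the forbidden fields.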

class kempnerforge.model.TransformerBlock

Bases: Module

Single transformer block with pre-norm architecture.

Structure: norm → attention → residual, norm → mlp → residual
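The pre-norm structure above can be read as two residual updates, each normalizing its input before the sublayer and adding the result back to the unnormalized stream. The sketch below shows only that data flow with scalar stand-ins; pre_norm_block and the toy callables are hypothetical, and RoPE, KV caching, and doc_ids handling are left out.

```python
def pre_norm_block(x, norm1, attention, norm2, mlp):
    # First sublayer: normalize, attend, then add back to the residual stream.
    x = x + attention(norm1(x))
    # Second sublayer: normalize, apply the MLP, then add back again.
    x = x + mlp(norm2(x))
    return x


# Toy scalar stand-ins so the data flow can be traced by hand.
halve = lambda v: v / 2.0       # plays the role of a norm layer
double = lambda v: 2.0 * v      # plays the role of attention
plus_one = lambda v: v + 1.0    # plays the role of the MLP

out = pre_norm_block(1.0, halve, double, halve, plus_one)
```

Tracing by hand: the first update gives 1.0 + double(0.5) = 2.0, and the second gives 2.0 + plus_one(1.0) = 4.0. The residual path never passes through a norm, which is what distinguishes pre-norm from post-norm.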

__init__(config, layer_idx)

Parameters:

config (ModelConfig)
layer_idx (int)

Return type:

None

forward(x, rope_cos, rope_sin, *, kv_cache=None, doc_ids=None)

Parameters:

x (torch.Tensor)
rope_cos (torch.Tensor)
rope_sin (torch.Tensor)
kv_cache (KVCache | None)
doc_ids (torch.Tensor | None)

Return type:

torch.Tensor

Modules

`adapter`	Vision-to-LLM adapter modules.
`attention`	Multi-head attention with Grouped-Query Attention (GQA) support.
`cross_attention`	Cross-attention block for VLM Cross-Attention architecture.
`embedding`	Token embedding and output head for KempnerForge models.
`generate`	Autoregressive text generation with KV-cache.
`hooks`	Activation extraction hooks for mechanistic interpretability.
`init`	Weight initialization strategies for KempnerForge models.
`mlp`	Feed-forward network implementations for KempnerForge models.
`modality`	Modality-injection container for `Transformer.forward`.
`moe`	Mixture-of-Experts feed-forward layer for KempnerForge models.
`mot`	Mixture-of-Transformers (MoT) operator and block.
`norm`	Normalization layers for KempnerForge models.
`position`	Rotary Position Embedding (RoPE) for KempnerForge models.
`router`	MoE router implementations for KempnerForge models.
`transformer`	Transformer model for KempnerForge.
`vision`	Vision encoders for VLM training.
`vlm`	Vision-language model wrapper.