kempnerforge.model.modality
Modality-injection container for Transformer.forward.
ModalityContext groups all “inputs that flow into the existing
residual stream” plus per-token routing tags consumed by the residual
stream, so Transformer.forward stays narrow regardless of how
many architectures are active. Each VLM arch fills the fields it
needs:
- Joint-Decoder fills prefix_embeds + output_slice (image tokens prepended to the text sequence; output_slice trims them off the hidden state before the LM head).
- Cross-Attention fills image_features + image_mask (image K/V flowing into separate cross-attention blocks; the residual stream itself carries text only).
- Mixture-of-Transformers fills prefix_embeds + output_slice + modality_ids. The residual stream carries (image, text) concatenated; modality_ids tags every position with its modality so each layer's MoTBlock can route per-modality projections + global self-attention.
- Pipeline-parallel middle stages fill inputs_embeds (pre-embedded hidden state passed across stage boundaries).
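The prefix-and-trim relationship described for the Joint-Decoder and Mixture-of-Transformers routes can be illustrated with a minimal sketch. Plain Python lists stand in for tensors so the example runs without PyTorch; the token counts, the position labels, and the 0 = image / 1 = text encoding of modality_ids are assumptions made for illustration and are not taken from the library.

```python
# Hypothetical sizes, chosen only for illustration.
NUM_IMAGE_TOKENS = 3
NUM_TEXT_TOKENS = 4

# Stand-ins for per-position hidden states: one label per sequence position.
prefix = [f"img{i}" for i in range(NUM_IMAGE_TOKENS)]
text = [f"txt{i}" for i in range(NUM_TEXT_TOKENS)]

# Joint-Decoder / MoT: image positions are prepended to the text sequence.
residual_stream = prefix + text

# output_slice drops the image prefix before the LM head sees the hidden state.
output_slice = slice(NUM_IMAGE_TOKENS, None)
lm_head_input = residual_stream[output_slice]

# MoT additionally tags every position with its modality
# (assumed encoding: 0 = image, 1 = text).
modality_ids = [0] * NUM_IMAGE_TOKENS + [1] * NUM_TEXT_TOKENS
```

With real tensors the same slice would presumably be applied along the sequence dimension of the hidden state; which dimension that is depends on the library's tensor layout, which this page does not state.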
Cross-arg invariants involving kv_caches (a Transformer.forward
argument, not a ModalityContext field) are enforced at the top of
Transformer.forward, not in __post_init__.
Classes
ModalityContext | Modality-injection container.
- class kempnerforge.model.modality.ModalityContext
Bases: object
Modality-injection container.
Invariants enforced in __post_init__:
- At most one of inputs_embeds, prefix_embeds, image_features may be set; they are mutually exclusive composition routes into the residual stream.
- image_mask requires image_features to be set (a free-standing image_mask with no features is a programming error).
- modality_ids requires prefix_embeds or inputs_embeds to be set (routing without a residual extension is meaningless).
output_slice composes with the tokens path AND with the inputs_embeds path; it is not constrained intra-context. The cross-arg constraint (output_slice vs kv_caches) lives on Transformer.forward instead.
- inputs_embeds: torch.Tensor | None = None
- prefix_embeds: torch.Tensor | None = None
- image_features: torch.Tensor | None = None
- image_mask: torch.Tensor | None = None
- modality_ids: torch.Tensor | None = None
- __init__(inputs_embeds=None, prefix_embeds=None, output_slice=None, image_features=None, image_mask=None, modality_ids=None)
- Parameters:
inputs_embeds (torch.Tensor | None)
prefix_embeds (torch.Tensor | None)
output_slice (slice | None)
image_features (torch.Tensor | None)
image_mask (torch.Tensor | None)
modality_ids (torch.Tensor | None)
- Return type:
None
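The three __post_init__ invariants listed for ModalityContext can be sketched with a stand-in dataclass. This is not the library's implementation: the class name ModalityContextSketch, the use of ValueError, the error messages, and the Any-typed fields (in place of torch.Tensor) are all assumptions made so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ModalityContextSketch:
    """Hypothetical stand-in mirroring the documented fields; not the library class."""

    inputs_embeds: Optional[Any] = None
    prefix_embeds: Optional[Any] = None
    output_slice: Optional[slice] = None
    image_features: Optional[Any] = None
    image_mask: Optional[Any] = None
    modality_ids: Optional[Any] = None

    def __post_init__(self) -> None:
        # Invariant 1: the three composition routes are mutually exclusive.
        routes = {
            "inputs_embeds": self.inputs_embeds,
            "prefix_embeds": self.prefix_embeds,
            "image_features": self.image_features,
        }
        active = [name for name, value in routes.items() if value is not None]
        if len(active) > 1:
            raise ValueError(f"at most one route may be set, got {active}")

        # Invariant 2: a mask without features is a programming error.
        if self.image_mask is not None and self.image_features is None:
            raise ValueError("image_mask requires image_features")

        # Invariant 3: routing tags need a residual extension to tag.
        if (
            self.modality_ids is not None
            and self.prefix_embeds is None
            and self.inputs_embeds is None
        ):
            raise ValueError("modality_ids requires prefix_embeds or inputs_embeds")
        # output_slice is deliberately left unchecked here; the page says its
        # constraint against kv_caches lives on Transformer.forward.


# Placeholder lists stand in for tensors in these valid constructions.
joint_decoder = ModalityContextSketch(prefix_embeds=[0.1], output_slice=slice(1, None))
cross_attention = ModalityContextSketch(image_features=[0.2], image_mask=[True])
```

Validating in __post_init__ means an invalid combination fails at construction time, before the context reaches Transformer.forward.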