kempnerforge.config.vision
Vision-encoder configuration.
VisionEncoderConfig selects and parameterizes the vision encoder
that the VLMWrapper composes alongside the text backbone and
adapter. It is a top-level section in TOML ([vision_encoder]),
sibling to [model], [adapter], and [vlm].
Field summary:
- type selects the encoder by registry key (see registry.register_vision_encoder). Defaults to "random" for tests; production configs set "siglip2" / "clip" etc.
- path is the HF Hub id or local path passed to the encoder builder. Empty string is accepted for stub encoders ("random").
- feature_dim is the output feature dim of the encoder. 0 means “infer from the encoder at build time”.
- num_tokens is the number of image tokens the encoder produces per image. 0 means “infer at build time”. When > 0 it is cross-checked against model.max_seq_len at config time inside JobConfig.__post_init__.
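The field semantics above can be sketched with a small stand-in dataclass. This is not the real VisionEncoderConfig: the class name VisionEncoderConfigSketch and the helper check_num_tokens are hypothetical. The exact rule applied in JobConfig.__post_init__ is not stated here, so the comparison below (image tokens must leave room for at least one text token) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class VisionEncoderConfigSketch:
    """Illustrative stand-in, not the real kempnerforge class."""

    type: str = "random"  # registry key; "random" is the default used for tests
    path: str = ""        # HF Hub id or local path; may be empty for stub encoders
    feature_dim: int = 0  # 0 = infer from the encoder at build time
    num_tokens: int = 0   # 0 = infer at build time


def check_num_tokens(vision: VisionEncoderConfigSketch, max_seq_len: int) -> None:
    """Hypothetical cross-check; the real rule may differ.

    A value of 0 is skipped because the token count is only known at build time.
    """
    if vision.num_tokens > 0 and vision.num_tokens >= max_seq_len:
        raise ValueError(
            f"num_tokens={vision.num_tokens} does not fit in "
            f"max_seq_len={max_seq_len}"
        )


# Defaults pass: num_tokens == 0 means the check is deferred.
check_num_tokens(VisionEncoderConfigSketch(), max_seq_len=16)

# An explicit token count is validated at config time.
check_num_tokens(VisionEncoderConfigSketch(type="clip", num_tokens=8), max_seq_len=16)
```

Running the check at config time means a mismatched token count surfaces before any encoder weights are loaded, which is the likely motivation for validating only explicit (non-zero) values.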
Classes
VisionEncoderConfig | Configuration for the vision encoder component of a VLM.