kempnerforge.model.adapter¶
Vision-to-LLM adapter modules.
The adapter projects image features (shape (B, num_tokens, feature_dim))
into the LLM embedding space (shape (B, num_tokens, model.dim)). It sits
between the vision encoder and the transformer in VLMWrapper.
Adapters register themselves under the adapter registry category. The
default is mlp_2layer (a 2-layer MLP, the canonical adapter shape across
LLaVA-family papers). linear is a single nn.Linear with no
activation, useful for ablations.
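The shape contract above can be sketched as follows. This is a minimal illustration, not the library's API: the dimensions are made up, and a plain nn.Linear stands in for whatever adapter the registry would build.

```python
import torch
import torch.nn as nn

# Illustrative shapes (not library defaults): 576 vision tokens of
# dim 1024 projected into a 4096-dim LLM embedding space.
B, num_tokens, feature_dim, model_dim = 2, 576, 1024, 4096

# Stand-in for a registered adapter (e.g. "mlp_2layer" or "linear").
adapter = nn.Linear(feature_dim, model_dim)

image_features = torch.randn(B, num_tokens, feature_dim)
llm_tokens = adapter(image_features)
print(llm_tokens.shape)  # torch.Size([2, 576, 4096])
```

Whatever adapter is built, only the last dimension changes: the batch and token dimensions pass through untouched.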
Functions
- Dispatch to the registered adapter builder.
Classes
- LinearAdapter: Single nn.Linear from image-feature dim to LLM embedding dim.
- MLP2LayerAdapter: 2-layer MLP from image-feature dim to LLM embedding dim.
- class kempnerforge.model.adapter.MLP2LayerAdapter[source]¶
Bases: Module
2-layer MLP from image-feature dim to LLM embedding dim.
Architecture: Linear(in_dim, hidden) -> activation -> Linear(hidden, out_dim). hidden_dim=None defaults to out_dim. reset_parameters is provided so callers that materialize adapters from meta can re-initialize weights with the standard Linear defaults.
- reset_parameters()[source]¶
Re-run nn.Linear default init on both projections. Used after to_empty(device=...) on a meta-device build.
- Return type:
None
- forward(x)[source]¶
- Parameters:
x (torch.Tensor)
- Return type:
torch.Tensor
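The documented architecture and meta-device flow can be sketched as below. This is an assumed reimplementation, not the library source: the constructor signature, the activation choice (GELU), and the attribute names are guesses based only on the description above.

```python
import torch
import torch.nn as nn

class MLP2LayerSketch(nn.Module):
    """Sketch of the documented 2-layer MLP adapter (names assumed)."""

    def __init__(self, in_dim, out_dim, hidden_dim=None, activation=nn.GELU):
        super().__init__()
        # Per the docs: hidden_dim=None defaults to out_dim.
        hidden_dim = out_dim if hidden_dim is None else hidden_dim
        self.fc1 = nn.Linear(in_dim, hidden_dim)
        self.act = activation()
        self.fc2 = nn.Linear(hidden_dim, out_dim)

    def reset_parameters(self):
        # Re-run nn.Linear's default init on both projections; needed
        # after to_empty(), where storage exists but is uninitialized.
        self.fc1.reset_parameters()
        self.fc2.reset_parameters()

    def forward(self, x):
        return self.fc2(self.act(self.fc1(x)))

# The meta-device flow the docstring refers to:
adapter = MLP2LayerSketch(1024, 4096).to("meta")   # shapes only, no storage
materialized = adapter.to_empty(device="cpu")      # allocate uninitialized storage
materialized.reset_parameters()                    # restore standard Linear init
out = materialized(torch.randn(2, 16, 1024))
print(out.shape)  # torch.Size([2, 16, 4096])
```

Building on the meta device first lets VLMWrapper-style code assemble a large model cheaply and only allocate (and initialize) real weights where they are needed.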
- class kempnerforge.model.adapter.LinearAdapter[source]¶
Bases: Module
Single nn.Linear from image-feature dim to LLM embedding dim.
No activation, no hidden layer. Useful as an ablation baseline against MLP2LayerAdapter.
- forward(x)[source]¶
- Parameters:
x (torch.Tensor)
- Return type:
torch.Tensor
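To illustrate what the ablation removes, here is a hedged sketch comparing parameter counts. The class and helper names are hypothetical, and the GELU in the MLP comparison is an assumption; only the "single projection vs. two projections" structure comes from the docs.

```python
import torch.nn as nn

class LinearSketch(nn.Module):
    """Sketch of the ablation baseline: one projection, no activation."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, x):
        return self.proj(x)

def n_params(module):
    return sum(p.numel() for p in module.parameters())

linear = LinearSketch(1024, 4096)
# Comparable 2-layer MLP with hidden_dim == out_dim (activation assumed).
mlp = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 4096))

print(n_params(linear))  # 1024*4096 + 4096 = 4198400
print(n_params(mlp))     # adds a 4096x4096 projection: 20979712
```

The baseline drops the hidden-layer projection entirely, so any quality gap between the two isolates the contribution of the extra nonlinearity and capacity.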