kempnerforge.model.adapter

Vision-to-LLM adapter modules.

The adapter projects image features (shape (B, num_tokens, feature_dim)) into the LLM embedding space (shape (B, num_tokens, model.dim)). It sits between the vision encoder and the transformer in VLMWrapper.

Adapters register themselves under the adapter registry category. The default, mlp_2layer, is a 2-layer MLP, the canonical adapter shape across LLaVA-family papers. The alternative, linear, is a single nn.Linear with no activation, useful for ablations.

Functions

build_adapter(adapter_config, in_dim, out_dim)

Dispatch to the registered adapter builder.

Classes

`LinearAdapter`	Single `nn.Linear` from image-feature dim to LLM embedding dim.
`MLP2LayerAdapter`	2-layer MLP from image-feature dim to LLM embedding dim.

class kempnerforge.model.adapter.MLP2LayerAdapter

Bases: Module

2-layer MLP from image-feature dim to LLM embedding dim.

Architecture: Linear(in_dim, hidden_dim) -> activation -> Linear(hidden_dim, out_dim). When hidden_dim is None, it defaults to out_dim.

reset_parameters is provided so that callers who materialize an adapter from the meta device can re-initialize its weights with the standard nn.Linear defaults.
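The architecture described above can be sketched as follows. This is a minimal illustration, not the kempnerforge source: the class name SketchMLP2LayerAdapter, the attribute names fc1, act, and fc2, and the mapping of the 'gelu' activation to nn.GELU are all assumptions made for the example.

```python
from typing import Optional

import torch
from torch import nn


class SketchMLP2LayerAdapter(nn.Module):
    """Illustrative stand-in; names and details are assumptions."""

    def __init__(self, in_dim: int, out_dim: int, hidden_dim: Optional[int] = None):
        super().__init__()
        # hidden_dim=None falls back to out_dim, as the docstring states.
        hidden = out_dim if hidden_dim is None else hidden_dim
        self.fc1 = nn.Linear(in_dim, hidden)
        self.act = nn.GELU()  # assumption: activation='gelu' means nn.GELU
        self.fc2 = nn.Linear(hidden, out_dim)

    def reset_parameters(self) -> None:
        # Re-run nn.Linear's default init on both projections.
        self.fc1.reset_parameters()
        self.fc2.reset_parameters()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Linear -> activation -> Linear, applied over the last dimension.
        return self.fc2(self.act(self.fc1(x)))


adapter = SketchMLP2LayerAdapter(in_dim=32, out_dim=64)
features = torch.randn(2, 16, 32)  # (B, num_tokens, feature_dim)
projected = adapter(features)      # (B, num_tokens, out_dim)
```

Because nn.Linear acts on the last dimension only, the batch and token dimensions pass through unchanged.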

__init__(in_dim, out_dim, hidden_dim=None, activation='gelu')

Parameters:

in_dim (int)
out_dim (int)
hidden_dim (int | None)
activation (str)

Return type:

None

reset_parameters()

Re-run nn.Linear default init on both projections.

Call this after to_empty(device=...) when the adapter was built on the meta device, since to_empty allocates storage without initializing it.

Return type: None
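The meta-device workflow that reset_parameters supports might look like the sketch below. It uses a plain nn.Linear as a stand-in for the adapter, since nn.Linear exposes the same reset_parameters hook; the sizes and the "cpu" target device are arbitrary choices for the example.

```python
import torch
from torch import nn

# Stand-in for an adapter: build on the meta device, so shapes exist
# but no real storage is allocated.
layer = nn.Linear(32, 64, device="meta")
was_meta = layer.weight.is_meta

# Allocate uninitialized storage on the target device.
layer = layer.to_empty(device="cpu")

# Parameter values are undefined at this point, so re-run the default init.
layer.reset_parameters()
```

Skipping the final call would leave the weights holding whatever happened to be in the newly allocated memory.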

forward(x)

Parameters: x (torch.Tensor)
Return type: torch.Tensor

class kempnerforge.model.adapter.LinearAdapter

Bases: Module

Single nn.Linear from image-feature dim to LLM embedding dim.

No activation, no hidden layer. Useful as an ablation baseline against MLP2LayerAdapter.
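Since this adapter is described as a single nn.Linear, its shape contract can be illustrated with nn.Linear directly. The sizes below are arbitrary, and the snippet is a stand-in rather than the LinearAdapter class itself.

```python
import torch
from torch import nn

proj = nn.Linear(32, 64)           # in_dim=32, out_dim=64 (illustrative sizes)
features = torch.randn(2, 16, 32)  # (B, num_tokens, in_dim)
out = proj(features)               # (B, num_tokens, out_dim)

# With no activation, the mapping is purely affine: x @ W^T + b.
manual = features @ proj.weight.T + proj.bias
```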

__init__(in_dim, out_dim)

Parameters:

in_dim (int)
out_dim (int)

Return type:

None

reset_parameters()

Return type: None

forward(x)

Parameters: x (torch.Tensor)
Return type: torch.Tensor

kempnerforge.model.adapter.build_adapter(adapter_config, in_dim, out_dim)

Dispatch to the registered adapter builder.

Parameters:

adapter_config – AdapterConfig (or compatible object exposing type and extra_kwargs()).
in_dim (int) – Source feature dim (the vision encoder’s feature_dim).
out_dim (int) – Target embedding dim (the transformer’s dim).

Returns:

An nn.Module with signature (B, N, in_dim) -> (B, N, out_dim).

Return type:

torch.nn.Module
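One way such a dispatch could work is sketched below. Everything here is hypothetical and written only to illustrate the pattern: the _ADAPTER_BUILDERS dict, the register_adapter decorator, SketchAdapterConfig, and sketch_build_adapter are not kempnerforge names, and the real registry may be organized differently. The only details taken from the documentation are that the config exposes type and extra_kwargs(), and that the registered names are mlp_2layer and linear.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

import torch
from torch import nn

# Hypothetical registry: adapter type name -> builder callable.
_ADAPTER_BUILDERS: Dict[str, Callable[..., nn.Module]] = {}


def register_adapter(name: str):
    """Hypothetical decorator that records a builder under `name`."""
    def decorator(builder):
        _ADAPTER_BUILDERS[name] = builder
        return builder
    return decorator


@register_adapter("linear")
def _build_linear(in_dim, out_dim):
    return nn.Linear(in_dim, out_dim)


@register_adapter("mlp_2layer")
def _build_mlp(in_dim, out_dim, hidden_dim=None):
    hidden = out_dim if hidden_dim is None else hidden_dim
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.GELU(), nn.Linear(hidden, out_dim)
    )


@dataclass
class SketchAdapterConfig:
    """Hypothetical config exposing `type` and `extra_kwargs()`."""
    type: str = "mlp_2layer"
    kwargs: Dict[str, Any] = field(default_factory=dict)

    def extra_kwargs(self) -> Dict[str, Any]:
        return dict(self.kwargs)


def sketch_build_adapter(adapter_config, in_dim: int, out_dim: int) -> nn.Module:
    # Look up the builder by type name and forward any extra kwargs.
    builder = _ADAPTER_BUILDERS[adapter_config.type]
    return builder(in_dim, out_dim, **adapter_config.extra_kwargs())


config = SketchAdapterConfig(type="mlp_2layer", kwargs={"hidden_dim": 128})
built = sketch_build_adapter(config, in_dim=32, out_dim=64)
result = built(torch.randn(2, 16, 32))  # (B, N, in_dim) -> (B, N, out_dim)
```

Keeping the lookup keyed on a string lets a configuration file select the adapter without the caller importing a specific class.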