kempnerforge.model.adapter

Vision-to-LLM adapter modules.

The adapter projects image features (shape (B, num_tokens, feature_dim)) into the LLM embedding space (shape (B, num_tokens, model.dim)). It sits between the vision encoder and the transformer in VLMWrapper.
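A shape-level sketch of that contract (the dimensions below are illustrative, not library defaults):

    import torch

    from kempnerforge.model.adapter import MLP2LayerAdapter

    # e.g. a 1024-dim vision feature space projected into a 4096-dim LLM embedding space
    adapter = MLP2LayerAdapter(in_dim=1024, out_dim=4096)

    image_features = torch.randn(2, 256, 1024)   # (B, num_tokens, feature_dim)
    embeddings = adapter(image_features)         # (B, num_tokens, model.dim) -> torch.Size([2, 256, 4096])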

Adapters register themselves under the adapter registry category. The default is mlp_2layer (a 2-layer MLP, the canonical adapter shape across LLaVA-family papers). linear is a single nn.Linear with no activation, useful for ablations.

Functions

build_adapter(adapter_config, in_dim, out_dim)

Dispatch to the registered adapter builder.

Classes

LinearAdapter

Single nn.Linear from image-feature dim to LLM embedding dim.

MLP2LayerAdapter

2-layer MLP from image-feature dim to LLM embedding dim.

class kempnerforge.model.adapter.MLP2LayerAdapter[source]

Bases: Module

2-layer MLP from image-feature dim to LLM embedding dim.

Architecture: Linear(in_dim, hidden_dim) -> activation -> Linear(hidden_dim, out_dim). If hidden_dim is None, it defaults to out_dim.

reset_parameters is provided so that callers materializing adapters built on the meta device can re-initialize weights with the standard nn.Linear defaults.
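A minimal reference sketch of that architecture (illustrative only, not the library implementation; details such as activation handling may differ):

    import torch.nn as nn

    class _MLP2LayerSketch(nn.Module):
        # Stand-in for MLP2LayerAdapter, shown only to make the structure concrete.
        def __init__(self, in_dim, out_dim, hidden_dim=None, activation="gelu"):
            super().__init__()
            hidden_dim = out_dim if hidden_dim is None else hidden_dim
            self.fc1 = nn.Linear(in_dim, hidden_dim)
            self.act = nn.GELU() if activation == "gelu" else nn.ReLU()
            self.fc2 = nn.Linear(hidden_dim, out_dim)

        def reset_parameters(self):
            # Re-run nn.Linear default init on both projections.
            self.fc1.reset_parameters()
            self.fc2.reset_parameters()

        def forward(self, x):
            return self.fc2(self.act(self.fc1(x)))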

__init__(in_dim, out_dim, hidden_dim=None, activation='gelu')[source]
Parameters:
  • in_dim (int)

  • out_dim (int)

  • hidden_dim (int | None)

  • activation (str)

Return type:

None

reset_parameters()[source]

Re-run nn.Linear default init on both projections.

Used after to_empty(device=...) on a meta-device build.

Return type:

None
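For example, the meta-device materialization pattern described above looks roughly like this (the device and dimensions are illustrative):

    import torch

    from kempnerforge.model.adapter import MLP2LayerAdapter

    # Build on the meta device (no storage allocated), then materialize and re-init.
    with torch.device("meta"):
        adapter = MLP2LayerAdapter(in_dim=1024, out_dim=4096)

    adapter.to_empty(device="cuda")   # allocate real, uninitialized storage
    adapter.reset_parameters()        # restore nn.Linear default init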

forward(x)[source]
Parameters:

x (torch.Tensor)

Return type:

torch.Tensor

class kempnerforge.model.adapter.LinearAdapter[source]

Bases: Module

Single nn.Linear from image-feature dim to LLM embedding dim.

No activation, no hidden layer. Useful as an ablation baseline against MLP2LayerAdapter.
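A minimal sketch of the idea (illustrative only, not the library implementation):

    import torch.nn as nn

    class _LinearSketch(nn.Module):
        # Stand-in for LinearAdapter: a single projection, nothing else.
        def __init__(self, in_dim, out_dim):
            super().__init__()
            self.proj = nn.Linear(in_dim, out_dim)

        def forward(self, x):
            return self.proj(x)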

__init__(in_dim, out_dim)[source]
Parameters:
  • in_dim (int)

  • out_dim (int)

Return type:

None

reset_parameters()[source]

Re-run nn.Linear default init on the projection.

Return type:

None

forward(x)[source]
Parameters:

x (torch.Tensor)

Return type:

torch.Tensor

kempnerforge.model.adapter.build_adapter(adapter_config, in_dim, out_dim)[source]

Dispatch to the registered adapter builder.

Parameters:
  • adapter_config – AdapterConfig (or compatible object exposing type and extra_kwargs()).

  • in_dim (int) – Source feature dim (the vision encoder’s feature_dim).

  • out_dim (int) – Target embedding dim (the transformer’s dim).

Returns:

An nn.Module that maps inputs of shape (B, N, in_dim) to outputs of shape (B, N, out_dim).

Return type:

torch.nn.Module
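For example, any object exposing type and extra_kwargs() can drive the dispatch. The stand-in config below is only for illustration; the real AdapterConfig may have a different constructor, and the assumption that extra_kwargs() is forwarded to the adapter's keyword arguments is ours:

    from dataclasses import dataclass, field

    from kempnerforge.model.adapter import build_adapter

    @dataclass
    class _StandInConfig:
        # Minimal object satisfying the documented interface; not the real AdapterConfig.
        type: str
        kwargs: dict = field(default_factory=dict)

        def extra_kwargs(self):
            return self.kwargs

    adapter = build_adapter(_StandInConfig(type="mlp_2layer"), in_dim=1024, out_dim=4096)
    baseline = build_adapter(_StandInConfig(type="linear"), in_dim=1024, out_dim=4096)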