kempnerforge.model.mlp

Feed-forward network implementations for KempnerForge models.

Functions

build_mlp(dim, hidden_dim[, activation])

Build an MLP by activation name.

Classes

StandardMLP

Standard two-layer MLP with configurable activation.

SwiGLUMLP

SwiGLU feed-forward network (Llama-style).

class kempnerforge.model.mlp.SwiGLUMLP[source]

Bases: Module

SwiGLU feed-forward network (Llama-style).

Architecture: gate_proj + up_proj → SiLU(gate) * up → down_proj

Uses 3 weight matrices instead of 2, with SiLU gating.

__init__(dim, hidden_dim)[source]
Parameters:
  • dim (int)

  • hidden_dim (int)

Return type:

None

forward(x)[source]
Parameters:

x (torch.Tensor)

Return type:

torch.Tensor
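The gated computation above can be sketched as a minimal PyTorch module. This is an illustrative re-implementation, not the KempnerForge source; the class name SwiGLUSketch and the bias-free linear layers are assumptions (Llama-style SwiGLU blocks conventionally omit biases).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SwiGLUSketch(nn.Module):
    """Illustrative SwiGLU block: SiLU(gate_proj(x)) * up_proj(x) -> down_proj."""

    def __init__(self, dim: int, hidden_dim: int) -> None:
        super().__init__()
        # Three weight matrices: two project up (gate and value), one projects back down.
        self.gate_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.up_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.down_proj = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SiLU-activated gate modulates the up projection elementwise.
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))


mlp = SwiGLUSketch(dim=8, hidden_dim=32)
out = mlp(torch.randn(2, 4, 8))  # (batch, seq, dim) in, same shape out
```

The input and output dimensions match, so the block can be dropped into a residual stream.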

class kempnerforge.model.mlp.StandardMLP[source]

Bases: Module

Standard two-layer MLP with configurable activation.

Architecture: linear → activation → linear

__init__(dim, hidden_dim, activation='gelu')[source]
Parameters:
  • dim (int)

  • hidden_dim (int)

  • activation (str)

Return type:

None

forward(x)[source]
Parameters:

x (torch.Tensor)

Return type:

torch.Tensor
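The standard variant is a plain two-layer stack. Again a hedged sketch rather than the library code; the name StandardMLPSketch and the particular set of supported activation names are assumptions for illustration.

```python
import torch
import torch.nn as nn


class StandardMLPSketch(nn.Module):
    """Illustrative two-layer MLP: linear -> activation -> linear."""

    def __init__(self, dim: int, hidden_dim: int, activation: str = "gelu") -> None:
        super().__init__()
        # Map activation names to modules; supported names here are illustrative.
        acts = {"gelu": nn.GELU(), "relu": nn.ReLU(), "tanh": nn.Tanh()}
        self.net = nn.Sequential(
            nn.Linear(dim, hidden_dim),
            acts[activation],
            nn.Linear(hidden_dim, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


mlp = StandardMLPSketch(dim=8, hidden_dim=32, activation="gelu")
y = mlp(torch.randn(2, 8))  # output has the same trailing dimension as the input
```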

kempnerforge.model.mlp.build_mlp(dim, hidden_dim, activation='silu')[source]

Build an MLP by activation name.

SiLU activation uses SwiGLU (3 matrices); others use standard MLP (2 matrices).

Parameters:
  • dim (int)

  • hidden_dim (int)

  • activation (str)

Return type:

torch.nn.Module
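The dispatch rule ("silu" → SwiGLU, anything else → standard MLP) can be sketched as follows. The function build_mlp_sketch and the helper class _SwiGLU are hypothetical stand-ins, not the KempnerForge implementation.

```python
import torch.nn as nn
import torch.nn.functional as F


class _SwiGLU(nn.Module):
    """Minimal SwiGLU block (3 weight matrices) used by the sketch below."""

    def __init__(self, dim: int, hidden_dim: int) -> None:
        super().__init__()
        self.gate = nn.Linear(dim, hidden_dim, bias=False)
        self.up = nn.Linear(dim, hidden_dim, bias=False)
        self.down = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))


def build_mlp_sketch(dim: int, hidden_dim: int, activation: str = "silu") -> nn.Module:
    # "silu" selects the gated SwiGLU variant (3 matrices);
    # any other supported name selects a plain two-layer MLP (2 matrices).
    if activation == "silu":
        return _SwiGLU(dim, hidden_dim)
    acts = {"gelu": nn.GELU(), "relu": nn.ReLU()}
    return nn.Sequential(nn.Linear(dim, hidden_dim), acts[activation], nn.Linear(hidden_dim, dim))


gated = build_mlp_sketch(8, 32, "silu")    # SwiGLU path
plain = build_mlp_sketch(8, 32, "gelu")    # standard two-layer path
```

Note that for the same hidden_dim, the SwiGLU path carries roughly 1.5x the parameters of the standard path; frameworks often shrink hidden_dim for SwiGLU to keep parameter counts comparable.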