kempnerforge.model.embedding

Token embedding and output head for KempnerForge models.

Classes

OutputHead

Linear output projection from hidden dim to vocab size.

TokenEmbedding

Token embedding layer.

class kempnerforge.model.embedding.TokenEmbedding[source]

Bases: Module

Token embedding layer.

Can be disabled (returning its input unchanged) for the middle stages of a pipeline-parallel setup, where the embedding lives on a different stage.

__init__(vocab_size, dim)[source]

Parameters:

vocab_size (int) – Size of the vocabulary.

dim (int) – Embedding dimension.

Return type:

None

forward(tokens)[source]

Embed token ids to vectors.

Parameters:

tokens (torch.Tensor) – Integer tensor of shape (batch, seq_len).

Returns:

Tensor of shape (batch, seq_len, dim).

Return type:

torch.Tensor
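The behavior documented above can be approximated with a minimal sketch. Note the `enabled` flag and the internal `nn.Embedding` attribute are illustrative assumptions; this reference does not specify how the disable mechanism is implemented:

```python
import torch
import torch.nn as nn


class TokenEmbedding(nn.Module):
    """Minimal sketch: token embedding with an optional pass-through mode."""

    def __init__(self, vocab_size: int, dim: int) -> None:
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, dim)
        # Assumed flag: middle pipeline stages set this to False so the
        # module passes already-embedded activations through unchanged.
        self.enabled = True

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        if not self.enabled:
            return tokens
        # (batch, seq_len) int64 ids -> (batch, seq_len, dim) float vectors
        return self.embedding(tokens)


emb = TokenEmbedding(vocab_size=100, dim=16)
out = emb(torch.randint(0, 100, (2, 8)))
print(out.shape)  # torch.Size([2, 8, 16])
```

The pass-through path is what lets the same model definition run on every pipeline stage without conditional wiring.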

class kempnerforge.model.embedding.OutputHead[source]

Bases: Module

Linear output projection from hidden dim to vocab size.

Produces logits (no softmax). Can optionally share weights with an embedding layer.

__init__(dim, vocab_size)[source]

Parameters:

dim (int) – Hidden dimension.

vocab_size (int) – Size of the vocabulary.

Return type:

None

forward(x)[source]

Project hidden states to logits.

Parameters:

x (torch.Tensor) – Tensor of shape (batch, seq_len, dim).

Returns:

Logits tensor of shape (batch, seq_len, vocab_size).

Return type:

torch.Tensor

tie_weights(embedding)[source]

Share the output projection weight with the embedding layer.

Parameters:

embedding (TokenEmbedding) – Embedding layer whose weight matrix will be shared with the output projection.

Return type:

None
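Weight tying as described above can be sketched as follows. The attribute name `proj` and the use of a bias-free `nn.Linear` are assumptions for illustration, and the sketch ties against a plain `nn.Embedding` rather than the `TokenEmbedding` class:

```python
import torch
import torch.nn as nn


class OutputHead(nn.Module):
    """Minimal sketch: linear projection from hidden dim to vocab logits."""

    def __init__(self, dim: int, vocab_size: int) -> None:
        super().__init__()
        # No softmax: downstream loss functions expect raw logits.
        self.proj = nn.Linear(dim, vocab_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # (batch, seq_len, dim) -> (batch, seq_len, vocab_size)
        return self.proj(x)

    def tie_weights(self, embedding: nn.Embedding) -> None:
        # nn.Embedding.weight is (vocab_size, dim) and nn.Linear.weight is
        # (out_features, in_features) = (vocab_size, dim), so the shapes
        # match and the two modules can share one parameter tensor.
        self.proj.weight = embedding.weight


head = OutputHead(dim=16, vocab_size=100)
emb = nn.Embedding(100, 16)
head.tie_weights(emb)
logits = head(torch.randn(2, 8, 16))
print(logits.shape)               # torch.Size([2, 8, 100])
print(head.proj.weight is emb.weight)  # True
```

Because the parameter object itself is shared, gradients from both the embedding lookup and the output projection accumulate into the same tensor during training.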