kempnerforge.model.init

Weight initialization strategies for KempnerForge models.

Functions

init_weights(model, config)

Apply standard initialization to all parameters in a model.

kempnerforge.model.init.init_weights(model, config)

Apply standard initialization to all parameters in a model.

Strategy (following GPT-2/Llama conventions):

  • Linear layers: normal(0, 0.02)

  • Embedding layers: normal(0, 0.02)

  • Residual output projections (o_proj, down_proj): scaled by 1/sqrt(2 * n_layers)

  • Cross-attention block residuals: zero-initialized (identity-at-init warm-start)

  • MoT per-modality residual projections: zero-initialized (identity-at-construction)

  • Norm layers: weight=1 (already default)
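To make the residual scaling concrete, here is a minimal standard-library sketch of the standard deviations the list above implies. It assumes that "scaled by 1/sqrt(2 * n_layers)" means the base std of 0.02 is multiplied by that factor, as in GPT-2. The helper `residual_std` and the constant `BASE_STD` are hypothetical names used only for illustration and are not part of `kempnerforge.model.init`.

```python
import math

# Std for linear and embedding weights, taken from the strategy list.
BASE_STD = 0.02


def residual_std(n_layers: int, base_std: float = BASE_STD) -> float:
    """Std for residual output projections (o_proj, down_proj).

    Assumption: the base std is multiplied by 1/sqrt(2 * n_layers),
    following the GPT-2 convention. Hypothetical helper, not the
    library's API.
    """
    if n_layers < 1:
        raise ValueError("n_layers must be >= 1")
    return base_std / math.sqrt(2 * n_layers)


# Deeper models get a smaller std on their residual output projections.
for n_layers in (12, 32):
    print(f"n_layers={n_layers}: residual std = {residual_std(n_layers):.6f}")
```

Under this reading, each block adds two residual contributions (attention and MLP), so dividing by sqrt(2 * n_layers) keeps the variance of the summed residual stream roughly independent of depth at initialization.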

Parameters:

  • model

  • config

Return type:

None