kempnerforge.model.init
Weight initialization strategies for KempnerForge models.
Functions
init_weights(model, config) | Apply standard initialization to all parameters in a model.
- kempnerforge.model.init.init_weights(model, config)
Apply standard initialization to all parameters in a model.
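Strategy (following GPT-2/Llama conventions):
- Linear layers: weights drawn from normal(0, 0.02).
- Embedding layers: weights drawn from normal(0, 0.02).
- Residual output projections (o_proj, down_proj): standard deviation scaled by 1/sqrt(2 * n_layers), i.e. normal(0, 0.02 / sqrt(2 * n_layers)).
- Cross-attention block residuals: zero-initialized, so each cross-attention block is an identity map at initialization (warm start).
- MoT per-modality residual projections: zero-initialized, so these branches are an identity map at construction.
- Norm layers: weight = 1 (already the default).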
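The strategy above amounts to a mapping from each parameter to a distribution. The sketch below is a minimal, stdlib-only illustration of that mapping, not the KempnerForge implementation: it assumes parameters are identified by dotted names, and the name fragments `cross_attn`, `mot` and `norm` (and the helper `init_rule`) are hypothetical. The factor 2 in the depth scaling reflects two residual additions per transformer layer (attention and MLP); for example, with n_layers = 32 the residual projections get std 0.02 / sqrt(64) = 0.0025.

```python
import math
from typing import Tuple

BASE_STD = 0.02  # normal(0, 0.02) for Linear and Embedding weights

# Hypothetical name fragments; KempnerForge's actual module names may differ.
RESIDUAL_OUT_PROJ = ("o_proj", "down_proj")
ZERO_INIT_TAGS = ("cross_attn", "mot")  # cross-attention and MoT residual branches
NORM_TAG = "norm"


def init_rule(param_name: str, n_layers: int) -> Tuple[str, float]:
    """Map a dotted parameter name to (distribution, value).

    Returns ("normal", std), ("zeros", 0.0) or ("ones", 1.0).
    """
    parts = param_name.split(".")

    # Norm weights stay at 1.
    if any(NORM_TAG in p for p in parts):
        return ("ones", 1.0)

    is_residual_out = any(p in RESIDUAL_OUT_PROJ for p in parts)

    # Residual output of a cross-attention or MoT branch: zero, so the
    # branch adds nothing and the block is an identity map at init.
    if is_residual_out and any(p.startswith(ZERO_INIT_TAGS) for p in parts):
        return ("zeros", 0.0)

    # GPT-2-style depth scaling: two residual additions per layer.
    if is_residual_out:
        return ("normal", BASE_STD / math.sqrt(2 * n_layers))

    # Everything else (Linear, Embedding): plain normal(0, 0.02).
    return ("normal", BASE_STD)


if __name__ == "__main__":
    for name in (
        "tok_embeddings.weight",
        "layers.0.attention.o_proj.weight",
        "layers.0.cross_attn.o_proj.weight",
        "layers.0.attention_norm.weight",
    ):
        print(name, init_rule(name, n_layers=32))
```

In PyTorch, such a rule could be applied under `torch.no_grad()` by iterating `model.named_parameters()` and calling `torch.nn.init.normal_(p, mean=0.0, std=value)`, `torch.nn.init.zeros_(p)` or `torch.nn.init.ones_(p)`. Matching on names keeps all rules in one function; walking `model.modules()` with `isinstance` checks is a common alternative.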
- Parameters:
model (torch.nn.Module)
config (ModelConfig)
- Return type:
None