kempnerforge.model.hooks¶
Activation extraction hooks for mechanistic interpretability.
Provides tools for capturing intermediate activations, attention patterns, and hidden states during inference — essential for probing, CKA analysis, SVCCA, and other interpretability research.
Usage:
store = ActivationStore(model, layers=["layers.0.attention", "layers.5.mlp"])
store.enable()
model(input_ids)
act = store.get("layers.0.attention") # (batch, seq, dim) on CPU
store.disable()
Functions
extract_representations(model, dataset, layers, device[, batch_size, max_samples])
Run model over dataset and collect activations from specified layers.
save_activations(activations, path)
Save activations to a .npz file.
Classes
ActivationStore(model[, layers])
Register forward hooks on named modules to capture activations.
- class kempnerforge.model.hooks.ActivationStore[source]¶
Bases: object
Register forward hooks on named modules to capture activations.
Captured tensors are moved to CPU to avoid GPU memory pressure. Use enable()/disable() to control when hooks are active.
- Parameters:
model – The model to instrument.
layers – List of module names (dot-separated FQNs) to capture. Example: ["layers.0.attention", "layers.5.mlp", "norm"]
- __init__(model, layers=None)[source]¶
- Parameters:
model (torch.nn.Module)
layers (list[str] | None)
- Return type:
None
- property activations: dict[str, torch.Tensor]¶
Return a copy of captured activations.
- get(name)[source]¶
Get captured activation for a layer, or None if not captured.
- Parameters:
name (str)
- Return type:
torch.Tensor | None
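The capture mechanism is built on PyTorch forward hooks. The sketch below illustrates that general pattern only; it is not the class's actual implementation, and MiniNet, make_hook, and the captured dict are hypothetical names used purely for illustration:

import torch
import torch.nn as nn

class MiniNet(nn.Module):
    # Tiny stand-in model used only for this illustration.
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(8, 16)
        self.head = nn.Linear(16, 4)

    def forward(self, x):
        return self.head(torch.relu(self.encoder(x)))

# A forward hook fires after a module's forward pass and receives its output,
# which we detach and move to CPU to avoid holding on to GPU memory.
captured: dict[str, torch.Tensor] = {}

def make_hook(name):
    def hook(module, inputs, output):
        captured[name] = output.detach().cpu()
    return hook

model = MiniNet()
handles = [
    module.register_forward_hook(make_hook(name))
    for name, module in model.named_modules()
    if name in {"encoder", "head"}
]

model(torch.randn(2, 8))            # hooks populate `captured` during the call
print({k: tuple(v.shape) for k, v in captured.items()})

for h in handles:                   # analogous to disable(): detach the hooks
    h.remove()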
- kempnerforge.model.hooks.extract_representations(model, dataset, layers, device, batch_size=32, max_samples=None)[source]¶
Run model over dataset and collect activations from specified layers.
Returns a dict mapping layer names to tensors of shape (num_samples, seq_len, hidden_dim) (or whatever the layer outputs).
- Parameters:
model (torch.nn.Module) – Model to extract from (should already be on device).
dataset (torch.utils.data.Dataset) – Map-style dataset returning dicts with "input_ids".
layers (list[str]) – Module FQNs to capture (e.g. ["layers.0.attention"]).
device (torch.device) – Device to run inference on.
batch_size (int) – Batch size for extraction.
max_samples (int | None) – Stop after this many samples (None = full dataset).
- Returns:
Dict of {layer_name: Tensor} with activations on CPU.
- Return type:
dict[str, torch.Tensor]
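A hedged usage sketch follows. The toy model and dataset are illustrative stand-ins (any torch.nn.Module whose submodule names match the requested FQNs, and any map-style dataset returning dicts with "input_ids", would do); the assumption that the model is called with the batched input_ids follows the dataset contract above and is not a documented internal:

import torch
import torch.nn as nn
from torch.utils.data import Dataset
from kempnerforge.model.hooks import extract_representations

class ToyBlock(nn.Module):
    # Stand-in transformer block; "attention" and "mlp" exist only so the
    # module FQNs below ("layers.0.attention", "layers.5.mlp") resolve.
    def __init__(self, dim):
        super().__init__()
        self.attention = nn.Linear(dim, dim)
        self.mlp = nn.Linear(dim, dim)

    def forward(self, x):
        return self.mlp(torch.relu(self.attention(x)))

class ToyModel(nn.Module):
    def __init__(self, vocab_size=1000, dim=32, n_layers=6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.layers = nn.ModuleList(ToyBlock(dim) for _ in range(n_layers))

    def forward(self, input_ids):
        x = self.embed(input_ids)
        for layer in self.layers:
            x = layer(x)
        return x

class ToyDataset(Dataset):
    # Map-style dataset returning dicts with "input_ids", as required above.
    def __init__(self, num_samples=64, seq_len=16, vocab_size=1000):
        self.ids = torch.randint(0, vocab_size, (num_samples, seq_len))

    def __len__(self):
        return len(self.ids)

    def __getitem__(self, idx):
        return {"input_ids": self.ids[idx]}

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = ToyModel().to(device)

reps = extract_representations(
    model,
    ToyDataset(),
    layers=["layers.0.attention", "layers.5.mlp"],
    device=device,
    batch_size=16,
    max_samples=32,  # None would process the full dataset
)
print({name: tuple(t.shape) for name, t in reps.items()})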
- kempnerforge.model.hooks.save_activations(activations, path)[source]¶
Save activations to a .npz file.
- Parameters:
activations (dict[str, torch.Tensor]) – Dict mapping layer names to tensors (from ActivationStore or extract_representations()).
path (str | Path) – Output file path. .npz extension added if missing.
- Return type:
None
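A short round-trip sketch, assuming the saved archive stores one array per layer name (the tensors below are illustrative, and numpy.load is used only because .npz is a standard NumPy archive format):

import numpy as np
import torch
from kempnerforge.model.hooks import save_activations

# Illustrative activations, e.g. collected via ActivationStore or extract_representations().
activations = {
    "layers.0.attention": torch.randn(32, 16, 64),
    "layers.5.mlp": torch.randn(32, 16, 64),
}

save_activations(activations, "activations")   # ".npz" is appended if missing

# .npz archives are plain NumPy files, so they can be reloaded without torch.
with np.load("activations.npz") as data:
    for name in data.files:
        print(name, data[name].shape)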