kempnerforge.model.position

Rotary Position Embedding (RoPE) for KempnerForge models.

Uses real-valued sin/cos rotation (not complex arithmetic) for compatibility with DTensor and SequenceParallel.
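The equivalence between the two formulations can be checked on a single pair of values. The sketch below is illustrative only: it uses plain Python rather than torch, and `rotate_complex` and `rotate_real` are hypothetical helper names, not part of `kempnerforge.model.position`.

```python
import cmath
import math


def rotate_complex(a, b, angle):
    """Rotate the pair (a, b) by treating it as the complex number a + bi."""
    z = complex(a, b) * cmath.exp(1j * angle)
    return z.real, z.imag


def rotate_real(a, b, angle):
    """The same rotation written with real-valued sin/cos only."""
    c, s = math.cos(angle), math.sin(angle)
    return a * c - b * s, a * s + b * c


# Both formulations agree up to floating-point rounding.
print(rotate_complex(1.0, 2.0, 0.7))
print(rotate_real(1.0, 2.0, 0.7))
```

The real-valued form needs only elementwise multiplies and adds on real tensors, which is presumably why it is easier to combine with sharded tensor types than a complex-dtype view.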

Functions

  • apply_rope(x, cos, sin) – Apply rotary position embeddings using real-valued rotation.

  • precompute_rope_frequencies(head_dim, max_seq_len[, theta, device]) – Precompute cos/sin RoPE frequency tables.

kempnerforge.model.position.precompute_rope_frequencies(head_dim, max_seq_len, theta=10000.0, device=None)

Precompute cos/sin RoPE frequency tables.

Parameters:
  • head_dim (int) – Dimension per attention head (must be even).

  • max_seq_len (int) – Maximum sequence length to precompute.

  • theta (float) – Base frequency (10000.0 for standard RoPE).

  • device (torch.device | None) – Device on which to place the returned tensors.

Returns:

Tuple of (cos, sin) tensors, each of shape (max_seq_len, head_dim // 2).

Return type:

tuple[torch.Tensor, torch.Tensor]
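As a rough illustration of what such tables contain, the sketch below builds them with the standard RoPE schedule, where frequency `i` is `theta ** (-2 * i / head_dim)` and the angle at a position is the position times that frequency. It is a plain-Python sketch, not the library's code: `rope_tables` is a hypothetical name, it returns nested lists rather than tensors, and the actual implementation may differ in dtype and other details.

```python
import math


def rope_tables(head_dim, max_seq_len, theta=10000.0):
    """Build cos/sin tables of shape (max_seq_len, head_dim // 2) as nested lists."""
    if head_dim % 2 != 0:
        raise ValueError("head_dim must be even")
    half = head_dim // 2
    # One inverse frequency per rotated pair, decaying geometrically with i.
    inv_freq = [theta ** (-2.0 * i / head_dim) for i in range(half)]
    # Row `pos` holds the cos/sin of pos * inv_freq[i] for every pair i.
    cos = [[math.cos(pos * f) for f in inv_freq] for pos in range(max_seq_len)]
    sin = [[math.sin(pos * f) for f in inv_freq] for pos in range(max_seq_len)]
    return cos, sin


cos, sin = rope_tables(head_dim=8, max_seq_len=16)
print(len(cos), len(cos[0]))  # rows = max_seq_len, columns = head_dim // 2
print(cos[0], sin[0])         # position 0 is the identity rotation
```

Position 0 has angle zero for every pair, so its row is all ones in `cos` and all zeros in `sin`.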

kempnerforge.model.position.apply_rope(x, cos, sin)

Apply rotary position embeddings using real-valued rotation.

Parameters:
  • x (torch.Tensor) – Input tensor of shape (…, seq_len, head_dim).

  • cos (torch.Tensor) – Cosines of the rotation angles, shape (seq_len, head_dim // 2).

  • sin (torch.Tensor) – Sines of the rotation angles, shape (seq_len, head_dim // 2).

Returns:

Tensor with RoPE applied, same shape and dtype as input.

Return type:

torch.Tensor
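The rotation itself can be sketched for a single head vector at a single position. This is a plain-Python illustration under stated assumptions, not the library's implementation: `apply_rope_vector`, `rope_row`, and `dot` are hypothetical helpers, and the sketch assumes split-half pairing (element `i` is rotated together with element `i + head_dim // 2`). The documentation above does not say whether the library pairs elements this way or interleaves adjacent elements.

```python
import math


def rope_row(pos, head_dim, theta=10000.0):
    """cos/sin values for one position, each of length head_dim // 2."""
    half = head_dim // 2
    angles = [pos * theta ** (-2.0 * i / head_dim) for i in range(half)]
    return [math.cos(a) for a in angles], [math.sin(a) for a in angles]


def apply_rope_vector(x, cos_row, sin_row):
    """Rotate one head vector; assumes x[i] is paired with x[i + half]."""
    half = len(x) // 2
    out = [0.0] * len(x)
    for i in range(half):
        a, b = x[i], x[i + half]
        out[i] = a * cos_row[i] - b * sin_row[i]
        out[i + half] = a * sin_row[i] + b * cos_row[i]
    return out


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.0]

# Query/key positions (5, 3) and (12, 10) share the same offset of 2.
score_a = dot(apply_rope_vector(q, *rope_row(5, 4)), apply_rope_vector(k, *rope_row(3, 4)))
score_b = dot(apply_rope_vector(q, *rope_row(12, 4)), apply_rope_vector(k, *rope_row(10, 4)))
print(score_a, score_b)
```

The two scores agree up to rounding because each pair is rotated by an angle proportional to its position, so the dot product of a rotated query and key depends only on the difference between their positions. The rotation also leaves the norm of each vector unchanged.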