kempnerforge.config.video

Video input configuration.

VideoConfig is the [video] top-level section. When present, the job trains on a video dataset through the VLM wrapper: a clip is decoded into an ordered set of frames, each preprocessed like an image and fed to the vision encoder. The section is a sibling of [vision_encoder] / [adapter] / [vlm] and requires [vlm] to be set.

Frame-sampling defaults follow the Molmo2 paper: sample at fps frames per second, always include the first and last frame, and cap the result at max_frames. max_frames is the per-clip frame budget; the number of visual tokens it implies (max_frames * tokens_per_frame) feeds the residual-stream / sequence-length math once the model consumes video.
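The sampling rule and the token budget above can be sketched in a few lines. This is a minimal illustration, not the library's sampler: the function name sample_frame_indices, the native_fps argument (the clip's own frame rate), the even re-spacing used when the cap is hit, and the tokens_per_frame value of 196 are all assumptions made for the example.

```python
def sample_frame_indices(num_frames, native_fps, fps=2.0, max_frames=16):
    """Pick frame indices at about `fps` per second, keeping the first and
    last frame and returning at most `max_frames` indices (hypothetical helper)."""
    if native_fps <= 0 or fps <= 0 or max_frames <= 0:
        raise ValueError("native_fps, fps and max_frames must be positive")
    if num_frames <= 0:
        return []
    last = num_frames - 1
    if last == 0 or max_frames == 1:
        return [0]

    # Number of source frames between two consecutive samples.
    step = native_fps / fps
    picks = {round(k * step) for k in range(int(last / step) + 1)}
    picks.add(last)  # always include the final frame
    indices = sorted(picks)

    if len(indices) > max_frames:
        # Assumption: when over budget, re-space max_frames picks evenly
        # across the clip so both endpoints survive the cap.
        indices = sorted(
            {round(i * last / (max_frames - 1)) for i in range(max_frames)}
        )
    return indices


# A 3-second clip at 30 fps, sampled at 2 fps: under the 16-frame budget.
short_clip = sample_frame_indices(num_frames=90, native_fps=30.0)

# A 10-second clip at 30 fps would yield 21 picks, so the cap applies.
long_clip = sample_frame_indices(num_frames=300, native_fps=30.0)

# Visual-token budget implied by max_frames; 196 tokens per frame is a
# placeholder, the real value depends on the vision encoder.
tokens_per_frame = 196
visual_tokens = 16 * tokens_per_frame
```

Under these assumptions the short clip keeps its 2 fps spacing plus the final frame, while the long clip is reduced to exactly max_frames indices that still start at frame 0 and end at the last frame.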

Classes

VideoConfig

Video dataset location and frame-sampling knobs.

class kempnerforge.config.video.VideoConfig[source]

Bases: object

Video dataset location and frame-sampling knobs.

Fields:

  • data_root: Root directory of the on-disk video dataset.
  • dataset_type: Registry key for the dataset builder ("webvid" default).
  • dataset_name: On-disk corpus name within a style (e.g. "webvid-10M").
  • sampling_policy: Registry key for the frame-sampling policy ("uniform").
  • split: Which split to read ("train" or "validation").
  • max_samples: Cap the manifest to this many examples (0 = all).
  • max_frames: Maximum frames sampled per clip (the per-clip budget).
  • min_frames: Minimum frames sampled per clip; short clips pad up to this.
  • fps: Target sampling rate in frames per second (Molmo2 uses 2).
  • frame_size: Square pixel size each frame is resized to.
  • prompt: Optional instruction prepended to the target text, masked from loss.
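The min_frames behaviour for short clips can be pictured as follows. The field description only says that short clips pad up to min_frames; how the padding is done is not stated, so repeating the last sampled frame is an assumption here, and pad_to_min_frames is a hypothetical helper name.

```python
def pad_to_min_frames(indices, min_frames=4):
    """Pad a list of sampled frame indices up to `min_frames` entries.

    Assumption: padding repeats the last sampled index; the actual
    padding strategy is not documented in this section.
    """
    if not indices:
        return []
    shortfall = max(min_frames - len(indices), 0)
    return list(indices) + [indices[-1]] * shortfall


padded = pad_to_min_frames([0, 7])                 # two frames sampled
unchanged = pad_to_min_frames([0, 15, 30, 45, 59])  # already long enough
```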

data_root: str = ''
dataset_type: str = 'webvid'
dataset_name: str = 'webvid-10M'
sampling_policy: str = 'uniform'
split: str = 'train'
max_samples: int = 0
max_frames: int = 16
min_frames: int = 4
fps: float = 2.0
frame_size: int = 224
prompt: str = ''
__init__(data_root='', dataset_type='webvid', dataset_name='webvid-10M', sampling_policy='uniform', split='train', max_samples=0, max_frames=16, min_frames=4, fps=2.0, frame_size=224, prompt='')
Parameters:
  • data_root (str)

  • dataset_type (str)

  • dataset_name (str)

  • sampling_policy (str)

  • split (str)

  • max_samples (int)

  • max_frames (int)

  • min_frames (int)

  • fps (float)

  • frame_size (int)

  • prompt (str)

Return type:

None