kempnerforge.config.video

Video input configuration.

VideoConfig is the [video] top-level section. When present, the job trains on a video dataset through the VLM wrapper: a clip is decoded into an ordered set of frames, each preprocessed like an image and fed to the vision encoder. The section is a sibling of [vision_encoder] / [adapter] / [vlm] and requires [vlm] to be set.

Frame-sampling defaults follow the Molmo2 paper: sample at fps frames per second, always include the first and last frame, and cap the result at max_frames. max_frames is the per-clip frame budget; the number of visual tokens it implies (max_frames * tokens_per_frame) feeds the residual-stream / sequence-length math once the model consumes video.
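The sampling rule and the token arithmetic can be sketched as follows. This is a minimal sketch, not the library's sampling policy: sample_frame_indices, visual_token_budget, the native_fps argument, and the even-thinning step used when a clip exceeds the budget are all assumptions, and the value 256 for tokens_per_frame is only an example (it is what a 224-pixel frame with 14-pixel patches and no pooling would give).

```python
def sample_frame_indices(num_frames: int, native_fps: float,
                         fps: float = 2.0, max_frames: int = 16) -> list[int]:
    """Pick frame indices at roughly `fps` per second, keeping first and last."""
    if num_frames <= 0 or max_frames <= 0:
        return []
    # Number of source frames between two samples (at least one).
    step = max(native_fps / fps, 1.0)
    indices = []
    pos = 0.0
    while pos < num_frames:
        indices.append(int(pos))
        pos += step
    if indices[-1] != num_frames - 1:
        indices.append(num_frames - 1)  # always include the last frame
    if len(indices) > max_frames:
        if max_frames == 1:
            return [indices[0]]
        # Over budget: thin evenly while keeping both endpoints (assumed rule).
        last = len(indices) - 1
        denom = max_frames - 1
        indices = [indices[(i * last + denom // 2) // denom]
                   for i in range(max_frames)]
    return indices


def visual_token_budget(max_frames: int, tokens_per_frame: int) -> int:
    """Visual tokens implied by the per-clip frame budget."""
    return max_frames * tokens_per_frame


long_clip = sample_frame_indices(num_frames=300, native_fps=30.0)
short_clip = sample_frame_indices(num_frames=30, native_fps=30.0)
budget = visual_token_budget(max_frames=16, tokens_per_frame=256)
```

For a 10-second clip at 30 frames per second, sampling at 2 fps yields 21 candidates including the last frame, which the cap reduces to 16. With the example value of 256 tokens per frame, the budget comes to 4096 visual tokens per clip.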

Classes

VideoConfig

Video dataset location and frame-sampling knobs.

class kempnerforge.config.video.VideoConfig

Bases: object

Video dataset location and frame-sampling knobs.

Fields:

data_root: Root directory of the on-disk video dataset.
dataset_type: Registry key for the dataset builder ("webvid" default).
dataset_name: On-disk corpus name within a style (e.g. "webvid-10M").
sampling_policy: Registry key for the frame-sampling policy ("uniform").
split: Which split to read ("train" or "validation").
max_samples: Cap the manifest to this many examples (0 = all).
max_frames: Maximum frames sampled per clip (the per-clip budget).
min_frames: Minimum frames sampled per clip; short clips pad up to this.
fps: Target sampling rate in frames per second (Molmo2 uses 2).
frame_size: Square pixel size each frame is resized to.
prompt: Optional instruction prepended to the target text, masked from loss.
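The min_frames entry says short clips pad up to the minimum but does not say how. The sketch below shows one plausible strategy, repeating the last sampled frame; both the pad_to_min_frames helper and the repeat-last choice are assumptions for illustration, not the documented behavior.

```python
def pad_to_min_frames(indices: list[int], min_frames: int = 4) -> list[int]:
    """Pad a short list of frame indices by repeating the last one.

    Repeating the final frame is an assumed padding rule; the actual
    policy may pad differently (for example with blank frames).
    """
    if not indices:
        return []
    missing = max(min_frames - len(indices), 0)
    return indices + [indices[-1]] * missing


padded = pad_to_min_frames([0, 15, 29], min_frames=4)
unchanged = pad_to_min_frames([0, 10, 20, 30, 40], min_frames=4)
```

Here a clip that produced three sampled frames is extended to four, while a clip already at or above the minimum is returned unchanged.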

data_root: str = ''

dataset_type: str = 'webvid'

dataset_name: str = 'webvid-10M'

sampling_policy: str = 'uniform'

split: str = 'train'

max_samples: int = 0

max_frames: int = 16

min_frames: int = 4

fps: float = 2.0

frame_size: int = 224

prompt: str = ''

__init__(data_root='', dataset_type='webvid', dataset_name='webvid-10M', sampling_policy='uniform', split='train', max_samples=0, max_frames=16, min_frames=4, fps=2.0, frame_size=224, prompt='')

Parameters:

data_root (str)
dataset_type (str)
dataset_name (str)
sampling_policy (str)
split (str)
max_samples (int)
max_frames (int)
min_frames (int)
fps (float)
frame_size (int)
prompt (str)

Return type:

None