kempnerforge.config.distributed

Distributed parallelism configuration.

Classes

DistributedConfig

Parallelism dimensions and distributed settings.

PipelineSchedule

class kempnerforge.config.distributed.PipelineSchedule

Bases: StrEnum

schedule_1f1b = '1f1b'
gpipe = 'gpipe'
interleaved_1f1b = 'interleaved_1f1b'
__new__(value)
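
Because PipelineSchedule derives from StrEnum, its members are strings: they compare equal to their values and can be looked up by value. The sketch below re-declares the enum locally from the member list above so it runs without kempnerforge installed. The fallback base class for Python versions before 3.11 is an assumption added for portability, and "zero_bubble" is just an example of a value that is not a documented member.

```python
try:
    from enum import StrEnum  # available in Python 3.11+
except ImportError:
    # Assumption for older interpreters: a str/Enum mix-in behaves the same
    # for the comparisons shown here.
    from enum import Enum

    class StrEnum(str, Enum):
        pass


class PipelineSchedule(StrEnum):
    """Local stand-in mirroring the documented members."""

    schedule_1f1b = "1f1b"
    gpipe = "gpipe"
    interleaved_1f1b = "interleaved_1f1b"


# Lookup by value returns the existing member, not a new object.
schedule = PipelineSchedule("interleaved_1f1b")

# Members are strings, so they compare equal to plain config values.
is_default = PipelineSchedule.schedule_1f1b == "1f1b"

# Values outside the member list raise ValueError.
try:
    PipelineSchedule("zero_bubble")
    rejected = False
except ValueError:
    rejected = True
```

Lookup by value is the natural way to turn a string read from a config file into a member, since a typo fails immediately with ValueError.
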
class kempnerforge.config.distributed.DistributedConfig

Bases: object

Parallelism dimensions and distributed settings.

dp_shard: int = -1
dp_replicate: int = 1
tp: int = 1
pp: int = 1
pp_schedule: PipelineSchedule = '1f1b'
cp: int = 1
ep: int = 1
nccl_timeout_sec: int = 1800
backend: str = 'cpu:gloo,cuda:nccl'
validate_world_size(world_size)

Validate that parallelism dimensions match world size.

Parameters:

world_size (int)

Return type:

None
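
The page does not spell out the check itself. A minimal sketch of one plausible rule, assuming the product of the parallelism degrees must equal world_size: the function name check_world_size is hypothetical, and leaving ep out of the product is an assumption (expert parallelism is often carved out of existing ranks rather than multiplying the world size). The library's actual behavior may differ.

```python
from math import prod


def check_world_size(world_size: int, *, dp_replicate: int = 1, dp_shard: int = 1,
                     tp: int = 1, pp: int = 1, cp: int = 1) -> None:
    """Raise ValueError unless the degrees multiply out to world_size.

    Hypothetical stand-in; not the library's implementation.
    """
    degrees = {"dp_replicate": dp_replicate, "dp_shard": dp_shard,
               "tp": tp, "pp": pp, "cp": cp}
    # Every degree must be a positive count of ranks.
    for name, value in degrees.items():
        if value < 1:
            raise ValueError(f"{name} must be >= 1, got {value}")
    total = prod(degrees.values())
    if total != world_size:
        raise ValueError(
            f"parallelism degrees multiply to {total}, expected world_size={world_size}"
        )


# 2 (dp_shard) * 2 (tp) * 2 (pp) == 8, so this passes silently.
check_world_size(8, dp_shard=2, tp=2, pp=2)

# 2 * 2 == 4 != 8, so this is rejected.
try:
    check_world_size(8, dp_shard=2, tp=2)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```
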

resolve(world_size)

Return a copy with dp_shard resolved to a concrete value.

Parameters:

world_size (int)

Return type:

DistributedConfig
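
The default dp_shard = -1 reads as a placeholder to be inferred from the world size. The sketch below shows one way such a resolution could work, assuming dp_shard takes whatever ranks remain after dividing world_size by the other degrees. SketchConfig is a hypothetical stand-in with only the fields needed here, and excluding ep from the divisor is an assumption, not documented behavior.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SketchConfig:
    """Hypothetical stand-in carrying only the fields used below."""

    dp_shard: int = -1
    dp_replicate: int = 1
    tp: int = 1
    pp: int = 1
    cp: int = 1

    def resolve(self, world_size: int) -> "SketchConfig":
        # A concrete dp_shard needs no inference; still return a copy.
        if self.dp_shard != -1:
            return replace(self)
        # Ranks consumed by every other dimension (assumed to exclude ep).
        other = self.dp_replicate * self.tp * self.pp * self.cp
        if world_size % other != 0:
            raise ValueError(
                f"world_size={world_size} is not divisible by {other}"
            )
        return replace(self, dp_shard=world_size // other)


cfg = SketchConfig(tp=2, pp=2)
resolved = cfg.resolve(world_size=16)  # 16 // (1 * 2 * 2 * 1) == 4
```

Returning a copy leaves the original object untouched, so the same unresolved configuration could be resolved again against a different world size.
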

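__init__(dp_shard=-1, dp_replicate=1, tp=1, pp=1, pp_schedule=PipelineSchedule.schedule_1f1b, cp=1, ep=1, nccl_timeout_sec=1800, backend='cpu:gloo,cuda:nccl')

Parameters:

dp_shard (int)
dp_replicate (int)
tp (int)
pp (int)
pp_schedule (PipelineSchedule)
cp (int)
ep (int)
nccl_timeout_sec (int)
backend (str)

Return type:

None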