Docker

class torchx.schedulers.docker_scheduler.DockerScheduler(session_name: str)

Bases: DockerWorkspaceMixin, Scheduler[Opts]

DockerScheduler is a TorchX scheduling interface to Docker.

This is exposed via the scheduler local_docker.

This scheduler runs the provided app via the local docker runtime using the specified images in the AppDef. Docker must be installed and running. This provides the closest environment to schedulers that natively use Docker such as Kubernetes.

Note

Docker doesn’t provide gang scheduling mechanisms. If one replica in a job fails, only that replica is restarted.

Config Options

    usage:
        [copy_env=COPY_ENV],[env=ENV],[privileged=PRIVILEGED],[image_repo=IMAGE_REPO],[quiet=QUIET]

    optional arguments:
        copy_env=COPY_ENV (list, None)
            List of glob patterns of environment variables to copy if not set in AppDef. Ex: FOO_*
        env=ENV (dict, None)
            Environment variables to be passed to the run. The separator sign can be either comma or semicolon (e.g. ENV1:v1,ENV2:v2,ENV3:v3 or ENV1:V1;ENV2:V2). Environment variables from env will be applied on top of the ones from copy_env.
        privileged=PRIVILEGED (bool, False)
            If true runs the container with elevated permissions. Equivalent to running with `docker run --privileged`.
        image_repo=IMAGE_REPO (str, None)
            (remote jobs) the image repository to use when pushing patched images, must have push access. Ex: example.com/your/container
        quiet=QUIET (bool, False)
            Whether to suppress verbose output during image building.
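
The interaction between copy_env and env can be sketched in plain Python. This is an illustration only, not the scheduler's implementation: parse_env and resolve_env are hypothetical helpers, and the sketch assumes that variables already set in the AppDef take precedence over both options (the text above states this only for copy_env).

```python
import fnmatch
import re


def parse_env(value: str) -> dict[str, str]:
    """Parse 'K1:v1,K2:v2' or 'K1:v1;K2:v2' into a dict (hypothetical helper)."""
    env: dict[str, str] = {}
    for item in re.split(r"[,;]", value):
        if not item:
            continue
        # Split on the first colon only, so values may contain colons.
        key, _, val = item.partition(":")
        env[key.strip()] = val.strip()
    return env


def resolve_env(
    host_env: dict[str, str],
    app_env: dict[str, str],
    copy_env: list[str],
    env: dict[str, str],
) -> dict[str, str]:
    """Combine variables in the order the option descriptions suggest."""
    defaults: dict[str, str] = {}
    # copy_env: pick up host variables whose names match any glob pattern.
    for name, val in host_env.items():
        if any(fnmatch.fnmatchcase(name, pattern) for pattern in copy_env):
            defaults[name] = val
    # env: applied on top of whatever copy_env contributed.
    defaults.update(env)
    # Assumption: values already set in the AppDef are left untouched.
    resolved = dict(defaults)
    resolved.update(app_env)
    return resolved


host = {"FOO_A": "1", "FOO_B": "2", "BAR": "3"}
result = resolve_env(host, {"FOO_B": "app"}, ["FOO_*"], parse_env("FOO_A:override"))
print(result)
```

Here BAR is not copied because it does not match FOO_*, FOO_A comes from env rather than the host, and FOO_B keeps the value from the AppDef.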

Mounts

This class supports bind mounts, named volumes, and devices.

  • bind mount: type=bind,src=<host path>,dst=<container path>[,readonly]

  • named volume: type=volume,src=<name>,dst=<container path>[,readonly]

  • devices: type=device,src=<name>[,dst=<container path>][,permissions=rwm]

See torchx.specs.parse_mounts() for more info.
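
The three mount formats above share one key=value shape, which a small helper can assemble. The function mount_spec below is hypothetical and not part of TorchX; it only builds the strings shown in the list, and the paths and device name in the example are placeholders.

```python
from typing import Optional


def mount_spec(
    kind: str,
    src: str,
    dst: Optional[str] = None,
    readonly: bool = False,
    permissions: Optional[str] = None,
) -> str:
    """Build a mount string in the formats listed above (hypothetical helper)."""
    if kind not in ("bind", "volume", "device"):
        raise ValueError(f"unknown mount type: {kind}")
    # dst is mandatory for bind mounts and named volumes, optional for devices.
    if kind in ("bind", "volume") and dst is None:
        raise ValueError(f"{kind} mounts require dst")
    parts = [f"type={kind}", f"src={src}"]
    if dst is not None:
        parts.append(f"dst={dst}")
    if permissions is not None:
        parts.append(f"permissions={permissions}")
    if readonly:
        parts.append("readonly")
    return ",".join(parts)


print(mount_spec("bind", "/data", "/mnt/data", readonly=True))
print(mount_spec("volume", "cache", "/cache"))
print(mount_spec("device", "nvidia0", permissions="rwm"))
```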

    Feature                  Scheduler Support
    Fetch Logs               ✔️
    Distributed Jobs         ✔️
    Cancel Job               ✔️
    Describe Job             Partial support. DockerScheduler will return job and replica status but does not provide the complete original AppSpec.
    Workspaces / Patching    ✔️
    Mounts                   ✔️
    Elasticity

describe(app_id: str) -> torchx.schedulers.api.DescribeAppResponse | None

Returns app description, or None if it no longer exists.

list(cfg: Optional[Mapping[str, str | int | float | bool | list[str] | dict[str, str] | None]] = None) -> list[torchx.schedulers.api.ListAppResponse]

Lists jobs on this scheduler.

log_iter(app_id: str, role_name: str, k: int = 0, regex: str | None = None, since: datetime.datetime | None = None, until: datetime.datetime | None = None, should_tail: bool = False, streams: torchx.schedulers.api.Stream | None = None) -> Iterable[str]

Returns an iterator over log lines for the k-th replica of role_name.

Important

Not all schedulers support log iteration, tailing, or time-based cursors. Check the specific scheduler docs.

Lines include trailing whitespace (\n). When should_tail=True, the iterator blocks until the app reaches a terminal state.

Parameters:
  • k – replica (node) index

  • regex – optional filter pattern

  • since – start cursor (scheduler-dependent)

  • until – end cursor (scheduler-dependent)

  • should_tail – if True, follow output like tail -f

  • streams – stdout, stderr, or combined

Raises:

NotImplementedError – if the scheduler does not support log iteration
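
Because each line arrives with its trailing newline, callers usually strip it before printing or storing. The sketch below shows one way to post-process such an iterator on the client side. The function clean_log_lines is hypothetical, the list of strings stands in for the iterable that log_iter returns, and the use of re.search for the extra filter is an assumption of this sketch rather than a statement about how the scheduler applies its regex parameter.

```python
import re
from typing import Iterable, Iterator, Optional


def clean_log_lines(lines: Iterable[str], regex: Optional[str] = None) -> Iterator[str]:
    """Strip trailing newlines and optionally keep only matching lines."""
    pattern = re.compile(regex) if regex else None
    for line in lines:
        if pattern is None or pattern.search(line):
            # Remove only the newline; other trailing whitespace is preserved.
            yield line.rstrip("\n")


# Stand-in for the iterable returned by log_iter(app_id, role_name, k=0).
sample = ["starting trainer\n", "error: out of memory\n", "done\n"]

for line in clean_log_lines(sample, regex="error"):
    print(line)
```

Since the function is a generator, it also works with should_tail=True: lines are yielded as they arrive, and the loop ends when the underlying iterator does.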

schedule(dryrun_info: AppDryRunInfo[DockerJob]) -> str

Submits a previously dry-run request. Returns the app_id.

class torchx.schedulers.docker_scheduler.DockerJob(app_id: str, containers: list[torchx.schedulers.docker_scheduler.DockerContainer])

Reference

torchx.schedulers.docker_scheduler.create_scheduler(session_name: str, **kwargs: Any) -> DockerScheduler

class torchx.schedulers.docker_scheduler.DockerContainer(image: str, command: list[str], kwargs: dict[str, object])

torchx.schedulers.docker_scheduler.has_docker() -> bool
