Docker
- class torchx.schedulers.docker_scheduler.DockerScheduler(session_name: str)
Bases: DockerWorkspaceMixin, Scheduler[Opts]
DockerScheduler is a TorchX scheduling interface to Docker.
This scheduler is exposed under the name local_docker.
This scheduler runs the provided app via the local docker runtime using the specified images in the AppDef. Docker must be installed and running. This provides the closest environment to schedulers that natively use Docker such as Kubernetes.
Note
Docker doesn’t provide gang scheduling mechanisms. If one replica in a job fails, only that replica is restarted.
Config Options
usage:
    [copy_env=COPY_ENV],[env=ENV],[privileged=PRIVILEGED],[image_repo=IMAGE_REPO],[quiet=QUIET]

optional arguments:
    copy_env=COPY_ENV (list, None)
        List of glob patterns of environment variables to copy if not set in AppDef. Ex: FOO_*
    env=ENV (dict, None)
        Environment variables to be passed to the run. The separator can be either a comma or a semicolon (e.g. ENV1:v1,ENV2:v2,ENV3:v3 or ENV1:V1;ENV2:V2). Environment variables from env are applied on top of the ones from copy_env.
    privileged=PRIVILEGED (bool, False)
        If true, runs the container with elevated permissions. Equivalent to running with `docker run --privileged`.
    image_repo=IMAGE_REPO (str, None)
        (remote jobs) The image repository to use when pushing patched images; must have push access. Ex: example.com/your/container
    quiet=QUIET (bool, False)
        Whether to suppress verbose output for image building. Defaults to False.
Mounts
This class supports bind mounting directories, named volumes, and devices.
bind mount:
    type=bind,src=<host path>,dst=<container path>[,readonly]
named volume:
    type=volume,src=<name>,dst=<container path>[,readonly]
devices:
    type=device,src=<name>[,dst=<container path>][,permissions=rwm]
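To make the three mount formats concrete, here is a minimal sketch of a parser for a single mount string. `parse_mount` and `Mount` are hypothetical names used only for illustration; they are not `torchx.specs.parse_mounts()` or its return types. The validation rules (`src` always required, `dst` required for bind and volume mounts but optional for devices) are assumptions read off the bracketed optional fields in the formats.

```python
from dataclasses import dataclass
from typing import Optional

VALID_TYPES = ("bind", "volume", "device")


@dataclass
class Mount:
    """Hypothetical parsed mount (illustrative, not a TorchX type)."""

    type: str
    src: str
    dst: Optional[str] = None
    readonly: bool = False
    permissions: Optional[str] = None


def parse_mount(spec: str) -> Mount:
    """Parse one 'type=...,src=...[,dst=...][,readonly][,permissions=...]' string."""
    fields: dict[str, str] = {}
    readonly = False
    for token in spec.split(","):
        if token == "readonly":
            readonly = True  # bare flag with no '=value'
            continue
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"malformed token {token!r} in {spec!r}")
        fields[key] = value
    kind = fields.get("type")
    if kind not in VALID_TYPES:
        raise ValueError(f"unknown mount type {kind!r} in {spec!r}")
    if "src" not in fields:
        raise ValueError(f"missing src in {spec!r}")
    # Assumption: dst is mandatory for bind and volume mounts, optional for devices.
    if kind != "device" and "dst" not in fields:
        raise ValueError(f"missing dst in {spec!r}")
    return Mount(
        type=kind,
        src=fields["src"],
        dst=fields.get("dst"),
        readonly=readonly,
        permissions=fields.get("permissions"),
    )


print(parse_mount("type=bind,src=/data,dst=/mnt/data,readonly"))
print(parse_mount("type=device,src=/dev/fuse,permissions=rw"))
```

Keeping `readonly` as a bare flag rather than `readonly=true` mirrors the formats listed above, which is why the parser special-cases that token before splitting on `=`.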
See torchx.specs.parse_mounts() for more info.

Feature                  Scheduler Support
Fetch Logs               ✔️
Distributed Jobs         ✔️
Cancel Job               ✔️
Describe Job             Partial support. DockerScheduler will return job and replica status but does not provide the complete original AppSpec.
Workspaces / Patching    ✔️
Mounts                   ✔️
Elasticity               ❌
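The copy_env and env config options interact: copy_env copies matching host variables, and env is applied on top of them. The following is a minimal, hypothetical sketch of that merge in plain Python; `parse_env_opt` and `build_container_env` are illustrative names, not TorchX functions. It assumes each env entry is split on its first `:`, and it assumes variables set in the AppDef take precedence over both options (the docs only state this for copy_env).

```python
import fnmatch
import os
import re
from typing import Mapping


def parse_env_opt(value: str) -> dict[str, str]:
    """Parse 'K1:v1,K2:v2' or 'K1:v1;K2:v2' into a dict (illustrative, not TorchX code)."""
    env: dict[str, str] = {}
    for pair in re.split(r"[,;]", value):
        if not pair.strip():
            continue  # tolerate empty entries such as a trailing separator
        key, _, val = pair.partition(":")  # split on the first ':' only
        env[key.strip()] = val.strip()
    return env


def build_container_env(
    app_env: Mapping[str, str],
    copy_env: list[str],
    env_opt: str,
    host_env: Mapping[str, str] = os.environ,
) -> dict[str, str]:
    """Merge copy_env, env, and AppDef variables, lowest to highest precedence."""
    merged: dict[str, str] = {}
    # 1. copy_env: copy host variables whose names match any glob pattern.
    for pattern in copy_env:
        for name, value in host_env.items():
            if fnmatch.fnmatchcase(name, pattern):
                merged[name] = value
    # 2. env: applied on top of the copied variables.
    merged.update(parse_env_opt(env_opt))
    # 3. AppDef env last (assumption): variables set in the AppDef are never overwritten.
    merged.update(app_env)
    return merged


host = {"FOO_A": "1", "FOO_B": "2", "BAR": "3"}
print(build_container_env({"FOO_A": "from_appdef"}, ["FOO_*"], "FOO_B:override;EXTRA:x", host))
# {'FOO_A': 'from_appdef', 'FOO_B': 'override', 'EXTRA': 'x'}
```

Because the separators are plain `,` and `;`, values containing either character cannot be expressed in this sketch's format.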
- describe(app_id: str) → torchx.schedulers.api.DescribeAppResponse | None
Returns app description, or None if it no longer exists.
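Because describe() returns None once an app no longer exists, code that polls for completion should treat None as a distinct outcome rather than an error. A minimal sketch follows. `State` and `Description` are simplified stand-ins (assumptions) for TorchX's AppState and DescribeAppResponse, and `wait_for_app` is a hypothetical helper; with a real scheduler you would pass its bound describe method and adapt the terminal-state check to the real state enum.

```python
import time
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class State(Enum):
    """Simplified stand-in for TorchX's AppState (assumption, not the real enum)."""

    RUNNING = "RUNNING"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"
    CANCELLED = "CANCELLED"


TERMINAL_STATES = {State.SUCCEEDED, State.FAILED, State.CANCELLED}


@dataclass
class Description:
    """Stand-in for DescribeAppResponse; only the state field is modeled."""

    state: State


def wait_for_app(
    describe: Callable[[str], Optional[Description]],
    app_id: str,
    poll_interval: float = 1.0,
) -> Optional[State]:
    """Poll describe(app_id) until a terminal state; return None if the app is gone."""
    while True:
        desc = describe(app_id)
        if desc is None:
            # describe() reports that the app no longer exists: nothing left to wait on.
            return None
        if desc.state in TERMINAL_STATES:
            return desc.state
        time.sleep(poll_interval)


# Usage with a scripted stub: one RUNNING response, then SUCCEEDED.
responses = iter([Description(State.RUNNING), Description(State.SUCCEEDED)])
print(wait_for_app(lambda _app_id: next(responses), "demo", poll_interval=0))
```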
- list(cfg: Optional[Mapping[str, str | int | float | bool | list[str] | dict[str, str] | None]] = None) → list[torchx.schedulers.api.ListAppResponse]
Lists jobs on this scheduler.
- log_iter(app_id: str, role_name: str, k: int = 0, regex: str | None = None, since: datetime.datetime | None = None, until: datetime.datetime | None = None, should_tail: bool = False, streams: torchx.schedulers.api.Stream | None = None) → Iterable[str]
Returns an iterator over log lines for the k-th replica of role_name.
Important
Not all schedulers support log iteration, tailing, or time-based cursors. Check the specific scheduler docs.
Lines include trailing whitespace (\n). When should_tail=True, the iterator blocks until the app reaches a terminal state.
- Parameters:
k – replica (node) index
regex – optional filter pattern
since – start cursor (scheduler-dependent)
until – end cursor (scheduler-dependent)
should_tail – if True, follow output like tail -f
streams – stdout, stderr, or combined
- Raises:
NotImplementedError – if the scheduler does not support log iteration
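A sketch of consuming log_iter defensively: strip the trailing newline each line carries, and fall back gracefully when a scheduler raises NotImplementedError. `collect_logs`, `FakeScheduler`, and `NoLogScheduler` are hypothetical names; the fakes only stand in for a real scheduler so the example runs without Docker, and FakeScheduler's use of `re.search` for the regex filter is an assumption about the filter semantics.

```python
import re
from typing import Any, Iterable, Iterator, Optional


def collect_logs(
    scheduler: Any,
    app_id: str,
    role_name: str,
    k: int = 0,
    regex: Optional[str] = None,
) -> Optional[list[str]]:
    """Return replica k's log lines without trailing newlines, or None if unsupported."""
    try:
        # Consume inside the try: a generator-based log_iter may raise
        # NotImplementedError only once iteration starts.
        return [
            line.rstrip("\n")
            for line in scheduler.log_iter(app_id, role_name, k=k, regex=regex)
        ]
    except NotImplementedError:
        return None


class FakeScheduler:
    """Hypothetical stand-in replaying canned lines; regex uses re.search (assumption)."""

    def __init__(self, lines: list[str]) -> None:
        self._lines = lines

    def log_iter(
        self, app_id: str, role_name: str, k: int = 0, regex: Optional[str] = None
    ) -> Iterator[str]:
        pattern = re.compile(regex) if regex is not None else None
        for line in self._lines:
            if pattern is None or pattern.search(line):
                yield line


class NoLogScheduler:
    """Hypothetical stand-in for a scheduler without log support."""

    def log_iter(self, *args: Any, **kwargs: Any) -> Iterable[str]:
        raise NotImplementedError("log iteration not supported")


fake = FakeScheduler(["epoch 1\n", "loss=0.50\n", "epoch 2\n"])
print(collect_logs(fake, "demo", "trainer", regex=r"^epoch"))  # ['epoch 1', 'epoch 2']
print(collect_logs(NoLogScheduler(), "demo", "trainer"))  # None
```

Returning None instead of re-raising lets a caller that handles several schedulers skip log collection uniformly; with should_tail=True the list comprehension would block until the app finishes, so a streaming consumer would iterate lazily instead.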
- schedule(dryrun_info: AppDryRunInfo[DockerJob]) → str
Submits a previously dry-run request. Returns the app_id.
- class torchx.schedulers.docker_scheduler.DockerJob(app_id: str, containers: list[torchx.schedulers.docker_scheduler.DockerContainer])
Reference
- torchx.schedulers.docker_scheduler.create_scheduler(session_name: str, **kwargs: Any) → DockerScheduler