torchx.workspace

Tip

Workspaces handle automatic image patching – copying local code changes into the job’s runtime environment so users don’t need manual image rebuilds. Scheduler authors add workspace support by mixing in a WorkspaceMixin subclass alongside their Scheduler.

Why workspaces exist. Without workspaces, every code change requires rebuilding and pushing a Docker image before submitting a remote job. Workspaces automate this: TorchX overlays local changes onto the base image and submits the patched image in one step.

How It Works

User's workspace directory            Scheduler submission
┌──────────────────────┐
│  src/                │
│  train.py            │     build_workspaces()
│  Dockerfile.torchx   │ ──────────────────────────┐
│  .torchxignore       │                           │
└──────────────────────┘                           ▼
                               ┌──────────────────────────────────┐
                               │  For each Role with a workspace: │
                               │                                  │
                               │  1. Walk workspace               │
                               │     (respecting .torchxignore)   │
                               │                                  │
                               │  2. Build patched artifact       │
                               │     ┌────────────────────────┐   │
                               │     │ DockerWorkspaceMixin:  │   │
                               │     │   docker build + push  │   │
                               │     │ DirWorkspaceMixin:     │   │
                               │     │   copy to shared dir   │   │
                               │     └────────────────────────┘   │
                               │                                  │
                               │  3. Mutate role.image in-place   │
                               │     to reference patched image   │
                               └──────────────┬───────────────────┘
                                              │
                                              ▼
                               ┌──────────────────────────────────┐
                               │  Scheduler._submit_dryrun(app)   │
                               │  uses the patched role.image     │
                               └──────────────────────────────────┘

When workspace= is passed to run() (or --workspace on the CLI), TorchX patches the image before submission:

  1. build_workspaces() iterates over each role’s workspace.

  2. For each role with a workspace, it calls caching_build_workspace_and_update_role(), which builds the workspace and mutates role.image in-place to reference the patched artifact.

  3. For remote schedulers, dryrun_push_images() and push_images() handle pushing the built image to a remote registry.
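The steps above can be sketched with plain Python stand-ins. This is a minimal illustration of the control flow only: FakeRole and FakeWorkspaceMixin are hypothetical names invented for this sketch, not TorchX classes, and the "build" is simulated by rewriting a string.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeRole:
    """Hypothetical stand-in for torchx.specs.Role (only the fields used here)."""

    name: str
    image: str
    workspace: Optional[str] = None


class FakeWorkspaceMixin:
    """Hypothetical mixin that mimics the flow; not the real TorchX implementation."""

    def build_workspaces(self, roles, cfg):
        build_cache = {}  # one cache shared by every role in this call
        for role in roles:
            if role.workspace:  # roles without a workspace are left untouched
                self.caching_build_workspace_and_update_role(role, cfg, build_cache)

    def caching_build_workspace_and_update_role(self, role, cfg, build_cache):
        # A real mixin would run a docker build or copy files here.
        role.image = f"{role.image}+patched"  # mutate the role in-place


roles = [
    FakeRole("trainer", "base:latest", workspace="/home/me/project"),
    FakeRole("reader", "base:latest"),  # no workspace, so no patching
]
FakeWorkspaceMixin().build_workspaces(roles, cfg={})
print([r.image for r in roles])
```

Only the role that carries a workspace ends up with a changed image; the scheduler then submits whatever role.image holds at that point.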

Note

DockerWorkspaceMixin uses Dockerfile.torchx from the workspace root (if present) instead of the default Dockerfile.

Built-in Mixins

Mixin                 Strategy
--------------------  ------------------------------------------------------------
DockerWorkspaceMixin  Builds a Docker image from a Dockerfile.torchx in the workspace, tags it with a content hash, and pushes to the configured image_repo. Used by kubernetes, aws_batch, local_docker.
DirWorkspaceMixin     Copies workspace files into a shared job directory on the filesystem. Used by slurm.

Implementing a Custom WorkspaceMixin

Subclass WorkspaceMixin and implement caching_build_workspace_and_update_role():

from typing import Any, Mapping

from torchx.specs import CfgVal, Role, runopts
from torchx.workspace import WorkspaceMixin


class MyWorkspaceMixin(WorkspaceMixin[None]):
    """Patches images by uploading workspace to a custom artifact store."""

    def workspace_opts(self) -> runopts:
        opts = runopts()
        opts.add("artifact_bucket", type_=str, required=True, help="S3 bucket for workspace artifacts")
        return opts

    def caching_build_workspace_and_update_role(
        self,
        role: Role,
        cfg: Mapping[str, CfgVal],
        build_cache: dict[object, object],
    ) -> None:
        workspace = role.workspace
        if not workspace:
            return

        bucket = cfg.get("artifact_bucket")
        # ... upload workspace files to bucket ...
        # ... update role.image or role.env to reference the artifact ...
        role.env["WORKSPACE_ARTIFACT"] = f"s3://{bucket}/{role.name}/workspace.tar.gz"

Then mix it into your scheduler:

from typing import Mapping

from torchx.schedulers.api import Scheduler
from torchx.specs import CfgVal

# MyWorkspaceMixin is the class defined in the previous example.
class MyScheduler(MyWorkspaceMixin, Scheduler[Mapping[str, CfgVal]]):
    def __init__(self, session_name: str, **kwargs: object) -> None:
        super().__init__("my_backend", session_name)
    # ... scheduler methods ...

The generic parameter T is the type returned by dryrun_push_images and consumed by push_images. Use None if no separate push step is needed.

Note

The build_cache dict is shared across all roles in a single build_workspaces call. Use it to skip redundant builds when roles share the same image and workspace.
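One way to use the cache is to key it on the (image, workspace) pair, so two roles that share both trigger a single build. The sketch below assumes that keying scheme; FakeRole, expensive_build, and caching_build are hypothetical names for illustration, and what a real mixin stores in build_cache is up to its implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeRole:
    """Hypothetical stand-in for torchx.specs.Role."""

    name: str
    image: str
    workspace: Optional[str] = None


build_calls = []  # records every simulated build so the effect is visible


def expensive_build(image, workspace):
    """Pretend build step; a real mixin would build an image or copy files."""
    build_calls.append((image, workspace))
    return f"{image}+patched"


def caching_build(role, build_cache):
    if not role.workspace:
        return
    # Compute the key from the original image, before the role is mutated.
    key = (role.image, role.workspace)
    if key not in build_cache:
        build_cache[key] = expensive_build(role.image, role.workspace)
    role.image = build_cache[key]


roles = [
    FakeRole("trainer", "base:latest", "/ws"),
    FakeRole("evaluator", "base:latest", "/ws"),
]
cache = {}  # lives only for this one pass over the roles
for r in roles:
    caching_build(r, cache)
print(len(build_calls), [r.image for r in roles])
```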

Testing Your Workspace Mixin

Create a Role with a workspace and assert the role was mutated correctly:

import unittest
from torchx.specs import Role, Resource, Workspace

class MyWorkspaceMixinTest(unittest.TestCase):
    def test_build_updates_role(self) -> None:
        mixin = MyWorkspaceMixin()
        role = Role(
            name="worker", image="base:latest", entrypoint="echo",
            resource=Resource(cpu=1, gpu=0, memMB=512),
            workspace=Workspace.from_str("/tmp/my_workspace"),
        )
        mixin.caching_build_workspace_and_update_role(
            role, cfg={"artifact_bucket": "my-bucket"}, build_cache={},
        )
        self.assertIn("WORKSPACE_ARTIFACT", role.env)

See torchx/workspace/test/ for the built-in mixin tests.

Common Pitfalls

  • Implementing the deprecated method: Override caching_build_workspace_and_update_role (not the older build_workspace_and_update_role).

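  • MRO: List the mixin before Scheduler in the base class list, as in class MyScheduler(MyWorkspaceMixin, Scheduler[...]), so the mixin comes first in Python’s method resolution order (MRO). Initialization then relies on cooperative super().__init__() calls to reach every base class.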

  • Forgetting to check role.workspace: Guard with if not role.workspace: return – the method is called for every role.
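The MRO point can be checked with a small stand-alone sketch. FakeScheduler and FakeWorkspaceMixin are hypothetical stand-ins, not the TorchX classes; the mixin forwards its arguments with super().__init__(*args, **kwargs), mirroring the (*args, **kwargs) constructor signature listed in the API reference.

```python
class FakeScheduler:
    """Hypothetical stand-in for torchx.schedulers.api.Scheduler."""

    def __init__(self, backend: str, session_name: str) -> None:
        self.backend = backend
        self.session_name = session_name


class FakeWorkspaceMixin:
    """Hypothetical mixin that cooperates by forwarding constructor arguments."""

    def __init__(self, *args: object, **kwargs: object) -> None:
        self.workspace_ready = True
        super().__init__(*args, **kwargs)  # continues along the MRO


class MyScheduler(FakeWorkspaceMixin, FakeScheduler):  # mixin listed first
    def __init__(self, session_name: str) -> None:
        super().__init__("my_backend", session_name)


sched = MyScheduler("demo")
mro_names = [c.__name__ for c in MyScheduler.__mro__]
print(mro_names)
```

If the mixin did not call super().__init__(), FakeScheduler.__init__ would never run and sched.backend would be missing.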

.torchxignore

Place a .torchxignore file (same syntax as .dockerignore) at the workspace root to exclude files from the job image:

# Exclude version control and IDE files
.git
.vscode
__pycache__

# Exclude data directories
data/
*.csv

# But include a specific config file
!data/config.yaml

Lines starting with ! negate a previous pattern (include the file even if a prior rule excluded it). Blank lines and lines starting with # are ignored.
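The rule semantics described above can be approximated in a few lines. This is a deliberately simplified sketch, not the matcher TorchX uses: parse_rules and is_ignored are hypothetical helpers, the last matching rule wins, and fnmatch's * also matches path separators, which real .dockerignore matching does not.

```python
import fnmatch

RULES_TEXT = """\
# Exclude version control and IDE files
.git
.vscode
__pycache__

# Exclude data directories
data/
*.csv

# But include a specific config file
!data/config.yaml
"""


def parse_rules(text):
    """Return (pattern, negate) pairs, skipping blank lines and # comments."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        negate = line.startswith("!")
        pattern = line[1:] if negate else line
        rules.append((pattern.rstrip("/"), negate))
    return rules


def is_ignored(path, rules):
    """Apply rules in order; the last rule that matches decides the outcome."""
    ignored = False
    for pattern, negate in rules:
        under_dir = path.startswith(pattern + "/")  # path inside a named dir
        if fnmatch.fnmatchcase(path, pattern) or under_dir:
            ignored = not negate
    return ignored


rules = parse_rules(RULES_TEXT)
for p in [".git/config", "data/train.bin", "data/config.yaml", "src/train.py"]:
    print(p, is_ignored(p, rules))
```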

API Reference

class torchx.workspace.WorkspaceMixin(*args: object, **kwargs: object)

Scheduler mix-in that auto-builds a local workspace into a deployable image or patch.

Warning

Prototype – this interface may change without notice.

Attach to a Scheduler so that local code changes in the workspace are automatically reflected at runtime (via a rebuilt image or an overlaid diff patch) without a manual image rebuild.

build_workspace_and_update_role(role: Role, workspace: str, cfg: Mapping[str, str | int | float | bool | list[str] | dict[str, str] | None]) -> None

Build workspace and mutate role to reference the resulting artifact.

Deprecated: implement caching_build_workspace_and_update_role() instead.

build_workspaces(roles: list[torchx.specs.api.Role], cfg: Mapping[str, str | int | float | bool | list[str] | dict[str, str] | None]) -> None

Builds workspaces for each role and updates role.image in-place.

Important

Mutates the passed roles. May also add env vars (e.g. WORKSPACE_DIR) to role.env.

caching_build_workspace_and_update_role(role: Role, cfg: Mapping[str, str | int | float | bool | list[str] | dict[str, str] | None], build_cache: dict[object, object]) -> None

Like build_workspace_and_update_role() but with a per-call build_cache.

Subclasses should implement this method instead of build_workspace_and_update_role(). The cache avoids redundant builds when multiple roles share the same image and workspace.

Important

build_cache lifetime is scoped to a single build_workspaces() call. What gets cached is up to the implementation.

The default implementation delegates to the (deprecated) build_workspace_and_update_role(), merging multi-dir workspaces into a single tmpdir first.

dryrun_push_images(app: AppDef, cfg: Mapping[str, str | int | float | bool | list[str] | dict[str, str] | None]) -> T

Dry-run the image push: updates app with final image names.

Only called for remote jobs. push_images() must be called with the return value before scheduling.

push_images(images_to_push: T) -> None

Pushes images (returned by dryrun_push_images()) to the remote repo.

workspace_opts() -> runopts

Returns the runopts accepted by this workspace.

torchx.workspace.walk_workspace(fs: AbstractFileSystem, path: str, ignore_name: str = '.torchxignore') -> Iterable[tuple[str, Iterable[str], Mapping[str, Mapping[str, object]]]]

Walks path on fs, filtering entries via .dockerignore-style rules read from ignore_name.

torchx.workspace.docker_workspace

class torchx.workspace.docker_workspace.DockerWorkspaceMixin(*args: object, docker_client: DockerClient | None = None, **kwargs: object)

Bases: WorkspaceMixin[dict[str, tuple[str, str]]]

Builds patched Docker images from the workspace.

Requires a local Docker daemon. For remote jobs, authenticate via docker login and set the image_repo runopt.

If Dockerfile.torchx exists in the workspace it is used as the Dockerfile; otherwise a default COPY . . Dockerfile is generated. Extra --build-arg values available in Dockerfile.torchx:

  • IMAGE – the role’s base image

  • WORKSPACE – the workspace path

Use .dockerignore to exclude files from the build context.
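The Dockerfile selection rule can be illustrated as follows. This is a sketch under assumptions: select_dockerfile is a hypothetical helper, and the text of DEFAULT_DOCKERFILE is only a guess at what a "COPY . ." default might look like, built from the IMAGE build arg named above.

```python
import os
import tempfile

# Assumed default: start from the role's base image and copy the workspace in.
DEFAULT_DOCKERFILE = "ARG IMAGE\nFROM $IMAGE\nCOPY . .\n"


def select_dockerfile(workspace: str) -> str:
    """Return Dockerfile.torchx from the workspace root if present, else the default."""
    custom = os.path.join(workspace, "Dockerfile.torchx")
    if os.path.isfile(custom):
        with open(custom) as f:
            return f.read()
    return DEFAULT_DOCKERFILE


with tempfile.TemporaryDirectory() as ws:
    default_text = select_dockerfile(ws)  # nothing in the workspace yet

    # A custom Dockerfile.torchx that adds an install step on top of the base image.
    with open(os.path.join(ws, "Dockerfile.torchx"), "w") as f:
        f.write("ARG IMAGE\nFROM $IMAGE\nCOPY . .\nRUN pip install -e .\n")
    custom_text = select_dockerfile(ws)

print(default_text)
print(custom_text)
```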

build_workspace_and_update_role(role: Role, workspace: str, cfg: Mapping[str, str | int | float | bool | list[str] | dict[str, str] | None]) -> None

Builds a Docker image from workspace on top of role.image and updates role.image with the resulting image id.

dryrun_push_images(app: AppDef, cfg: Mapping[str, str | int | float | bool | list[str] | dict[str, str] | None]) -> dict[str, tuple[str, str]]

Replaces local sha256:... images in app with remote paths and returns a {local_image: (repo, tag)} mapping for push_images().
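The shape of that mapping is easier to see in a small sketch. plan_push is a hypothetical helper operating on a plain dict of role name to image rather than an AppDef, and using the digest hex as the remote tag is an assumption made for illustration.

```python
def plan_push(images, image_repo):
    """Split images into rewritten names and a {local_image: (repo, tag)} push plan.

    images maps a role name to its current image reference.
    """
    updated = {}
    to_push = {}
    for role_name, image in images.items():
        if image.startswith("sha256:"):
            tag = image[len("sha256:"):]  # assumed tag scheme: the digest hex
            to_push[image] = (image_repo, tag)
            updated[role_name] = f"{image_repo}:{tag}"
        else:
            updated[role_name] = image  # already a remote reference
    return updated, to_push


images = {"trainer": "sha256:abc123", "reader": "public/base:1.0"}
updated, to_push = plan_push(images, "registry.example.com/team/app")
print(updated)
print(to_push)
```

The second return value is what a push step would consume, which is why the dry run and the push share the same type.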

push_images(images_to_push: dict[str, tuple[str, str]]) -> None

Pushes local images to a remote repository.

Requires docker login authentication to the target repo.

workspace_opts() -> runopts

Returns the runopts accepted by this workspace.

torchx.workspace.dir_workspace

class torchx.workspace.dir_workspace.DirWorkspaceMixin(*args: object, **kwargs: object)

Bases: WorkspaceMixin[None]

build_workspace_and_update_role(role: Role, workspace: str, cfg: Mapping[str, str | int | float | bool | list[str] | dict[str, str] | None]) -> None

Copies workspace into cfg["job_dir"] and sets role.image to it.

No-op if job_dir is not set. Files matching .torchxignore patterns are skipped.
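The behaviour described above can be sketched with the standard library. copy_workspace is a hypothetical helper, not the TorchX implementation, and the fixed ignore patterns below stand in for rules that would normally come from .torchxignore.

```python
import os
import shutil
import tempfile


def copy_workspace(workspace: str, cfg: dict, image: str) -> str:
    """Copy workspace into cfg["job_dir"] and return the new image value.

    Returns the image unchanged when job_dir is not set (the no-op case).
    """
    job_dir = cfg.get("job_dir")
    if not job_dir:
        return image
    shutil.copytree(
        workspace,
        job_dir,
        dirs_exist_ok=True,  # the job directory may already exist
        ignore=shutil.ignore_patterns(".git", "__pycache__"),  # stand-in rules
    )
    return job_dir


with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    with open(os.path.join(src, "train.py"), "w") as f:
        f.write("print('hi')\n")
    os.mkdir(os.path.join(src, "__pycache__"))

    unchanged = copy_workspace(src, {}, "base:latest")
    new_image = copy_workspace(src, {"job_dir": dst}, "base:latest")
    job_dir_path = dst
    copied = sorted(os.listdir(dst))

print(unchanged, copied)
```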

See also

torchx.schedulers

Scheduler API reference and implementation guide (including workspace integration).

Advanced Usage

Registering custom schedulers and workspace mixins via entry points.
