torchx.workspace
Tip
Workspaces handle automatic image patching – copying local code changes into
the job’s runtime environment so users don’t need manual image rebuilds.
Scheduler authors add workspace support by mixing in a WorkspaceMixin
subclass alongside their Scheduler.
Why workspaces exist. Without workspaces, every code change requires rebuilding and pushing a Docker image before submitting a remote job. Workspaces automate this: TorchX overlays local changes onto the base image and submits the patched image in one step.
How It Works
User's workspace directory          Scheduler submission

┌──────────────────────┐
│ src/                 │
│   train.py           │    build_workspaces()
│ Dockerfile.torchx    │ ──────────────────────────┐
│ .torchxignore        │                           │
└──────────────────────┘                           ▼
                              ┌──────────────────────────────────┐
                              │ For each Role with a workspace:  │
                              │                                  │
                              │  1. Walk workspace               │
                              │     (respecting .torchxignore)   │
                              │                                  │
                              │  2. Build patched artifact       │
                              │     ┌────────────────────────┐   │
                              │     │ DockerWorkspaceMixin:  │   │
                              │     │   docker build + push  │   │
                              │     │ DirWorkspaceMixin:     │   │
                              │     │   copy to shared dir   │   │
                              │     └────────────────────────┘   │
                              │                                  │
                              │  3. Mutate role.image in-place   │
                              │     to reference patched image   │
                              └──────────────┬───────────────────┘
                                             │
                                             ▼
                              ┌──────────────────────────────────┐
                              │ Scheduler._submit_dryrun(app)    │
                              │   uses the patched role.image    │
                              └──────────────────────────────────┘
When workspace= is passed to run() (or
--workspace on the CLI), TorchX patches the image before submission:
- build_workspaces() iterates over each role's workspace.
- For each role with a workspace, it calls caching_build_workspace_and_update_role(), which builds the workspace and mutates role.image in-place to reference the patched artifact.
- For remote schedulers, dryrun_push_images() and push_images() handle pushing the built image to a remote registry.
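The flow above can be sketched without TorchX installed. The block below is a minimal illustration, not TorchX's implementation: ToyRole and ToyWorkspaceMixin are hypothetical stand-ins for Role and WorkspaceMixin, and the "+patched" suffix stands in for a real build.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyRole:
    """Hypothetical stand-in for torchx.specs.Role (illustration only)."""

    name: str
    image: str
    workspace: Optional[str] = None


class ToyWorkspaceMixin:
    """Hypothetical mixin that mirrors the described control flow."""

    def build_workspaces(self, roles: list[ToyRole], cfg: dict[str, object]) -> None:
        build_cache: dict[object, object] = {}  # one cache shared by all roles in this call
        for role in roles:
            if role.workspace:  # roles without a workspace keep their image
                self.caching_build_workspace_and_update_role(role, cfg, build_cache)

    def caching_build_workspace_and_update_role(
        self, role: ToyRole, cfg: dict[str, object], build_cache: dict[object, object]
    ) -> None:
        # A real mixin would build an image or copy files here.
        role.image = f"{role.image}+patched"  # mutate the role in place


roles = [ToyRole("trainer", "base:latest", "/tmp/ws"), ToyRole("reader", "base:latest")]
ToyWorkspaceMixin().build_workspaces(roles, cfg={})
print([(r.name, r.image) for r in roles])
```

Only the role that declares a workspace ends up with a different image; the caller never receives a return value because the roles themselves are mutated.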
Note
DockerWorkspaceMixin uses Dockerfile.torchx from the workspace root
(if present) instead of the default Dockerfile.
Built-in Mixins
| Mixin | Strategy |
|---|---|
| DockerWorkspaceMixin | Builds a Docker image from the workspace on top of the role's base image. |
| DirWorkspaceMixin | Copies workspace files into a shared job directory on the filesystem. |
Implementing a Custom WorkspaceMixin
Subclass WorkspaceMixin and implement
caching_build_workspace_and_update_role():
from typing import Mapping

from torchx.specs import CfgVal, Role, runopts
from torchx.workspace import WorkspaceMixin


class MyWorkspaceMixin(WorkspaceMixin[None]):
    """Patches images by uploading workspace to a custom artifact store."""

    def workspace_opts(self) -> runopts:
        opts = runopts()
        opts.add("artifact_bucket", type_=str, required=True, help="S3 bucket for workspace artifacts")
        return opts

    def caching_build_workspace_and_update_role(
        self,
        role: Role,
        cfg: Mapping[str, CfgVal],
        build_cache: dict[object, object],
    ) -> None:
        workspace = role.workspace
        if not workspace:
            return  # nothing to patch for this role
        bucket = cfg.get("artifact_bucket")
        # ... upload workspace files to bucket ...
        # ... update role.image or role.env to reference the artifact ...
        role.env["WORKSPACE_ARTIFACT"] = f"s3://{bucket}/{role.name}/workspace.tar.gz"
Then mix it into your scheduler:
from typing import Mapping

from torchx.schedulers.api import Scheduler
from torchx.specs import CfgVal


class MyScheduler(MyWorkspaceMixin, Scheduler[Mapping[str, CfgVal]]):
    def __init__(self, session_name: str, **kwargs: object) -> None:
        super().__init__("my_backend", session_name)

    # ... scheduler methods ...
The generic parameter T is the type returned by dryrun_push_images and
consumed by push_images. Use None if no separate push step is needed.
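To make the role of T concrete, here is a small self-contained sketch of the two-phase contract. ToyPushMixin and RegistryPushMixin are hypothetical names, and the simplified signatures (a dict of role name to image instead of an AppDef and cfg) are assumptions made for illustration; the tag derivation is likewise invented.

```python
from typing import Generic, TypeVar

T = TypeVar("T")


class ToyPushMixin(Generic[T]):
    """Hypothetical stand-in: T links the dry-run result to the push step."""

    def dryrun_push_images(self, images: dict[str, str]) -> T:
        raise NotImplementedError

    def push_images(self, images_to_push: T) -> None:
        raise NotImplementedError


class RegistryPushMixin(ToyPushMixin[dict[str, tuple[str, str]]]):
    def __init__(self, repo: str) -> None:
        self.repo = repo
        self.pushed: list[tuple[str, str, str]] = []

    def dryrun_push_images(self, images: dict[str, str]) -> dict[str, tuple[str, str]]:
        # Rewrite local "sha256:..." images to remote names and return the push plan.
        plan: dict[str, tuple[str, str]] = {}
        for role_name, image in images.items():
            if image.startswith("sha256:"):
                tag = image[len("sha256:"):][:12]  # assumed tag scheme for this sketch
                plan[image] = (self.repo, tag)
                images[role_name] = f"{self.repo}:{tag}"
        return plan

    def push_images(self, images_to_push: dict[str, tuple[str, str]]) -> None:
        for local_image, (repo, tag) in images_to_push.items():
            # A real mixin would talk to the registry here; this one only records the call.
            self.pushed.append((local_image, repo, tag))
```

The dry-run only computes names and a plan; nothing is pushed until push_images receives that plan. A mixin with no separate push step would parameterize the base with None and leave both methods trivial.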
Note
The build_cache dict is shared across all roles in a single
build_workspaces call. Use it to skip redundant builds when roles share
the same image and workspace.
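One way to use the cache is to key it on the (image, workspace) pair. The sketch below is an assumption about a reasonable caching strategy, not something TorchX prescribes; ToyRole, CachingToyMixin, and the _build helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyRole:
    """Hypothetical stand-in for torchx.specs.Role (illustration only)."""

    name: str
    image: str
    workspace: Optional[str] = None


class CachingToyMixin:
    def __init__(self) -> None:
        self.builds = 0  # counts how many real builds ran

    def _build(self, image: str, workspace: str) -> str:
        # Placeholder for an expensive build step.
        self.builds += 1
        return f"{image}+patched-{self.builds}"

    def caching_build_workspace_and_update_role(
        self, role: ToyRole, cfg: dict[str, object], build_cache: dict[object, object]
    ) -> None:
        if not role.workspace:
            return
        key = (role.image, role.workspace)  # identical inputs give an identical artifact
        if key not in build_cache:
            build_cache[key] = self._build(role.image, role.workspace)
        role.image = str(build_cache[key])
```

The key must be computed before role.image is overwritten, otherwise a second role with the same base image would never hit the cache.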
Testing Your Workspace Mixin
Create a Role with a workspace and assert the role was mutated correctly:
import unittest

from torchx.specs import Role, Resource, Workspace


class MyWorkspaceMixinTest(unittest.TestCase):
    def test_build_updates_role(self) -> None:
        mixin = MyWorkspaceMixin()  # the mixin defined in the previous section
        role = Role(
            name="worker", image="base:latest", entrypoint="echo",
            resource=Resource(cpu=1, gpu=0, memMB=512),
            workspace=Workspace.from_str("/tmp/my_workspace"),
        )
        mixin.caching_build_workspace_and_update_role(
            role, cfg={"artifact_bucket": "my-bucket"}, build_cache={},
        )
        self.assertIn("WORKSPACE_ARTIFACT", role.env)
See torchx/workspace/test/ for the built-in mixin tests.
Common Pitfalls
- Implementing the deprecated method: Override caching_build_workspace_and_update_role (not the older build_workspace_and_update_role).
- MRO: List the mixin first in the base class list: class MyScheduler(MyWorkspaceMixin, Scheduler[...]):. With the mixin first, its __init__ runs before the scheduler's, so every __init__ in the chain must call super().__init__() cooperatively.
- Forgetting to check role.workspace: Guard with if not role.workspace: return, because the method is called for every role.
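The MRO point can be checked with plain Python classes. In this sketch ToyScheduler and ToyWorkspaceMixin are hypothetical stand-ins; the only thing borrowed from the text is the (*args, **kwargs) mixin constructor and the mixin-first base order.

```python
class ToyScheduler:
    """Hypothetical stand-in for a Scheduler base class."""

    def __init__(self, backend: str, session_name: str) -> None:
        self.backend = backend
        self.session_name = session_name


class ToyWorkspaceMixin:
    """Hypothetical mixin whose __init__ forwards everything it receives."""

    def __init__(self, *args: object, **kwargs: object) -> None:
        super().__init__(*args, **kwargs)  # continues to the next class in the MRO
        self.workspace_ready = True


class MyToyScheduler(ToyWorkspaceMixin, ToyScheduler):
    def __init__(self, session_name: str) -> None:
        super().__init__("my_backend", session_name)


sched = MyToyScheduler("demo")
print([cls.__name__ for cls in MyToyScheduler.__mro__])
print(sched.backend, sched.session_name, sched.workspace_ready)
```

Because the mixin's __init__ forwards its arguments with super(), the scheduler base still receives the backend and session name even though the mixin comes first in the resolution order.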
.torchxignore
Place a .torchxignore file (same syntax as .dockerignore) at the
workspace root to exclude files from the job image:
# Exclude version control and IDE files
.git
.vscode
__pycache__
# Exclude data directories
data/
*.csv
# But include a specific config file
!data/config.yaml
Lines starting with ! negate a previous pattern (include the file even if a
prior rule excluded it). Blank lines and lines starting with # are ignored.
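The interaction between exclusions and ! negations can be illustrated with a simplified matcher. This is a rough sketch under stated assumptions, not TorchX's or Docker's matcher: is_ignored is a hypothetical helper, it assumes the last matching rule wins, and it relies on fnmatchcase, whose * also crosses "/" (real .dockerignore matching is stricter).

```python
from fnmatch import fnmatchcase

RULES = [
    "# Exclude version control and IDE files",
    ".git",
    ".vscode",
    "__pycache__",
    "",
    "data/",
    "*.csv",
    "!data/config.yaml",
]


def is_ignored(path: str, rules: list[str]) -> bool:
    """Return True if `path` (relative, '/'-separated) ends up excluded."""
    ignored = False
    parts = path.split("/")
    # The path itself plus each parent directory, e.g. "data" and "data/config.yaml".
    candidates = ["/".join(parts[: i + 1]) for i in range(len(parts))]
    for raw in rules:
        rule = raw.strip()
        if not rule or rule.startswith("#"):
            continue  # blank lines and comments carry no rule
        negate = rule.startswith("!")
        pattern = (rule[1:] if negate else rule).rstrip("/")
        if any(fnmatchcase(candidate, pattern) for candidate in candidates):
            ignored = not negate  # assumption: the last matching rule wins
    return ignored


for path in ["src/train.py", "data/big.bin", "data/config.yaml", "results.csv"]:
    print(path, is_ignored(path, RULES))
```

Rule order matters in this model: moving !data/config.yaml above data/ would leave the config file excluded.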
API Reference
- class torchx.workspace.WorkspaceMixin(*args: object, **kwargs: object)
Scheduler mix-in that auto-builds a local workspace into a deployable image or patch.
Warning
Prototype – this interface may change without notice.
Attach to a Scheduler so that local code changes in the workspace are automatically reflected at runtime (via a rebuilt image or an overlaid diff patch) without a manual image rebuild.
- build_workspace_and_update_role(role: Role, workspace: str, cfg: Mapping[str, str | int | float | bool | list[str] | dict[str, str] | None]) → None
Build workspace and mutate role to reference the resulting artifact.
Deprecated: implement caching_build_workspace_and_update_role() instead.
- build_workspaces(roles: list[torchx.specs.api.Role], cfg: Mapping[str, str | int | float | bool | list[str] | dict[str, str] | None]) → None
Builds workspaces for each role and updates role.image in-place.
Important
Mutates the passed roles. May also add env vars (e.g. WORKSPACE_DIR) to role.env.
- caching_build_workspace_and_update_role(role: Role, cfg: Mapping[str, str | int | float | bool | list[str] | dict[str, str] | None], build_cache: dict[object, object]) → None
Like build_workspace_and_update_role() but with a per-call build_cache.
Subclasses should implement this method instead of build_workspace_and_update_role(). The cache avoids redundant builds when multiple roles share the same image and workspace.
Important
build_cache lifetime is scoped to a single build_workspaces() call. What gets cached is up to the implementation.
The default implementation delegates to the (deprecated) build_workspace_and_update_role(), merging multi-dir workspaces into a single tmpdir first.
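Merging several workspace directories into one temporary directory might look roughly like the following. This is only an illustration of the idea: merge_workspace_dirs is a hypothetical helper, and the layout TorchX actually produces (for example, whether each directory lands in its own subdirectory) is not specified here.

```python
import shutil
import tempfile


def merge_workspace_dirs(src_dirs: list[str]) -> str:
    """Copy several source directories into one fresh temporary directory.

    Later directories overwrite files from earlier ones when names clash.
    The caller is responsible for deleting the returned directory.
    """
    merged = tempfile.mkdtemp(prefix="workspace_")
    for src in src_dirs:
        # dirs_exist_ok lets each copy add to the already-populated target.
        shutil.copytree(src, merged, dirs_exist_ok=True)
    return merged
```

A single merged directory lets a builder that only understands one workspace path keep working unchanged.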
- dryrun_push_images(app: AppDef, cfg: Mapping[str, str | int | float | bool | list[str] | dict[str, str] | None]) → T
Dry-run the image push: updates app with final image names.
Only called for remote jobs. push_images() must be called with the return value before scheduling.
- push_images(images_to_push: T) → None
Pushes images (returned by dryrun_push_images()) to the remote repo.
torchx.workspace.docker_workspace
- class torchx.workspace.docker_workspace.DockerWorkspaceMixin(*args: object, docker_client: DockerClient | None = None, **kwargs: object)
Bases: WorkspaceMixin[dict[str, tuple[str, str]]]
Builds patched Docker images from the workspace.
Requires a local Docker daemon. For remote jobs, authenticate via docker login and set the image_repo runopt.
If Dockerfile.torchx exists in the workspace it is used as the Dockerfile; otherwise a default COPY . . Dockerfile is generated. Extra --build-arg values available in Dockerfile.torchx:
  - IMAGE: the role's base image
  - WORKSPACE: the workspace path
Use .dockerignore to exclude files from the build context.
- build_workspace_and_update_role(role: Role, workspace: str, cfg: Mapping[str, str | int | float | bool | list[str] | dict[str, str] | None]) → None
Builds a Docker image from workspace on top of role.image and updates role.image with the resulting image id.
- dryrun_push_images(app: AppDef, cfg: Mapping[str, str | int | float | bool | list[str] | dict[str, str] | None]) → dict[str, tuple[str, str]]
Replaces local sha256:... images in app with remote paths and returns a {local_image: (repo, tag)} mapping for push_images().
torchx.workspace.dir_workspace
- class torchx.workspace.dir_workspace.DirWorkspaceMixin(*args: object, **kwargs: object)
Bases: WorkspaceMixin[None]
See also
- torchx.schedulers
Scheduler API reference and implementation guide (including workspace integration).
- Advanced Usage
Registering custom schedulers and workspace mixins via entry points.