Quickstart¶
Tip
Install TorchX, write a simple app, and launch it locally and remotely – including distributed jobs. Estimated time: 10–15 minutes.
Installation¶
Install TorchX (provides the torchx CLI and the
Runner Python API):
$ pip install "torchx[dev]"
Verify the installation:
$ torchx --help
Hello World¶
Create a simple my_app.py:
import sys
print(f"Hello, {sys.argv[1]}!")
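The app reads its name from `sys.argv[1]`, which raises an `IndexError` when no argument is passed. A slightly more defensive variant (the `greet` helper and the `"world"` fallback are illustrative additions, not part of the quickstart app):

```python
import sys

def greet(argv):
    # Fall back to a default name when no argument is given.
    name = argv[1] if len(argv) > 1 else "world"
    return f"Hello, {name}!"

if __name__ == "__main__":
    print(greet(sys.argv))
```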
Launching¶
Launch the app with torchx run. The scheduler is the
backend that runs the job – local_cwd runs it in your current directory.
You’ll use the utils.python component (a reusable job
template):
$ torchx run --scheduler local_cwd utils.python --help
The component takes a script name; extra arguments are passed through to the script.
$ torchx run --scheduler local_cwd utils.python --script my_app.py "your name"
Using the Python API¶
The same operations are available via get_runner():
from torchx.runner import get_runner

with get_runner() as runner:
    app_handle = runner.run_component(
        "utils.python",
        ["--script", "my_app.py", "your name"],
        scheduler="local_cwd",
    )
    # Wait for the job to complete and print its final status
    final_status = runner.wait(app_handle, wait_interval=1)
    print(final_status)
You can also construct an AppDef directly and pass
it to run():
import torchx.specs as specs
from torchx.runner import get_runner

app = specs.AppDef(
    name="hello",
    roles=[
        specs.Role(
            name="worker",
            entrypoint="python",
            # "image" is the base runtime environment. For local schedulers
            # it's a filesystem path; for container schedulers it's a Docker
            # image name (e.g. "my_image:latest").
            image="/tmp",
            args=["my_app.py", "your name"],
        )
    ],
)

with get_runner() as runner:
    app_handle = runner.run(app, scheduler="local_cwd")
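Both `run()` and `run_component()` return an app handle string of the form `<scheduler>://<session_name>/<app_id>`, which you pass to runner methods like `wait()`. A quick sketch of splitting such a handle apart (the handle value here is made up for illustration):

```python
def parse_app_handle(handle):
    # An app handle looks like "<scheduler>://<session_name>/<app_id>".
    scheduler, rest = handle.split("://", 1)
    session_name, _, app_id = rest.partition("/")
    return scheduler, session_name, app_id

# Hypothetical handle, for illustration only:
print(parse_app_handle("local_cwd://torchx/hello-abc123"))
```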
The local_docker scheduler packages your local workspace as a layer on top
of the specified image – a close approximation of remote container environments.
Note
This requires Docker to be installed and won’t work in environments such as Google Colab. See the Docker install instructions: https://docs.docker.com/get-docker/
$ torchx run --scheduler local_docker utils.python --script my_app.py "your name"
TorchX defaults to the ghcr.io/pytorch/torchx Docker container image, which contains the PyTorch libraries, TorchX, and related dependencies.
Distributed¶
The dist.ddp component (DDP = Distributed Data Parallel) uses
TorchElastic
to manage workers, enabling multi-node jobs on all supported schedulers.
$ torchx run --scheduler local_docker dist.ddp --help
Create dist_app.py:
import torch
import torch.distributed as dist
dist.init_process_group(backend="gloo")
print(f"I am worker {dist.get_rank()} of {dist.get_world_size()}!")
a = torch.tensor([dist.get_rank()])
dist.all_reduce(a)
print(f"all_reduce output = {a}")
Launch with 2 nodes and 2 workers per node (-j 2x2 = <nodes>x<workers_per_node>):
$ torchx run --scheduler local_docker dist.ddp -j 2x2 --script dist_app.py
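With `-j 2x2` there are 4 workers holding tensors `[0]`, `[1]`, `[2]`, and `[3]`, and `all_reduce` defaults to a sum, so every worker should end up with `tensor([6])`. A plain-Python sanity check of that arithmetic (world size hard-coded to match the 2x2 example):

```python
# Each of the 4 workers contributes a tensor containing its rank;
# all_reduce with the default SUM op leaves every worker with the total.
world_size = 2 * 2          # 2 nodes x 2 workers per node
ranks = list(range(world_size))
expected = sum(ranks)       # 0 + 1 + 2 + 3
print(expected)             # -> 6, the value inside every worker's output tensor
```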
Workspaces / Patching¶
TorchX uses workspaces to automatically overlay your local code onto the job’s base image, so you don’t need to rebuild and push a Docker image after every code change. See torchx.workspace for details.
.torchxconfig¶
Configure scheduler defaults in a .torchxconfig file instead of passing
-cfg flags every time:
[kubernetes]
queue=torchx
image_repo=<your docker image repository>
[slurm]
partition=torchx
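`.torchxconfig` uses INI syntax, so you can sanity-check a config with Python’s standard `configparser` before running jobs. The content below mirrors the example above; the `image_repo` value is a placeholder:

```python
import configparser

config = configparser.ConfigParser()
config.read_string("""
[kubernetes]
queue=torchx
image_repo=example.com/my-repo
[slurm]
partition=torchx
""")
print(config["kubernetes"]["queue"])   # -> torchx
print(config["slurm"]["partition"])    # -> torchx
```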
Remote Schedulers¶
The same torchx run command works on remote schedulers – only the
--scheduler flag changes.
$ torchx run --scheduler slurm dist.ddp -j 2x2 --script dist_app.py
$ torchx run --scheduler kubernetes dist.ddp -j 2x2 --script dist_app.py
$ torchx run --scheduler aws_batch dist.ddp -j 2x2 --script dist_app.py
List all scheduler-specific options:
$ torchx runopts
Custom Images¶
Docker-based Schedulers¶
Provide a custom Dockerfile to add libraries beyond the standard PyTorch set.
Create timm_app.py:
import timm
print(timm.models.resnet18())
Create Dockerfile.torchx:
FROM pytorch/pytorch:2.6.0-cuda12.6-cudnn9-runtime
RUN pip install timm
COPY . .
TorchX uses this Dockerfile automatically:
$ torchx run --scheduler local_docker utils.python --script timm_app.py
Slurm¶
The slurm and local_cwd schedulers use the current environment, so
pip and conda work as usual.
Next Steps¶
- Explore the API Quick Reference for copy-pasteable recipes
- Explore the torchx CLI and the Runner Python API
- Review supported schedulers
- Browse builtin components
See also
- Basic Concepts
Core concepts behind AppDef, Component, Runner, and Scheduler.
- .torchxconfig
Configuring scheduler options via .torchxconfig.
- Custom Components
Writing and registering your own components.