Utils

This contains TorchX utility components that are ready-to-use out of the box. These are components that simply execute well known binaries (e.g. cp) and are meant to be used as tutorial materials or glue operations between meaningful stages in a workflow.

torchx.components.utils.echo(msg: str = 'hello world', image: str = 'ghcr.io/pytorch/torchx:0.8.0dev0', num_replicas: int = 1) → AppDef

Echoes a message to stdout (calls echo)

Parameters:

msg – message to echo
image – image to use
num_replicas – number of replicas to run

torchx.components.utils.touch(file: str, image: str = 'ghcr.io/pytorch/torchx:0.8.0dev0') → AppDef

Touches a file (calls touch)

Parameters:

file – file to create
image – the image to use

torchx.components.utils.sh(*args: str, image: str = 'ghcr.io/pytorch/torchx:0.8.0dev0', num_replicas: int = 1, cpu: int = 1, gpu: int = 0, memMB: int = 1024, h: str | None = None, env: dict[str, str] | None = None, max_retries: int = 0, mounts: list[str] | None = None, entrypoint: str | None = None) → AppDef

Runs the provided command via sh. Currently sh does not support environment variable substitution.

Parameters:

args – arguments passed to the entrypoint (sh by default)
image – image to use
num_replicas – number of replicas to run
cpu – number of cpus per replica
gpu – number of gpus per replica
memMB – cpu memory in MB per replica
h – a registered named resource (if specified takes precedence over cpu, gpu, memMB)
env – environment variables to be passed to the run (e.g. ENV1=v1,ENV2=v2,ENV3=v3)
max_retries – the number of scheduler retries allowed
mounts – mounts to mount into the worker environment/container (e.g. type=<bind/volume>,src=/host,dst=/job[,readonly]). See scheduler documentation for more info.
entrypoint – the entrypoint to use for the command (defaults to sh)
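
As a rough illustration of the env and mounts string formats shown in the parameter list above, the sketch below parses the two example strings into Python structures. parse_env and parse_mount are hypothetical helper names, not TorchX APIs, and TorchX's own parsing may differ (the component signature, for instance, takes mounts as a list of strings).

```python
def parse_env(spec: str) -> dict:
    """Parse 'ENV1=v1,ENV2=v2' into {'ENV1': 'v1', 'ENV2': 'v2'} (illustrative only)."""
    env = {}
    for pair in spec.split(","):
        if not pair:
            continue  # tolerate a trailing comma
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected KEY=VALUE, got {pair!r}")
        env[key] = value
    return env


def parse_mount(spec: str) -> dict:
    """Parse one 'type=bind,src=/host,dst=/job[,readonly]' entry (illustrative only)."""
    mount = {"readonly": False}
    for field in spec.split(","):
        if field == "readonly":
            # 'readonly' is a bare flag with no '=value' part
            mount["readonly"] = True
            continue
        key, sep, value = field.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value or 'readonly', got {field!r}")
        mount[key] = value
    return mount


print(parse_env("ENV1=v1,ENV2=v2,ENV3=v3"))
print(parse_mount("type=bind,src=/host,dst=/job,readonly"))
```

Splitting on the first "=" only (via partition) keeps values that themselves contain "=" intact; values containing commas are not handled by this sketch.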

torchx.components.utils.copy(src: str, dst: str, image: str = 'ghcr.io/pytorch/torchx:0.8.0dev0') → AppDef

Copies the file from src to dst. src and dst can be any valid fsspec URL.

This does not support recursive copies or directories.

Parameters:

src – the source fsspec file location
dst – the destination fsspec file location
image – the image that contains the copy app

torchx.components.utils.python(*args: str, m: str | None = None, c: str | None = None, script: str | None = None, image: str = 'ghcr.io/pytorch/torchx:0.8.0dev0', name: str = 'torchx_utils_python', cpu: int = 1, gpu: int = 0, memMB: int = 1024, h: str | None = None, num_replicas: int = 1) → AppDef

Runs python with the specified module, command or script on the specified image and host. Use -- to separate component args and program args (e.g. torchx run utils.python --m foo.main -- --args to --main)
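
The -- separator described above follows the common command-line convention of splitting an argument list at the first bare "--". The sketch below shows that split in isolation; split_args is a hypothetical helper written for illustration and is not how the torchx CLI is necessarily implemented.

```python
def split_args(argv):
    """Split argv at the first '--' into (component_args, program_args)."""
    if "--" in argv:
        i = argv.index("--")
        return argv[:i], argv[i + 1:]
    # No separator: everything is treated as component args in this sketch.
    return list(argv), []


component_args, program_args = split_args(
    ["--m", "foo.main", "--", "--args", "to", "--main"]
)
print(component_args)  # arguments for the utils.python component
print(program_args)    # arguments forwarded to the program
```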

Note: the cpu, gpu, and memMB parameters are mutually exclusive with h (a named resource). If h is specified, it takes precedence when setting resource requirements. See registering named resources.
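
The precedence rule in the note above can be sketched as follows. The Resource dataclass, the NAMED_RESOURCES registry, its "gpu_small" entry, and resolve_resource are all hypothetical stand-ins invented for this example; TorchX's real named-resource registry and resolution logic may look different.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Resource:
    cpu: int
    gpu: int
    memMB: int


# Hypothetical registry of named resources (names and sizes are made up).
NAMED_RESOURCES = {
    "gpu_small": Resource(cpu=8, gpu=1, memMB=32768),
}


def resolve_resource(cpu: int = 1, gpu: int = 0, memMB: int = 1024,
                     h: Optional[str] = None) -> Resource:
    """Return the named resource if h is given, else build one from cpu/gpu/memMB."""
    if h is not None:
        # h wins: the explicit cpu/gpu/memMB values are ignored.
        return NAMED_RESOURCES[h]
    return Resource(cpu=cpu, gpu=gpu, memMB=memMB)


print(resolve_resource(cpu=2))                 # built from the explicit values
print(resolve_resource(cpu=2, h="gpu_small"))  # named resource takes precedence
```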

Parameters:

args – arguments passed to the program in sys.argv[1:] (ignored with -c)
m – run library module as a script
c – program passed as string (may error if scheduler has a length limit on args)
script – .py script to run
image – image to run on
name – name of the job
cpu – number of cpus per replica
gpu – number of gpus per replica
memMB – cpu memory in MB per replica
h – a registered named resource (if specified takes precedence over cpu, gpu, memMB)
num_replicas – number of copies to run (each on its own container)

utils.python works like regular python but supports remote launches. The torchx run command patches (overlays) your current working directory onto the image so local changes are reflected in the remote job.

# run inline code locally
$ torchx run utils.python -c "import torch; print(torch.__version__)"

# run a module locally
$ torchx run utils.python -m foo.bar.main

# run a script locally
$ torchx run utils.python --script my_app.py

# run on Kubernetes
$ torchx run -s kubernetes utils.python --script my_app.py

Important

Be careful with -c CMD – schedulers have a character limit on arguments. Prefer -m or --script for anything non-trivial.
Exactly one of -m, -c, or --script must be specified.
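
The "exactly one of" rule above amounts to a small validation step. A minimal sketch, assuming unset options are represented as None as in the component signature; check_exactly_one is a hypothetical name and the error message is invented for this example.

```python
from typing import Optional


def check_exactly_one(m: Optional[str] = None,
                      c: Optional[str] = None,
                      script: Optional[str] = None) -> str:
    """Return the name of the single option that was set, or raise ValueError."""
    given = [
        name
        for name, value in (("m", m), ("c", c), ("script", script))
        if value is not None
    ]
    if len(given) != 1:
        raise ValueError(
            f"exactly one of -m, -c, or --script must be specified, got: {given}"
        )
    return given[0]


print(check_exactly_one(script="my_app.py"))
```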

torchx.components.utils.booth(x1: float, x2: float, trial_idx: int = 0, tracker_base: str = '/tmp/torchx-util-booth', image: str = 'ghcr.io/pytorch/torchx:0.8.0dev0') → AppDef

Evaluates the Booth function, f(x1, x2) = (x1 + 2*x2 - 7)^2 + (2*x1 + x2 - 5)^2. The result is accessible via FsspecResultTracker(outdir)[trial_idx]

Parameters:

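x1 – first input (x1) to the Booth function
x2 – second input (x2) to the Booth function
trial_idx – trial index; ignore if not running hyperparameter optimization (HPO)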
tracker_base – URI of the tracker’s base output directory (e.g. s3://foo/bar)
image – the image that contains the booth app
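
For reference, the formula above can be evaluated directly in plain Python. This is only a local sketch of the math, not the booth app that ships in the image; the function name booth_value is chosen here for illustration.

```python
def booth_value(x1: float, x2: float) -> float:
    """Booth function: f(x1, x2) = (x1 + 2*x2 - 7)^2 + (2*x1 + x2 - 5)^2."""
    return (x1 + 2 * x2 - 7) ** 2 + (2 * x1 + x2 - 5) ** 2


print(booth_value(1.0, 3.0))  # both squared terms vanish at (1, 3)
print(booth_value(0.0, 0.0))  # (-7)^2 + (-5)^2
```

Because the function is a sum of two squares that are both zero at (x1, x2) = (1, 3), that point is its global minimum, which makes it a convenient target for checking a hyperparameter search.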

torchx.components.utils.binary(*args: str, entrypoint: str, name: str = 'torchx_utils_binary', num_replicas: int = 1, cpu: int = 1, gpu: int = 0, memMB: int = 1024, h: str | None = None) → AppDef

Test component

Parameters:

args – arguments passed to the program in sys.argv[1:]
name – name of the job
num_replicas – number of copies to run (each on its own container)
cpu – number of cpus per replica
gpu – number of gpus per replica
memMB – cpu memory in MB per replica
h – a registered named resource (if specified takes precedence over cpu, gpu, memMB)

torchx.components.utils.hydra(*overrides: str, config_name: str, config_dir: str = '.torchx') → AppDef

Build AppDef from Hydra configuration.

Config should have an ‘app’ key with _target_: torchx.specs.AppDef. Other top-level keys (like ‘role’) can be used for config groups and interpolation.

Example:

defaults:
  - role: python

app:
  _target_: torchx.specs.AppDef
  name: my_job
  roles:
    - ${role}

Parameters:

overrides – Hydra config overrides (e.g., role.num_replicas=2)
config_name – Config file name in config_dir
config_dir – Directory containing configs (default: .torchx)

Returns:

AppDef instantiated from configuration

utils.hydra builds an AppDef from a Hydra config, letting you declare jobs as YAML with config groups, interpolation, and CLI overrides instead of Python kwargs.

Important

utils.hydra requires hydra-core (which also pulls in omegaconf). It is not installed by default – install it explicitly:

pip install hydra-core

The config must have an app key whose _target_ is torchx.specs.AppDef. Example .torchx/my_job.yaml:

app:
  _target_: torchx.specs.AppDef
  name: my_job
  roles:
    - _target_: torchx.specs.Role
      name: trainer
      image: alpine:latest
      entrypoint: echo
      num_replicas: 1
      args:
        - hello

Run it (with optional Hydra-style overrides after --):

# uses config_dir=.torchx by default
$ torchx run utils.hydra -cn my_job

# override any field from the CLI
$ torchx run utils.hydra -cn my_job -- app.roles.0.num_replicas=3
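
To make the effect of an override such as app.roles.0.num_replicas=3 concrete, the sketch below applies a dotted-path assignment to a plain nested dict/list that mirrors the example config. It is a simplified stand-in written for illustration: Hydra and OmegaConf perform the real override parsing, with a richer grammar, and apply_override is a hypothetical helper.

```python
import ast


def apply_override(config, override: str) -> None:
    """Apply 'a.b.0.c=value' in place to a nested dict/list structure."""
    path, sep, raw = override.partition("=")
    if not sep:
        raise ValueError(f"expected path=value, got {override!r}")
    try:
        # Interpret simple literals such as 3 or 1.5.
        value = ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        value = raw  # fall back to the raw string
    keys = path.split(".")
    node = config
    for key in keys[:-1]:
        # Numeric path segments index into lists; others are dict keys.
        node = node[int(key)] if isinstance(node, list) else node[key]
    last = keys[-1]
    if isinstance(node, list):
        node[int(last)] = value
    else:
        node[last] = value


config = {
    "app": {
        "name": "my_job",
        "roles": [{"name": "trainer", "image": "alpine:latest", "num_replicas": 1}],
    }
}
apply_override(config, "app.roles.0.num_replicas=3")
print(config["app"]["roles"][0]["num_replicas"])
```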

TorchX macros are exposed as OmegaConf resolvers and can be referenced from configs: ${torchx.app_id:}, ${torchx.replica_id:}, ${torchx.rank0_env:}, ${torchx.img_root:}.
