Component Best Practices¶

Tip

Best practices for authoring reusable TorchX components: entrypoints, simplicity, named resources, composition, and testing.

Prerequisites: Custom Components.

These practices reflect conventions used in the builtin components. Deviate when your use case demands it.

Entrypoints¶

Prefer python -m <module> over a path to the main module. Module resolution works across environments (Docker, Slurm) regardless of directory structure.

For non-Python apps, place the binary on PATH instead.

from torchx.specs import AppDef, Role

def trainer(img_name: str, img_version: str) -> AppDef:
    return AppDef(name="trainer", roles=[
        Role(
            name="trainer",
            image=f"{img_name}:{img_version}",
            entrypoint="python",
            args=["-m", "your.app"],
        )
    ])

Simplify¶

Keep each component as simple as possible.

Argument Processing¶

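Pass image straight through to the Role in your AppDef without manipulation. The value may be a container image in one environment and a filesystem path in another, so parsing or rewriting it inside the component breaks portability.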

def trainer(image: str) -> AppDef:
    return AppDef(name="trainer", roles=[Role(name="trainer", image=image)])

Branching Logic¶

Avoid if statements in components. Create multiple components with shared private helpers instead.

def trainer_test() -> AppDef:
    return _trainer(num_replicas=1)

def trainer_prod() -> AppDef:
    return _trainer(num_replicas=10)

# not a component, just a shared helper
def _trainer(num_replicas: int) -> AppDef:
    return AppDef(
        name="trainer",
        roles=[Role(name="trainer", image="my_image:latest", num_replicas=num_replicas)],
    )

Documentation¶

Document component functions with docstrings; the TorchX CLI parses the docstring when building the component's CLI arguments. See Overview for examples.

Named Resources¶

Use named resources instead of hard-coding CPU and memory values:

from torchx.specs import resource

resource(h="aws_p3.2xlarge")

See Registering Named Resources for defining custom named resources.

Composing Components¶

Start from base component definitions rather than building AppDef from scratch:

Custom Components for simple single-node components.
torchx.components.dist.ddp() for distributed components.

You can also merge roles from multiple components to run sidecars alongside the main job.

Distributed Components¶

Use torchx.components.dist.ddp() for distributed training. Extend it by writing a wrapper that calls ddp with your configuration.

Define All Arguments¶

Define all arguments as function parameters instead of consuming a dictionary. This enables discoverability and static type checking.

Unit Tests¶

Component definitions are ordinary Python functions, so you can unit test them like any other Python code.

We also recommend using ComponentTestCase to verify that the TorchX CLI can parse your component. The CLI imposes stricter docstring formatting than plain Python because it parses the docstring to build the component's CLI arguments.

class torchx.components.component_test_base.ComponentTestCase(methodName='runTest')

run_component(component: Callable[[...], AppDef], args: dict[str, Any] | None = None, scheduler_params: dict[str, Any] | None = None, scheduler: str = 'local_cwd', interval: float = 0.1, timeout: float = 1) → torchx.specs.api.AppStatus | None

Helper that hides the complexity of setting up the runner and polling for results. Note: this method blocks until the scheduler exits or, for non-blocking schedulers, the timeout is reached.

Parameters:

component – component function, factory for AppDef
args – optional component factory arguments
scheduler_params – optional parameters for scheduler factory method
scheduler – scheduler name
interval – scheduler completion polling interval
timeout – max time for scheduler to complete

setUp() → None

Hook method for setting up the test fixture before exercising it.

tearDown() → None

Hook method for deconstructing the test fixture after testing it.

validate(module: module, function_name: str) → None

Validates the component by effectively running:

$ torchx run COMPONENT.py:FN --help

Integration Tests¶

Use the Runner API or CLI scripts. See the scheduler integration tests for examples.