Component Best Practices¶
Tip
Best practices for authoring reusable TorchX components: entrypoints, simplicity, named resources, composition, and testing.
Prerequisites: Custom Components.
These practices reflect conventions used in the builtin components. Deviate when your use case demands it.
Entrypoints¶
Prefer python -m <module> over a path to the main module. Module
resolution works across environments (Docker, Slurm) regardless of directory
structure.
For non-Python apps, place the binary on PATH instead.
from torchx.specs import AppDef, Role

def trainer(img_name: str, img_version: str) -> AppDef:
    return AppDef(name="trainer", roles=[
        Role(
            name="trainer",
            image=f"{img_name}:{img_version}",
            entrypoint="python",
            args=["-m", "your.app"],
        )
    ])
Simplify¶
Keep each component as simple as possible.
Argument Processing¶
Pass image directly to AppDef without manipulating it. Processing the value
inside the component breaks portability across environments.
def trainer(image: str) -> AppDef:
    return AppDef(name="trainer", roles=[Role(name="trainer", image=image)])
Branching Logic¶
Avoid if statements in components. Create multiple components with shared
private helpers instead.
def trainer_test() -> AppDef:
    return _trainer(num_replicas=1)

def trainer_prod() -> AppDef:
    return _trainer(num_replicas=10)

# Not a component, just a shared helper
def _trainer(num_replicas: int) -> AppDef:
    return AppDef(
        name="trainer",
        roles=[Role(name="trainer", image="my_image:latest", num_replicas=num_replicas)],
    )
Documentation¶
Document component functions. See Overview for examples.
Named Resources¶
Use named resources instead of hard-coding CPU and memory values:
from torchx.specs import resource
resource(h="aws_p3.2xlarge")
See Registering Named Resources for defining custom named resources.
Composing Components¶
Start from base component definitions rather than building AppDef from
scratch:
Custom Components for simple single-node components.
torchx.components.dist.ddp() for distributed components.
You can also merge roles from multiple components to run sidecars alongside the main job.
Distributed Components¶
Use torchx.components.dist.ddp() for distributed training. Extend it
by writing a wrapper that calls ddp with your configuration.
Define All Arguments¶
Define all arguments as function parameters instead of consuming a dictionary. This enables discoverability and static type checking.
Unit Tests¶
Components are ordinary Python functions, so you can unit test them as you would any other Python code.
We recommend using ComponentTestCase to verify that the TorchX CLI can parse
your component. The CLI requires stricter docstring formatting than plain
Python does because the docstring is used to parse CLI args.
- class torchx.components.component_test_base.ComponentTestCase(methodName='runTest')
- run_component(component: Callable[[...], AppDef], args: dict[str, Any] | None = None, scheduler_params: dict[str, Any] | None = None, scheduler: str = 'local_cwd', interval: float = 0.1, timeout: float = 1) → torchx.specs.api.AppStatus | None
Helper function that hides the complexity of setting up the runner and polling for results. Note: this method blocks until either the scheduler exits or the timeout is reached (for non-blocking schedulers).
- Parameters:
component – component function, a factory for AppDef
args – optional component factory arguments
scheduler_params – optional parameters for scheduler factory method
scheduler – scheduler name
interval – scheduler completion polling interval
timeout – max time for scheduler to complete
Integration Tests¶
Use the Runner API or CLI scripts. See the
scheduler integration tests
for examples.
See also
- Quick Reference
Single-page reference with imports, types, and copy-pasteable recipes.
- App Best Practices
Best practices for writing TorchX applications.
- Custom Components
Step-by-step guide for building and launching a custom component.
- Overview
Browse the builtin component library.