Slurm

class torchx.schedulers.slurm_scheduler.SlurmScheduler(session_name: str)

SlurmScheduler is a TorchX scheduling interface to Slurm. TorchX expects the Slurm CLI tools to be installed locally and job accounting to be enabled.

Each app def is scheduled as a heterogeneous job via sbatch. Each replica of each role gets its own generated shell script containing its resource allocations and args, and sbatch then launches all of them together.

Logs are written to the default slurm log file.

Any scheduler options passed to it are added as SBATCH arguments to each replica.
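As a rough illustration of the idea, the sketch below renders a per-replica shell script whose scheduler options appear as #SBATCH directives. It is not TorchX's implementation: render_replica_script, the replica name, and the option values are hypothetical, and TorchX may pass the options to sbatch differently (for example on the command line).

```python
import shlex
from typing import Dict, List, Optional


def render_replica_script(
    name: str,
    cmd: List[str],
    sbatch_opts: Dict[str, Optional[str]],
) -> str:
    """Render a shell script for one replica.

    Hypothetical helper for illustration only; not part of TorchX.
    """
    lines = ["#!/bin/bash", f"#SBATCH --job-name={name}"]
    for key, value in sbatch_opts.items():
        if value is None:
            # Flag-style option without a value, e.g. --exclusive
            lines.append(f"#SBATCH --{key}")
        else:
            lines.append(f"#SBATCH --{key}={value}")
    # Quote the command so arguments containing spaces survive the shell
    lines.append(shlex.join(cmd))
    return "\n".join(lines) + "\n"


# Example values are assumptions chosen for the sketch
script = render_replica_script(
    "echo-0",
    ["echo", "hello world"],
    {"partition": "compute", "time": "10"},
)
print(script)
```

Every replica would get a script like this one, so an option supplied once ends up in each replica's directives.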

For more info see:

$ torchx run --scheduler slurm utils.echo --msg hello
slurm://torchx_user/1234
$ torchx status slurm://torchx_user/1234
$ less slurm-1234.out
...
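The handle printed by torchx run in the session above has a URL-like shape, and the job ID at its end matches the number in the log file name. A minimal sketch of that relationship, using only the standard library, might look as follows; split_handle is a hypothetical helper, not a TorchX function, and the log file name simply follows the slurm-<job id>.out pattern shown in the example.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_handle(handle: str) -> Tuple[str, str, str]:
    """Split a handle such as slurm://torchx_user/1234 into its parts.

    Hypothetical helper for illustration only; not part of TorchX.
    """
    parsed = urlparse(handle)
    # scheme -> scheduler name, netloc -> session name, path -> job ID
    return parsed.scheme, parsed.netloc, parsed.path.lstrip("/")


scheduler, session, job_id = split_handle("slurm://torchx_user/1234")
# Log file name as used in the example session above
log_file = f"slurm-{job_id}.out"
print(scheduler, session, job_id, log_file)
```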
describe(app_id: str) -> Optional[torchx.schedulers.api.DescribeAppResponse]

Describes the specified application.

Returns

AppDef description or None if the app does not exist.

schedule(dryrun_info: torchx.specs.api.AppDryRunInfo[torchx.schedulers.slurm_scheduler.SlurmBatchRequest]) -> str

Same as submit except that it takes an AppDryRunInfo. Implementors are encouraged to implement this method rather than directly implementing submit since submit can be trivially implemented by:

dryrun_info = self.submit_dryrun(app, cfg)
return self.schedule(dryrun_info)
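The split between building a request and executing it can be sketched with a toy class. Only the method names submit, submit_dryrun, and schedule come from the text above; ToyScheduler, ToyDryRunInfo, and the fake job IDs are assumptions made for illustration, and nothing here calls Slurm or TorchX.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ToyDryRunInfo:
    """Hypothetical stand-in for AppDryRunInfo: holds the request to send."""

    request: List[str]


class ToyScheduler:
    """Toy scheduler showing submit built from submit_dryrun + schedule."""

    def __init__(self) -> None:
        self._next_id = 1234  # fake job ID counter, for the sketch only

    def submit_dryrun(self, app: str, cfg: Dict[str, str]) -> ToyDryRunInfo:
        # Build the request without side effects so it can be inspected
        args = [f"--{key}={value}" for key, value in cfg.items()]
        return ToyDryRunInfo(request=["sbatch", *args, app])

    def schedule(self, dryrun_info: ToyDryRunInfo) -> str:
        # A real scheduler would execute dryrun_info.request here;
        # this sketch only hands out a fake job ID
        app_id = str(self._next_id)
        self._next_id += 1
        return app_id

    def submit(self, app: str, cfg: Dict[str, str]) -> str:
        # submit is just the two steps chained together
        dryrun_info = self.submit_dryrun(app, cfg)
        return self.schedule(dryrun_info)


sched = ToyScheduler()
info = sched.submit_dryrun("job.sh", {"partition": "compute"})
print(info.request)
print(sched.submit("job.sh", {"partition": "compute"}))
```

Keeping the side effects in schedule means the dry-run output can be printed or tested without launching anything.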
