Quickstart
==============

Running a Builtin Component
-----------------------------

The easiest way to get started with TorchX is through the provided CLI.

.. code-block:: shell-session

    $ pip install torchx
    $ torchx --help

With TorchX you can BYO app, but TorchX also comes with a collection of
builtins. For now, let's take a look at those:

.. code-block:: shell-session

    $ torchx builtins
    Found builtin configs:
    ...
    i. utils.echo
    j. utils.touch
    ...

``echo`` looks familiar and simple. Let's understand how to run ``utils.echo``.

.. code-block:: shell-session

    $ torchx run --scheduler local utils.echo --help
    usage: torchx run echo [-h] [--msg MSG]

    Echos a message

    optional arguments:
      -h, --help  show this help message and exit
      --msg MSG   Message to echo

We can see that it takes a ``--msg`` argument. Let's try running it locally:

.. code-block:: shell-session

    $ torchx run --scheduler local utils.echo --msg "hello world"

.. note:: ``echo`` in this context is just an app spec. It is not the
          application logic itself but rather the "job definition" for
          running ``/bin/echo``.

If you haven't done so already, this is a good time to read through
:ref:`Basic Concepts` to familiarize yourself with the basic concepts.

Defining Your Own Component
----------------------------

Now let's try to implement ``echo`` ourselves. To make things more
interesting, we'll add two more parameters to our version of ``echo``:

1. Number of replicas to run in parallel
2. A prefix with the replica id on each message

First we create an app spec file. This is just a regular Python file where
we define the app spec.

.. code-block:: shell-session

    $ touch ~/test.py

Now copy-paste the following into ``test.py``:

::

    import torchx.specs as specs


    def echo(num_replicas: int, msg: str = "hello world") -> specs.AppDef:
        """
        Echos a message to stdout (calls /bin/echo)

        Args:
            num_replicas: number of copies (in parallel) to run
            msg: message to echo
        """
        return specs.AppDef(
            name="echo",
            roles=[
                specs.Role(
                    name="echo",
                    entrypoint="/bin/echo",
                    image="/tmp",
                    args=[f"replica #{specs.macros.replica_id}: {msg}"],
                    num_replicas=num_replicas,
                )
            ],
        )

Notice that:

1. Unlike ``--msg``, ``--num_replicas`` does not have a default value,
   indicating that it is a required argument.
2. We use a local directory (``/tmp``) as the ``image``. In practice this
   will be the identifier of the package (e.g. a Docker image) that the
   scheduler supports.
3. ``test.py`` does **not** contain the logic of the app and is simply
   a job definition.

Now let's try running our custom ``echo``:

.. code-block:: shell-session

    $ torchx run --scheduler local ~/test.py:echo --num_replicas 4 --msg "foobar"
    replica #0: foobar
    replica #1: foobar
    replica #2: foobar
    replica #3: foobar
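Since a component is just a regular Python function that returns a
``specs.AppDef``, you can also sanity-check it directly from a Python
session before handing it to any scheduler. The snippet below is a minimal
sketch, assuming it is run from your home directory so that ``test.py`` is
importable; nothing in it launches a job.

::

    # Sanity-check the component function directly.
    # Assumption: run from ~ so that ``test.py`` is on the import path.
    from test import echo

    app = echo(num_replicas=2, msg="hi")

    # AppDef is plain data, so we can inspect it before submitting anything.
    assert app.name == "echo"

    role = app.roles[0]
    assert role.num_replicas == 2
    assert role.entrypoint == "/bin/echo"

    # The replica id macro is still an unsubstituted placeholder at this
    # point; the scheduler substitutes it per replica at runtime.
    print(role.args)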
Running on Other Images
-----------------------------

So far we've been running ``echo`` with ``image=/tmp``. This means that the
``entrypoint`` we specified is relative to ``/tmp``. That did not matter for
us since we specified an absolute path as the entrypoint
(``entrypoint=/bin/echo``). Had we specified ``entrypoint=echo``, the local
scheduler would have tried to invoke ``/tmp/echo``.

If you have a pre-built application binary, setting the image to a local
directory is a quick way to validate the application and the
``specs.AppDef``. But it's not all that useful if you want to run the
application on a remote scheduler (see :ref:`Running On Other Schedulers`).

.. note:: The ``image`` string in ``specs.Role`` is an identifier for a
          container image supported by the scheduler. Refer to the scheduler
          documentation to find out what container images are supported by
          the scheduler you want to use.

For the ``local`` scheduler we can see that it supports both a local
directory and Docker as the image:

.. code-block:: shell-session

    $ torchx runopts local
    { 'image_type': { 'default': 'dir',
                      'help': 'image type. One of [dir, docker]',
                      'type': 'str'},
      ...
    ...

.. note:: Before proceeding, you will need Docker installed. If you have not
          done so already, follow the install instructions at
          https://docs.docker.com/get-docker/.

Now let's try running ``echo`` from a Docker container. Modify echo's
``AppDef`` in the ``~/test.py`` you created in the previous section so that
``image="ubuntu:latest"``:

::

    import torchx.specs as specs


    def echo(num_replicas: int, msg: str = "hello world") -> specs.AppDef:
        """
        Echos a message to stdout (calls /bin/echo)

        Args:
            num_replicas: number of copies (in parallel) to run
            msg: message to echo
        """
        return specs.AppDef(
            name="echo",
            roles=[
                specs.Role(
                    name="echo",
                    entrypoint="/bin/echo",
                    image="ubuntu:latest",  # IMAGE NOW POINTS TO THE UBUNTU DOCKER IMAGE
                    args=[f"replica #{specs.macros.replica_id}: {msg}"],
                    num_replicas=num_replicas,
                )
            ],
        )

Try running the echo app:

.. code-block:: shell-session

    $ torchx run --scheduler local \
        --scheduler_args image_type=docker \
        ~/test.py:echo \
        --num_replicas 4 \
        --msg "foobar from docker!"

Running On Other Schedulers
-----------------------------

So far we've launched components locally. Let's take a look at how to run
them on real schedulers.

.. note:: This section assumes you have already set up a running instance of
          one of the supported schedulers.

Let's take a look at which schedulers we can launch into and pick one that
you have already set up:

.. code-block:: shell-session

    $ torchx schedulers

For most schedulers you will have to specify an additional
``--scheduler_args`` parameter. These are launch-time parameters to the
scheduler that are associated with the run **instance** of your application
(the job) rather than with the job definition (app spec) of your
application, for example job ``priority``. Scheduler args are scheduler
specific, so you'll have to find out which scheduler args are required by
the scheduler you are planning to use:

.. code-block:: shell-session

    $ torchx runopts
    $ torchx runopts local

Now that you've figured out which scheduler args are required, launch your
app:

.. code-block:: shell-session

    $ torchx run --scheduler <scheduler> --scheduler_args <key=value,...> \
        ~/my_app.py

    $ torchx run --scheduler local --scheduler_args image_type=dir,log_dir=/tmp \
        ~/my_app.py --foo=bar

.. note:: If your app args overlap with the ``run`` subcommand's args, you
          have to use the ``--`` delimiter so that argparse does not get
          confused. For example, if your app also takes a ``--scheduler``
          argument, run it as shown below.

.. code-block:: shell-session

    $ torchx run --scheduler local ~/my_app.py -- --scheduler foobar

Next Steps
------------

1. Check out other features of the :ref:`torchx CLI`
2. Learn how to author more complex app specs by referencing :ref:`torchx.specs`
3. Browse through the collection of :ref:`builtin components`
4. Take a look at the :ref:`list of schedulers` supported by the runner
5. See which :ref:`ML pipeline platforms` you can run components on
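Finally, if you would rather launch from Python than from the shell, the
same component can be handed to TorchX's programmatic runner. The sketch
below is an illustration only: the ``torchx.runner.get_runner`` import path
and the ``run``/``status``/``wait`` signatures shown are assumptions that
have varied across TorchX releases, so verify them against the API
reference for your installed version.

::

    # Sketch only: the import path and run()/status()/wait() signatures are
    # assumptions; check the docs for your installed TorchX version.
    from torchx.runner import get_runner

    from test import echo  # the component from ~/test.py

    runner = get_runner()

    # ``cfg`` plays the role of --scheduler_args on the CLI.
    app_handle = runner.run(
        echo(num_replicas=2, msg="hello from python"),
        scheduler="local",
        cfg={"image_type": "dir", "log_dir": "/tmp"},
    )

    print(runner.status(app_handle))
    runner.wait(app_handle)  # block until all replicas exit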