Metrics
For metrics, we recommend using Tensorboard to log metrics directly to cloud storage alongside your model. As the model trains, you can launch a Tensorboard instance locally to monitor its progress:
$ tensorboard --logdir provider://path/to/logs
Or you can use the torchx.components.metrics.tensorboard() component as part of your pipeline.
Reference
PyTorch Tensorboard Tutorial https://pytorch.org/tutorials/intermediate/tensorboard_tutorial.html
PyTorch Lightning Loggers https://pytorch-lightning.readthedocs.io/en/stable/extensions/logging.html
- torchx.components.metrics.tensorboard(logdir: str, image: str = 'ghcr.io/pytorch/torchx:0.8.0dev0', timeout: float = 3600, port: int = 6006, start_on_file: str = '', exit_on_file: str = '') → AppDef
This component runs a Tensorboard server which will render the logs specified by logdir.
Since Tensorboard runs as a service, you need to specify its termination conditions: a timeout and, optionally, an exit_on_file path that causes the service to quit when that path is created. The start_on_file and exit_on_file paths are periodically polled for existence via fsspec, and the corresponding behavior is triggered once they are created.
- Parameters:
logdir – fsspec path to the Tensorboard logs
image – container image to use
timeout – maximum time to run before exiting (seconds)
port – port the Tensorboard server listens on
start_on_file – start the server when the fsspec path is created
exit_on_file – shut down the server when the fsspec path is created
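The start/exit/timeout behavior described above can be sketched in plain Python. This is a hypothetical, simplified sketch and not the torchx implementation: it checks local paths with os.path.exists instead of polling fsspec paths, the monitor function and its start_server/stop_server callbacks are illustrative names, and it assumes the timeout is counted from the moment monitoring begins.

```python
import os
import time
from typing import Callable


def monitor(
    start_server: Callable[[], None],
    stop_server: Callable[[], None],
    timeout: float,
    start_on_file: str = "",
    exit_on_file: str = "",
    poll_interval: float = 1.0,
) -> str:
    """Hypothetical sketch of the start_on_file / exit_on_file / timeout rules.

    Assumption: local paths checked with os.path.exists stand in for fsspec,
    and the timeout is counted from the moment monitoring begins.
    Returns the reason monitoring ended: "exit_on_file" or "timeout".
    """
    deadline = time.monotonic() + timeout

    # If a start marker is configured, poll until it exists before starting.
    while start_on_file and not os.path.exists(start_on_file):
        if time.monotonic() >= deadline:
            return "timeout"  # the server was never started
        time.sleep(poll_interval)

    start_server()
    try:
        # Poll for the exit marker until it appears or the deadline passes.
        while True:
            if exit_on_file and os.path.exists(exit_on_file):
                return "exit_on_file"
            if time.monotonic() >= deadline:
                return "timeout"
            time.sleep(poll_interval)
    finally:
        # Always shut the server down once monitoring ends.
        stop_server()
```

In this sketch, a training job signals that it has finished by creating the exit_on_file path (for a local path, for example pathlib.Path(exit_on_file).touch()), and the loop stops the server on its next poll.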