setup_torch_profiler
- torchtune.training.setup_torch_profiler(enabled: bool = False, cpu: bool = True, cuda: bool = True, profile_memory: bool = False, with_stack: bool = False, record_shapes: bool = True, with_flops: bool = False, wait_steps: Optional[int] = None, warmup_steps: Optional[int] = None, active_steps: Optional[int] = None, num_cycles: Optional[int] = None, output_dir: Optional[str] = None) -> Tuple[profile, DictConfig]
Sets up torch.profiler.profile and returns the profiler config with post-setup updates.

The profiler config can be provided in configs under the profiler key with the following layout:

    profiler:
      _component_: torchtune.training.setup_torch_profiler
      enabled: bool
      # Output directory of trace artifacts
      output_dir: str
      # torch.profiler.ProfilerActivity types to trace
      cpu: bool
      cuda: bool
      # Trace options
      profile_memory: bool
      with_stack: bool
      record_shapes: bool
      with_flops: bool
      # torch.profiler.schedule args
      wait_steps: int
      warmup_steps: int
      active_steps: int
      num_cycles: int

The profiler schedule updates with respect to an optimizer step (e.g., if gradient_accumulation = 2, then the profiler will step every 2 batches).

Sensible defaults will be chosen if the config is missing options:

- If no activities are specified, the profiler will default to CPU + CUDA.
- If no schedule is specified, the profiler will default to DEFAULT_SCHEDULE.
- Certain options will be overridden (with_stack and record_shapes) depending on requirements of other options (e.g., profile_memory requires with_stack and record_shapes).
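
The fallback and override rules above can be sketched in plain Python. This is only an illustration of the described behaviour, not torchtune's implementation: resolve_trace_options is a hypothetical helper, and its cpu and cuda arguments default to False here (unlike the real function) so that the "no activities specified" case is visible.

```python
from typing import Dict


def resolve_trace_options(
    cpu: bool = False,
    cuda: bool = False,
    profile_memory: bool = False,
    with_stack: bool = False,
    record_shapes: bool = True,
) -> Dict[str, bool]:
    """Hypothetical helper that mirrors the rules described in the text."""
    # No activity requested: fall back to tracing both CPU and CUDA.
    if not cpu and not cuda:
        cpu, cuda = True, True
    # Memory profiling requires stack traces and tensor shapes,
    # so both are forced on regardless of what was passed in.
    if profile_memory:
        with_stack = True
        record_shapes = True
    return {
        "cpu": cpu,
        "cuda": cuda,
        "profile_memory": profile_memory,
        "with_stack": with_stack,
        "record_shapes": record_shapes,
    }


# record_shapes=False is overridden because profile_memory=True needs it.
resolved = resolve_trace_options(profile_memory=True, record_shapes=False)
print(resolved)
```

Since options may be overridden in this way, the returned config, rather than the config originally passed in, reflects the settings actually in effect.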
Note

- Enabling the profiler will result in training speed reduction.
- Setting profile_memory: True will generate large trace files.
- The profiler schedule is context dependent. Calling profiler.step() at each batch iteration but outside the gradient accumulation scope will step the profiler each forward / backward step. Calling profiler.step() each batch iteration but within the gradient accumulation scope will step the profiler each optimizer update step such that each step contains multiple forward / backward passes.
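
To make the difference between the two placements of profiler.step() concrete, here is a minimal sketch that only counts calls. StepCounter and run_epoch are hypothetical stand-ins (no real profiler, model, or optimizer is involved), and the loop shape is an assumption about a typical gradient accumulation loop.

```python
class StepCounter:
    """Stand-in for a profiler: it only counts how often step() is called."""

    def __init__(self) -> None:
        self.steps = 0

    def step(self) -> None:
        self.steps += 1


def run_epoch(num_batches: int, grad_accum: int, step_per_batch: bool) -> int:
    """Return the number of profiler steps taken over one pass of the loop."""
    profiler = StepCounter()
    for idx in range(num_batches):
        # The forward / backward pass for batch idx would run here.
        if step_per_batch:
            # Outside the accumulation scope: one profiler step per batch.
            profiler.step()
        if (idx + 1) % grad_accum == 0:
            # The optimizer update would run here.
            if not step_per_batch:
                # Inside the accumulation scope: one step per optimizer update.
                profiler.step()
    return profiler.steps


per_batch = run_epoch(num_batches=8, grad_accum=2, step_per_batch=True)
per_update = run_epoch(num_batches=8, grad_accum=2, step_per_batch=False)
print(per_batch, per_update)  # prints "8 4"
```

With 8 batches and an accumulation factor of 2, the second placement yields half as many profiler steps, so the same wait, warmup, and active settings cover twice as many batches.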
- Parameters:
enabled (bool) – Enable the PyTorch profiler. Default is False.
cpu (bool) – Enable CPU profiling. Default is True.
cuda (bool) – Enable CUDA profiling. Default is True.
profile_memory (bool) – Profile memory usage. Default is False.
with_stack (bool) – Profile stack. Default is False.
record_shapes (bool) – Record shapes. Default is True.
with_flops (bool) – Profile flops. Default is False.
wait_steps (Optional[int]) – Wait time in steps. Maps to wait kwarg of torch.profiler.schedule.
warmup_steps (Optional[int]) – Warmup time in steps. Maps to warmup kwarg of torch.profiler.schedule.
active_steps (Optional[int]) – Active time in steps. Maps to active kwarg of torch.profiler.schedule.
num_cycles (Optional[int]) – Number of profiling cycles. Maps to repeat kwarg of torch.profiler.schedule.
output_dir (Optional[str]) – Tracing file output path.
- Returns:
Tuple[torch.profiler.profile, DictConfig]
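
As a rough guide to how wait_steps, warmup_steps, active_steps, and num_cycles interact once mapped to the wait, warmup, active, and repeat kwargs of torch.profiler.schedule, the sketch below models the phase of each profiler step in plain Python. It is a simplified model, not the PyTorch or torchtune code: schedule_phase is a hypothetical function, the skip_first option of torch.profiler.schedule is left out, and the values used are illustrative rather than those of DEFAULT_SCHEDULE.

```python
def schedule_phase(step: int, wait: int, warmup: int, active: int, repeat: int = 0) -> str:
    """Simplified model of the phase a wait/warmup/active schedule assigns to a step."""
    cycle_len = wait + warmup + active
    # After `repeat` full cycles the profiler stays idle (repeat=0 means no limit).
    if repeat > 0 and step // cycle_len >= repeat:
        return "none"
    pos = step % cycle_len
    if pos < wait:
        return "none"  # idle, nothing is recorded
    if pos < wait + warmup:
        return "warmup"  # tracing has started, results are discarded
    # The last active step of a cycle also saves the trace.
    return "record_and_save" if pos == cycle_len - 1 else "record"


# Illustrative values: wait_steps=1, warmup_steps=1, active_steps=2, num_cycles=1
phases = [schedule_phase(s, wait=1, warmup=1, active=2, repeat=1) for s in range(6)]
print(phases)
# prints ['none', 'warmup', 'record', 'record_and_save', 'none', 'none']
```

Under this model one cycle spans wait + warmup + active steps, so a run needs at least num_cycles times that many profiler steps for every cycle to complete.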