.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "generated_examples/decoding/performance_tips.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_generated_examples_decoding_performance_tips.py:


.. meta::
   :description: Learn how to optimize TorchCodec video decoding performance with batch APIs, approximate seeking, multi-threading, and CUDA acceleration.

==============================================
TorchCodec Performance Tips and Best Practices
==============================================

This tutorial consolidates performance optimization techniques for video
and audio decoding with TorchCodec. Learn when and how to apply each
strategy to improve decoding performance.

.. GENERATED FROM PYTHON SOURCE LINES 21-34

Overview
--------

When decoding videos with TorchCodec, several techniques can significantly
improve performance depending on your use case. This guide covers:

1. **Batch APIs** - Decode multiple frames at once
2. **Approximate Mode & Keyframe Mappings** - Speed up decoder creation
3. **Multi-threading** - Parallelize decoding across videos or chunks
4. **CUDA Acceleration** - Use GPU decoding for supported formats
5. **Decoder Native Transforms** - Apply transforms during decoding for memory efficiency
6. **WavDecoder** - Decode WAV files without going through FFmpeg

We'll explore each technique and when to use it.

.. GENERATED FROM PYTHON SOURCE LINES 36-58

1. Use Batch APIs When Possible
--------------------------------

If you need several frames, a single call to a batch method is faster than calling a single-frame method repeatedly.
For example, one call to :meth:`~torchcodec.decoders.VideoDecoder.get_frames_at` is faster than calling :meth:`~torchcodec.decoders.VideoDecoder.get_frame_at` once per index.
Batch calls reduce per-call overhead and let TorchCodec optimize across the whole request, for example by decoding the requested frames in order rather than seeking back and forth.

**Key Methods:**

For index-based frame retrieval:

- :meth:`~torchcodec.decoders.VideoDecoder.get_frames_at` for specific indices
- :meth:`~torchcodec.decoders.VideoDecoder.get_frames_in_range` for ranges

For timestamp-based frame retrieval:

- :meth:`~torchcodec.decoders.VideoDecoder.get_frames_played_at` for timestamps
- :meth:`~torchcodec.decoders.VideoDecoder.get_frames_played_in_range` for time ranges

**When to use:**

- Decoding multiple frames
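
The difference between the two calling styles can be sketched with a few small helpers.
They only assume that ``decoder`` is an already-created
:class:`~torchcodec.decoders.VideoDecoder` (any object with the same methods works,
which keeps the sketch testable without a video file); the helper names are
hypothetical and not part of TorchCodec.

.. code-block:: python

    from typing import Sequence


    def decode_one_by_one(decoder, indices: Sequence[int]) -> list:
        # One call per requested frame: the per-call overhead is paid
        # len(indices) times and each frame is located independently.
        return [decoder.get_frame_at(i) for i in indices]


    def decode_batched(decoder, indices: Sequence[int]):
        # A single call for all frames. With a VideoDecoder this returns one
        # FrameBatch whose ``data`` tensor stacks every requested frame.
        return decoder.get_frames_at(list(indices))


    def decode_batched_at_times(decoder, seconds: Sequence[float]):
        # Timestamp-based counterpart: one call for all requested times.
        return decoder.get_frames_played_at(list(seconds))

With TorchCodec installed, ``decode_batched(VideoDecoder("file.mp4"), [0, 10, 20])``
issues one decoder call, whereas ``decode_one_by_one`` with the same arguments issues three.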

.. GENERATED FROM PYTHON SOURCE LINES 60-64

.. note::

    For complete examples with runnable code demonstrating batch decoding,
    iteration, and frame retrieval, see :ref:`sphx_glr_generated_examples_decoding_basic_example.py`.

.. GENERATED FROM PYTHON SOURCE LINES 66-73

2. Approximate Mode & Keyframe Mappings
----------------------------------------

By default, TorchCodec uses ``seek_mode="exact"``, which performs a :term:`scan` when
you create the decoder to build an accurate internal index of frames. This
guarantees frame-accurate seeking but slows down decoder initialization,
especially for long videos.

.. GENERATED FROM PYTHON SOURCE LINES 75-88

**Approximate Mode**
~~~~~~~~~~~~~~~~~~~~

Setting ``seek_mode="approximate"`` skips the initial :term:`scan` and relies on the
video file's metadata headers. This dramatically speeds up
:class:`~torchcodec.decoders.VideoDecoder` creation, particularly for long
videos, but may result in slightly less accurate seeking in some cases.


**Which mode should you use:**

- If you need frame-exact seeking, use ``seek_mode="exact"``.
- If the video is long and you only decode a small number of frames, ``seek_mode="approximate"`` should be faster.
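
Because the benefit of approximate mode depends on the video, it is worth measuring on
your own files. A minimal sketch, assuming you pass in
:class:`~torchcodec.decoders.VideoDecoder` (or any class with the same constructor);
``best_creation_time`` is a hypothetical helper, not a TorchCodec API.

.. code-block:: python

    import time


    def best_creation_time(decoder_cls, source, seek_mode: str, repeats: int = 3) -> float:
        # Construct the decoder `repeats` times and keep the fastest wall-clock
        # time, which reduces noise from file-system caching and other processes.
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            decoder_cls(source, seek_mode=seek_mode)
            best = min(best, time.perf_counter() - start)
        return best


    # Example usage (requires TorchCodec and a local "file.mp4"):
    #   from torchcodec.decoders import VideoDecoder
    #   exact_s = best_creation_time(VideoDecoder, "file.mp4", "exact")
    #   approx_s = best_creation_time(VideoDecoder, "file.mp4", "approximate")
    #   print(f"exact: {exact_s:.3f}s, approximate: {approx_s:.3f}s")

This only measures decoder construction; if you go on to decode most of the video,
the initial scan becomes a smaller share of the total time.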

.. GENERATED FROM PYTHON SOURCE LINES 90-103

**Custom Frame Mappings**
~~~~~~~~~~~~~~~~~~~~~~~~~

For advanced use cases, you can pre-compute a custom mapping between desired
frame indices and actual keyframe locations. This speeds up :class:`~torchcodec.decoders.VideoDecoder`
instantiation while keeping the frame-seeking accuracy of ``seek_mode="exact"``.

**When to use:**

- Frame accuracy is critical, so you cannot use approximate mode
- You can preprocess videos once and then decode them many times

**Performance impact:** speeds up decoder instantiation, similarly to ``seek_mode="approximate"``.
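
The linked custom frame mappings example builds the mapping from ``ffprobe`` output.
The sketch below assumes ``ffprobe`` is on your ``PATH`` and that the per-frame ``pts``,
``duration`` and ``key_frame`` fields are what the decoder expects. The helper names
are hypothetical, and the exact JSON format and the ``custom_frame_mappings`` usage in
the comments are assumptions to check against the linked example.

.. code-block:: python

    import json
    import subprocess
    from typing import List


    def ffprobe_frame_mappings_cmd(video_path: str) -> List[str]:
        # Per-frame pts, duration and keyframe flag of the first video stream, as JSON.
        return [
            "ffprobe", "-v", "error", "-i", video_path,
            "-select_streams", "v:0",
            "-show_frames",
            "-show_entries", "frame=pts,duration,key_frame",
            "-of", "json",
        ]


    def precompute_frame_mappings(video_path: str, out_path: str) -> None:
        # Run ffprobe once (this is the slow part) and cache the JSON on disk.
        result = subprocess.run(
            ffprobe_frame_mappings_cmd(video_path),
            check=True, capture_output=True, text=True,
        )
        with open(out_path, "w") as f:
            f.write(result.stdout)


    def count_keyframes(mappings_json: str) -> int:
        # Quick sanity check on the cached file: how many keyframes were found.
        frames = json.loads(mappings_json)["frames"]
        return sum(int(frame["key_frame"]) for frame in frames)


    # Once per video, during preprocessing:
    #   precompute_frame_mappings("file.mp4", "file_mappings.json")
    # Then at decoding time (parameter usage assumed, see the linked example):
    #   with open("file_mappings.json") as f:
    #       decoder = VideoDecoder("file.mp4", custom_frame_mappings=f)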

.. GENERATED FROM PYTHON SOURCE LINES 105-110

.. note::

    For complete benchmarks showing actual speedup numbers, accuracy comparisons,
    and implementation examples, see :ref:`sphx_glr_generated_examples_decoding_approximate_mode.py`
    and :ref:`sphx_glr_generated_examples_decoding_custom_frame_mappings.py`.

.. GENERATED FROM PYTHON SOURCE LINES 112-122

3. Multi-threading for Parallel Decoding
-----------------------------------------

When decoding multiple videos or decoding a large number of frames from a single video, there are a few parallelization strategies to speed up the decoding process:

- **FFmpeg-based parallelism** - Using FFmpeg's internal threading to parallelize decoding within a single decoder instance. Enable it with the ``num_ffmpeg_threads`` parameter of :class:`~torchcodec.decoders.VideoDecoder`.
- **Multiprocessing** - Distributing work across multiple processes
- **Multithreading** - Using multiple threads within a single process

You can use both multiprocessing and multithreading to decode multiple videos in parallel, or to decode a single long video in parallel by splitting it into chunks.
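
Splitting a single long video into chunks can be sketched with the standard library's
``concurrent.futures.ThreadPoolExecutor``. This sketch assumes each worker should open
its own decoder rather than share one across threads, and decodes each chunk with
:meth:`~torchcodec.decoders.VideoDecoder.get_frames_in_range`;
``split_into_chunks`` and ``decode_chunks_threaded`` are hypothetical helpers.

.. code-block:: python

    from concurrent.futures import ThreadPoolExecutor
    from typing import Callable, List, Tuple


    def split_into_chunks(num_frames: int, num_chunks: int) -> List[Tuple[int, int]]:
        # Split [0, num_frames) into up to num_chunks contiguous (start, stop)
        # ranges whose sizes differ by at most one frame.
        base, extra = divmod(num_frames, num_chunks)
        ranges, start = [], 0
        for i in range(num_chunks):
            stop = start + base + (1 if i < extra else 0)
            if stop > start:  # skip empty chunks when num_frames < num_chunks
                ranges.append((start, stop))
            start = stop
        return ranges


    def decode_chunks_threaded(make_decoder: Callable, num_frames: int,
                               num_workers: int = 4) -> list:
        # Each task opens its own decoder instead of sharing one across threads.
        def decode_range(bounds: Tuple[int, int]):
            start, stop = bounds
            return make_decoder().get_frames_in_range(start, stop)

        chunks = split_into_chunks(num_frames, num_workers)
        with ThreadPoolExecutor(max_workers=num_workers) as pool:
            # pool.map preserves chunk order, so results come back in frame order.
            return list(pool.map(decode_range, chunks))


    # Example usage (requires TorchCodec):
    #   from torchcodec.decoders import VideoDecoder
    #   num_frames = len(VideoDecoder("file.mp4"))
    #   batches = decode_chunks_threaded(lambda: VideoDecoder("file.mp4"), num_frames)

Since every chunk creates its own decoder, the initialization cost discussed in
section 2 is paid once per chunk, so approximate mode or custom frame mappings can
matter more in this setting.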

.. GENERATED FROM PYTHON SOURCE LINES 124-129

.. note::

    For complete examples comparing sequential, FFmpeg-based, multi-process,
    and multi-threaded approaches, see
    :ref:`sphx_glr_generated_examples_decoding_parallel_decoding.py`.

.. GENERATED FROM PYTHON SOURCE LINES 131-158

4. CUDA Acceleration
--------------------

TorchCodec supports GPU-accelerated decoding using NVIDIA's hardware decoder
(NVDEC) on supported hardware. This keeps decoded tensors in GPU memory,
avoiding expensive CPU-GPU transfers for downstream GPU operations.

Pass ``device="cuda"`` to enable CUDA decoding:

.. code-block:: python

    from torchcodec.decoders import VideoDecoder
    decoder = VideoDecoder("file.mp4", device="cuda")

**When to use:**

- Decoding high-resolution videos
- Decoding large batches of videos that would otherwise saturate the CPU

**When NOT to use:**

- You need results that are bit-exact with CPU decoding
- Low-resolution videos, where PCIe transfer latency is large relative to decoding time
- The GPU is already busy while the CPU is idle

**Performance impact:** CUDA decoding can significantly outperform CPU decoding,
especially for high-resolution videos and when decoding a lot of frames.
Actual speedup varies by hardware, resolution, and codec.
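
A common pattern is to request CUDA only when it is available, so the same script
still runs on CPU-only machines. A minimal sketch; ``pick_decoder_device`` is a
hypothetical helper, and ``torch.cuda.is_available()`` only tells you that PyTorch
sees a GPU, not that your TorchCodec build supports CUDA decoding.

.. code-block:: python

    from typing import Optional


    def pick_decoder_device(prefer_cuda: bool = True,
                            cuda_available: Optional[bool] = None) -> str:
        # Fall back to "cpu" when CUDA was not requested or is not available.
        if not prefer_cuda:
            return "cpu"
        if cuda_available is None:
            import torch  # only needed when availability is not passed in

            cuda_available = torch.cuda.is_available()
        return "cuda" if cuda_available else "cpu"


    # Example usage (requires TorchCodec):
    #   from torchcodec.decoders import VideoDecoder
    #   decoder = VideoDecoder("file.mp4", device=pick_decoder_device())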

.. GENERATED FROM PYTHON SOURCE LINES 160-181

**Checking for CPU Fallback**
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In some cases, CUDA decoding may silently fall back to CPU decoding when the
video codec or format is not supported by NVDEC. You can detect this using
the :attr:`~torchcodec.decoders.VideoDecoder.cpu_fallback` attribute:

.. code-block:: python

    from torchcodec.decoders import VideoDecoder

    decoder = VideoDecoder("file.mp4", device="cuda")

    # Print detailed fallback status
    print(decoder.cpu_fallback)

.. note::

    Fallback status is determined upfront, so you can check
    ``decoder.cpu_fallback`` immediately after creating the decoder.

    For installation instructions, detailed examples, and visual comparisons
    between CPU and CUDA decoding, see :ref:`sphx_glr_generated_examples_decoding_basic_cuda_example.py`.

.. GENERATED FROM PYTHON SOURCE LINES 183-215

5. Decoder Native Transforms
----------------------------

TorchCodec supports applying transforms like resize and crop *during* the
decoding process itself, rather than as a separate post-processing step.
This can lead to significant memory savings, especially when decoding
high-resolution videos that will be resized to smaller dimensions.

:class:`~torchcodec.decoders.VideoDecoder` accepts both TorchCodec
:class:`~torchcodec.transforms.DecoderTransform` objects and TorchVision
:class:`~torchvision.transforms.v2.Transform` objects as transform
specifications. TorchVision is **not required** to use decoder transforms.

**Example:**

.. code-block:: python

    from torchcodec.decoders import VideoDecoder
    from torchcodec.transforms import Resize

    decoder = VideoDecoder(
        "file.mp4",
        transforms=[Resize(size=(480, 640))]
    )

**When to use:**

- If you are applying a transform pipeline that significantly reduces the
  dimensions of your input frames and memory efficiency matters.
- If you are using multiple FFmpeg threads, decoder transforms may be faster.
  Experiment with your setup to verify.
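
To get a feel for the memory involved, the size of the returned frames can be computed
directly. This sketch assumes TorchCodec's default uint8 RGB output (3 bytes per pixel)
and counts only the output tensors, not FFmpeg's internal buffers; ``frames_nbytes``
is a hypothetical helper.

.. code-block:: python

    def frames_nbytes(num_frames: int, height: int, width: int, channels: int = 3) -> int:
        # Bytes needed for a uint8 batch of shape (num_frames, channels, height, width).
        return num_frames * channels * height * width


    full = frames_nbytes(100, 1080, 1920)   # 622_080_000 bytes (~593 MiB)
    resized = frames_nbytes(100, 480, 640)  # 92_160_000 bytes (~88 MiB)
    print(f"{full / resized:.2f}x smaller output")  # 6.75x smaller output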


.. GENERATED FROM PYTHON SOURCE LINES 217-222

.. note::

    For complete examples with memory benchmarks, transform pipelines, and
    detailed comparisons between decoder transforms and TorchVision transforms,
    see :ref:`sphx_glr_generated_examples_decoding_transforms.py`.

.. GENERATED FROM PYTHON SOURCE LINES 224-250

6. WavDecoder for WAV Files
---------------------------

If you are decoding WAV files and don't need resampling (``sample_rate``
parameter) or channel remixing (``num_channels`` parameter), consider using
:class:`~torchcodec.decoders.WavDecoder` instead of
:class:`~torchcodec.decoders.AudioDecoder`.
:class:`~torchcodec.decoders.WavDecoder` bypasses FFmpeg's demuxer and
decoder and reads WAV data directly, resulting in significantly faster
decoding.

:class:`~torchcodec.decoders.WavDecoder` has the same
:meth:`~torchcodec.decoders.WavDecoder.get_all_samples` and
:meth:`~torchcodec.decoders.WavDecoder.get_samples_played_in_range` methods
as :class:`~torchcodec.decoders.AudioDecoder`, so switching between them is
straightforward.

**When to use:**

- Decoding WAV files without resampling or channel remixing
- Latency-sensitive applications

**When NOT to use:**

- Non-WAV audio formats (mp3, flac, etc.)
- You need resampling or channel remixing
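
The choice between the two decoders can be made programmatically. A minimal sketch,
assuming that a ``.wav`` extension reliably identifies WAV files in your dataset and
that :class:`~torchcodec.decoders.WavDecoder` accepts the file path as its first
argument; both helper names are hypothetical.

.. code-block:: python

    from pathlib import Path
    from typing import Optional


    def can_use_wav_decoder(path: str, sample_rate: Optional[int] = None,
                            num_channels: Optional[int] = None) -> bool:
        # The fast path applies only to WAV files that need neither
        # resampling nor channel remixing.
        return (
            Path(path).suffix.lower() == ".wav"
            and sample_rate is None
            and num_channels is None
        )


    def open_audio_decoder(path: str, sample_rate: Optional[int] = None,
                           num_channels: Optional[int] = None):
        # Imported lazily so can_use_wav_decoder works without TorchCodec installed.
        from torchcodec.decoders import AudioDecoder, WavDecoder

        if can_use_wav_decoder(path, sample_rate, num_channels):
            return WavDecoder(path)
        return AudioDecoder(path, sample_rate=sample_rate, num_channels=num_channels)

Since both classes expose ``get_all_samples`` and ``get_samples_played_in_range``,
downstream code can use whichever decoder ``open_audio_decoder`` returns.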

.. GENERATED FROM PYTHON SOURCE LINES 252-275

Conclusion
----------

TorchCodec offers multiple performance optimization strategies, each suited to
different scenarios. Use batch APIs for multi-frame decoding, approximate mode
for faster initialization, parallel processing for high throughput, CUDA
acceleration to offload the CPU, decoder native transforms for memory
efficiency, and :class:`~torchcodec.decoders.WavDecoder` for fast WAV
decoding.

The best results often come from combining techniques. Profile your specific
use case and apply optimizations incrementally, using the benchmarks in the
linked examples as a guide.

For more information, see:

- :ref:`sphx_glr_generated_examples_decoding_basic_example.py` - Basic decoding examples
- :ref:`sphx_glr_generated_examples_decoding_approximate_mode.py` - Approximate mode benchmarks
- :ref:`sphx_glr_generated_examples_decoding_custom_frame_mappings.py` - Custom frame mappings
- :ref:`sphx_glr_generated_examples_decoding_parallel_decoding.py` - Parallel decoding strategies
- :ref:`sphx_glr_generated_examples_decoding_basic_cuda_example.py` - CUDA acceleration guide
- :ref:`sphx_glr_generated_examples_decoding_transforms.py` - Decoder transforms guide
- :class:`torchcodec.decoders.VideoDecoder` - Full API reference


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.000 seconds)


.. _sphx_glr_download_generated_examples_decoding_performance_tips.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: performance_tips.ipynb <performance_tips.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: performance_tips.py <performance_tips.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: performance_tips.zip <performance_tips.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_