.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "generated_examples/decoding/performance_tips.py" .. LINE NUMBERS ARE GIVEN BELOW. .. rst-class:: sphx-glr-example-title .. _sphx_glr_generated_examples_decoding_performance_tips.py: .. meta:: :description: Learn how to optimize TorchCodec video decoding performance with batch APIs, approximate seeking, multi-threading, and CUDA acceleration. ============================================== TorchCodec Performance Tips and Best Practices ============================================== This tutorial consolidates performance optimization techniques for video decoding with TorchCodec. Learn when and how to apply various strategies to increase performance. .. GENERATED FROM PYTHON SOURCE LINES 22-34 Overview -------- When decoding videos with TorchCodec, several techniques can significantly improve performance depending on your use case. This guide covers: 1. **Batch APIs** - Decode multiple frames at once 2. **Approximate Mode & Keyframe Mappings** - Trade accuracy for speed 3. **Multi-threading** - Parallelize decoding across videos or chunks 4. **CUDA Acceleration** - Use GPU decoding for supported formats We'll explore each technique and when to use it. .. GENERATED FROM PYTHON SOURCE LINES 36-58 1. Use Batch APIs When Possible -------------------------------- If you need to decode multiple frames at once, the batch methods are faster than calling single-frame decoding methods multiple times. For example, :meth:`~torchcodec.decoders.VideoDecoder.get_frames_at` is faster than calling :meth:`~torchcodec.decoders.VideoDecoder.get_frame_at` multiple times. TorchCodec's batch APIs reduce overhead and can leverage internal optimizations. **Key Methods:** For index-based frame retrieval: - :meth:`~torchcodec.decoders.VideoDecoder.get_frames_at` for specific indices - :meth:`~torchcodec.decoders.VideoDecoder.get_frames_in_range` for ranges For timestamp-based frame retrieval: - :meth:`~torchcodec.decoders.VideoDecoder.get_frames_played_at` for timestamps - :meth:`~torchcodec.decoders.VideoDecoder.get_frames_played_in_range` for time ranges **When to use:** - Decoding multiple frames .. GENERATED FROM PYTHON SOURCE LINES 60-64 .. note:: For complete examples with runnable code demonstrating batch decoding, iteration, and frame retrieval, see :ref:`sphx_glr_generated_examples_decoding_basic_example.py` .. GENERATED FROM PYTHON SOURCE LINES 66-73 2. Approximate Mode & Keyframe Mappings ---------------------------------------- By default, TorchCodec uses ``seek_mode="exact"``, which performs a :term:`scan` when you create the decoder to build an accurate internal index of frames. This ensures frame-accurate seeking but takes longer for decoder initialization, especially on long videos. .. GENERATED FROM PYTHON SOURCE LINES 75-88 **Approximate Mode** ~~~~~~~~~~~~~~~~~~~~ Setting ``seek_mode="approximate"`` skips the initial :term:`scan` and relies on the video file's metadata headers. This dramatically speeds up :class:`~torchcodec.decoders.VideoDecoder` creation, particularly for long videos, but may result in slightly less accurate seeking in some cases. **Which mode should you use:** - If you care about exactness of frame seeking, use “exact”. - If the video is long and you're only decoding a small amount of frames, approximate mode should be faster. .. GENERATED FROM PYTHON SOURCE LINES 90-103 **Custom Frame Mappings** ~~~~~~~~~~~~~~~~~~~~~~~~~ For advanced use cases, you can pre-compute a custom mapping between desired frame indices and actual keyframe locations. This allows you to speed up :class:`~torchcodec.decoders.VideoDecoder` instantiation while maintaining the frame seeking accuracy of ``seek_mode="exact"`` **When to use:** - Frame accuracy is critical, so you cannot use approximate mode - You can preprocess videos once and then decode them many times **Performance impact:** speeds up decoder instantiation, similarly to ``seek_mode="approximate"``. .. GENERATED FROM PYTHON SOURCE LINES 105-110 .. note:: For complete benchmarks showing actual speedup numbers, accuracy comparisons, and implementation examples, see :ref:`sphx_glr_generated_examples_decoding_approximate_mode.py` and :ref:`sphx_glr_generated_examples_decoding_custom_frame_mappings.py` .. GENERATED FROM PYTHON SOURCE LINES 112-122 3. Multi-threading for Parallel Decoding ----------------------------------------- When decoding multiple videos or decoding a large number of frames from a single video, there are a few parallelization strategies to speed up the decoding process: - **FFmpeg-based parallelism** - Using FFmpeg's internal threading capabilities for intra-frame parallelism, where parallelization happens within individual frames rather than across frames. For that, use the `num_ffmpeg_threads` parameter of the :class:`~torchcodec.decoders.VideoDecoder` - **Multiprocessing** - Distributing work across multiple processes - **Multithreading** - Using multiple threads within a single process You can use both multiprocessing and multithreading to decode multiple videos in parallel, or to decode a single long video in parallel by splitting it into chunks. .. GENERATED FROM PYTHON SOURCE LINES 124-129 .. note:: For complete examples comparing sequential, ffmpeg-based parallelism, multi-process, and multi-threaded approaches, see :ref:`sphx_glr_generated_examples_decoding_parallel_decoding.py` .. GENERATED FROM PYTHON SOURCE LINES 131-162 4. CUDA Acceleration -------------------- TorchCodec supports GPU-accelerated decoding using NVIDIA's hardware decoder (NVDEC) on supported hardware. This keeps decoded tensors in GPU memory, avoiding expensive CPU-GPU transfers for downstream GPU operations. **Recommended: use the Beta Interface!!** ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ We recommend you use the new "beta" CUDA interface which is significantly faster than the previous one, and supports the same features: .. code-block:: python with set_cuda_backend("beta"): decoder = VideoDecoder("file.mp4", device="cuda") **When to use:** - Decoding large resolution videos - Large batch of videos saturating the CPU **When NOT to use:** - You need bit-exact results with CPU decoding - Small resolution videos and the PCI-e transfer latency is large - GPU is already busy and CPU is idle **Performance impact:** CUDA decoding can significantly outperform CPU decoding, especially for high-resolution videos and when decoding a lot of frames. Actual speedup varies by hardware, resolution, and codec. .. GENERATED FROM PYTHON SOURCE LINES 164-189 **Checking for CPU Fallback** ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In some cases, CUDA decoding may silently fall back to CPU decoding when the video codec or format is not supported by NVDEC. You can detect this using the :attr:`~torchcodec.decoders.VideoDecoder.cpu_fallback` attribute: .. code-block:: python with set_cuda_backend("beta"): decoder = VideoDecoder("file.mp4", device="cuda") # Print detailed fallback status print(decoder.cpu_fallback) .. note:: The timing of when you can detect CPU fallback differs between backends: with the **FFmpeg backend**, you can only check fallback status after decoding at least one frame, because FFmpeg determines codec support lazily during decoding; with the **BETA backend**, you can check fallback status immediately after decoder creation, as the backend checks codec support upfront. For installation instructions, detailed examples, and visual comparisons between CPU and CUDA decoding, see :ref:`sphx_glr_generated_examples_decoding_basic_cuda_example.py` .. GENERATED FROM PYTHON SOURCE LINES 191-211 Conclusion ---------- TorchCodec offers multiple performance optimization strategies, each suited to different scenarios. Use batch APIs for multi-frame decoding, approximate mode for faster initialization, parallel processing for high throughput, and CUDA acceleration to offload the CPU. The best results often come from combining techniques. Profile your specific use case and apply optimizations incrementally, using the benchmarks in the linked examples as a guide. For more information, see: - :ref:`sphx_glr_generated_examples_decoding_basic_example.py` - Basic decoding examples - :ref:`sphx_glr_generated_examples_decoding_approximate_mode.py` - Approximate mode benchmarks - :ref:`sphx_glr_generated_examples_decoding_custom_frame_mappings.py` - Custom frame mappings - :ref:`sphx_glr_generated_examples_decoding_parallel_decoding.py` - Parallel decoding strategies - :ref:`sphx_glr_generated_examples_decoding_basic_cuda_example.py` - CUDA acceleration guide - :class:`torchcodec.decoders.VideoDecoder` - Full API reference .. rst-class:: sphx-glr-timing **Total running time of the script:** (0 minutes 0.000 seconds) .. _sphx_glr_download_generated_examples_decoding_performance_tips.py: .. only:: html .. container:: sphx-glr-footer sphx-glr-footer-example .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: performance_tips.ipynb ` .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: performance_tips.py ` .. container:: sphx-glr-download sphx-glr-download-zip :download:`Download zipped: performance_tips.zip ` .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_