.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "generated_examples/decoding/performance_tips.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_generated_examples_decoding_performance_tips.py:

.. meta::
   :description: Learn how to optimize TorchCodec video decoding performance with batch APIs, approximate seeking, multi-threading, and CUDA acceleration.

==============================================
TorchCodec Performance Tips and Best Practices
==============================================

This tutorial consolidates performance optimization techniques for video
decoding with TorchCodec. Learn when and how to apply various strategies
to increase performance.

.. GENERATED FROM PYTHON SOURCE LINES 21-34

Overview
--------

When decoding videos with TorchCodec, several techniques can significantly
improve performance depending on your use case. This guide covers:

1. **Batch APIs** - Decode multiple frames at once
2. **Approximate Mode & Keyframe Mappings** - Trade accuracy for speed
3. **Multi-threading** - Parallelize decoding across videos or chunks
4. **CUDA Acceleration** - Use GPU decoding for supported formats
5. **Decoder Native Transforms** - Apply transforms during decoding for memory efficiency
6. **WavDecoder** - Decode WAV files directly, bypassing FFmpeg

We'll explore each technique and when to use it.

.. GENERATED FROM PYTHON SOURCE LINES 36-58

1. Use Batch APIs When Possible
--------------------------------

If you need to decode multiple frames at once, the batch methods are faster than
calling single-frame decoding methods multiple times. For example,
:meth:`~torchcodec.decoders.VideoDecoder.get_frames_at` is faster than calling
:meth:`~torchcodec.decoders.VideoDecoder.get_frame_at` multiple times.
TorchCodec's batch APIs reduce overhead and can leverage internal optimizations.

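
As a rough illustration of the batch pattern, the sketch below samples evenly
spaced frames with a single ``get_frames_at`` call instead of a loop over
``get_frame_at``. The helper names ``evenly_spaced_indices`` and
``decode_evenly_spaced`` are hypothetical, not part of TorchCodec, and
``torchcodec`` is imported lazily so the index helper can be tried on its own.

```python
def evenly_spaced_indices(num_frames: int, num_samples: int) -> list[int]:
    """Return up to ``num_samples`` evenly spaced frame indices in [0, num_frames)."""
    if num_frames <= 0 or num_samples <= 0:
        return []
    num_samples = min(num_samples, num_frames)
    # Integer arithmetic keeps every index strictly below num_frames.
    return [(i * num_frames) // num_samples for i in range(num_samples)]


def decode_evenly_spaced(path: str, num_samples: int):
    """Decode evenly spaced frames with one batch call (hypothetical helper)."""
    # Imported lazily so the index helper above works without torchcodec.
    from torchcodec.decoders import VideoDecoder

    decoder = VideoDecoder(path)
    indices = evenly_spaced_indices(len(decoder), num_samples)
    # One batch call instead of len(indices) separate get_frame_at calls.
    return decoder.get_frames_at(indices=indices)
```

For instance, ``decode_evenly_spaced("file.mp4", 8)`` would return one batch
holding the eight sampled frames, assuming ``file.mp4`` exists and contains at
least eight frames.
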
**Key Methods:**

For index-based frame retrieval:

- :meth:`~torchcodec.decoders.VideoDecoder.get_frames_at` for specific indices
- :meth:`~torchcodec.decoders.VideoDecoder.get_frames_in_range` for ranges

For timestamp-based frame retrieval:

- :meth:`~torchcodec.decoders.VideoDecoder.get_frames_played_at` for timestamps
- :meth:`~torchcodec.decoders.VideoDecoder.get_frames_played_in_range` for time ranges

**When to use:**

- Decoding multiple frames

.. GENERATED FROM PYTHON SOURCE LINES 60-64

.. note::

    For complete examples with runnable code demonstrating batch decoding,
    iteration, and frame retrieval, see
    :ref:`sphx_glr_generated_examples_decoding_basic_example.py`

.. GENERATED FROM PYTHON SOURCE LINES 66-73

2. Approximate Mode & Keyframe Mappings
----------------------------------------

By default, TorchCodec uses ``seek_mode="exact"``, which performs a :term:`scan` when
you create the decoder to build an accurate internal index of frames. This
ensures frame-accurate seeking but takes longer for decoder initialization,
especially on long videos.

.. GENERATED FROM PYTHON SOURCE LINES 75-88

**Approximate Mode**
~~~~~~~~~~~~~~~~~~~~

Setting ``seek_mode="approximate"`` skips the initial :term:`scan` and relies on the
video file's metadata headers. This dramatically speeds up
:class:`~torchcodec.decoders.VideoDecoder` creation, particularly for long
videos, but may result in slightly less accurate seeking in some cases.

**Which mode should you use:**

- If you care about exactness of frame seeking, use ``"exact"``.
- If the video is long and you're only decoding a small number of frames,
  approximate mode should be faster.

.. GENERATED FROM PYTHON SOURCE LINES 90-103

**Custom Frame Mappings**
~~~~~~~~~~~~~~~~~~~~~~~~~

For advanced use cases, you can pre-compute a custom mapping between desired
frame indices and actual keyframe locations.
This allows you to speed up :class:`~torchcodec.decoders.VideoDecoder`
instantiation while maintaining the frame seeking accuracy of
``seek_mode="exact"``.

**When to use:**

- Frame accuracy is critical, so you cannot use approximate mode
- You can preprocess videos once and then decode them many times

**Performance impact:** speeds up decoder instantiation, similarly to ``seek_mode="approximate"``.

.. GENERATED FROM PYTHON SOURCE LINES 105-110

.. note::

    For complete benchmarks showing actual speedup numbers, accuracy comparisons,
    and implementation examples, see :ref:`sphx_glr_generated_examples_decoding_approximate_mode.py`
    and :ref:`sphx_glr_generated_examples_decoding_custom_frame_mappings.py`

.. GENERATED FROM PYTHON SOURCE LINES 112-122

3. Multi-threading for Parallel Decoding
-----------------------------------------

When decoding multiple videos or decoding a large number of frames from a single
video, there are a few parallelization strategies to speed up the decoding process:

- **FFmpeg-based parallelism** - Using FFmpeg's internal threading capabilities for
  intra-frame parallelism, where parallelization happens within individual frames
  rather than across frames. For that, use the ``num_ffmpeg_threads`` parameter of
  the :class:`~torchcodec.decoders.VideoDecoder`.
- **Multiprocessing** - Distributing work across multiple processes
- **Multithreading** - Using multiple threads within a single process

You can use both multiprocessing and multithreading to decode multiple videos in
parallel, or to decode a single long video in parallel by splitting it into chunks.

.. GENERATED FROM PYTHON SOURCE LINES 124-129

.. note::

    For complete examples comparing sequential, ffmpeg-based parallelism,
    multi-process, and multi-threaded approaches, see
    :ref:`sphx_glr_generated_examples_decoding_parallel_decoding.py`

.. GENERATED FROM PYTHON SOURCE LINES 131-158

4. CUDA Acceleration
--------------------

TorchCodec supports GPU-accelerated decoding using NVIDIA's hardware decoder
(NVDEC) on supported hardware. This keeps decoded tensors in GPU memory,
avoiding expensive CPU-GPU transfers for downstream GPU operations.

Pass ``device="cuda"`` to enable CUDA decoding:

.. code-block:: python

    from torchcodec.decoders import VideoDecoder

    decoder = VideoDecoder("file.mp4", device="cuda")

**When to use:**

- Decoding high-resolution videos
- Large batches of videos that saturate the CPU

**When NOT to use:**

- You need results that are bit-exact with CPU decoding
- The videos are low-resolution and the PCIe transfer latency is significant
- The GPU is already busy and the CPU is idle

**Performance impact:** CUDA decoding can significantly outperform CPU decoding,
especially for high-resolution videos and when decoding a lot of frames.
Actual speedup varies by hardware, resolution, and codec.

.. GENERATED FROM PYTHON SOURCE LINES 160-181

**Checking for CPU Fallback**
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In some cases, CUDA decoding may silently fall back to CPU decoding when the
video codec or format is not supported by NVDEC. You can detect this using
the :attr:`~torchcodec.decoders.VideoDecoder.cpu_fallback` attribute:

.. code-block:: python

    from torchcodec.decoders import VideoDecoder

    decoder = VideoDecoder("file.mp4", device="cuda")

    # Print detailed fallback status
    print(decoder.cpu_fallback)

.. note::

    Fallback status is determined upfront, so you can check
    ``decoder.cpu_fallback`` immediately after creating the decoder.

For installation instructions, detailed examples, and visual comparisons
between CPU and CUDA decoding, see
:ref:`sphx_glr_generated_examples_decoding_basic_cuda_example.py`

.. GENERATED FROM PYTHON SOURCE LINES 183-215

5. Decoder Native Transforms
----------------------------

TorchCodec supports applying transforms like resize and crop *during* the
decoding process itself, rather than as a separate post-processing step.
This can lead to significant memory savings, especially when decoding
high-resolution videos that will be resized to smaller dimensions.

:class:`~torchcodec.decoders.VideoDecoder` accepts both TorchCodec
:class:`~torchcodec.transforms.DecoderTransform` objects and TorchVision
:class:`~torchvision.transforms.v2.Transform` objects as transform
specifications. TorchVision is **not required** to use decoder transforms.

**Example:**

.. code-block:: python

    from torchcodec.decoders import VideoDecoder
    from torchcodec.transforms import Resize

    decoder = VideoDecoder(
        "file.mp4",
        transforms=[Resize(size=(480, 640))]
    )

**When to use:**

- If you are applying a transform pipeline that significantly reduces
  the dimensions of your input frames and memory efficiency matters.
- If you are using multiple FFmpeg threads, decoder transforms may be faster.
  Experiment with your setup to verify.

.. GENERATED FROM PYTHON SOURCE LINES 217-222

.. note::

    For complete examples with memory benchmarks, transform pipelines, and
    detailed comparisons between decoder transforms and TorchVision transforms,
    see :ref:`sphx_glr_generated_examples_decoding_transforms.py`

.. GENERATED FROM PYTHON SOURCE LINES 224-250

6. WavDecoder for WAV Files
---------------------------

If you are decoding WAV files and don't need resampling (``sample_rate``
parameter) or channel remixing (``num_channels`` parameter), consider using
:class:`~torchcodec.decoders.WavDecoder` instead of
:class:`~torchcodec.decoders.AudioDecoder`.
:class:`~torchcodec.decoders.WavDecoder` bypasses FFmpeg's demuxer and decoder
and reads WAV data directly, resulting in significantly faster decoding.

:class:`~torchcodec.decoders.WavDecoder` has the same
:meth:`~torchcodec.decoders.WavDecoder.get_all_samples` and
:meth:`~torchcodec.decoders.WavDecoder.get_samples_played_in_range` methods as
:class:`~torchcodec.decoders.AudioDecoder`, so switching between them is
straightforward.

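
One way to apply this rule is sketched below: a small chooser that prefers
``WavDecoder`` only for ``.wav`` paths when neither ``sample_rate`` nor
``num_channels`` is requested. The function names are hypothetical, the
extension check is a simplification, and the sketch assumes that
``WavDecoder`` accepts a file path as its first argument, as ``AudioDecoder``
does. Check the API reference before relying on that assumption.

```python
from pathlib import Path
from typing import Optional


def pick_audio_decoder_name(
    path: str,
    sample_rate: Optional[int] = None,
    num_channels: Optional[int] = None,
) -> str:
    """Return the name of the decoder class suggested by the guidance above."""
    is_wav = Path(path).suffix.lower() == ".wav"
    needs_conversion = sample_rate is not None or num_channels is not None
    return "WavDecoder" if is_wav and not needs_conversion else "AudioDecoder"


def open_audio_decoder(path, sample_rate=None, num_channels=None):
    """Build the chosen decoder (hypothetical helper)."""
    # Imported lazily so the chooser above works without torchcodec installed.
    from torchcodec import decoders

    if pick_audio_decoder_name(path, sample_rate, num_channels) == "WavDecoder":
        # Assumption: WavDecoder takes the source path as its first argument.
        return decoders.WavDecoder(path)
    return decoders.AudioDecoder(
        path, sample_rate=sample_rate, num_channels=num_channels
    )
```

Because both classes expose ``get_all_samples``, calling code such as
``open_audio_decoder("speech.wav").get_all_samples()`` should not need to know
which decoder was chosen.
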
**When to use:**

- Decoding WAV files without resampling or channel remixing
- Latency-sensitive applications

**When NOT to use:**

- Non-WAV audio formats (mp3, flac, etc.)
- You need resampling or channel remixing

.. GENERATED FROM PYTHON SOURCE LINES 252-275

Conclusion
----------

TorchCodec offers multiple performance optimization strategies, each suited to
different scenarios. Use batch APIs for multi-frame decoding, approximate mode
for faster initialization, parallel processing for high throughput, CUDA
acceleration to offload the CPU, decoder native transforms for memory
efficiency, and :class:`~torchcodec.decoders.WavDecoder` for fast WAV decoding.

The best results often come from combining techniques. Profile your specific
use case and apply optimizations incrementally, using the benchmarks in the
linked examples as a guide.

For more information, see:

- :ref:`sphx_glr_generated_examples_decoding_basic_example.py` - Basic decoding examples
- :ref:`sphx_glr_generated_examples_decoding_approximate_mode.py` - Approximate mode benchmarks
- :ref:`sphx_glr_generated_examples_decoding_custom_frame_mappings.py` - Custom frame mappings
- :ref:`sphx_glr_generated_examples_decoding_parallel_decoding.py` - Parallel decoding strategies
- :ref:`sphx_glr_generated_examples_decoding_basic_cuda_example.py` - CUDA acceleration guide
- :ref:`sphx_glr_generated_examples_decoding_transforms.py` - Decoder transforms guide
- :class:`torchcodec.decoders.VideoDecoder` - Full API reference

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.000 seconds)


.. _sphx_glr_download_generated_examples_decoding_performance_tips.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: performance_tips.ipynb `

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: performance_tips.py `

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: performance_tips.zip `


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_