
TorchCodec Performance Tips and Best Practices

This tutorial consolidates performance optimization techniques for video decoding with TorchCodec. Learn when and how to apply various strategies to increase performance.

Overview

When decoding videos with TorchCodec, several techniques can significantly improve performance depending on your use case. This guide covers:

  1. Batch APIs - Decode multiple frames at once

  2. Approximate Mode & Keyframe Mappings - Trade accuracy for speed

  3. Multi-threading - Parallelize decoding across videos or chunks

  4. CUDA Acceleration - Use GPU decoding for supported formats

We’ll explore each technique and when to use it.

1. Use Batch APIs When Possible

If you need to decode multiple frames at once, the batch methods are faster than calling single-frame decoding methods multiple times. For example, get_frames_at() is faster than calling get_frame_at() multiple times. TorchCodec’s batch APIs reduce overhead and can leverage internal optimizations.

Key Methods:

For index-based frame retrieval: get_frames_at() and get_frames_in_range()

For timestamp-based frame retrieval: get_frames_played_at() and get_frames_played_in_range()

When to use:

  • Decoding multiple frames
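
A minimal sketch of the batch pattern, using a placeholder file path:

from torchcodec.decoders import VideoDecoder

decoder = VideoDecoder("video.mp4")  # placeholder path

# One batch call returns a FrameBatch covering all requested indices
frames = decoder.get_frames_at(indices=[0, 50, 100, 150])
print(frames.data.shape)  # e.g. (4, C, H, W)

# Slower equivalent: one decode call per frame
frames_slow = [decoder.get_frame_at(i) for i in (0, 50, 100, 150)]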

Note

For complete examples with runnable code demonstrating batch decoding, iteration, and frame retrieval, see Decoding a video with VideoDecoder.

2. Approximate Mode & Keyframe Mappings

By default, TorchCodec uses seek_mode="exact", which performs a scan when you create the decoder to build an accurate internal index of frames. This ensures frame-accurate seeking but takes longer for decoder initialization, especially on long videos.

Approximate Mode

Setting seek_mode="approximate" skips the initial scan and relies on the video file’s metadata headers. This dramatically speeds up VideoDecoder creation, particularly for long videos, but may result in slightly less accurate seeking in some cases.

Which mode should you use?

  • If frame-accurate seeking matters to you, use seek_mode="exact".

  • If the video is long and you only need a small number of frames, seek_mode="approximate" is usually faster.
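
For example (a minimal sketch; the file path is a placeholder):

from torchcodec.decoders import VideoDecoder

# Skips the initial scan and relies on container metadata
fast_decoder = VideoDecoder("long_video.mp4", seek_mode="approximate")

# Default: scans the file at construction time for frame-accurate seeking
exact_decoder = VideoDecoder("long_video.mp4", seek_mode="exact")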

Custom Frame Mappings

For advanced use cases, you can pre-compute a custom mapping between desired frame indices and actual keyframe locations. This allows you to speed up VideoDecoder instantiation while maintaining the frame seeking accuracy of seek_mode="exact".

When to use:

  • Frame accuracy is critical, so you cannot use approximate mode

  • You can preprocess videos once and then decode them many times

Performance impact: speeds up decoder instantiation, similarly to seek_mode="approximate".
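
The sketch below is only an illustration under assumptions: it assumes the custom_frame_mappings parameter accepts ffprobe-style per-frame metadata, as in the linked "Decoding with custom frame mappings" example. Check that example for the exact parameter name and format expected by your TorchCodec version.

import subprocess
from torchcodec.decoders import VideoDecoder

# Assumption: custom_frame_mappings accepts ffprobe-derived frame metadata
# (key_frame, pts, duration per frame); verify against the linked example.
ffprobe_output = subprocess.run(
    [
        "ffprobe", "-v", "quiet", "-select_streams", "v:0",
        "-show_entries", "frame=key_frame,pts,duration",
        "-of", "json", "video.mp4",  # placeholder path
    ],
    capture_output=True, text=True, check=True,
).stdout

# Compute the mapping once, cache it, and reuse it across many instantiations
decoder = VideoDecoder("video.mp4", custom_frame_mappings=ffprobe_output)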

Note

For complete benchmarks showing actual speedup numbers, accuracy comparisons, and implementation examples, see Exact vs Approximate seek mode: Performance and accuracy comparison and Decoding with custom frame mappings.

3. Multi-threading for Parallel Decoding

When decoding multiple videos or decoding a large number of frames from a single video, there are a few parallelization strategies to speed up the decoding process:

  • FFmpeg-based parallelism - Using FFmpeg’s internal threading capabilities for intra-frame parallelism, where parallelization happens within individual frames rather than across frames. For this, use the num_ffmpeg_threads parameter of VideoDecoder.

  • Multiprocessing - Distributing work across multiple processes

  • Multithreading - Using multiple threads within a single process

You can use both multiprocessing and multithreading to decode multiple videos in parallel, or to decode a single long video in parallel by splitting it into chunks.
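
A sketch of the multi-threaded approach for several files (paths are placeholders):

from concurrent.futures import ThreadPoolExecutor
from torchcodec.decoders import VideoDecoder

paths = ["a.mp4", "b.mp4", "c.mp4"]  # placeholder paths

def decode_some_frames(path):
    # Limit FFmpeg's own threading to avoid oversubscription, since we
    # already parallelize across videos at the Python level.
    decoder = VideoDecoder(path, num_ffmpeg_threads=1)
    return decoder.get_frames_at(indices=[0, 10, 20])

with ThreadPoolExecutor(max_workers=3) as pool:
    batches = list(pool.map(decode_some_frames, paths))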

Note

For complete examples comparing sequential, FFmpeg-based parallelism, multi-process, and multi-threaded approaches, see Parallel video decoding: multi-processing and multi-threading.

4. CUDA Acceleration

TorchCodec supports GPU-accelerated decoding using NVIDIA’s hardware decoder (NVDEC) on supported hardware. This keeps decoded tensors in GPU memory, avoiding expensive CPU-GPU transfers for downstream GPU operations.

Checking for CPU Fallback

In some cases, CUDA decoding may silently fall back to CPU decoding when the video codec or format is not supported by NVDEC. You can detect this using the cpu_fallback attribute:

from torchcodec.decoders import VideoDecoder, set_cuda_backend  # import paths may vary across versions

with set_cuda_backend("beta"):
    decoder = VideoDecoder("file.mp4", device="cuda")

# Print detailed fallback status (the BETA backend reports it right away)
print(decoder.cpu_fallback)

Note

The timing of when you can detect CPU fallback differs between backends. With the FFmpeg backend, you can only check the fallback status after decoding at least one frame, because FFmpeg determines codec support lazily during decoding. With the BETA backend, you can check the fallback status immediately after decoder creation, because that backend checks codec support upfront.
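
For example, with the FFmpeg backend (assumed to be the default when set_cuda_backend is not used; the path is a placeholder):

from torchcodec.decoders import VideoDecoder

decoder = VideoDecoder("file.mp4", device="cuda")

# Decode at least one frame so FFmpeg resolves codec support...
decoder.get_frame_at(0)

# ...then the fallback status is meaningful
print(decoder.cpu_fallback)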

For installation instructions, detailed examples, and visual comparisons between CPU and CUDA decoding, see Accelerated video decoding on GPUs with CUDA and NVDEC.

Conclusion

TorchCodec offers multiple performance optimization strategies, each suited to different scenarios. Use batch APIs for multi-frame decoding, approximate mode for faster initialization, parallel processing for high throughput, and CUDA acceleration to offload decoding to the GPU and keep frames in GPU memory.

The best results often come from combining techniques. Profile your specific use case and apply optimizations incrementally, using the benchmarks in the linked examples as a guide.

For more information, see the tutorials linked in each section above.
