Decoding with custom frame mappings

In this example, we will describe the custom_frame_mappings parameter of the VideoDecoder class. This parameter allows you to provide pre-computed frame mapping information to speed up VideoDecoder instantiation, while maintaining the frame seeking accuracy of seek_mode="exact".

This makes it ideal for workflows where:

Frame accuracy is critical, so approximate mode cannot be used

Videos can be preprocessed once and then decoded many times

First, some boilerplate: we'll download a short video from the web and use ffmpeg to create a longer version by repeating it multiple times. We'll end up with two videos: a short one of approximately 14 seconds and a long one of about 12 minutes. You can ignore this part and skip ahead to the section "Creating custom frame mappings with ffprobe".

import tempfile
from pathlib import Path
import subprocess
import requests

# Video source: https://www.pexels.com/video/dog-eating-854132/
# License: CC0. Author: Coverr.
url = "https://videos.pexels.com/video-files/854132/854132-sd_640_360_25fps.mp4"
response = requests.get(url, headers={"User-Agent": ""})
if response.status_code != 200:
    raise RuntimeError(f"Failed to download video. {response.status_code = }.")

temp_dir = tempfile.mkdtemp()
short_video_path = Path(temp_dir) / "short_video.mp4"
with open(short_video_path, 'wb') as f:
    for chunk in response.iter_content(chunk_size=8192):  # the default chunk_size is 1 byte
        f.write(chunk)

long_video_path = Path(temp_dir) / "long_video.mp4"
ffmpeg_command = [
    "ffmpeg",
    "-stream_loop", "50",  # loop the input 50 extra times (51 passes) to get a ~12 min video
    "-i", f"{short_video_path}",
    "-c", "copy",
    f"{long_video_path}"
]
subprocess.run(ffmpeg_command, check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

from torchcodec.decoders import VideoDecoder
print(f"Short video duration: {VideoDecoder(short_video_path).metadata.duration_seconds} seconds")
print(f"Long video duration: {VideoDecoder(long_video_path).metadata.duration_seconds / 60} minutes")

Short video duration: 13.8 seconds
Long video duration: 11.729999999999999 minutes
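The reported duration of the long video follows from how -stream_loop counts: the value is the number of extra loops after the first pass, so 50 yields 51 copies of the 13.8-second clip. A quick check of that arithmetic (the variable names are illustrative only):

```python
# Duration of the short clip in seconds, as reported above.
short_duration_s = 13.8
# "-stream_loop 50" loops the input 50 extra times: 1 original pass + 50 repeats.
num_passes = 1 + 50

long_duration_min = short_duration_s * num_passes / 60
print(f"Expected long video duration: {long_duration_min:.2f} minutes")
```

This agrees with the 11.73 minutes measured by VideoDecoder.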

Creating custom frame mappings with ffprobe

To generate JSON files containing the required video metadata, we recommend using ffprobe. The following per-frame metadata fields are required (older versions of FFmpeg use the pkt_-prefixed names):

pts / pkt_pts: Presentation timestamp of each frame
duration / pkt_duration: Duration of each frame
key_frame: Flag (1 or 0) indicating whether the frame is a key frame
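If you want to inspect a mappings file yourself, the two naming schemes can be handled with a small helper. The sketch below is only an illustration: normalize_frame is a hypothetical function, not part of TorchCodec, and the sample JSON is hand-written to mimic the layout of ffprobe's "-of json" output rather than copied from a real run.

```python
import json


def normalize_frame(frame):
    """Return pts, duration and key_frame from one ffprobe frame entry.

    Hypothetical helper: falls back to the pkt_-prefixed field names
    that older FFmpeg versions emit.
    """
    pts = frame.get("pts", frame.get("pkt_pts"))
    duration = frame.get("duration", frame.get("pkt_duration"))
    if pts is None or duration is None or "key_frame" not in frame:
        raise KeyError(f"Frame entry is missing required fields: {frame}")
    return {
        "pts": int(pts),
        "duration": int(duration),
        "key_frame": bool(int(frame["key_frame"])),
    }


# Hand-written sample (an assumption, not real ffprobe output):
# the second entry uses the older pkt_ prefix.
sample = json.loads(
    '{"frames": ['
    '{"key_frame": 1, "pts": 0, "duration": 1}, '
    '{"key_frame": 0, "pkt_pts": 1, "pkt_duration": 1}'
    ']}'
)
normalized = [normalize_frame(f) for f in sample["frames"]]
print(normalized)
```

A helper like this is only useful for sanity-checking a file before handing it to the decoder.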

from pathlib import Path
import subprocess
import tempfile
from time import perf_counter_ns
import json


# Let's define a simple function that runs ffprobe on the given stream of a video, then writes the results to output_json_path.
def generate_frame_mappings(video_path, output_json_path, stream_index):
    ffprobe_cmd = ["ffprobe", "-i", f"{video_path}", "-select_streams", f"{stream_index}", "-show_frames", "-show_entries", "frame=pts,duration,key_frame", "-of", "json"]
    print(f"Running ffprobe:\n{' '.join(ffprobe_cmd)}\n")
    ffprobe_result = subprocess.run(ffprobe_cmd, check=True, capture_output=True, text=True)
    with open(output_json_path, "w") as f:
        f.write(ffprobe_result.stdout)


stream_index = 0
long_json_path = Path(temp_dir) / "long_custom_frame_mappings.json"
short_json_path = Path(temp_dir) / "short_custom_frame_mappings.json"

generate_frame_mappings(long_video_path, long_json_path, stream_index)
generate_frame_mappings(short_video_path, short_json_path, stream_index)
with open(short_json_path) as f:
    sample_data = json.loads(f.read())
print("Sample of fields in custom frame mappings:")
for frame in sample_data["frames"][:3]:
    print(f"{frame['key_frame'] = }, {frame['pts'] = }, {frame['duration'] = }")

Running ffprobe:
ffprobe -i /tmp/tmpx4qs2oi7/long_video.mp4 -select_streams 0 -show_frames -show_entries frame=pts,duration,key_frame -of json

Running ffprobe:
ffprobe -i /tmp/tmpx4qs2oi7/short_video.mp4 -select_streams 0 -show_frames -show_entries frame=pts,duration,key_frame -of json

Sample of fields in custom frame mappings:
frame['key_frame'] = 1, frame['pts'] = 0, frame['duration'] = 1
frame['key_frame'] = 0, frame['pts'] = 1, frame['duration'] = 1
frame['key_frame'] = 0, frame['pts'] = 2, frame['duration'] = 1

Performance: VideoDecoder creation

Custom frame mappings speed up the creation of a VideoDecoder object. The performance gain over exact mode grows as video length or resolution increases.

import torch


# Here, we define a benchmarking function. When file_like is True, the custom_frame_mappings
# file object is rewound after each call so that it can be read again.
def bench(f, file_like=False, average_over=50, warmup=2, **f_kwargs):
    for _ in range(warmup):
        f(**f_kwargs)
        if file_like:
            f_kwargs["custom_frame_mappings"].seek(0)

    times = []
    for _ in range(average_over):
        start = perf_counter_ns()
        f(**f_kwargs)
        end = perf_counter_ns()
        times.append(end - start)
        if file_like:
            f_kwargs["custom_frame_mappings"].seek(0)

    times = torch.tensor(times) * 1e-6  # ns to ms
    std = times.std().item()
    med = times.median().item()
    print(f"{med = :.2f}ms +- {std:.2f}")


for video_path, json_path in ((short_video_path, short_json_path), (long_video_path, long_json_path)):
    print(f"\nRunning benchmarks on {Path(video_path).name}")

    print("Creating a VideoDecoder object with custom_frame_mappings:")
    with open(json_path, "r") as f:
        bench(VideoDecoder, file_like=True, source=video_path, stream_index=stream_index, custom_frame_mappings=f)

    # Compare against exact seek_mode
    print("Creating a VideoDecoder object with seek_mode='exact':")
    bench(VideoDecoder, source=video_path, stream_index=stream_index, seek_mode="exact")

Running benchmarks on short_video.mp4
Creating a VideoDecoder object with custom_frame_mappings:
med = 7.67ms +- 0.02
Creating a VideoDecoder object with seek_mode='exact':
med = 8.00ms +- 0.03

Running benchmarks on long_video.mp4
Creating a VideoDecoder object with custom_frame_mappings:
med = 33.59ms +- 0.31
Creating a VideoDecoder object with seek_mode='exact':
med = 59.50ms +- 0.73

Performance: Frame decoding with custom frame mappings

Although custom_frame_mappings only affects how quickly a VideoDecoder is initialized, every decoding workflow starts by creating a VideoDecoder instance, so the savings carry over to end-to-end decoding time.

def decode_frames(video_path, seek_mode="exact", custom_frame_mappings=None):
    decoder = VideoDecoder(
        source=video_path,
        seek_mode=seek_mode,
        custom_frame_mappings=custom_frame_mappings
    )
    decoder.get_frames_in_range(start=0, stop=10)


for video_path, json_path in ((short_video_path, short_json_path), (long_video_path, long_json_path)):
    print(f"\nRunning benchmarks on {Path(video_path).name}")
    print("Decoding frames with custom_frame_mappings:")
    with open(json_path, "r") as f:
        bench(decode_frames, file_like=True, video_path=video_path, custom_frame_mappings=f)

    print("Decoding frames with seek_mode='exact':")
    bench(decode_frames, video_path=video_path, seek_mode="exact")

Running benchmarks on short_video.mp4
Decoding frames with custom_frame_mappings:
med = 23.32ms +- 0.04
Decoding frames with seek_mode='exact':
med = 23.63ms +- 0.04

Running benchmarks on long_video.mp4
Decoding frames with custom_frame_mappings:
med = 49.27ms +- 0.12
Decoding frames with seek_mode='exact':
med = 75.91ms +- 0.47
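To put the numbers above in perspective, the following sketch computes the speedup and the absolute time saved from the reported medians. The values are copied from the benchmark outputs on this page; the dictionary layout and names are illustrative.

```python
# Median timings in milliseconds, copied from the benchmark outputs above.
medians_ms = {
    ("short", "creation"): {"custom": 7.67, "exact": 8.00},
    ("long", "creation"): {"custom": 33.59, "exact": 59.50},
    ("short", "decoding"): {"custom": 23.32, "exact": 23.63},
    ("long", "decoding"): {"custom": 49.27, "exact": 75.91},
}

# Speedup is the ratio of medians; saved time is their difference.
speedups = {key: t["exact"] / t["custom"] for key, t in medians_ms.items()}
saved_ms = {key: t["exact"] - t["custom"] for key, t in medians_ms.items()}

for (video, task), speedup in speedups.items():
    print(f"{video} video, {task}: {speedup:.2f}x faster, {saved_ms[(video, task)]:.2f} ms saved")
```

For the long video, the absolute saving is about 26 ms in both benchmarks, which is consistent with the gain coming entirely from decoder creation.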

Accuracy: Metadata and frame retrieval

In addition to speeding up instantiation compared to seek_mode="exact", custom frame mappings retain the benefit of exact metadata and exact frame seeking.

print("Metadata of short video with custom_frame_mappings:")
with open(short_json_path, "r") as f:
    print(VideoDecoder(short_video_path, custom_frame_mappings=f).metadata)
print("Metadata of short video with seek_mode='exact':")
print(VideoDecoder(short_video_path, seek_mode="exact").metadata)

with open(short_json_path, "r") as f:
    custom_frame_mappings_decoder = VideoDecoder(short_video_path, custom_frame_mappings=f)
exact_decoder = VideoDecoder(short_video_path, seek_mode="exact")
for i in range(len(exact_decoder)):
    torch.testing.assert_close(
        exact_decoder.get_frame_at(i).data,
        custom_frame_mappings_decoder.get_frame_at(i).data,
        atol=0, rtol=0,
    )
print("Frame seeking is the same for this video!")

Metadata of short video with custom_frame_mappings:
VideoStreamMetadata:
  duration_seconds_from_header: 13.8
  begin_stream_seconds_from_header: 0.0
  bit_rate: 505790.0
  codec: h264
  stream_index: 0
  begin_stream_seconds_from_content: 0.0
  end_stream_seconds_from_content: 13.8
  width: 640
  height: 360
  num_frames_from_header: 345
  num_frames_from_content: 345
  average_fps_from_header: 25.0
  pixel_aspect_ratio: 1
  duration_seconds: 13.8
  begin_stream_seconds: 0.0
  end_stream_seconds: 13.8
  num_frames: 345
  average_fps: 25.0

Metadata of short video with seek_mode='exact':
VideoStreamMetadata:
  duration_seconds_from_header: 13.8
  begin_stream_seconds_from_header: 0.0
  bit_rate: 505790.0
  codec: h264
  stream_index: 0
  begin_stream_seconds_from_content: 0.0
  end_stream_seconds_from_content: 13.8
  width: 640
  height: 360
  num_frames_from_header: 345
  num_frames_from_content: 345
  average_fps_from_header: 25.0
  pixel_aspect_ratio: 1
  duration_seconds: 13.8
  begin_stream_seconds: 0.0
  end_stream_seconds: 13.8
  num_frames: 345
  average_fps: 25.0

Frame seeking is the same for this video!

How do custom_frame_mappings help?

Custom frame mappings contain the same frame index information that would normally be computed during the scan operation in exact mode. Providing this information to the VideoDecoder as JSON eliminates the need for the expensive scan while preserving the accuracy benefits.
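To see why per-frame pts and key_frame values are enough for exact seeking, consider the minimal sketch below. It is not TorchCodec's actual implementation: build_index and key_frame_for are hypothetical helpers, and the frame list is synthetic. The idea is that a decoder can only start decoding at a key frame, so locating frame i means finding the closest key frame at or before i.

```python
from bisect import bisect_right


def build_index(frames):
    """Sort frames by pts and record which positions are key frames."""
    ordered = sorted(frames, key=lambda f: f["pts"])
    all_pts = [f["pts"] for f in ordered]
    key_frame_indices = [i for i, f in enumerate(ordered) if f["key_frame"]]
    return all_pts, key_frame_indices


def key_frame_for(frame_index, key_frame_indices):
    """Return the index of the closest key frame at or before frame_index."""
    pos = bisect_right(key_frame_indices, frame_index) - 1
    if pos < 0:
        raise ValueError("No key frame at or before the requested frame")
    return key_frame_indices[pos]


# Synthetic 10-frame stream with a key frame every 4 frames (an assumption).
frames = [
    {"pts": i, "duration": 1, "key_frame": 1 if i % 4 == 0 else 0}
    for i in range(10)
]
all_pts, key_frame_indices = build_index(frames)

target = 6
start = key_frame_for(target, key_frame_indices)
print(f"To reach frame {target} (pts={all_pts[target]}), start decoding at key frame {start}")
```

With the mappings supplied up front, a lookup table like this can be built without scanning the video file.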

Which mode should I use?

For the fastest decoding, when speed matters more than exact seeking accuracy, use “approximate” mode.
For exact frame seeking in workflows where the same videos are decoded repeatedly and some preprocessing can be done up front, use custom frame mappings.
For exact frame seeking without preprocessing, use “exact” mode.
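The guidance above can be written as a small decision rule. The function below is a hypothetical helper, not a TorchCodec API, and the returned strings are labels for the three options rather than values guaranteed to be accepted by any particular parameter.

```python
def recommend_mode(needs_exact_seeking: bool, can_preprocess: bool) -> str:
    """Map the two questions from the guidance above to a recommended option."""
    if not needs_exact_seeking:
        # Speed matters more than exact seeking accuracy.
        return "approximate"
    if can_preprocess:
        # Exact seeking, and the frame mappings can be computed once up front.
        return "custom_frame_mappings"
    # Exact seeking with no preprocessing step available.
    return "exact"


print(recommend_mode(needs_exact_seeking=True, can_preprocess=True))
```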

Total running time of the script: (0 minutes 31.497 seconds)
