VideoEncoder

class torchcodec.encoders.VideoEncoder(frames: Tensor, *, frame_rate: float)

A single-stream video encoder on CPU or CUDA.

Note

This is a convenience class for simple, one-shot video encoding. For multi-stream encoding (e.g. video + audio), incremental encoding, or mixing CPU and CUDA streams, use Encoder instead. See Encoding video with the Encoder for a tutorial.

Parameters:

frames (torch.Tensor) – The frames to encode. This must be a 4D tensor of shape (N, C, H, W) where N is the number of frames, C is 3 channels (RGB), H is height, and W is width. Values must be uint8 in the range [0, 255]. The tensor can be on CPU or CUDA. The device of the tensor determines which encoder is used (CPU or GPU).
frame_rate (float) – The frame rate of the input frames. Also defines the encoded output frame rate.
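As a minimal sketch of the constructor arguments above, the following builds a batch of random frames with the required (N, C, H, W) layout and uint8 dtype, then wraps it in a VideoEncoder. The helper names make_random_frames and build_encoder and the chosen sizes are illustrative assumptions, not part of torchcodec. Imports are done lazily inside the functions so the helpers can be defined even where torch or torchcodec is not installed.

```python
def make_random_frames(num_frames=30, height=64, width=64):
    """Return a uint8 tensor of shape (N, 3, H, W) with values in [0, 255]."""
    import torch  # lazy import; assumed to be installed alongside torchcodec

    # randint's upper bound is exclusive, so 256 yields values in [0, 255].
    return torch.randint(0, 256, (num_frames, 3, height, width), dtype=torch.uint8)


def build_encoder(frame_rate=30.0):
    """Wrap random CPU frames in a VideoEncoder (a CPU tensor selects the CPU encoder)."""
    from torchcodec.encoders import VideoEncoder

    frames = make_random_frames()
    # frame_rate is keyword-only in the signature shown above.
    return VideoEncoder(frames, frame_rate=frame_rate)
```

Moving the frames to a CUDA device before constructing the encoder would, per the parameter description, select the GPU encoder instead.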

Examples using VideoEncoder:

Encoding video with the Encoder

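to_file(dest: str | Path, *, codec: str | None = None, pixel_format: str | None = None, crf: int | float | None = None, preset: str | int | None = None, extra_options: dict[str, Any] | None = None) → None

Encode frames into a file.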

Parameters:

dest (str or pathlib.Path) – The path to the output file, e.g. video.mp4. The extension of the file determines the video container format.
codec (str, optional) – The codec to use for encoding (e.g., “libx264”, “h264”). If not specified, the default codec for the container format will be used. See Codec Selection for details.
pixel_format (str, optional) – The pixel format for encoding (e.g., “yuv420p”, “yuv444p”). If not specified, uses codec’s default format. Must be left as None when encoding CUDA tensors. See Pixel Format for details.
crf (int or float, optional) – Constant Rate Factor for encoding quality. Lower values mean better quality. Valid range depends on the encoder (e.g. 0-51 for libx264). Defaults to None (which will use encoder’s default). See CRF (Constant Rate Factor) for details.
preset (str or int, optional) – Encoder option that controls the tradeoff between encoding speed and compression (output size). Valid values depend on the encoder (commonly a string: “fast”, “medium”, “slow”). Defaults to None (which will use encoder’s default). See Preset for details.
extra_options (dict[str, Any], optional) – A dictionary of additional encoder options to pass, e.g. {"qp": 5, "tune": "film"}. See Extra Options for details.
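The options above might be combined as in the sketch below, which assumes this parameter list documents VideoEncoder.to_file and that the local FFmpeg build includes libx264. The X264_OPTIONS dictionary and the encode_to_file helper are hypothetical names used only for illustration, and the specific values are arbitrary choices within the ranges mentioned above.

```python
from pathlib import Path

# Illustrative settings for CPU encoding with libx264 (assumed to be available).
# pixel_format is set only because the frames are expected on CPU; it must stay
# None for CUDA tensors.
X264_OPTIONS = {
    "codec": "libx264",
    "pixel_format": "yuv420p",
    "crf": 23,           # lower means better quality; 0-51 for libx264
    "preset": "medium",  # speed vs. compression tradeoff
}


def encode_to_file(frames, dest, frame_rate=30.0):
    """Encode CPU frames of shape (N, 3, H, W), dtype uint8, into the file at dest."""
    from torchcodec.encoders import VideoEncoder  # lazy import

    encoder = VideoEncoder(frames, frame_rate=frame_rate)
    # The container format is inferred from the extension of dest, e.g. ".mp4".
    encoder.to_file(dest, **X264_OPTIONS)
    return Path(dest)
```

Leaving out codec, crf, and preset entirely falls back to the container's default codec and the encoder's own defaults.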

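to_file_like(file_like, format: str, *, codec: str | None = None, pixel_format: str | None = None, crf: int | float | None = None, preset: str | int | None = None, extra_options: dict[str, Any] | None = None) → None

Encode frames into a file-like object.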

Parameters:

file_like – A file-like object that supports write() and seek() methods, such as io.BytesIO(), an open file in binary write mode, etc. Methods must have the following signature: write(data: bytes) -> int and seek(offset: int, whence: int = 0) -> int.
format (str) – The container format of the encoded frames, e.g. “mp4”, “mov”, “mkv”, “avi”, “webm”, “flv”, etc.
codec (str, optional) – The codec to use for encoding (e.g., “libx264”, “h264”). If not specified, the default codec for the container format will be used. See Codec Selection for details.
pixel_format (str, optional) – The pixel format for encoding (e.g., “yuv420p”, “yuv444p”). If not specified, uses codec’s default format. Must be left as None when encoding CUDA tensors. See Pixel Format for details.
crf (int or float, optional) – Constant Rate Factor for encoding quality. Lower values mean better quality. Valid range depends on the encoder (e.g. 0-51 for libx264). Defaults to None (which will use encoder’s default). See CRF (Constant Rate Factor) for details.
preset (str or int, optional) – Encoder option that controls the tradeoff between encoding speed and compression (output size). Valid values depend on the encoder (commonly a string: “fast”, “medium”, “slow”). Defaults to None (which will use encoder’s default). See Preset for details.
extra_options (dict[str, Any], optional) – A dictionary of additional encoder options to pass, e.g. {"qp": 5, "tune": "film"}. See Extra Options for details.
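To illustrate the write() and seek() requirements, here is a minimal sketch of a custom file-like object that also counts the bytes it receives. The CountingWriter class and the encode_to_file_like helper are hypothetical names, and the sketch assumes this parameter list documents VideoEncoder.to_file_like. An io.BytesIO() instance would satisfy the same interface directly.

```python
import io


class CountingWriter:
    """Minimal file-like object exposing the write() and seek() methods described above."""

    def __init__(self):
        self._buffer = io.BytesIO()
        self.bytes_written = 0

    def write(self, data: bytes) -> int:
        written = self._buffer.write(data)
        self.bytes_written += written
        return written

    def seek(self, offset: int, whence: int = 0) -> int:
        # Seeking lets the encoder go back and patch data it has already written.
        return self._buffer.seek(offset, whence)

    def getvalue(self) -> bytes:
        return self._buffer.getvalue()


def encode_to_file_like(frames, frame_rate=30.0):
    """Encode frames as MP4 into a CountingWriter and return the encoded bytes."""
    from torchcodec.encoders import VideoEncoder  # lazy import

    sink = CountingWriter()
    # No file extension is available here, so the container format is passed explicitly.
    VideoEncoder(frames, frame_rate=frame_rate).to_file_like(sink, "mp4")
    return sink.getvalue()
```

Because writes can overwrite earlier positions after a seek, bytes_written may exceed the length of the final output.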

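to_tensor(format: str, *, codec: str | None = None, pixel_format: str | None = None, crf: int | float | None = None, preset: str | int | None = None, extra_options: dict[str, Any] | None = None) → Tensor

Encode frames into raw bytes, as a 1D uint8 Tensor.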

Parameters:

format (str) – The container format of the encoded frames, e.g. “mp4”, “mov”, “mkv”, “avi”, “webm”, “flv”, etc.
codec (str, optional) – The codec to use for encoding (e.g., “libx264”, “h264”). If not specified, the default codec for the container format will be used. See Codec Selection for details.
pixel_format (str, optional) – The pixel format to encode frames into (e.g., “yuv420p”, “yuv444p”). If not specified, uses codec’s default format. Must be left as None when encoding CUDA tensors. See Pixel Format for details.
crf (int or float, optional) – Constant Rate Factor for encoding quality. Lower values mean better quality. Valid range depends on the encoder (e.g. 0-51 for libx264). Defaults to None (which will use encoder’s default). See CRF (Constant Rate Factor) for details.
preset (str or int, optional) – Encoder option that controls the tradeoff between encoding speed and compression (output size). Valid values depend on the encoder (commonly a string: “fast”, “medium”, “slow”). Defaults to None (which will use encoder’s default). See Preset for details.
extra_options (dict[str, Any], optional) – A dictionary of additional encoder options to pass, e.g. {"qp": 5, "tune": "film"}. See Extra Options for details.

Returns:

The raw encoded bytes as a 1D uint8 Tensor on CPU, regardless of the device of the input frames.

Return type:

Tensor
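A short sketch of working with the returned tensor, assuming the parameters above document VideoEncoder.to_tensor. The encode_to_bytes helper is a hypothetical name, and the conversion to a Python bytes object is just one possible way to consume the result.

```python
def encode_to_bytes(frames, frame_rate=30.0, container="mp4"):
    """Encode frames in memory and return the result as a Python bytes object."""
    from torchcodec.encoders import VideoEncoder  # lazy import

    encoded = VideoEncoder(frames, frame_rate=frame_rate).to_tensor(container)
    # Per the description above, `encoded` is a 1D uint8 tensor on CPU,
    # even when `frames` lives on a CUDA device.
    return encoded.numpy().tobytes()
```

The resulting bytes could then be written to disk, sent over a network, or stored in a database without creating an intermediate file.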
