
Encoder

class torchcodec.encoders.Encoder

A multi-stream encoder for encoding video and/or audio streams.

Unlike VideoEncoder and AudioEncoder, which each encode a single stream in one shot, Encoder supports multiple streams and incremental (streaming) encoding. Frames and samples can be added progressively, which is useful when data is generated on the fly or when encoding both audio and video into the same container.

Use add_video() and add_audio() to configure output streams, then open an output destination with open_file() or open_file_like(), feed data via the returned stream objects, and finally call close() (or use the encoder as a context manager).

Example

from torchcodec.encoders import Encoder

# frames_tensor and samples_tensor are assumed to be tensors prepared elsewhere.
encoder = Encoder()
video_stream = encoder.add_video(height=256, width=256, frame_rate=30)
audio_stream = encoder.add_audio(sample_rate=16000, num_channels=1)
with encoder.open_file("output.mp4"):
    video_stream.add_frames(frames_tensor)
    audio_stream.add_samples(samples_tensor)
    # Add more frames by calling video_stream.add_frames again
    # Add more samples by calling audio_stream.add_samples again
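
When both streams are fed incrementally, it can help to pair each chunk of frames with the audio that covers the same time span. The following is a minimal sketch of that arithmetic only. `chunk_ranges` and `samples_for_frames` are hypothetical helpers, not part of torchcodec, and the feeding calls appear only as comments because the tensor layouts they assume (frames along the first dimension, time along the last dimension of the samples) are assumptions.

```python
from fractions import Fraction


def chunk_ranges(total, chunk_size):
    """Yield (start, stop) index pairs covering range(total) in chunks."""
    for start in range(0, total, chunk_size):
        yield start, min(start + chunk_size, total)


def samples_for_frames(num_frames, frame_rate, sample_rate):
    """Audio samples spanning the same duration as num_frames video frames."""
    return round(Fraction(num_frames) / Fraction(frame_rate) * sample_rate)


frame_rate, sample_rate, total_frames = 30, 16000, 75

plan = []
for f0, f1 in chunk_ranges(total_frames, chunk_size=30):
    # Boundaries come from absolute frame indices, so rounding never accumulates.
    a0 = samples_for_frames(f0, frame_rate, sample_rate)
    a1 = samples_for_frames(f1, frame_rate, sample_rate)
    plan.append((f0, f1, a0, a1))
    # Inside `with encoder.open_file(...)`, one might then call:
    #   video_stream.add_frames(frames_tensor[f0:f1])         # assumes frames on dim 0
    #   audio_stream.add_samples(samples_tensor[..., a0:a1])  # assumes time on last dim
```

Each entry of `plan` holds the frame range and the matching sample range for one chunk.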

To encode to a file-like object (e.g. io.BytesIO()), use open_file_like() instead:

import io

from torchcodec.encoders import Encoder

buf = io.BytesIO()
encoder = Encoder()
video_stream = encoder.add_video(height=256, width=256, frame_rate=30)
with encoder.open_file_like(buf, format="mp4"):
    video_stream.add_frames(frames_tensor)
encoded_bytes = buf.getvalue()
# Optionally convert to a uint8 tensor of bytes with
# bytes_tensor = torch.frombuffer(encoded_bytes, dtype=torch.uint8)

Examples using Encoder:

Encoding audio samples with AudioEncoder

Encoding audio and video streams with the Encoder

Encoding video with the Encoder

add_audio(*, sample_rate: int, num_channels: int, bit_rate: int | None = None, out_num_channels: int | None = None, out_sample_rate: int | None = None) -> AudioStream

Add an audio stream to the encoder.

Must be called before open_file() or open_file_like().

Parameters:
  • sample_rate (int) – The sample rate of the input samples.

  • num_channels (int) – The number of channels of the input samples.

  • bit_rate (int, optional) – The output bit rate. Encoders typically support a finite set of bit rate values, so bit_rate will be matched to one of those supported values. The default is chosen by FFmpeg.

  • out_num_channels (int, optional) – The number of channels of the encoded output. By default, the input num_channels is used.

  • out_sample_rate (int, optional) – The sample rate of the encoded output. By default, the input sample_rate is used.

Returns:

An audio stream object. Use its add_samples() method to feed samples into the stream.

add_video(*, height: int, width: int, frame_rate: float, device: str | device | None = None, codec: str | None = None, pixel_format: str | None = None, crf: int | float | None = None, preset: str | int | None = None, extra_options: dict[str, Any] | None = None) -> VideoStream

Add a video stream to the encoder.

Must be called before open_file() or open_file_like().

Parameters:
  • height (int) – The height of the input video frames.

  • width (int) – The width of the input video frames.

  • frame_rate (float) – The frame rate of the input video frames. Also defines the encoded output frame rate.

  • device (str or torch.device, optional) – The device to use for encoding, e.g. "cpu" or "cuda". If None (default), uses the current default device.

  • codec (str, optional) – The codec to use for encoding (e.g., "libx264"). If not specified, the default codec for the container format will be used. See Codec Selection for details.

  • pixel_format (str, optional) – The pixel format for encoding (e.g., "yuv420p"). If not specified, the codec’s default format is used. Must be left as None when encoding on CUDA. See Pixel Format for details.

  • crf (int or float, optional) – Constant Rate Factor for encoding quality. Lower values mean better quality. The valid range depends on the encoder (e.g. 0-51 for libx264). Defaults to None, in which case the encoder’s default is used. See CRF (Constant Rate Factor) for details.

  • preset (str or int, optional) – Encoder option that controls the tradeoff between encoding speed and compression (output size). Commonly a string such as "fast", "medium", or "slow". Defaults to None, in which case the encoder’s default is used. See Preset for details.

  • extra_options (dict[str, Any], optional) – A dictionary of additional encoder options to pass, e.g. {"qp": 5, "tune": "film"}. See Extra Options for details.

Returns:

A video stream object. Use its add_frames() method to feed frames into the stream.
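
Since every add_video() parameter is keyword-only and most default to None, the arguments can be collected in one place and validated before the call. The following is a sketch under stated assumptions: `VideoStreamConfig` is a hypothetical helper, not part of torchcodec, and it only enforces the CUDA and pixel_format rule stated above when the device is named explicitly.

```python
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass
class VideoStreamConfig:
    """Hypothetical container mirroring the add_video() keyword arguments."""

    height: int
    width: int
    frame_rate: float
    device: Optional[str] = None
    codec: Optional[str] = None
    pixel_format: Optional[str] = None
    crf: Optional[float] = None
    preset: Optional[str] = None
    extra_options: Optional[dict] = None

    def to_kwargs(self) -> dict[str, Any]:
        # The parameter list says pixel_format must stay None on CUDA.
        # This check cannot see a CUDA *default* device when device is None.
        on_cuda = self.device is not None and str(self.device).startswith("cuda")
        if on_cuda and self.pixel_format is not None:
            raise ValueError("pixel_format must be left as None when encoding on CUDA")
        # Drop unset options so the encoder's own defaults apply.
        return {k: v for k, v in asdict(self).items() if v is not None}


cfg = VideoStreamConfig(
    height=256, width=256, frame_rate=30, codec="libx264", crf=23, preset="medium"
)
kwargs = cfg.to_kwargs()
# video_stream = encoder.add_video(**kwargs)
```

Only the options that were explicitly set end up in `kwargs`.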

close() -> None

Flush all remaining data and close the encoder.

This must be called when encoding is complete to ensure all buffered data is written. Using the encoder as a context manager (with statement) calls this automatically.
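
The lifecycle described here can be illustrated with a toy stand-in. `ToyEncoder` below is hypothetical and mimics only the open/close pattern, not torchcodec's implementation. It shows why returning self from the open call makes `with encoder.open_file(...)` work, and the try/finally equivalent for code that cannot use a with statement. How the real encoder behaves when an exception is raised is not stated on this page.

```python
class ToyEncoder:
    """Hypothetical stand-in that mimics only the open/close lifecycle."""

    def __init__(self):
        self.dest = None
        self.closed = False

    def open_file(self, dest):
        self.dest = dest
        return self  # returning self lets the result be used in a with statement

    def close(self):
        self.closed = True  # a real encoder would flush buffered data here

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not swallow exceptions


# Context-manager form: close() runs even if feeding data raises.
enc = ToyEncoder()
try:
    with enc.open_file("output.mp4"):
        raise RuntimeError("simulated failure while feeding frames")
except RuntimeError:
    pass
closed_after_error = enc.closed

# Explicit form: pair the open call with close() in a finally clause.
enc2 = ToyEncoder().open_file("output.mp4")
try:
    pass  # feed frames and samples here
finally:
    enc2.close()
```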

open_file(dest: str | Path) -> Encoder

Open a file for writing the encoded output.

Must be called after all streams have been added via add_video() and/or add_audio(). The file extension determines the container format (e.g. .mp4, .mkv).

Parameters:

dest (str or pathlib.Path) – The path to the output file.

Returns:

Returns self for method chaining.

Return type:

Encoder
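
open_file() infers the container from the file extension, whereas open_file_like() needs the format spelled out. When the same output name drives both code paths, the format string can be derived from the path. This is a minimal sketch: `format_from_path` is a hypothetical helper, and treating the extension as the format name is an assumption checked only against the formats this page names as examples.

```python
from pathlib import Path

# Container formats named as examples on this page; not an exhaustive list.
KNOWN_FORMATS = {"mp4", "mov", "mkv", "avi", "webm"}


def format_from_path(path):
    """Return the lowercase extension of path if it is a known container format."""
    ext = Path(path).suffix.lstrip(".").lower()
    if ext not in KNOWN_FORMATS:
        raise ValueError(f"cannot infer a container format from {str(path)!r}")
    return ext


fmt = format_from_path("clips/output.MP4")
# encoder.open_file_like(buf, format=fmt)
```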

open_file_like(dest, *, format: str) -> Encoder

Open a file-like object for writing the encoded output.

Must be called after all streams have been added via add_video() and/or add_audio().

Parameters:
  • dest – A file-like object that supports write() and seek() methods, such as io.BytesIO() or a file opened in binary write mode. The methods must have the following signatures: write(data: bytes) -> int and seek(offset: int, whence: int = 0) -> int.

  • format (str) – The container format of the encoded output, e.g. "mp4", "mov", "mkv", "avi", "webm", etc.
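
Any object exposing these two methods should qualify as a destination. As a sketch of the required interface, the hypothetical `CountingWriter` below wraps io.BytesIO and records how much was written and how often the writer seeked. It is exercised directly here rather than through the encoder.

```python
import io


class CountingWriter:
    """Hypothetical file-like object with the write() and seek() signatures above."""

    def __init__(self):
        self._buf = io.BytesIO()
        self.bytes_written = 0
        self.seek_calls = 0

    def write(self, data: bytes) -> int:
        n = self._buf.write(data)
        self.bytes_written += n
        return n

    def seek(self, offset: int, whence: int = 0) -> int:
        self.seek_calls += 1
        return self._buf.seek(offset, whence)

    def getvalue(self) -> bytes:
        return self._buf.getvalue()


sink = CountingWriter()
sink.write(b"abcd")
sink.seek(1)      # move back to overwrite an earlier byte
sink.write(b"Z")
# With a real encoder, the object would be passed as:
#   encoder.open_file_like(sink, format="mp4")
```

Seeking back and writing again overwrites earlier bytes in place, so `bytes_written` can exceed the final size of the buffer.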

Returns:

Returns self for method chaining.

Return type:

Encoder