Encoder
- class torchcodec.encoders.Encoder
A multi-stream encoder for encoding video and/or audio streams.
Unlike VideoEncoder and AudioEncoder, which each encode a single stream in one shot, Encoder supports multiple streams and incremental (streaming) encoding. Frames and samples can be added progressively, which is useful when data is generated on the fly or when encoding both audio and video into the same container.
Use add_video() and add_audio() to configure output streams, then open an output destination with open_file() or open_file_like(), feed data via the returned stream objects, and finally call close() (or use the encoder as a context manager).
Example

    from torchcodec.encoders import Encoder

    encoder = Encoder()
    video_stream = encoder.add_video(height=256, width=256, frame_rate=30)
    audio_stream = encoder.add_audio(sample_rate=16000, num_channels=1)
    with encoder.open_file("output.mp4"):
        video_stream.add_frames(frames_tensor)
        audio_stream.add_samples(samples_tensor)
        # Add more frames by calling video_stream.add_frames again
        # Add more samples by calling audio_stream.add_samples again

To encode to a file-like object (e.g. io.BytesIO()), use open_file_like() instead:

    import io

    buf = io.BytesIO()
    encoder = Encoder()
    video_stream = encoder.add_video(height=256, width=256, frame_rate=30)
    with encoder.open_file_like(buf, format="mp4"):
        video_stream.add_frames(frames_tensor)
    encoded_bytes = buf.getvalue()
    # Optionally convert to a uint8 tensor of bytes with
    # bytes_tensor = torch.frombuffer(encoded_bytes, dtype=torch.uint8)
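The incremental use described above can be sketched as a loop that feeds a few frames at a time. This is a minimal sketch, not the library's reference usage: chunk_ranges and encode_in_chunks are hypothetical helper names, and the layout of frames (a uint8 tensor shaped (N, C, H, W)) is an assumption that this page does not state.

```python
def chunk_ranges(total, chunk_size):
    """Yield (start, stop) index pairs that cover range(total) in order."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, total, chunk_size):
        yield start, min(start + chunk_size, total)


def encode_in_chunks(path, frames, frame_rate, chunk_size=32):
    """Encode `frames` to `path`, feeding `chunk_size` frames per call.

    Assumption: `frames` is a uint8 tensor shaped (N, C, H, W).
    """
    # Imported lazily so the helper above stays usable without torchcodec.
    from torchcodec.encoders import Encoder

    num_frames, _, height, width = frames.shape
    encoder = Encoder()
    video_stream = encoder.add_video(height=height, width=width, frame_rate=frame_rate)
    with encoder.open_file(path):
        # Each add_frames call appends to the same stream.
        for start, stop in chunk_ranges(num_frames, chunk_size):
            video_stream.add_frames(frames[start:stop])
```

In a real pipeline the slices would typically be replaced by batches produced on the fly, so that the full clip never has to be held in memory.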
- add_audio(*, sample_rate: int, num_channels: int, bit_rate: int | None = None, out_num_channels: int | None = None, out_sample_rate: int | None = None) -> AudioStream
Add an audio stream to the encoder.
Must be called before open_file() or open_file_like().
- Parameters:
sample_rate (int) – The sample rate of the input samples.
num_channels (int) – The number of channels of the input samples.
bit_rate (int, optional) – The output bit rate. Encoders typically support a finite set of bit rate values, so bit_rate will be matched to one of those supported values. The default is chosen by FFmpeg.
out_num_channels (int, optional) – The number of channels of the encoded output. By default, the input num_channels is used.
out_sample_rate (int, optional) – The sample rate of the encoded output. By default, the input sample_rate is used.
- Returns:
An audio stream object. Use its add_samples() method to feed samples into the stream.
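As an illustration of the input and output parameters, the sketch below generates a mono test tone at one sample rate and asks for a lower rate in the encoded output. The helper names sine_wave and encode_tone are hypothetical, and the sample layout (a float32 tensor shaped (num_channels, num_samples) with values in [-1, 1]) is an assumption not stated on this page.

```python
import math


def sine_wave(sample_rate, duration_s, freq_hz=440.0):
    """Return a list of float samples in [-1, 1] for a mono sine tone."""
    num_samples = int(sample_rate * duration_s)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(num_samples)]


def encode_tone(path, sample_rate=48000, duration_s=1.0, out_sample_rate=16000):
    """Encode a test tone to `path`, with the output sample rate set explicitly."""
    # Imported lazily so sine_wave() works without torch or torchcodec installed.
    import torch
    from torchcodec.encoders import Encoder

    # Assumption: add_samples() takes a float32 tensor shaped (num_channels, num_samples).
    samples = torch.tensor(sine_wave(sample_rate, duration_s), dtype=torch.float32).unsqueeze(0)

    encoder = Encoder()
    audio_stream = encoder.add_audio(
        sample_rate=sample_rate,          # rate of the input samples
        num_channels=1,                   # channels of the input samples
        out_sample_rate=out_sample_rate,  # rate of the encoded output
    )
    with encoder.open_file(path):
        audio_stream.add_samples(samples)
```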
- add_video(*, height: int, width: int, frame_rate: float, device: str | device | None = None, codec: str | None = None, pixel_format: str | None = None, crf: int | float | None = None, preset: str | int | None = None, extra_options: dict[str, Any] | None = None) -> VideoStream
Add a video stream to the encoder.
Must be called before open_file() or open_file_like().
- Parameters:
height (int) – The height of the input video frames.
width (int) – The width of the input video frames.
frame_rate (float) – The frame rate of the input video frames. Also defines the encoded output frame rate.
device (str or torch.device, optional) – The device to use for encoding, e.g. "cpu" or "cuda". If None (default), the current default device is used.
codec (str, optional) – The codec to use for encoding (e.g., "libx264"). If not specified, the default codec for the container format is used. See Codec Selection for details.
pixel_format (str, optional) – The pixel format for encoding (e.g., "yuv420p"). If not specified, the codec’s default format is used. Must be left as None when encoding on CUDA. See Pixel Format for details.
crf (int or float, optional) – Constant Rate Factor for encoding quality. Lower values mean better quality. The valid range depends on the encoder (e.g. 0-51 for libx264). Defaults to None, which uses the encoder’s default. See CRF (Constant Rate Factor) for details.
preset (str or int, optional) – Encoder option that controls the tradeoff between encoding speed and compression (output size). Commonly a string: "fast", "medium", "slow". Defaults to None, which uses the encoder’s default. See Preset for details.
extra_options (dict[str, Any], optional) – A dictionary of additional encoder options to pass, e.g. {"qp": 5, "tune": "film"}. See Extra Options for details.
- Returns:
A video stream object. Use its add_frames() method to feed frames into the stream.
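One way to keep the optional settings manageable is to collect them in a dictionary and unpack it into add_video(). The sketch below is illustrative only: video_options is a hypothetical helper, and the only range check it performs is the 0-51 CRF range quoted above for libx264.

```python
def video_options(height, width, frame_rate, *, codec=None, crf=None,
                  preset=None, extra_options=None):
    """Collect add_video() keyword arguments, leaving out unset options."""
    # The 0-51 range is the libx264 example given in the parameter description.
    if codec == "libx264" and crf is not None and not 0 <= crf <= 51:
        raise ValueError("libx264 expects crf in the range 0-51")
    kwargs = {"height": height, "width": width, "frame_rate": frame_rate}
    optional = {"codec": codec, "crf": crf, "preset": preset,
                "extra_options": extra_options}
    # Omitting a key is equivalent to passing None, the documented default.
    kwargs.update({key: value for key, value in optional.items() if value is not None})
    return kwargs


opts = video_options(256, 256, 30, codec="libx264", crf=23, preset="medium",
                     extra_options={"tune": "film"})

# With torchcodec installed, the options would be applied as:
# video_stream = Encoder().add_video(**opts)
```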
- close() -> None
Flush all remaining data and close the encoder.
This must be called when encoding is complete to ensure all buffered data is written. Using the encoder as a context manager (with statement) calls this automatically.
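When a with statement is inconvenient, for example because the encoder outlives a single function scope, the same guarantee can be approximated with try/finally. This is a minimal sketch under the same assumptions as the class-level example; encode_with_explicit_close is a hypothetical name, and frames is assumed to be shaped (N, C, H, W).

```python
def encode_with_explicit_close(path, frames, frame_rate):
    """Encode `frames` to `path`, closing the encoder even if feeding fails."""
    # Imported lazily; requires torchcodec to be installed.
    from torchcodec.encoders import Encoder

    # Assumption: frames.shape is (N, C, H, W).
    _, _, height, width = frames.shape
    encoder = Encoder()
    video_stream = encoder.add_video(height=height, width=width, frame_rate=frame_rate)
    encoder.open_file(path)
    try:
        video_stream.add_frames(frames)
    finally:
        # Flushes buffered data and finalizes the output.
        encoder.close()
```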
- open_file(dest: str | Path) -> Encoder
Open a file for writing the encoded output.
Must be called after all streams have been added via add_video() and/or add_audio(). The file extension determines the container format (e.g. .mp4, .mkv).
- Parameters:
dest (str or pathlib.Path) – The path to the output file.
- Returns:
Returns self for method chaining.
- Return type:
Encoder
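Because the container format follows from the file extension, switching containers amounts to changing the suffix of the destination path. The sketch below shows one way to do that with pathlib; with_container is a hypothetical helper, not part of torchcodec.

```python
from pathlib import Path


def with_container(dest, extension):
    """Return `dest` as a Path whose suffix is replaced by `extension`."""
    if not extension.startswith("."):
        extension = "." + extension
    return Path(dest).with_suffix(extension)


mkv_path = with_container("out/clip.mp4", "mkv")

# With torchcodec installed, and after all streams have been added:
# with encoder.open_file(mkv_path):
#     video_stream.add_frames(frames_tensor)
```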
- open_file_like(dest, *, format: str) -> Encoder
Open a file-like object for writing the encoded output.
Must be called after all streams have been added via add_video() and/or add_audio().
- Parameters:
dest – A file-like object that supports write() and seek() methods, such as io.BytesIO(), an open file in binary write mode, etc. The methods must have the following signatures: write(data: bytes) -> int and seek(offset: int, whence: int = 0) -> int.
format (str) – The container format of the encoded output, e.g. "mp4", "mov", "mkv", "avi", "webm", etc.
- Returns:
Returns self for method chaining.
- Return type:
Encoder
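Any object with the two methods described above can serve as a destination. The sketch below wraps io.BytesIO in a small class that also counts the bytes written; CountingWriter is a hypothetical name, and the write and seek calls at the bottom only imitate the way a container writer may go back and patch a header.

```python
import io


class CountingWriter:
    """Seekable sink exposing write() and seek() with the documented signatures."""

    def __init__(self):
        self._buf = io.BytesIO()
        self.bytes_written = 0

    def write(self, data: bytes) -> int:
        written = self._buf.write(data)
        self.bytes_written += written
        return written

    def seek(self, offset: int, whence: int = 0) -> int:
        return self._buf.seek(offset, whence)

    def getvalue(self) -> bytes:
        return self._buf.getvalue()


sink = CountingWriter()
sink.write(b"header")
sink.seek(0)           # rewind, as a muxer might do to patch a header
sink.write(b"HEAD")    # overwrites the first four bytes

# With torchcodec installed, the sink would be passed as:
# with encoder.open_file_like(sink, format="mp4"):
#     video_stream.add_frames(frames_tensor)
```

Support for seek() matters because the written bytes are not necessarily produced in strictly increasing file order, which is why a write-only pipe would not satisfy the stated requirements.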