.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "generated_examples/encoding/multi_stream_encoding.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_generated_examples_encoding_multi_stream_encoding.py:


=================================================
Encoding audio and video streams with the Encoder
=================================================

In this example, we'll learn how to encode multiple video and audio streams
into a single container using the :class:`~torchcodec.encoders.Encoder` class.
We'll also see how to feed audio samples and video frames incrementally, and how
to mix CPU and CUDA video streams.

For details on video encoding parameters (codec, CRF, preset, etc.), see
:ref:`sphx_glr_generated_examples_encoding_video_encoding.py`.

.. GENERATED FROM PYTHON SOURCE LINES 22-28

Video + audio encoding
-----------------------

Let's start by encoding a video alongside an audio track into the same MP4
file. We'll decode some video frames from an existing video and generate a
simple sine-wave audio tone.

.. GENERATED FROM PYTHON SOURCE LINES 28-60

.. code-block:: Python


    import subprocess
    import tempfile
    from pathlib import Path

    import requests
    import torch
    from torchcodec.decoders import VideoDecoder
    from torchcodec.encoders import Encoder


    # Video source: https://www.pexels.com/video/adorable-cats-on-the-lawn-4977395/
    # Author: Altaf Shah.
    url = "https://videos.pexels.com/video-files/4977395/4977395-hd_1920_1080_24fps.mp4"

    response = requests.get(url, headers={"User-Agent": ""})
    if response.status_code != 200:
        raise RuntimeError(f"Failed to download video. {response.status_code = }.")

    decoder = VideoDecoder(response.content)
    frames = decoder.get_frames_in_range(0, 60).data
    frame_rate = decoder.metadata.average_fps

    # Generate a 440 Hz sine wave that lasts as long as the video
    audio_sample_rate = 16000
    duration_seconds = len(frames) / frame_rate
    t = torch.linspace(
        0, duration_seconds, int(audio_sample_rate * duration_seconds),
        dtype=torch.float32,
    )
    audio_samples = torch.sin(2 * torch.pi * 440 * t).unsqueeze(0)  # shape: (1, num_samples)


.. GENERATED FROM PYTHON SOURCE LINES 62-67

Now we create an :class:`~torchcodec.encoders.Encoder`, add one video stream
and one audio stream, and encode everything into a single file. Each call to
:meth:`~torchcodec.encoders.Encoder.add_video` or
:meth:`~torchcodec.encoders.Encoder.add_audio` returns a stream object that
we use to feed data.

.. GENERATED FROM PYTHON SOURCE LINES 67-82

.. code-block:: Python


    output_path = tempfile.NamedTemporaryFile(suffix=".mp4", delete=False).name
    encoder = Encoder()
    video_stream = encoder.add_video(
        height=frames.shape[2], width=frames.shape[3], frame_rate=frame_rate,
    )
    audio_stream = encoder.add_audio(sample_rate=audio_sample_rate, num_channels=1)

    with encoder.open_file(output_path):
        video_stream.add_frames(frames)
        audio_stream.add_samples(audio_samples)

    print(f"Encoded video + audio to {output_path}")
    print(f"Output size: {Path(output_path).stat().st_size} bytes")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Encoded video + audio to /tmp/tmpk8afi88e.mp4
    Output size: 2526289 bytes


.. GENERATED FROM PYTHON SOURCE LINES 83-84

Let's verify that both streams are present in the output file:

.. GENERATED FROM PYTHON SOURCE LINES 84-95

.. code-block:: Python


    result = subprocess.run(
        [
            "ffprobe", "-v", "error",
            "-show_entries", "stream=index,codec_type,codec_name",
            "-of", "default=noprint_wrappers=1", output_path,
        ],
        capture_output=True, text=True,
    )
    print(result.stdout)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    index=0
    codec_name=h264
    codec_type=video
    index=1
    codec_name=aac
    codec_type=audio


.. GENERATED FROM PYTHON SOURCE LINES 96-107

Incremental encoding
---------------------

You don't need to have all your data ready upfront. You can call
:meth:`~torchcodec.encoders.VideoStream.add_frames` and
:meth:`~torchcodec.encoders.AudioStream.add_samples` multiple times to feed
data incrementally. This is useful when frames or samples are generated
on-the-fly (e.g. from a model or a processing pipeline).

Here, we'll split our frames and audio into chunks and feed them one batch at
a time:

.. GENERATED FROM PYTHON SOURCE LINES 107-130

.. code-block:: Python


    chunk_output = tempfile.NamedTemporaryFile(suffix=".mp4", delete=False).name
    encoder = Encoder()
    video_stream = encoder.add_video(
        height=frames.shape[2], width=frames.shape[3], frame_rate=frame_rate,
    )
    audio_stream = encoder.add_audio(sample_rate=audio_sample_rate, num_channels=1)

    video_chunk_size = 10
    samples_per_video_chunk = int(audio_sample_rate / frame_rate * video_chunk_size)

    with encoder.open_file(chunk_output):
        for i in range(0, len(frames), video_chunk_size):
            video_chunk = frames[i : i + video_chunk_size]
            video_stream.add_frames(video_chunk)

            audio_start = int(i / frame_rate * audio_sample_rate)
            audio_chunk = audio_samples[:, audio_start : audio_start + samples_per_video_chunk]
            audio_stream.add_samples(audio_chunk)

    print(f"Incrementally encoded to {chunk_output}")
    print(f"Output size: {Path(chunk_output).stat().st_size} bytes")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Incrementally encoded to /tmp/tmpzx93of9e.mp4
    Output size: 2526660 bytes


.. GENERATED FROM PYTHON SOURCE LINES 131-168

Multiple video streams, multiple audio streams
------------------------------------------------

You can add as many video and audio streams as you need. Each video stream can
independently target CPU or CUDA encoding — just pass the desired ``device``
to :meth:`~torchcodec.encoders.Encoder.add_video`. This means you can mix CPU
and CUDA video streams in the same container, for example encoding a
high-resolution stream on GPU for speed and a low-resolution stream on CPU.

Similarly, you can add multiple audio streams with different settings (sample
rate, number of channels, bit rate, etc.).

Here's an example with two video streams and two audio streams:

.. code-block:: python

    encoder = Encoder()

    # Two video streams: one on CPU, one on CUDA
    cpu_video = encoder.add_video(
        height=1080, width=1920, frame_rate=30,
        device="cpu",
    )
    cuda_video = encoder.add_video(
        height=720, width=1280, frame_rate=30,
        device="cuda",
    )

    # Two audio streams with different settings
    audio_en = encoder.add_audio(sample_rate=44100, num_channels=2)
    audio_fr = encoder.add_audio(sample_rate=44100, num_channels=2)

    with encoder.open_file("multi_stream_output.mkv"):
        cpu_video.add_frames(cpu_frames)
        cuda_video.add_frames(cuda_frames)
        audio_en.add_samples(english_samples)
        audio_fr.add_samples(french_samples)

.. GENERATED FROM PYTHON SOURCE LINES 170-180

Encoding to a file-like object
--------------------------------

Instead of encoding to a file path, you can encode to any file-like object
(e.g. ``io.BytesIO()``) using
:meth:`~torchcodec.encoders.Encoder.open_file_like`. This is useful for
example when you need to upload the encoded data directly to a remote server
or cloud storage without writing it to disk.  In this case, you must specify
the container ``format`` explicitly since there is no file extension to infer
it from.

.. GENERATED FROM PYTHON SOURCE LINES 180-200

.. code-block:: Python


    import io

    buf = io.BytesIO()
    encoder = Encoder()
    video_stream = encoder.add_video(
        height=frames.shape[2], width=frames.shape[3], frame_rate=frame_rate,
    )
    audio_stream = encoder.add_audio(sample_rate=audio_sample_rate, num_channels=1)

    with encoder.open_file_like(buf, format="mp4"):
        video_stream.add_frames(frames)
        audio_stream.add_samples(audio_samples)

    encoded_bytes = buf.getvalue()
    print(f"Encoded to BytesIO, size: {len(encoded_bytes)} bytes")

    # Or convert to a bytes tensor:
    bytes_tensor = torch.frombuffer(encoded_bytes, dtype=torch.uint8)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Encoded to BytesIO, size: 2526289 bytes


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 5.799 seconds)


.. _sphx_glr_download_generated_examples_encoding_multi_stream_encoding.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: multi_stream_encoding.ipynb <multi_stream_encoding.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: multi_stream_encoding.py <multi_stream_encoding.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: multi_stream_encoding.zip <multi_stream_encoding.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_