.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "generated_examples/encoding/audio_encoding.py"
.. LINE NUMBERS ARE GIVEN BELOW.
.. rst-class:: sphx-glr-example-title
.. _sphx_glr_generated_examples_encoding_audio_encoding.py:
========================================
Encoding audio samples with AudioEncoder
========================================
In this example, we'll learn how to encode audio samples to a file or to raw
bytes using the :class:`~torchcodec.encoders.AudioEncoder` class.

.. note::

    This is a convenience class for simple, one-shot audio encoding. For
    multi-stream encoding (e.g. video + audio), incremental encoding, or
    encoding multiple audio streams, use
    :class:`~torchcodec.encoders.Encoder` instead. See
    :ref:`sphx_glr_generated_examples_encoding_multi_stream_encoding.py` for
    a tutorial.

.. GENERATED FROM PYTHON SOURCE LINES 25-27
Let's first generate some samples to be encoded. The data to be encoded could
also just come from an :class:`~torchcodec.decoders.AudioDecoder`!
.. GENERATED FROM PYTHON SOURCE LINES 27-44

.. code-block:: Python

    import torch
    from IPython.display import Audio as play_audio


    def make_sinewave() -> tuple[torch.Tensor, int]:
        freq_A = 440  # Hz
        sample_rate = 16000  # Hz
        duration_seconds = 3  # seconds
        t = torch.linspace(0, duration_seconds, int(sample_rate * duration_seconds), dtype=torch.float32)
        return torch.sin(2 * torch.pi * freq_A * t), sample_rate


    samples, sample_rate = make_sinewave()

    print(f"Encoding samples with {samples.shape = } and {sample_rate = }")
    play_audio(samples, rate=sample_rate)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Encoding samples with samples.shape = torch.Size([48000]) and sample_rate = 16000

.. GENERATED FROM PYTHON SOURCE LINES 46-57
We first instantiate an :class:`~torchcodec.encoders.AudioEncoder`. We pass it
the samples to be encoded. The samples must be a 2D tensor of shape
``(num_channels, num_samples)``, or in this case, a 1D tensor where
``num_channels`` is assumed to be 1. The values must be float values
normalized in ``[-1, 1]``: this is also what the
:class:`~torchcodec.decoders.AudioDecoder` would return.

.. note::

    The ``sample_rate`` parameter corresponds to the sample rate of the
    *input*, not the desired encoded sample rate.

.. GENERATED FROM PYTHON SOURCE LINES 57-62

.. code-block:: Python

    from torchcodec.encoders import AudioEncoder

    encoder = AudioEncoder(samples=samples, sample_rate=sample_rate)

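
The input requirements above (float values in ``[-1, 1]``, shape
``(num_channels, num_samples)``) are easy to get wrong when the audio comes
from raw integer PCM. The following is a minimal sketch of one way to prepare
such data. The ``to_encoder_input`` helper is hypothetical and not part of
torchcodec, and it assumes the input is interleaved 16-bit PCM:

.. code-block:: Python

    import torch


    def to_encoder_input(pcm_int16: torch.Tensor, num_channels: int) -> torch.Tensor:
        """Hypothetical helper: interleaved int16 PCM -> float (num_channels, num_samples)."""
        if pcm_int16.dtype != torch.int16:
            raise TypeError(f"expected int16 input, got {pcm_int16.dtype}")
        if pcm_int16.ndim != 1 or pcm_int16.numel() % num_channels != 0:
            raise ValueError("expected a 1D interleaved tensor with a whole number of frames")
        # Interleaved layout is [L0, R0, L1, R1, ...]: reshape to
        # (num_samples, num_channels), then transpose to (num_channels, num_samples).
        frames = pcm_int16.reshape(-1, num_channels).T
        # Dividing by 32768 maps the int16 range [-32768, 32767] into [-1, 1).
        return (frames.to(torch.float32) / 32768.0).contiguous()


    # Two stereo frames: (left=0, right=16384) and (left=-32768, right=32767).
    pcm = torch.tensor([0, 16384, -32768, 32767], dtype=torch.int16)
    stereo = to_encoder_input(pcm, num_channels=2)
    print(stereo.shape, stereo.dtype)

The resulting tensor has one row per channel, which is the layout the
encoder expects for multi-channel input.
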
.. GENERATED FROM PYTHON SOURCE LINES 63-70
:class:`~torchcodec.encoders.AudioEncoder` supports encoding samples into a
file via the :meth:`~torchcodec.encoders.AudioEncoder.to_file` method, or to
raw bytes via :meth:`~torchcodec.encoders.AudioEncoder.to_tensor`. For the
purpose of this tutorial we'll use
:meth:`~torchcodec.encoders.AudioEncoder.to_tensor`, so that we can easily
re-decode the encoded samples and check their properties. The
:meth:`~torchcodec.encoders.AudioEncoder.to_file` method works very similarly.
.. GENERATED FROM PYTHON SOURCE LINES 70-75

.. code-block:: Python

    encoded_samples = encoder.to_tensor(format="mp3")
    print(f"{encoded_samples.shape = }, {encoded_samples.dtype = }")

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    encoded_samples.shape = torch.Size([9512]), encoded_samples.dtype = torch.uint8

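
For completeness, here is a minimal sketch of the file-based path using
:meth:`~torchcodec.encoders.AudioEncoder.to_file`. The output file name is
hypothetical, and the sketch assumes that the output format is inferred from
the file extension and that the FFmpeg build in use can encode MP3:

.. code-block:: Python

    import tempfile
    from pathlib import Path

    import torch
    from torchcodec.encoders import AudioEncoder

    # Same kind of input as in this tutorial: 3 seconds of a 440 Hz mono sine wave.
    sample_rate = 16000
    t = torch.arange(3 * sample_rate, dtype=torch.float32) / sample_rate
    samples = torch.sin(2 * torch.pi * 440 * t)

    encoder = AudioEncoder(samples=samples, sample_rate=sample_rate)

    with tempfile.TemporaryDirectory() as tmp_dir:
        # Hypothetical destination; the ".mp3" extension is assumed to select the format.
        dest = Path(tmp_dir) / "sinewave.mp3"
        encoder.to_file(str(dest))
        file_size = dest.stat().st_size
        print(f"Wrote {file_size} bytes to {dest.name}")
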
.. GENERATED FROM PYTHON SOURCE LINES 76-80
That's it!
Now that we have our encoded data, we can decode it back, to make sure it
looks and sounds as expected:
.. GENERATED FROM PYTHON SOURCE LINES 80-87

.. code-block:: Python

    from torchcodec.decoders import AudioDecoder

    samples_back = AudioDecoder(encoded_samples).get_all_samples()

    print(samples_back)
    play_audio(samples_back.data, rate=samples_back.sample_rate)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    AudioSamples:
      data (shape): torch.Size([1, 48000])
      pts_seconds: 0.0690625
      duration_seconds: 3.0
      sample_rate: 16000

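
The printed metadata can be sanity-checked with a little arithmetic. The
sketch below only reuses the numbers shown in the output above. Reading the
non-zero ``pts_seconds`` as a whole number of samples at the start of the
stream (plausibly padding added by the MP3 encoder) is an assumption, not
something this tutorial establishes:

.. code-block:: Python

    # Values copied from the AudioSamples output printed above.
    num_samples = 48000
    sample_rate = 16000
    pts_seconds = 0.0690625

    # Duration is the number of samples per channel divided by the sample rate.
    duration_seconds = num_samples / sample_rate

    # Express the start offset as a number of samples instead of seconds.
    pts_in_samples = round(pts_seconds * sample_rate)

    print(f"{duration_seconds = }")
    print(f"{pts_in_samples = }")
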
.. GENERATED FROM PYTHON SOURCE LINES 88-91
The encoder supports encoding options that change how the data is encoded.
For example, we can encode our mono data (1 channel) as stereo (2 channels)
and specify an output sample rate:
.. GENERATED FROM PYTHON SOURCE LINES 91-100

.. code-block:: Python

    desired_sample_rate = 32000
    encoded_samples = encoder.to_tensor(format="wav", num_channels=2, sample_rate=desired_sample_rate)

    stereo_samples_back = AudioDecoder(encoded_samples).get_all_samples()

    print(stereo_samples_back)
    play_audio(stereo_samples_back.data, rate=desired_sample_rate)

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    AudioSamples:
      data (shape): torch.Size([2, 96000])
      pts_seconds: 0.0
      duration_seconds: 3.0
      sample_rate: 32000

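
The decoded shape follows from the options we passed: the duration is
unchanged, so doubling the sample rate doubles the number of samples per
channel. A minimal sketch of that arithmetic, where ``expected_output_shape``
is a hypothetical helper rather than a torchcodec function:

.. code-block:: Python

    def expected_output_shape(
        num_samples_in: int,
        sample_rate_in: int,
        sample_rate_out: int,
        num_channels_out: int,
    ) -> tuple[int, int]:
        """Hypothetical helper: shape of the decoded data after re-encoding."""
        # Resampling preserves the duration, so the sample count scales with the rate.
        num_samples_out = num_samples_in * sample_rate_out // sample_rate_in
        return (num_channels_out, num_samples_out)


    shape = expected_output_shape(
        num_samples_in=48000,
        sample_rate_in=16000,
        sample_rate_out=32000,
        num_channels_out=2,
    )
    print(shape)

With lossy formats or non-integer rate ratios, the actual sample count may
differ slightly from this estimate.
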
.. GENERATED FROM PYTHON SOURCE LINES 101-103
See the docstrings of :meth:`~torchcodec.encoders.AudioEncoder.to_file` and
:meth:`~torchcodec.encoders.AudioEncoder.to_tensor` for the full list of
encoding options.
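
One quick way to see which options are available in your installed version
is to inspect the method signatures directly. This is a small sketch using
the standard ``inspect`` module. It makes no assumption about option names
beyond the ones used in this tutorial:

.. code-block:: Python

    import inspect

    from torchcodec.encoders import AudioEncoder

    options_by_method = {}
    for method_name in ("to_file", "to_tensor"):
        method = getattr(AudioEncoder, method_name)
        # Collect parameter names, skipping the implicit ``self``.
        parameters = inspect.signature(method).parameters
        options_by_method[method_name] = [name for name in parameters if name != "self"]
        print(f"{method_name}: {options_by_method[method_name]}")

Calling ``help(AudioEncoder.to_tensor)`` prints the full docstring,
including a description of each option.
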
.. rst-class:: sphx-glr-timing
**Total running time of the script:** (0 minutes 0.029 seconds)