.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "generated_examples/decoding/audio_decoding.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_generated_examples_decoding_audio_decoding.py:

========================================
Decoding audio streams with AudioDecoder
========================================

In this example, we'll learn how to decode an audio file using the
:class:`~torchcodec.decoders.AudioDecoder` class.

If you're decoding WAV files, also check out the :ref:`wav_decoder_section`
section below.

.. GENERATED FROM PYTHON SOURCE LINES 19-22

First, a bit of boilerplate: we'll download an audio file from the web and
define an audio playing utility. You can ignore that part and jump right below
to :ref:`creating_decoder_audio`.

.. GENERATED FROM PYTHON SOURCE LINES 22-41

.. code-block:: Python

    import requests
    from IPython.display import Audio


    def play_5s(samples):
        # Play 5 seconds of the audio. Playing the entire file would take too much
        # space in our docs (~40MB!).
        return Audio(samples.data[:, :5 * samples.sample_rate], rate=samples.sample_rate)


    # Audio source is CC0: https://opengameart.org/content/town-theme-rpg
    # Attribution: cynicmusic.com pixelsphere.org
    url = "https://opengameart.org/sites/default/files/TownTheme.mp3"
    response = requests.get(url, headers={"User-Agent": ""})
    if response.status_code != 200:
        raise RuntimeError(f"Failed to download audio. {response.status_code = }.")

    raw_audio_bytes = response.content

.. GENERATED FROM PYTHON SOURCE LINES 43-51

.. _creating_decoder_audio:

Creating a decoder
------------------

We can now create a decoder from the raw (encoded) audio bytes. You can of
course use a local audio file and pass the path as input. You can also decode
audio streams from videos!

.. GENERATED FROM PYTHON SOURCE LINES 51-56

.. code-block:: Python

    from torchcodec.decoders import AudioDecoder

    decoder = AudioDecoder(raw_audio_bytes)

.. GENERATED FROM PYTHON SOURCE LINES 57-60

The audio has not yet been decoded by the decoder, but we already have access
to some metadata via the ``metadata`` attribute, which is an
:class:`~torchcodec.decoders.AudioStreamMetadata` object.

.. GENERATED FROM PYTHON SOURCE LINES 60-62

.. code-block:: Python

    print(decoder.metadata)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    AudioStreamMetadata:
      duration_seconds_from_header: 97.48897959183674
      begin_stream_seconds_from_header: 0.02505668934240363
      bit_rate: 108039
      codec: mp3
      stream_index: 0
      duration_seconds: 97.48897959183674
      begin_stream_seconds: 0.02505668934240363
      sample_rate: 44100
      num_channels: 2
      sample_format: fltp

.. GENERATED FROM PYTHON SOURCE LINES 63-69

Decoding samples
----------------

To get decoded samples, we just need to call the
:meth:`~torchcodec.decoders.AudioDecoder.get_all_samples` method, which returns
an :class:`~torchcodec.AudioSamples` object:

.. GENERATED FROM PYTHON SOURCE LINES 69-75

.. code-block:: Python

    samples = decoder.get_all_samples()

    print(samples)
    play_5s(samples)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    AudioSamples:
      data (shape): torch.Size([2, 4297722])
      pts_seconds: 0.02505668934240363
      duration_seconds: 97.45401360544217
      sample_rate: 44100

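As a quick sanity check on the printed output, the fields are related by simple arithmetic: the duration is the number of samples per channel divided by the sample rate. The sketch below works on the printed values as plain numbers, so it does not need TorchCodec; the variable names are illustrative only. It also computes how many samples per channel the ``play_5s`` helper keeps.

```python
# Values copied from the AudioSamples output printed above.
num_channels, num_samples = 2, 4297722
sample_rate = 44100

# Duration in seconds is samples-per-channel divided by the sample rate.
duration_seconds = num_samples / sample_rate
print(round(duration_seconds, 3))  # 97.454

# play_5s slices the last dimension to 5 * sample_rate samples per channel.
samples_in_5s = 5 * sample_rate
print(samples_in_5s)  # 220500
```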

.. GENERATED FROM PYTHON SOURCE LINES 76-90

The ``.data`` field is a tensor of shape ``(num_channels, num_samples)`` and of
float dtype with values in [-1, 1].

The ``.pts_seconds`` field indicates the starting time of the output samples.
Here it's 0.025 seconds, even though we asked for samples starting from 0. Not
all streams start exactly at 0! This is not a bug in TorchCodec; it is a
property of the file that was defined when it was encoded.

Specifying a range
------------------

If we don't need all the samples, we can use
:meth:`~torchcodec.decoders.AudioDecoder.get_samples_played_in_range` to decode
the samples within a custom range:

.. GENERATED FROM PYTHON SOURCE LINES 90-96

.. code-block:: Python

    samples = decoder.get_samples_played_in_range(start_seconds=10, stop_seconds=70)

    print(samples)
    play_5s(samples)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    AudioSamples:
      data (shape): torch.Size([2, 2646000])
      pts_seconds: 10.0
      duration_seconds: 60.0
      sample_rate: 44100

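The size of the decoded tensor follows from the requested range. Assuming the range lies fully within the stream, the number of samples per channel is the range length in seconds times the sample rate. A minimal sketch, where ``expected_num_samples`` is a hypothetical helper and not a TorchCodec function:

```python
def expected_num_samples(start_seconds, stop_seconds, sample_rate):
    """Samples per channel for a range of (stop_seconds - start_seconds) seconds.

    Hypothetical helper for illustration; assumes the range is inside the stream.
    """
    return round((stop_seconds - start_seconds) * sample_rate)


# 60 seconds at 44100 Hz, as in the example above.
print(expected_num_samples(10, 70, 44100))  # 2646000
```

The result matches the second dimension of the ``data`` shape printed for the 10 to 70 second range.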

.. GENERATED FROM PYTHON SOURCE LINES 97-104

Custom sample rate
------------------

We can also decode the samples into a desired sample rate using the
``sample_rate`` parameter of :class:`~torchcodec.decoders.AudioDecoder`. The
output will sound similar, but note that the number of samples is much smaller:

.. GENERATED FROM PYTHON SOURCE LINES 104-111

.. code-block:: Python

    decoder = AudioDecoder(raw_audio_bytes, sample_rate=16_000)
    samples = decoder.get_all_samples()

    print(samples)
    play_5s(samples)

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    AudioSamples:
      data (shape): torch.Size([2, 1559264])
      pts_seconds: 0.02505668934240363
      duration_seconds: 97.454
      sample_rate: 16000

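The smaller sample count is roughly what the ratio of the two sample rates predicts (16000 / 44100, about 0.36). The sketch below is only an estimate under that assumption: the exact count produced by a resampler can differ by a few samples, and ``estimate_resampled_count`` is a hypothetical helper, not part of TorchCodec.

```python
def estimate_resampled_count(num_samples, source_rate, target_rate):
    """Estimate samples per channel after resampling (hypothetical helper)."""
    # Integer arithmetic avoids floating-point rounding; the result is floored.
    return num_samples * target_rate // source_rate


# 4297722 samples at 44100 Hz, resampled to 16000 Hz.
print(estimate_resampled_count(4297722, 44100, 16000))  # 1559264
```

For this file the estimate agrees with the shape printed for the 16 kHz decoder, while the duration in seconds stays essentially unchanged.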

.. GENERATED FROM PYTHON SOURCE LINES 112-125

.. _wav_decoder_section:

WavDecoder for WAV files
------------------------

If your audio source is a WAV file and you don't need resampling or channel
remixing, you can use :class:`~torchcodec.decoders.WavDecoder` for
significantly faster decoding. It has the same
:meth:`~torchcodec.decoders.WavDecoder.get_all_samples` and
:meth:`~torchcodec.decoders.WavDecoder.get_samples_played_in_range` methods as
:class:`~torchcodec.decoders.AudioDecoder`. See
:ref:`sphx_glr_generated_examples_decoding_performance_tips.py` for more
details.

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.955 seconds)