Decoding audio streams with AudioDecoder#

In this example, we’ll learn how to decode an audio file using the AudioDecoder class.

First, a bit of boilerplate: we’ll download an audio file from the web and define an audio playing utility. You can ignore that part and jump right below to Creating a decoder.

import requests
from IPython.display import Audio


def play_audio(samples):
    return Audio(samples.data, rate=samples.sample_rate)


# Audio source is CC0: https://opengameart.org/content/town-theme-rpg
# Attribution: cynicmusic.com pixelsphere.org
url = "https://opengameart.org/sites/default/files/TownTheme.mp3"
response = requests.get(url, headers={"User-Agent": ""})
if response.status_code != 200:
    raise RuntimeError(f"Failed to download video. {response.status_code = }.")

raw_audio_bytes = response.content

Creating a decoder#

We can now create a decoder from the raw (encoded) audio bytes. You can of course use a local audio file and pass the path as input. You can also decode audio streams from videos!

from torchcodec.decoders import AudioDecoder

decoder = AudioDecoder(raw_audio_bytes)

The has not yet been decoded by the decoder, but we already have access to some metadata via the metadata attribute which is an AudioStreamMetadata object.

print(decoder.metadata)

AudioStreamMetadata:
  duration_seconds_from_header: 97.48897959183674
  begin_stream_seconds_from_header: 0.02505668934240363
  bit_rate: 108039
  codec: mp3
  stream_index: 0
  duration_seconds: 97.48897959183674
  begin_stream_seconds: 0.02505668934240363
  sample_rate: 44100
  num_channels: 2
  sample_format: fltp

Decoding samples#

To get decoded samples, we just need to call the get_all_samples() method, which returns an AudioSamples object:

samples = decoder.get_all_samples()

print(samples)
play_audio(samples)

AudioSamples:
  data (shape): torch.Size([2, 4297722])
  pts_seconds: 0.02505668934240363
  duration_seconds: 97.45401360544217
  sample_rate: 44100

The .data field is a tensor of shape (num_channels, num_samples) and of float dtype with values in [-1, 1].

The .pts_seconds field indicates the starting time of the output samples. Here it’s 0.025 seconds, even though we asked for samples starting from 0. Not all streams start exactly at 0! This is not a bug in TorchCodec, this is a property of the file that was defined when it was encoded.

Specifying a range#

If we don’t need all the samples, we can use get_samples_played_in_range() to decode the samples within a custom range:

samples = decoder.get_samples_played_in_range(start_seconds=10, stop_seconds=70)

print(samples)
play_audio(samples)

AudioSamples:
  data (shape): torch.Size([2, 2646000])
  pts_seconds: 10.0
  duration_seconds: 60.0
  sample_rate: 44100

Custom sample rate#

We can also decode the samples into a desired sample rate using the sample_rate parameter of AudioDecoder. The ouput will sound similar, but note that the number of samples greatly decreased:

decoder = AudioDecoder(raw_audio_bytes, sample_rate=16_000)
samples = decoder.get_all_samples()

print(samples)
play_audio(samples)

AudioSamples:
  data (shape): torch.Size([2, 1559264])
  pts_seconds: 0.02505668934240363
  duration_seconds: 97.454
  sample_rate: 16000

Total running time of the script: (0 minutes 1.314 seconds)

Gallery generated by Sphinx-Gallery

Decoding audio streams with AudioDecoder#

Creating a decoder#

Decoding samples#

Specifying a range#

Custom sample rate#

Docs

Tutorials

Resources