Decoding audio streams with AudioDecoder

In this example, we’ll learn how to decode an audio file using the AudioDecoder class.

First, a bit of boilerplate: we’ll download an audio file from the web and define an audio-playing utility. You can skip this part and jump straight to Creating a decoder.

import requests
from IPython.display import Audio


def play_audio(samples):
    return Audio(samples.data, rate=samples.sample_rate)


# Audio source is CC0: https://opengameart.org/content/town-theme-rpg
# Attribution: cynicmusic.com pixelsphere.org
url = "https://opengameart.org/sites/default/files/TownTheme.mp3"
response = requests.get(url, headers={"User-Agent": ""})
if response.status_code != 200:
    raise RuntimeError(f"Failed to download audio. {response.status_code = }.")

raw_audio_bytes = response.content

Creating a decoder

We can now create a decoder from the raw (encoded) audio bytes. You can of course use a local audio file and pass the path as input. You can also decode audio streams from videos!

from torchcodec.decoders import AudioDecoder

decoder = AudioDecoder(raw_audio_bytes)

The audio has not yet been decoded, but we already have access to some metadata via the metadata attribute, which is an AudioStreamMetadata object.

print(decoder.metadata)
AudioStreamMetadata:
  duration_seconds_from_header: 97.48897959183674
  begin_stream_seconds_from_header: 0.02505668934240363
  bit_rate: 108039
  codec: mp3
  stream_index: 0
  duration_seconds: 97.48897959183674
  begin_stream_seconds: 0.02505668934240363
  sample_rate: 44100
  num_channels: 2
  sample_format: fltp
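
Because the metadata is available before any decoding happens, it can be used to estimate how large the decoded audio will be. The sketch below does this with plain arithmetic on the values printed above. It assumes decoded samples are stored as 32-bit floats (4 bytes each); the variable names are illustrative and not part of the torchcodec API. The header duration is only an estimate, so the number of samples actually decoded may differ slightly.

```python
# Values copied from the metadata printed above.
duration_seconds = 97.48897959183674
sample_rate = 44100
num_channels = 2

# Assumption: each decoded sample is a 32-bit float (4 bytes).
BYTES_PER_SAMPLE = 4

# Approximate number of samples per channel, according to the header.
expected_samples_per_channel = round(duration_seconds * sample_rate)

# Approximate size of the fully decoded stream in memory.
estimated_bytes = expected_samples_per_channel * num_channels * BYTES_PER_SAMPLE

print(f"{expected_samples_per_channel = }")
print(f"Estimated decoded size: {estimated_bytes / 2**20:.1f} MiB")
```

An estimate like this can help decide whether to decode the whole stream at once or only part of it.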

Decoding samples

To get decoded samples, we just need to call the get_all_samples() method, which returns an AudioSamples object:

samples = decoder.get_all_samples()

print(samples)
play_audio(samples)
AudioSamples:
  data (shape): torch.Size([2, 4297722])
  pts_seconds: 0.02505668934240363
  duration_seconds: 97.45401360544217
  sample_rate: 44100
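
The fields printed above are related to each other: the data has shape (num_channels, num_samples), and duration_seconds is the number of samples divided by the sample rate. The sketch below checks that relationship using the printed values, and shows one way to map a timestamp to a column index in the data. It assumes pts_seconds is the timestamp of the first decoded sample; seconds_to_sample_index is a hypothetical helper written for illustration, not a torchcodec function.

```python
# Values copied from the AudioSamples output printed above.
num_channels, num_samples = 2, 4297722
pts_seconds = 0.02505668934240363
sample_rate = 44100

# Duration is the number of samples per channel divided by the sample rate.
duration_seconds = num_samples / sample_rate
print(f"{duration_seconds = }")


def seconds_to_sample_index(t_seconds, pts_seconds, sample_rate):
    """Hypothetical helper: column index of the sample played at t_seconds.

    Assumes the first column of the data corresponds to pts_seconds.
    """
    return round((t_seconds - pts_seconds) * sample_rate)


# The first decoded sample sits at index 0.
first_index = seconds_to_sample_index(pts_seconds, pts_seconds, sample_rate)

# Index of the sample played 10 seconds into the stream.
index_at_10s = seconds_to_sample_index(10.0, pts_seconds, sample_rate)
print(f"{first_index = }, {index_at_10s = }")
```

With the decoded tensor at hand, an index like this could be used to slice along the last dimension, for example samples.data[:, index_at_10s:].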