# Introduction

This tutorial illustrates different ways to handle multi-channel audio, in particular when you have multiple source files or different audio layouts, and need to transform your inputs into audio streams suitable for streaming.

  • First, we go through the terminology and some simple use cases to introduce the main concepts and API resources involved in the audio part of an encoding configuration.

  • Then, we look at use cases that involve channel (re-)mixing from an input stream to an output stream.

  • Following that, we look at how to (re-)map multiple streams between inputs and outputs.

  • Finally, we put it all together and look at how to merge or blend multiple multi-channel streams into a single output stream.

# Terminology

But first, it helps to understand what we are talking about and agree on the vocabulary.

  • A **channel** is the actual audio signal, which is usually associated with a speaker in a multi-speaker setup.

  • Multiple channels are often grouped together into **streams**. For example, a 5.1 surround sound audio stream contains 6 channels. We will often use the term channel layout to refer to the number and order of channels in the stream. The term **track** is sometimes used as well: a track refers to the logical entity, while the stream refers to the track encoded with a specific codec.

  • A stream is contained in a **file**, and a file can contain multiple streams. A file can also contain video streams alongside audio streams.

For this tutorial, we will use illustrations such as the following to help depict the concepts:

This represents a single file, with 3 streams.

  • The first stream (stream 0) is a video stream.

  • The next stream (stream 1) is a stereo audio stream with 2 channels: left and right.

  • The last stream (stream 2) is a surround audio stream with 6 channels: front left, front right, center, low frequency effects (for a subwoofer), back left and back right.
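The vocabulary above can be mirrored in a small data model. This helps reason about stream positions later in the tutorial. It is a hypothetical sketch: `MediaFile`, `Stream` and the short channel names (`FL`, `FR`, `C`, `LFE`, `BL`, `BR`) are illustrative choices, not Bitmovin SDK types.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical, minimal model of the vocabulary above; not part of any Bitmovin SDK.

@dataclass
class Stream:
    kind: str                                          # "video" or "audio"
    channels: List[str] = field(default_factory=list)  # empty for video streams

@dataclass
class MediaFile:
    path: str
    streams: List[Stream]

# The example file: one video stream, one stereo stream, one 5.1 stream.
# FL/FR = front left/right, C = center, LFE = low frequency effects, BL/BR = back left/right.
example = MediaFile(
    path="example.mp4",  # made-up path
    streams=[
        Stream("video"),
        Stream("audio", ["FL", "FR"]),
        Stream("audio", ["FL", "FR", "C", "LFE", "BL", "BR"]),
    ],
)

for index, stream in enumerate(example.streams):
    layout = ", ".join(stream.channels) or "n/a"
    print(f"stream {index}: {stream.kind}, {len(stream.channels)} channels ({layout})")
```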

# Simple Stream Handling

## Use Case 1 - Implicit Handling

In the simplest case, there is no real audio manipulation taking place. You have an input with an audio stream in a particular channel layout, and you want essentially the same in your output, with the appropriate codec and bitrate. This use case is our baseline, which we need in order to introduce additional concepts for the more complex use cases that follow. Note that in most cases we do not currently support a passthrough option: an input audio stream still needs to be encoded so that it is properly aligned with the encoded video content, especially when creating segmented content.

Let’s take the following example, with a single stereo stream:

We will not go into all aspects of the encoding configuration (other tutorials on our website cover that), but we will look specifically at the audio part of the configuration. For that, you need a similar set of resources as for other streams:

  • An `IngestInputStream`, which defines where your source file is located on the `Input` storage

  • An audio `CodecConfiguration` for the codec of your choice, configured appropriately

  • An (output) `Stream` that defines how the configuration is applied to the input stream

  • One or more `Muxings` to define how the stream is containerised in a file and transferred to the `Output`

**Note:** For the purpose of this tutorial and in the example code files associated with it, we will always generate a single MP4 file as output, with all audio streams multiplexed with the video stream. In most situations that involve adaptive bitrate streaming with manifests such as HLS and DASH, each stream will need to be output in its own separate muxing.

As you can see from this snippet of code, there was no need to handle any aspect of the channel mapping between the input and output for this simple use case. `StreamSelectionMode.AUTO` (the default mode, so it does not need to be specified) tells the encoder to do its best to find an audio stream in the source that makes sense to use as the input stream.
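As a mental model of automatic selection, here is a deliberately simplified stand-in that picks the first audio stream in a file. It assumes "first audio stream wins" purely for illustration; the encoder's actual `AUTO` heuristic is not described in this tutorial and may differ. `select_audio_stream_auto` is a hypothetical name.

```python
from typing import List, Optional

def select_audio_stream_auto(stream_kinds: List[str]) -> Optional[int]:
    """Return the absolute index of the first audio stream, or None if there is none.

    Hypothetical stand-in for StreamSelectionMode.AUTO; the real encoder's
    heuristic may be more elaborate.
    """
    for index, kind in enumerate(stream_kinds):
        if kind == "audio":
            return index
    return None

print(select_audio_stream_auto(["video", "audio", "audio"]))  # 1
```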

**Note:** If you have previously configured encodings without an `IngestInputStream` and instead referred to the input file directly in the Stream's creation payload, read [this FAQ](🔗) to understand why we recommend that you switch to using an `IngestInputStream`.

A full code sample with our various SDKs is provided [in our example repository.](🔗)

## Use Case 2 - Distinct Input Files

A situation that occurs regularly is one in which each input stream comes in a distinct file. In particular, this is what you will have if you receive IMF packages from your content provider, since they store each track in a separate "essence" (read: file).

Bitmovin also allows you to work with streams that are in separate files, whether or not the files also contain a video stream.

This is handled in much the same way as in the previous example, but now you need multiple `IngestInputStreams`, `CodecConfigurations` and `Streams`. If we refactor the code to use functions that create each of the resources above, the code might look like the following:

A full code sample can be found [in our example repository.](🔗)
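The chain of resources for distinct input files can be modeled with one helper call per track. This is a hypothetical sketch. The dataclasses below only stand in for `IngestInputStream`, `CodecConfiguration` and the output `Stream`, and their field names, the `create_audio_stream` helper, the `aac` codec label and the file paths are illustrative assumptions. They do not mirror the real SDK signatures.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical, simplified stand-ins for the API resources; field names are
# illustrative only and do not mirror the Bitmovin SDK.

@dataclass
class IngestInputStream:
    input_path: str
    position: int = 0          # which stream to pick inside the file

@dataclass
class CodecConfiguration:
    codec: str
    bitrate: int               # bits per second

@dataclass
class OutputStream:
    source: IngestInputStream
    codec: CodecConfiguration

def create_audio_stream(input_path: str, bitrate: int) -> OutputStream:
    """Create the input stream, codec configuration and output stream for one file."""
    ingest = IngestInputStream(input_path=input_path)
    codec = CodecConfiguration(codec="aac", bitrate=bitrate)
    return OutputStream(source=ingest, codec=codec)

# Each audio track lives in its own file (for example, IMF essences); paths are made up.
audio_streams: List[OutputStream] = [
    create_audio_stream("essences/audio_en.wav", 128_000),
    create_audio_stream("essences/audio_de.wav", 128_000),
]

for stream in audio_streams:
    print(stream.source.input_path, stream.codec.codec, stream.codec.bitrate)
```

Wrapping the creation in a helper keeps the per-track configuration identical. Adding a language then comes down to one more call.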

# Channel Mixing

In this section, we look at use cases that require manipulation of the audio channels present in the input audio stream.

## Use Case 3 - Swapping Channels

As before, let's imagine that your input file has a stereo audio stream, but that somehow the left and right channels have been reversed. It's more of a theoretical use case (or you should have a serious word with your content provider), but it helps us illustrate the next concept in a simple way: Audio Mixing.

With Audio Mixing, you can go down to the level of the channel (instead of the stream itself), and manipulate the channel layout.

Just as before, you will need an `IngestInputStream` to point to the source file. But in addition, you now also need to involve an `AudioMixInputStream` to apply a transformation to that input stream, before generating your output `Stream`.

The transformation here is simple: take each channel (by position in the input stream) and re-map it to the opposite output channel.

Let’s highlight a couple of important points from that snippet of code:

  • Where we previously created the `Stream` from an `IngestInputStream`, we now create it from the `AudioMixInputStream`, since that represents the result of the transformation of the input stream (line 40).

  • The same audio `IngestInputStream` is used in the definition of both output channels in the configuration above (lines 17 and 23).

You can find a full code sample [in our example repository](🔗).

Another similar use case is if one of your logical source channels is hard “panned” to either the left or right channel of an input stereo pair, leaving the other channel empty. You can then simply select it and copy it to both output channels to center it.
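Both transformations in this use case can be modeled in plain Python: swapping the channels, and centering a hard-panned source. The sketch treats each channel as a list of samples and each output channel as a weighted sum of input channels, selected by position. It is a hypothetical model of what the mix expresses. `apply_mix` and the `Mix` type are illustrative names, not part of the Bitmovin SDK.

```python
from typing import Dict, List, Tuple

# Output channel index -> list of (input channel index, gain) pairs that feed it.
Mix = Dict[int, List[Tuple[int, float]]]

def apply_mix(inputs: List[List[float]], mix: Mix) -> List[List[float]]:
    """Build each output channel as the gain-weighted sum of its source channels."""
    n_samples = len(inputs[0])
    outputs = []
    for out_channel in sorted(mix):
        samples = [0.0] * n_samples
        for in_channel, gain in mix[out_channel]:
            for i, value in enumerate(inputs[in_channel]):
                samples[i] += gain * value
        outputs.append(samples)
    return outputs

left, right = [1.0, 2.0], [3.0, 4.0]

# Use case 3: swap left and right by position.
swap: Mix = {0: [(1, 1.0)], 1: [(0, 1.0)]}
print(apply_mix([left, right], swap))  # [[3.0, 4.0], [1.0, 2.0]]

# Hard-panned source: only the left channel carries signal, so copy it to both outputs.
panned = [[0.5, 0.25], [0.0, 0.0]]
center: Mix = {0: [(0, 1.0)], 1: [(0, 1.0)]}
print(apply_mix(panned, center))       # [[0.5, 0.25], [0.5, 0.25]]
```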

## Use Case 4 - Downmixing 5.1 to 2.0

Let’s look now at a more complex use case, which involves multiple source channels being combined into the same output channels.

In this example, we have a source file with a 5.1 surround audio stream, which we want to convert into a stereo-only stream in the output. We have decided that the output channels should be mixing the corresponding front channel, the center channel at lower volume, and the corresponding back channel at even lower volume.

**Warning:** Note that we are not claiming that this is the right or only way of downmixing 5.1 to 2.0. You will need to discuss the correct mechanism with your content provider. The example above is only for the purpose of illustrating the advanced concepts of audio mixing.

Enter the concept of **gain**. This allows us to define at what level (volume) one of the source channels should be combined with others.

Code-wise, this looks fairly similar to the previous use case, but (naturally) with more configuration. Let’s look at just the left channel’s mixing definition:

Notice one more difference from the previous use case. Instead of defining source channels by their position in the stream, we choose them by **channel type**. If your input file has been correctly tagged, this simplifies the code and also ensures that you can handle input files with slightly different channel layouts (the industry uses several ways of laying out 5.1 channels in a stream).

You can find a full code sample [in our example repository](🔗), which streamlines the example above by using functions and helper classes to improve readability.
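The downmix logic, with gains and selection by channel type, can be sketched as follows. Two assumptions apply. First, `gain` is treated as a linear multiplier. Second, the gains (1.0, 0.5 and 0.25) are placeholders chosen for exact arithmetic; as the warning above says, they are not a recommended downmix. `DOWNMIX` and `downmix` are hypothetical names. The sketch also shows why selecting by type helps: two different 5.1 orderings carrying the same per-type signal produce the same stereo output.

```python
from typing import Dict, List, Tuple

# Output channel type -> (source channel type, linear gain) pairs.
# Illustrative gains only; LFE is deliberately left out of the stereo mix.
DOWNMIX: Dict[str, List[Tuple[str, float]]] = {
    "FL": [("FL", 1.0), ("C", 0.5), ("BL", 0.25)],
    "FR": [("FR", 1.0), ("C", 0.5), ("BR", 0.25)],
}

def downmix(layout: List[str], channels: List[List[float]]) -> Dict[str, List[float]]:
    """Mix source channels, looked up by type rather than position, into stereo."""
    by_type = dict(zip(layout, channels))  # resolve channel types to their samples
    n_samples = len(channels[0])
    return {
        out_type: [sum(gain * by_type[src][i] for src, gain in sources) for i in range(n_samples)]
        for out_type, sources in DOWNMIX.items()
    }

# Two different 5.1 orderings carrying the same signal per channel type.
layout_a = ["FL", "FR", "C", "LFE", "BL", "BR"]
layout_b = ["FL", "C", "FR", "BL", "BR", "LFE"]
signal = {"FL": [1.0], "FR": [2.0], "C": [4.0], "LFE": [8.0], "BL": [4.0], "BR": [8.0]}

a = downmix(layout_a, [signal[t] for t in layout_a])
b = downmix(layout_b, [signal[t] for t in layout_b])
print(a)       # {'FL': [4.0], 'FR': [6.0]}
print(a == b)  # True
```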

## What about the AudioMix Filter?

You may have seen in our API documentation that we also support a [filter for audio mixing](🔗). It was our initial approach to providing this feature, and it is functionally equivalent to the `AudioMixInputStream` when it comes to audio mixing configurations and use cases. `AudioMixInputStreams` are its successor, so we highly recommend using them instead of the `AudioMixFilter` going forward, as the filter will be deprecated. In particular, if you also intend to use other functionality enabled through `InputStreams` resources, such as [trimming and concatenation](🔗), you can only define the audio manipulation through an `AudioMixInputStream` in the chain of `InputStreams`.

# Stream Mapping

In the previous section, we looked at how to (re-)mix audio channels in the audio input stream. In this third part, we now look at use cases that have a different number of audio streams between the input and output, and where multiple audio streams need to be combined with each other to generate an output stream.

## Use Case 5 - Mono Input Tracks

This is a very frequent use case, particularly in broadcast workflows. The source file has multiple (often PCM) streams/tracks, each with a single mono channel that represents one of the output channels. Let's take a middle-of-the-road example here, with 8 mono streams that contain the channels for a stereo pair and a 5.1 surround layout.

We have so far met all the resources and concepts necessary to configure an encoding for this use case, but this example allows us to revisit a couple of points highlighted earlier in the tutorial.

In use cases 3 and 4, we had a single `IngestInputStream` to select the single audio stream from the input file, which could then be used in all aspects of the configuration. As shown in use case 1, we could also use the automatic `StreamSelectionMode` to let the encoder implicitly select that audio input stream in the source file.

We now have to be more explicit: we must select exactly the right audio stream as input stream for our configuration, and map it in the `AudioMixInputStream` configuration to the relevant output channel.

Let's look at the code for mapping the first two channels into the stereo pair:

In this code sample, notice how each `AudioMixInputStreamChannel` uses a different `IngestInputStream` to grab a specific audio stream in the input file. Each `IngestInputStream` selects its source stream by position, counted relative to the other audio streams in the input file. You could also use `StreamSelectionMode.POSITION_ABSOLUTE` if you prefer to count all streams, including video.
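The difference between a position counted among audio streams and an absolute stream position can be sketched for this 8-mono-stream example. Two assumptions are made for illustration: the file has a video stream first, and the mono tracks are ordered L, R, then FL, FR, C, LFE, BL, BR. Real files vary, so check yours as described in the tip below. `absolute_index` is a hypothetical helper.

```python
from typing import Dict, List

# Assumed input file: stream 0 is video, followed by 8 mono audio streams.
stream_kinds = ["video"] + ["audio"] * 8

def absolute_index(kinds: List[str], audio_position: int) -> int:
    """Translate a position counted among audio streams only into an absolute index."""
    audio_indices = [i for i, kind in enumerate(kinds) if kind == "audio"]
    return audio_indices[audio_position]

# Output channel -> audio-relative position of the mono stream that feeds it.
# In the API, each of these would be a separate IngestInputStream.
stereo: Dict[str, int] = {"L": 0, "R": 1}
surround: Dict[str, int] = {"FL": 2, "FR": 3, "C": 4, "LFE": 5, "BL": 6, "BR": 7}

for name, mapping in (("stereo", stereo), ("5.1", surround)):
    for channel, rel in mapping.items():
        print(f"{name} {channel}: audio stream {rel} (absolute stream {absolute_index(stream_kinds, rel)})")
```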

**Tip:** To determine how many streams your input file has, in what order, and with what audio layout, use `mediainfo` or `ffprobe`.
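With `ffprobe`, you can request only the stream fields that matter here and read them as JSON. The sketch below assumes `ffprobe` is installed and on the `PATH`. `audio_summary` is a hypothetical helper that lists each audio stream with both its audio-relative position and its absolute index, so you can choose values for either selection mode.

```python
import json
import subprocess
from typing import List, Tuple

def probe_streams(path: str) -> dict:
    """Run ffprobe and return its JSON description of the file's streams."""
    cmd = [
        "ffprobe", "-v", "error",
        "-show_entries", "stream=index,codec_type,channels,channel_layout",
        "-of", "json", path,
    ]
    result = subprocess.run(cmd, capture_output=True, check=True, text=True)
    return json.loads(result.stdout)

def audio_summary(probe: dict) -> List[Tuple[int, int, int, str]]:
    """Return (audio-relative position, absolute index, channel count, layout) per audio stream."""
    audio = [s for s in probe.get("streams", []) if s.get("codec_type") == "audio"]
    return [
        (rel, s["index"], s.get("channels", 0), s.get("channel_layout", "unknown"))
        for rel, s in enumerate(audio)
    ]
```

For example, `for row in audio_summary(probe_streams("input.mxf")): print(row)`. The file name is only a placeholder.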

You can find a full code sample [in our example repository](🔗), which streamlines the example above by using functions and helper classes to improve code readability.

## Use Case 6 - Separate Input Files for different Channels

Although this use case is not frequent, it is just a special case of the previous example, and can be handled in the exact same way. The only difference is that the input file path may now be different for different `IngestInputStreams`, instead of (or in addition to) the source channel number.

With the same mechanism, you can also now hopefully see how you could replace a single channel in a multi-channel stream with one from a different file. We will not even ask why you would want to do such a thing…
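Replacing a single channel comes down to pointing one output channel at a different file. The sketch describes every output channel by a hypothetical `SourceRef` (file, audio stream position, channel within that stream). It assumes an FL, FR, C, LFE, BL, BR order and made-up file names, then overrides the center channel with a mono stream from another file.

```python
from typing import List, NamedTuple

class SourceRef(NamedTuple):
    """Hypothetical reference to one source channel."""
    input_path: str
    audio_position: int  # which audio stream in the file
    channel: int         # which channel within that stream

# A 5.1 output where every channel comes from audio stream 0 of the main file...
mix: List[SourceRef] = [SourceRef("main.mxf", 0, ch) for ch in range(6)]

# ...then replace the center channel (index 2 in FL, FR, C, LFE, BL, BR order)
# with a mono stream from a separate file.
mix[2] = SourceRef("dialogue_fix.wav", 0, 0)

for out_channel, ref in enumerate(mix):
    print(out_channel, ref)
```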

# Stream Merging (or Blending)

In the previous section, we saw how we can map channels from distinct input streams onto output channels. In this fourth and final part, we will see how we can go one step further and use all the concepts seen so far together, and merge multiple input streams into output streams.

## Use Case 7 - Voice-Over or Background Music

In this use case, let’s imagine that we have a source file containing a stereo audio track. In a separate file, we have another stereo track such as a voice-over commentary or background music. To combine those, we need one final tool at our disposal: the ability to blend streams together:

This can be achieved very simply by providing a list of `InputStreams` when creating a `Stream`. Blending is done channel by channel, so all input streams need to have identical channel layouts.

**Note:** The behavior is different if all input streams are mono: in this case, they are not blended but combined into a multi-channel stream (e.g. stereo for 2 input streams, 5.1 for 6 input streams).
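The two behaviors described above can be modeled side by side: blending streams with identical layouts, and combining all-mono inputs. The sketch makes two assumptions. First, blending sums samples channel by channel; any normalization the encoder may apply is not described here. Second, "identical layout" is approximated by an equal channel count. `merge_streams` is a hypothetical name.

```python
from typing import List

Channel = List[float]
AudioStream = List[Channel]  # a stream is a list of channels

def merge_streams(streams: List[AudioStream]) -> AudioStream:
    """Model passing several input streams to one output stream."""
    if all(len(s) == 1 for s in streams):
        # All mono: combine into one multi-channel stream instead of blending.
        return [s[0] for s in streams]
    if len({len(s) for s in streams}) != 1:
        raise ValueError("blending requires identical channel layouts")
    # Blend channel by channel: sum the samples of channel N across all inputs.
    return [
        [sum(samples) for samples in zip(*(s[ch] for s in streams))]
        for ch in range(len(streams[0]))
    ]

program = [[1.0, 1.0], [2.0, 2.0]]     # stereo
voice_over = [[0.5, 0.0], [0.5, 0.0]]  # stereo
print(merge_streams([program, voice_over]))  # [[1.5, 1.0], [2.5, 2.0]]
print(merge_streams([[[1.0]], [[2.0]]]))     # [[1.0], [2.0]]
```

The second call shows two mono inputs becoming one stereo stream rather than a blended mono stream.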

### With volume adjustment

To make this example a little more advanced, and to show how all the features can be combined, let's also reduce the volume of the original track so that the voice-over can be heard more clearly. To apply a volume adjustment, we need to pass the original track through an `AudioMixInputStream` and define the `gain` value for each channel, as demonstrated earlier.
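The ordering matters here: the gain is applied to the original track first, and the result is blended with the voice-over. The sketch below models that sequence. It assumes `gain` is a linear multiplier and that blending is a channel-by-channel sum. `apply_gain` and `blend` are hypothetical helpers, and the sample values are arbitrary but exactly representable.

```python
from typing import List

Channel = List[float]

def apply_gain(stream: List[Channel], gain: float) -> List[Channel]:
    """Scale every channel by a linear gain (assumed semantics of `gain`)."""
    return [[gain * x for x in channel] for channel in stream]

def blend(a: List[Channel], b: List[Channel]) -> List[Channel]:
    """Channel-by-channel sum of two streams with the same layout (assumed blend rule)."""
    if len(a) != len(b):
        raise ValueError("streams must have the same channel layout")
    return [[x + y for x, y in zip(ca, cb)] for ca, cb in zip(a, b)]

original = [[1.0, 0.5], [1.0, 0.5]]        # original stereo track
voice_over = [[0.25, 0.125], [0.25, 0.125]]  # stereo commentary

ducked = apply_gain(original, 0.5)  # halve the original track's level
print(blend(ducked, voice_over))    # [[0.75, 0.375], [0.75, 0.375]]
```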

You can find a full code sample [in our example repository](🔗), which shows a slightly different use case, in which the same source file contains 2 stereo streams, instead of them coming from different files.

# Summary

To conclude, let’s summarise the concepts involved and how they translate into the Bitmovin API:

  • Use an `IngestInputStream` ([endpoint](🔗)) to define where your input file is

    • and set an explicit `StreamSelectionMode` if you need to select a specific stream from that file

    • use multiple `IngestInputStreams` if you have multiple streams that contain your source channels

  • Add an `AudioMixInputStream` ([endpoint](🔗)) if you need to mix and map different input channels into output channels

    • and set a `gain` on individual source channels if you want to alter their volume

  • Merge multiple input streams with the same audio layout by adding them to the output `Stream` ([endpoint](🔗))

**Tip:** Remember also that if you want your encoding workflow to be able to cater for inputs with different audio layouts (for example if you receive some files with 1 stream with 6 channels, and some with 6 mono streams), you could use [Stream Conditions](🔗) to apply the correct logic.
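The decision that such conditions would encode can be illustrated client-side. This hypothetical function maps a probed audio layout, given as the channel count of each audio stream, to the configuration strategy described earlier in this tutorial. It only illustrates the branching logic. It does not represent the Stream Conditions API, which is where the tip suggests expressing this logic.

```python
from typing import List

def choose_audio_mapping(audio_channel_counts: List[int]) -> str:
    """Pick a mapping strategy from the probed audio layout (hypothetical helper).

    audio_channel_counts lists the channel count of each audio stream, in order.
    """
    if audio_channel_counts == [6]:
        return "single 5.1 stream: one IngestInputStream, no channel mapping needed"
    if audio_channel_counts == [1] * 6:
        return "six mono streams: six IngestInputStreams mapped by an AudioMixInputStream"
    raise ValueError(f"unsupported audio layout: {audio_channel_counts}")

print(choose_audio_mapping([6]))
print(choose_audio_mapping([1, 1, 1, 1, 1, 1]))
```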