This article describes the changes to fMP4 outputs starting from version 2.153.0 of the Bitmovin Encoder. Starting with this version, fMP4 outputs with codecs H.264, AAC, HE-AAC and HE-AACv2 will use an overhauled implementation that aims to improve stability and correctness of ISO-BMFF files. AV1 has already been using the overhauled fMP4 implementation.
This article shows the differences in MP4 outputs explicitly by comparing excerpts from mp4dump output for fMP4 encodings up to version 2.152.0 with encodings from version 2.153.0 onward, using the same configuration.
Finally, we'll list the devices/platforms used for testing playback.
# Changes to ISO-BMFF boxes
## ftyp boxes
Up to version 2.152.0, the ftyp box for an H.264 initialization segment looks like
With version 2.153.0, the ftyp box of an H.264 initialization segment looks like
Audio initialization segments should look the same, except for the `avc1` compatible brand, which is not present.
This should not have any practical side effect for demuxers, so we don't expect and have not found any issues with the change.
## Timescale in mvhd and mdhd boxes
Up to version 2.152.0, the timescale in the Movie Header (mvhd) box had a value of 1000, and the per-track timescale, which ISO-BMFF stores in the Media Header (mdhd) box rather than in the Track Header (tkhd) box, depended on:
- the frame rate for video
- the sampling rate for audio

From version 2.153.0, the mvhd timescale is the same as the track timescale.
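
To check the two timescales on your own initialization segments, a minimal box walker is enough. The sketch below is an illustration only and not Bitmovin tooling: the names `iter_boxes` and `read_timescales` are hypothetical, and it relies on the ISO-BMFF layout in which the movie timescale sits in mvhd and the track-level timescale field sits in the Media Header (mdhd) box of each track.

```python
import struct

# Container boxes that must be descended into to reach mvhd and mdhd.
CONTAINERS = {b"moov", b"trak", b"mdia"}

def iter_boxes(data, start=0, end=None):
    """Yield (box_type, payload_start, payload_end) for boxes in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:    # a 64-bit largesize follows the box type
            size = struct.unpack_from(">Q", data, pos + 8)[0]
            header = 16
        elif size == 0:  # box extends to the end of the enclosing container
            size = end - pos
        if size < header or pos + size > end:
            break        # malformed or truncated box: stop rather than guess
        yield box_type, pos + header, pos + size
        pos += size

def read_timescales(data, start=0, end=None, found=None):
    """Return [(box_name, timescale), ...] for every mvhd and mdhd box found."""
    found = [] if found is None else found
    for box_type, p_start, p_end in iter_boxes(data, start, end):
        if box_type in CONTAINERS:
            read_timescales(data, p_start, p_end, found)
        elif box_type in (b"mvhd", b"mdhd"):
            version = data[p_start]
            # Skip version/flags (4 bytes) plus creation and modification
            # times, which are 8 bytes each in version 1 and 4 bytes in version 0.
            offset = p_start + 4 + (16 if version == 1 else 8)
            timescale = struct.unpack_from(">I", data, offset)[0]
            found.append((box_type.decode("ascii"), timescale))
    return found
```

Calling `read_timescales(open("init.mp4", "rb").read())` on an initialization segment (the file name is a placeholder) should, for outputs from version 2.153.0 onward, return the same value for `mvhd` and `mdhd`.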
### Video timescale
The video timescale is the video frame rate rounded to the nearest integer and multiplied by 1000. Some examples:
- 24 FPS: Timescale 24000
- 23.976 FPS: Timescale 24000
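
The rule can be written down in a few lines of Python. This is a sketch, not the encoder's code: the function name `video_timescale` is hypothetical, and rounding halves up is an assumption, since the article does not say how ties are broken.

```python
import math

def video_timescale(frame_rate: float) -> int:
    """Frame rate rounded to the nearest integer, multiplied by 1000."""
    # floor(x + 0.5) rounds halves up; the tie-breaking rule is an assumption.
    return math.floor(frame_rate + 0.5) * 1000

print(video_timescale(24))      # 24000
print(video_timescale(23.976))  # 24000
```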
### Audio timescale
The audio timescale equals the sampling rate. This applies to all audio codecs, including HE-AAC and HE-AACv2 codecs, which up to version 2.152.0 had a timescale of half the sampling rate.
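
As a small illustration of the change, the helper below returns the audio timescale under both behaviours. The function name, the `legacy` switch and the codec labels are hypothetical and only mirror the description above.

```python
def audio_timescale(sampling_rate_hz: int, codec: str, legacy: bool = False) -> int:
    """Return the audio track timescale described in this article.

    legacy=True models outputs up to version 2.152.0, where HE-AAC and
    HE-AACv2 used half the sampling rate; legacy=False models 2.153.0 onward.
    """
    # Codec labels are illustrative strings, not Bitmovin API identifiers.
    is_he_aac = codec in ("HE-AAC", "HE-AACv2")
    if legacy and is_he_aac:
        return sampling_rate_hz // 2
    return sampling_rate_hz

print(audio_timescale(48000, "HE-AAC", legacy=True))  # 24000
print(audio_timescale(48000, "HE-AAC"))               # 48000
```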
## max\_bitrate and avg\_bitrate in DecoderConfig for audio
We noticed that our audio initialization segments, up to version 2.152.0, always reported 96 kbps as max\_bitrate and avg\_bitrate under the DecoderConfig box for audio, regardless of the configured bitrate. This was a problem only in muxing. The audio was encoded at the correct bitrate.
Starting from version 2.153.0, this issue is fixed and the values are correctly signaled depending on the specified bitrate from the codec configuration.
## Sample flags in trun entries
In video segments up to version 2.152.0, the sample flags were always optimized using `default sample flags` from the tfhd box and `first sample flags` from the trun box, as shown below:
While this optimization is good for reducing muxing overhead, our muxer applied it even in situations where it shouldn't, e.g. when a segment contains key frames other than the first frame. This resulted in warnings on the Media inspector tab in Chrome:
Starting from version 2.153.0, sample flags in the trun boxes of a given segment are only optimized when that is possible. Otherwise, the flags are written per sample, like below:
While this may increase the muxing overhead by 4 extra bytes per sample when the optimization cannot be applied, it provides the most correct outputs, which may also fix playback in older players. Our muxer will still optimize the flags when it is possible.
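
The condition for applying the optimization can be sketched as follows. This is an assumption about the general idea, not the muxer's actual logic: the compact form works only when every sample after the first shares one flags value, and the function name `plan_trun_flags` and its result keys are hypothetical.

```python
def plan_trun_flags(sample_flags):
    """Decide how the flags of one segment could be signaled.

    Compact form: one default value (tfhd) plus first_sample_flags (trun).
    Per-sample form: one 32-bit flags field per trun entry.
    """
    if not sample_flags:
        raise ValueError("a segment needs at least one sample")
    first, rest = sample_flags[0], sample_flags[1:]
    # The compact form can describe at most two distinct values: the first
    # sample's flags and a single default shared by all remaining samples.
    if all(flags == rest[0] for flags in rest):
        return {
            "mode": "compact",
            "first_sample_flags": first,
            "default_sample_flags": rest[0] if rest else first,
        }
    return {
        "mode": "per_sample",
        "sample_flags": list(sample_flags),
        "per_sample_bytes": 4 * len(sample_flags),  # 4 bytes per trun entry
    }

# One key frame at the start: the compact form is sufficient.
print(plan_trun_flags([0x0, 0x10000, 0x10000])["mode"])       # compact
# A second key frame mid-segment: flags must be written per sample.
print(plan_trun_flags([0x0, 0x10000, 0x0, 0x10000])["mode"])  # per_sample
```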
Furthermore, the values set in sample\_flags in version 2.153.0 (0 and 0x10000) differ slightly from the ones set in 2.152.0 (0x2000000 and 0x1010000) because we no longer use the `sample_depends_on` bits (ISO/IEC 14496-12:2012).
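
To see what the four values mean, the 32-bit field can be split into its components. The sketch below follows the sample flags bit layout defined in ISO/IEC 14496-12; the function name `decode_sample_flags` is hypothetical.

```python
def decode_sample_flags(flags: int) -> dict:
    """Split a 32-bit ISO-BMFF sample_flags value into its fields."""
    return {
        "is_leading": (flags >> 26) & 0x3,
        "sample_depends_on": (flags >> 24) & 0x3,
        "sample_is_depended_on": (flags >> 22) & 0x3,
        "sample_has_redundancy": (flags >> 20) & 0x3,
        "sample_padding_value": (flags >> 17) & 0x7,
        "sample_is_non_sync_sample": (flags >> 16) & 0x1,
        "sample_degradation_priority": flags & 0xFFFF,
    }

for value in (0x2000000, 0x1010000, 0x0, 0x10000):
    fields = decode_sample_flags(value)
    print(hex(value), fields["sample_depends_on"], fields["sample_is_non_sync_sample"])
```

In the 2.152.0 values, `sample_depends_on` is 2 for sync samples and 1 for non-sync samples. In the 2.153.0 values it is always 0, and only `sample_is_non_sync_sample` distinguishes the two cases.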
## Edit Lists with the edts box
We noticed that up to version 2.152.0, an edit list could be missing for H.264 streams that use B-frames. This edit list is required because of the delay between decoding and presenting frames (signaled in trun entries via `sample_composition_time_offset`). Without it, the first segment of some streams could report a non-zero start time, for example when checked with ffprobe:
From version 2.153.0, the edit list is placed in the initialization segment when needed:
With this, the first segment correctly starts at 0 now:
To avoid edit lists altogether, it is also possible to set ALIGN\_ZERO\_NEGATIVE\_CTO in the "ptsAlignMode" configuration of fMP4 muxings, which makes use of trun v1 boxes that allow negative composition time offsets.
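
The arithmetic behind both approaches can be illustrated with a toy example. It is a simplified sketch with made-up numbers: four frames at a timescale of 24000, decoded in the order I, P, B, B, and a single edit whose `media_time` equals the smallest presentation timestamp. The function name `presentation_times` is hypothetical.

```python
from itertools import accumulate

def presentation_times(durations, ctos, media_time=0):
    """Return presentation timestamps (in timescale ticks) in decode order."""
    # Decode timestamps are the running sum of the preceding sample durations.
    dts = [0] + list(accumulate(durations))[:-1]
    # An edit list entry with a given media_time shifts presentation by that amount.
    return [d + cto - media_time for d, cto in zip(dts, ctos)]

durations = [1000, 1000, 1000, 1000]  # 24 FPS at timescale 24000
ctos = [1000, 3000, 0, 0]             # non-negative offsets (trun v0 style)

no_edit_list = presentation_times(durations, ctos)
print(no_edit_list)  # [1000, 4000, 2000, 3000] -> stream starts at 1000 ticks

with_edit_list = presentation_times(durations, ctos, media_time=min(no_edit_list))
print(with_edit_list)  # [0, 3000, 1000, 2000] -> stream starts at 0

# Alternative without an edit list: shift the offsets themselves, which
# requires signed (possibly negative) composition offsets.
negative_ctos = [cto - min(no_edit_list) for cto in ctos]
print(presentation_times(durations, negative_ctos))  # [0, 3000, 1000, 2000]
```

Both variants yield the same presentation timeline. They differ only in where the shift is signaled: once in the initialization segment, or in every trun entry.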
# Playback testing
Before releasing this change, Bitmovin has conducted extensive device testing to make sure that the new outputs won't have playback regressions. The following devices/platforms were tested with non-DRM and DRM outputs, using the Bitmovin player:
- Chrome and Edge (stable, beta and dev) on macOS, Linux and Windows
- Firefox on macOS, Linux and Windows
- Safari on iPad Air 2 (iOS 13), iPad Mini 6 (iOS 15), iPhone 11 (iOS 14), iPhone 8+ (iOS 12)
- Samsung Tizen TVs, from 2016 to 2022 models
- LG webOS TVs, from 2016 to 2022 models
- Panasonic TV 2018
- Xbox One and Xbox Series S
- Chromecast and Chromecast Ultra
- Android Pixel 2 with browsers Chrome, Firefox and Samsung Internet
- Fire TV Stick 4K and Fire TV Stick 4K Max
- Roku Streaming Stick, Roku Streaming Stick 4K