Understanding Differences Between Exported Data and the Observability Dashboard
Overview
When comparing numbers from the Observability Dashboard with data retrieved through Exporting your Data or the API Explorer, you may notice small differences in values. This is expected behavior, not a sign of data loss or a reporting error.
This page explains the two data pipelines behind these tools and why they produce slightly different numbers.
How the Observability Dashboard Processes Data
The Observability Dashboard uses two different data types depending on how recent the data is, and merges them into a single view.
Recent data (last 2 days): Uncompressed, high-granularity data collected at the finest available resolution.
Historical data (older than 2 days): Compressed data. After 2 days, you can be certain that compression for a given time window is complete and the data is final.
Within the last 2 days, compression may not have completed yet for all data points. The exact cutover point is not visible in the dashboard.
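Because the cutover is not shown in the dashboard, a simple rule when comparing numbers is to treat a time window as final only once its end is at least 2 days in the past. The sketch below encodes that rule. The 2-day threshold comes from this page. The function name and the choice of the window's end time as the boundary are assumptions for illustration, not a Bitmovin API.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Age after which, per this page, compression for a time window is complete.
FINALIZATION_DELAY = timedelta(days=2)


def is_finalized(window_end: datetime, now: Optional[datetime] = None) -> bool:
    """Hypothetical helper: True if the whole window ended at least 2 days ago.

    Newer windows may still be served (partly) from uncompressed data.
    """
    now = now or datetime.now(timezone.utc)
    return now - window_end >= FINALIZATION_DELAY


now = datetime(2024, 6, 10, 12, 0, tzinfo=timezone.utc)
print(is_finalized(datetime(2024, 6, 7, 0, 0, tzinfo=timezone.utc), now))  # True
print(is_finalized(datetime(2024, 6, 9, 0, 0, tzinfo=timezone.utc), now))  # False
```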
Why the Dashboard Uses Compressed Data for Historical Metrics
Computing exact distinct counts (such as unique viewers) directly on large raw datasets requires scanning massive volumes of data, which would make historical queries prohibitively slow at scale. Queries against uncompressed data therefore use HyperLogLog++, an industry-standard cardinality estimation algorithm that returns near-accurate distinct counts with an error margin of up to 4%. For older data, the dashboard instead reads the compressed, finalized results.
Once compression completes for a given time window, all metrics become exact and the HyperLogLog++ error margin no longer applies. The data is also deduplicated at this stage, removing any duplicate samples present in the raw data. Bitmovin continuously monitors compressed data to ensure it stays within 2% of the raw source.
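To illustrate why an estimated distinct count can differ slightly from an exact one, the sketch below compares an exact count (which must keep every distinct value in memory) with a basic HyperLogLog estimate (which keeps only a fixed array of small registers). This is the original HyperLogLog, not HyperLogLog++: the "++" variant adds a sparse representation and empirical bias correction that are omitted here. The precision `p=14`, the SHA-1 based hash, and the viewer IDs are illustrative assumptions, not Bitmovin's configuration.

```python
import hashlib
import math


def hash64(value: str) -> int:
    """Deterministic 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha1(value.encode("utf-8")).digest()[:8], "big")


def exact_distinct(values) -> int:
    """Exact count: has to keep every distinct value in memory."""
    return len(set(values))


def hll_estimate(values, p: int = 14) -> float:
    """Basic HyperLogLog estimate using 2**p small registers (not the full HLL++)."""
    m = 1 << p
    registers = [0] * m
    for v in values:
        h = hash64(v)
        idx = h >> (64 - p)                       # first p bits select a register
        rest = h & ((1 << (64 - p)) - 1)          # remaining 64 - p bits
        rank = (64 - p) - rest.bit_length() + 1   # position of the leftmost 1-bit
        registers[idx] = max(registers[idx], rank)
    alpha = 0.7213 / (1 + 1.079 / m)
    estimate = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if estimate <= 2.5 * m and zeros:
        estimate = m * math.log(m / zeros)        # small-range correction (linear counting)
    return estimate


# 400,000 samples from 200,000 distinct (hypothetical) viewer IDs.
viewer_ids = [f"viewer-{i % 200_000}" for i in range(400_000)]
exact = exact_distinct(viewer_ids)
approx = hll_estimate(viewer_ids)
print(exact, round(approx), f"{abs(approx - exact) / exact:.2%}")
```

Repeated samples of the same viewer hash to the same register value, so duplicates do not inflate the estimate; the remaining gap between `exact` and `approx` is the estimation error that compression removes for finalized data.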
The 200-Category Limit
When breaking down a metric by a high-cardinality dimension (for example, Video Title or User ID), the Dashboard returns up to approximately 200 distinct categorical values per query. Categories beyond that threshold are excluded, which causes the sum of individual category values to be lower than the reported Total.
This limit applies to both the Dashboard breakdown view and the CSV exported from the API Explorer. See API Explorer: Design Limitations for details.
To work around this limit:
- Shorten the time range to reduce the number of active categories below 200.
- Apply additional filters (by platform, country, or other dimensions) to narrow the result set.
- Use Exporting your Data to export the full raw dataset and perform the breakdown independently.
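The effect of the category cap, and the independent breakdown suggested above, can be sketched as follows. The column names `video_title` and `impression_id` and the generated rows are hypothetical; check the schema of your own export. `top_categories` only mimics a capped breakdown view and is not how the Dashboard is implemented.

```python
from collections import defaultdict


def impressions_by(rows, dimension, id_field="impression_id"):
    """Exact distinct impression count per category, with no category cap."""
    ids_per_category = defaultdict(set)
    for row in rows:
        ids_per_category[row[dimension]].add(row[id_field])
    return {category: len(ids) for category, ids in ids_per_category.items()}


def top_categories(counts, limit=200):
    """Keep only the `limit` largest categories, mimicking a capped breakdown."""
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:limit])


# Hypothetical export: 500 titles, 10 impressions each, 3 samples per impression.
rows = [
    {"video_title": f"title-{t}", "impression_id": f"imp-{t}-{i}"}
    for t in range(500)
    for i in range(10)
    for _ in range(3)
]
total = len({row["impression_id"] for row in rows})
full = impressions_by(rows, "video_title")
capped = top_categories(full)
print(total, sum(full.values()), sum(capped.values()))  # 5000 5000 2000
```

The uncapped breakdown sums to the total, while the 200-category version accounts for only part of it, which is the gap described above.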
How Exported Data and the API Explorer Work
Both the Export Data feature and the API Explorer always return raw, uncompressed (high-granularity) data. The compression pipeline used by the dashboard is not applied.
This means:
- Distinct count metrics use the HyperLogLog++ approximation with a small error margin of up to 4%.
- The data may include duplicate samples that compression would otherwise remove.
- No post-processing or deduplication is applied.
Exports are well suited for session-level analysis, custom data pipelines, and long-term storage in your own data warehouse. Bitmovin continuously monitors its systems to ensure compressed data stays within 2% of the raw data, so this slight variance will not significantly affect your analysis.
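If your own pipeline needs to drop the duplicate samples that raw exports may contain, a minimal sketch is to keep the first row for each unique key. The key fields `impression_id` and `sequence_number` and the `played` column are assumptions for illustration; Bitmovin's internal deduplication rules are not described on this page, so this will not necessarily match the dashboard.

```python
def dedupe(rows, key_fields=("impression_id", "sequence_number")):
    """Drop rows whose (assumed) key fields repeat an earlier row; keep the first."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


rows = [
    {"impression_id": "a", "sequence_number": 0, "played": 4000},
    {"impression_id": "a", "sequence_number": 1, "played": 6000},
    {"impression_id": "a", "sequence_number": 1, "played": 6000},  # duplicate sample
    {"impression_id": "b", "sequence_number": 0, "played": 3000},
]
unique = dedupe(rows)
print(len(rows), len(unique))  # 4 3
print(sum(r["played"] for r in rows), sum(r["played"] for r in unique))  # 19000 13000
```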
Why the Numbers Differ
| Data Source | Data Type | Distinct Count Method | Deduplication Applied |
|---|---|---|---|
| Dashboard (last ~2 days); Export Data / API Explorer (all time ranges) | Uncompressed | HyperLogLog++ (error margin up to 4%) | No |
| Dashboard (older than ~2 days) | Compressed (after finalization) | Exact | Yes |
For recent data, the dashboard and exports query the same underlying data type. Small differences can still appear because of timing: compression and deduplication may already have completed for part of a recent window. For historical data, the dashboard shows finalized, deduplicated, exact values, while exports return the original raw data with approximate distinct counts. This is the most common source of visible discrepancy.
Replicating Dashboard Numbers from Exports
Exactly reproducing Observability Dashboard numbers from raw exports is not recommended. The dashboard relies on a multi-stage pipeline (compression, deduplication, and per-metric optimizations) that is not exposed externally.
Raw exports are accurate and reliable for independent analysis. The differences are small and expected: a margin of up to 4% for uncompressed data and within 2% for compressed data compared to the raw source.
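Rather than reproducing dashboard numbers exactly, you can check whether an export-derived value falls within the expected margin. The sketch below uses the 4% upper bound quoted on this page as its tolerance; the helper names are hypothetical, and a gap larger than this is worth investigating (for example, whether a breakdown hit the 200-category limit).

```python
def relative_difference(dashboard_value: float, export_value: float) -> float:
    """Relative difference of an export-derived value against the dashboard value."""
    if dashboard_value == 0:
        return 0.0 if export_value == 0 else float("inf")
    return abs(export_value - dashboard_value) / abs(dashboard_value)


# Largest margin quoted on this page (uncompressed data, HyperLogLog++ estimates).
EXPECTED_MARGIN = 0.04


def within_expected_margin(dashboard_value, export_value, margin=EXPECTED_MARGIN):
    """Hypothetical check: True if the two values differ by at most `margin`."""
    return relative_difference(dashboard_value, export_value) <= margin


print(within_expected_margin(10_000, 10_250))  # True  (2.5% apart)
print(within_expected_margin(10_000, 11_000))  # False (10% apart)
```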