What Is Analytics Cardinality and Why Does It Trigger Errors?
When querying the Bitmovin Analytics API, you might encounter an error like this:
"status": "ERROR",
"data": {
  "code": 5003,
  "message": "Error querying analytics!",
  "developerMessage": "Attempting to groupBy a very high cardinality column that cannot be aggregated. Please filter the query result."
}
This message is related to a concept called cardinality. Below, we’ll explain what this means and how you can avoid this error in your queries.
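If you handle API responses in code, it can help to recognise this error explicitly. The following is a minimal sketch, assuming the response body is a JSON object with `status` and `data` at the top level, as the fragment above suggests. The function name `is_high_cardinality_error` is hypothetical and not part of any Bitmovin SDK.

```python
import json

# Error code taken from the sample error shown above.
HIGH_CARDINALITY_ERROR_CODE = 5003


def is_high_cardinality_error(response_body: str) -> bool:
    """Return True if a JSON response body matches the 5003 error shape above.

    Assumption: "status" and "data" sit at the top level of a JSON object.
    """
    try:
        payload = json.loads(response_body)
    except json.JSONDecodeError:
        return False
    if not isinstance(payload, dict) or payload.get("status") != "ERROR":
        return False
    data = payload.get("data")
    return isinstance(data, dict) and data.get("code") == HIGH_CARDINALITY_ERROR_CODE


# Sample body rebuilt from the error message quoted in this article.
sample = json.dumps({
    "status": "ERROR",
    "data": {
        "code": 5003,
        "message": "Error querying analytics!",
        "developerMessage": "Attempting to groupBy a very high cardinality "
                            "column that cannot be aggregated. "
                            "Please filter the query result.",
    },
})

print(is_high_cardinality_error(sample))
```

A caller could use this check to decide whether to retry the query with additional filters rather than treating the failure as a generic error.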
What is "Cardinality" in Analytics?
Cardinality refers to the number of unique values a column (dimension) can have in your data. In simple terms:
- Low cardinality = few unique values (e.g. player_version might have 10 versions)
- High cardinality = many unique values (e.g. impressionId, user_id, or video_title might have thousands or millions of unique entries)
Dimensions like impressionId, video_id, custom_user_id, custom_data fields, user_id, or video_title are commonly high cardinality.
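To make the definition concrete, here is a small sketch that counts the distinct values per column in a handful of rows. The rows are made-up sample data and the helper name `cardinality` is hypothetical. It only illustrates the concept and does not call the Analytics API.

```python
def cardinality(rows, column):
    """Count the distinct values of one column across a list of row dicts."""
    return len({row[column] for row in rows})


# Made-up sample rows for illustration only.
rows = [
    {"impression_id": "a1", "player_version": "8.1.0", "country": "US"},
    {"impression_id": "b2", "player_version": "8.1.0", "country": "US"},
    {"impression_id": "c3", "player_version": "8.2.0", "country": "DE"},
    {"impression_id": "d4", "player_version": "8.1.0", "country": "DE"},
]

for column in ("impression_id", "player_version", "country"):
    print(column, cardinality(rows, column))
```

In this sample every row has its own impression_id, so that column's cardinality grows with the number of rows, while player_version and country stay small.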
Why Does High Cardinality Cause Errors?
Bitmovin Analytics is optimised for performance and cost-efficiency. When you query and groupBy a high cardinality dimension without enough filters, the system must process and return a very large number of groups, sometimes millions. This can overload the query engine, leading to errors like 5003.
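The effect on the number of groups can be sketched with synthetic data. This is an in-memory illustration under made-up assumptions (10,000 rows, three player versions), not a description of how the Analytics engine is implemented. The helper `group_counts` is hypothetical.

```python
from collections import Counter


def group_counts(rows, group_by, predicate=lambda row: True):
    """Count rows per distinct value of `group_by`, keeping only rows that pass `predicate`."""
    return Counter(row[group_by] for row in rows if predicate(row))


# Synthetic data: each row has a unique impression_id and one of three player versions.
rows = [
    {"impression_id": f"imp-{i}", "player_version": f"8.{i % 3}.0"}
    for i in range(10_000)
]

by_impression = group_counts(rows, "impression_id")
by_version = group_counts(rows, "player_version")
filtered = group_counts(
    rows,
    "impression_id",
    lambda row: row["impression_id"] in {"imp-1", "imp-2"},
)

print(len(by_impression))  # one group per row
print(len(by_version))     # one group per player version
print(len(filtered))       # only the groups that survive the filter
```

Grouping by the unique identifier yields as many groups as there are rows, whereas grouping by a low-cardinality dimension, or filtering first, keeps the result small.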
How Can I Fix or Avoid This?
Here are ways to prevent or resolve high cardinality errors:
- ✅ Filter your query: Always use filters (e.g., date ranges, country, player version, video ID) to reduce the dataset size.
- ✅ Avoid grouping by high-cardinality fields: Only groupBy dimensions that have a manageable number of unique values (e.g., browser, country, or video_id on an already filtered dataset).
- ✅ Use filters instead of groupBy: If you're interested in a specific custom_user_id, use it in a filter rather than a groupBy.
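One way to apply these rules is a client-side check that runs before a query is sent. The sketch below is hypothetical: the set of names is copied from the dimensions this article lists as commonly high cardinality, and the function `risky_group_by` is not part of the Bitmovin API. It only flags the obvious case of a high-cardinality groupBy with no filters at all.

```python
# Dimension names this article lists as commonly high cardinality.
# Assumption: adjust the spelling to match the names your queries actually use.
HIGH_CARDINALITY = {
    "impression_id",
    "impressionId",
    "video_id",
    "custom_user_id",
    "user_id",
    "video_title",
}


def risky_group_by(query):
    """Return groupBy entries that are high cardinality when the query has no filters.

    This sketch does not judge how selective an existing filter is; any
    non-empty "filters" list is treated as sufficient.
    """
    if query.get("filters"):
        return []
    return [name for name in query.get("groupBy", []) if name in HIGH_CARDINALITY]


print(risky_group_by({"groupBy": ["impression_id"]}))
print(risky_group_by({"groupBy": ["country"]}))
```

A stricter variant could also require a minimum number of filters or a bounded time range, depending on how your queries are built.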
Example
❌ Problematic query:
"groupBy": ["impression_id"]
This will likely trigger a 5003 error unless you apply very specific filters.
✅ Better approach:
"filters": [
  { "name": "impression_id", "operator": "IN", "value": ["value1", "value2"] }
],
"groupBy": ["player_version"]
This narrows down the dataset and uses a low-cardinality dimension.
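If you build queries programmatically, the same fragment can be assembled with a small helper. This is a sketch only: `filter_in` and `build_query_fragment` are hypothetical names, the filter shape is copied from the example above, and any other fields a real request needs are deliberately omitted.

```python
import json


def filter_in(name, values):
    """Build an IN filter entry in the shape shown in the example above."""
    values = list(values)
    if not values:
        raise ValueError("at least one filter value is required")
    return {"name": name, "operator": "IN", "value": values}


def build_query_fragment(filter_name, values, group_by):
    """Combine a narrowing filter with a groupBy list into one query fragment."""
    return {
        "filters": [filter_in(filter_name, values)],
        "groupBy": list(group_by),
    }


fragment = build_query_fragment(
    "impression_id", ["value1", "value2"], ["player_version"]
)
print(json.dumps(fragment, indent=2))
```

Rejecting an empty value list keeps the helper from silently producing a filter that narrows nothing.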
Summary
High cardinality refers to fields with many unique values. Grouping by such fields without proper filtering can overwhelm the Analytics engine, triggering error 5003. To avoid this:
- Use filters to reduce data size
- Avoid grouping by high-cardinality fields unless absolutely necessary
- Prefer grouping by standard dimensions like country, browser, etc.
For more guidance, refer to our Analytics API design limitations and community discussion on Error 5003.