What Is Analytics Cardinality and Why Does It Trigger Errors?

When querying the Bitmovin Analytics API, you might encounter an error like this:

{
  "status": "ERROR",
  "data": {
    "code": 5003,
    "message": "Error querying analytics!",
    "developerMessage": "Attempting to groupBy a very high cardinality column that cannot be aggregated. Please filter the query result."
  }
}

This message is related to a concept called cardinality. Below, we’ll explain what this means and how you can avoid this error in your queries.
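
If you handle API responses in code, it can help to recognise this specific error so you can react to it (for example, by retrying with tighter filters). The following is a minimal Python sketch, assuming the response body has already been parsed into a dictionary with "status" and "data" at the top level, as in the error shown above. The function name is_cardinality_error is hypothetical and not part of any Bitmovin SDK.

```python
from typing import Any

CARDINALITY_ERROR_CODE = 5003  # error code taken from the message shown above


def is_cardinality_error(response: dict[str, Any]) -> bool:
    """Return True if a parsed response is the high-cardinality groupBy error."""
    if response.get("status") != "ERROR":
        return False
    data = response.get("data") or {}
    return data.get("code") == CARDINALITY_ERROR_CODE


# The error payload from this article, as a Python dictionary.
error_response = {
    "status": "ERROR",
    "data": {
        "code": 5003,
        "message": "Error querying analytics!",
        "developerMessage": (
            "Attempting to groupBy a very high cardinality column that "
            "cannot be aggregated. Please filter the query result."
        ),
    },
}

print(is_cardinality_error(error_response))  # True
```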

What is "Cardinality" in Analytics?

Cardinality refers to the number of unique values a column (dimension) can have in your data. In simple terms:

  • Low cardinality = few unique values (e.g. player_version might have 10 versions)
  • High cardinality = many unique values (e.g. impression_id, user_id, or video_title might have thousands or millions of unique entries)

Dimensions such as impression_id, video_id, custom_user_id, user_id, video_title, and the custom_data fields commonly have high cardinality.
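
To make the definition concrete, here is a small Python sketch that counts the distinct values per column in a handful of sample rows. The rows and their values are invented for illustration; real analytics data has far more rows and columns.

```python
from collections import defaultdict

# Hypothetical sample rows, one per impression.
rows = [
    {"impression_id": "a1", "player_version": "8.1.0", "country": "US"},
    {"impression_id": "b2", "player_version": "8.1.0", "country": "US"},
    {"impression_id": "c3", "player_version": "8.2.0", "country": "DE"},
    {"impression_id": "d4", "player_version": "8.1.0", "country": "US"},
]


def cardinality(rows):
    """Return the number of distinct values seen in each column."""
    distinct = defaultdict(set)
    for row in rows:
        for column, value in row.items():
            distinct[column].add(value)
    return {column: len(values) for column, values in distinct.items()}


print(cardinality(rows))
# {'impression_id': 4, 'player_version': 2, 'country': 2}
```

Here impression_id has one distinct value per row, so its cardinality grows with the data, while player_version and country stay small no matter how many rows are added.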

Why Does High Cardinality Cause Errors?

Bitmovin Analytics is optimised for performance and cost-efficiency. A groupBy produces one group per unique value of the chosen dimension, so when you groupBy a high-cardinality dimension without enough filters, the system must process and return a very large number of groups, sometimes millions. This can overload the query engine, leading to errors like 5003.
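
The effect of filtering on the number of groups can be sketched in a few lines of Python. This is only an illustration on invented in-memory rows, not how the Analytics engine works internally; count_groups is a hypothetical helper.

```python
# Hypothetical rows: nine impressions spread evenly over three countries.
rows = [
    {"impression_id": f"imp-{i}", "country": ["US", "DE", "AT"][i % 3]}
    for i in range(9)
]


def count_groups(rows, group_by, where=None):
    """Number of groups a groupBy on `group_by` yields after optional filtering."""
    selected = [row for row in rows if where is None or where(row)]
    return len({row[group_by] for row in selected})


# Grouping by a high-cardinality column yields one group per row.
print(count_groups(rows, "impression_id"))  # 9

# Grouping by a low-cardinality column yields only a few groups.
print(count_groups(rows, "country"))  # 3

# A filter shrinks the dataset before grouping, so fewer groups remain.
print(count_groups(rows, "impression_id", where=lambda r: r["country"] == "AT"))  # 3
```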

How Can I Fix or Avoid This?

Here are ways to prevent or resolve high cardinality errors:

  • Filter your query: Always use filters (e.g., date ranges, country, player version, video ID) to reduce the dataset size.
  • Avoid grouping by high-cardinality fields: Only groupBy dimensions that have a manageable number of unique values (e.g., browser, country, video_id for filtered datasets).
  • Use filters instead of groupBy: If you're interested in a specific custom_user_id, use it in a filter rather than a groupBy.
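
As a rough illustration of the last point, the Python sketch below builds a query fragment that filters on a high-cardinality dimension and groups by a low-cardinality one. It uses only the "filters" and "groupBy" fields and the "IN" operator that appear in this article; the helper name and the sample values are hypothetical, and a real request needs further fields that are not covered here.

```python
import json


def filter_instead_of_group_by(dimension, values, group_by):
    """Build a fragment that filters on `dimension` and groups by `group_by`."""
    return {
        "filters": [
            {"name": dimension, "operator": "IN", "value": list(values)}
        ],
        "groupBy": list(group_by),
    }


# Look at one specific user, broken down by country.
fragment = filter_instead_of_group_by(
    dimension="custom_user_id",
    values=["user-123"],  # hypothetical user ID
    group_by=["country"],
)
print(json.dumps(fragment, indent=2))
```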

Example

❌ Problematic query:

"groupBy": ["impression_id"]

This will likely trigger a 5003 error unless you apply very specific filters.

✅ Better approach:

"filters": [
  { "name": "impression_id", "operator": "IN", "value": ["value1","value2"] }
],
"groupBy": ["player_version"]

This narrows down the dataset and uses a low-cardinality dimension.
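
The two queries above can be told apart with a rough pre-flight check before sending a request. The Python sketch below is a heuristic only: it flags groupBy dimensions from this article's high-cardinality list that have no filter on the same dimension. The helper risky_group_by is hypothetical, and passing the check does not guarantee that the API accepts the query.

```python
# Dimensions this article names as commonly high cardinality.
# Adjust the set to the dimension names your queries actually use.
HIGH_CARDINALITY = {
    "impression_id",
    "video_id",
    "custom_user_id",
    "user_id",
    "video_title",
}


def risky_group_by(query):
    """Return groupBy dimensions that are high cardinality and not filtered on."""
    filtered = {f["name"].lower() for f in query.get("filters", [])}
    return [
        dim
        for dim in query.get("groupBy", [])
        if dim.lower() in HIGH_CARDINALITY and dim.lower() not in filtered
    ]


problematic = {"groupBy": ["impression_id"]}
better = {
    "filters": [
        {"name": "impression_id", "operator": "IN", "value": ["value1", "value2"]}
    ],
    "groupBy": ["player_version"],
}

print(risky_group_by(problematic))  # ['impression_id']
print(risky_group_by(better))  # []
```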

Summary

High cardinality refers to fields with many unique values. Grouping by such fields without proper filtering can overwhelm the Analytics engine, triggering error 5003. To avoid this:

  • Use filters to reduce data size
  • Avoid grouping by high-cardinality fields unless absolutely necessary
  • Prefer grouping by standard low-cardinality dimensions such as country or browser

For more guidance, refer to our Analytics API design limitations and the community discussion on error 5003.