Encoding Incident Operational Playbook

1. Check Service Status Page

2. Switch Cloud Regions or Providers

  • Cloud Region: If the issue is isolated to a specific region, set cloud_region to a different region when creating the encoding job.

  • Cloud Provider: If a specific cloud provider is impacted, switch to a region of another supported provider by setting cloud_region when creating the encoding job.
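The region switch described above can be sketched as a small helper that skips impacted regions. This is a minimal illustration, not the SDK's API: pick_region and build_encoding_request are hypothetical helpers, the region identifiers are placeholders, and the request field name simply mirrors the cloud_region setting named in this playbook.

```python
from typing import Iterable, Optional, Set


def pick_region(candidates: Iterable[str], impacted: Set[str]) -> Optional[str]:
    """Return the first candidate region that is not marked as impacted."""
    for region in candidates:
        if region not in impacted:
            return region
    return None


def build_encoding_request(name: str, cloud_region: str) -> dict:
    # "cloud_region" mirrors the setting named in this playbook; the exact
    # field name expected by your SDK or API version is an assumption here.
    return {"name": name, "cloud_region": cloud_region}


# Illustrative placeholder region identifiers, ordered by preference.
candidates = ["AWS_EU_WEST_1", "AWS_US_EAST_1", "GOOGLE_EUROPE_WEST_1"]
region = pick_region(candidates, impacted={"AWS_EU_WEST_1"})
if region is None:
    raise RuntimeError("no unaffected region available")
request = build_encoding_request("incident-reroute", region)
```

Keeping the candidate list ordered by preference means the same helper covers both cases: a region outage falls through to the next region of the same provider, and a provider outage falls through to the first region of another provider once all of the impacted provider's regions are marked.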

Supported Regions:

3. Raise a Priority (P0) ticket if all encodings are impacted

  • If all encoding jobs are affected, raise a Priority (P0) ticket via https://dashboard.bitmovin.com/support/tickets/create.
  • Notify the wider team in the designated Slack channel, and include the incident description, error messages, and any other relevant details.
  • Escalation: If you do not receive a response within a reasonable time frame, escalate the issue to your Customer Success Manager and the Account Owner.

4. Reduce the number of parallel encoding tasks

  • To reduce system load and improve performance during the incident, decrease the number of concurrent encoding jobs.
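One way to cap concurrency on the client side is a bounded worker pool. The sketch below assumes your integration starts each job through a callable; run_with_limit and the start_job parameter are hypothetical names, and the actual job submission call is left to your integration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

J = TypeVar("J")
R = TypeVar("R")


def run_with_limit(
    jobs: Iterable[J], start_job: Callable[[J], R], max_parallel: int
) -> List[R]:
    """Call start_job for every job, with at most max_parallel running at once."""
    if max_parallel < 1:
        raise ValueError("max_parallel must be at least 1")
    # The pool size is the concurrency cap: extra jobs wait in the queue
    # until a worker becomes free.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # pool.map returns results in the same order as the input jobs.
        return list(pool.map(start_job, jobs))
```

During an incident, lowering max_parallel (for example from a configuration value) reduces the number of jobs in flight without changing the rest of the submission code.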

Best Practices

1. Add fallback regions when creating encodings

  • Specify a list of fallback regions to be used if the preferred region is unavailable. To configure fallback regions, refer to the Encoding API Documentation.
  • Fallback regions are not a global setting and can only be configured during encoding creation.
  • Fallback regions must belong to the same cloud provider as the preferred region. You can configure up to 3 fallback regions.
  • Fallback regions apply only to VOD encodings and managed encodings.
  • Cloud egress charges apply when an encoding runs in a different region.
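The two constraints above (same cloud provider, at most 3 fallback regions) can be checked before an encoding is created. This is a sketch under a stated assumption: region identifiers are taken to start with the provider name followed by an underscore, as in the placeholder "AWS_EU_WEST_1". The helper names are hypothetical, and the Encoding API Documentation remains the reference for the actual configuration fields.

```python
from typing import List, Sequence

MAX_FALLBACK_REGIONS = 3  # limit stated in this playbook


def provider_of(region: str) -> str:
    # Assumption: identifiers look like "<PROVIDER>_<REGION...>",
    # for example "AWS_EU_WEST_1" -> "AWS".
    return region.split("_", 1)[0]


def validate_fallback_regions(preferred: str, fallbacks: Sequence[str]) -> List[str]:
    """Return the fallback list unchanged if it satisfies both constraints."""
    if len(fallbacks) > MAX_FALLBACK_REGIONS:
        raise ValueError(
            f"at most {MAX_FALLBACK_REGIONS} fallback regions are allowed, "
            f"got {len(fallbacks)}"
        )
    provider = provider_of(preferred)
    mismatched = [r for r in fallbacks if provider_of(r) != provider]
    if mismatched:
        raise ValueError(
            f"fallback regions must use provider {provider}: {mismatched}"
        )
    return list(fallbacks)
```

Running this check at encoding creation time surfaces a misconfigured fallback list early instead of during an outage.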

2. Make your integration configurable

  • Allow cloud regions and providers to be configured dynamically, enabling quick adaptation to outages and ensuring continuous service.
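As one possible approach, the region settings can be read from environment variables so they can be changed without a code deployment. The variable names ENCODING_CLOUD_REGION and ENCODING_FALLBACK_REGIONS and the RegionConfig type are hypothetical; any configuration source your integration already uses would work the same way.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple


@dataclass(frozen=True)
class RegionConfig:
    cloud_region: str
    fallback_regions: Tuple[str, ...]


def load_region_config(env: Optional[Mapping[str, str]] = None) -> RegionConfig:
    """Read the preferred region and optional fallbacks from the environment."""
    env = os.environ if env is None else env
    # Hypothetical variable names; adapt them to your deployment.
    region = env.get("ENCODING_CLOUD_REGION", "").strip()
    if not region:
        raise ValueError("ENCODING_CLOUD_REGION is not set")
    raw = env.get("ENCODING_FALLBACK_REGIONS", "")
    # Comma-separated list; blank entries and surrounding spaces are ignored.
    fallbacks = tuple(part.strip() for part in raw.split(",") if part.strip())
    return RegionConfig(cloud_region=region, fallback_regions=fallbacks)
```

With this in place, switching regions during an outage is a configuration change followed by a restart or reload, rather than a code change.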