Encoding Incident Operational Playbook
1. Check Service Status Page
- Visit https://status.bitmovin.com/ for the latest updates on any ongoing incidents or outages.
2. Switch Cloud Regions or Providers
- Cloud Region: If the issue is isolated to a specific region, switch to another region by setting the cloud_region parameter when creating the encoding job.
- Cloud Provider: If a specific cloud provider is impacted, switch to a region of another supported provider by setting the cloud_region parameter when creating the encoding job.
Supported Regions:
- GCP: GCP Region API
- AWS: AWS Region API
- Azure: Azure Region API
- Akamai: Akamai Region API
- OCI: OCI Region API
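The region switch above can be automated with a small helper that picks the first candidate region not affected by the incident. This is a minimal sketch, not part of any Bitmovin SDK. It assumes region identifiers start with a provider prefix (for example "AWS_US_EAST_1" or "GOOGLE_EUROPE_WEST_1"); check the Region API reference for each provider for the exact values. The returned region would then be passed as cloud_region when creating the encoding job.

```python
# Minimal sketch: choose a region that avoids an ongoing incident.
# Assumption: region identifiers begin with a provider prefix such as "AWS_".

PROVIDER_PREFIXES = ("AWS", "GOOGLE", "AZURE", "AKAMAI", "OCI")  # assumed prefixes


def provider_of(region: str) -> str:
    """Return the provider prefix of a region identifier (assumed naming)."""
    prefix = region.split("_", 1)[0]
    if prefix not in PROVIDER_PREFIXES:
        raise ValueError(f"Unknown provider for region {region!r}")
    return prefix


def choose_region(candidates, impacted_regions=(), impacted_providers=()):
    """Return the first candidate that is neither an impacted region nor hosted
    by an impacted provider; return None if every candidate is affected."""
    for region in candidates:
        if region in impacted_regions:
            continue  # region-level outage: try the next candidate
        if provider_of(region) in impacted_providers:
            continue  # provider-level outage: skip all of its regions
        return region
    return None


# Example candidate list, in order of preference (illustrative values).
preferred = ["AWS_US_EAST_1", "AWS_EU_WEST_1", "GOOGLE_EUROPE_WEST_1"]
region_outage_choice = choose_region(preferred, impacted_regions={"AWS_US_EAST_1"})
provider_outage_choice = choose_region(preferred, impacted_providers={"AWS"})
print(region_outage_choice, provider_outage_choice)
```

Keeping the candidate list in preference order means a region-level outage falls through to another region of the same provider first, and only a provider-level outage moves the job to a different provider.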
3. Raise a Priority (P0) ticket if the incident impacts all encodings
- If all encoding jobs are affected, raise a Priority (P0) ticket via https://dashboard.bitmovin.com/support/tickets/create.
- Post in the designated Slack channel to notify the wider team about the issue. Include the incident description, error messages, and any other relevant details.
- Escalation: If you do not receive a response within a reasonable time frame, escalate the issue to your Customer Success Manager and the Account Owner.
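If your team posts incident notices through a Slack incoming webhook, a small helper keeps the message consistent with the details listed above. This is a sketch under assumptions: the webhook URL is a placeholder for whatever your workspace uses, and build_incident_message and post_to_slack are hypothetical helper names. Slack incoming webhooks accept a JSON body with a "text" field.

```python
import json
import urllib.request


def build_incident_message(description, error_messages, details=None):
    """Build a Slack payload with the incident description, errors, and details."""
    lines = ["Encoding incident", f"Description: {description}"]
    if error_messages:
        lines.append("Errors:")
        lines.extend(f"  - {msg}" for msg in error_messages)
    # Extra key/value details, e.g. the P0 ticket reference or affected regions.
    for key, value in (details or {}).items():
        lines.append(f"{key}: {value}")
    return {"text": "\n".join(lines)}


def post_to_slack(webhook_url, payload):
    """POST the payload as JSON to a Slack incoming webhook; return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


message = build_incident_message(
    "All VOD encodings fail to start",
    ["<error message copied from the failed encoding>"],
    {"Ticket": "P0 raised via the dashboard", "Affected regions": "all"},
)
# post_to_slack("https://hooks.slack.com/services/...", message)  # placeholder URL
```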
4. Reduce the number of parallel encoding tasks
- To reduce system load and improve performance during the incident, decrease the number of concurrent encoding jobs.
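One way to cap parallelism on the client side is a bounded worker pool, where the pool size is the number of encodings allowed in flight at once. This is a minimal sketch: run_job is a hypothetical callable that starts one encoding and blocks until it finishes (for example by polling its status), and the fake job below only simulates that so the concurrency limit can be observed.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_encodings(jobs, run_job, max_parallel):
    """Run run_job(job) for each job with at most max_parallel running at once.
    Lower max_parallel during an incident to reduce load."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run_job, jobs))  # results keep the input order


class ConcurrencyTracker:
    """Simulated job that records how many jobs ran concurrently."""

    def __init__(self):
        self._lock = threading.Lock()
        self.current = 0
        self.peak = 0

    def fake_job(self, job_id):
        with self._lock:
            self.current += 1
            self.peak = max(self.peak, self.current)
        time.sleep(0.05)  # stand-in for the encoding running
        with self._lock:
            self.current -= 1
        return job_id


tracker = ConcurrencyTracker()
results = run_encodings(range(10), tracker.fake_job, max_parallel=2)
print(results, tracker.peak)
```

Reading max_parallel from configuration rather than hard-coding it lets you reduce concurrency during an incident without a code change.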
Best Practices
1. Add fallback regions when creating encodings
- Specify a list of fallback regions to be used if the preferred region is unavailable. To configure fallback regions, refer to the Encoding API Documentation.
- Fallback regions are not a global setting and can only be configured during encoding creation.
- Fallback regions must belong to the same cloud provider as the primary region. You can configure up to 3 fallback regions.
- This applies only to VOD encodings and managed encodings.
- Cloud egress charges apply when an encoding runs in a different region from the one where your input and output storage are located.
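The fallback rules above (same provider as the primary region, at most 3 fallbacks) can be checked before the encoding is created. This is a minimal sketch: the JSON field names cloudRegion and fallbackCloudRegions are assumptions about the Encoding API's request body and should be verified against the Encoding API Documentation, and provider detection assumes region identifiers start with a provider prefix such as "AWS_".

```python
MAX_FALLBACK_REGIONS = 3  # limit stated in this playbook


def provider_of(region):
    """Assumed naming: provider prefix before the first underscore, e.g. "AWS_US_EAST_1"."""
    return region.split("_", 1)[0]


def encoding_region_settings(primary, fallbacks):
    """Validate fallback regions and return the region fields for an
    encoding-creation request body (field names are assumptions)."""
    if len(fallbacks) > MAX_FALLBACK_REGIONS:
        raise ValueError(f"At most {MAX_FALLBACK_REGIONS} fallback regions are allowed")
    mismatched = [r for r in fallbacks if provider_of(r) != provider_of(primary)]
    if mismatched:
        raise ValueError(f"Fallback regions must use the same provider as {primary}: {mismatched}")
    return {"cloudRegion": primary, "fallbackCloudRegions": list(fallbacks)}


settings = encoding_region_settings("AWS_US_EAST_1", ["AWS_US_WEST_2", "AWS_EU_WEST_1"])
print(settings)
```

Because fallback regions can only be set at creation time, validating them in your own code surfaces configuration mistakes before the request is sent rather than during an outage.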
2. Make your integration configurable
- Make cloud regions and providers configurable at runtime instead of hard-coding them, so you can switch quickly during an outage and keep encodings running.
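Environment variables are one simple way to make the region choice configurable: during an outage an operator changes the values and re-runs the workflow without touching code. This is a sketch under assumptions: ENCODING_CLOUD_REGION, ENCODING_FALLBACK_REGIONS, and the default region are hypothetical names and values chosen for illustration.

```python
import os

DEFAULT_REGION = "AWS_US_EAST_1"  # hypothetical default region


def load_region_config(env=os.environ):
    """Read the primary region and a comma-separated list of fallback regions
    from hypothetical environment variables."""
    primary = env.get("ENCODING_CLOUD_REGION", DEFAULT_REGION).strip()
    raw = env.get("ENCODING_FALLBACK_REGIONS", "")
    fallbacks = [r.strip() for r in raw.split(",") if r.strip()]
    return primary, fallbacks


# Example values an operator might set while AWS is impacted (illustrative).
primary, fallbacks = load_region_config({
    "ENCODING_CLOUD_REGION": "GOOGLE_EUROPE_WEST_1",
    "ENCODING_FALLBACK_REGIONS": "GOOGLE_US_EAST_1, GOOGLE_US_CENTRAL_1",
})
print(primary, fallbacks)
```

The resulting values would be used as cloud_region and the fallback region list when creating each encoding.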