Bitmovin's Distributed Encoding Architecture

Why we deploy directly to cloud compute instances instead of Kubernetes

Overview

Bitmovin’s distributed encoding service is built for massive scalability, high performance, and cloud cost-efficiency. To meet these goals, we deploy encoding workloads directly to preemptible compute resources (such as AWS EC2 spot instances) instead of running them inside a Kubernetes cluster. This document outlines the reasoning behind this architectural decision and explains why our control plane is better suited than Kubernetes to orchestrating encoding workflows across cloud providers.

Benefits of Using Bitmovin’s Control Plane for Media Workflows

Purpose-built for Media Encoding

  • Deploying a robust media encoder on Kubernetes introduces significant complexities, particularly in:
    • Resource management (high CPU, memory, I/O demands)
    • Stateful workload handling for encoding progress and files
    • Real-time performance requirements for low latency and consistent throughput
    • Scalability to manage fluctuating encoding demand
    • Networking for media streaming

Simplified Deployment

  • Bitmovin’s control plane abstracts these complexities, offering:
    • Simplified Infrastructure: Hiding underlying infrastructure details
    • Optimized Scheduling: Efficient job allocation across cloud providers
    • Managed Scaling: Automatic resource adjustments
    • Integrated Monitoring: Real-time performance insights
    • Cross-Cloud Support: Built-in support for multiple cloud providers

Increased Operational and Cost Efficiency

  • Bitmovin’s control plane reduces operational and monetary overhead by:
    • Accelerating deployment
    • Simplifying maintenance tasks
    • Allowing Kubernetes experts to focus on applications where containerized orchestration is a better fit
    • Optimizing resource utilization and leveraging multi-cloud pricing differences for cost savings

Why Bitmovin Deploys Directly to Cloud Instances Instead of Kubernetes Clusters

Massively Scalable, Ephemeral Workloads

Each encoding job can trigger the launch of hundreds of compute worker nodes. These nodes are:
  • Ephemeral – spun up on demand and terminated as soon as the task is complete
  • Dedicated – each job runs in isolation on a single host
  • Highly Parallel – enabling rapid, horizontal scale-out of encoding tasks

This job-based model is fundamentally different from long-lived containerized services and is not compatible with the typical lifecycle expectations of Kubernetes workloads.
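
As a rough illustration of this job-based lifecycle, the sketch below launches a batch of dedicated workers for one job and tears them down the moment the job completes. It assumes AWS and boto3; the function names, tag key, and parameters are illustrative stand-ins, not Bitmovin's actual implementation.

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    def launch_workers(job_id: str, ami_id: str, instance_type: str, count: int) -> list[str]:
        # Spin up dedicated, single-tenant nodes for one encoding job.
        resp = ec2.run_instances(
            ImageId=ami_id,
            InstanceType=instance_type,
            MinCount=count,
            MaxCount=count,
            # Tag each node with the job so it can be tracked and reaped.
            TagSpecifications=[{
                "ResourceType": "instance",
                "Tags": [{"Key": "encoding-job", "Value": job_id}],
            }],
        )
        return [i["InstanceId"] for i in resp["Instances"]]

    def terminate_workers(instance_ids: list[str]) -> None:
        # Ephemeral by design: terminate as soon as the tasks complete.
        ec2.terminate_instances(InstanceIds=instance_ids)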

Full Host Resource Utilization

Each encoding worker is designed to fully utilize the host machine, including:
  • 100% of CPU resources
  • Full memory bandwidth
  • High disk I/O throughput (especially when using NVMe or local SSDs)

Such intensive, single-tenant workloads are a poor fit for Kubernetes clusters, where containers are typically scheduled to share host resources. Running them there would result in inefficient scheduling, resource contention, and degraded performance.

Intelligent Use of Preemptible Instances

To optimize costs, we extensively use preemptible (spot) instances for our encoding workers. Our infrastructure can:
  • Select the most cost-efficient and available instance types dynamically
  • Fall back to on-demand instances if spot capacity is unavailable
  • Adapt in real time to capacity fluctuations across instance families and regions

While Kubernetes does support spot instances, managing them effectively for such volatile, high-throughput workloads would require significant additional complexity and custom orchestration logic.
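
To make the fallback behavior concrete, here is a minimal sketch assuming AWS and boto3. The helper name is hypothetical, and the error codes shown are only a representative subset of the conditions a production system would handle.

    from botocore.exceptions import ClientError

    RETRYABLE = {"InsufficientInstanceCapacity", "SpotMaxPriceTooLow"}

    def launch_with_fallback(ec2, ami_id, instance_type, count):
        # ec2: a boto3 EC2 client, e.g. boto3.client("ec2").
        # Try spot capacity first; fall back to on-demand if none is available.
        try:
            return ec2.run_instances(
                ImageId=ami_id, InstanceType=instance_type,
                MinCount=count, MaxCount=count,
                InstanceMarketOptions={
                    "MarketType": "spot",
                    "SpotOptions": {"SpotInstanceType": "one-time"},
                },
            )
        except ClientError as err:
            if err.response["Error"]["Code"] not in RETRYABLE:
                raise
            # No spot capacity: launch the same shape on-demand instead.
            return ec2.run_instances(
                ImageId=ami_id, InstanceType=instance_type,
                MinCount=count, MaxCount=count,
            )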

Granular Infrastructure Control

Deploying directly to cloud compute instances allows us to precisely control:
  • Instance selection – such as compute-optimized or storage-intensive types
  • Disk configuration – choosing between EBS and ephemeral NVMe depending on the workload
  • Lifecycle management – provisioning, monitoring, and terminating resources with minimal latency

This level of control is difficult to achieve with Kubernetes without introducing significant custom scheduling, storage management, and operational overhead.
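
As a concrete illustration, a hypothetical selection helper might map a workload profile to an instance type and disk configuration as follows. The instance types, device name, and volume sizing are illustrative examples using AWS naming, not a fixed policy.

    def select_infrastructure(workload: str) -> tuple[str, list[dict]]:
        # Storage-intensive jobs get a *d instance family, whose local NVMe
        # drives come with the instance type; no extra volume to attach.
        if workload == "high-io":
            return "c5d.9xlarge", []
        # Compute-bound jobs run on a compute-optimized type with an EBS
        # scratch volume sized for the job and deleted on termination.
        block_devices = [{
            "DeviceName": "/dev/xvdb",
            "Ebs": {"VolumeSize": 500, "VolumeType": "gp3",
                    "DeleteOnTermination": True},
        }]
        return "c5.9xlarge", block_devices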

Cloud Resilience and Flexibility

Direct integration with the hyperscaler compute APIs gives us the ability to:
  • Detect and respond to capacity shortages instantly
  • Fail over to alternate instance families or regions automatically
  • Avoid the cost and complexity of maintaining a pre-scaled cluster that may sit idle most of the time

This ensures that we maintain high availability and performance while controlling cloud spend effectively.
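
One plausible shape for this failover logic is a loop over candidate instance families and availability zones that retries on capacity errors, as in the sketch below. It assumes AWS and boto3, and the candidate lists are placeholders.

    import itertools
    from botocore.exceptions import ClientError

    FAMILIES = ["c6i.8xlarge", "c5.9xlarge", "m6i.8xlarge"]   # illustrative
    ZONES = ["us-east-1a", "us-east-1b", "us-east-1c"]        # illustrative

    def launch_resilient(ec2, ami_id):
        # ec2: a boto3 EC2 client. Walk family/zone combinations until one
        # has capacity, so a shortage in one pool never stalls the job.
        for itype, zone in itertools.product(FAMILIES, ZONES):
            try:
                return ec2.run_instances(
                    ImageId=ami_id, InstanceType=itype,
                    MinCount=1, MaxCount=1,
                    Placement={"AvailabilityZone": zone},
                )
            except ClientError as err:
                if err.response["Error"]["Code"] != "InsufficientInstanceCapacity":
                    raise
        raise RuntimeError("no capacity in any candidate family/zone")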

Encoding Orchestrator Architecture

At the heart of our infrastructure is our encoding orchestrator: a SaaS-based control plane responsible for coordinating all encoding workloads in the cloud. The orchestrator:
  • Integrates directly with the hyperscaler APIs to provision, monitor, and terminate instances
  • Tracks job progress, health status, and node lifecycle in real time
  • Makes dynamic infrastructure decisions such as instance type selection, zone fallback, or spot/on-demand switching

This orchestrator is fully managed and hosted by Bitmovin, meaning customers do not need to operate or maintain any orchestration layer themselves.
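
In outline, such a control loop might look like the following simplified sketch. All types and helpers here are stand-ins for illustration; the real orchestrator's internals are not described by this document.

    import time
    from dataclasses import dataclass

    @dataclass
    class Worker:
        instance_id: str
        preempted: bool = False
        finished: bool = False

    def orchestrate(workers: list[Worker], poll_seconds: float = 5.0) -> None:
        # Poll node state; replace preempted spot nodes until all tasks finish.
        while not all(w.finished for w in workers):
            for i, w in enumerate(workers):
                if w.preempted and not w.finished:
                    # Spot reclaim detected: provision a replacement node
                    # (in practice via the hyperscaler API, possibly on-demand).
                    workers[i] = Worker(instance_id=f"{w.instance_id}-r")
            time.sleep(poll_seconds)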

Deployment Model

In a typical deployment, customers provide a dedicated hyperscaler (AWS, GCP, Azure) account exclusively for Bitmovin encoding workloads. This account:
  • Does not require access to any other customer resources
  • Is used solely for the purpose of deploying encoding worker nodes
  • Is accessed securely via IAM cross-account roles, with minimal permissions strictly scoped to the required compute functionality (see the sketch below)
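
For AWS, the access pattern described above might look like the following boto3 sketch; the account ID and role name are placeholders.

    import boto3

    # Assume the customer-provided cross-account role; the resulting
    # credentials are temporary and limited to the role's minimal policy.
    sts = boto3.client("sts")
    creds = sts.assume_role(
        RoleArn="arn:aws:iam::123456789012:role/BitmovinEncodingAccess",
        RoleSessionName="bitmovin-encoding",
    )["Credentials"]

    # All encoding-infrastructure calls then run under those credentials,
    # confined to the dedicated account.
    ec2 = boto3.client(
        "ec2",
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )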

This model ensures maximum security, isolation, and operational simplicity for the customer, while allowing Bitmovin to deliver a seamless, highly scalable encoding experience.