Why we built Pantheon with Containers instead of Virtual Machines

We believe that the future cloud will run on containers, not virtual machines. Pantheon's container-based infrastructure is a huge departure from the traditional virtual machine and server-based "hosting" model. This is why:

Pantheon currently runs more than 55,000 custom Drupal sites. Developers can install and develop their own code, modules, and themes — and we run their sites at scale on our unified platform. Each "site" actually has a dev site, a test site, and a live site. Some even have many dev sites using Multidev. This means we run three or more custom-code applications per site. When you add all of them up you get more than 100,000 custom Drupal web applications running on Pantheon, making it the single largest Drupal infrastructure in the world.

Each application has to be secure and resource-isolated. If Pantheon were built with a "hosting" architecture, each site would have to run on its own separate virtual machine (unless we resorted to shared hosting, which would provide terrible service for our customers). A custom Drupal app needs to run on at least an Amazon EC2 m1.medium to keep from keeling over, and that's just for a development environment.

So, why containers? If we built Pantheon with a hosting architecture, we would need at least 100,000 virtual machines! Can you imagine running 100,000+ Amazon EC2 m1.medium instances, all with custom applications? Never mind breaking the bank, how would we even roll out a simple operating system upgrade or platform update? It would be insane.
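For a rough sense of that number, here is a back-of-the-envelope sketch in Python. It uses only the figures quoted above (more than 55,000 sites, at least three environments each) and assumes one virtual machine per environment; the function name is ours, purely for illustration.

```python
SITES = 55_000        # "more than 55,000" sites, so this is a lower bound
ENVS_PER_SITE = 3     # dev, test, and live; Multidev environments add more


def min_vms_needed(sites: int, envs_per_site: int) -> int:
    """Lower bound on VM count if every environment needs its own VM."""
    return sites * envs_per_site


vm_count = min_vms_needed(SITES, ENVS_PER_SITE)
print(f"At least {vm_count:,} virtual machines")
```

Even this lower bound sits comfortably above the 100,000 mark, before counting any Multidev environments.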

Even if we could afford to leverage "the cloud" and run hundreds of thousands of virtual machines on demand, how would individual customers scale? The "VM per customer" architecture, where the site is a pile of code sitting on a server, would make that virtually impossible.

We needed a breakthrough, something inherently scalable, more along the lines of the 12-Factor App manifesto, with stateless application workers and data (media files, relational db) provided by services. That's why we used containers.

Building with containers

We've built Pantheon as a new kind of infrastructure for running sites at scale, one that is built on containers instead of virtual machines, and abstracts away the site and environments from the OS/machine. This architectural approach has let us rewrite the rules on what is possible for website infrastructure.

Pantheon's container-based architecture is similar to technology already used by services like Google Apps, Heroku, and Salesforce. It also draws on a number of emerging open-source models such as the quickly evolving container standard Docker, and Red Hat's OpenShift.

The Pantheon Platform replaces traditional or cloud infrastructure for your site.

Here is how Pantheon's container-based "Runtime Matrix" works:

  • Each "site" on Pantheon consists of a dev site, a test site, and a live site. (Some developers also fork off many dev sites using our Multidev tools.) Developers are free to install and develop their own code, modules, and themes and use our workflow tools to manage everything through their dashboard.
  • Our infrastructure consists of hundreds of very large host machines (endpoints) that power the runtime matrix. Each endpoint holds a number of containers, which allows us to abstract away the site/environment from the OS/machine.
  • We are able to migrate customer containers around easily, and we do it a lot. The average Pantheon server age is around 50 days. We prefer a fresh machine to having to reboot one for a kernel update.
  • Each runtime container has a tuned Nginx + PHP-FPM engine ready to run a Drupal application, and everything from our Content Base needed to make it yours:
    • A clone of your code.
    • A database connection for that environment.
    • A mount into our distributed Pantheon File Store.
    • Additional services like search, key-value cache, etc.
  • We also use containers to manage these backing services: database servers (each site/environment gets its own), Redis key-value cache servers, and so on.
  • The runtime matrix is fronted by a sophisticated caching and load-balancing mesh that accelerates every site on the platform, and dynamically routes and balances traffic for customer domains across their containers.
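To make the moving parts concrete, here is a minimal Python sketch of the idea: containers spread across endpoints, with requests routed only to containers on healthy hosts. This is not Pantheon's actual code. The class names, the least-loaded placement rule, and the round-robin routing are all assumptions chosen to keep the example small.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Container:
    """One runtime container serving one site environment."""
    site: str
    env: str    # "dev", "test", or "live"
    host: str   # name of the endpoint it runs on


@dataclass
class Endpoint:
    """A large host machine that holds many containers."""
    name: str
    healthy: bool = True
    containers: list = field(default_factory=list)


class RuntimeMatrix:
    def __init__(self, endpoints):
        self.endpoints = list(endpoints)
        self._ticket = count()  # request counter used for round-robin

    def place(self, site, env, replicas=1):
        """Start `replicas` containers, each on a different healthy endpoint,
        preferring the endpoints that currently hold the fewest containers."""
        hosts = sorted(
            (e for e in self.endpoints if e.healthy),
            key=lambda e: len(e.containers),
        )[:replicas]
        if len(hosts) < replicas:
            raise RuntimeError("not enough healthy endpoints")
        placed = []
        for host in hosts:
            container = Container(site, env, host.name)
            host.containers.append(container)
            placed.append(container)
        return placed

    def route(self, site, env):
        """Pick a container for one request, skipping unhealthy endpoints."""
        live = [
            c
            for e in self.endpoints if e.healthy
            for c in e.containers
            if c.site == site and c.env == env
        ]
        if not live:
            raise LookupError(f"no running container for {site}/{env}")
        return live[next(self._ticket) % len(live)]


matrix = RuntimeMatrix([Endpoint("host-a"), Endpoint("host-b"), Endpoint("host-c")])
matrix.place("example-site", "live", replicas=2)   # lands on host-a and host-b
matrix.endpoints[0].healthy = False                # simulate host-a going down
print(matrix.route("example-site", "live").host)   # prints "host-b"
```

Because the site is just a set of containers rather than a machine, losing or retiring a host only changes where requests are routed.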

Five things containers can do that Virtual Machines can't

  1. Very fast provisioning: Containers are provisioned via software into already-operating infrastructure. We can add, remove and re-distribute containers in seconds. In fact, when development sites are idle we spin them down and resurrect their containers in real time as the first page requests come in.
  2. Simple, high availability: We run the containers on different underlying hardware. So if one host goes down, we can route traffic from the edge to live application containers running elsewhere.
  3. Smooth scaling: Containers let us take sites from hundreds of pageviews to hundreds of millions of pageviews without any downtime or architectural changes. This is hard when you have a VM-centric hosting architecture — vertical scalability requires reboots to resize, and there's a painful architectural gap to start scaling horizontally.
  4. Machine-precision consistency: Every app container running on a site on Pantheon is exactly the same. It's a giant, robotic, "share-nothing" matrix. They're all provisioned automatically on identical infrastructure and can only be managed via Pantheon's automated tools, so there are no gotchas from servers getting out of sync.
  5. Better performance: Containers make scaling up much more affordable and granular. Since the resource cost of a small set of processes is so much less than that of even the tiniest cloud instance capable of running a Drupal installation, you can spread out across many machines without breaking the bank.
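The idle spin-down mentioned in item 1 can be sketched in a few lines of Python. This is an illustration under our own assumptions, not Pantheon's implementation: the class name, the one-hour timeout, and the boolean flag standing in for real container provisioning are all hypothetical.

```python
import time


class DevEnvironment:
    """A dev site whose container is stopped when idle and restarted on demand."""

    def __init__(self, name, idle_timeout=3600.0, clock=time.monotonic):
        self.name = name
        self.idle_timeout = idle_timeout  # seconds; hypothetical value
        self.clock = clock                # injectable so tests can fake time
        self.running = False
        self.last_request = None
        self.starts = 0                   # how many times a container was started

    def handle_request(self, path):
        """Serve a request, starting a container first if none is running."""
        if not self.running:
            self.running = True           # stand-in for real provisioning
            self.starts += 1
        self.last_request = self.clock()
        return f"{self.name} served {path}"

    def reap_if_idle(self):
        """Stop the container if no request arrived within idle_timeout.

        Returns True if the container was stopped by this call."""
        if self.running and self.clock() - self.last_request >= self.idle_timeout:
            self.running = False
            return True
        return False
```

A periodic job would call `reap_if_idle()` on each dev environment. The next page request then brings the container back transparently, which is only practical because provisioning takes seconds rather than minutes.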

The Runtime Matrix allows Pantheon to rewrite the rules of what's possible for website infrastructure. It's key to how we operate the single largest Drupal platform in the world.