Stateless Clusters: Ephemeral In Production

Every service in the cloud or piece of cloud infrastructure we create is nothing more than an API call.

Whether you’re using Terraform, Pulumi, or an SDK, under the hood you’re making an API call to the particular set of APIs that make managing/creating services/infrastructure possible.
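To make that concrete, here’s a minimal sketch in Python. Everything in it is made up for illustration: `FakeCloudAPI`, the `/clusters` path, and both helper functions are hypothetical stand-ins, not any real provider’s API or any real tool’s internals. The point is only that a declarative tool and an SDK-style call can end in the exact same request.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCloudAPI:
    """Hypothetical in-memory stand-in for a cloud provider's HTTP API."""
    calls: list = field(default_factory=list)

    def request(self, method: str, path: str, body: dict) -> dict:
        # Record the call so we can compare what each tool actually sent.
        self.calls.append((method, path, body))
        return {"status": 201, "resource": body}


def declarative_tool(api: FakeCloudAPI, config: dict) -> None:
    """Mimics an IaC tool: walks a config and turns each entry into an API call."""
    for name, spec in config.items():
        api.request("POST", "/clusters", {"name": name, **spec})


def sdk_client(api: FakeCloudAPI, name: str, nodes: int) -> None:
    """Mimics an SDK: a function call that wraps the same API call."""
    api.request("POST", "/clusters", {"name": name, "nodes": nodes})


iac_api, sdk_api = FakeCloudAPI(), FakeCloudAPI()
declarative_tool(iac_api, {"demo": {"nodes": 3}})
sdk_client(sdk_api, "demo", 3)

# Both paths end in the identical request.
print(iac_api.calls == sdk_api.calls)
```

The tooling differs in how you describe what you want, not in what eventually reaches the provider.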

The thing is, many engineers only feel comfortable with a few of these approaches because they’re worried about “state”, but they shouldn’t be.

In this blog post, you’ll learn what state is, why it’s considered important in today’s world, and why you can stop caring about it.

What’s State?

“State” has three primary purposes:

  1. Performance for infrastructure (it’s almost like a cache)
  2. Metadata
  3. Versioning

Source: https://getcoal.medium.com/terraform-the-importance-of-the-state-file-bc95d6e94180

When you ask people what they care about, it’s primarily number 2. Engineers know that infrastructure is fragile, that there’s a lot to configure for it to run properly, and that it takes a long time to get up and running.

The goal of state is to keep something close to a copy of your environment (this is the metadata piece). That way, if you ever need to deploy it again, you don’t have to go through the hassle that you did the first time.
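Here’s a rough sketch of how those three purposes can show up in practice. The state layout below (a `serial` counter plus a `resources` map) and the `plan`/`apply` helpers are assumptions made up for this example, not the format of Terraform or any other tool.

```python
import json


def plan(desired: dict, state: dict) -> dict:
    """Diff the desired config against recorded state, without querying the cloud.

    Reading the local record instead of the provider is the cache-like part.
    """
    recorded = state["resources"]
    return {
        "create": sorted(desired.keys() - recorded.keys()),
        "delete": sorted(recorded.keys() - desired.keys()),
        "update": sorted(
            name for name in desired.keys() & recorded.keys()
            if desired[name] != recorded[name]
        ),
    }


def apply(desired: dict, state: dict) -> dict:
    """Return new state: a copy of the environment (metadata) with a bumped serial (versioning)."""
    # The JSON round trip is a simple deep copy, so later edits to
    # `desired` do not silently change what the state remembers.
    return {"serial": state["serial"] + 1, "resources": json.loads(json.dumps(desired))}


state = {"serial": 0, "resources": {}}
desired = {"cluster": {"nodes": 3}, "network": {"cidr": "10.0.0.0/16"}}

first_plan = plan(desired, state)   # nothing recorded yet, so everything is created
state = apply(desired, state)

desired["cluster"]["nodes"] = 5
second_plan = plan(desired, state)  # only the cluster differs from the record

print(first_plan)
print(second_plan)
```

Notice that the recorded copy is only useful as long as it matches reality, which is exactly why people get nervous about it.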

It’s a great concept because, as anyone who’s ever managed infrastructure knows, it’s delicate. There’s a reason why hot/hot, hot/cold, and VM migrations exist: every once in a while, despite best efforts, infrastructure will go down. It’s no different in the cloud. Even if infrastructure/services have a better chance of staying up in the cloud, regions do go down, which means you need a method of high availability, migration, or state. That way, you can re-deploy where necessary.

The question is “why?”

Why Clusters Are Treated So Delicately

Over the decades, we’ve seen a healthy timeline of change and innovation when it comes to computers.

Room-sized computers → Refrigerator-sized computers → Servers no bigger than a desktop → Virtualized servers → Containers.

The key takeaway from this timeline is that compute has become more efficient to use, smaller in form factor, and, perhaps most importantly in the world we live in, manageable with an API call.

A lot of mindsets have changed since the early days of computing, but one thing hasn’t: compute/servers/clusters are still handled as delicately as they were when a computer took up an entire room.

Here’s a question for everyone to ponder: if infrastructure/clusters are available at almost a snap of your fingers (aside from the time it takes to deploy them) and it can all be done in an automated, programmatic fashion, why do we still care so much about them? If the cloud (despite opinions) gave engineers the ability to spin up compute whenever they want, wherever they want, however they want, why are we still treating it like a delicate flower that we don’t want to break?

Sure, there are a few reasons that come to mind: delicate networks for large organizations, third-party tools/add-ons (Argo, Istio, etc.) that are necessary for some organizations, and perhaps some storage concerns. But none of these depend on the particular infrastructure/clusters that are running. They’re all implementation details that depend on outside forces.

That’s why clusters should be treated like containers.

Clusters == Containers

One of the greatest things that containers gave us was the ability to decouple pieces of the application stack. Because the application stack was split up (some call it microservices) and scalable (into multiple containers/Pods), it theoretically no longer mattered if the container/Pod stayed up. If one goes down, a new one comes up (assuming that you have proper resource optimization in place).
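The “if one goes down, a new one comes up” behavior boils down to a loop that compares what’s running with what you asked for. Below is a toy sketch of that idea; the `reconcile` function and the `pod-N` naming are invented for illustration and are not how any real orchestrator is implemented.

```python
import itertools

# Hypothetical ID generator so every replacement gets a fresh name.
_ids = itertools.count(1)


def reconcile(running: set, desired_replicas: int) -> set:
    """Return a copy of `running` topped up (or trimmed) to the desired count."""
    running = set(running)
    while len(running) < desired_replicas:
        running.add(f"pod-{next(_ids)}")
    while len(running) > desired_replicas:
        running.pop()
    return running


pods = reconcile(set(), desired_replicas=3)   # starts pod-1, pod-2, pod-3
pods.remove("pod-2")                          # one Pod goes down
pods = reconcile(pods, desired_replicas=3)    # a new one comes up

print(sorted(pods))  # ['pod-1', 'pod-3', 'pod-4']
```

Nobody tries to revive `pod-2`. The only thing that matters is that the count matches what was requested.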

How is this done?

In short, application stacks have their static configs embedded into container images. Any other configuration that’s needed outside of the image (for example, a config that changes and isn’t static) can be pushed in at runtime with a dynamic config.
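As a small sketch of that layering, assume the image ships a set of defaults and the runtime overrides them through environment variables. The keys and values in `IMAGE_DEFAULTS` are hypothetical, and environment variables are just one common way to push config in at runtime.

```python
import os

# Static config that would be baked into the container image (hypothetical keys).
IMAGE_DEFAULTS = {"LOG_LEVEL": "info", "PORT": "8080", "DB_HOST": "localhost"}


def load_config(env=None) -> dict:
    """Start from the image defaults, then let runtime values override them."""
    env = os.environ if env is None else env
    config = dict(IMAGE_DEFAULTS)
    for key in IMAGE_DEFAULTS:
        if key in env:
            config[key] = env[key]
    return config


# Simulate the runtime injecting one dynamic value.
config = load_config({"DB_HOST": "db.internal"})
print(config)
```

Because everything the container needs is either in the image or handed to it at start, any copy of it is as good as any other.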

If this can all be done at a container/container image level, it can be done at the infrastructure/compute/cluster level.
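Applied to clusters, the same idea might look like the sketch below: the spec plays the role of the image, and a cluster is something you rebuild from that spec instead of something you nurse back to health. `ClusterSpec`, `create_cluster`, and `replace_cluster` are hypothetical names for illustration, and the actual provisioning API call is left out.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterSpec:
    """Everything needed to build the cluster, like an image plus its config."""
    name: str
    nodes: int
    addons: tuple = ()


@dataclass(frozen=True)
class Cluster:
    id: str
    spec: ClusterSpec


def create_cluster(spec: ClusterSpec) -> Cluster:
    # A real version would make the provider API call here (omitted).
    return Cluster(id=uuid.uuid4().hex, spec=spec)


def replace_cluster(old: Cluster) -> Cluster:
    """Throw the old cluster away and build a fresh one from the same spec."""
    return create_cluster(old.spec)


original = create_cluster(ClusterSpec("demo", nodes=3, addons=("argo", "istio")))
rebuilt = replace_cluster(original)

print(rebuilt.spec == original.spec)  # same definition
print(rebuilt.id != original.id)      # different instance
```

The identity of any one cluster stops mattering once the spec is the thing you keep.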

Closing Thoughts

Every time there’s a mindset shift in the way that things are done in tech, everyone has doubts. This is normal, in tech and for humanity as a whole. In reality, no one likes changing what they’re comfortable doing.

But in some instances (particularly this one), change is good. You don’t have to spend your time managing IaC, pipelines, and deployment times. Infrastructure/clusters/compute can be ephemeral.

Michael Levan