Luckily, the cloud did provide engineers and organizations with all of this, at least compared to its predecessor: spinning up your own data center.
Unfortunately, even though the tools and best practices exist, cloud costs still end up getting out of hand.
In this article, youʼll learn a few key methodologies for how you can implement a good cost optimization practice for your environment and what tools are at your disposal.
The cloud can be incredibly expensive for one primary reason - itʼs easy to use.
Letʼs take a step back and discuss what “easy to use” actually means here, as it can mean many things when talking about the cloud.
In the days when only data centers existed to run workloads, there were a few steps needed to run applications. Letʼs talk about them at a high level. First, you needed an actual facility to run the application stacks, which includes the need for proper ventilation and power. Next, you had to obtain servers. Buying servers meant contacting a reseller for the brand you wanted to use (Dell, HP, etc.), negotiating a price, and then waiting (if you were lucky and there wasnʼt a backorder) 4-8 weeks for the server to be delivered. After it arrived, you had to rack it, power it up, install and configure an operating system, and configure the networking for the server.
Now letʼs compare that to the cloud - you can click a few buttons and have a server up and running. Boom, done. You could even get fancy and run a command on the terminal to get a server up and running.
Compared to a data center, getting a cloud environment deployed takes maybe 1-2% of the effort.
Sounds good, right? Well, thatʼs actually one of the main reasons why the cloud is so expensive. Itʼs incredibly easy to spin up multiple environments across multiple regions using various instance sizes. Month to month, that adds up. Thereʼs also the fact that, letʼs face it, environments and services get spun up and completely forgotten about, yet organizations still have to pay for them.
With the previous section in mind while youʼre frantically logging into your cloud environment to check what services you forgot about (donʼt worry, Iʼve done the same thing), letʼs discuss a few methods to optimize cloud environments.
First, thereʼs alerting. A key part of cloud cost optimization is setting billing alerts at specific spend thresholds. For example, you can set an alert that emails someone if spend goes over $20,000 USD.
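In AWS, for example, one way to wire this up is with AWS Budgets. Hereʼs a minimal CloudFormation sketch (the budget name and email address are placeholders) that emails someone once actual monthly spend crosses $20,000:

```yaml
# Minimal AWS Budgets alert sketch via CloudFormation.
# The budget name and email address are placeholders.
Resources:
  MonthlySpendAlert:
    Type: AWS::Budgets::Budget
    Properties:
      Budget:
        BudgetName: monthly-spend-alert
        BudgetType: COST
        TimeUnit: MONTHLY
        BudgetLimit:
          Amount: 20000
          Unit: USD
      NotificationsWithSubscribers:
        - Notification:
            NotificationType: ACTUAL          # alert on real spend, not forecasted spend
            ComparisonOperator: GREATER_THAN
            Threshold: 100                    # 100% of the $20,000 budget
            ThresholdType: PERCENTAGE
          Subscribers:
            - SubscriptionType: EMAIL
              Address: finops@example.com     # placeholder address
```

Azure and GCP have equivalent budget-alert features, so the same pattern applies regardless of provider.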
Second, youʼll want to make sure you completely understand the performance profile of what youʼre running. If youʼre running an application stack thatʼs expected to have 30,000 users per day, you can run performance benchmarks with something like virtual users to estimate how much the environment will cost you, which also tells you how many resources (CPU, memory, GPU, storage) are necessary to run your workloads. A great open-source tool for conducting performance-based tests is Vegeta.
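As a rough sketch of how a Vegeta run might look (the endpoint, request rate, and duration are placeholder values youʼd tune to your own expected traffic):

```bash
# Fire a constant request rate at a hypothetical endpoint for one minute,
# then print latency and success-rate statistics.
echo "GET https://app.example.com/api/health" | \
  vegeta attack -rate=50 -duration=60s | \
  vegeta report
```

Watching latency and error rates as you push the rate higher shows you where the environment is over- or under-provisioned, which feeds directly into right-sizing decisions.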
The third topic is implementing a tool for cost and resource optimization. There are plenty of options out there, ranging from open-source projects to enterprise-grade products. The goal here is to optimize your environment without hindering performance. A big piece of the resource optimization puzzle is scalability. Using tools like Karpenter for Kubernetes cluster autoscaling (scaling nodes up and down) and KEDA for application autoscaling can potentially save you thousands.
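To make the application-autoscaling piece concrete, hereʼs a minimal KEDA sketch (the Deployment name and thresholds are hypothetical) that scales a workload between 1 and 20 replicas based on CPU utilization:

```yaml
# Minimal KEDA ScaledObject sketch; the target Deployment name and the
# thresholds are placeholders, not values from a real environment.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: orders-api-scaler
spec:
  scaleTargetRef:
    name: orders-api          # hypothetical Deployment to scale
  minReplicaCount: 1
  maxReplicaCount: 20
  triggers:
    - type: cpu
      metricType: Utilization
      metadata:
        value: "70"           # scale out when average CPU crosses ~70%
```

KEDA also supports queue, Prometheus, and cron-based triggers, so the same pattern covers event-driven scaling as well as CPU-bound workloads.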
Aside from these three topics, thereʼs also hybrid cloud (if you already have a data center) and application modernization.
Thereʼs been some talk of cloud repatriation, which is the act of moving workloads from cloud-based environments back on-premises. For some organizations, this works. For others, itʼll be a headache. The long and short of it is that if you donʼt already have a data center and the systems to run it, thereʼs no point in moving off of the cloud.
However, if you have a data center and want to use it along with the cloud, you can adopt a hybrid cloud strategy. Hybrid cloud is when certain workloads run on-prem, certain workloads run in the cloud, and thereʼs connectivity between the two - for example, a database running on-prem while the application stack that uses it runs in the cloud.
Aside from security, compliance, and performance needs for running workloads on-prem, you can also save costs by using infrastructure that already exists. If you have unused servers and the in-house engineers who can build a scalable application stack on those servers instead of spending money in the cloud each month to do the same thing, thereʼs big potential for cost savings.
It turns out that running legacy applications in a cloud-native-centric world can be quite pricey - not just the infrastructure needed to run them, but also the people needed to keep them running as expected and the corners cut to keep those applications going while the technology world moves ahead and the application stack stays behind.
Upfront costs of app modernization may be a bit higher, but overall costs will go down later. For example, letʼs say you have a .NET app running on a few servers that canʼt move anywhere else because itʼs not optimized. If you converted that application to .NET Core and ran it in, say, a serverless function, you could potentially get better performance, scalability, and optimization. There will be a heavy upfront cost to convert it from .NET to .NET Core and prepare it to run in a serverless function, but the end state will be cheaper.
All of these solutions are key to a successful cloud optimization strategy, but in hindsight, it all really boils down to proper resource and cost optimization. Luckily, there are a lot of tools and products in this space that can help. Letʼs look at three of them: Cast.ai, Codiac, and Karpenter.
A vendor that started early on in the resource optimization realm and has grown since is Cast.ai.
With Cast.ai, you can handle just about every aspect of resource and cost optimization.
Youʼll see options ranging from saving money to scaling workloads to database optimization.
Within Codiac, a platform for managing Kubernetes workloads, thereʼs a feature called Zombie Mode which gives you zero-node capabilities.
Within your environments, youʼll see a little zombie icon.
When you open it up, youʼll see a calendar-like UI.
You can then, from a graphical perspective, specify the days and times that you want zero nodes running in your Kubernetes cluster.
When it comes to scaling nodes in the realm of Kubernetes, Karpenter has been a de facto standard in the AWS space and is now available in preview for Azure.
Within a Kubernetes manifest using Karpenterʼs NodePool custom resource, you can specify which node types, instance families, sizes, and limits are available to your cluster.
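As a rough illustration, hereʼs a minimal NodePool sketch (the API version, names, and limits are placeholders) that blocks AWS M-class instances:

```yaml
# Minimal Karpenter NodePool sketch; names and limits are placeholders.
# The NotIn requirement excludes the AWS "m" instance category.
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: karpenter.k8s.aws/instance-category
          operator: NotIn
          values: ["m"]                       # no M-class instances allowed
      nodeClassRef:
        apiVersion: karpenter.k8s.aws/v1beta1
        kind: EC2NodeClass
        name: default                         # hypothetical EC2NodeClass
  limits:
    cpu: "100"                                # cap total provisioned CPU
```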
You can then apply the configuration to ensure that any nodes provisioned in your cluster follow these requirements. For example, per the NodePool above, the cluster cannot use M-class instance types in AWS. This lets you control and specify exactly how workloads can run.
Ultimately, cost optimization in the cloud for 2025 will come down to understanding the workloads running in your environment, knowing the performance needs based on how many people or systems access the application, implementing proper resource optimization based on demand predictions, and updating those predictions when necessary.
If you’re looking for an easier way to manage Kubernetes workloads, try Codiac for free—no business email required to sign up, so you can avoid spam.