The promise of cloud computing was simple: pay for what you use. Elastic scaling. No more buying hardware that sits idle. The reality for most companies? They're paying for 2-3x the resources they actually consume. Gartner estimates that 30% of cloud spend is wasted — and from what we see in practice, that number is conservative. The problem isn't the cloud itself. The problem is how it's configured, sized, and managed.
This article breaks down where cloud money disappears, the mistakes that compound costs, and the specific techniques that actually reduce spend without sacrificing performance or reliability.
Why Cloud Bills Spiral Out of Control
Cloud cost overruns don't happen overnight. They accumulate through a series of small, reasonable-sounding decisions that compound over months. Here's how it typically plays out:
- Provisioned for peak, running 24/7. Your application handles 500 concurrent users during business hours and 20 at 3 AM. But your infrastructure runs at peak capacity around the clock. For the 14-16 off-peak hours each day, you're paying for up to 25x more compute than you need.
- No auto-scaling configured. Fixed instance sizes mean you're always over-provisioned. A t3.xlarge running at 8% average CPU utilization is burning money. But nobody set up auto-scaling because "it's complicated" or "we'll get to it."
- Oversized instances "just in case." When in doubt, engineers pick the bigger instance. A 4-vCPU machine when 2 would suffice. 16 GB of RAM when the application uses 3 GB. This instinct is understandable — nobody wants to get paged at 2 AM — but it's expensive without data to back it up.
- Unused EBS volumes and snapshots accumulating. Every terminated instance can leave behind orphaned EBS volumes. Every snapshot taken "just in case" before a deployment stays forever if nobody cleans it up. We've seen accounts with 40+ TB of snapshots dating back years, costing hundreds per month for data nobody will ever restore.
- No reserved capacity planning. On-demand pricing is the most expensive way to run cloud infrastructure. It exists for burst capacity. Running baseline workloads on-demand is like renting a car daily instead of leasing — you pay a 40-60% premium for flexibility you're not using.
- Dev and staging at production specs. Your staging environment doesn't need a multi-AZ RDS deployment with 64 GB of RAM. Your dev environment doesn't need 3 Elasticsearch nodes. But they're often carbon copies of production because that's how the Terraform was written.
- No tagging strategy = no cost attribution. If you can't attribute costs to teams, projects, or environments, you can't manage them. "The cloud bill went up 20%" is useless. "The data pipeline team's staging Elasticsearch cluster costs €800/month and nobody's using it" is actionable.
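Once cost-allocation tags are in place, attribution becomes a query. As a minimal sketch (assuming boto3 is available, credentials are configured, and a `team` cost-allocation tag has been activated in the billing console), the helper below totals spend per team from a Cost Explorer response:

```python
from collections import defaultdict

def totals_by_group(ce_response):
    """Sum unblended cost per group from a Cost Explorer
    get_cost_and_usage response grouped by a tag key."""
    totals = defaultdict(float)
    for period in ce_response["ResultsByTime"]:
        for group in period["Groups"]:
            key = group["Keys"][0]  # e.g. "team$data-pipeline"
            totals[key] += float(group["Metrics"]["UnblendedCost"]["Amount"])
    return dict(totals)

# With boto3, the response above would come from something like:
#   ce = boto3.client("ce")
#   resp = ce.get_cost_and_usage(
#       TimePeriod={"Start": "2024-01-01", "End": "2024-02-01"},
#       Granularity="MONTHLY",
#       Metrics=["UnblendedCost"],
#       GroupBy=[{"Type": "TAG", "Key": "team"}],
#   )
```

The tag key and date range are illustrative; the point is that a grouped query turns "the bill went up 20%" into a per-team number you can act on.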
Common Mistakes We See Repeatedly
Beyond the structural issues, there are specific technical mistakes that inflate cloud bills:
Assuming Bigger Instances = Better Performance
This is the most common misconception. A c5.4xlarge won't make your application faster if the bottleneck is a poorly optimized database query that full-scans a 50-million-row table. Throwing compute at an I/O-bound or code-level problem is the cloud equivalent of buying a faster car when you're stuck in traffic. As we covered in our guide to improving server performance, the bottleneck is rarely where you assume it is — profiling before scaling saves both time and money.
Not Analyzing Actual Utilization
Most cloud instances run at 10-30% average CPU utilization. That's not an opinion — that's what AWS's own data shows, and it matches what we observe across client accounts. If your m5.2xlarge averages 12% CPU and 4 GB of 32 GB RAM used, you're paying for an 8-vCPU machine to do the work of a 2-vCPU machine. CloudWatch, Datadog, or even basic sar output over 30 days will tell you exactly what you need.
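Pulling that utilization data takes a few lines. Here's a rough sketch (boto3 and credentials assumed; the instance ID is hypothetical) that fetches 30 days of hourly CPU averages from CloudWatch and summarizes them:

```python
from datetime import datetime, timedelta, timezone

def average_cpu(datapoints):
    """Mean of CloudWatch 'Average' datapoints (CPUUtilization, percent)."""
    values = [dp["Average"] for dp in datapoints]
    return sum(values) / len(values) if values else 0.0

# With boto3 and credentials configured, 30 days of hourly averages for a
# (hypothetical) instance come from:
#   cw = boto3.client("cloudwatch")
#   end = datetime.now(timezone.utc)
#   resp = cw.get_metric_statistics(
#       Namespace="AWS/EC2",
#       MetricName="CPUUtilization",
#       Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
#       StartTime=end - timedelta(days=30),
#       EndTime=end,
#       Period=3600,
#       Statistics=["Average"],
#   )
#   print(average_cpu(resp["Datapoints"]))
```

Note that CloudWatch does not report memory utilization out of the box; that requires the CloudWatch agent or a tool like Datadog.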
Snapshot and Volume Hoarding
EBS snapshots are incremental, so engineers assume they're cheap. Individually, yes. But 200 daily snapshots across 15 volumes over 2 years? That adds up to terabytes of storage at $0.05/GB/month. Without lifecycle policies, this only grows. We routinely find $200-500/month in orphaned snapshots during audits.
Over-Tiered Managed Services
Running RDS on a db.r5.2xlarge with Multi-AZ and provisioned IOPS when your database is 20 GB and handles 50 queries per second? That's €1,200/month for a workload that would run fine on a db.t3.large at €150/month. The same applies to ElastiCache, OpenSearch, and every other managed service — start with the data, not with the tier.
Paying On-Demand for Predictable Workloads
If a workload has been running continuously for 6 months, it's predictable. Reserved Instances or Savings Plans save 40-60% over on-demand pricing for 1-3 year commitments. Even the no-upfront, 1-year convertible reserved instance option saves 30%. There is no reason to pay on-demand rates for baseline infrastructure.
What Actually Works
Cost optimization isn't about cutting corners. It's about matching resources to actual demand. Here's what produces real savings:
Right-Sizing Based on Data
Pull 30 days of CPU, memory, network, and disk utilization metrics. Not peaks — averages, P95, and P99. If your P99 CPU is 40% on a 4-vCPU instance, a 2-vCPU instance handles it with headroom. AWS Compute Optimizer and third-party tools like Spot.io, CloudHealth, or Vantage automate this analysis. The key: make decisions from data, not assumptions.
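The decision logic is simple enough to write down. As an illustrative sketch (the 20% headroom factor and power-of-two sizing are assumptions, not a standard), this converts a P99 utilization figure into a suggested vCPU count:

```python
import math

def percentile(values, p):
    """Nearest-rank percentile (p in 0-100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

def suggest_vcpus(cpu_samples, current_vcpus, headroom=0.2):
    """Smallest power-of-two vCPU count covering P99 CPU demand plus
    headroom. cpu_samples are utilization percentages on the current size."""
    p99 = percentile(cpu_samples, 99)
    # Convert utilization-of-current-instance into absolute vCPU demand.
    needed = (p99 / 100) * current_vcpus * (1 + headroom)
    size = 1
    while size < needed:
        size *= 2
    return size
```

With P99 CPU at 40% on 4 vCPUs, demand is 1.6 vCPUs; with 20% headroom that fits on a 2-vCPU instance, matching the intuition above.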
Auto-Scaling That Matches Real Traffic
Configure auto-scaling policies based on actual traffic patterns. If your application sees 80% of traffic between 8 AM and 6 PM, scale down aggressively overnight. Use target tracking policies on the metrics that matter — CPU, request count, or custom application metrics. A well-configured auto-scaling group can cut compute costs by 50-60% for workloads with clear peak/off-peak patterns.
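A target-tracking policy can be expressed as a small piece of configuration. The sketch below builds the arguments for one that holds average ASG CPU near a target (the ASG name and 50% target are placeholders; apply with boto3 against an existing group):

```python
def target_tracking_policy(asg_name, target_cpu=50.0):
    """Arguments for a target-tracking scaling policy that holds the
    auto-scaling group's average CPU near target_cpu percent."""
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": f"{asg_name}-cpu-target",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization"
            },
            "TargetValue": target_cpu,
        },
    }

# Applied with boto3 (credentials and an existing ASG assumed):
#   autoscaling = boto3.client("autoscaling")
#   autoscaling.put_scaling_policy(**target_tracking_policy("web-asg"))
```

Target tracking handles both scale-out and scale-in automatically; for request-count or custom application metrics, swap the predefined metric specification accordingly.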
Reserved Instances for Baseline + On-Demand for Peaks
Analyze your minimum sustained usage over the past 6-12 months. Buy reserved capacity for that baseline. Everything above it runs on-demand or spot. For example: if you always run at least 4 instances and burst to 12 during peaks, reserve the 4 and auto-scale the rest on-demand. The 4 reserved instances cost 40-60% less than on-demand.
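The baseline/burst split above is easy to compute from an hourly instance-count series. A minimal sketch:

```python
def baseline_split(hourly_counts):
    """Split an hourly instance-count series into the reservable baseline
    (minimum sustained count) and the burst instance-hours above it."""
    baseline = min(hourly_counts)
    burst_hours = sum(count - baseline for count in hourly_counts)
    return baseline, burst_hours
```

For the 4-to-12-instance example, the function returns a baseline of 4 (the reservation target) and the burst hours to cover on-demand or spot.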
Monthly Resource Cleanup
Schedule a monthly review to identify and remove: unattached EBS volumes, old snapshots beyond retention policy, unused Elastic IPs (they cost $3.65/month each when unattached), idle load balancers, and stale security groups pointing to terminated instances. Automate this with Lambda functions or use AWS Trusted Advisor.
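The snapshot and volume portion of that review can be scripted. A rough sketch (boto3 and credentials assumed; the 30-day retention window is an example, not a recommendation):

```python
from datetime import datetime, timedelta, timezone

def is_stale(start_time, retention_days, now=None):
    """True if a snapshot's StartTime is older than the retention window."""
    now = now or datetime.now(timezone.utc)
    return now - start_time > timedelta(days=retention_days)

# With boto3, unattached volumes and stale snapshots are found like:
#   ec2 = boto3.client("ec2")
#   orphans = ec2.describe_volumes(
#       Filters=[{"Name": "status", "Values": ["available"]}])["Volumes"]
#   snaps = ec2.describe_snapshots(OwnerIds=["self"])["Snapshots"]
#   stale = [s["SnapshotId"] for s in snaps if is_stale(s["StartTime"], 30)]
```

Review the output before deleting anything; snapshots backing registered AMIs, for instance, need the AMI deregistered first.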
Spot Instances for Batch Processing
Batch jobs, CI/CD pipelines, data processing, and any workload that can handle interruption should run on spot instances. Savings of 60-90% over on-demand. Use spot fleets with multiple instance types and availability zones to minimize interruption risk. We run all non-critical batch processing on spot — the savings are significant.
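Diversification is the key to keeping spot interruptions rare. As an illustrative sketch (the launch template ID, instance types, and subnet IDs are placeholders), the helper below builds overrides spanning every instance-type/subnet combination for an EC2 Fleet request:

```python
def spot_overrides(instance_types, subnet_ids):
    """Launch-template overrides spanning every instance type / subnet
    (i.e. availability zone) pair, to spread spot interruption risk."""
    return [
        {"InstanceType": t, "SubnetId": s}
        for t in instance_types
        for s in subnet_ids
    ]

# Used in an EC2 Fleet request via boto3 (hypothetical template/subnets):
#   ec2.create_fleet(
#       SpotOptions={"AllocationStrategy": "price-capacity-optimized"},
#       LaunchTemplateConfigs=[{
#           "LaunchTemplateSpecification": {
#               "LaunchTemplateId": "lt-0123abcd", "Version": "$Latest"},
#           "Overrides": spot_overrides(
#               ["m5.large", "m5a.large", "m6i.large"],
#               ["subnet-aaa", "subnet-bbb"]),
#       }],
#       TargetCapacitySpecification={
#           "TotalTargetCapacity": 4,
#           "DefaultTargetCapacityType": "spot"},
#       Type="maintain",
#   )
```

Three instance types across two zones gives the fleet six pools to draw from; losing capacity in one pool rarely interrupts the whole job.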
Separate Environment Sizing
Development environments don't need high availability. Staging doesn't need production-scale databases. Define environment tiers: production (full HA, right-sized), staging (single-AZ, smaller instances), development (minimal, shutdown overnight). This alone typically saves 30-40% of non-production costs.
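The overnight shutdown is a small scheduled job. A minimal sketch (the 07:00-19:00 UTC weekday window and the `env=dev` tag are assumptions; run it hourly from a Lambda or cron):

```python
def should_run(hour_utc, weekday):
    """Business-hours policy for dev instances: weekdays 07:00-19:00 UTC.
    weekday follows datetime.weekday(): Monday=0 ... Sunday=6."""
    return weekday < 5 and 7 <= hour_utc < 19

# Applied with boto3 against instances tagged env=dev:
#   ec2 = boto3.client("ec2")
#   ids = [i["InstanceId"]
#          for r in ec2.describe_instances(
#              Filters=[{"Name": "tag:env", "Values": ["dev"]},
#                       {"Name": "instance-state-name", "Values": ["running"]}]
#          )["Reservations"] for i in r["Instances"]]
#   now = datetime.now(timezone.utc)
#   if ids and not should_run(now.hour, now.weekday()):
#       ec2.stop_instances(InstanceIds=ids)
```

Stopping dev instances for 12 hours a night plus weekends removes roughly two-thirds of their running hours on its own.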
Cloud Repatriation for Predictable Workloads
This is the option nobody talks about. For workloads with stable, predictable resource needs — databases, application servers, file storage — dedicated managed infrastructure can be 50-70% cheaper than equivalent cloud resources. The economics are straightforward: a dedicated server with 32 cores and 128 GB RAM costs a fraction of an equivalent cloud instance running 24/7. If you're considering this path, we've documented how to migrate hosting without downtime — the process is less disruptive than most teams expect.
Real-World Scenario: From €8,000 to €2,400
Here's a case study from a SaaS company we worked with. Their monthly cloud bill: €8,000. Monthly active users: ~5,000. The application was a standard three-tier architecture — web servers, API layer, PostgreSQL database, Redis cache, and an async job processor.
What we found during the audit:
- 3 application instances oversized by 4x. Running m5.2xlarge (8 vCPU, 32 GB RAM) with average utilization of 12% CPU and 4 GB RAM. Right-sized to t3.medium (2 vCPU, 4 GB RAM) with auto-scaling.
- 2 instances completely unused. A legacy staging deployment and a "temporary" analytics server from 8 months ago. Both terminated.
- No reserved capacity. Everything on-demand. Purchased 1-year savings plans for baseline compute.
- RDS on db.r5.xlarge Multi-AZ. Database was 15 GB. Downsized to db.t3.large single-AZ with automated backups (Multi-AZ is not a backup strategy for a 5,000-user SaaS — automated snapshots plus a promotion runbook is sufficient).
- 800 GB of orphaned EBS snapshots. Cleaned up and implemented lifecycle policies.
Result after the cloud optimization phase: €3,200/month, a 60% reduction. Same performance and reliability for the actual workload profile, with zero user-facing impact.
Phase two: migrated the predictable workloads — database, application servers, Redis — to managed dedicated infrastructure. Kept the cloud for auto-scaling burst capacity and the async job processor on spot instances.
Final result: €2,400/month total. Better performance (dedicated CPU vs. shared tenancy), predictable billing, and 70% reduction from the original spend.
The Hidden Cost: Engineering Time
There's a cost that doesn't appear on your cloud invoice: engineering time spent managing cloud complexity. Every hour your developers spend debugging IAM policies, analyzing cost allocation reports, configuring auto-scaling, or investigating why an instance type was deprecated is an hour not spent building product.
Cloud providers offer hundreds of services, thousands of configuration options, and a pricing model that requires a spreadsheet to understand. The cognitive overhead is real. A senior engineer spending 5-10 hours per month on cloud operations represents €2,000-4,000 in loaded cost — sometimes more than the infrastructure itself.
This is where a managed infrastructure partner changes the equation. Instead of your engineers learning the intricacies of reserved instance marketplace pricing or debugging VPC peering configurations, you get a predictable monthly cost and a team that optimizes continuously. Your engineers ship features. Infrastructure engineers handle infrastructure.
This applies to performance work too. When scaling web applications, the infrastructure decisions compound — getting them right from the start avoids expensive re-architecture later.
Start With an Audit
If your cloud bill surprises you every month, something is wrong with your architecture. Not catastrophically wrong — the kind of wrong that accumulates through hundreds of small decisions made without utilization data.
The fix starts with visibility: what are you running, what does it actually use, and what would the right-sized version cost? From there, the optimizations are straightforward — right-size, reserve, clean up, and evaluate whether cloud is even the right choice for every workload.
We run infrastructure audits that answer these questions with specifics, not generalities. No commitment, no sales pitch — just data on what you're spending and what you could be spending.
Request an infrastructure audit — we'll show you exactly where your cloud spend is going and what to do about it.