Cloud & Platform Engineering
−52% monthly AWS bill · ~$72k saved per year
A FinOps engagement that reduced a SaaS analytics platform's AWS bill from ~$11,500 to ~$5,500 per month — through right-sizing, non-prod automation, autoscaling, and Savings Plans. No product changes. No incidents. Permanent.
Challenge
The cloud bill was growing faster than the customer base. Instances were over-provisioned from an early growth phase, non-prod environments ran 24/7, and autoscaling was tuned to peak load rather than actual patterns. No one had done a systematic audit.
Approach
Four moves over six weeks: a full usage audit and right-sizing pass, automated shutdown schedules for non-prod, autoscaling tuned to real traffic patterns, and Savings Plans covering the stable compute baseline. Each move was validated against production metrics before proceeding.
Outcome
Monthly AWS spend dropped from ~$11,500 to ~$5,500 — a 52% reduction that held across subsequent months. Zero production incidents during the engagement. The savings are structural, not one-time: the right-sizing and autoscaling changes compound as usage grows.
The background
The client ran a SaaS analytics platform on AWS: EC2 for compute, RDS for the data layer, and an assortment of supporting services. The infrastructure had been provisioned aggressively during an early growth phase, with instance sizes chosen for anticipated load rather than measured usage. Actual usage had since stabilised, but the provisioning had never been revisited. Non-production environments ran continuously because no one had set up shutdown schedules. Autoscaling groups were sized for the peak traffic of the busiest quarter on record.
The result was a bill that tracked infrastructure age rather than business growth. The CTO had flagged it as a line item to fix, but there had never been a structured effort, because cost optimisation kept getting deprioritised against feature work. When the monthly bill crossed $11,000, the team decided to bring in outside help rather than try to fit the work into the engineering backlog.
The challenge
What was built
Each move was sequenced to be independently safe — no change required another to be in place first, and each was validated against production metrics before the next began.
The first move was a full analysis of EC2, RDS, and supporting service usage using Cost Explorer, CloudWatch metrics, and AWS Compute Optimizer recommendations. Over-provisioned instances were identified, tested at smaller sizes in staging, and progressively resized in production. EC2 costs dropped 30% from this step alone. RDS instances followed a similar pattern: several had been provisioned for a Multi-AZ active-active configuration that wasn't being used as designed.
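As a rough illustration of the right-sizing check described above, the sketch below flags an instance for a one-step downsize when its measured utilisation would still leave headroom on the smaller size. The size ladder, the 40% headroom figure, and the `InstanceUsage` fields are assumptions made for the example, not details from the engagement. It also assumes memory metrics are available, which on EC2 normally requires the CloudWatch agent.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical size ladder within one instance family. Each step up
# doubles vCPU and memory, so one step down roughly doubles utilisation.
SIZE_LADDER = ["large", "xlarge", "2xlarge", "4xlarge"]


@dataclass
class InstanceUsage:
    instance_id: str
    size: str               # position on SIZE_LADDER, e.g. "2xlarge"
    p95_cpu_percent: float  # p95 CPU utilisation over the audit window
    p95_mem_percent: float  # p95 memory utilisation over the audit window


def recommend_size(usage: InstanceUsage,
                   headroom_percent: float = 40.0) -> Optional[str]:
    """Return the next size down if it still leaves headroom, else None."""
    idx = SIZE_LADDER.index(usage.size)
    if idx == 0:
        return None  # already the smallest size on the ladder
    limit = 100.0 - headroom_percent
    projected_cpu = usage.p95_cpu_percent * 2
    projected_mem = usage.p95_mem_percent * 2
    if projected_cpu <= limit and projected_mem <= limit:
        return SIZE_LADDER[idx - 1]
    return None


if __name__ == "__main__":
    candidate = InstanceUsage("i-example", "2xlarge", 20.0, 25.0)
    print(recommend_size(candidate))
```

A check like this only produces candidates. In the engagement, each candidate was still tested at the smaller size in staging before production was resized.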
Lambda functions triggered by EventBridge schedules shut down development and staging environments at 7 PM and restarted them at 8 AM on working days. Weekends went dark entirely unless a specific override was set. The shutdown logic was defined in Terraform, applied to all non-prod accounts, and backed by CloudWatch alarms that fired if an environment failed to shut down on schedule. This change alone recovered ~$1,200/month.
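The schedule rule itself is simple enough to express as a pure function. The sketch below is a minimal illustration of the decision such a Lambda might make, not the client's code. The function name and the `override` flag are hypothetical, `now` is assumed to be in the team's local time, and public holidays are not modelled.

```python
from datetime import datetime

STOP_HOUR = 19   # 7 PM, the shutdown time described above
START_HOUR = 8   # 8 AM, the restart time described above


def should_be_running(now: datetime, override: bool = False) -> bool:
    """Decide whether a non-prod environment should be up at `now`."""
    if override:
        return True   # an explicit override keeps the environment up
    if now.weekday() >= 5:
        return False  # Saturday (5) and Sunday (6) stay dark
    return START_HOUR <= now.hour < STOP_HOUR


if __name__ == "__main__":
    print(should_be_running(datetime(2024, 1, 8, 9, 0)))   # a Monday morning
    print(should_be_running(datetime(2024, 1, 6, 12, 0)))  # a Saturday
```

Under this schedule an environment runs 11 hours on each of 5 days, or 55 of the 168 hours in a week, which is roughly a third of always-on runtime.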
Autoscaling groups had been configured with minimum instance counts that reflected worst-case historical traffic. CloudWatch metrics from the previous 90 days were used to build a realistic traffic profile. Minimum counts were reduced to actual trough load, scale-out thresholds were recalibrated against measured p95 latency rather than CPU percentage alone, and scale-in cooldown periods were shortened. The fleet now tracked real demand rather than a fear-of-outage cushion.
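The two calculations in this step, a minimum count derived from trough load and a scale-out trigger based on p95 latency, can be sketched as follows. This is an illustration under stated assumptions: the trough is taken as the 5th percentile of request rate, a floor of two instances is kept for redundancy, and the per-instance capacity and latency threshold are hypothetical inputs. None of these values come from the engagement.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile of a non-empty sequence."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def min_instances_for_trough(requests_per_sec: Sequence[float],
                             capacity_per_instance: float,
                             floor: int = 2) -> int:
    """Minimum group size that serves the trough load.

    The 5th percentile stands in for the trough so that a single
    near-zero sample does not drive the minimum down.
    """
    trough = percentile(requests_per_sec, 5)
    return max(floor, math.ceil(trough / capacity_per_instance))


def should_scale_out(latencies_ms: Sequence[float],
                     threshold_ms: float) -> bool:
    """Scale out when p95 latency over the window exceeds the threshold."""
    return percentile(latencies_ms, 95) > threshold_ms


if __name__ == "__main__":
    traffic = [100.0] * 5 + [400.0] * 95  # hypothetical 90-day samples
    print(min_instances_for_trough(traffic, capacity_per_instance=30.0))
```

Driving scale-out from latency rather than CPU alone ties the trigger to what users experience, at the cost of needing a reliable latency metric for every group.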
After right-sizing and autoscaling had been stable for two weeks, the consistent compute baseline (the capacity that ran at predictable levels regardless of traffic) was covered with a 1-year Compute Savings Plan. The commitment level was set conservatively at 80% of the measured stable baseline to preserve flexibility. The Savings Plan discount applied immediately to the matched usage, reducing what remained of the On-Demand compute spend by 35%.
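The commitment sizing reduces to a short calculation. The sketch below assumes the stable baseline is the lowest hourly compute spend seen in the observation window and applies the 80% coverage described above. The sample figures are hypothetical. The result is in dollars per hour because Savings Plans are purchased as an hourly spend commitment.

```python
from typing import Sequence


def stable_baseline(hourly_spend: Sequence[float]) -> float:
    """Lowest hourly compute spend in the window.

    Assumption for this sketch: the floor of the series is the capacity
    that runs regardless of traffic.
    """
    if not hourly_spend:
        raise ValueError("hourly_spend must not be empty")
    return min(hourly_spend)


def hourly_commitment(baseline_per_hour: float,
                      coverage: float = 0.80) -> float:
    """Commit to a fraction of the baseline, rounded to cents per hour."""
    return round(baseline_per_hour * coverage, 2)


if __name__ == "__main__":
    window = [6.20, 5.00, 7.50, 5.40]  # hypothetical dollars per hour
    print(hourly_commitment(stable_baseline(window)))
```

Committing below the measured floor means the plan stays fully used even if a later right-sizing pass shrinks the baseline further.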
Where the savings came from
Results