Stop paying for test environments at night

Date: Dec 15, 2024 · 3 min read

Rightsizing once is not enough. Schedule instance types by hour and save 30-60% on non-production workloads.

Tags: finops, aws, cloud, cost-optimization

Your staging environment runs on m5.xlarge. 24 hours a day. 7 days a week. Nobody uses it between 8pm and 8am. Nobody uses it on weekends.

You’re paying for peak capacity 168 hours a week when you need it for maybe 50.

Rightsizing is not a one-time thing

Most teams rightsize once. They look at CPU and memory usage, pick an instance that fits, and move on. After that, the instance size never changes.

But usage patterns change throughout the day. A test environment that needs 4 vCPUs during business hours might need 1 vCPU at night. An analytics cluster that processes data during the day sits idle until morning.

Static rightsizing ignores time. You optimize for the peak and overpay for everything else.

Schedule your instance types

Instead of one instance size, define two or three based on when people actually work:

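| Time             | Instance                   | Why                                                  |
|------------------|----------------------------|------------------------------------------------------|
| 8am-8pm weekdays | m5.xlarge (4 vCPU, 16 GiB) | Dev team active, CI running                          |
| 8pm-8am weekdays | m5.large (2 vCPU, 8 GiB)   | Maybe one person, minimal load (smallest M5 size)    |
| Weekends         | Stopped                    | Nobody working                                       |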

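A Lambda function runs at 8am and changes the instance type to xlarge. Another runs at 8pm and drops it to large. A third stops the instance on Friday night, and Monday’s 8am run starts it again. EC2 only lets you change the instance type while the instance is stopped, so every resize involves a short stop/start cycle.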
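The schedule table can also be written down as data. The following is a minimal sketch under some assumptions: `target_instance_type` and `weekly_hours` are hypothetical names, nights use m5.large because the M5 family has no medium size, and whoever calls it decides which time zone `now` is in. It returns the size the table calls for at a given moment and counts how many hours per week each state covers. That count is where the 60 xlarge hours and 60 off-hours used in the cost math come from.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Optional

# Assumed sizes taken from the schedule table; adjust to your workload.
BUSINESS_HOURS_TYPE = "m5.xlarge"
OFF_HOURS_TYPE = "m5.large"  # smallest size in the M5 family


def target_instance_type(now: datetime) -> Optional[str]:
    """Return the instance type the schedule calls for at `now`, or None for 'stopped'."""
    if now.weekday() >= 5:  # Saturday (5) or Sunday (6)
        return None
    if 8 <= now.hour < 20:  # weekday business hours
        return BUSINESS_HOURS_TYPE
    return OFF_HOURS_TYPE  # weekday nights


def weekly_hours(week_start: datetime) -> Counter:
    """Count how many hours of one week (starting at week_start) each schedule state covers."""
    return Counter(
        target_instance_type(week_start + timedelta(hours=h)) for h in range(168)
    )


if __name__ == "__main__":
    # Dec 16, 2024 is a Monday
    print(weekly_hours(datetime(2024, 12, 16)))
```

For a week starting on Monday at midnight, the count should be 60 hours on each size and 48 stopped hours.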

The automation

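AWS Lambda with an EventBridge schedule. Give the functions a timeout of a few minutes (Lambda’s default is 3 seconds) so the stop waiter can finish. The execution role also needs permission for ec2:StopInstances, ec2:StartInstances, ec2:ModifyInstanceAttribute and ec2:DescribeInstances: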

import boto3

ec2 = boto3.client('ec2')

INSTANCE_ID = 'i-1234567890abcdef0'  # replace with your instance ID

def resize(instance_type):
    # The instance type can only be changed while the instance is stopped.
    # The waiter polls until the stop completes, which can take minutes.
    ec2.stop_instances(InstanceIds=[INSTANCE_ID])
    ec2.get_waiter('instance_stopped').wait(InstanceIds=[INSTANCE_ID])

    ec2.modify_instance_attribute(
        InstanceId=INSTANCE_ID,
        InstanceType={'Value': instance_type}
    )
    ec2.start_instances(InstanceIds=[INSTANCE_ID])

def upsize(event, context):
    # 8am weekdays: business-hours size
    resize('m5.xlarge')

def downsize(event, context):
    # 8pm weekdays: m5.large is the smallest size in the M5 family (there is no m5.medium)
    resize('m5.large')
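One possible refinement, not part of the original setup, is to check the instance before restarting it. The handlers above always stop and start the instance, even when it is already the right size. The sketch below reads the current type and state with `describe_instances` first. It skips the restart when nothing needs to change, and it only starts the instance when it is stopped at the right size. `resize_if_needed` is a hypothetical helper name. The EC2 client is passed in so that the logic can be tested without AWS.

```python
# Placeholder instance ID from the example above; replace with your own.
INSTANCE_ID = "i-1234567890abcdef0"


def resize_if_needed(ec2, instance_id, target_type):
    """Ensure the instance runs as target_type, restarting it only when required.

    Returns "unchanged", "started" or "resized".
    """
    reservations = ec2.describe_instances(InstanceIds=[instance_id])["Reservations"]
    instance = reservations[0]["Instances"][0]

    if instance["InstanceType"] == target_type:
        if instance["State"]["Name"] == "stopped":
            # Right size but stopped, e.g. Monday morning after a weekend shutdown
            ec2.start_instances(InstanceIds=[instance_id])
            return "started"
        return "unchanged"  # nothing to do, so no needless restart

    # EC2 only allows a type change while the instance is stopped
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])
    ec2.modify_instance_attribute(
        InstanceId=instance_id, InstanceType={"Value": target_type}
    )
    ec2.start_instances(InstanceIds=[instance_id])
    return "resized"


def upsize(event, context):
    import boto3  # imported here so resize_if_needed() can be tested with a stub

    return resize_if_needed(boto3.client("ec2"), INSTANCE_ID, "m5.xlarge")


def downsize(event, context):
    import boto3

    # m5.large is the smallest size in the M5 family
    return resize_if_needed(boto3.client("ec2"), INSTANCE_ID, "m5.large")
```

Because the handlers return a short status string, the Lambda logs show whether each scheduled run actually restarted anything.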

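EventBridge rules (schedule expressions on EventBridge rules are evaluated in UTC, so shift the hours to match your time zone):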

# Upsize at 8am weekdays
ScheduleExpression: "cron(0 8 ? * MON-FRI *)"

# Downsize at 8pm weekdays
ScheduleExpression: "cron(0 20 ? * MON-FRI *)"
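The weekend shutdown needs a third function, and the two handlers above don't cover it. Below is a minimal sketch of that function. It assumes a hypothetical `stop_for_weekend` handler, the same placeholder instance ID, and a Friday 22:00 UTC rule, which leaves time for the 8pm downsize to finish first. No separate Monday "start" function is needed, because the 8am upsize already calls `start_instances`, and calling `stop_instances` on an instance that is already stopped doesn't fail.

```python
# Placeholder instance ID from the example above; replace with your own.
INSTANCE_ID = "i-1234567890abcdef0"


def stop_instances(ec2, instance_ids):
    """Stop the instances and return the IDs EC2 reports in StoppingInstances."""
    response = ec2.stop_instances(InstanceIds=instance_ids)
    return [item["InstanceId"] for item in response["StoppingInstances"]]


# Suggested EventBridge rule (UTC), after the 8pm downsize has finished:
#   ScheduleExpression: "cron(0 22 ? * FRI *)"
def stop_for_weekend(event, context):
    import boto3  # imported here so stop_instances() can be tested with a stub client

    return stop_instances(boto3.client("ec2"), [INSTANCE_ID])
```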

For GCP, use Cloud Scheduler with Cloud Functions. For Azure, use Azure Automation runbooks.

The math

m5.xlarge on-demand: ~$0.192/hour
m5.large on-demand: ~$0.096/hour
(us-east-1 Linux prices; m5.large is the smallest M5 size)

Running xlarge 24/7 for a month (~730 hours): ~$140

With scheduling:

  • 12h/day weekdays on xlarge: 60h × $0.192 = $11.52/week
  • 12h/day weekdays on large: 60h × $0.096 = $5.76/week
  • Weekends stopped: $0 in compute (attached EBS volumes are billed either way)

Monthly cost: $17.28/week × ~4.35 weeks ≈ $75

That’s about 46% savings on one instance. Scale that to a test environment with 10 instances and you’re saving roughly $650/month.
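The arithmetic can be checked with a few lines of Python. This sketch assumes us-east-1 Linux on-demand rates of $0.192/h for m5.xlarge and $0.096/h for m5.large (the smallest M5 size, since there is no m5.medium). It also assumes a 730-hour month, which is about 4.35 weeks. Both the 24/7 baseline and the scheduled cost then use the same month length. The function names are illustrative.

```python
# Assumed us-east-1 Linux on-demand rates (USD/hour); check current pricing for your region.
PRICES = {"m5.xlarge": 0.192, "m5.large": 0.096}

HOURS_PER_MONTH = 730                    # 8,760 hours a year / 12
WEEKS_PER_MONTH = HOURS_PER_MONTH / 168  # about 4.35


def always_on_monthly(instance_type):
    """Compute cost of running one instance type 24/7 for a month."""
    return PRICES[instance_type] * HOURS_PER_MONTH


def scheduled_monthly(weekly_hours):
    """Compute cost from {instance_type: running hours per week}.

    Stopped hours are simply absent. Attached EBS volumes are billed either way,
    so storage is left out of both sides of the comparison.
    """
    weekly = sum(PRICES[t] * hours for t, hours in weekly_hours.items())
    return weekly * WEEKS_PER_MONTH


if __name__ == "__main__":
    baseline = always_on_monthly("m5.xlarge")
    scheduled = scheduled_monthly({"m5.xlarge": 60, "m5.large": 60})
    saved = baseline - scheduled
    print(f"24/7: ${baseline:.2f}/month, scheduled: ${scheduled:.2f}/month")
    print(f"saved: ${saved:.2f} ({saved / baseline:.0%}) per instance, "
          f"${10 * saved:.0f} for 10 instances")
```

Run as a script, it should report about $140 versus $75 a month, roughly 46% saved per instance. Mixing a 4-week month with a 730-hour month skews the percentage, which is why both figures here use the same month length.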

Where this works

Test and staging environments: Nobody runs tests at 3am. Size down or stop entirely.

Analytics and reporting: If your dashboards update during business hours, the cluster can shrink at night.

ML training environments: Batch jobs run at scheduled times. No need for GPU instances sitting idle.

Dev databases: Developers work 9-5. The database doesn’t need production capacity at midnight.

Where this doesn’t work

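Production. Obviously. If you have global users, there are no “off hours”.

Anything with persistent connections or local state that can’t survive a restart. Stopping an instance also wipes instance store volumes and releases an auto-assigned public IP.

Databases with long startup times, where the restart window (a few minutes per resize) causes problems.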

Beyond instance types

Same principle applies to:

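Auto Scaling groups: Change min/max capacity with scheduled actions. 3 instances during the day, 1 at night.

RDS instances: Schedule instance class changes. db.r5.xlarge during business hours, db.r5.large (the smallest r5 class) at night. Each change causes a short outage.

EKS node groups: Scale node count based on time. With Karpenter, scale the workloads down on a schedule and let consolidation remove the emptied nodes.

Kubernetes workloads: Adjust HPA min replicas by schedule, for example with a CronJob that patches the HPA or with KEDA’s cron scaler.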
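The first three items map onto boto3 calls. The sketch below is minimal and makes several assumptions. It uses example sizes of 3 and 1 instances, a hypothetical group name, and hypothetical helper names. Clients are passed in so that the functions can be tested without AWS. Auto Scaling `Recurrence` uses five-field Unix cron syntax and runs in UTC unless you give it a `TimeZone`. For RDS, `ApplyImmediately=True` applies the class change now rather than waiting for the maintenance window. An EKS managed node group needs a `maxSize` of at least 1, even when scaled to zero nodes.

```python
def schedule_asg_capacity(autoscaling, group_name):
    """Create recurring scheduled actions: 3 instances on weekday mornings, 1 at night.

    Recurrence uses Unix cron syntax (minute hour day month weekday, 1-5 = Mon-Fri)
    and is evaluated in UTC unless a TimeZone is supplied.
    """
    for name, cron, size in [("business-hours", "0 8 * * 1-5", 3),
                             ("off-hours", "0 20 * * 1-5", 1)]:
        autoscaling.put_scheduled_update_group_action(
            AutoScalingGroupName=group_name,
            ScheduledActionName=name,
            Recurrence=cron,
            MinSize=size,
            MaxSize=size,
            DesiredCapacity=size,
        )


def set_rds_class(rds, db_instance_id, instance_class):
    """Change the DB instance class right away instead of at the next maintenance window.

    Expect a short outage while the change is applied.
    """
    rds.modify_db_instance(
        DBInstanceIdentifier=db_instance_id,
        DBInstanceClass=instance_class,
        ApplyImmediately=True,
    )


def scale_nodegroup(eks, cluster_name, nodegroup_name, size):
    """Resize an EKS managed node group (maxSize must be at least 1)."""
    eks.update_nodegroup_config(
        clusterName=cluster_name,
        nodegroupName=nodegroup_name,
        scalingConfig={"minSize": size, "maxSize": max(size, 1), "desiredSize": size},
    )
```

The Auto Scaling actions only need to be created once, with `schedule_asg_capacity(boto3.client("autoscaling"), "staging-asg")`. Auto Scaling then runs them on its own. The RDS and EKS helpers would instead be called from the same kind of scheduled Lambda as the EC2 resize.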

Start simple

Pick one non-production environment. Add two Lambda functions. Run it for a month.

Check Cost Explorer. Its data refreshes roughly once a day, so the drop shows up within a day or two.

Then expand to other environments. Add weekend shutdowns. Add holiday schedules.
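A holiday schedule can be as simple as a date check at the top of each handler. The sketch below uses a hypothetical `HOLIDAYS` set, which in practice you'd load from config or a calendar, and an illustrative `is_working_day` helper. Guard both the upsize and the downsize handler, because the original downsize also starts the instance after resizing. Inside Lambda, `date.today()` returns the UTC date.

```python
from datetime import date

# Hypothetical holiday list; in practice load it from config or a calendar.
HOLIDAYS = {date(2024, 12, 25), date(2025, 1, 1)}


def is_working_day(day, holidays=HOLIDAYS):
    """Return True on weekdays (Mon-Fri) that are not listed holidays."""
    return day.weekday() < 5 and day not in holidays
```

For example, `if not is_working_day(date.today()): return` at the start of `upsize` and `downsize` keeps the instance stopped through a holiday.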

The goal isn’t perfection. It’s to stop the obvious waste of paying for resources nobody uses.


Sofiane Djerbi

Cloud & Kubernetes Architect, FinOps Expert.
