The Month-Six Bill

Every few weeks, the same thread shows up on r/aws. Different company, different stack, same story: the bill was fine through launch. Fine through month two. Then it tripled somewhere around month five. The team is scrambling — cutting features, switching service tiers, sometimes whispering about a rewrite.

We have seen this happen often enough that it no longer surprises us. And the story teams tell about it is almost always wrong. Cloud costs don't explode at the bill. They explode at the architecture. The bill is just the lagging indicator.

Why the bill arrives late

Most teams design for functionality and reliability first and treat cost as something to check later. That works for the first few months: usage is low, growth is slow, and the bill matches the napkin estimate. Then traffic grows, features ship, and one or two services start scaling in ways the original architecture didn't account for. The bill stops tracking the estimate and starts multiplying.

The story that gets told is “we grew faster than expected.” But the truer story is: the architecture you shipped quietly priced itself at this level on day one. You just hadn't been billed for it yet.

What scales the bill with usage

A short list of the amplifiers we see over and over:

Logs. CloudWatch Logs retention defaults to never expire. Debug-level logging gets left on in production. By month five you're paying real money to store log lines nobody reads. Fix: explicit retention per log group (7–30 days for most), INFO-level logging in prod, and a one-time log-volume review before launch.
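A minimal sketch of the retention audit, assuming log-group records shaped like the entries boto3's describe_log_groups call returns, where a group that never expires simply has no retentionInDays key. The function name, the sample group names, and the 30-day ceiling are our own illustrative choices, not a prescribed policy.

```python
from typing import Iterable

MAX_RETENTION_DAYS = 30  # assumed ceiling, taken from the 7-30 day range above


def groups_needing_retention(log_groups: Iterable[dict],
                             max_days: int = MAX_RETENTION_DAYS) -> list[str]:
    """Return names of log groups that never expire or keep logs longer than max_days."""
    flagged = []
    for group in log_groups:
        retention = group.get("retentionInDays")  # missing key means "never expire"
        if retention is None or retention > max_days:
            flagged.append(group["logGroupName"])
    return flagged


# Hypothetical sample data for illustration only.
sample = [
    {"logGroupName": "/app/api", "retentionInDays": 14},
    {"logGroupName": "/app/worker"},  # no retention set, so it never expires
    {"logGroupName": "/app/batch", "retentionInDays": 365},
]
print(groups_needing_retention(sample))
```

In a real account, the flagged names would be the ones to pass to CloudWatch Logs' put_retention_policy call.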

Data transfer. Cross-AZ traffic, cross-region replication, NAT gateway egress. The cost of moving data inside your own VPC is rarely in the napkin estimate. Fix: VPC endpoints for AWS services, careful AZ pinning for chatty services, and a NAT gateway data-processing review every month.

Per-MAU pricing. Cognito is the most cited example: above its free tier — 50,000 MAUs for older accounts, 10,000 for newer ones — you pay per monthly active user. The rate is $0.015/MAU in the entry bracket and steps down at higher volumes. Trivial at 1,000 users, meaningful at 100,000, a real line item at a million. Fix: do the math against your projected user count before picking the service. Cognito is the right answer for many apps. It's the wrong answer for a few that don't realize until later.
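The math is short enough to script. This sketch uses only the figures quoted above and deliberately charges every billable user the entry-bracket rate, so it gives an upper bound rather than an exact bill (the real rate steps down at higher volumes). The function name and defaults are ours.

```python
def cognito_monthly_cost(maus: int, free_tier: int = 10_000,
                         rate_per_mau: float = 0.015) -> float:
    """Upper-bound monthly cost: every billable MAU pays the entry-bracket rate."""
    billable = max(0, maus - free_tier)
    return billable * rate_per_mau


for users in (1_000, 100_000, 1_000_000):
    print(f"{users:>9,} MAUs -> ${cognito_monthly_cost(users):,.2f}/month")
```

With the 10,000-MAU free tier, that comes to nothing at 1,000 users, roughly $1,350 a month at 100,000, and roughly $14,850 a month at a million. Pass free_tier=50_000 for an older account.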

Multi-AZ databases. RDS Multi-AZ doubles your compute cost — that's the feature. Often correct for production. Almost always wrong for staging and dev. Fix: per-environment provisioning, not one-size-fits-all.

High-volume DynamoDB. On-demand is the right default until your read/write pattern stabilizes. Above a certain threshold, provisioned with auto-scaling is meaningfully cheaper. Fix: measure access patterns at month two, switch modes if the math warrants.
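One way to frame "if the math warrants" is as a break-even utilization: how busy a provisioned capacity unit has to be, on average, before it beats paying per request. The sketch below is a simplification under stated assumptions: one provisioned unit serves one request per second, item-size effects and auto-scaling lag are ignored, and both prices are parameters. The example prices are placeholders, not current AWS rates.

```python
SECONDS_PER_HOUR = 3600


def break_even_utilization(on_demand_price_per_million: float,
                           provisioned_price_per_unit_hour: float) -> float:
    """Average utilization of a provisioned unit above which provisioned is cheaper.

    One provisioned unit is assumed to serve one request per second, so at
    full utilization it handles 3600 requests per hour.
    """
    on_demand_cost_at_full_use = (
        SECONDS_PER_HOUR / 1_000_000 * on_demand_price_per_million
    )
    return provisioned_price_per_unit_hour / on_demand_cost_at_full_use


# Placeholder prices for illustration; look up the real ones for your region.
threshold = break_even_utilization(on_demand_price_per_million=1.00,
                                   provisioned_price_per_unit_hour=0.0005)
print(f"Provisioned wins above ~{threshold:.0%} average utilization")
```

If your measured month-two utilization sits well above the threshold and the traffic shape is stable, the switch is worth pricing out properly.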

Compute autoscaling. Lambda is cheap until you have 40 scheduled jobs running every minute, plus queue invocations, plus user traffic. ECS and EKS scaling out under load are doing what they should — but each new task is another full-cost compute unit. Fix: actually count invocations from non-user sources every month. Set ceilings on autoscaling. Consider Fargate Spot for non-critical workloads.
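The 40-jobs figure is worth turning into a number. A small sketch, assuming each job fires once a minute over a 30-day month; the function name is ours.

```python
MINUTES_PER_DAY = 24 * 60


def scheduled_invocations_per_month(jobs: int, runs_per_minute: float = 1.0,
                                    days: int = 30) -> int:
    """Invocations generated by scheduled jobs alone, before any user traffic."""
    return round(jobs * runs_per_minute * MINUTES_PER_DAY * days)


print(scheduled_invocations_per_month(40))  # 40 * 1440 * 30 = 1728000
```

That is about 1.7 million invocations a month before a single user request or queue message is counted.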

The wrong calls that lock in a higher bill

These are root-cause architectural mistakes — choices that quietly price your system at a higher level before the first user signs up.

SaaS on plan pricing instead of consumption. Plan-based services have cliffs that hit suddenly. Supabase steps from Pro ($25/month) to Team ($599/month) — a 24× jump for the project that outgrew the smaller tier but doesn't yet need the larger one. Vercel's Pro plan starts at $20/seat/month plus usage-based bandwidth and compute overages, then adds $300/month the moment your team needs SAML SSO. Both work well at small scale; both can produce a sudden monthly run-rate increase the day you cross a feature or volume line. Fix: read the pricing page like the contract it is — especially the add-ons section and the “starting at” footnote.
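The cliffs are easy to see once written down as step functions. This sketch uses only the prices quoted above; the five-seat team is a hypothetical example, and usage overages are left out.

```python
def vercel_pro_monthly(seats: int, needs_saml_sso: bool,
                       seat_price: float = 20.0, sso_addon: float = 300.0) -> float:
    """Base Pro cost from the figures quoted above; usage overages are excluded."""
    return seats * seat_price + (sso_addon if needs_saml_sso else 0.0)


supabase_jump = 599 / 25  # Team price divided by Pro price
print(f"Supabase Pro -> Team: {supabase_jump:.1f}x")
print(vercel_pro_monthly(5, needs_saml_sso=False))  # 100.0
print(vercel_pro_monthly(5, needs_saml_sso=True))   # 400.0
```

For the hypothetical five-seat team, turning on SAML SSO quadruples the base bill without a single extra request being served.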

VMs and DBs provisioned for peak. Reserving capacity for the traffic you might have in eighteen months means paying for that capacity today. On-demand and serverless tiers are designed for exactly this case. Commit to capacity once you have data, not before.

VMs or containers for static SPAs. Hosting a static React or Next.js SPA on EC2 or ECS Fargate is a real pattern, and it's almost always wrong. The site has no server-side state to manage. CloudFront + S3 will serve the same content for cents. Use it.

Wrong compute service for the workload shape. Lambda for a 24/7 long-running process. EC2 for a job that fires four times a day. The compute service should match the workload, not whatever option the team is most familiar with.

Pay-as-you-go AI for high-volume, predictable workloads. Pay-per-token services (OpenAI, Anthropic, Bedrock, OpenRouter) are the right default at low or unpredictable volume. They become the wrong default once your AI workload is high and stable: beyond a certain threshold, hosting your own model (open-weight inference on GPU instances, or dedicated provisioned capacity) can become dramatically cheaper per token. The break-even varies by model and workflow, but every team running serious AI should know roughly where theirs sits. The architectural mistake is staying on pay-as-you-go because it's familiar, without ever running the math.
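A rough way to locate that threshold, as a sketch: treat self-hosting as a fixed monthly cost plus a small marginal cost per token, and solve for the volume where it matches the pay-per-token price. Every number in the example is a placeholder, not a quote from any provider, and the model ignores engineering time and utilization limits.

```python
def break_even_tokens_per_month(api_price_per_million: float,
                                self_host_fixed_monthly: float,
                                self_host_price_per_million: float = 0.0) -> float:
    """Monthly token volume above which self-hosting is cheaper than pay-per-token."""
    saving_per_million = api_price_per_million - self_host_price_per_million
    if saving_per_million <= 0:
        return float("inf")  # self-hosting never pays off on these numbers
    return self_host_fixed_monthly / saving_per_million * 1_000_000


# Placeholder inputs for illustration only.
tokens = break_even_tokens_per_month(api_price_per_million=2.0,
                                     self_host_fixed_monthly=4_000.0,
                                     self_host_price_per_million=0.5)
print(f"Break-even at ~{tokens / 1e9:.2f}B tokens/month")
```

The useful output is less the exact figure than knowing whether your current volume is within an order of magnitude of it.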

Designing against the bill you haven't seen yet

Four habits separate teams that survive month five from teams that don't.

Run a cost model alongside the architecture diagram, not after. When you draw a new service into the system, you draw a rough cost line item with it. Even a basic number is fine. The exercise forces honest scoping at design time, when changes are still cheap.
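A cost model at this stage can be a dozen lines. The sketch below assumes each service has a fixed monthly component plus a per-unit cost driven by user count; the LineItem shape, service names, and rates are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LineItem:
    service: str
    fixed_monthly: float   # paid regardless of usage
    cost_per_unit: float   # cost per unit of the usage driver
    units_per_user: float  # units one active user generates per month

    def monthly_cost(self, users: int) -> float:
        return self.fixed_monthly + self.cost_per_unit * self.units_per_user * users


def project(items: list[LineItem], users: int) -> dict[str, float]:
    """Cost per service at a given user count, to sit next to the diagram."""
    return {item.service: item.monthly_cost(users) for item in items}


# Hypothetical services and rates, for illustration only.
model = [
    LineItem("auth", fixed_monthly=0.0, cost_per_unit=0.01, units_per_user=1),
    LineItem("database", fixed_monthly=200.0, cost_per_unit=0.0, units_per_user=0),
]
for users in (1_000, 100_000):
    costs = project(model, users)
    print(users, costs, "total:", round(sum(costs.values()), 2))
```

Running the projection at launch scale and at ten or a hundred times that shows which line item changes character as you grow.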

Build per-workload cost observability before launch. Tag every resource with the workload it belongs to. Set up cost-explorer dashboards filtered by tag. We made the broader case for per-invocation attribution in The Month-Three Moment — same discipline, applied to infrastructure rather than AI calls.
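The grouping itself is simple once tags exist. A sketch, assuming billing records have already been exported into dicts with a cost and a tags map; the record shape, resource names, and the "workload" tag key are hypothetical.

```python
from collections import defaultdict


def cost_by_workload(records: list[dict], tag_key: str = "workload") -> dict[str, float]:
    """Sum cost per workload tag; resources without the tag land in 'untagged'."""
    totals: dict[str, float] = defaultdict(float)
    for record in records:
        workload = record.get("tags", {}).get(tag_key, "untagged")
        totals[workload] += record["cost"]
    return dict(totals)


# Hypothetical billing records for illustration only.
records = [
    {"resource": "fn-resize", "cost": 12.5, "tags": {"workload": "media"}},
    {"resource": "db-main", "cost": 80.0, "tags": {"workload": "checkout"}},
    {"resource": "fn-thumb", "cost": 7.5, "tags": {"workload": "media"}},
    {"resource": "nat-gw", "cost": 45.0, "tags": {}},
]
print(cost_by_workload(records))
```

The size of the "untagged" bucket is a useful number in its own right: it is the share of the bill nobody can currently explain.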

Sanity-check growth assumptions monthly. The rough estimates were based on assumptions you made before any real users. Five months in, you have actual data. Compare. Adjust before the bill, not after.
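The monthly comparison can be mechanical. This sketch flags any service whose actual spend exceeds its estimate by more than a tolerance; the 25% default, the function name, and the sample figures are assumptions for illustration.

```python
def drifted_items(estimated: dict[str, float], actual: dict[str, float],
                  tolerance: float = 0.25) -> dict[str, float]:
    """Return actual/estimate ratios for services over the estimate by more than tolerance.

    Services with spend but no estimate get a ratio of infinity, since
    nobody planned for them.
    """
    flagged = {}
    for service, spent in actual.items():
        planned = estimated.get(service, 0.0)
        if planned > 0:
            ratio = spent / planned
        else:
            ratio = float("inf") if spent > 0 else 1.0
        if ratio > 1 + tolerance:
            flagged[service] = ratio
    return flagged


# Hypothetical figures for illustration only.
estimate = {"logs": 20.0, "auth": 50.0}
spend = {"logs": 60.0, "auth": 55.0, "nat": 30.0}
print(drifted_items(estimate, spend))
```

Here logs would be flagged at three times the estimate and the NAT gateway as unplanned, while auth at 10% over stays within tolerance.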

Know your AI cost per transaction. Everyone wants to ship AI agents now. Almost no one can tell you what the AI bill is per request — what the average user's AI footprint costs, or what an active power user does to the monthly run rate. That gap is how a month-five surprise becomes a month-three crisis when AI is in the loop. Pick a representative workflow, measure the token spend per invocation, and have a number before you ship. We made the broader cost case in Tokens Are Not the Metric; this is the architecture-time version of the same discipline.
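Getting to a per-request number takes two measurements and two prices. A sketch, assuming separate input and output token prices quoted per million tokens; the token counts, prices, and usage figures below are placeholders, not measurements or provider quotes.

```python
def cost_per_invocation(input_tokens: int, output_tokens: int,
                        input_price_per_million: float,
                        output_price_per_million: float) -> float:
    """Token spend for one run of a representative workflow."""
    return (input_tokens * input_price_per_million
            + output_tokens * output_price_per_million) / 1_000_000


def monthly_ai_cost(invocations_per_user: float, users: int,
                    per_invocation: float) -> float:
    """Monthly run rate for a user segment with a given usage level."""
    return invocations_per_user * users * per_invocation


# Placeholder measurements and prices for illustration only.
per_call = cost_per_invocation(input_tokens=3_000, output_tokens=500,
                               input_price_per_million=1.0,
                               output_price_per_million=4.0)
print(f"per invocation: ${per_call:.4f}")
print(f"average users:  ${monthly_ai_cost(20, 10_000, per_call):,.2f}/month")
print(f"power users:    ${monthly_ai_cost(500, 200, per_call):,.2f}/month")
```

Splitting average users from power users matters because a small heavy segment can rival the rest of the user base on the bill.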

None of this is technically hard. It's just discipline most teams skip because the bill hasn't hurt yet.

The bill is a lagging indicator

By the time it surprises you, the choices that produced it are several months old. The cheap fixes (a different service tier, a different region, a different auth provider) are still possible but increasingly expensive to make. The expensive fixes, such as rebuilding on different primitives, cost a month of work and a feature freeze (unless you want to drive your AI agent hard enough to do it over a weekend).

The architecture set the bill. Make sure you set the architecture.