Field Report: Claude Did the Heavy Lifting

Spent a launch day recently shipping a high-volume consumer offer for a large brand. The kind of launch where an email goes out to a huge list and you hold your breath. A small team worked the whole thing alongside claude code (sessions!!), and what surprised us wasn't how much it could do. It was which parts got faster, and which parts stayed human.

What AI made almost trivial

The mechanical work just… vanished. Things that used to take days took minutes:

A live monitoring dashboard. Concurrency, throttles, latency, database load, the full conversion funnel, all wired up and on one screen. Including custom metrics pulled straight out of application logs.
A daily ops report. A clean HTML summary of everything from logins, registrations, redemptions, etc., generated on demand.
Infrastructure-as-code reconciliation. Several config changes I'd made live during the launch, packaged into a clean, reviewable patch for the team to merge later, complete with the finer comments spelled out.
A support-case writeup. A full, accurate technical writeup of incidents assembled from live system data, ready to send.

None of that is the interesting part, though. The interesting part was the investigation loop.

At one point a key number read zero when I expected it to be climbing. Normally that would need some amount of context-switching and log diving: chase the data through a half-dozen services, read the code, prove where it breaks. The AI ran the whole thread in a few minutes, traced the pipeline end to end, and showed me it was healthy. The “zero” was just an upstream delay, not a bug. That kind of fast, thorough, cross-system root-causing is the real unlock.

Where we had to step in

And yet. A few places where our decisions and judgment made the difference, and the agent needed strategic direction.

1. “Not yet.” We had a lever ready that would eliminate a latency problem, but it costs money and adds a moving part (Lambda provisioned concurrency). The AI had it built and was ready to flip it. We held off until we could see the real traffic patterns. Traffic remained manageable under the configurations, so we didn't need to flip that lever. Pulling that lever on a hunch would've been wasted money and unnecessary risk, especially on the night of a launch. That's a cost-vs-risk-vs-timing call, and it lives in the part of our brain that knows what “launch day” actually feels like.

2. “Accept it for now.” We found a real security gap. The clean fix was a code change plus a deploy. Instead of rushing it in, we made the call to document it and accept the risk for V1 rather than introduce the change before go-live, because the realistic threat was low and the change risk was not. The AI laid out the options and the trade-offs. But it could not decide how much risk was acceptable for this launch, with this timeline, for this business. That's not a technical question. It's a judgment one.

3. “That number is wrong.” Later in the day a report told us the people-to-purchases ratio was off. The product rule was strictly one-per-person. The AI presented the numbers cleanly and confidently. But it was wrong, so we insisted on digging in. It turned out the AI's own analysis had two separate flaws: a statistical function that was quietly undercounting, and a label that combined “button presses” with “items claimed.” The real answer was a clean 1:1.

A wrong figure would have slipped into a report that someone would actually read and trust. Although it's still acceptable as most business people also assume that these reports are AI generated and there will be some inconsistencies in the final figures.

The obvious pattern

The AI is extremely fast, tireless, and will hand you a confident answer for almost anything (Terraform, Python scripts, Lambda functions, service configurations, security loopholes, latency calculations, etc.). What doesn't automate is knowing when a confident answer is wrong. Knowing that a number is supposed to be 1:1. Knowing that “the fix exists” and “we should ship the fix tonight” are different scenarios. Knowing that the safest move is sometimes to do nothing and watch.

So the division of labor I keep landing on:

AI owns execution and investigation. Build it, pull it, trace it, prove it, write it up. It compresses hours into minutes and never gets tired or sloppy on the mechanical parts.
Humans own judgment, risk, and disbelief. What's worth doing, what's safe to do now, what to ship versus defer, and which confidently-presented number deserves a second look.

The AI assistance and tools are rapidly getting better at answering. Deciding which answers to trust, and which calls to make, is still the job.

And honestly, that's the fun part.

Field Report: Claude Did the Heavy Lifting

What AI made almost trivial

Where we had to step in

The obvious pattern

Clarity Over Confidence

My API Key Was the Back Door

What AI Isn't Disrupting

Liked this? Get the next one in your inbox.